intensive course in econometrics slides...intensive course in econometrics — section 1.2 — ur...
TRANSCRIPT
Intensive Course in Econometrics
Slides
Rolf Tschernig & Harry Haupt
University of Regensburg University of Bielefeld
March 20091
1We are greatly indebted to Kathrin Kagerer, Joachim Schnurbus, and Roland Weigand who helped us enormously to improve and correct this course
material. Of course, the usual disclaimer applies.
c© These slides may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes.
i
Contents
1 Introduction: What is Econometrics? 4
1.1 A Trade Example: What Determines Trade Flows? . . . 4
1.2 Economic Models and the Need for Econometrics . . . 13
1.3 Causality and Experiments . . . . . . . . . . . . . . . 21
1.4 Types of Economic Data . . . . . . . . . . . . . . . . 24
2 The Simple Regression Model 29
2.1 The Population Regression Model . . . . . . . . . . . 30
ii
2.2 The Sample Regression Model . . . . . . . . . . . . . 44
2.3 The OLS Estimator . . . . . . . . . . . . . . . . . . . 47
2.4 Best Linear Prediction, Correlation, and Causality . . . 63
2.5 Algebraic Properties of the OLS Estimator . . . . . . . 69
2.6 Parameter Interpretation and Functional Form . . . . . 73
2.7 Statistical Properties: Expected Value and Variance . . 84
2.8 Estimation of the Error Variance . . . . . . . . . . . . 91
3 Multiple Regression Analysis: Estimation 94
3.1 Motivation: The Trade Example Continued . . . . . . . 94
3.2 The Multiple Regression Model of the Population . . . 98
3.3 The OLS Estimator: Derivation and Algebraic Properties 111
3.4 The OLS Estimator: Statistical Properties . . . . . . . 123
3.5 Model Specification I: Model Selection Criteria . . . . . 153
iii
4 Multiple Regression Analysis: Hypothesis Testing 163
4.1 Basics of Statistical Tests . . . . . . . . . . . . . . . . 163
4.2 Probability Distribution of the OLS Estimator . . . . . 193
4.3 The t Test in the Multiple Regression Model . . . . . . 200
4.4 Empirical Analysis of a Simplified Gravity Equation . . . 208
4.5 Confidence Intervals . . . . . . . . . . . . . . . . . . 217
4.6 Testing a Single Linear Combination of Parameters . . 227
4.7 The F Test . . . . . . . . . . . . . . . . . . . . . . . 233
4.8 Reporting Regression Results . . . . . . . . . . . . . . 259
5 Multiple Regression Analysis: Asymptotics 262
5.1 Large Sample Distribution of the Mean Estimator . . . 262
5.2 Large Sample Inference for the OLS Estimator . . . . . 277
iv
6 Multiple Regression Analysis: Interpretation 282
6.1 Level and Log Models . . . . . . . . . . . . . . . . . 282
6.2 Data Scaling . . . . . . . . . . . . . . . . . . . . . . 283
6.3 Dealing with Nonlinear or Transformed Regressors . . . 290
6.4 Regressors with Qualitative Data . . . . . . . . . . . . 301
7 Multiple Regression Analysis: Prediction 317
7.1 Prediction and Prediction Error . . . . . . . . . . . . . 317
7.2 Statistical Properties of Linear Predictions . . . . . . . 324
8 Multiple Regression Analysis: Heteroskedasticity 325
8.1 Consequences of Heteroskedasticity for OLS . . . . . . 328
8.2 Heteroskedasticity-Robust Inference after OLS . . . . . 330
8.3 The General Least Squares (GLS) Estimator . . . . . . 333
8.4 Feasible Generalized Least Squares (FGLS) . . . . . . . 341
v
9 Multiple Regression Analysis: Model Diagnostics 363
9.1 The RESET Test . . . . . . . . . . . . . . . . . . . . 363
9.2 Heteroskedasticity Tests . . . . . . . . . . . . . . . . 366
9.3 Model Specification II: Useful Tests . . . . . . . . . . 386
10 Appendix I
10.1 A Condensed Introduction to Probability . . . . . . . . I
10.2 Important Rules of Matrix Algebra . . . . . . . . . . . XXI
10.3 Rules for Matrix Differentiation . . . . . . . . . . . . . XXVIII
10.4 Data for Estimating Gravity Equations . . . . . . . . . XXX
vi
Organisation
Contact
Prof. Dr. Rolf Tschernig
Building RW(L), 5th floor, room 514
Universitatsstr. 31, 93040 Regensburg, Germany
Tel. (+49) 941/943 2737, Fax (+49) 941/943 4917
Email: [email protected]
http://www.wiwi.uni-regensburg.de/tschernig/
1
Intensive Course in Econometrics — Organisation — UR March 2009 — R. Tschernig
Schedule and Location
Date of Course: February 16 to February 27, 2009
Osteuropa-Institut and University of Regensburg
Lectures: every morning 8.30 - 10.00 W 113 Tschernig
every morning 10.30 - 12.00 W 113 Tschernig
Exercises: every afternoon 14.00 - 17.00 W 113 or PC-room Kagerer, Schnurbus, Weigand
Exam
no exam in this course
2
Intensive Course in Econometrics — Organisation — UR March 2009 — R. Tschernig
Required Text
Wooldridge, J.M. (2009). Introductory Econometrics. A Modern
Approach, 4th ed., Thomson South-Western.
Additional Reading
Stock, J.H. and Watson, M.W. (2007). Introduction to Economet-
rics, 2nd ed., Pearson, Addison-Wesley.
plus what will be announced during the course.
3
1 Introduction: What is Econometrics?
1.1 A Trade Example: What Determines Trade Flows?
Goal/Research Question: Identify the factors that influence imports
to Kazakhstan and quantify their impact.
4
Intensive Course in Econometrics — Section 1.1 — UR March 2009 — R. Tschernig
• Three basic questions that have to be answered during the anal-
ysis:
1. Which (economic) relationships could be / are “known” to be
relevant for this question?
2. Which data can be useful for checking the possibly relevant eco-
nomic conjectures/theories?
3. How to decide about which economic conjecture to reject or to
follow?
• Let’s have a first look at some data of interest: the imports (in
current US dollars) to Kazakhstan from 55 originating countries in
2004.
5
Intensive Course in Econometrics — Section 1.1 — UR March 2009 — R. Tschernig
Imports to Kazakhstan in 2004 in current US dollars
0
200000000
400000000
600000000
800000000
1000000000
1200000000
1400000000
ALB
AUT
BEL
BIH
CAN
CHN
CZE
ESP
FIN
GBR
GER
HKG
HUN
ISL
JPN
KGZ
LTU
MDA
MLT
NOR
PRT
RUS
SVN
THA
TKM
TWN
USA
YUG
The original data are from
UN Commodity Trade Statistics Database (UN COMTRADE)
6
Intensive Course in Econometrics — Section 1.1 — UR March 2009 — R. Tschernig
• See section 10.4 in the Appendix for detailed data descriptions.
Data are provided in the EViews file Kazakhstan imports 2004.wf1.
We thank Richard Frensch, Osteuropa-Institut, Regensburg, Germany, who pro-
vided all data throughout this course for analyzing trade flows.
• A first attempt to answer the three basic questions:
1. Ignore for the moment all existing economic theory and simply
hypothesize that observed imports depend somehow on the GDP
of the exporting country.
2. Collect GDP data for the countries of origin, e.g. from the
International Monetary Fund (IMF) – World Economic Outlook Database
3. Plot the data, e.g. by using a scatter plot.
Can you decide whether there is a relationship between trade
flows from and the GDP of exporting countries?
7
Intensive Course in Econometrics — Section 1.1 — UR March 2009 — R. Tschernig
A scatter plot
0.00E+00
2.00E+08
4.00E+08
6.00E+08
8.00E+08
1.00E+09
1.20E+09
0 2000 4000 6000 8000 10000 12000
WEO_GDPCR_O
TR
AD
E_
0_
D_
O
Some questions:
• What do you see?
• Is there a relationship?
• If so, how to quantify it?
• Is there a causal relationship
- what determines what?
• By how much do the im-
ports from Germany change
if the GDP in Germany
changes by 1%?
• Are there other relevant
factors determining imports,
e.g. distance?
• Is it possible to forecast fu-
ture trade flows?8
Intensive Course in Econometrics — Section 1.1 — UR March 2009 — R. Tschernig
• What have you done?
– You tried to simplify reality
– by building some kind of (economic) model.
• An (economic) model
– has to reduce the complexity of reality such that it is useful for
answering the question of interest;
– is a collection of cleverly chosen assumptions from which implica-
tions can be inferred (using logic) — Example: Heckscher-Ohlin
model;
– should be as simple as possible and as complex as necessary;
– cannot be refuted or “validated” without empirical data of some
kind.
9
Intensive Course in Econometrics — Section 1.1 — UR March 2009 — R. Tschernig
• Let us consider a simple formal model for the relationship between
imports and GDP of the originating countries
importsi = β0 + β1gdpi, i = 1, . . . , 55.
– Does this make sense?
– How to determine the values of the so called parameters β0 and
β1?
– Fit a straight line through the cloud!
0.00E+00
2.00E+08
4.00E+08
6.00E+08
8.00E+08
1.00E+09
1.20E+09
0 2000 4000 6000 8000 10000 12000
WEO_GDPCR_O
TR
AD
E_
0_
D_
O
10
Intensive Course in Econometrics — Section 1.1 — UR March 2009 — R. Tschernig
0.00E+00
2.00E+08
4.00E+08
6.00E+08
8.00E+08
1.00E+09
1.20E+09
0 2000 4000 6000 8000 10000 12000
W E O _ G D P C R _ O
TR
AD
E_
0_
D_
OTRADE_0_D_O vs. WEO_GDPCR_O
More questions:
– How to fit a line through
the cloud of points?
– Which properties does the
fitted line have?
– What to do with other
relevant factors that are
currently neglected in the
analysis?
– Which criteria to choose
for identifying a potential
relationship?
11
Intensive Course in Econometrics — Section 1.1 — UR March 2009 — R. Tschernig
0
200,000,000
400,000,000
600,000,000
800,000,000
1,000,000,000
1,200,000,000
0 4,000 8,000 12,000
WEO_GDPCR_O
TRADE_0_D_OTRADE_0_D_F_LEVTRADE_0_D_F_LEV2
Further questions:
– Is the potential relation-
ship really linear? Com-
pare it to the green points
of a nonlinear relationship.
– And: how much may re-
sults change with a differ-
ent sample, e.g. for 2003?
12
Intensive Course in Econometrics — Section 1.2 — UR March 2009 — R. Tschernig
1.2 Economic Models and the Need for Econometrics
• Standard problems of economic models:
– The conjectured economic model is likely to neglect some factors.
– Numeric results to the numerical questions posed depend in gen-
eral on the choice of a data set. A different data set leads to
different numerical results.
=⇒ Numeric answers always have some uncertainty.
13
Intensive Course in Econometrics — Section 1.2 — UR March 2009 — R. Tschernig
• Econometrics
– offers solutions for dealing with unobserved factors in economic
models,
– provides “both a numerical answer to the question and a measure
how precise the answer is (Stock & Watson 2007, p. 7)”,
– as will be seen later, provides tools that allow to refute economic
hypotheses using statistical techniques by confronting theory with
data and to quantify the probability of such decisions to be wrong,
– as will be seen later as well, allows to quantify risks of forecasts,
decisions, and even of its own analysis.
14
Intensive Course in Econometrics — Section 1.2 — UR March 2009 — R. Tschernig
• Therefore:
Econometrics can also be useful for providing answers to questions
like:
– How reliable are predicted growth rates or returns?
– How likely is it that the value realizing in the future will be close
to the predicted value? In other words, how precise are the pre-
dictions?
• Main tool: Multiple regression model
It allows to quantify the effect of a change in one variable on another
variable, holding other things constant (ceteris paribus analysis).
15
Intensive Course in Econometrics — Section 1.2 — UR March 2009 — R. Tschernig
• Steps of an econometric analysis:
1. Careful formulation of question/problem/task of interest.
2. Specification of an economic model.
3. Careful selection of a class of econometric models.
4. Collecting data.
5. Selection and estimation of an econometric model.
6. Diagnostics of correct model specification.
7. Usage of the model.
Note that there exists a large variety of econometric models and
model choice depends very much on the research question, the un-
derlying economic theory, availability of data, and the structure of
the problem.
16
Intensive Course in Econometrics — Section 1.2 — UR March 2009 — R. Tschernig
• Goals of this course:
providing you with basic econometric tools such that you can
– successfully carry out simple empirical econometric analyzes and
provide quantitative answers to quantitative questions,
– recognize ill-conducted econometric studies and their consequences,
– recognize when to ask for help of an expert econometrician,
– attend courses for advanced econometrics / empirical economics,
– study more advanced econometric techniques.
17
Intensive Course in Econometrics — Section 1.2 — UR March 2009 — R. Tschernig
Some Definitions of Econometrics
– “... discover empirical relation between economic variables, pro-
vide forecast of various economic quantities of interest ... (First
issue of volume 1, Econometrica, 1933).”
– “The science of model building consists of a set of quantitative
tools which are used to construct and then test mathematical rep-
resentations of the real world. The development and use of these
tools are subsumed under the subject heading of econometrics
(Pindyck & Rubinfeld 1998).”
18
Intensive Course in Econometrics — Section 1.2 — UR March 2009 — R. Tschernig
– “At a broad level, econometrics is the science and art of using eco-
nomic theory and statistical techniques to analyze economic data.
Econometric methods are used in many branches of economics,
including finance, labor economics, macroeconomics, microeco-
nomics, marketing, and economic policy. Econometric methods
are also commonly used in other social sciences, including political
science and sociology (Stock & Watson 2007, p. 3).”
So, some may also say: “Alchemy or Science?”, “Economic-
tricks”, “Econo-mystiques”.
– “Econometrics is based upon the development of statistical meth-
ods for estimating economic relationships, testing economic the-
ories, and evaluating and implementing government and business
policy (Wooldridge 2009, p. 1).”
19
Intensive Course in Econometrics — Section 1.2 — UR March 2009 — R. Tschernig
• Summary of tasks for econometric methods
– In brief: econometrics can be useful whenever you en-
counter (economic) data and you want to make sense
out of them.
– In detail:
∗ Providing a formal framework for falsifying postulated
economic relationships by confronting economic theory with
economic data using statistical methods: Economic hypotheses
are formulated and statistically tested on basis of adequately
(and repeatedly) collected data such that test results may fal-
sify the postulated hypotheses.
∗ Analyzing the effects of policy measures.
∗ Forecasting.
20
Intensive Course in Econometrics — Section 1.3 — UR March 2009 — R. Tschernig
1.3 Causality and Experiments
• Common understanding: “causality means that a specific action”
(touching a hot stove) “leads to a specific, measurable consequence”
(get burned) (Stock & Watson 2007, p. 8).
• How to identify causality? Observe repeatedly an action and its
consequence!
• Thus, in science one aims at repeating an action and its conse-
quences under identical conditions. How to generate repetitions
of actions?
21
Intensive Course in Econometrics — Section 1.3 — UR March 2009 — R. Tschernig
• Randomized controlled experiments:
– there is a control group that receives no treatment (e.g. fertil-
izer) and a treatment group that receives treatment, and
– where treatment is assigned randomly in order to eliminate
any possible systematic relationship between the treatment and
other possible influences.
• Causal effect:
A “causal effect is defined to be an effect on an outcome of a given
action or treatment, as measured in an ideal randomized controlled
experiment (Stock & Watson 2007, p. 9).”
• In economics randomized controlled experiments are very often dif-
ficult or impossible to conduct. Then a randomized controlled ex-
periment provides a theoretical benchmark and econometric analysis
22
Intensive Course in Econometrics — Section 1.3 — UR March 2009 — R. Tschernig
aims at mimicking as closely as possible the conditions of a random-
ized controlled experiment using actual data.
• Note that for forecasting knowledge of causal effects is not nec-
essary.
• Warning: in general multiple regression models do not allow con-
clusions about causality!
23
Intensive Course in Econometrics — Section 1.4 — UR March 2009 — R. Tschernig
1.4 Types of Economic Data
1. Cross-Sectional Data
• are collected across several units at a single point or period of
time.
• Units: “economic agents”, e.g. individuals, households, investors,
firms, economic sectors, cities, countries.
• In general: the order of observations has no meaning.
• Popular to use index i.
• Optimal: the data are a random sample of the underlying popu-
lation, see Section 2.1 for details.
• Cross-Sectional data allow to explain differences between individ-
ual units.
24
Intensive Course in Econometrics — Section 1.4 — UR March 2009 — R. Tschernig
• Example: sample of countries that export to Kazakhstan in 2004
of Section 1.1.
2. Time Series Data
• are sampled across differing points/periods of time.
• Popular to use index t.
• Sampling frequency is important:
– variable versus fixed;
– fixed: annually, quarterly, monthly, weekly, daily, intradaily;
– variable: ticker data, duration data (e.g. unemployment spells).
• Time series data allow the analysis of dynamic effects.
• Univariate versus multivariate time series data.
25
Intensive Course in Econometrics — Section 1.4 — UR March 2009 — R. Tschernig
• Example: Trade flow from Germany to Kazakhstan and GDP in
Germany (in current US dollars), 1990 - 2007, T = 18.
0.0E+00
1.0E+08
2.0E+08
3.0E+08
4.0E+08
5.0E+08
6.0E+08
7.0E+08
8.0E+08
1990 1992 1994 1996 1998 2000 2002 2004 2006
TRADE_0_D_O
1.60E+12
1.80E+12
2.00E+12
2.20E+12
2.40E+12
2.60E+12
2.80E+12
3.00E+12
1990 1992 1994 1996 1998 2000 2002 2004 2006
WDI_GDPUSDCR_O
3. Panel data
• are a collection of cross-sectional data for at least two different
points/periods of time.
• Individual units remain identical in each cross-sectional sample
(except if units vanish).
26
Intensive Course in Econometrics — Section 1.4 — UR March 2009 — R. Tschernig
• Use of double index: it where i = 1, . . . , N and t = 1, . . . , T .
• Typical problem: missing values - for some units and periods there
are no data.
• Example: growth rate of imports from 55 different countries to
Kazakhstan from 1991 to 2008 where all 55 countries were chosen
for the sample 1990 and kept fixed for all subsequent years
(T = 18, N = 55).
4. Pooled Cross Sections
• also a collection of cross-sectional data, however, allowing for
changing units across time.
• Example: in 1995 countries of origin are Germany, France, Russia
and in 1996 countries of origin are Poland, US, Italy.
27
Intensive Course in Econometrics — Section 1.4 — UR March 2009 — R. Tschernig
In this course: focus on the analysis of cross-sectional data and
specific types of time series data:
• simple regression model → Chapter 2,
• multiple regression model → Chapters 3 to 9.
• Time series analysis requires advanced econometric techniques that
are beyond the scope of this course (given the time constraints).
Recall the arithmetic quality of data:
• quantitative variables,
• qualitative or categorical variables.
Reading: Sections 1.1-1.3 in Wooldridge (2009).
28
2 The Simple Regression Model
Distinguish between the
• population regression model and the
• sample regression model.
29
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
2.1 The Population Regression Model
• In general:
y and x are two variables that describe properties of the population
under consideration for which one wants “to explain y in terms of
x” or “to study how y varies with changes in x” or “to predict y
for given values of x”.
Example: By how much changes the hourly wage for an additional
year of schooling keeping all other influences fixed?
• If we knew everything, then the relationship between y and x
may formally be expressed as
y = m(x, z1, . . . , zs) (2.1)
where z1, . . . , zs denote s additional variables that in addition to
years of schooling x influence the hourly wage y.
30
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
• For practical application it is possible
– that relationship (2.1) is too complicated to be useful,
– that there does not exist an exact relationship, or
– that there exists an exact relationship for which, however, not all
s influential variables z1, . . . , zs can be observed, or
– one has no idea about the structure of the function m(·).• Our solution:
– build a useful model, cf. Section 1.1,
– which focuses on a relationship that holds on “average”. What
do we mean by “average”?
31
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
• Crucial building blocks for our model:
– Consider the variable y as random. You may think of y
denoting the value of the variable of a random choice out of all
units in the population. Furthermore, in case of discrete values of
the random variable y, a probability is assigned to each value
of y. (If the random variable y is continuous, a density value is
assigned.)
In other words: apply probability theory. See Appendices B
and C in Wooldridge (2009).
Examples:
∗ The population consists of all apartments in Almaty. The vari-
able y denotes the rent of a single apartment randomly chosen
from all apartments in Almaty.
32
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
∗ The population consists of all possible values of imports to
Kazakhstan from a specific country and period.
∗ For a dice the population consists of all numbers that are writ-
ten on each side although in this case statisticians prefer to
talk about a sample space.
– In terms of probability theory the “average” of a variable y is
given by the expected value of this variable. In case of discrete
y one has
E[y] =∑
j ∈ all different yj in population
yjProb(y = yj
)
– Sometimes one may only look at a subset of the population,
namely all y that have the same value for another variable x.
33
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
Example: one only considers the rents of all apartments in Al-
maty of size x = 75 m2.
– If the “average” is conditioned on specific values of another vari-
able x, then one considers the conditional expected value of
y for a given x: E[y|x]. For discrete random variables y one has
E[y|x] =∑
j ∈ all different yj in population
yjProb(y = yj|x
)
(See Appendix 10.1 for a brief introduction to probability theory
and corresponding definitions for continuous random variables.)
Example continued: the conditional expectation E[y|x = 75]
corresponds to the average rent of all apartments in Almaty of
size x = 75 m2.
34
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
– Note that the variable x can be random, too. Then, the condi-
tional expectation E[y|x] is a function of the (random) variable
x
E[y|x] = g(x)
and therefore a random variable itself.
– From the identity
y = E[y|x] + (y − E[y|x]) (2.2)
one defines the error term or disturbance term as
u ≡ y − E[y|x]
so that one obtains a simple regression model of the pop-
ulation
y = E[y|x] + u (2.3)
35
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
• Interpretation:
– The random variable y varies randomly around the conditional
expectation E[y|x]:
y = E[y|x] + u.
– The conditional expectation E[y|x] is called the systematic
part of the regression.
– The error term u is called the unsystematic part of the regres-
sion.
• So instead of trying the impossible, namely specifying m(x, . . .)
given by (2.1), one focuses the analysis on the “average” E[y|x].
36
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
• How to determine the conditional expectation?
– This step requires assumptions!
– To keep things simple we make Assumption (A) given by
E[y|x] = β0 + β1x. (2.4)
– Discussion of Assumption (A):
∗ It restricts the flexibility of g(x) = E[y|x] such that g(x) =
β0 + β1x has to be linear in x. So if E[y|x] = δ0 + δ1 log x,
Assumption A is wrong.
∗ It can be fulfilled if there are other variables influencing y lin-
early. For example, consider
E[y|x, z] = δ0 + δ1x + δ2z.
37
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
Then, by the law of iterated expectations one obtains
E[y|x] = δ0 + δ1x + δ2E[z|x]
If E[z|x] is linear in x, one obtains
E[y|x] = δ0 + δ1x + δ2(α0 + α1x)
= γ0 + γ1x (2.5)
with γ0 = δ0 + δ2α0 und γ1 = δ1 + δ2α1. Note, however,
that in this case E[y|x, z] 6= E[y|x] in general. Then model
choice depends on the goal of the analysis: the smaller model
can sometimes be preferable for prediction, the larger model is
needed if controlling for z is important ⇔ controlled random
experiments, see Section 1.3.
∗ In general, Assumption (A) is violated if (2.5) does not hold
e.g. if E[z|x] is nonlinear in x. Then the linear population
model is called misspecified. More on that in Section 3.4.
38
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
• Properties of the error term u: From Assumption (A)
1. E[u|x] = 0,
2. E[u] = 0,
3. Cov(x, u) = 0.
39
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
• An alternative set of assumptions:
The above result E[u|x] = 0 together with the identity (2.3) al-
lows to rewrite Assumption (A) in terms of the following two
assumptions:
1. Assumption SLR.1
(Linearity in the Parameters)
y = β0 + β1x + u, (2.6)
2. Assumption SLR.4
(Zero Conditional Mean)
E[u|x] = 0.
40
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
• Linear Population Regression Model:
The simple linear population regression model is given by equation
(2.6)
y = β0 + β1x + u
and obtained by specifying the conditional expectation in the regres-
sion model (2.3) by a linear function (linear in the parameters).
The parameters β0 and β1 are called the intercept parameter
and slope parameter, respectively.
41
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
• Some terminology for regressions
y x
Dependent variable Independent variable
Explained variable Explanatory variable
Response variable Control variable
Predicted variable Predictor variable
Regressand Regressor
Covariate
42
Intensive Course in Econometrics — Section 2.1 — UR March 2009 — R. Tschernig
• A simple example: a game of dice
Let the random numbers x and u denote the throws of two fair
dices with x, u = −2.5,−1.5,−0.5, 0.5, 1.5, 2.5. Based on both
throws the random number y denotes the following sum
y = 2︸︷︷︸β0
+ 3︸︷︷︸β1
x + u.
This completely describes the population regression model.
– Derive the systematic relationship between y and x holding x
fixed.
– Interpret the systematic relationship.
– How can you obtain the values of the parameters β0 = 2 and
β1 = 3 if those values are unknown?
Next section: How can you determine/estimate β0 and β1?
43
Intensive Course in Econometrics — Section 2.2 — UR March 2009 — R. Tschernig
2.2 The Sample Regression Model
Estimators and Estimates
• In practice one has to estimate the unknown parameters β0 and β1
of the population regression model using a sample of observations.
• The sample has to be representative and has to be collected/drawn
from the population.
• A sample of the random numbers x and y of size n is given by
(xi, yi) : i = 1, . . . , n.• Now we require an estimator that allows us — given the sample
observations (xi, yi) : i = 1, . . . , n — to compute estimates for
the unknown parameters β0 and β1 of the population.
44
Intensive Course in Econometrics — Section 2.2 — UR March 2009 — R. Tschernig
• Note:
– If we want to construct an estimator for the unknown parameters,
we have not yet observed a sample. An estimator is a function
that contains the sample values as arguments.
– Once we have an estimator and observe a sample, we can compute
estimates (=numerical values) for the unknown quantities.
• For estimating the unknown parameters there exist many different
estimators that differ with respect to their statistical properties (sta-
tistical quality)!
Example: Two different estimators for estimating the mean: 1n
∑ni=1 yi
and 12 (y1 + yn).
45
Intensive Course in Econometrics — Section 2.2 — UR March 2009 — R. Tschernig
• If you denote estimators of the parameters β0 and β1 in the popu-
lation regression model
y = β0 + β1x + u
by β0 and β1, then the sample regression model is given by
yi = β0 + β1xi + ui, i = 1, . . . , n.
It consists of
– the sample regression function or regression line
yi = β0 + β1xi,
– the fitted values yi, and
– the residuals ui = yi − yi, i = 1, . . . , n.
With which method can we estimate?
46
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
2.3 The Ordinary Least Squares Estimator (OLS) Estimator
• The ordinary least squares estimator is frequently abbreviated as
OLS estimator. The OLS estimator goes back to C.F. Gauss (1777-
1855).
• It is derived by choosing the values β0 and β1 such that the sum
of squared residuals (SSR)n∑
i=1
u2i =
n∑
i=1
(yi − β0 − β1xi
)2
is minimized.
47
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
• One computes the first partial derivatives with respect to β0 and β1
and sets them equal to zero:n∑
i=1
(yi − β0 − β1xi
)= 0, (2.7)
n∑
i=1
xi
(yi − β0 − β1xi
)= 0. (2.8)
The equations (2.7) and (2.8) are called normal equations.
48
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
From (2.7) one obtains
β0 = n−1n∑
i=1
yi − β1n−1
n∑
i=1
xi
β0 = y − β1x, (2.9)
where z = n−1∑ni=1 zi denotes the estimated mean of zi, i =
1, . . . , n.
Inserting (2.9) into the normal equation (2.8) deliversn∑
i=1
xi
(yi −
(y − β1x
)− β1xi
)= 0.
Moving terms leads ton∑
i=1
xi(yi − y) = β1
n∑
i=1
xi(xi − x).
49
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
Note thatn∑
i=1
xi(yi − y) =
n∑
i=1
(xi − x)(yi − y),
n∑
i=1
xi(xi − x) =
n∑
i=1
(xi − x)2,
such that:
β1 =
∑ni=1(xi − x)(yi − y)∑n
i=1(xi − x)2. (2.10)
50
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
• Terminology:
– The sample functions (2.9) and (2.10)
β0 = y − β1x,
β1 =
∑ni=1(xi − x)(yi − y)∑n
i=1(xi − x)2
are called the ordinary least squares (OLS) estimators for
β0 and β1.
– For a given sample, the quantities β0 and β1 are called the OLS
estimates for β0 and β1.
51
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
– The OLS sample regression function or OLS regression
line for the simple regression model is given by
yi = β0 + β1xi (2.11)
with residuals ui = yi − yi.
– The OLS sample regression model is denoted by
yi = β0 + β1xi + ui (2.12)
52
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
Note:
– The OLS estimator β1 only exists if the sample observations xi,
i = 1, . . . , n exhibit variation.
Assumption SLR.3
(Sample Variation in the Explanatory Variable):
In the sample the outcomes of the independent variable xi, i =
1, 2, . . . , n are not all the same.
– The derivation of the OLS estimator only requires assumption
SLR.3 but not the population Assumptions SLR.1 and SLR.4.
– In order to investigate the statistical properties of the OLS esti-
mator one needs further assumptions, see Sections 2.7, 3.4, 4.2.
– One also can derive the OLS estimator from the assumptions
about the population, see below.
53
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
• The OLS estimator as a Moment Estimator:
– Note that from Assumption SLR.4 E[u|x] = 0 one obtains two
conditions on moments: E[u] = 0 and Cov(x, u) = 0. Inserting
Assumption SLR.1 u = y − β0 − β1x defines moment condi-
tions for the model parameters
E(y − β0 − β1x) = 0 (2.13)
E[x(y − β0 − β1x)] = 0 (2.14)
– How to estimate the moment conditions using sample functions?
– Assumption SLR.2 (Random Sampling):
The sample of size n is obtained by random sampling that is, the
pairs (xi, yi) and (xj, yj), i 6= j, i, j = 1, . . . , n, are pairwise
identically and independently distributed following the population
model.
54
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
– An important result in statistics, see Section 5.1, says:
If Assumption SLR.2 holds, then the expected value can well be
estimated by the sample average. (Assumption SLR.2 can be
weakened, see e.g. Chapter 11 in Wooldridge (2009).)
– If one replaces the expected values in (2.13) and (2.14) by their
sample averages, one obtains
n−1n∑
i=1
(yi − β0 − β1xi
)= 0, (2.15)
n−1n∑
i=1
xi
(yi − β0 − β1xi
)= 0. (2.16)
By multiplying (2.15) (2.16) by n one obtains the normal equa-
tions (2.7) and (2.8).
55
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
The Trade Example Continued
Question:
Do imports to Kazakhstan increase if the exporting country experiences
an increase in GDP?
Scatter plot (from Section 1.1)
0.00E+00
2.00E+08
4.00E+08
6.00E+08
8.00E+08
1.00E+09
1.20E+09
0 2000 4000 6000 8000 10000 12000
WEO_GDPCR_O
TR
AD
E_
0_
D_
O
56
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
The OLS regression line is given by
importsi = 53441198 + 2.16 · 10−05gdpi, i = 1, . . . , 55,
and the sample regression model by
importsi = 53441198 + 2.16 · 10−05gdpi + ui, i = 1, . . . , 55.
0.00E+00
2.00E+08
4.00E+08
6.00E+08
8.00E+08
1.00E+09
1.20E+09
0 2000 4000 6000 8000 10000 12000
W E O _ G D P C R _ O
TR
AD
E_
0_
D_
OTRADE_0_D_O vs. WEO_GDPCR_O
57
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
====================================================================
Dependent Variable: TRADE_0_D_O
Method: Least Squares
Date: 02/09/09 Time: 17:12
Sample: 1 55
Included observations: 52
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C 53441198 25852541 2.067155 0.0439
WDI_GDPUSDCR_O 2.16E-05 1.37E-05 1.572746 0.1221
====================================================================
R-squared 0.047139 Mean dependent var 67801339
Adjusted R-squared 0.028081 S.D. dependent var 1.77E+08
S.E. of regression 1.74E+08 Akaike info criterion 40.82943
Sum squared resid 1.52E+18 Schwarz criterion 40.90448
Log likelihood -1059.565 F-statistic 2.473530
Durbin-Watson stat 2.297310 Prob(F-statistic) 0.122085
====================================================================
• For a data description see Appendix 10.4:
importsi (from country i) TRADE 0 D O
gdpi (in exporting country i) WDI GDPUSDCR O
58
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
• Potential interpretation of estimated slope parameter:
β1 =∆imports
∆gdp
indicates by how many US dollars imports in Kazakhstan increase if
GDP in an exporting country increases by 1 US dollar.
• Does this interpretation really make sense? Aren’t there other im-
portant influencing factors missing? What about using economic
theory as well?
• What about the quality of the estimates?
59
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
Example: Wage Regression
Question:
How does education influence the hourly wage of an employee?
• Data (Source: Example 2.4 in Wooldridge (2009)): Sample of U.S.
employees with n = 526 observations. Available data are:
– wage per hour in dollars and
– educ years of schooling of each employee.
• The OLS regression line is given by
ˆwagei = −0.90 + 0.54 educi, i = 1, . . . , 526.
The sample regression model is
wagei = −0.90 + 0.54 educ + ui, i = 1, . . . , 526
60
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
====================================================================
Dependent Variable: WAGE Method: Least Squares Sample: 1 526
Included observations: 526
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C -0.904852 0.684968 -1.321013 0.1871
EDUC 0.541359 0.053248 10.16675 0.0000
====================================================================
R-squared 0.164758 Mean dependent var 5.896103
Adjusted R-squared 0.163164 S.D. dependent var 3.693086
S.E. of regression 3.378390 Akaike info criterion 5.276470
Sum squared resid 5980.682 Schwarz criterion 5.292688
Log likelihood -1385.712 F-statistic 103.3627
Durbin-Watson stat 1.823686 Prob(F-statistic) 0.000000
====================================================================
• Interpretation of the estimated slope parameter:
β1 =∆wage
∆educindicates by how much the hourly wage changes if the years of
schooling increases by one year:
– An additional year in school or university increases the hourly
61
Intensive Course in Econometrics — Section 2.3 — UR March 2009 — R. Tschernig
wage by 54 cent.
– But: Somebody without any education earns an hourly wage of
-90 cent? Does this interpretation make sense?
• Is it always sensible to interpret the slope coefficient? Watch out
spurious causality, see next section.
• Are these estimates reliable or good in some sense? What do we
mean by “good” in econometrics and statistics? To get more insight
study
– the statistical properties of the OLS estimator and the OLS esti-
mates, see Section 2.7 and
– check the choice of the functional form for the conditional expec-
tation E[y|x], see Section 2.6.
62
Intensive Course in Econometrics — Section 2.4 — UR March 2009 — R. Tschernig
2.4 Best Linear Prediction, Correlation, and Causality
Best Linear Prediction
• What does the OLS estimator estimate if Assumptions SLR.1 and
SLR. 4 (alias Assumption (A)) are not valid in the population
from which the sample is drawn?
• Note that SSR(γ0, γ1)/n =∑n
i=1 (yi − γ0 − γ1xi)2 /n is a sample
average and thus estimates the expected value
E[(y − γ0 − γ1x)2
](2.17)
if Assumption SLR.2 (or some weaker form) holds. (For existence
of (2.17) it is required that 0 < V ar(x) < ∞ and V ar(y) < ∞.)
Equation (2.17) is called the mean squared error of a linear
predictor
γ0 + γ1x.
63
Intensive Course in Econometrics — Section 2.4 — UR March 2009 — R. Tschernig
• Mimicking minimizing SSR(γ0, γ1), the theoretically best fit of a
linear predictor γ0 + γ1x to y is obtained by minimizing its mean
squared error (2.17) with respect to γ0 and γ1. This leads (try to
derive it) to
γ∗0 = E[y] − γ∗1E[x], (2.18)
γ∗1 =Cov(x, y)
V ar(x)= Corr(x, y)
√V ar(y)
V ar(x)(2.19)
with
Corr(x, y) =Cov(x, y)√
V ar(x)V ar(y), −1 ≤ Corr(x, y) ≤ 1
denoting the correlation that measures the linear dependence be-
tween two variables in a population, here x and y.
The expression
γ∗0 + γ∗1x (2.20)
64
Intensive Course in Econometrics — Section 2.4 — UR March 2009 — R. Tschernig
is called the best linear predictor of y where “best” is defined by
minimal mean squared error.
• Now observe that for the simple regression model
y = γ∗0 + γ∗1x + ε
one has Cov(x, ε) = 0, a weaker form of SLR.4, since
Cov(x, y) =Cov(x, y)
V ar(x)V ar(x) + Cov(x, ε).
This indicates that one can show that under Assumption SLR.2
and SLR.3 the OLS estimator estimates the parameters γ∗0and γ∗1 of the best linear predictor. Observe also that the
OLS estimator (2.10) for the slope coefficient consists of the sample
averages of the moments defining γ∗1
γ1 =
∑ni=1(xi − x)(yi − y)∑n
i=1(xi − x)2
65
Intensive Course in Econometrics — Section 2.4 — UR March 2009 — R. Tschernig
• Rewriting γ1 as
β1 = Corr(x, y)
√∑ni=1(yi − y)2∑ni=1(xi − x)2
using the empirical correlation coefficient
Corr(x, y) =
∑ni=1(xi − x)(yi − y)√∑n
i=1(xi − x)2∑n
i=1(yi − y)2
shows that the estimated slope coefficient is non-zero if there is
sample correlation between x and y.
66
Intensive Course in Econometrics — Section 2.4 — UR March 2009 — R. Tschernig
Causality
• Recall Section 1.3.
• Be aware that the slope coefficient of the best linear pre-
dictor γ∗1 and its OLS estimate γ1 cannot be automatically
interpreted in terms of a causal relationship since estimating
the best linear predictor
– only captures correlation but not direction,
– may not estimate the model of interest, e.g. if Assumptions SLR.1
and SLR.4 are violated and β1 6= γ∗1 ,
– may produce garbage if Corr(x, y) estimates spurious correlation
(Corr(x, y) = 0 and Assumption SLR.2 (or its weaker versions)
are violated),
– or relevant control variables are missing in the simple regression
67
Intensive Course in Econometrics — Section 2.4 — UR March 2009 — R. Tschernig
model such that the results cannot represent results of a fictive
randomized controlled experiment, see Chapter 3 onwards.
Therefore, before any causal interpretation takes place one has to
use specification and diagnostic techniques for regression models.
Frequently one needs economic theory to identify causal relation-
ships.
68
Intensive Course in Econometrics — Section 2.5 — UR March 2009 — R. Tschernig
2.5 Algebraic Properties of the OLS Estimator
Basic properties:
•∑n
i=1 ui = 0, because of normal equation (2.7),
•∑n
i=1 xiui = 0, because of normal equation (2.8).
• The point (x, y) lies on the regression line.
Can you provide some intuition for these properties?
69
Intensive Course in Econometrics — Section 2.5 — UR March 2009 — R. Tschernig
• Total sum of squares (SST)
SST ≡n∑
i=1
(yi − y)2
• Explained sum of squares (SSE)
SSE ≡n∑
i=1
(yi − y)2
• Sum of squared residuals (SSR)
SSR ≡n∑
i=1
u2i
• The decomposition SST = SSE + SSR holds if the regression
model contains an intercept β0.
70
Intensive Course in Econometrics — Section 2.5 — UR March 2009 — R. Tschernig
• Coefficient of Determination R2 or (R-squared)
R2 =SSE
SST.
– Interpretation: share of variation of yi that is explained by the
variation of xi.
– If the regression model contains an intercept term β0, then
R2 =SSE
SST= 1 − SSR
SSTdue to the decomposition SST = SSE + SSR, and therefore
0 ≤ R2 ≤ 1.
– Later we will see: Choosing regressors with R2 is in general mis-
leading.
71
Intensive Course in Econometrics — Section 2.5 — UR March 2009 — R. Tschernig
Reading:
• Sections 1.4 and 2.1-2.3 in Wooldridge (2009) and Appendix 10.1
if needed.
• 2.4 and 2.5 in Wooldridge (2009).
72
Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig
2.6 Parameter Interpretation and Functional Form and
Data Transformation
• The term linear in “simple linear regression models” does not imply
that the relationship between the explained and the explanatory
variable is linear. Instead it refers to the fact that the parameters
β0 and β1 enter the model linearly.
• Examples for regression models that are linear in their parameters:
yi = β0 + β1xi + ui,
yi = β0 + β1 ln xi + ui,
ln yi = β0 + β1 ln xi + ui,
ln yi = β0 + β1xi + ui,
yi = β0 + β1x2i + ui.
73
Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig
The Natural Logarithm in Econometrics
Frequently variables are transformed by taking the natural logarithm
ln. Then the interpretation of the slope coefficient has to be ad-
justed accordingly.
Taylor approximation of the logarithmic function:
ln(1 + z) ≈ z if z is close to 0.
Using this approximation one can derive a popular approximation of
growth rates or returns
(∆xt)/xt−1 ≡ (xt − xt−1)/xt−1
≈ ln (1 + (xt − xt−1)/xt−1) ,
(∆xt)/xt−1 ≈ ln(xt) − ln(xt−1).
which approximates well if the relative change ∆xt/xt−1 is close to 0.
74
Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig
One obtains percentages by multiplying with 100:
100∆ ln(xt) ≈ %∆xt = 100(xt − xt−1)/xt−1.
Thus, the percentage change for small ∆xt/xt−1 can be well ap-
proximated by 100[ln(xt) − ln(xt−1)].
• Examples of models that are nonlinear in the parameters (β0, β1, γ, λ, π, δ):
yi = β0 + β1xγi + ui,
yγi = β0 + β1 ln xi + ui,
yi = β0 + β1xi +1
1 + exp(λ(xi − π))(γ + δxi) + ui.
• The last example allows for smooth switching between two linear
regimes. The possibilities for formulating nonlinear regression mod-
els are huge. However, their estimation requires more advanced
methods such as nonlinear least squares that are beyond the scope
of this course.
75
Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig
• Note, however, that linear regression models allow for a wide range
of nonlinear relationships between the dependent and independent
variables, some of which were listed at the beginning of this section.
Economic Interpretation of OLS Parameters
• Consider the ratio of relative changes of two non-stochastic
variables y and x
∆yy
∆xx
=%change of y
%change of x=
%∆y
%∆x.
If ∆y → 0 and ∆x → 0, then it can be shown that ∆y∆x → dy
dx.
• If this result is applied to the ratio above, one obtains the elasticity
η(x) =dy
dx
x
y.
76
Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig
• Interpretation: If the relative change of x is 0.01, then the relative
change of y given by 0.01η(x).
In other words: If x changes by 1%, then y changes by η(x)%.
• If y, x are random variables, then the elasticity is defined with
respect to the conditional expectation of y given x:
η(x) =dE[y|x]
dx
x
E[y|x].
This can be derived fromE[y|x1=x0+∆x]−E[y|x0]
E[y|x0]
∆xx0
=
E[y|x1 = x0 + ∆x] − E[y|x0]
∆x
x0
E[y|x0]
and letting ∆x → 0.
77
Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig
Different Models and Interpretations of β1
For each model it is assumed that SLR.1 and SLR.4 hold.
• Models that are linear with respect to their variables
(level-level models)
y = β0 + β1x + u.
It holds thatdE[y|x]
dx= β1
and thus
∆E[y|x] = β1∆x.
In words:
The slope coefficient denotes the absolute change in the conditional
expectation of the dependent variable y for a one-unit change in the
independent variable x.
78
Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig
• Level-log models
y = β0 + β1 ln x + u.
It holds thatdE[y|x]
dx= β1
1
xand thus
∆E[y|x] ≈ β1∆ ln x =β1
100100∆ ln x ≈ β1
100%∆x.
In words:
The conditional expectation of y changes by β1/100 units if x
changes by 1%.
79
Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig
• Log-level models
ln y = β0 + β1x + u
or
y = eln y = eβ0+β1x+u = eβ0+β1xeu.
Thus
E[y|x] = eβ0+β1xE[eu|x].
If E[eu|x] is constant, then
dE[y|x]
dx= β1 eβ0+β1xE[eu|x]︸ ︷︷ ︸
E[y|x]
= β1E[y|x].
One obtains the approximation
∆E[y|x]
E[y|x]≈ β1∆x, or %∆E[y|x] ≈ 100β1∆x
In words: The conditional expectation of y changes by 100 β1% if
x changes by one unit.
80
Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig
• Log-log models
are frequently called loglinear models or constant-elasticity
models and are very popular in empirical work
ln y = β0 + β1 ln x + u.
Similar to above one can show that
dE[y|x]
dx= β1
E[y|x]
x, and thus β1 = η(x)
if E[eu|x] is constant.
In these models the slope coefficient is interpreted as the elasticity
between the level variables y and x.
81
Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig
The Trade Example Continued====================================================================
Dependent Variable: LOG(TRADE_0_D_O)
Method: Least Squares
Date: 02/08/09 Time: 12:29
Sample: 1 55
Included observations: 52
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C -3.460963 3.280047 -1.055157 0.2964
LOG(WDI_GDPUSDCR_O) 0.770127 0.129507 5.946600 0.0000
====================================================================
R-squared 0.414260 Mean dependent var 15.97292
Adjusted R-squared 0.402545 S.D. dependent var 2.613094
S.E. of regression 2.019797 Akaike info criterion 4.281574
Sum squared resid 203.9791 Schwarz criterion 4.356622
Log likelihood -109.3209 F-statistic 2.65E-07
====================================================================
Note the very different interpretation of the estimated slope coeffi-
cient β1:
– Level-level model (Section 2.3): an increase in GDP in the ex-
porting country by 1 billion US dollars corresponds to an increase
82
Intensive Course in Econometrics — Section 2.6 — UR March 2009 — R. Tschernig
of imports to Kazakhstan by 0.216 million US dollars.
– Log-log model: an 1%-increase of GDP in the exporting country
corresponds to an increase of imports by 0.77%.
But wait before you take these numbers seriously.
83
Intensive Course in Econometrics — Section 2.7 — UR March 2009 — R. Tschernig
2.7 Statistical Properties of the OLS Estimator: Expected
Value and Variance
• Some preparatory transformations (all sums are indexed by i =
1, . . . , n):
β1 =
∑ni=1 (xi − x) (yi − y)∑n
i=1 (xi − x)2=
∑ni=1 (xi − x) yi∑nj=1
(xj − x
)2
=
n∑
i=1
(xi − x)∑n
j=1
(xj − x
)2
︸ ︷︷ ︸wi
yi =∑
wiyi
where it can be shown that (try it):∑wi = 0,
∑wixi = 1 and
∑w2
i = 1∑nj=1(xj−x)2.
84
Intensive Course in Econometrics — Section 2.7 — UR March 2009 — R. Tschernig
• Unbiasedness of the OLS estimator:
If Assumptions SLR.1 to SLR.4 hold, then
E[β0] = β0,
E[β1] = β1.
Interpretation:
If one keeps repeatedly drawing new samples and estimating the re-
gression parameters, then the average of all obtained OLS parameter
estimates roughly corresponds to the population parameters.
The property of unbiasedness is a property of the sample distribution
of the OLS estimators for β0 and β1. It does not imply that the
population parameters are perfectly estimated for a specific sample.
85
Intensive Course in Econometrics — Section 2.7 — UR March 2009 — R. Tschernig
Proof for β1 (clarify where each SLR assumption is needed):
1. E[β1
∣∣∣ x1, . . . , xn
]can be manipulated as follows:
= E[∑
wiyi
∣∣∣x1, . . . , xn
]
= E[∑
wi(β0 + β1xi + ui)∣∣∣x1, . . . , xn
]
=∑
E [wi(β0 + β1xi + ui)| x1, . . . , xn]
= β0
∑wi + β1
∑wixi +
∑E [wiui| x1, . . . , xn]
= β1 +∑
wiE [ui| x1, . . . , xn]
= β1 +∑
wiE [ui| xi]
= β1.
2. From E[β1] = E[E[β1|x1, . . . , xn]] one obtains unbiasedness
E[β1] = β1.
86
Intensive Course in Econometrics — Section 2.7 — UR March 2009 — R. Tschernig
• Variance of the OLS estimator
In order to determine the variance of the OLS estimators β0 and β1
we need another assumption,
Assumption SLR.5 (Homoskedasticity):
V ar(u|x) = σ2.
• Variances of parameter estimators
conditional on the sample observations
If Assumptions SLR.1 to SLR.5 hold, then
V ar(β1
∣∣∣ x1, . . . , xn
)= σ2 1
∑ni=1 (xi − x)2
,
V ar(β0
∣∣∣ x1, . . . , xn
)= σ2 n−1∑x2
i∑ni=1 (xi − x)2
.
87
Intensive Course in Econometrics — Section 2.7 — UR March 2009 — R. Tschernig
Proof (for the conditional variance of β1):
V ar(β1
∣∣∣x1, . . . , xn
)
= V ar(∑
wiui
∣∣∣x1, . . . , xn
)
=∑
V ar (wiui| x1, . . . , xn)
=∑
w2iV ar (ui| x1, . . . , xn)
=∑
w2iV ar (ui| xi)
=∑
w2iσ
2
= σ2∑
w2i
= σ2 1∑
(xi − x)2.
88
Intensive Course in Econometrics — Section 2.7 — UR March 2009 — R. Tschernig
• Covariance between the intercept and the slope estimator:
Cov(β0, β1|x1, . . . , xn) = −σ2 x∑n
i=1 (xi − x)2.
Proof: Cov(β0, β1 |x1, . . . , xn) can be manipulated as follows:
= Cov(y − β1x, β1
∣∣∣ x1, . . . , xn
)
= Cov(u, β1
∣∣∣x1, . . . , xn
)
︸ ︷︷ ︸=0 see below
−Cov(β1x, β1
∣∣∣x1, . . . , xn
)
= −xCov(β1, β1
∣∣∣ x1, . . . , xn
)
= −xV ar(β1
∣∣∣ x1, . . . , xn
)
= −σ2 x∑
(xi − x)2.
89
Intensive Course in Econometrics — Section 2.7 — UR March 2009 — R. Tschernig
Cov(u, β1
∣∣∣ x1, . . . , xn
)
= Cov(u,∑
wiui
∣∣∣ x1, . . . , xn
)
= Cov (u, w1u1| x1, . . . , xn) + · · · + Cov (u, wnun|x1, . . . , xn)
= w1Cov (u, u1| x1, . . . , xn) + · · · + wnCov (u, un|x1, . . . , xn)
=∑
wiCov (u, ui| x1, . . . , xn)
= Cov (u, u1|x1, . . . , xn)∑
wi
= 0.
90
Intensive Course in Econometrics — Section 2.8 — UR March 2009 — R. Tschernig
2.8 Estimation of the Error Variance
• One estimator for the error variance σ2 is given by
σ2 =1
n
n∑
i=1
u2i ,
where the ui’s denote the residuals of the OLS estimator.
Disadvantage: The estimator σ2 does not take into account that 2
restrictions were imposed on obtaining the OLS residuals, namely:∑ui = 0,
∑uixi = 0.
This leads to biased estimates, E[σ2|x1, . . . , xn] 6= σ2.
• Unbiased estimator for the error variance:
σ2 =1
n − 2
n∑
i=1
u2i .
91
Intensive Course in Econometrics — Section 2.8 — UR March 2009 — R. Tschernig
• If Assumptions SLR.1 to SLR.5 hold, then
E[σ2|x1, . . . , xn] = σ2.
• Standard error of the regression, standard error of the es-
timate or root mean squared error:
σ =√
σ2 .
• In the formulas for the variances of and covariance between the
parameter estimators β0 und β1 the variance estimator σ2 can be
used for estimating the unknown error variance σ.
Example:
V ar(β1|x1, . . . , xn) =σ2
∑(xi − x)2
.
92
Intensive Course in Econometrics — Section 2.8 — UR March 2009 — R. Tschernig
Denote the standard deviation as
sd(β1|x1, . . . , xn) =
√V ar(β1|x1, . . . , xn),
then
sd(β1|x1, . . . , xn) =σ
(∑(xi − x)2
)1/2
is frequently called the standard error of β1 and reported in the
output of software packages.
Reading: Sections 2.4 and 2.5 in Wooldridge (2009) and Appendix
10.1 if needed.
93
3 Multiple Regression Analysis: Estimation
3.1 Motivation for Multiple Regression: The Trade
Example Continued
• In Section 2.6 two simple linear regression models for explaining
imports to Kazakhstan were estimated (and interpreted): a level-
level model and a log-log model.
• It is hardly credible that imports to Kazakhstan only depend on the
GDP of the exporting country. What about, for example, distance,
94
Intensive Course in Econometrics — Section 3.1 — UR March 2009 — R. Tschernig
borders, and other factors causing trading costs?
• Such quantities have been found to be relevant in the empirical
literature on gravity equations for explaining intra- and interna-
tional trade. In general, bi-directional trade flows are considered.
Here we consider only one-directional trade flows, namely exports
to Kazakhstan in 2004. Such a simplified gravity equation reads as
ln(importsi) = β0 + β1 ln(gdpi) + β2 ln(distancei) + ui. (3.1)
Standard gravity equations are based on bilateral imports and ex-
ports over a number of years and thus require panel data techniques
that are beyond the scope of this course. However, in Section 4.4
we will consider cross-section data on both imports and exports for
2004. For a brief introduction to gravity equations see e.g. Fratianni
(2007). A recent theoretic underpinning of gravity equations was
provided by Anderson & Wincoop (2003).
95
Intensive Course in Econometrics — Section 3.1 — UR March 2009 — R. Tschernig
• If relevant variables are neglected, Assumptions SLR.1 and/or SLR.4
could be violated and in this case interpretation of causal effects
can be highly misleading, see Section 3.4. To avoid this trap, the
multiple regression model can be useful.
• To get an idea about the change in the elasticity parameter due to a
second independent variable, like e.g. distance, inspect the following
OLS estimate of the simple import equation (3.1):Dependent Variable: LOG(TRADE_0_D_O), Method: Least Squares, Date: 02/13/09 Time: 14:31
Sample: 1 55, Included observations: 52
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C 4.800950 3.497341 1.372743 0.1761
LOG(WDI_GDPUSDCR_O) 1.088546 0.137001 7.945508 0.0000
LOG(CEPII_DIST) -1.970804 0.480555 -4.101103 0.0002
====================================================================
R-squared 0.563937 Mean dependent var 15.97292
Adjusted R-squared 0.546138 S.D. dependent var 2.613094
S.E. of regression 1.760423 Akaike info criterion 4.024946
Sum squared resid 151.8554 Schwarz criterion 4.137518
Log likelihood -101.6486 F-statistic 31.68448
Durbin-Watson stat 2.117895 Prob(F-statistic) 0.000000
96
Intensive Course in Econometrics — Section 3.1 — UR March 2009 — R. Tschernig
Instead of an estimated elasticity of 0.77, see Section 2.6, one ob-
tains a value of 1.09. Furthermore, the R2 increases from 0.41 to
0.56, indicating a much better statistical fit. Finally, a 1% increase
in distance reduces imports by almost 2%. Is this model then better?
Or is it (also) misspecified?
To answer these questions we have to study the linear multiple re-
gression model first.
97
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
3.2 The Multiple Regression Model of the Population
• Assumptions:
The Assumptions SLR.1 and SLR.4 of the simple linear regression
model have to be adapted accordingly to the multiple linear regres-
sion model (MLR) for the population (see Section 3.3 in Wooldridge
(2009)):
– MLR.1 (Linearity in the Parameters)
The multiple regression model allows for more than one, say
k, explanatory variables
y = β0 + β1x1 + β2x2 + · · · + βkxk + u (3.2)
and the model is linear in its parameters.
Example: the import equation (3.1).
98
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
– MLR.4 (Zero Conditional Mean)
E[u|x1, . . . , xk] = 0 for all x.
Observe that all explanatory variables of the multiple regression
(3.2) must be included in the conditioning set. Sometimes the
conditioning set is called information set.
• Remarks:
– To see the need for MLR.4, take the conditional expectation of
y in (3.2) given all k regressors
E[y|x1, x2, . . . , xk] = β0 + β1x1 + · · · βkxk + E[u|x1, x2, . . . , xk].
If E[u|x1, x2, . . . , xk] 6= 0 for some x, then the systematic part
β0+β1x1+· · · βkxk does not model the conditional expectations
E[y|x1, . . . , xk] correctly.
99
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
– If MLR.1 and MLR.4 are fulfilled, then equation (3.2)
y = β0 + β1x1 + β2x2 + · · · + βkxk + u
is also called the linear multiple regression model for the
population. Frequently it is also called the true model (even
if any model may be fare from truth). Alternatively, one may
think of equation (3.2) as the data generating mechanism
(although, strictly speaking, a data generating mechanism also
requires specification of the probability distributions of all regres-
sors and the error).
• To guarantee nice properties of the OLS estimator and the sample
regression model, we adapt SLR.2 and SLR.3 accordingly:
– MLR.2 (Random Sampling)
The sample of size n is obtained by random sampling, that is
the observations (xi1, . . . , xik, yi) : i = 1, . . . , n are pairwise
100
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
independently and identically distributed following the population
model in MLR.1.
– MLR.3 (No Perfect Collinearity)
(more on MLR.3 in Section 3.3)
• Interpretation:
– If Assumptions MLR.1 and MLR.4 are correct and the popula-
tion regression model allows for a causal interpretation, then the
multiple regression model is a great tool for ceteris paribus
analysis. It allows to hold the values of all explanatory variables
fixed except one and check how the conditional expectation of
the explained variable changes. This resembles changing one con-
trol variable in a randomized control experiment. Let xj be the
control variable of interest.
101
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
– Taking conditional expectations of the multiple regression (3.2)
and applying Assumption MLR.4 delivers
E[y|x1, . . . , xj, . . . , xk] = β0 +β1x1 + · · ·+βjxj + · · ·+βkxk.
– Consider a change in xj: xj + ∆xj
E[y|x1, . . . , xj+∆xj, . . . , xk] = β0+β1x1+· · ·+βj(xj+∆xj)+· · ·+βkxk.
∗ Ceteris-paribus effect:
In (3.2) the absolute change due to a change of xj by ∆xj is
given by
∆E[y|x1, . . . , xj, . . . , xk] ≡E[y|x1, . . . , xj−1, xj + ∆xj, xj+1, . . . , xk]
− E[y|x1, . . . , xj−1, xj, xj+1, . . . , xk] = βj∆xj,
102
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
where βj corresponds to the first partial derivative
∂E[y|x1, . . . , xj−1, xj, xj+1, . . . , xk]
∂xj= βj.
The parameter βj gives the partial effect of changing xj on
the conditional expectation of y while all other regressors are
held constant.
∗ Total effect:
Of course one can also consider simultaneous changes in the
regressors, for example ∆x1 and ∆xk. For this case one obtains
∆E[y|x1, . . . , xk] = β1∆x1 + βk∆xk.
– Note that the specific interpretation of βj depends on how
variables enter, e.g. as log variables. In a ceteris paribus analysis
the results of Section 2.6 remain valid.
103
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
Trade Example Continued
• Considering the log-log model (3.1)
ln(importsi) = β0 + β1 ln(gdpi) + β2 ln(distancei) + ui
a 1% increase in distance leads to a increase of β2% in imports
keeping GDP fixed. In other words, one can separate the effect
of distance on imports from the effect of economic size. From
the output table in Section 3.1 one obtains that a 1% increase in
distance decreases imports by about 2%.
• Keep in mind that determining distances between countries is a
complicated matter and results may change with the choice of the
method for computing distances. Our data are from CEPII, see also
Appendix 10.4.
• There may still be missing variables, see also Section 4.4.
104
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
Wage Example Continued
• In Section 2.3 it was assumed that hourly wage is determined by
wage = β0 + β1 educ + u.
Instead of a level-level model one may also consider a log-level model
ln(wage) = β0 + β1 educ + u. (3.3)
However, since we expect that experience also matters for hourly
wages, we want to include experience as well. We obtain
ln(wage) = β0 + β1 educ + β2 exper + v. (3.4)
What about the expected log wage given the variables educ and
exper?
E[ln(wage)|educ, exper] = β0 + β1 educ + β2 exper +E[v|educ, exper]
E[ln(wage)|educ, exper] = β0 + β1 educ + β2 exper,
105
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
where the second equation only holds if MLR.4 holds, that is if
E[v|educ, exper] = 0.
• Note that if instead of (3.4) one investigates the simple linear log-
level model (3.3) although the population model contains
exper one obtains
E[ln(wage)|educ] = β0 + β1 educ + β2 E[exper|educ] +E[v|educ]
indicating misspecification of the simple model since it ignores the
influence of exper via β2. Thus, the smaller model suffers from
misspecification if
E[ln(wage)|educ] 6= E[ln(wage)|educ, exper]
for some values of educ or exper.
106
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
• Empirical results:
See Example 2.10 in Wooldridge (2009), file: wage1.wf1, output
from EViews 6:– Simple log-level model
Dependent Variable: LOG(WAGE)
Method: Least Squares, Date: 02/17/09 Time: 15:59
Sample: 1 526, Included observations: 526
====================================================================
Coefficient Std. Error t-Statistic Prob.
====================================================================
C 0.583773 0.097336 5.997510 0.0000
EDUC 0.082744 0.007567 10.93534 0.0000
====================================================================
R-squared 0.185806 Mean dependent var 1.623268
Adjusted R-squared 0.184253 S.D. dependent var 0.531538
S.E. of regression 0.480079 Akaike info criterion 1.374061
Sum squared resid 120.7691 Schwarz criterion 1.390279
Log likelihood -359.3781 Hannan-Quinn criter. 1.380411
F-statistic 119.5816 Durbin-Watson stat 1.801328
Prob(F-statistic) 0.000000
====================================================================
107
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
ln(wagei) = 0.5838 + 0.0827 educi + ui, i = 1, . . . , 526,
R2 = 0.1858.
If SLR.1 to SLR.4 are valid, then each additional year of schooling
is estimated to increase hourly wages by 8.3% on average. The
sample regression model explains about 18.6% of the variation of
the dependent variable ln(wage).
108
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
– Multivariate log-level model:Dependent Variable: LOG(WAGE), Method: Least Squares
Date: 02/17/09 Time: 15:48
Sample: 1 526, Included observations: 526
====================================================================
Coefficient Std. Error t-Statistic Prob.
====================================================================
C 0.216854 0.108595 1.996909 0.0464
EDUC 0.097936 0.007622 12.84839 0.0000
EXPER 0.010347 0.001555 6.653393 0.0000
====================================================================
R-squared 0.249343 Mean dependent var 1.623268
Adjusted R-squared 0.246473 S.D. dependent var 0.531538
S.E. of regression 0.461407 Akaike info criterion 1.296614
Sum squared resid 111.3447 Schwarz criterion 1.320940
Log likelihood -338.0094 Hannan-Quinn criter. 1.306139
F-statistic 86.86167 Durbin-Watson stat 1.789452
Prob(F-statistic) 0.000000
====================================================================
ln(wagei) = 0.2169 + 0.0979 educi + 0.0103 experi + ui,
i = 1, . . . , 526,
R2 = 0.2493.
109
Intensive Course in Econometrics — Section 3.2 — UR March 2009 — R. Tschernig
∗ Ceteris-paribus interpretation: If MLR.1 to MLR.4 are cor-
rect, then the expected increase in hourly wages due to an ad-
ditional year of schooling is about 9.8% and thus slightly larger
than obtained from the simple regression model.
An additional year of experience corresponds to an in increase
in expected hourly wages by 1%.
∗ Model fit:
The model explains 24.9% of the variation of the independent
variable. Does this imply that the multivariate model is better
than the simple regression model with an R2 of 18.6%? Be
careful with your answer and wait until we investigate model
selection criteria.
110
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
3.3 The OLS Estimator: Derivation and Algebraic
Properties
• For an arbitrary estimator the sample regression model for a
sample (yi, xi1, . . . , xik), i = 1, . . . , n, is given by
yi = β0 + β1xi1 + β2xi2 + · · · + βkxik + ui, i = 1, . . . , n.
• Recall the idea of the OLS estimator: Choose β0, . . . , βk such that
the sum of squared residuals (SSR)
SSR(β0, . . . , βk) =
n∑
i=1
u2i =
n∑
i=1
(yi − β0 − β1xi1 − · · · − βkxik
)2
is minimized. Taking first partial derivatives of SSR(β0, . . . , βk)
with respect to all k + 1 parameters and setting them to zero yields
111
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
the first order conditions of a minimum:n∑
i=1
(yi − β0 − β1xi1 − · · · − βkxik
)= 0 (3.5a)
n∑
i=1
xi1
(yi − β0 − β1xi1 − · · · − βkxik
)= 0 (3.5b)
... ...n∑
i=1
xik
(yi − β0 − β1xi1 − · · · − βkxik
)= 0 (3.5c)
This system of normal equations contains k + 1 unknown pa-
rameters and k + 1 equations. Under some further conditions (see
below) it has a unique solution.
Solving this set of equations becomes cumbersome if k is large. This
can be circumvented if the normal equations are written in matrix
notation.
112
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
• The Multiple Regression Model in Matrix Form
Using matrix notation the multiple regression model can be rewritten
as (Wooldridge 2009, Appendix E)
y = Xβ + u, (3.6)
where
y1
y2
...
yn
︸ ︷︷ ︸y
=
x10 x11 x12 · · · x1k
x20 x21 x22 · · · x2k
... ... ... ...
xn0 xn1 xn2 · · · xnk
︸ ︷︷ ︸X
β0
β1
β2
...
βk
︸ ︷︷ ︸β
+
u1
u2
...
un
︸ ︷︷ ︸u
.
The matrix X is called the regressor matrix and has n rows and
k + 1 columns. The column vectors y and u have n rows each, the
column vector β has k + 1 rows.
113
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
• Derivation: The OLS Estimator in Matrix Notation
– One possibility to derive the OLS estimator in matrix notation is
to rewrite the normal equations (3.5) in matrix notation. We do
this explicitly for the j-th equationn∑
i=1
xij
(yi − β0xi0 − β1xi1 − · · · − βkxik
)= 0
that is manipulated ton∑
i=1
(xijyi − β0xijxi0 − β1xijxi1 − · · · − βkxijxik
)= 0
and further ton∑
i=1
(β0xijxi0 + β1xijxi1 + · · · + βkxijxik
)=
n∑
i=1
xijyi.
114
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
By factoring out we have
n∑
i=1
xijxi0
β0+
n∑
i=1
xijxi1
β1+· · ·+
n∑
i=1
xijxik
βk =
n∑
i=1
xijyi.
Similarly, rearranging all other equations and collecting all k + 1
equations in a vector delivers
(∑n
i=1 xi0xi0) β0 + (∑n
i=1 xi0xi1) β1 + · · · + (∑n
i=1 xi0xik) βk...
(∑n
i=1 xikxi0) β0 + (∑n
i=1 xikxi1) β1 + · · · + (∑n
i=1 xikxik) βk
=
∑ni=1 xi0yi
...∑n
i=1 xikyi
.
115
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
Applying the rules for matrix multiplication yields
(∑n
i=1 xi0xi0) (∑n
i=1 xi0xi1) · · · (∑n
i=1 xi0xik)
... ... . . . ...
(∑n
i=1 xikxi0) (∑n
i=1 xikxi1) · · · (∑n
i=1 xikxik)
︸ ︷︷ ︸X′X
β0
...
βk
︸ ︷︷ ︸β
=
∑ni=1 xi0yi
...∑n
i=1 xikyi
︸ ︷︷ ︸X′y
as well as the normal equations in matrix notation
(X′X)β = X′y. (3.7)
116
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
– Note: The matrix X′X has k + 1 columns and rows so that it is
a square matrix.
The inverse (X′X)−1 exists if all columns (and rows) are linearly
independent. This can be shown to be the case if all columns of
X are linearly independent.
This is exactly what the next assumption states.
Assumption MLR.3 (No Perfect Collinearity):
In the sample none of the regressors can be expressed as an exact
linear combination of one or more of the other regressors.
Is this a restrictive assumption?
117
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
– Finally, multiply the normal equation (3.7) by (X′X)−1 from the
left and obtain the OLS estimator in matrix notation:
β = (X′X)−1X′y. (3.8)
This is the compact notation for
β0
...
βk
︸ ︷︷ ︸β
=
(∑n
i=1 xi0xi0) (∑n
i=1 xi0xi1) · · · (∑n
i=1 xi0xik)
... ... . . . ...
(∑n
i=1 xikxi0) (∑n
i=1 xikxi1) · · · (∑n
i=1 xikxik)
−1
︸ ︷︷ ︸(X′X)−1
∑ni=1 xi0yi
...∑n
i=1 xikyi
︸ ︷︷ ︸X′y
.
118
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
Algebraic Properties of the OLS Estimator
• X′u = 0, that is∑n
i=1 xijui = 0 for j = 0, . . . , k.
Proof: Plugging y = Xβ + u into the normal equation yields
(X′X)β = (X′X)β + X′u and hence X′u = 0.
• If xi0 = 1, i = 1, . . . , n, it follows that∑n
i=1 ui = 0.
• For the special case k = 1, the algebraic properties of the simple
linear regression model follow immediately.
• The point (y, x1, . . . , xk) is always located on the regression hyper-
plane if there is a constant in the model.
• The definitions for SST, SSE and SSR are like in the simple regres-
sion.
• If a constant term is included in the model, we can decompose
SST = SSE + SSR.
119
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
• The Coefficient of Determination:
R2 is defined as in the SLR case as
R2 =SSE
SSTor, if there is an intercept in the model,
R2 = 1 − SSR
SST.
It can be shown that the R2 is the squared empirical coefficient of
correlation between the observed yi’s and the explained yi’s, namely
R2 =
(∑ni=1 (yi − y)
(yi − ¯y
))2(∑n
i=1 (yi − y)2)(∑n
i=1
(yi − ¯y
)2)
=[Corr(y, y)
]2.
Note that[Corr(y, y)
]2can be used even when R2 is not useful.
120
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
• Adjusted R2:
If we rewrite R2 by expanding the SSR/SST term by n
R2 = 1 − SSR/n
SST/n,
we can interpret SSR/n and SST/n as estimators for σ2 and σ2y
respectively. They are biased estimators, however.
Using unbiased estimators thereof instead one obtains the “ad-
justed” R2
R2 = 1 − SSR/(n − k − 1)
SST/(n − 1)
= 1 − n − 1
n − k − 1· SSR
SST
121
Intensive Course in Econometrics — Section 3.3 — UR March 2009 — R. Tschernig
R2 = 1 − n − 1
n − k − 1
(1 − R2
)
=−k
n − k − 1+
n − 1
n − k − 1· R2
Properties of R2 (see Section 6.3 in Wooldridge (2009)):
– R2 can increase or fall when including an additional regressor.
– R2 always increases if an additional regressor reduces the unbi-
ased estimate of the error variance.
Attention: Analogously to R2 one may not compare R2 of regression
models with different y.
• The quantities R2 and R2 are both called goodness-of-fit mea-
sures.
122
Intensive Course in Econometrics — Section 3.4 — UR March 2009 — R. Tschernig
3.4 The OLS Estimator: Statistical Properties
Assumptions (Recap):
• MLR.1 (Linearity in the Parameters)
• MLR.2 (Random Sampling)
• MLR.3 (No Perfect Collinearity)
• MLR.4 (Zero Conditional Mean)
123
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
3.4.1 The Unbiasedness of Parameter Estimates
• Let MLR.1 through MLR.4 hold. Then we have E[β] = β
Proof:
β = (X′X)−1X′y MLR.3
= (X′X)−1X′ (Xβ + u) MLR.1
= (X′X)−1X′Xβ + (X′X)−1X′u= β + (X′X)−1X′u.
Taking conditional expectation
E[β|X] = β + E[(X′X)−1X′u|X]
= β + (X′X)−1X′E[u|X]
= β. MLR.2 and MLR.4
124
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
The last equality holds because
E[u|X] =
E[u1|X]
E[u2|X]
...
E[un|X]
=
0
0
...
0
,
where the latter follows from
E[ui|X] = E[ui|x11, . . . , x1k, . . . , xnk]
= E[ui|xi1, . . . , xik] MLR.2
= 0 MLR.4
for i = 1, . . . , n.
125
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
• The Danger of Omitted Variable Bias
We partition the k + 1 regressors in a (n × k) matrix XA and a
(n × 1) vector xa. This yields
y = XAβA + xaβa + u. (3.9)
In the following it is assumed that the population regression model
has the same structure as (3.9).
Trade Example Continued (from Section 3.2):
Assume that in the population imports depend on gdp, distance,
and whether the trading countries share to some extent the same
language
ln(importsi) = β0 + β1 ln(gdpi) + β2 ln(distancei)
+ β3 ln(languagei) + ui.(3.10)
so that XA includes the constant, gdpi, and distancei and xa
126
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
includes languagei, each i = 1, . . . , n.
Imagine now that you are only interested in the values of βA (the
parameters for the constant, gdp, and distance), and that the re-
gressor vector xa has to be omitted because you have, for instance,
no data.
Which effect has the omission of the variable xa on the es-
timation of βA if, for example, the model
y = XAβA + w (3.11)
is considered? Model (3.11) is frequently called the smaller model.
Or, stated differently, which estimation properties does the OLS
estimator for βA have on basis of the smaller model (3.11)?
127
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
Derivation:
– Denote the OLS estimator for βA from the small model by βA.
Following the proof of unbiasedness for the small model but re-
placing y with the true population model (3.9) delivers
βA = (X′AXA)−1X′
Ay
= (X′AXA)−1X′
A(XAβA + xaβa + u)
= βA + (X′AXA)−1X′
Axaβa + (X′AXA)−1X′
Au.
– By the law of iterated expectations E[u|XA] = E [E[u|XA,xa]|XA]
and therefore E[u|XA] = E[0|XA] = 0 by validity of MLR.4 for
the population model (3.9).
128
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
– Compute the conditional expectation of βA. Treating the (un-
observed) xa in the same way as XA one obtains
E[βA|XA,xa
]= βA + (X′
AXA)−1X′Axaβa.
Therefore the estimator βA is unbiased only if
(X′AXA)−1X′
Axaβa = 0. (3.12)
Take a closer look at the term on the left hand side of (3.12), i.e.
(X′AXA)−1X′
Axaβa.
One observes that
δ = (X′AXA)−1X′
Axa
is the OLS estimator of δ in a regression of xa on XA
xa = XAδ + ε.
129
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
Condition (3.12) holds (and there is no bias) if
∗ δ = 0, so xa is uncorrelated with XA in the sample or
∗ βa = 0 holds and the smaller model is the population model.
If neither of these conditions holds, then βA is biased
E[βA|XA,xa] = βA + δβa.
This means that the OLS estimator βA is in general biased for
every parameter in the smaller model.
Since these biases are caused by using a regression model that
misses a variable that is relevant in the population model, this
kind of bias is called omitted variable bias and the smaller
model is said to be misspecified. (See Appendix 3A.4 in Wooldridge
(2009).)
130
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
– One may also ask about the unconditional bias. Applying the LIE
delivers
E[βA|XA
]= βA + E
[δ|XA
]βa,
E[βA
]= βA + E
[δ]βa
Interpretation: The second expression delivers the expected value
of the OLS estimator if one keeps drawing new samples for y
and XA. Thus, in repeated sampling there is only bias if there
is correlation in the population between the variables in XA and
xa since otherwise E[δ]
= 0, cf. 2.4.
131
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
• Wage Example Continued (from Section 3.2):
– If the observed regressor educ is correlated with the unobserved
variable ability, then the regressor xa = ability is missing in
the regression and the OLS estimators, e.g. for the effect of an
additional year of schooling, are biased.
– Interpretation of the various information sets for computing the
expectation of βeduc:
∗ First consider
E[βeduc|educ, exper, ability] = βeduc + δβability,
where
ability =(1 educ exper
)δ + ε.
Then the conditional expectation above indicates the average
of βeduc computed over many different samples where each
132
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
sample of workers is drawn in the following way: You always
guarantee that each sample has the same number of workers
with e.g. 10 years of schooling, 15 years of experience, and 150
units of ability and the same number of workers with 11 years of
schooling, etc., so that for each combination of characteristics
there is the same amount of workers although the workers are
not identical.
∗ Next consider
E[βeduc|educ, exper] = βeduc + E[δ|educ, exper] βability.
When drawing a new sample you only guarantee that the amounts
of workers with a specific number of years of schooling and
experience stay the same. In contrast to above, you do not
control ability.
133
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
∗ Finally consider
E[βeduc] = βeduc + E[δ] βability.
Here you simply draw new samples where everything is allowed
to vary. If you had, let’s say 50 workers with 10 years of school-
ing in one sample, you may have 73 workers with 10 years of
schooling in another sample. This possibility is excluded in the
two previous cases.
134
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
• Effect of omitted variables on the conditional mean:
– General terminology:
∗ If E[y|xA, xa] 6= E[y|xA],
then the smaller model omitting xa is misspecified and esti-
mation will suffer from omitted variable bias.
∗ If E[y|xA, xa] = E[y|xA],
then the variable xa in the larger model is redundant and
should be eliminated from the regression.
∗ Trade Example Continued: Assume that the population
regression model only contains the variables gdp and distance.
Then a simple regression model with gdp is misspecified and a
multiple regression model with gdp, distance, and language
contains the redundant variable language.
135
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
– It can happen that for a misspecified model Assump-
tions MLR.1 to MLR.4 are fulfilled.
To see this, consider only one variable in XA
E[y|xA, xa] = β0 + βAxA + βaxa.
Then, by the law of iterated expectations one obtains
E[y|xA] = β0 + βAxA + βaE[xa|xA].
If, in addition, E[xa|xA] is linear in xA
xa = α0 + α1xA + ε, E[ε|xA] = 0,
one obtains
E[y|xA] = β0 + βAxA + βa(α0 + α1xA)
= γ0 + γ1x
with γ0 = β0 + βaα0 und γ1 = βA + βaα1 being the parameters
of the best linear predictor, see Section 2.4.
136
Intensive Course in Econometrics — Section 3.4.1 — UR March 2009 — R. Tschernig
– Note that in this case SLR.1 and SLR.4 are fulfilled for the smaller
model although it is not the population model. However
E[y|xA, xa] 6= E[y|xA]
if βa 6= 0 and α1 6= 0.
– Thus, model choice matters, see Section 3.5. If controlling for
xa is important (controlled random experiments, see Section 1.3),
then the smaller model is of not much use if the differences be-
tween the expected values are large for some values of the regres-
sors.
If one needs a model for prediction, the smaller model may be
preferable since it exhibits smaller estimation variance, see Sec-
tions 3.4.3 and 3.5.
Reading: Section 3.3 in Wooldridge (2009).
137
Intensive Course in Econometrics — Section 3.4.2 — UR March 2009 — R. Tschernig
3.4.2 The Variance of Parameter Estimates
• Assumption MLR.5 (Homoskedasticity):
V ar(ui|xi1, . . . , xik) = σ2 i = 1, . . . , n
• Assumptions MLR.1 bis MLR.5 are frequently called Gauss-Markov-
Assumptions.
• Note that by the Random Sampling assumption MLR.2 one has
Cov(ui, uj|xi1, . . . , xik, xj1, . . . , xjk) = 0 for all i 6= j, 1 ≤ i, j ≤ n,
Cov(ui, uj) = 0 for all i 6= j, 1 ≤ i, j ≤ n,
where for the latter equations the LIE was used. Because of MLR.2
one may also write
V ar(ui|xi1, . . . , xik) = V ar(ui|X), Cov(ui, uj|X) = 0, i 6= j.
138
Intensive Course in Econometrics — Section 3.4.2 — UR March 2009 — R. Tschernig
One writes all n variances and all covariances in a matrix
V ar(u|X) ≡
V ar(u1|X) Cov(u1, u2|X) · · · Cov(u1, un|X)
Cov(u2, u1|X) V ar(u2|X) · · · Cov(u2, un|X)
... ... . . . ...
Cov(un, u1|X) Cov(un, u2|X) · · · V ar(un|X)
(3.13)
= σ2
1 0 · · · 0
0 1 · · · 0
... ... . . . ...
0 0 · · · 1
or short (MLR.2 and MLR.5 together)
V ar(u|X) = σ2I. (3.14)
139
Intensive Course in Econometrics — Section 3.4.2 — UR March 2009 — R. Tschernig
• Variance of the OLS Estimator
Under the Gauss-Markov Assumptions MLR.1 to MLR.5 we have
V ar(βj|X) =σ2
SSTj(1 − R2j)
, xj not constant, (3.15)
where SSTj is the total sample variation (total sum of squares)
of the j-th regressor,
SSTj =
n∑
i=1
(xij − xj)2,
and the coefficient of determination R2j is taken from a regression
of the j-th regressor on all other regressors
xij = δ0xi0 + · · · + δj−1xi,j−1 + δj+1xi,j+1 + vi,
i = 1, . . . , n.(3.16)
(See Appendix 3A.5 in Wooldridge (2009) for the proof.)
140
Intensive Course in Econometrics — Section 3.4.2 — UR March 2009 — R. Tschernig
Interpretation of the variance of the OLS estimator:
– The larger the error variance σ2, the larger is the variance of
βj.
Note: This is a property of the population so that this variance
component cannot be influenced by sample size. (In analogy to
the simple regression model.)
– The larger the total sample variation SSTj of the j-th
regressor xj is, the smaller is the variance V ar(βj|X).
Note: The total sample variation can always be increased by
increasing sample size since adding another observation increases
SSTj.
– If SSTj = 0, assumption MLR.3 fails to hold.
141
Intensive Course in Econometrics — Section 3.4.2 — UR March 2009 — R. Tschernig
– The larger the coefficient of determination R2j from regression
(3.16) is, the larger is the variance of βj.
– The larger R2j, the better the variation in xj can be explained
by variation in the other regressors because in this case there is
a high degree of linear dependence between xj and the other
explanatory variables.
Then only a small part of the sample variation in xj is specific for
the j-th regressor (precisely the error variation in (3.16)). The
other part of the variation can be explained equally well by the
estimated linear combination of all other regressors. This effect
is not well attributable by the estimator to either variable xj or
the linear combination of all the remaining variables and thus the
estimator suffers from a larger estimation variance.
142
Intensive Course in Econometrics — Section 3.4.2 — UR March 2009 — R. Tschernig
– Special cases:
∗ R2j = 0: Then xj and all other explanatory variables are empiri-
cally uncorrelated and the parameter estimator βj is unaffected
by all other regressors.
∗ R2j = 1: Then MLR.3 fails to hold.
∗ R2j near 1: This situation is called multi- oder near collinear-
ity. In this case V ar(βj|X) is very large.
– But: The multicollinearity problem is reduced in larger samples
because SSTj rises and hence variance decreases for a given value
of R2j. Therefore multicollinearity is always a problem of small
sample sizes, too.
143
Intensive Course in Econometrics — Section 3.4.2 — UR March 2009 — R. Tschernig
• Estimation of the error variance σ2
– Unbiased estimation of the error variance σ2:
σ2 =u′u
n − (k + 1).
– Properties of the OLS estimator (continued):
Call sd(βj|X) =√
V ar(βj|X) the standard deviation, then
sd(βj|X) =σ
(SSTj(1 − R2
j))1/2
is the standard error of βj.
144
Intensive Course in Econometrics — Section 3.4.2 — UR March 2009 — R. Tschernig
• Variance-covariance-matrix of the OLS estimator:
Basics: The covariance of jointly estimating βj and βl — between
the estimators of the j-th and the l-th parameter — is written as
Cov(βj, βl|X) = E[(βj −βj)(βl−βl)|X], j, l = 0, 1, . . . , k +1,
where unbiasedness of the estimators is assumed. Rewrite all vari-
ances and covariances in a ((k + 1) × (k + 1))-matrix:
V ar(β|X) ≡
=
Cov(β0, β0|X) Cov(β0, β1|X) · · · Cov(β0, βk|X)
Cov(β1, β0|X) Cov(β1, β1|X) · · · Cov(β1, βk|X)
... ... . . . ...
Cov(βk, β0|X) Cov(βk, β1|X) · · · Cov(βk, βk|X)
145
Intensive Course in Econometrics — Section 3.4.2 — UR March 2009 — R. Tschernig
Rewriting yields
V ar(β|X) =
=
E[(β0 − β0)(β0 − β0)|X] · · · E[(β0 − β0)(βk − βk)|X]
E[(β1 − β1)(β0 − β0)|X] · · · E[(β1 − β1)(βk − βk)|X]
... . . . ...
E[(βk − βk)(β0 − β0)|X] · · · E[(βk − βk)(βk − βk)|X]
= E
(β0 − β0)
· · ·(βk − βk)
((β0 − β0) · · · (βk − βk)
)∣∣∣∣∣∣∣∣X
.
Next it will be shown that it holds:
V ar(β|X) = E[(β − β)(β − β)′|X
]= σ2(X′X)−1.
146
Intensive Course in Econometrics — Section 3.4.2 — UR March 2009 — R. Tschernig
Proof:
Remember that correct model specification implies
β = (X′X)−1X′y = (X′X)−1X′(Xβ + u) = β + (X′X)−1X′u,
hence β−β = (X′X)−1X′u, which inserted into V ar(β|X) yields
E[(β − β)(β − β)′|X
]= E
[(X′X)−1X′u
((X′X)−1X′u
)′|X]
= E[(X′X)−1X′uu′X(X′X)−1|X
]
= (X′X)−1X′ E[uu′|X]︸ ︷︷ ︸σ2In
X(X′X)−1
= σ2(X′X)−1X′X(X′X)−1
= σ2(X′X)−1.
From the definition of V ar(β|X) above it can be seen that the
diagonal elements are the variances V ar(βj|X), j = 0, . . . , k.
147
Intensive Course in Econometrics — Section 3.4.2 — UR March 2009 — R. Tschernig
• Efficiency of OLS
Note: The OLS estimator is a linear estimator with respect to the
dependent variable because it holds for given X that
βj =
n∑
i=1
(vi∑ni=1 v2
i
)yi,
where vi are the residuals from regression (3.16). (For a derivation
without matrix algebra see Appendix 3A.2 in Wooldridge (2009).)
Further OLS is unbiased so E[βj] = βj.
Gauss-Markov Theorem: Under assumptions MLR.1 through
MLR.5 the OLS estimator is the best linear unbiased estimator
(BLUE).
“Best” means that the OLS estimator, that is unbiased since E[βj] =
βj, has minimal variance among linear unbiased estimators.
148
Intensive Course in Econometrics — Section 3.4.3 — UR March 2009 — R. Tschernig
3.4.3 Trade-off between Bias and Multicollinearity
• Example: Let the population model be
y = β0 + β1x1 + β2x2 + u.
– For a given sample let R21 be close to 1. Then β1 is estimated
with a large variance by (3.15).
– A possible solution? Leaving out the regressor x2 and estima-
tion of the simple regression. But then, as already shown, the
estimator of β1 is biased.
Hence: If there is correlation between x1 and x2 near 1 or -1, then
— for given sample size — one faces a trade-off between
variance and bias.
– What we observe is kind of a statistical uncertainty relation:
The sample does not provide sufficient information to precisely
149
Intensive Course in Econometrics — Section 3.4.3 — UR March 2009 — R. Tschernig
answer the formulated question.
– The only good solution: Increasing sample size.
– Alternative solution: Combining highly correlated variables.
• Variance of parameter estimates in misspecified models:
Again, there are different possibilities how incorrect regression mod-
els might be chosen (cf. Section 3.4.1):
– Too many variables: Parameters are estimated for variables that
do not play a role in the “true” data generation mechanism
(redundant variables).
– Too few variables: One or more variables are missing which
are relevant in the population regression model (omitted vari-
ables).
150
Intensive Course in Econometrics — Section 3.4.3 — UR March 2009 — R. Tschernig
– Wrong variables: A combination of both.
Effect on the variance of parameter estimators:
– Case 1 (redundant variables):
Consider the population model y = Xβ+u. Assume that instead
the following sample specification is chosen:
y = Xβ + zα + w,
where the vector z contains all sample observations of the variable
z. The variance of the parameter estimator βj is
V ar(βj|X) =σ2
SSTj(1 − R2j,X,z)
,
where now R2j,X,z is the coefficient of determination of a re-
gression of xj on all other variables in X and on z. It is easily
seen that R2j,X,z ≥ R2
j because less variables are included in the
151
Intensive Course in Econometrics — Section 3.4.3 — UR March 2009 — R. Tschernig
regression yielding the second R2.
Therefore: Including additional variables in a regression
model increases estimation variance or leaves it un-
changed.
– Case 2 (omitted variables):
The converse of case 1 holds: If a variable is omitted, it
can be shown that the estimation variance is smaller than
when using the true model.
– Case 3 (redundant and omitted variables):
Should really be avoided.
Correct model specification is crucial!
152
Intensive Course in Econometrics — Section 3.5 — UR March 2009 — R. Tschernig
3.5 Model Specification I: Model Selection Criteria
• Goal of model selection:
– In principle: find the population model.
– In practice: find the “best” model for the purpose of the analysis.
– More specific: Under the assumption that the population model
is a multiple linear regression model find all regressors that are
included in the regression and their appropriate transformations
(log or level or ...). Avoid omitting variables and including irrel-
evant variables.
153
Intensive Course in Econometrics — Section 3.5 — UR March 2009 — R. Tschernig
• Brief theory of model selection:
– There are two issues:
a) the variable (model) choice,
b) the estimation variance.
– Consider a): Choose a goal function to evaluate different models.
A popular goal function is the mean squared error (MSE).
For fixed parameters it is defined as
MSE = E[(y − β0x0 − β1x1 − · · · − βkxk)2
], (3.17)
see also equation (2.17) for the simple regression case.
Choose the model for which the MSE is minimal.
154
Intensive Course in Econometrics — Section 3.5 — UR March 2009 — R. Tschernig
Important cases:
∗ If x0, . . . , xk include all relevant variables, the population
model is a multiple linear regression, and MSE is minimized
with respect to the parameters, then
MSE = E[u2]
= σ2.
∗ If relevant variables are missing, it can be shown that
the MSE decomposes into variance and squared bias. For
simplicity, omit all variables except x1 and fit the simple linear
regression
y = γ0 + γ1x1 + v.
Then
MSE1 = E[(y − E[y|x1, . . . , xk]) + (E[y|x1, . . . , xk] − E[y|x1])2
]
= σ2 + (E[y|x1, . . . , xk] − E[y|x1])2 .
155
Intensive Course in Econometrics — Section 3.5 — UR March 2009 — R. Tschernig
Since the squared bias term is positive, (E[y|x1, . . . , xk] − E[y|x1])2 >
0, one clearly has MSE < MSE1.
– Consider a) and b): If parameters have to estimated, a
further term enters the mean squared error, namely the variances
and covariances for estimating the model parameters. One has
MSE = V ariance of population error + Bias of chosen model2
+ Estimation variance,
where the estimation variance in general increases with the num-
ber of variables. Now it can happen that for minimizing MSE it
is optimal to choose a model that omits variable(s). A typical
case is prediction.
– Therefore, a reliable method for estimating the MSE is needed.
156
Intensive Course in Econometrics — Section 3.5 — UR March 2009 — R. Tschernig
• What does not work:
– Selecting the model with the smallest standard error of
the regression σ does not work.
∗ Why? It is always possible to select a model for which every
residual is zero, that is ui = 0 for all i = 1, . . . , n. Then σ = 0
as well although the error variance σ2 > 0 in the true model.
∗ How? Simply take k+1 = n regressors into the sample regres-
sion model which fulfil MLR.3 and solve the normal equations
(3.5). Then you obtain a perfect fit since you have a linear
equation system with n equations and n unknown parameters.
∗ Note that you can add any regressors that fulfil MLR.3 even if
they have nothing to do with the population regression model.
∗ Note also that SSR remains constant or decreases if for a given
157
Intensive Course in Econometrics — Section 3.5 — UR March 2009 — R. Tschernig
sample of size n a further regressor variable is added since
the linear equation system obtains more flexibility to fit the
sample observations. Therefore σ2 = SSRn remains constant or
decreases as well.
∗ For the variance estimator σ2 = SSRn−k−1 there are opposing
effects: a decrease in SSR maybe offset by the decrease in
n − k − 1.
In sum, the standard error of regression tends to decrease when
additional regressors are added so that it is not suited for selecting
those variables that are part of the population model.
– Selecting the model with the largest R2 does not work either.
Why?
158
Intensive Course in Econometrics — Section 3.5 — UR March 2009 — R. Tschernig
– Although the adjusted R2 may fall or increase with adding another
regressor, it screws up for k + 1 = n since R2 = 1 as well in this
case.
• Solution: Use model selection criteria
– Basic idea:
Selection criterion = lnu′un
+ (k + 1) · penalty function(n)
∗ First term: a variance estimator for ln(σ2) of the chosen
model.
Note that the estimated variance σ2 = u′u/n is reduced by
every additionally included independent variable.
∗ Second term: is a penalty term punishing the number of
parameters to avoid models that include redundant variables.
159
Intensive Course in Econometrics — Section 3.5 — UR March 2009 — R. Tschernig
Because the true error variance is typically underestimated us-
ing σ2, the penalty term penalizes the inclusion of additional
regressors.
The penalty term increases with k and the penalty function
must be chosen such that is decreases with n such that a large
number of parameters matters less in large samples. Why?
∗ This implies a trade-off : Regressors are taken in the model,
if the penalty is smaller than the decrease in estimated MSE.
When choosing a criterion one determines how the trade-off is
shaped.
∗ Rule: Choose among all considered candidate models the spec-
ification for which the criterion is minimal.
160
Intensive Course in Econometrics — Section 3.5 — UR March 2009 — R. Tschernig
– Popular model selection criteria:
∗ the Akaike Criterion (AIC)
AIC = lnu′un
+ (k + 1)2
n, (3.18)
∗ the Hannan-Quinn Criterion (HQ)
HQ = lnu′un
+ (k + 1)2 ln(ln n)
n, (3.19)
∗ the Schwarz / Bayesian Information Criterion (SC/BIC)
SC = lnu′un
+ (k + 1)ln n
n. (3.20)
It is advised always to check all criteria although the researcher
decides which to use. In nice cases, all criteria deliver the same
result. Note that for standard sample sizes SC punishes additional
parameters more than HQ, and HQ more than AIC
161
Intensive Course in Econometrics — Section 3.5 — UR March 2009 — R. Tschernig
• Trade Example Continued:
– Model 1
LOG(TRADE_0_D_O) = -3.4610 + 0.7701*LOG(WDI_GDPUSDCR_O)
AIC = 4.282, HQ = 4.310, SC = 4.357
– Model 2
LOG(TRADE_0_D_O) = 4.8009 + 1.0885*LOG(WDI_GDPUSDCR_O) - 1.9708*LOG(CEPII_DIST)
AIC = 4.025, HQ = 4.068, SC = 4.138
– Model 3
LOG(TRADE_0_D_O) = -9.5789+ 1.3566*LOG(WDI_GDPUSDCR_O) - 1.1442*LOG(CEPII_DIST)
+ 3.1265*CEPII_COMCOL_REV
AIC = 3.694, HQ = 3.751, SC = 3.844
– Model 4
Y_LOG = -13.0268 + 1.3176*LOG(WDI_GDPUSDCR_O) - 0.6249*LOG(CEPII_DIST)
+ 3.3512*CEPII_COMCOL_REV + 2.1096*CEPII_COMLANG_OFF
AIC = 3.679, HQ = 3.751, SC = 3.867
– Comparing all four models, SC selects model 3 with regressors gdp, distance and common colonizer while
AIC selects model 4 with additional regressor common official language. See Appendix 10.4 for more details
on variables. One can nicely see that SC punishes additional variables more than AIC. Statistical tests may
provide further information on which model to choose, see Sections 4.3 onwards.
162
4 Multiple Regression Analysis: Hypothesis Testing and
Confidence Intervals
4.1 Basics of Statistical Tests
Foundations of statistical hypothesis testing
• In general: Statistical hypothesis tests allow statistically sound and
unambiguous answers to yes-or-no questions:
– Do men and women earn equal income in Kazakhstan?
163
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
– Do certain political attempts lead to a decrease in unemployment
in 2010?
– Are imports to Kazakhstan influenced by the gdp of exporting
countries?
164
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
• Elements of a statistical test:
1. Two disjoint hypotheses about the value(s) of (a) parameter(s)
θ in a population.
That means that one of the two competing hypotheses has to
hold in the population:
– Null hypothesis H0
– Alternative hypothesis H1
2. A test statistic T that is a function of some or all sample values
(X,y). We will denote it as t(X,y).
3. A decision rule, stating for which values of t(X,y) the null
hypothesis H0 is rejected and for which values the null is
not rejected.
165
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
More precisely: Partition the domain of the test statistic T in
two disjoint regions:
– Rejection region, critical region CIf the test statistic t(X,y) is located in the critical region, H0
is rejected:
Reject H0 if t(X,y) ∈ C
– Non-rejection region
If the test statistic t(X,y) falls into the non-rejection region,
H0 is not rejected:
Do not reject H0 if t(X,y) 6∈ C
– Critical value c: Boundary between rejection and non-rejection
region.
166
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
• Properties of a test:
– Type I error, α error:
The type I error measures the probability (evaluated before the
sample is taken) of rejecting H0 though H0 is correct in the pop-
ulation,
α = P (Reject H0 |H0 is true) = P (T ∈ C|H0 is true).
The type I error is frequently called the significance level or
size of a test.
– Type II error, β error:
The type II error gives the probability of not rejecting H0 though
it is wrong,
β = P (Not reject H0|H1 is true).
167
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
– Size of a test: The significance level or size equal to α has to
be fixed by the researcher before the test is carried out.
– Power of a test: The power of a test gives the probability of re-
jecting a wrong null hypothesis. power = π = P (Reject H0 |H1 is true),
that is
π = 1 − P (Not reject H0|H1 is true) = 1 − β.
To calculate C for a given α one has to know the probability
distribution of the test statistic under H0.
168
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
Deriving Tests about the Sample Mean:
1. Consider two disjoint hypotheses about the mean of a sample.
(For example, the mean µ of hourly wages in the US in 1976.)
a) Null hypothesis
H0 : µ = µ0
(In our example: mean hourly wage is 6 US-$,
thus H0 : µ = 6)
b) Alternative hypothesis
H1 : µ 6= µ0
(In the example: mean hourly wages are not 6 US-$,
thus H1 : µ 6= 6)
169
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
2. Test statistic:
a) Choice of an estimator for the unknown mean µ, e.g. the OLS
estimator of a regression of hourly wages w on a constant:
Compute the sample mean
µ =1
n
n∑
i=1
wi.
out of a sample w1, . . . , wn with n observations.
b) Obtain the probability distribution of the estimator: For simplicity
assume that individual wages wi are jointly normally distributed
with expected value µ and variance σ2w, that is
wi ∼ N (µ, σ2w).
170
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
From the properties of jointly normally distributed random vari-
ables it follows that
µ ∼ N(µ, σ2
µ
),
where σ2µ = V ar(µ) = V ar(n−1∑wi) = n−1σ2
w.
c) In order to obtain a test statistic t(w1, . . . , wn) all unknown pa-
rameters have to be removed from the distribution. In this simple
case this can be achieved by standardizing µ
t(w1, . . . , wn) =µ − µ
σµ∼ N (0, 1).
d) The test statistic t(w1, . . . , wn) can be calculated if we know µ
and σµ. Assume for the moment that σµ is known.
171
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
Which value takes µ under H0?
H0 : µ = µ0.
Under H0 we can compute the test statistic for a given sample as
t(w1, . . . , wn) =µ − µ0
σµ∼ N (0, 1).
3. Decision rule:
When should we reject H0 and in which case shouldn’t we?
(Now the significance level α has to be chosen!)
If the deviation of µ from the null hypothesis value µ0 is large
enough one would reject H0.
172
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
0Region of rejection under H
f(t)
t
0
0
Critical value c
Probability of error a/2Probability of error a/2
0Region of rejection under H
0Non-rejection region under H
Intuition: If t is very large (or very small) then
a) the estimated mean µ is far from µ0 (under H0) and / or
b) the standard deviation σµ of the estimated deviation is small
relative to µ − µ0.
173
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
• When is |t| large enough (to reject H0)?
• Note: Under H0 it holds that
t(w1, . . . , wn) =µ − µ0
σµ∼ N (0, 1)
and hence for given α the rejection region C can be determined
(see figure).
• Formally:
P (T < −c|H0) + P (T > c|H0) = α
or in this case due to the symmetry of the normal distribution
P (T < −c|H0) =α
2und P (T > c|H0) =
α
2.
The values of −c and c are tabulated — they are the α/2 and
1 − α/2 quantiles of the standard normal distribution.
174
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
• Under H1 it holds that
µ − µ
σµ∼ N (0, 1).
Expanding yields
µ − µ + µ0 − µ0
σµ=
µ − µ0
σµ+
µ0 − µ
σµ=
µ − µ0
σµ︸ ︷︷ ︸t(w1,...,wn)
− µ − µ0
σµ︸ ︷︷ ︸m
and therefore we have under H1
t(w1, . . . , wn) =µ − µ0
σµ∼ N
(µ − µ0
σµ, 1
)
since X ∼ N (m, 1) is equivalent to X − m ∼ N (0, 1).
• Conclusion: If H1 is true, then the density of t(w1, . . . , wn) is
shifted by (µ − µ0)/σµ.
175
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
• In the figure exhibiting the density under H1 (for a specific value
of µ 6= µ0) the power can be seen as the sum of the two shaded
areas because π = P (t < −c|H1) + P (t > c|H1).
f(t)
t
0Non-rejection region of H
0Rejection region of H
0Rejection region of H
Critical value c
Power = Sum of the probabilites of rejection
Probability of rejectionProbability of rejection
0 0m-m
ms
• For a given σµ, the power of the test increases with the distance
between the null hypothesis µ0 and the true value µ.
• Recall that if H0 is true, then (µ − µ0)/σµ = 0 holds and one
176
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
obtains the distribution under H0.
• It can further be seen that the type II error — given as 1 − π =
1 − (1 − β) = β — does not equal zero!
4. There remains one problem: In real world applications we do not
know the standard deviation of the mean estimator σµ = σw/√
n.
Remedy: Estimation by
σµ =σw√
n.
Then one has the popular t statistic
t(w1, . . . , wn) =µ − µ0
σµ,
however, watch out!
177
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
The test statistic is no longer normally distributed but follows a t
distribution with n− 1 degrees of freedom (short tn−1). Therefore
t(w1, . . . , wn) =µ − µ0
σµ∼ tn−1.
To obtain the critical values
P (T < −c|H0) =α
2und P (T > c|H0) =
α
2,
the tables of the t distribution have to be considered (see Appendix
G, Table G.2 in Wooldridge (2009)).
Wage Example Continued:
Hourly wages wi, i = 1, . . . , 526 of US employees:
1. Hypotheses:
a) Null hypothesis: H0 : µ = 6
b) Alternative hypothesis: H1 : µ 6= 6
178
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
2. Estimation and calculation of the t statistic in EViews:====================================================================
Dependent Variable: WAGE, Method: Least Squares Sample: 1 526, Included
observations: 526
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C 5.896103 0.161026 36.61580 0.0000
====================================================================
R-squared 0.000000 Mean dependent var 5.896103
Adjusted R-squared 0.000000 S.D. dependent var 3.693086
S.E. of regression 3.693086 Akaike info criterion 5.452701
Sum squared resid 7160.414 Schwarz criterion 5.460810
Log likelihood -1433.060 Durbin-Watson stat 1.817647
====================================================================
Thus
µ = 5.896103, σµ = 0.161026
and
t(w1, . . . , w526) =5.896103 − 6
0.161026= −0.64521878.
179
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
3. Determination of critical values:
Suppose a significance level of α = 5%. Then the critical value
c = t525,0.05 can be obtained from the table for the t distribution
with n − 1 = 525 degrees of freedom: c = t525,0.05 = 1.96.
4. Test decision: Do not reject H0 : µ = 6 since
−c = −1.96 < t = −0.645 < c = 1.96,
and therefore t /∈ C (the test statistic is not contained in the rejec-
tion region).
5. However:
Do hourly wages wi really follow a normal distribution as assumed?
Examine the histogram of the sample observations wi (see the de-
scriptive statistics option in EViews):
180
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
Result:
0
20
40
60
80
100
120
140
0 2 4 6 8 10 12 14 16 18 20 22 24
Series: WAGE
Sample 1 526
Observations 526
Mean 5.896103
Median 4.650000
Maximum 24.98000
Minimum 0.530000
Std. Dev. 3.693086
Skewness 2.007325
Kurtosis 7.970083
Jarque-Bera 894.6195
Probability 0.000000
• The normality condition for our test does not seem to be fulfilled.
The test result could be misleading!
• There are also tests that work without the normality assumption,
see Section 5.1.
181
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
One- and two-sided hypothesis tests
• Two-sided tests
H0 : θ = θ0 versus H1 : θ 6= θ0
• One-sided tests
– Tests with left-sided alternative hypothesis
H0 : θ ≥ θ0 versus H1 : θ < θ0
Notice: Often, also in Wooldridge (2009), you can read H0 : θ =
θ0 versus H1 : θ < θ0. This notation, however, is somewhat
imprecise since either H0 or H1 has to be true. This is not made
clear by the latter notation.
182
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
H0 : θ ≥ θ0 versus H1 : θ < θ0
0 t
0Non-rejection region under H
Critical value c
f(t)
Probability of error a
0Region of rejection under H
∗ Decision rule:
t < c ⇒ Reject H0.
∗ You do not need a rejection region on the right hand side since
all θ > θ0 are elements of H0 and thus fall into the non-
rejection region.
183
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
∗ The critical value is obtained on basis of the density for θ = θ0
since then for a given critical value c the shaded area is larger
than for any θ > θ0 and one prefers a test for which the
maximum of the type I error is controlled.
184
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
Wage Example Continued:
(In the following we ignore that wages are not normally dis-
tributed.)
∗ The null hypothesis states that mean hourly wages are US-$ 6
or more (H1 says it is less than US-$ 6):
H0 : µ ≥ 6 versus H1 : µ < 6
∗ Calculation of the test statistic: as in the two-sided case, be-
cause again µ0 is the boundary between null and alternative
hypothesis:
t(w1, . . . , w526) =5.896103 − 6
0.161026= −0.64521878
185
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
∗ Calculation of the critical value: For α = 0.05 the critical value
(note: one-sided test) from the t distribution with 525 degrees
of freedom (df) is 1.645.
∗ Decision: Since
t = −0.64521878 > c = −1.645
the null hypothesis is not rejected.
186
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
– Test with right-sided alternative
H0 : θ ≤ θ0 versus H1 : θ > θ0f(t)
t
0
0
0Non-rejection region under H Region of rejection under H
Probability of error a
Critical value c
0
As with left-sided alternatives, but reversed.
• Why do we carry out one-sided tests? Consider the following
issue: Provide statistical evidence that the mean wage is above $
5.60.
– Since by using statistical tests we can never confirm but only
187
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
reject a hypothesis, we have to choose the alternative hypothesis
such that it reflects our conjecture. Here, this is a mean wage
larger than $ 5.60. Rejecting the null hypothesis then provides
statistical evidence for the alternative hypothesis. However, there
are exceptions to this rule, see e.g. Sections 4.6 and 4.7.
– We thus have to test if the mean wage is statistically significantly
larger than $ 5.60.
We therefore need a test with a one-sided alternative. Our pair
of hypotheses is
H0 : µ ≤ 5.60 versus H1 : µ > 5.60.
– For α = P (T > c|H0) = 0.05 the critical value is c = 1.645.
– Decision:
t =5.896103 − 5.60
0.161026= 1.8388521 > c = 1.645
188
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
⇒ Reject H0 (for size 5%) that means data confirm that the
mean wage is statistically significantly above $ 5.60.
– If, on the contrary, we want to examine whether mean wages
deviate from $ 5.60 in any direction, the pair of hypotheses is:
H0 : µ = 5.60 versus H1 : µ 6= 5.60.
Given the chosen significance level, α = 0.05, the critical values
are -1.96 and 1.96, respectively, and hence
−1.96 < 1.84 < 1.96.
Thus, the null hypothesis cannot be rejected.
– It is therefore easier to reject if one has knowledge about the
location of the alternative because then the region of rejection
can be made smaller and it is “easier” to reject the null hypothesis
if it is false.
189
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
p-values
• For every test statistic one can calculate the largest significance
level for which — given a sample of observations — the computed
test statistic would have just not led to a rejection of the null. This
probability is called p-value (probability value).
In case of a one-sided test with right-hand alternative one has
(Wooldridge 2009, Appendix C.6, p. 776)
P (T ≤ t(y)|H0) ≡ 1 − p
• Since P (T > t(y)|H0) = 1 − P (T ≤ t(y)|H0), one also has
P (T > t(y)|H0) = p
and thus it is common to say that the p-value is the smallest signif-
icance level at which the null can be rejected. Cf. Section 4.2, p.
133 in Wooldridge (2009)
190
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
• The decision rule of a test can also be stated in terms of p-
values:
Reject H0 if the p-value is smaller than the significance level α.
Note: In the figure t is shorthand for t(y).
Left-sided test: p = P (T < t(X,y)),
Right-sided test: p = P (T > t(X,y))
Two-sided test: p = P (T < −|t(X,y)|) + P (T > |t(X,y)|)
191
Intensive Course in Econometrics — Section 4.1 — UR March 2009 — R. Tschernig
• Software packages (e.g. EViews) often give p-values for
H0 : θ = 0 versus θ 6= 0.
Reading: Appendix C.6 in Wooldridge (2009).
192
Intensive Course in Econometrics — Section 4.2 — UR March 2009 — R. Tschernig
4.2 Probability Distribution of the OLS Estimator
For the multiple regression model
y = Xβ + u
we assume MLR.1 to MLR.5, as we did in Sections 3.2 and 3.4.
• Recall from Section 3.4.1 that under MLR.1 the OLS estimator
β = (X′X)−1X′y
can be written as
β = β + (X′X)−1X′︸ ︷︷ ︸
W
u. (4.1)
193
Intensive Course in Econometrics — Section 4.2 — UR March 2009 — R. Tschernig
• In order to derive the probability distribution of a test statistic one
needs the probability distribution of the underlying estimators since
the former is a function of the latter. Furthermore, the probability
distribution of the OLS estimator is necessary to construct interval
estimators, see Section 4.5.
Conditioning on the regressor matrix X, it follows from (4.1) that
the probability distribution of the OLS estimator only depends on
the error vector u. Similarly to the case of testing the mean we
make the assumption that the relevant random variables are nor-
mally distributed.
194
Intensive Course in Econometrics — Section 4.2 — UR March 2009 — R. Tschernig
• Assumption MLR.6 (Normality of Errors):
Conditionally on the regressor matrix X, the vector of sample errors
u is stochastically independently and identically normally distributed
as
ui|xi1, . . . , xik ∼ i.i.d.N (0, σ2), i = 1, . . . , n.
Jointly with MLR.2, it can be equivalently written that u is multi-
variate normal with mean zero and variance-covariance matrix σ2I
u|X ∼ N (0, σ2I).
• Of course, one could assume for the errors u any other probability
distribution. However, assuming normally distributed errors has two
advantages:
1. The probability distribution of the OLS estimator and derived test
statistics can easily be derived, see the remaining sections.
195
Intensive Course in Econometrics — Section 4.2 — UR March 2009 — R. Tschernig
2. Under certain conditions the resulting probability distribution for
the OLS estimator holds even if the errors are not normally dis-
tributed. Then it is called asymptotic distribution, see Chap-
ter 5.
See Appendix B and D in Wooldridge (2009) for rules and properties
of normally distributed random variables and vectors.
196
Intensive Course in Econometrics — Section 4.2 — UR March 2009 — R. Tschernig
• Properties of the multivariate normal distribution:
– If Z ∼ N (µ, σ2), then aZ + b ∼ N (aµ + b, a2σ2).
– If the random numbers Z and V are jointly normally distributed,
then Z and V are stochastically independent if and only if
Cov(Z, V ) = 0.
– Every linear combination of a vector of identically and indepen-
dently normally distributed random variables z ∼ N (µ, σ2I) is
also normally distributed. Let
w =
w1
...
wn
, z =
z1
...
zn
.
Then∑n
j=1 wjzj = w′z ∼ N(w′µ, σ2w′w
).
197
Intensive Course in Econometrics — Section 4.2 — UR March 2009 — R. Tschernig
More generally, it holds for z = (z1, . . . , zn)′ ∼ N (µ, σ2I) and
W =
w01 w02 · · · w0n
w11 w12 · · · w1n
w21 w22 · · · w2n
... ... ...
wk1 wk2 · · · wkn
∑nj=1 w0jzj
...∑n
j=1 wkjzj
= Wz ∼ N
(Wµ, σ2WW′
). (4.2)
• The property (4.2) for linear combinations of normally distributed
random numbers is very helpful for us since the OLS estimator (4.1)
is just such a linear combination.
198
Intensive Course in Econometrics — Section 4.2 — UR March 2009 — R. Tschernig
Thus, one obtains
β − β = Wu ∼ N(0, σ2WW′
).
Since WW′ = (X′X)−1X′X(X′X)−1 = (X′X)−1, one obtains
β ∼ N(β, σ2(X′X)−1
).
Similarly one can show that
βj ∼ N
(βj, σ
2βj
)(4.3)
with σ2βj
= σ2
SSTj(1−R2j)
(see (3.15) in Section 3.4).
• Note that (4.3) generalizes the example of Section 4.1 for testing
hypotheses on the mean. If X is a column vector of ones, then
β0 = µ.
199
Intensive Course in Econometrics — Section 4.3 — UR March 2009 — R. Tschernig
4.3 The t Test in the Multiple Regression Model
• Derivation of the test statistic and its distribution
– From (4.3) βj ∼ N
(βj, σ
2βj
).
– Standardizing leads to
βj − βj
σβj
∼ N (0, 1) .
For estimated σ2 (no proof) the test statistic follows a t gdistribution
with n − k − 1 degrees of freedom. Estimating k + 1 regression
parameters implies k + 1 restrictions from the normal equations
t(X,y) =βj − βj
σβj
∼ tn−k−1.
200
Intensive Course in Econometrics — Section 4.3 — UR March 2009 — R. Tschernig
• Critical region and decision rule
– Two-sided test
∗ Hypotheses:
H0 : βj = βj0 versus H1 : βj 6= βj0.
For a given significance level one obtains the critical values from
the table of the t distribution such that P (T < −c|H0) = α/2
and P (T > c|H0) = α/2 or equivalently 2 ·P (T > c|H0) = α.
∗ Decision rule:
· Reject H0 if |t(X,y)| > c, otherwise do not reject H0.
· Alternatively: Calculate p-value
p = P (|T | > |t(X,y)||H0) = 2 · P (T > t(X,y)|H0)
and reject H0 if p < α, otherwise do not reject H0.
201
Intensive Course in Econometrics — Section 4.3 — UR March 2009 — R. Tschernig
– One-sided test with left-sided alternative
∗ Hypotheses:
H0 : βj ≥ βj0 versus H1 : βj < βj0.
For a given significance level one obtains the critical value from
the table of the t distribution such that
P (T < c|H0) = α.
∗ Decision rule:
· Reject H0 if t(X,y) < c, otherwise do not reject H0.
· Alternatively: Calculate p-value
p = P (T < t(X,y)|H0).
and reject H0 if p < α, otherwise do not reject H0.
202
Intensive Course in Econometrics — Section 4.3 — UR March 2009 — R. Tschernig
– One-sided test with right-sided alternative
∗ Hypotheses:
H0 : βj ≤ βj0 versus H1 : βj > βj0.
For a given significance level one obtains the critical value from
the table of the t distribution such that
P (T > c|H0) = α.
∗ Decision rule:
· Reject H0 if t(X,y) > c, otherwise do not reject H0.
· Alternatively: Calculate p-value
p = P (T > t(X,y)|H0)
and reject H0 if p < α, otherwise do not reject H0.
203
Intensive Course in Econometrics — Section 4.3 — UR March 2009 — R. Tschernig
• Economic versus statistical significance
– For a given (statistical) significance level α, the power of a test
increases with increasing sample size since σβj
in the denominator
of the test statistic decreases with sample size.
– Not being able to reject a null hypothesis may thus be simple
caused by a too small sample size (if the null hypothesis is wrong
in the population).
– On the other hand, if a variable has only weak influence in the
population, its parameter will be significantly different from zero
if the sample size is large enough. Thus, even if βjxj only has
small economic impact on the dependent variable, the variable
is statistically significant.
– Be careful: In order to avoid estimation bias due to too small
204
Intensive Course in Econometrics — Section 4.3 — UR March 2009 — R. Tschernig
models, significant variables must be kept in the model, see Sec-
tion 3.4.1.
• Choice of significance level
– Two reasons for decreasing the significance level α with increasing
sample size n:
∗ Larger sample sizes make tests more powerful. Thus, one can
decide whether the benefits of a larger sample size is only at-
tributed to reducing the Type II error β = 1 − π or whether
one wants also to decrease the Type I error as well. In case
of standard significance testing, the type I error represents the
probability to include a variable in the model although it is irrel-
evant in the population model. Thus, it makes sense to reduce
this probability as well.
205
Intensive Course in Econometrics — Section 4.3 — UR March 2009 — R. Tschernig
∗ In general one selects relevant variables from a large number
of possibly relevant variables. Since for each statistically sig-
nificant variable a significance level α holds, one includes er-
roneously on average about αK redundant variables where K
denotes the total number of variables considered. Since fre-
quently K is allowed to increase with sample size n, the sig-
nificance level α should fall in order to avoid αK to increase.
– If one uses the Hannan-Quinn (HQ) (3.19) or the Schwarz (SC)
(3.20) model selection criterion, then the significance level de-
creases with sample size. This is not the case for the AIC criterion
(3.18).
206
Intensive Course in Econometrics — Section 4.3 — UR March 2009 — R. Tschernig
• Insignificance, multicollinearity, and sample size
– Recall: The test statistic t(X,y) is small since
∗ the deviation between the true value and the null hypothesis is
small, for example between βj and βj0
∗ or the estimated standard error σβj
of βj is large.
The latter can also be caused by multicollinearity in X. Thus: A
high degree of multicollinearity makes it more unlikely to reject
the null hypothesis (since |t(X,y)| is small on average).
– For this reason one may keep insignificant variables in the regres-
sion. However, corresponding parameter estimates have then to
be interpreted with care.
Reading: Appendices C.5, E.3 in Wooldridge (2009) if needed.
207
Intensive Course in Econometrics — Section 4.4 — UR March 2009 — R. Tschernig
4.4 Example of an Empirical Analysis I: A Simplified
Gravity Equation
Trade Example Continued (from Section 3.5):
Compare steps of an econometric analysis, see Section 1.2.
1. Question of interest:
Quantify impact of changes of gdp in exporting country and changes
in imports to Kazakhstan.
2. Economic model:
Under idealized assumptions including complete specialization in
production and identical consumption preferences among countries,
no trading costs, and focusing exclusively on imports, economic the-
ory implies (see Section II, equation (5) in Fratianni (2007))
importsi = A gdpi distanceβ2i , β2 < 0.
208
Intensive Course in Econometrics — Section 4.4 — UR March 2009 — R. Tschernig
This implies a unit elasticity (elasticity of 1) of gdp on imports. This
means that a 1% change in gdp in the exporting country increases
imports by 1% as well.
This hypothesis can be statistically tested.
3. Econometric model:
The simplest econometric model is obtained by taking logs of the
economic model and adding an error term. This delivers
ln(importsi) = β0 + β1 ln(gdpi) + β2 ln(distancei) + ui.
4. Collecting data: see Appendix 10.4.
5. Selection and estimation of an econometric model:
In practice, there may be further variables influencing imports. Thus,
further control variables have to be added. Based on the Schwarz
criterion the model selection exercise in Section 3.5 suggested to
209
Intensive Course in Econometrics — Section 4.4 — UR March 2009 — R. Tschernig
add the control variable common colonizer since 1945
(Model 3),
ln(importsi) = β0+β1 ln(gdpi)+β2 ln(distancei)+β3comcoli+ui.
====================================================================
Dependent Variable: LOG_IMP
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C -9.578911 4.273583 -2.241424 0.0297
LOG(WDI_GDPUSDCR_O) 1.356566 0.128793 10.53290 0.0000
LOG(CEPII_DIST) -1.144269 0.441293 -2.592991 0.0126
CEPII_COMCOL_REV 3.126534 0.674896 4.632616 0.0000
====================================================================
R-squared 0.698665 Mean dependent var 15.97292
Adjusted R-squared 0.679832 S.D. dependent var 2.613094
S.E. of regression 1.478578 Akaike info criterion 3.693842
Sum squared resid 104.9372 Schwarz criterion 3.843937
Log likelihood -92.03988 Durbin-Watson stat 2.044512
====================================================================
210
Intensive Course in Econometrics — Section 4.4 — UR March 2009 — R. Tschernig
6. Model diagnostics: Check possible violation of MLR.5 (Homoskedas-
ticity) by plotting the residuals against the fitted values and possible
violation of MLR.6 (Normal errors) by plotting a histogram of the
residuals:
-4
-3
-2
-1
0
1
2
3
10 12 14 16 18 20 22
FIT_OLS_IMP_KAZ
RE
S_
OL
S_
IMP
_K
AZ
0
2
4
6
8
10
12
-4 -3 -2 -1 0 1 2 3
Series: RES_OLS_IMP_KAZ
Sample 1 55
Observations 52
Mean -2.95e-15
Median -0.103912
Maximum 2.909987
Minimum -3.625761
Std. Dev. 1.434431
Skewness -0.295179
Kurtosis 3.017182
Jarque-Bera 0.755772
Probability 0.685309
The scatter plot does not indicate a violation of MLR.5. Why?
Inspecting the box right to the histogram shows that the estimated
kurtosis is close to 3 which is the theoretical value implied by the
211
Intensive Course in Econometrics — Section 4.4 — UR March 2009 — R. Tschernig
standard normal distribution.
Thus, we may continue to use this model.
7. Usage of the model: Conduct tests:
A two-sided test
• Now we can formulate the pair of statistical hypotheses:
H0 : The elasticity of imports to gdp is 1. versus H1 : The elasticity is unequal to 1.
H0 : β1 = 1 versus H1 : β1 6= 1.
• Compute t statistic from the relevant line of the outputVariable Coefficient Std. Error t-Statistic Prob.
LOG(WDI_GDPUSDCR_O) 1.356566 0.128793 10.53290 0.0000
t(X,y) =β1 − β10
σβ1
=1.356566 − 1
0.128793= 2.76852
212
Intensive Course in Econometrics — Section 4.4 — UR March 2009 — R. Tschernig
• Choose a significance level, e.g. α = 0.05.
Compute critical values: The degrees of freedom are n−k−1 =
52 − 3 − 1 = 48. One may obtain an approximate critical value
from Table G.2 in Wooldridge (2009) or a precise critical value
e.g. from
– EViews using scalar crit = @qtdist(1-alpha/2,n-k-1)
in the command window or
– Excel using c =(TINV(alpha;n-k-1))=2.0106. (Note that
the Excel function already assumes a two-sided test.)
• Since
t(X,y) = 2.76852 > 2.0106 = c
one rejects the null hypothesis.
• p-values can be computed in EViews using
213
Intensive Course in Econometrics — Section 4.4 — UR March 2009 — R. Tschernig
scalar pval = 2*(1-@ctdist(t,n-k-1))= 0.0080. Thus,
one can reject H0 even at the 1% significance level. The p-
value means that we would observe a t statistic of at least 2.787
in absolute value only in about 8 samples out of 1000 samples
drawn.
One-sided test
• Now we can formulate the pair of statistical hypotheses with re-
spect to the sign of β2, that is the impact of distance on imports.
Since we want to provide evidence for β2 < 0, we put this into
H1:
H0 : β2 ≥ 0 versus H1 : β2 < 0.
• Compute t statistic from the relevant line of the outputVariable Coefficient Std. Error t-Statistic Prob.
LOG(CEPII_DIST) -1.144269 0.441293 -2.592991 0.0126
214
Intensive Course in Econometrics — Section 4.4 — UR March 2009 — R. Tschernig
t(X,y) =β2 − β20
σβ2
=−1.144269 − 0
0.441293= −2.592991
• Choosing again α = 0.05, we compute the critical value using the
EViews function scalar crit = @qtdist(alpha,n-k-1)=-
1.6772.
• Since
t(X,y) = −2.592991 < −1.6772 = c
one rejects the null hypothesis. Thus, log distance has a statis-
tically significant negative impact on imports at the given signif-
icance level.
• The corresponding p-value using EViews is
scalar pval = @ctdist(t,n-k-1)= 0.0063. Thus, distance
has a negative impact even at the 1% significance level.
215
Intensive Course in Econometrics — Section 4.4 — UR March 2009 — R. Tschernig
Interpretation of β3:
In principle, this parameter can be interpreted like in a log-level
model, see Section 2.6. However, since β3 is very different from
0, and because the regressor is a dummy variable, one should use
an exact formula to compute the relative change, see Section 6.3:
the precise value is eβ3 − 1 = 21.8. Thus, imports from countries
with colonial ties are about 22 times larger than from countries from
other countries keeping everything else fixed! These are very likely
the trading patterns from the former Soviet Union.
Note that we already considered other model specifications in Sec-
tion 3.5. It might be interesting to check whether these test results
are robust if other model specifications are used such as Model 2 or
Model 4.
216
Intensive Course in Econometrics — Section 4.5 — UR March 2009 — R. Tschernig
4.5 Confidence Intervals
• How large is the probability that the estimated parameter value
corresponds to the true value?
• A parameter estimator — to be more precise, a point estimator —
does not allow any conclusions how “close” the estimate is to the
true value of the population.
• Following the position of Sir Karl Popper who advocated the crit-
ical rationalism in the philosophy of science, point estimates are
not very useful since it cannot be falsified. Instead, an empirical
hypothesis is only scientific if it is falsifiable.
• Example: Assume we predicted on basis of an econometric model
a price index and obtained a predicted value of 5.12. the realized
value, however, will be 5.24. → Then we made a wrong prediction
217
Intensive Course in Econometrics — Section 4.5 — UR March 2009 — R. Tschernig
since it did not realize exactly.
This “error” can only have three reasons:
– The random error of the population regression model.
– The estimation error of the sample regression model.
– The regression model is not correct or (more realistic) it is a bad
approximation. At least one of our assumptions is not justified.
Problem:
From an subjective point of view one can have different opinions
about these “explanations”:
– One believes that the deviation is due to the random error.
– Another claims that the model is wrong.
218
Intensive Course in Econometrics — Section 4.5 — UR March 2009 — R. Tschernig
Solution
One should specify objective criteria such that one can make a
scientific decision. These criteria should be determined before any
predicted value realizes.
Then one cannot escape a potential falsification of a hypothesis af-
terwards. This makes a hypothesis scientific in the sense of Popper.
• Let’s be more precise:
How large is the probability that the estimated value βj corresponds
exactly to the true value βj if, as was shown in Section 4.3,
βj ∼ N
(βj, σ
2βj
)
219
Intensive Course in Econometrics — Section 4.5 — UR March 2009 — R. Tschernig
and (βj − βj)/σβj∼ N (0, 1), or if σ
βjis estimated,
βj − βj
σβj
∼ tn−k−1 ?
• Alternative question:
How large is the probability that the true value βj lies in the interval
[βj − c · σβj
, βj + c · σβj
]
where c is given?
Note that the endpoints of the interval are random prior to obtaining
a sample. Its location is random through βj and its length is
random through σβj
This interval is the most well known example of an interval esti-
mator.
220
Intensive Course in Econometrics — Section 4.5 — UR March 2009 — R. Tschernig
• Answer for given σβj
:
How large is the probability that the true value βj is contained in
the interval [βj − c · σβj
, βj + c · σβj
] where the value c is chosen
by you?
221
Intensive Course in Econometrics — Section 4.5 — UR March 2009 — R. Tschernig
– It is 2Φ(c) − 1 since
P
(βj − cσ
βj≤ βj ≤ βj + cσ
βj
)= P
(−cσ
βj≤ βj − βj ≤ cσ
βj
)
= P
−c ≤
βj − βj
σβj
≤ c
= P
−c ≤
βj − βj
σβj
≤ c
= Φ(c) − Φ(−c)
= Φ(c) − (1 − Φ(c))
= 2Φ(c) − 1.
– Example: For c = 1.96 one obtains Φ(1.96) − Φ(−1.96) =
0.975 − 0.025 = 0.95:
222
Intensive Course in Econometrics — Section 4.5 — UR March 2009 — R. Tschernig
The true value βj will be with 95% probability within the interval
βj ± c · σβj
. One also relates this probability to α by writing
0.95 = 1 − α. Thus one has α = 0.05.
• Answer for estimated σβj
: The true value βj lies in the interval
βj±c·σβj
with probability 1−α. Note, however, that for computing
the probability one has to use the tn−k−1 distribution since
P
(βj − cσ
βj≤ βj ≤ βj + cσ
βj
)= P
−c ≤
βj − βj
σβj
≤ c
.
• The interval
[βj − c · σβj
, βj + c · σβj
]
is called confidence interval. One says that the confidence in-
terval contains the true value with a probability of confidence of
223
Intensive Course in Econometrics — Section 4.5 — UR March 2009 — R. Tschernig
(1 − α)100%. The value (1 − α) is also called confidence level
or coverage probability of the confidence interval.
• In practice one determines the confidence level 1−α and then com-
putes the value c using the appropriate distribution: either N (0, 1)
or tn−k−1.
• Note:
– The constant c corresponds to the (upper) critical value of a
two-sided test with significance level α.
– Since the confidence interval is a random interval, its location
and length is in general different for each sample.
– The larger (1 − α), the smaller α, the larger is the confidence
interval. In other words: the more you want to be on the safe
side, the larger the confidence interval becomes. Why?
224
Intensive Course in Econometrics — Section 4.5 — UR March 2009 — R. Tschernig
– A two-sided t test and a confidence interval contain the same
amount of information. The null hypothesis of a two-sided t
test is rejected if and only if the value of the null hypothesis lies
outside the confidence interval. Draw a graph to make this clear.
– If keep drawing new samples from a population, how many con-
fidence intervals do not contain the true value on average?
• Trade Example Continued (from Section 4.4):
– Compute a 95% confidence interval for the elasticity βgdp of im-
ports with respect to gdp.
– From Section 4.4 it can be justified that MLR.1 to MLR.6 hold
and imports are normally distributed.
– Since σβgdp
has to be estimated, one has to use the t distribution
with n − k − 1 = 48 degrees of freedom. For a confidence level
225
Intensive Course in Econometrics — Section 4.5 — UR March 2009 — R. Tschernig
of 0.95 one obtains α = 0.05 and thus c = 2.0106 (e.g. by using
EViews scalar crit = @qtdist(1-0.05/2,52-3-1)).
– The relevant line of output was, see Section 4.4:Variable Coeff. Std.Err. t-Stat. Prob.
LOG(WDI_GDPUSDCR_O) 1.356566 0.128793 10.53290 0.0000
– Therefore the 95% confidence interval is given by
[βj − c · σβj
, βj + c · σβj
]
[1.3566 − 2.0106 · 0.1288 , 1.3566 + 2.0106 · 0.1288]
[1.0976 , 1.6156].
– The elasticity of imports with respect to gdp falls with 95% prob-
ability within the range between 1.0976 and 1.6156. Note that 1
is not included in the confidence interval. This reflects the test
result in Section 4.4 of rejecting H0 : βgdp = 1.
226
Intensive Course in Econometrics — Section 4.6 — UR March 2009 — R. Tschernig
4.6 Testing a Single Linear Combination of Parameters
• Example: Cobb-Douglas production function
log Y = β0 + β1 log K + β2 log L + u,
where Y denotes output, K and L denote the production factors
capital and labor, respectively. Note that β1 and β2 are elasticities
here.
If the restriction β1 + β2 = 1 holds true, the production function
has constant returns to scale, e.g. a 1% increase of labor and capital
leads to a 1% increase of output on average.
227
Intensive Course in Econometrics — Section 4.6 — UR March 2009 — R. Tschernig
For an empirical test of constant returns to scale, we employ the
following pair of hypotheses:
H0 : β1 + β2 = 1 versus H1 : β1 + β2 6= 1.
• How to construct the test statistic:
1. First, define auxiliary parameters θ and θ0, where
θ = β1 + β2, θ0 = 1,
or, equivalently
H0 : θ = θ0 versus H1 : θ 6= θ0.
228
Intensive Course in Econometrics — Section 4.6 — UR March 2009 — R. Tschernig
2. Second, solve θ for one of the parameters βi, here β1
β1 = θ − β2
and insert it into the initial regression equation and reformulate
the latter to
log Y = β0 + (θ − β2) log K + β2 log L + u
log Y = β0 + θ log K + β2 (log L − log K)︸ ︷︷ ︸new variable
+u. (4.4)
Then estimate (4.4) and obtain the test statistic
tθ =θ − θ0
σθ
which can be directly calculated from the estimation of (4.4).
229
Intensive Course in Econometrics — Section 4.6 — UR March 2009 — R. Tschernig
Example:
In a classical marketing model we regress (the natural logarithm of)sales (S) of a consumer good on (the natural logarithm of) this good’sprice (P ) as well as on (the natural logarithms of) cross prices (PK1,PK2) of competing goods. The following regression output is calcu-lated from the data:
Dependent Variable: log(S), Method: Least Squares, Obs: 6917
Variable Coeff. Std.Err. t-Stat. Prob.
C 4.407786 0.079559 55.40268 0.0000
LOG(P) -3.955281 0.068095 -58.08499 0.0000
LOG(P_K1) 0.710274 0.073912 9.609683 0.0000
LOG(P_K2) 1.154163 0.079815 14.46046 0.0000
R-squared 0.332264 Mean dependent var 2.824244
Adj. R-squared 0.331974 S.D. dependent var 1.250189
S.E. regression 1.021815 Akaike info crit. 2.881617
Sum squared resid 7217.910 Schwarz criterion 2.885573
F-statistic 1146.631 Prob(F-statistic) 0.000000
230
Intensive Course in Econometrics — Section 4.6 — UR March 2009 — R. Tschernig
We wish to test the following statement: the cross price elasticities are
identical, keeping everything else fixed (ceteris paribus) (though the
competing goods come from different market segments).
• The initial hypotheses are given by
H0 : βK1 = βK2 versus H1 : βK1 6= βK2.
We reformulate them by re-parametrization according to
θ = βK1 − βK2, θ0 = 0
H0 : θ = 0 versus H1 : θ 6= 0.
• Thus, due to βK1 = θ + βK2, the initial regression model
ln(S) = β1 + β2 ln(P ) + βK1 ln(PK1) + βK2 ln(PK2) + u
can be rendered to
ln(S) = β1 + β2 ln(P ) + θ ln(PK1) + βK2(ln(PK2) + ln(PK1)) + u.
231
Intensive Course in Econometrics — Section 4.6 — UR March 2009 — R. Tschernig
• Given the estimates of the last regression
Dependent Variable: log(S), Method: Least Squares, Obs: 6917
Variable Coeff. Std.Err. t-Stat. Prob.
C 4.407786 0.079559 55.40268 0.0000
LOG(P) -3.955281 0.068095 -58.08499 0.0000
LOG(P_K1) -0.443889 0.112543 -3.944165 0.0001
LOG(P_K1)+LOG(P_K2) 1.154163 0.079815 14.46046 0.0000
R-squared 0.332264 Mean dependent var 2.824244
Adj. R-squared 0.331974 S.D. dependent var 1.250189
S.E. regression 1.021815 Akaike info crit. 2.881617
Sum squared resid 7217.910 Schwarz criterion 2.885573
F-statistic 1146.631 Prob(F-statistic) 0.000000
calculate t statistic ast =
−0.443889− 0
0.112543≈ −3.94.
For a given significance level of α = 0.05, the critical values are
-1.96 and 1.96. Thus, we have to reject H0.
Reading: Sections 4.3-4.4 in Wooldridge (2009).
232
Intensive Course in Econometrics — Section 4.7 — UR March 2009 — R. Tschernig
4.7 Jointly Testing Several Linear Combinations of
Parameters: The F Test
Some examples of possible restrictions within the MLR framework:
1. H0 : β1 = 3
2. H0 : β2 = βk
3. H0 : β1 = 1, βk = 0
4. H0 : β1 = β3, β2 = β4
5. H0 : βj = 0, j = 1, . . . , k
6. H0 : βj + 2βl = 1, βk = 2
We can already check case 1. and case 2. by applying t tests. For all
other cases we need the F test.
233
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
4.7.1 Testing of Several Exclusion Restrictions
Trade Example Continued (from Section 4.5):
Consider Model 4 in Section 3.5:
====================================================================
Dependent Variable: LOG_IMP, Method: Least Squares
Sample: 1 55 Included observations: 52
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C -13.02677 4.726426 -2.756157 0.0083
LOG(WDI_GDPUSDCR_O) 1.317565 0.129079 10.20740 0.0000
LOG(CEPII_DIST) -0.624909 0.542325 -1.152279 0.2550
CEPII_COMCOL_REV 3.351230 0.678912 4.936177 0.0000
CEPII_COMLANG_OFF 2.109579 1.319296 1.599018 0.1165
====================================================================
R-squared 0.714213 Mean dependent var 15.97292
Adjusted R-squared 0.689890 S.D. dependent var 2.613094
S.E. of regression 1.455167 Akaike info criterion 3.679330
Sum squared resid 99.52303 Schwarz criterion 3.866950
Log likelihood -90.66258 F-statistic 29.36447
Durbin-Watson stat 2.195759 Prob(F-statistic) 0.000000
234
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
Are the control variables common colonizer since 1945 (CEPII COMCOL REV)
and common official language (CEPII COMLANG OFF) really needed in the
specification of Model 4 ?
To put it more precisely, are both parameters of two variables mentioned
jointly significantly different from zero?
H0 : βcomcol rev = 0 and βcomlang off = 0
versus
H1 : βcomcol rev 6= 0 and/or βcomlang off 6= 0
235
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
How can one jointly test several hypotheses?
• Note that SSR decreases (or stays constant) with an additional re-
gressor.
⇒ Idea: Compare the SSR of a model on which the null hypotheses
are imposed (restricted model) with the SSR of another model that
does not impose the joint restrictions (unrestricted model).
• The estimation under H0 is easy: simply exclude all regressors from
the regression whose parameters under H0 are set to zero and re-
estimate the restricted model.
In case of Model 4 for the trade example the OLS estimates are for
the restricted model (that corresponds to Model 2 in Section 3.5):
236
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
====================================================================
Dependent Variable: LOG_IMP
Method: Least Squares
Sample: 1 55 Included observations: 52
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C 4.800950 3.497341 1.372743 0.1761
LOG(WDI_GDPUSDCR_O) 1.088546 0.137001 7.945508 0.0000
LOG(CEPII_DIST) -1.970804 0.480555 -4.101103 0.0002
====================================================================
R-squared 0.563937 Mean dependent var 15.97292
Adjusted R-squared 0.546138 S.D. dependent var 2.613094
S.E. of regression 1.760423 Akaike info criterion 4.024946
Sum squared resid 151.8554 Schwarz criterion 4.137518
Log likelihood -101.6486 F-statistic 31.68448
Durbin-Watson stat 2.117895 Prob(F-statistic) 0.000000
====================================================================
237
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
Results:
– The R2 of the unrestricted model is 0.7142 while the R2 of the
restricted model is (only) 0.5640.
– Correspondingly, the standard error of regression σ increases from
1.4551 to 1.7604.
– Are these changes large? It looks like that but what does “large”
really mean here?
– Note that all three model selection criteria, AIC, HQ, and SC,
“prefer” the unrestricted model, see Section 3.5. Will this finding
be confirmed by the test?
238
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
• In order to be able to use a statistic (a function that can be com-
puted from sample values) as a test statistic, one has to know its
probability distribution under the null hypothesis H0.
One can show (→ advanced econometrics course or Section 4.4
in Davidson & MacKinnon (2004)) that the following test statistic
follows an F distribution
F =(SSRH0
− SSRH1)/q
SSRH1/(n − k − 1)
∼ Fq,n−k−1.
Therefore this test is called F test and the test statistic is abbre-
viated as F statistic.
• Note that the F distribution has two different degrees of freedom,
q degrees of freedom for the random variable in the numerator, and
n−k−1 degrees of freedom for the random variable in denominator.
The value q contains the number of restrictions that are jointly
239
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
tested.
• Details of the F statistic:
– Its minimum is 0 since SSRH0≥ SSRH1
and SSRH1> 0. (There-
fore the F statistic cannot be normally distributed!)
– There is no upper bound.
• When should the joint null hypothesis be rejected?
– The larger the absolute difference between the SSRs of the re-
stricted and the unrestricted model, SSRH0− SSRH1
, the more
likely is a violation of the exclusion restrictions.
– However, be aware that absolute differences do not say much.
Why?
240
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
– It makes much more sense to consider the relative difference
between the SSRs. This is exactly what the F statistic does.
It scales the difference in SSRs by the SSR of the unrestricted
model. If the relative difference is large, then the joint null hy-
pothesis is likely to be violated.
– On the other hand, if the relative difference is small, then it is
likely that the excluded variables do not have any relevant impact
in the unrestricted model since they can be neglected without any
noticeable effect.
• Decision rule:
Reject H0 if the test statistic is larger than the critical value:
Reject H0 if F > c.
Thus, the critical region is (c,∞).
241
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
Calculation of the critical region:
For a given significance level α, the critical value c is implicitly
defined by the probability
P (F > c|H0) = α.
The corresponding value for c given α can be found in tables on
the F distribution, e.g. Table G.3 in Appendix G in Wooldridge
(2009) or be computed in EViews or Excel. (For the latter one has
(Finv(0.05;q;n-k-1) for alpha=0.05).
242
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
Trade Example Continued (from the beginning of this section):
• The joint null hypothesis contains two exclusion restrictions, thus
the degree of freedoms for the numerator are two, q = 2. The
degrees of freedom for the denominator correspond to the degrees
of freedom of Model 4, n − k − 1 = 52 − 4 − 1 = 47. Choosing
a significance value of α = 0.05, we check Table G.3 in Appendix
G in Wooldridge (2009) for the appropriate critical value. Listed
values are F2,40 = 3.23 and F2,60 = 3.15. While the former implies
a true significance level smaller than 0.05, the latter implies one
above 0.05. If one is interested in an exact critical value, one can
obtain it from Excel as Finv(0.05;2;47) = 3.1951.
• Collecting the SSRs from the regression outputs for Model 4 and
Model 2 at the beginning of the section, the F statistic can be
243
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
computed as
F =(151.855 − 99.523)/2
99.523/(47)= 12.3570.
Since
F = 12.3570 > 3.1951 = c,
reject H0 on a significance level of 5%.
• Check that the same decision holds for a significance level of 1%.
The two variables common colonizer since 1945 (CEPII COMCOL REV)
and common official language (CEPII COMLANG OFF) are statistically
significant at the 1% significance level and thus at least one of
the two variables has an impact on imports on the 5% as well
as on the 1% significance level.
244
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
Calculation of p-values for F statistics:
• In empirical work one is frequently interested in the largest signifi-
cance level for which the null hypothesis can be rejected given the
observed test statistic.
As explained in Section 4.1, this information is provided by the p-
value. Alternatively, it is the smallest significance level at which the
null cannot be rejected.
Given the significance level that was chosen prior to any calculations,
the null hypothesis is rejected if the p-value is smaller than the given
significance level α.
245
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
• Trade Example Continued: The p-value can be computed (in
Excel =FVERT(K10;2;47)= 4.87099E-05= 4.87099 ·10−05. The
p-value can also be calculated in EViews, see below.
Thus, there is extremely strong statistical evidence against the null
hypothesis.
Direct Calculation of the F statistic in EViews:
• In the Equation window one clicks on View and then on
Representations where one can read how the parameter num-
bering relates to the regressor variables.
• One again clicks on View and then on Coefficient Tests and
further on Wald-Coefficient Restrictions .... Then one
enters all restrictions into the opened EViews window Wald Test.
For the test of the joint significance of the control variables in the
246
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
trade example one types C(4)=0,C(5)=0 and confirms:
Wald Test:
Equation: EQ_LN_LN_COL_R_LANG_OFF
====================================================================
Test Statistic Value df Probability
====================================================================
F-statistic 12.35704 (2, 47) 0.0000
Chi-square 24.71407 2 0.0000
====================================================================
Null Hypothesis Summary:
====================================================================
Normalized Restriction (= 0) Value Std. Err.
====================================================================
C(4) 3.351230 0.678912
C(5) 2.109579 1.319296
====================================================================
Restrictions are linear in coefficients.
Note that the Chi-square test statistic is kind of a variant of the
F statistic which is useful in large samples. Here, it will not be
further discussed.
247
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
Remarks:
• One can, of course, test the simple null hypothesis with a two-sided
alternative
H0 : βj = 0 versus H1 : βj 6= 0
by means of an F test.
It holds that the square of a random variable X that follows a t
distribution with n − k − 1 degrees of freedom just corresponds to
a random variable that follows an F distribution with (1, n−k− 1)
degrees of freedom
X ∼ tn−k−1 =⇒ X2 ∼ F1,n−k−1.
Therefore, a two-sided t test and an F test lead to exactly the same
result for the pair of hypotheses above.
248
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
• It may happen that each regressor tested by itself is not statisti-
cally significant but if they are jointly tested they are statistically
significant (at the same significance level). This is a sign of mul-
ticollinearity between the regressors considered. Then, the given
sample size is only sufficient for providing statistical significance
jointly for both regressors. However, it is not sufficient for providing
statistical evidence for each regressor separately. In such cases you
may check the covariance between the parameter estimates that are
included in the test (in EViews View → Covariance Matrix).
• It may also happen that one variable is statistically significant but if
jointly tested with other variables it becomes insignificant. This can
happen if the other variables that are included in the joint hypothesis
are redundant in the population regression. In this case, the power of
a single hypothesis test is weakened by the other irrelevant variables.
249
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
• Thus, there is no general rule on whether to prefer joint or single
tests results.
• Trade Example Continued (from the middle of this section):
Comparing four different model specifications using model selection
criteria, see Section 3.5, HQ and AIC favor Model 4. Inspecting its
parameter estimates at the beginning of this section, one finds two
parameters to be statistically insignificant even at the 10% level:
βdistance and βcommon off language.
Why, then, was this Model 4 found to be best by HQ and AIC?
Answer:
The parameter estimators for βdistance and βcom. off. lang. might
be highly correlated so that only a joint impact is significant. One
reason could be that a lot of variation of distance can be explained
by common off. language, among other things. In this case, it
250
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
can be expected that if both parameters are jointly tested by means
of an F test, they are statistically significant. Test the pair of
hypotheses:
H0 : βdistance = 0 and βcomlang off = 0
H1 : βdistance 6= 0 and/or βcomlang off 6= 0
The EViews output is:
Wald Test:
Equation: EQ_LN_LN_COL_LANG
Test Statistic Value df Probability
F-statistic 4.749269 (2, 47) 0.0132
Chi-square 9.498538 2 0.0087
Thus, reject H0 at the 5% significance level. The previous conjecture
is statistically confirmed.
251
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
The effect of multicollinearity can nicely be seen in
-2
-1
0
1
2
3
4
5
6
-2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0
C(5
)C(3)
Note that C(3) and C(5) correspond to βdistance and βcom. off. lang., respectively. The ellipse
is a generalization of confidence intervals to two dimensions. Thus, all points outside the ellipse
are joint null hypotheses that are rejected. Note that the origin also lies outside while the zero
is included in each one-dimensional confidence interval. (Get the graph in EViews via Views →
Coefficient Tests→ Confidence Ellipse.) More on confidence ellipses, e.g. in Davidson
& MacKinnon (2004).
252
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
• R2 version of the F statistic:
If a regression model contains a constant, then the decomposition
SSR = SST(1−R2) holds. Inserting each SSR into the F statistic
delivers
F =(R2
H1− R2
H0)/q
(1 − R2H1
)/(n − (k + 1))∼ Fq,n−k−1.
Note:
– SST is canceled if the dependent variable y is the same under H0
and H1 as, for example, in case of exclusion restrictions. However,
this is not always true if general linear restrictions are tested.
– There can be slight differences between both versions of the F
statistic due to rounding errors.
253
Intensive Course in Econometrics — Section 4.7.1 — UR March 2009 — R. Tschernig
Overall F Test
Standard software packages (such as EViews) include in their OLS
output for the multiple regression model y = β0+β1x1+. . .+βkxk+u
the F statistic and its p-value for the pair of hypotheses:
“None of the (non-constant) regressors has impact on the dependent
variable and thus the corresponding parameters are all zero.”
H0 : β1 = · · · = βk = 0 (and y = β0 + u)
H1 : βj 6= 0 for at least one j = 1, . . . , k.
If H0 is not rejected, this possibly indicates that
- all regressors are possible badly/wrongly chosen,
- or at least a substantial number of regressors has no impact on y,
- or too many regressors were considered for given sample size n.
This test is a first rough check for the validity of the model.
254
Intensive Course in Econometrics — Section 4.7.2 — UR March 2009 — R. Tschernig
4.7.2 Testing of Several General Linear Restrictions
• Generalization of the F test for exclusion restrictions.
• Works equivalently by computing the relative change in the SSRs.
• R2 version cannot be used in this case!
Examples of possible pairs of hypotheses:
H0 : β2 = β3 = 1 versus H1 : β2 6= 1 and/or β3 6= 1,
H0 : β1 = 1, βj = 2βl versus H1 : β1 6= 1 and/or βj 6= 2βl.
Trade Example Continued (from previous subsection):
• One may conjecture that due to the multicollinearity between the es-
timates for distance and common official language the impact
of distance might be underestimated in absolute value while the
255
Intensive Course in Econometrics — Section 4.7.2 — UR March 2009 — R. Tschernig
impact of language is zero. Thus, consider the pair of hypotheses:
H0 : βdistance = −1 and βcomlang off = 0
H1 : βdistance 6= −1 and/or βcomlang off 6= 0
In order to compute the SSR under H0 impose these restrictions on
the regression as
log(imports) − (−1) log(distance) = β0 + βgdp log(gdp) + βcomcolcepii comcol rev + u
256
Intensive Course in Econometrics — Section 4.7.2 — UR March 2009 — R. Tschernig
The EViews output is:
Dependent Variable: LOG_IMP+LOG(CEPII_DIST)
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C -10.49513 3.196796 -3.283015 0.0019
LOG(WDI_GDPUSDCR_O) 1.344714 0.122454 10.98137 0.0000
CEPII_COMCOL_REV 3.215739 0.611625 5.257696 0.0000
====================================================================
R-squared 0.720026 Mean dependent var 24.24218
Adjusted R-squared 0.708599 S.D. dependent var 2.713964
S.E. of regression 1.465041 Akaike info criterion 3.657604
Sum squared resid 105.1709 Schwarz criterion 3.770176
Log likelihood -92.09771 F-statistic 63.00827
Durbin-Watson stat 2.053004 Prob(F-statistic) 0.000000
====================================================================
257
Intensive Course in Econometrics — Section 4.7.2 — UR March 2009 — R. Tschernig
This allows to compute the F statistic
F =
(SSRH0
− SSRH1
)/q
SSRH1/(n − k − 1)
=(105.1709 − 99.5230)/2
99.5230/47= 1.333618 < c = 3.195.
Alternatively, one can conduct this general F test directly in EViews
via the Wald test option C(3)=-1,C(5)=0:
Wald Test:
Equation: EQ_LN_LN_COL_LANG
====================================================================
Test Statistic Value df Probability
====================================================================
F-statistic 1.333603 (2, 47) 0.2733
Chi-square 2.667205 2 0.2635
→ The claim that a “common official language has no impact
while distance has an elasticity of −1” cannot be rejected at
any reasonable significance level since the p-value is about 27%.
258
Intensive Course in Econometrics — Section 4.8 — UR March 2009 — R. Tschernig
4.8 Reporting Regression Results
In general, empirical researchers investigate a number of different spec-
ifications of regression functions.
In order to make visible how robust the conclusions are with respect
to model choice it is good practice to report the results of the most
important specifications so that each reader can evaluate the findings
in her own manner.
This is most easily achieved by summarizing the relevant results in a
table, see the example below.
259
Intensive Course in Econometrics — Section 4.8 — UR March 2009 — R. Tschernig
For each specification a minimum number of results should be:
• OLS parameter estimates βj of the regression parameters βj, j =
0, 1, . . . , k (plus variable names),
• Standard error of βj, σβj
,
• Number of observations n,
• R2 and adjusted R2,
• Standard error of regression or estimated variance of the regression
error σ2.
If possible, one should also report
• Model selection criteria such as AIC, HQ or SC,
• Sum of squared residuals (SSR).
Based on the SSRs one can easily compute F tests.
260
Intensive Course in Econometrics — Section 4.8 — UR March 2009 — R. Tschernig
Trade Example Continued:
Dependent Variable: ln(imports to Kazakhstan)
Independent Variables/Model (1) (2) (3) (4)
constant -3.461 4.801 -9.579 -13.027
(3.280) (3.497) (4.274) (4.726)
ln(gdp) 0.770 1.089 1.357 1.318
(0.130) (0.137) (0.129) (0.129)
ln(distance) — -1.971 -1.144 -0.625
(0.481) (0.441) (0.542)
common ”colonizer” since 1945 — — 3.127 3.351
(0.675) (0.679)
common official language — — — 2.110
(1.319)
Number of observations 52 52 52 52
R2 0.414 0.564 0.699 0.714
Standard error of regression 2.020 1.760 1.479 1.455
Sum of squared residuals 203.980 151.855 104.937 99.523
AIC 4.2816 4.0249 3.6938 3.6793
HQ 4.3103 4.0681 3.7514 3.7513
SC 4.3566 4.1375 3.8439 3.8670
Reading: Sections 4.5-4.6 in Wooldridge (2009).
261
5 Multiple Regression Analysis: Asymptotics
The assumption of a normal (or gaussian) distribution MLR.6 is fre-
quently violated in empirical practice. How can we then proceed to
calculate test statistics or confidence intervals?
5.1 Large Sample Distribution of the Mean Estimator
• Example: Testing the mean of hourly wages: the empirical distri-
bution is steep at the left and skewed to the right (as is typical for
262
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
prices and wages which are not generated additively).
0
20
40
60
80
100
120
140
0 2 4 6 8 10 12 14 16 18 20 22 24
Series: WAGE
Sample 1 526
Observations 526
Mean 5.896103
Median 4.650000
Maximum 24.98000
Minimum 0.530000
Std. Dev. 3.693086
Skewness 2.007325
Kurtosis 7.970083
Jarque-Bera 894.6195
Probability 0.000000
• Examples of random variables with right-skewed distribution:
– A χ2(m) distributed random variable X is defined as the sum of
m squared i.i.d. standard normal random variables
X =
m∑
j=1
u2j, uj ∼ i.i.d.N (0, 1).
263
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
(Details on the χ2 distribution can be found in Appendix B in
Wooldridge (2009).)
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0 1 2 3 4 5 6 7 8
Chi-Quadrat(1)
Density of the χ2(1) distribution.
Moments of a χ2(1) distributed random variable:
E[X ] = E[u2]
= V ar(u) + E[u]2 = 1,
V ar(X) = E[X2]− E[X ]2 = E[u4] − 12 = 2,
u2 − 1√2
=X − 1√
2∼ (0, 1).
264
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
Note that for a standard normal random variable we have E[u4] =
3.
– Linear functions of a χ2(1) distributed random variable, e.g.
yi = ν + σyu2
i − 1√2
, ui ∼ i.i.d.N (0, 1). (5.1)
Moments:
E[yi] = ν,
V ar(yi) = V ar
(σy
u2i − 1√
2
)= σ2
yV ar
(u2
i√2
)= σ2
y.
265
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
• Expectation and variance of mean estimators
µn =1
n
n∑
i=1
yi.
E[µn] =1
n
n∑
i=1
E[yi] = ν,
V ar (µn) =1
n2
n∑
i=1
V ar(yi) =V ar(yi)
n=
σ2y
n,
sd (µn) =σy√n.
In this example the estimator is unbiased and the variance decreases
with rate n as sample size increases.
266
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
• Consistency of an estimator θn:
For every ǫ > 0 and δ > 0 there exists an N such that
P(|θn − θ| < ǫ
)> 1 − δ for all n > N.
Alternatively:
– limn→∞ P(|θn − θ| < ǫ
)= 1,
– plim θn = θ,
– θnp−→ θ.
The “plim” notation stands for probability limit. This concept
of convergence is usually denoted as convergence in probability or
(weak) consistency. Some notes on calculation rules for the “plim”
are given in Appendix C.3 in Wooldridge (2009).
267
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
A consistent estimator θn has the properties
– limn→∞ E[θn
]= θ and
– limn→∞ V ar(θn
)= 0.
If one of these conditions fails to hold, the estimator is called in-
consistent. In general:
• Weak law of large numbers (WLLN):
For yi ∼ i.i.d. with −∞ < E[yi] = µ < ∞, the mean estimator
µn = 1n
∑ni=1 yi is weakly consistent, that is
µnp−→ µ.
• Then we can consistently estimate the variance of i.i.d. random
variables wi ∼ i.i.d.(µw, σ2w) with σ2 = 1
n
∑ni=1(wi−µw)2. Why?
268
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
• But how can we derive the asymptotic probability distri-
bution of the mean estimator µn?
• Monte Carlo Simulation (MC):
The EViews-program mcarlo1 est mu.prg allows us to iteratively
draw R = 1000 samples of size n with elements y1, . . . , yn, where
yi ∼ i.i.d.(ν, σ2y) with ν = 3 and σ2
y = 1 and yi is generated from
(5.1). One frequently calls (5.1) the data generating process
(DGP). For every sample yr1, y
r2, . . . , y
rn generated in this way,
where r = 1, . . . , 1000, the mean estimator µr = 1n
∑ni=1 yr
i is
calculated and stored. After all iterations, a histogram is calculated
based on R estimates µ1, µ2, . . . , µR.
269
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
First, the results for the simulated moments:
Elements in sample average std. deviation true std. deviation
n of estimated means of MC DGP
10 2.993908 0.298423 0.3162278
30 3.003103 0.182773 0.1825742
50 3.001263 0.142403 0.1414214
100 3.001637 0.098750 0.1
500 2.999619 0.046355 0.04472136
1000 3.000503 0.031865 0.03162278
– The true moments are accurately estimated,
– and we can observe how the LLN works.
270
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
n = 10 n = 30
0
10
20
30
40
50
60
70
2.50 2.75 3.00 3.25 3.50 3.75 4.00
Series: MU_HAT
Sample 1 1000
Observations 1000
Mean 2.993908
Median 2.966680
Maximum 4.061612
Minimum 2.394145
Std. Dev. 0.298423
Skewness 0.621673
Kurtosis 3.250171
Jarque-Bera 67.02061
Probability 0.0000000
20
40
60
80
100
120
140
2.6 2.8 3.0 3.2 3.4 3.6 3.8
Series: MU_HAT
Sample 1 1000
Observations 1000
Mean 3.003103
Median 2.991643
Maximum 3.795529
Minimum 2.562306
Std. Dev. 0.182773
Skewness 0.572018
Kurtosis 3.450607
Jarque-Bera 62.99434
Probability 0.000000
n = 50 n = 100
0
40
80
120
160
2.6 2.8 3.0 3.2 3.4 3.6
Series: MU_HAT
Sample 1 1000
Observations 1000
Mean 3.001263
Median 2.992116
Maximum 3.650012
Minimum 2.595097
Std. Dev. 0.142403
Skewness 0.509651
Kurtosis 3.745789
Jarque-Bera 66.46577
Probability 0.0000000
20
40
60
80
100
120
2.750 2.875 3.000 3.125 3.250 3.375
Series: MU_HAT
Sample 1 1000
Observations 1000
Mean 3.001637
Median 2.994952
Maximum 3.434677
Minimum 2.721445
Std. Dev. 0.098750
Skewness 0.383232
Kurtosis 3.350859
Jarque-Bera 29.60699
Probability 0.000000
n = 500 n = 1000
0
20
40
60
80
100
2.90 2.95 3.00 3.05 3.10 3.15 3.20
Series: MU_HAT
Sample 1 1000
Observations 1000
Mean 2.999619
Median 2.998812
Maximum 3.201741
Minimum 2.861536
Std. Dev. 0.046355
Skewness 0.198532
Kurtosis 3.293122
Jarque-Bera 10.14917
Probability 0.0062540
20
40
60
80
100
120
140
2.90 2.95 3.00 3.05 3.10
Series: MU_HAT
Sample 1 1000
Observations 1000
Mean 3.000503
Median 2.999499
Maximum 3.108631
Minimum 2.905122
Std. Dev. 0.031865
Skewness 0.170954
Kurtosis 2.950809
Jarque-Bera 4.971676
Probability 0.083256
271
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
• Results for simulated distributions:
– Right-skewness decreases with increase in sample size n.
– A test for normality (Jarque-Bera-Test): null hypothesis of normal
distribution cannot be rejected for large n.
Theoretical explanation of this phenomenon: a cental limit
theorem holds under certain (rather weak) conditions that is one of
the most important tools in statistics!
• Central limit theorem (CLT):
For yi ∼ i.i.d.(µ, σ2) with 0 < σ2 < ∞, µn = 1n
∑ni=1 yi is
asymptotically normally distributed:√
n (µn − µ)d−→ N (0, σ2).
.
272
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
– Interpretation: the larger the number of sample elements n, the
more precise is the approximation of the exact distribution of µn
(see the MC results) by an exactly specified normal distribution.
Hence the label large sample distribution.
– But how good is the asymptotic approximation for a given sample
size n?
∗ The CLT is not informative on this question, though we may
get an answer by conducting MC simulations for certain cases
or by using rather involved finite sample statistics.
∗ Experience: as the distribution of the yi approaches the nor-
mal distribution, smaller and smaller n suffice for a very good
approximation. In some cases even n = 30 is enough.
Do some experiments using mcarlo1 est mu.prg, and grad-
273
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
ually increase the degrees of freedom r of the χ2 distribution
(and observe that the skewness decreases in this process)!
– Alternative notations (Φ(z) is the cumulative distribution func-tion of the standard normal distribution):
√n
(µn − µ
σ
)d−→ N(0, 1) (5.2)
P
(√n
(µn − µ
σ
)≤ z
)−→ Φ(z) (5.3)
µn − µ
σ/√
n
approx∼ N(0, 1) (5.4)
µnapprox∼ N
(µ,
σ2
n
)(5.5)
Notation: the mean estimator is asymptotically normally dis-
tributed.
274
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
• In large samples the standardized mean estimator is approximated
by a standard normal distribution. Then, due to (5.4)
wi ∼ i.i.d.N (µ, σ2) t(w1, . . . , wn) = µ−µσµ
∼ N (0, 1)
wi ∼ i.i.d.(µ, σ2) t(w1, . . . , wn) = µ−µσµ
approx∼ N (0, 1)
and it can be shown that
wi ∼ i.i.d.N (µ, σ2) t(w1, . . . , wn) = µ−µσµ
∼ tn−k−1
wi ∼ i.i.d.(µ, σ2) t(w1, . . . , wn) = µ−µσµ
approx∼ N (0, 1)
and we get the following (very convenient) result: the (small sam-
ple) theory of t tests and confidence intervals for the mean
estimator of i.i.d. variables holds approximately in large
(enough) samples.
275
Intensive Course in Econometrics — Section 5.1 — UR March 2009 — R. Tschernig
• Hence the test results in our empirical exercise are still approximately
valid!
• How about this concept of validity in a regression context?
276
Intensive Course in Econometrics — Section 5.2 — UR March 2009 — R. Tschernig
5.2 Large Sample Inference for the OLS Estimator
• The OLS-estimator
β = β +(X′X
)−1X′u = β + Wu
depends on X or W. Hence, for the OLS estimator to be consistent
and asymptotically normal, certain conditions must hold for the re-
gressor variables as n → ∞. One of these conditions is that for all
i, l = 0, 1, . . . , k we have plim 1n
∑ni=1 xijxil = E[xjxl] = aij or
1
nX′X
p−→ A. (5.6)
277
Intensive Course in Econometrics — Section 5.2 — UR March 2009 — R. Tschernig
• Asymptotic normality of the OLS estimator
All necessary conditions for asymptotic normality are fulfilled if the
standard assumptions MLR.1-MLR.5 hold true. Then (see a sketch
of proof in Appendix E.4 in Wooldridge (2009)):
√n(β − β
)d−→ N
(0, σ2A
). (5.7)
278
Intensive Course in Econometrics — Section 5.2 — UR March 2009 — R. Tschernig
For the (asymptotic) distributions of the t statistics we get:
MLR.1-MLR.6 t (X,y) =βj−βjσ
βj
∼ N (0, 1)
MLR.1-MLR.5 t (X,y) =βj−βjσ
βj
approx∼ N (0, 1)
and it can be shown that
MLR.1-MLR.6 t (X,y) =βj−βj
σ/(SSTj(1−R2j))
∼ tn−k−1
MLR.1-MLR.5 t (X,y) =βj−βj
σ/(SSTj(1−R2j))
approx∼ N (0, 1)
279
Intensive Course in Econometrics — Section 5.2 — UR March 2009 — R. Tschernig
A frequent observation from many Monte Carlo simulations and
empirical practice is that
– for small n one proceeds as in the case of normally distributed
errors and uses the critical values of the t distribution:
MLR.1-MLR.5 t (X,y) =βj−βj
σ/(SSTj(1−R2j))
approx∼ tn−k−1
– and analogously for the F statistic the critical values are deter-
mined from the F distribution.
– Note again: the critical values are valid only approximately, not
exactly. Analogously, the p-values (calculated in EViews) are valid
only approximately!
280
Intensive Course in Econometrics — Section 5.2 — UR March 2009 — R. Tschernig
• Conclusion:
– For the calculation of test statistics and confidence intervals (ex-
ception: forecast intervals) we proceed as hitherto. However, all
statistical results hold only as an approximation.
– If the assumption of homoskedasticity is violated, even the asymp-
totic results do not hold and models for heteroskedastic errors are
required (with stronger assumptions for LLN and CLT), see Chap-
ter 8.
Reading: Chapter 5 and Appendix C.3 in Wooldridge (2009).
281
6 Multiple Regression Analysis: Interpretation
6.1 Level and Log Models
Recall section 2.6 on level-level, level-log, log-level, log-log models. All
the results remain valid in the multiple regression model in a ceteris-
paribus analysis.
282
Intensive Course in Econometrics — Section 6.2 — UR March 2009 — R. Tschernig
6.2 Data Scaling
•• Scaling the dependent variable:
– Initial model:
y = Xβ + u.
– Variable transformation: y∗i = a · yi with scale factor a.
→ New, transformed regression equation:
ay︸︷︷︸y∗
= X aβ︸︷︷︸β∗
+ au︸︷︷︸u∗
y∗ = Xβ∗ + u∗ (6.1)
283
Intensive Course in Econometrics — Section 6.2 — UR March 2009 — R. Tschernig
– OLS-estimator for β∗ in (6.1):
β∗ =(X′X
)−1X′y∗
= a(X′X
)−1X′y = aβ.
– Error variance:
V ar(u∗) = V ar(au) = a2V ar(u) = a2σ2I.
– Variance-covariance matrix:
V ar(β∗) = σ∗2 (X′X
)−1= a2σ2 (X′X
)−1= a2V ar(β)
– t statistic:
t∗ =β∗
j − 0
σβ∗
j
=aβj
aσβj
= t.
284
Intensive Course in Econometrics — Section 6.2 — UR March 2009 — R. Tschernig
• Scaling explanatory variables:
– Variable transformation: X∗ = X · a. New regression equation:
y = Xa · a−1β + u = X∗β∗ + u. (6.2)
– OLS estimator for β∗ in (6.2):
β∗ =(X∗′X∗
)−1X∗′y =
(a2X′X
)−1X′ay
= a−2a(X′X
)−1X′y = a−1β.
– Result: The sole magnitude of βj is no indicator for the relevance
of the impact of the j-th regressor. One always has to take the
scale of the variable into account.
– Example: In Section 2.3 a simple level-level model was estimated
for imports on gdp. The parameter estimate βgdp = 2.16 · 10−05
appears very small. However, taking into account that gdp is
285
Intensive Course in Econometrics — Section 6.2 — UR March 2009 — R. Tschernig
measured in dollars, this estimate is not small. Simply rescale
gdp to millions of dollars with a = 10−6 and you obtain β∗gdp =
106 · 2.16 · 10−05 = 21.6.
• Scaling of variables in logarithmic form
just alters the constant β0 since ln y∗ = ln ay = ln a + ln y.
• Standardized Coefficients:
We just saw that it is not possible to deduce the relevance of ex-
planatory variables from the magnitude of the corresponding coef-
ficient. This is possible, however, if the regression is suitably stan-
dardized.
286
Intensive Course in Econometrics — Section 6.2 — UR March 2009 — R. Tschernig
Deviation: First, consider the following sample regression model
yi = β0 + xi1β1 + . . . + xikβk + ui, (6.3)
and its representation after taking means over all n observations
y = β0 + x1β1 + . . . + xkβk. (6.4)
Then we calculate the difference between (6.4) and (6.3)
(yi − y) = (xi1 − x1)β1 + . . . + (xik − xk)βk + ui. (6.5)
Finally, we divide equation (6.5) by the estimated standard deviation
of y, say σy, and expand every term on the right-hand side by
the estimated standard deviations of the corresponding explanatory
variables, say σxj , j = 1, . . . , k,
(yi − y)
σy=
(xi1 − x1)
σy· σx1
σx1
β1 + . . . +(xik − xk)
σy· σxk
σxk
βk +ui
σy.
287
Intensive Course in Econometrics — Section 6.2 — UR March 2009 — R. Tschernig
Simple algebra gives(yi − y)
σy︸ ︷︷ ︸zi,y
=(xi1 − x1)
σx1︸ ︷︷ ︸zi,x1
σx1
σyβ1
︸ ︷︷ ︸b1
+ . . . +(xik − xk)
σxk︸ ︷︷ ︸zi,xk
σxk
σyβk
︸ ︷︷ ︸bk
+ui
σy︸︷︷︸ξi
.
In the literature the transformed variables zi,y and zi,x1, . . . , zi,xk
are usually denoted as z-scores.
In compact notation we get
zi,y = zi,x1b1 + · · · + zi,xk
bk + ξi.
where bj are denoted as standardized coefficients (or simply
beta coefficients).
The magnitudes of the standardized coefficients can be compared
to each other. Hence, the explanatory variable with the largest
parameter βj has the relatively largest impact on the dependent
variable.
288
Intensive Course in Econometrics — Section 6.2 — UR March 2009 — R. Tschernig
Interpretation: a one standard deviation increase in xj changes y
by bj standard deviations.
Standardized coefficients can be calculated in SPSS (see Example
6.1 in Wooldridge (2009)).
289
Intensive Course in Econometrics — Section 6.3 — UR March 2009 — R. Tschernig
6.3 Dealing with Nonlinear or Transformed Regressors
• Further details on logarithmic variables:
Consider the following log-level regression model
ln y = β0 + β1x1 + β2x2 + u, (6.6)
where x2 is a dummy variable (it is either equal to 0 or 1).
– How can we determine the exact impact of x2, that is, how
should we interpret β2? From (6.6) follows
y = eln y = eβ0+β1x1+β2x2+u = eβ0+β1x1+β2x2 · eu
and for the conditional expectation
E[y|x1, x2] = eβ0+β1x1+β2x2 · E[eu|x1, x2]. (6.7)
290
Intensive Course in Econometrics — Section 6.3 — UR March 2009 — R. Tschernig
Inserting the two possible values of x2 into (6.7) delivers
E[y|x1, x2 = 0] = eβ0+β1x1 · E[eu|x1, x2]
E[y|x1, x2 = 1] = eβ0+β1x1 · E[eu|x1, x2] · eβ2
= E[y|x1, x2 = 0] · eβ2.
– Thus, if E[eu|x1, x2] is constant (for x2), the relative mean
change of the dependent variable with respect to a unit
change in x2 is equal to
∆E[y|x1, x2]
E[y|x1, x2 = 0]=
E[y|x1, x2 = 1] − E[y|x1, x2 = 0]
E[y|x1, x2 = 0]
=E[y|x1, x2 = 0] · eβ2 − E[y|x1, x2 = 0]
E[y|x1, x2 = 0]
= eβ2 − 1.
291
Intensive Course in Econometrics — Section 6.3 — UR March 2009 — R. Tschernig
This implies
%∆E[y|x1, x2] = 100(eβ2 − 1
).
– In the general case of k regressors:
%∆E[y|x1, x2, . . . , xk] = 100(eβj∆xj − 1
). (6.8)
Obviously (6.8) represents the exact partial effect, whereas
the interpretation as an approximate semi-elasticity may be rather
crude in some cases.
– Trade Example Continued (from Section 4.8 and specifically
from Section 4.4):
For Model 3 we obtained the sample regression
LOG(TRADE_0_D_O) = -9.5789 + 1.3566*LOG(WDI_GDPUSDCR_O)
- 1.1443*LOG(CEPII_DIST) + 3.1265*CEPII_COMCOL_REV + RESIDUAL
Recall that CEPII COMCOL REV denotes a dummy variable.
292
Intensive Course in Econometrics — Section 6.3 — UR March 2009 — R. Tschernig
∗ The approximate interpretation of βcomcol is that 1 unit change
changes imports by 100βcomcol = 313%.
∗ The exact partial effect is 100(eβcomcol − 1
)= 2179.5%, al-
most 7 times as large!
∗ Of course, the difference between the approximate and exact
effect are smaller if β is closer to zero.
293
Intensive Course in Econometrics — Section 6.3 — UR March 2009 — R. Tschernig
• Models with quadratic regressors:
– For example, consider the multiple regression
y = β0 + β1x1 + β2x2 + β3x22 + u.
The marginal effect of a change in x2 on the conditional expec-
tation of y is equal to
∂E[y|x1, x2]
∂x2= β2 + 2β3x2.
Therefore a change of ∆x2 in x2 changes ceteris paribus the
dependent variable y on average by
(β2 + 2β3x2)∆x2.
Clearly, this effect depends on the level of x2 (and an interpreta-
tion of β2 alone does not make any sense!).
294
Intensive Course in Econometrics — Section 6.3 — UR March 2009 — R. Tschernig
– In some empirical applications regressor variables are considered
using quadratics and logarithms, in order to approximate a non-
linear regression function.
Example: we can approximate non-constant elasticities using the
model
ln y = β0 + β1x1 + β2 ln x2 + β3(ln x2)2 + u.
Then the elasticity of y with respect to x2 equals
β2 + 2β3 ln x2
and is constant if and only if β3 = 0.
295
Intensive Course in Econometrics — Section 6.3 — UR March 2009 — R. Tschernig
– Trade Example Continued:
So far we only considered multiple regression models that are
log-log or log-level in the original variables.
Now consider a further specification for modeling imports where
a log regressors also enters as square.
Model 5:
ln(imports) = β0 + β1 ln(gdp) + β2 (ln(gdp))2 + β3 ln(distance)
+ β4 com colonizer + β5 com language + u.
Using the previous result, the elasticity of imports with respect
to gdp is
β1 + 2β2 ln(gdp). (6.9)
296
Intensive Course in Econometrics — Section 6.3 — UR March 2009 — R. Tschernig
Estimation of Model 5 delivers:====================================================================
Dependent Variable: LOG(TRADE_0_D_O)
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Coefficient Std. Error t-Statistic Prob.
====================================================================
C -85.82647 27.89947 -3.076276 0.0035
LOG(WDI_GDPUSDCR_O) 7.087930 2.186465 3.241731 0.0022
(LOG(WDI_GDPUSDCR_O))^2 -0.111982 0.042366 -2.643219 0.0112
LOG(CEPII_DIST) -0.761431 0.513375 -1.483187 0.1448
CEPII_COMCOL_REV 3.911289 0.673603 5.806523 0.0000
CEPII_COMLANG_OFF 2.312292 1.244898 1.857414 0.0697
====================================================================
R-squared 0.751895 Mean dependent var 15.97292
Adjusted R-squared 0.724927 S.D. dependent var 2.613094
S.E. of regression 1.370499 Akaike info criterion 3.576394
Sum squared resid 86.40031 Schwarz criterion 3.801537
Log likelihood -86.98624 Hannan-Quinn criter. 3.662709
F-statistic 27.88113 Durbin-Watson stat 2.094393
Prob(F-statistic) 0.000000
====================================================================
297
Intensive Course in Econometrics — Section 6.3 — UR March 2009 — R. Tschernig
Comparing the AIC, HQ, and SC of Model 5 with those of Models
1 to 4, see Section 4.4, one finds that Model 5 exhibits the lowest
values throughout. In addition, the (approximate) p-value of β2
is 0.011 and the quadratic term is statistically significant at the
5% significance level.
This provides also evidence for a nonlinear elasticity. Inserting
the parameter estimates into (6.9) delivers
η(gdp) = 7.088 − 0.224 ln(gdp).
One may plot the elasticity η(gdp) versus gdp for each observedvalue of gdp. In EViews this can be done by a little program
’ Model 5
equation model5.ls log(trade_0_d_o) c log(wdi_gdpusdcr_o)
(log(wdi_gdpusdcr_o))^2 log(cepii_dist) cepii_comcol_rev cepii_comlang_off
’ generate elasticities for gdp
genr elast_gdp = model5.@coefs(2) + 2*eq_ln_sq_ln_col.@coefs(3)*log(wdi_gdpusdcr_o)
group group_elast_gdp (wdi_gdpusdcr_o) elast_gdp ’ create group
group_elast_gdp.scat ’make scatter plot
298
Intensive Course in Econometrics — Section 6.3 — UR March 2009 — R. Tschernig
0.0
0.4
0.8
1.2
1.6
2.0
2.4
0.0E+00 4.0E+12 8.0E+12 1.2E+13
WDI_GDPUSDCR_O
EL
AS
T_
GD
P
The import elasticity with respect to gdp is much larger for small
economies in terms of gdp than for large economies.
Warning: Nonlinearities are sometimes due to missing variables.
Can you think of any control variables left out that should be
included in Model 5?
299
Intensive Course in Econometrics — Section 6.3 — UR March 2009 — R. Tschernig
• Interactions:
Example:
y = β0 + β1x1 + β2x2 + β3x2x1 + u.
The marginal effect of a change in x2 is given by
∆E[y|x1, x2] = (β2 + β3x1)∆x2.
Hence, in this case the marginal effect also depends on the level of
x1!
• Selection between non-nested models of the same dependent
variable:
Definition: Non-nested means, that one model cannot be repre-
sented as a special case of the other model.
Example: the choice between models y = β0 +β1x1+β2x21+u and
y = β0 + β1 ln x1 + u can be based on SC or AIC (or R2). Why?
300
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
6.4 Regressors with Qualitative Data
Dummy variables or binary variables
A binary variable can take exactly two different values and allows to
describe two qualitatively different states.
Examples: female vs. male, employed vs. unemployed, etc.
• In general these values are coded as 0 and 1. This allows for a very
easy and straightforward interpretation. Example:
y = β0 + β1x1 + β2x2 + · · · + βk−1xk−1 + δD + u,
where D equals 0 or 1.
301
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
• Interpretation (well known by now):
E[y|x1, . . . , xk−1, D = 1] − E[y|x1, . . . , xk−1, D = 0] =
β0 + β1x1 + β2x2 + · · · + βk−1xk−1 + δ
− (β0 + β1x1 + β2x2 + · · · + βk−1xk−1) = δ
The coefficient of a dummy variable is equal to an intercept shift of
size δ in the case D = 1. All slope parameters βi, i = 1, . . . , k− 1
remain unchanged.
• Wage Example Continued:
– Question of interest: Do females earn significantly less than males?
– Data: a sample of n = 526 U.S. workers obtained in 1976.
(Source: Examples 2.4, 7.1 in Wooldridge (2009)).
302
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
∗ wage in dollars per hour,
∗ educ: years of schooling of each worker,
∗ exper: years of professional experience,
∗ tenure: years of employment in current firm,
∗ female: dummy=1 if female, dumm=0 otherwise.
====================================================================
Dependent Variable: ln(WAGE), Method: Least Squares, Sample: 1 526
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C 0.416691 0.098928 4.212066 0.0000
FEMALE -0.296511 0.035805 -8.281169 0.0000
EDUC 0.080197 0.006757 11.86823 0.0000
EXPER 0.029432 0.004975 5.915866 0.0000
EXPER^2 -0.000583 0.000107 -5.430528 0.0000
TENURE 0.031714 0.006845 4.633035 0.0000
TENURE^2 -0.000585 0.000235 -2.493365 0.0130
====================================================================
R-squared 0.440769, Adjusted R-squared 0.434304, Akaike info crit. 1.017438
Mean dependent var 1.623268, S.D. dependent var 0.531538, Schwarz criterion 1.074200
S.E. of regression 0.399785, Sum squared resid 82.95065
F-statistic 68.17659, Prob(F-statistic) 0.000000
303
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
– Note: In order to be able to interpret the coefficients of dummy
variables one has to know the reference group. The reference
group is given by the group for which the dummy equals zero.
– Prediction: How much earns a woman with 12 years of school-
ing, 10 years of experience, and 1 year tenure? (Or course, you
can insert any other numbers here.)
E[ln(wage)|female = 1, educ = 12, exper = 10, tenure = 1]
= 0.4167 − 0.2965 · 1 + 0.0802 · 12 + 0.0294 · 10
− 0.0006 · (102) + 0.0317 · 1 − 0.0006 · (12)
= 1.35
Thus, the expected hourly wage is approximately exp(1.35) =
3.86 US dollar.
– We already know that in case of a log-level model the expected
304
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
value of y given the regressors x1, x2 is given by
E[y|x1, x2] = eβ0+β1x1+β2x2 · E[eu|x1, x2].
The true value of E[eu|x1, x2] depends on the probability distri-
bution of u.
It holds that: If u is normally distributed with variance σ2, then
E[eu|x1, x2] = eE[u|x1,x2]+σ2/2.
The precise prediction is therefore
E[y|x1, x2] = eβ0+β1x1+β2x2+E[u|x1,x2]+σ2/2.
305
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
The exact prediction of the desired hourly wage is
E[wage|female = 1, educ = 12, exper = 10, tenure = 1]
= exp(0.4167 − 0.2965 · 1 + 0.0802 · 12 + 0.02943 · 10
− 0.0006 · (102) + 0.0317 · 1 − 0.0006 · (12) + 0.39982/2)
= 4.18.
Thus, the precise value of the mean hourly wage for the specified
person is about 4.18$ and thus 30 Cent larger than the approxi-
mate value.
– The parameter δ corresponds to the difference between the log
income of female and male workers keeping everything else con-
stant (e.g. years of schooling, experience, etc.).
Question: How large is the exact wage difference?
Answer: 100(e−0.2965 − 1) = 34.51%.
306
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
Note that ceteris paribus analysis is much more informative than
the comparison of the unconditional means of male and female
wages. Assuming normal errors one has
E[wagef ] − E[wagem]
E[wagem]=
eE[ln(wagef)]+σ2
f/2 − eE[ln(wagem)]+σ2m/2
eE[ln(wagem)]+σ2m/2
.
Inserting estimates one obtains
e1.416+0.442/2 − e1.814+0.532/2
e1.814+0.532/2= −0.3570,
which, by the way, is very similar to inserting estimates for (E[wagef ]−E[wagem])/E[wagem] leading to -0.3538.
Females earn 36% less than males if one does not control for
other effects.
307
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
Several subgroups
• Example: A worker is female or male and married or unmarried
=⇒ 4 subgroups:
1. female and not married
2. female and married
3. male and not married
4. male and married
How to proceed:
– Choose one subgroup to be the reference group, for example:
female and not married
– Define dummy variables for the other subgroups. For example, in
EViews with the Command “generate series” (genr):
308
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
∗ FEMMARR = FEMALE*MARRIED
∗ MALESING = (1-FEMALE) * (1-MARRIED)
∗ MALEMARR = (1-FEMALE) * MARRIED
Dependent Variable: ln(WAGE), Method: Least Squares, Sample: 1 526
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C 0.211028 0.096644 2.183548 0.0294
FEMMARR -0.087917 0.052348 -1.679475 0.0937
MALESING 0.110350 0.055742 1.979658 0.0483
MALEMARR 0.323026 0.050114 6.445759 0.0000
EDUC 0.078910 0.006694 11.78733 0.0000
EXPER 0.026801 0.005243 5.111835 0.0000
EXPER^2 -0.000535 0.000110 -4.847105 0.0000
TENURE 0.029088 0.006762 4.301613 0.0000
TENURE^2 -0.000533 0.000231 -2.305552 0.0215
====================================================================
R-squared 0.460877, Adjusted R-squared 0.452535, Akaike info crit. 0.988423
Mean dependent var 1.623268, S.D. dependent var 0.531538, Schwarz criterion 1.061403
S.E. of regression 0.393290, Sum squared resid 79.96799
F-statistic 55.24559, Prob(F-statistic) 0.000000
309
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
Examples for Interpretation:
– Married women earn about 8.8% less than unmarried women.
However, this effect is only significant at the 10% significance
level (for a two-sided test).
– The wage difference between married men and women is about
32.3 − (−8.8) = 41.1%. A t test cannot be directly applied.
(Solution: Choose a new reference group with one of the two
subgroups as the reference group.)
Remarks:
– Using dummies for all subgroups is not recommended since then
differences with respect to the ref. group cannot be tested directly.
– If you use dummies for all subgroups you cannot include a con-
stant. Otherwise MLR.3 is violated. Why?
310
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
• Using ordinal information in regression
Example: Ranking of universities
The quality difference between ranks 1 and 2 and ranks 11 and 12,
respectively, may be dramatically different. Hence, ranks should not
be used as regressors. Instead, we have to assign a dummy variable
Dj for all but one (the “reference category”) of the universities,
inducing several new parameters which have to be estimated.
Note: Then, the coefficient of a dummy variable Dj denotes the
intercept shift between university j and the reference university.
Sometimes there are too many ranks and hence too many parame-
ters to be estimated. Then it proves useful to group the data, e.g.,
ranks 1-10, 11-20, etc.
311
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
Interactions and Dummy Variables
• Interactions between dummy variables:
– May be used to define sub-groups (e.g., married males).
– Note that a useful interpretation and comparison of sub-group
effects crucially depends on a correct setup of dummies. For
example, let us include the dummies male and married and
their interaction in a wage equation
y = β0 + δ1male + δ2married + δ3male · married + . . .
Then, a comparison between male-married and male-single is
given by
E[y|male = 1,married = 1] − E[y|male = 1,married = 0]
= β0 + δ1 + δ2 + δ3 + . . . − (β0 + δ1 + . . .) = δ2 + δ3
312
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
• Interactions between dummies and quantitative variables:
– Allows different slope parameters for different groups
y = β0 + β1D + β2x1 + β3(x1 · D) + u.
Note: here β1 denotes the difference between both groups only
for the case x1 = 0. If x1 6= 0, then this difference is equal to
E[y|D = 1, x1] − E[y|D = 0, x1]
= β0 + β1 · 1 + β2x1 + β3(x1 · 1) − (β0 + β2x1)
= β1 + β3x1
Even if β1 is negative, the total effect may be positive!
313
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
– Wage Example Continued:Dependent Variable: ln(WAGE), Method: Least Squares, Sample: 1 526
====================================================================
Variable Coefficient Std. Error t-Statistic Prob.
====================================================================
C 0.388806 0.118687 3.275892 0.0011
FEMALE -0.226789 0.167539 -1.353643 0.1764
EDUC 0.082369 0.008470 9.724919 0.0000
EXPER 0.029337 0.004984 5.885973 0.0000
EXPER^2 -0.000580 0.000108 -5.397767 0.0000
TENURE 0.031897 0.006864 4.646956 0.0000
TENURE^2 -0.000590 0.000235 -2.508901 0.0124
FEMALE*EDUC -0.005565 0.013062 -0.426013 0.6703
====================================================================
R-squared 0.440964, Adjusted R-squared 0.433410, Akaike info crit. 1.020890
Mean dependent var 1.623268, S.D. dependent var 0.531538, Schwarz criterion 1.085761
S.E. of regression 0.400100, Sum squared resid 82.92160
F-statistic 58.37084, Prob(F-statistic) 0.000000
Are returns to schooling sensitive to gender?
314
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
• Testing for differences between groups
– Can be done with F tests.
– Chow Test: Allows to test whether there is a difference between
groups in a sense that there may be group specific intercepts
and/or (at least one) slope parameter.
Illustration:
y = β0 + β1D + β2x1 + β3(x1 · D) + β4x2 + β5(x2 · D) + u.
(6.10)
Pair of hypotheses:
H0 :β1 = β3 = β5 = 0 vs.
H1 :β1 6= 0 and/or β3 6= 0 and/or β5 6= 0
315
Intensive Course in Econometrics — Section 6.4 — UR March 2009 — R. Tschernig
Application of F tests:
∗ Estimate the regression equation for each group l
y = β0l + β2lx1 + β4lx2 + u, l = 1, 2,
and calculate SSR1 and SSR2.
∗ Then estimate this regression for both groups together and
calculate SSR.
∗ Compute the F statistic
F =SSR − (SSR1 + SSR2)
SSR1 + SSR2
n − 2(k + 1)
(k + 1)
where the degrees of freedom for the F distribution are equal
to k + 1 and n − 2(k + 1).
Reading: Chapter 6 (without Section 6.4) and Chapter 7 (without
Sections 7.5 und 7.6) in Wooldridge (2009).
316
7 Multiple Regression Analysis: Prediction
7.1 Prediction and Prediction Error
• Consider the multiple regression model y = Xβ + u, i.e.
yi = β0 + β1xi1 + · · · + βkxik + ui, 1 ≤ i ≤ n.
• We search for a predictor y0 for y0 given x01, . . . , x0k.
• Define the prediction error
y0 − y0.
317
Intensive Course in Econometrics — Section 7.1 — UR March 2009 — R. Tschernig
• We assume that MLR.1 to MLR.5 hold for the prediction sample
(x0, y0). Then
y0 = β0 + β1x01 + · · · + βkx0k + u0 (7.1)
and
E[u0|x01, . . . , x0k] = 0,
so that
E[y0|x01, . . . , x0k] = β0 + β1x01 + · · · + βkx0k = x′0β,
where x′0 = (1, x01, . . . , x0k).
MLR.4 guarantees that for known parameters the predictions are un-
biased. Then, the prediction is, loosely speaking, correct on average
(if averaged over many samples).
It can be shown that the conditional expectation is optimal in the
sense of minimizing the mean squared prediction error.
318
Intensive Course in Econometrics — Section 7.1 — UR March 2009 — R. Tschernig
• In practice, the true regression coefficients βj, j = 0, . . . , k, are
unknown. Inserting the OLS estimators βj gives
y0 = E[y0|x01, . . . , x0k] = β0 + β1x01 + · · · + βkx0k.
Using compact notation the prediction rule is:
y0 = x′0β (7.2)
• This prediction rule only makes sense if (y0,x′0) belongs to the
population as well. Otherwise the population regression model is
not valid for (y0,x′0) and the prediction based on the estimated
version possibly strongly misleading.
319
Intensive Course in Econometrics — Section 7.1 — UR March 2009 — R. Tschernig
• General decomposition of the prediction error
u0 = y0 − y0 (7.3)
= (y0 − E[y0|x0])︸ ︷︷ ︸unavoidable error v0
+(E[y0|x0] − x′0β
)︸ ︷︷ ︸
possible specification error
+(x′0β − x′0β
)
︸ ︷︷ ︸estimation error
– If MLR.1 and MLR.4 are correct for the population and if the
prediction sample also belongs to the population, then the spec-
ification error is zero. Then v0 = u0 in (7.1).
– If the estimator is consistent, plim β = β, then the estimation
error becomes negligible in large samples.
320
Intensive Course in Econometrics — Section 7.1 — UR March 2009 — R. Tschernig
– Using the OLS estimator, the estimation error is
x′0β − x′0β = x′0(β − β)
= x′0β − x′0((X′X)−1X′y
)
= x′0β − x′0(β + (X′X)−1X′u
)
= −x′0(X′X)−1X′u. (7.4)
Thus, the estimation error only depends on the estimation sample.
– The OLS prediction error under MLR.1 to MLR.5 is given by
(using (7.3) and (7.4)):
u0 = u0 + x′0(β − β)
= u0 − x′0(X′X)−1X′u. (7.5)
321
Intensive Course in Econometrics — Section 7.1 — UR March 2009 — R. Tschernig
• Variance of the prediction error:
– Extension of Assumption MLR.2 (Random Sampling):
u0 and u are uncorrelated.
– Conditional variance of (7.5) given X and x0:
V ar(u0|X,x0) = V ar(u0|X,x0) + V ar(x′0(β − β)|X,x0
)
= σ2 + x′0V ar(β − β|X)x0
= σ2 + x′0σ2(X′X)−1x0
or
V ar(u0|X,x0) = σ2(1 + x′0(X
′X)−1x0
). (7.6)
– Relevant in practice: Estimated variance of the prediction
error
V ar(u0|X,x0) = σ2(1 + x′0(X
′X)−1x0
).
322
Intensive Course in Econometrics — Section 7.1 — UR March 2009 — R. Tschernig
• Prediction interval: A prediction interval is (given an a priori
chosen confidence probability 1 − α) for the multiple regression
model given by[y0 − tn−k−1
√V ar(u0|X,x0) , y0 + tn−k−1
√V ar(u0|X,x0)
].
Notes:
– Derivation and structure are analogous to the case of confidence
intervals for the parameter estimates.
– Prediction intervals are in contrast to confidence intervals even
in large samples only valid if the prediction errors are normally
distributed. This is because there is no averaging of the true
prediction error u0 as it occurs for β − β = Wu due to the
central limit theorem.
323
Intensive Course in Econometrics — Section 7.2 — UR March 2009 — R. Tschernig
7.2 Statistical Properties of Linear Predictions
Apparently the prediction rule is linear (in y) since
y0 = x′0β = x′0(X′X)−1X′y.
Gauss-Markov property of linear prediction
If β is the BLU estimator for β, then
y0 = x′0β
is the BLU prediction rule. Among all linear prediction rules with a
mean prediction error of zero it exhibits the smallest prediction error
variance.
Reading: Section 6.4 in Wooldridge (2009).
324
8 Multiple Regression Analysis: Heteroskedasticity
• In this chapter Assumptions MLR.1 through MLR.4 continue to
hold.
• If MLR.5 fails to hold such that
V ar(ui|xi1, . . . , xik) = σ2i 6= σ2, i = 1, . . . , n,
the errors of the regression model exhibit heteroscedasticity. More
precisely (instead of MLR.5) we have
325
Intensive Course in Econometrics — Chapter 8 — UR March 2009 — R. Tschernig
– Assumption GLS.5: Heteroskedasticity
V ar(ui|xi1, . . . , xik) = σ2i (xi1, . . . , xik)
= σ2h(xi1, . . . , xik) = σ2hi, i = 1, . . . , n.
The error variance of the i-th sample observation σ2i is a function
h(·) of the regressors.
• Examples:
– The variance of net rents depends on the size of the flat.
– The variance of consumption expenditures depends on the level
of income.
– The variance of log hourly wages depends on years of education.
326
Intensive Course in Econometrics — Chapter 8 — UR March 2009 — R. Tschernig
• The covariance matrix of the errors of the regression is given
by:
V ar(u|X) = E[uu′|X] =
σ2h1 0 · · · 0
0 σ2h2 · · · 0
... ... . . . ...
0 0 · · · σ2hn
= σ2
h1 0 · · · 0
0 h2 · · · 0
... ... . . . ...
0 0 · · · hn
︸ ︷︷ ︸Ψ
.
Thus, we have
y = Xβ + u, V ar(u|X) = σ2Ψ, (8.1)
which will be referred to as the original model in matrix nota-
tion.
327
Intensive Course in Econometrics — Section 8.1 — UR March 2009 — R. Tschernig
• When estimating models with heteroskedastic errors three cases
have to be distinguished:
1. Function h(·) is known, see Section 8.3.
2. Function h(·) is only partially known, see Section 8.4.
3. Function h(·) is completely unknown, see Section 8.2.
8.1 Consequences of Heteroskedasticity for OLS
• The OLS estimator is unbiased and consistent.
• Variance of the OLS estimator in the presence of heteroskedas-
tic errors (compare Section 3.4.2):
328
Intensive Course in Econometrics — Section 8.1 — UR March 2009 — R. Tschernig
From β − β = (X′X)−1X′u it can be derived that
V ar(β|X) = E
[(β − β
)(β − β
)′|X]
= E[(X′X)−1X′uu′X(X′X)−1|X
]
= (X′X)−1X′ E[uu′|X
]︸ ︷︷ ︸
σ2Ψ
X(X′X)−1
= (X′X)−1X′σ2ΨX(X′X)−1. (8.2)
• Note that with homoskedastic errors one has Ψ = I. Then (8.2)
yields the usual OLS covariance matrix, namely σ2(X′X)−1.
• If heteroskedasticity is present, using the usual covariance
matrix σ2(X′X)−1 is misleading and leads to faulty inference.
• The problem with using (8.2) directly is that Ψ is unknown. The
next section introduces an appropriate estimator.
329
Intensive Course in Econometrics — Section 8.2 — UR March 2009 — R. Tschernig
• Even if Ψ is known, OLS is not the best linear unbiased estimator,
and thus not efficient. One has to use the GLS estimator instead,
see Section 8.3.
8.2 Heteroskedasticity-Robust Inference after OLS
• Derivation of heteroskedasticity-robust standard errors
Let x′i = (1, xi1 . . . , xik). Note that the middle term in the variance-
covariance matrix (8.2) with dimension (k + 1) × (k + 1) can be
written as
X′σ2ΨX =
n∑
i=1
σ2hixix′i.
Because E[u2i |X] = σ2hi, one can estimate σ2hi by the “one ob-
330
Intensive Course in Econometrics — Section 8.2 — UR March 2009 — R. Tschernig
servation average” u2i . Of course this is not a good estimator but for
the present purpose it is doing well enough. Since ui is not known,
one takes the residual ui.
Hence one can estimate the covariance matrix (8.2) of the OLS
estimator in presence of heteroskedasticity by
V ar(β|X) = (X′X)−1
n∑
i=1
u2ixix
′i
(X′X)−1. (8.3)
• Comments:
– Standard errors obtained from (8.3) are called heteroskedasticity-
robust standard errors or also White standard errors named
after Halbert White, an econometrician at the University of Cal-
ifornia in San Diego.
331
Intensive Course in Econometrics — Section 8.2 — UR March 2009 — R. Tschernig
– For single βj heteroskedasticity-robust standard errors can be
smaller or larger than the usual OLS standard errors.
– If heteroskedasticity-robust standard errors are used, it can be
shown that the OLS estimator β has no longer a known finite
sample distribution. However, it is asymptotically normally
distributed. Thus, critical values and p-values remain approxi-
mately valid if (8.3) is used.
– The OLS estimator with White standard errors is unbiased and
consistent since MLR.1 to MLR.4 are unaffected by heteroskedas-
ticity.
– However, the OLS estimator is not efficient. Efficient estima-
tors will be presented in the next sections.
332
Intensive Course in Econometrics — Section 8.3 — UR March 2009 — R. Tschernig
8.3 The General Least Squares (GLS) Estimator
• Original model (8.1):
yi = β0 + β1xi1 + . . . + βkxik + ui, (8.4)
V ar(ui|xi1, . . . , xik) = σ2h(xi1, . . . , xik) = σ2hi.
• Basic idea: Weighted estimation of (8.4):
Transformation of the initial model to a model that satisfies all
assumptions, including MLR.5. This is achieved by kind of stan-
dardizing the regression error ui. This amounts to dividing ui and
thus the whole regression equation (8.4) by the square root of hi:
yi√hi︸︷︷︸
y∗i
= β01√hi︸︷︷︸
x∗i0
+β1xi1√hi︸︷︷︸
x∗i1
+ . . . + βkxik√hi︸︷︷︸
x∗ik
+ui√hi︸︷︷︸
u∗i
.
333
Intensive Course in Econometrics — Section 8.3 — UR March 2009 — R. Tschernig
The resulting model is
y∗i = β0x∗i0 + β1x
∗i1 + . . . + βkx
∗ik + u∗i . (8.5)
Note: For the transformed error u∗i we have
V ar(u∗i |xi1, . . . , xik) = V ar
(ui√hi
∣∣∣∣xi1, . . . , xik
)
= E
[u2
i
hi
∣∣∣∣∣xi1, . . . , xik
]
=1
hiE[u2
i |xi1, . . . , xik] =1
hiσ2hi = σ2.
Result: We have transformed the original regression (8.4) in such
a way that the homoskedasticity assumption MLR.5 holds for the
resulting regression model (8.5).
334
Intensive Course in Econometrics — Section 8.3 — UR March 2009 — R. Tschernig
• Therefore the OLS estimator based on the transformed model (8.5)
has all desirable properties: BLU (best linear unbiased).
• The OLS estimator of the transformed model (8.5) is based on the
minimization of a weighted sum of squared residualsn∑
i=1
(yi − β0 − β1xi1 − . . . − βkxik)2/hi.
Therefore, it is called a weighted least squares (WLS) procedure.
Note in its current form it requires that h(·) is known.
• The transformed model does not contain a constant term if√
hi is
not identical to one of the regressors in model (8.4).
• Next we derive the transformed model in matrix notation.
335
Intensive Course in Econometrics — Section 8.3 — UR March 2009 — R. Tschernig
• Explicit statement of y∗, X∗, and u∗ in matrix notation:
y∗1y∗2...
y∗n
︸ ︷︷ ︸y∗
=
h−1/21 0 · · · 0
0 h−1/22 · · · 0
... ... . . . ...
0 0 · · · h−1/2n
︸ ︷︷ ︸P
y1
y2
...
yn
︸ ︷︷ ︸y
x∗10 x∗11 · · · x∗1kx∗20 x∗21 · · · x∗2k... ... ...
x∗n0 x∗n1 · · · x∗nk
︸ ︷︷ ︸X∗
= P ·
1 x11 · · · x1k
1 x21 · · · x2k... ... ...
1 xn1 · · · xnk
︸ ︷︷ ︸X
,
u∗1u∗2...
u∗n
︸ ︷︷ ︸u∗
= P ·
u1
u2
...
un
︸ ︷︷ ︸u
336
Intensive Course in Econometrics — Section 8.3 — UR March 2009 — R. Tschernig
• For the transformation matrix P it holds that
P′P = Ψ−1
and hence
E[uu′|X] = σ2Ψ = σ2(P′P)−1.
• Therefore, the transformed model (8.5) in matrix notation is
given by
Py = PXβ + Pu,
or
y∗ = X∗β + u∗, E[u∗(u∗)′|X∗] = σ2I. (8.6)
• Obviously (8.6) is obtained by multiplying the original model (8.1)
y = Xβ + u by the transformation matrix P from the left.
• What is the explicit formula for the OLS estimator in terms of the
transformed model (8.6) and the original model (8.1)?
337
Intensive Course in Econometrics — Section 8.3 — UR March 2009 — R. Tschernig
GLS (generalized least squares) estimator
• OLS estimation of (8.6) yields
βGLS =(X∗′X∗)−1
X∗′y∗
=((PX)′PX
)−1(PX)′Py
=(X′P′PX
)−1X′P′Py
and therefore
βGLS =(X′Ψ−1X
)−1X′Ψ−1y. (8.7)
βGLS in (8.7) is called the GLS estimator.
In case of heteroskedasticity Ψ is a diagonal matrix and each of the
n observations is weighted by 1/√
hi.
338
Intensive Course in Econometrics — Section 8.3 — UR March 2009 — R. Tschernig
• Properties for known h(·):Under MLR.1 to MLR.4 and GLS.5 the GLS-estimator βGLS
– is unbiased and consistent,
– is BLUE (best linear unbiased), and thus efficient,
– has variance-covariance matrix V ar(βGLS|X) = σ2(X′Ψ−1X
)−1,
– is unbiased and consistent even if Ψ is misspecified since Ψ is a
function of X and not of u and thus
E[βGLS − β|X] =(X′Ψ−1X
)−1X′Ψ−1E[u|X] = 0.
As a consequence, OLS is inefficient since OLS and GLS are both
linear estimators. OLS variances are larger than or equal to those
of the GLS estimator. This can be shown using matrix algebra.
• Analogously to MLR.6 in Section 4.2 above, we assume
339
Intensive Course in Econometrics — Section 8.3 — UR March 2009 — R. Tschernig
– Assumption GLS.6: Normal Distribution
ui|xi ∼ N (0, σ2hi), i = 1, . . . , n,
which, together with MLR.2 (Random Sampling) implies the
multivariate normal distribution
u|X ∼ N(0, σ2Ψ
).
Note GLS.6 implies that ui given xi is independently but not iden-
tically distributed since the variance changes with i.
All test statistics based on the transformed model (8.6) and ap-
propriately modified for the original model (8.1) exhibit the exact
distributions of Chapter 4 (normal, t, F ).
• Frequent problem in practice: hi is not known. In this case,
the feasible GLS estimator has to be used −→ Case 2.
340
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
8.4 Feasible Generalized Least Squares (FGLS)
• In general, the variance function hi is not known and has to be
estimated. Frequently neither the relevant factors nor the functional
relationship are known.
• Hence, one needs a specification that flexibly captures a large range
of possibilities, e.g.
hi = h(xi1, . . . , xik) = exp (δ1xi1 + . . . + δkxik)
and thus
V ar(ui|xi1, xi2, . . . , xik) = σ2hi = σ2 exp (δ1xi1 + . . . + δkxik) .
Remark: On pp. 282, Wooldridge (2009) considers in hi additionally
the factor exp δ0. As this factor is constant, it can also be captured
by σ2.
341
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
• How can one estimate the unknown parameters δ1, . . . , δk?
Standardizing ui delivers vi = ui/(σ√
hi) with E[vi|X] = 0 and
V ar(vi|X) = 1. Therefore ui = σ√
hivi and
u2i = σ2hiv
2i , i = 1, . . . , n.
Taking logarithms leads to
ln u2i = ln σ2 + ln hi + ln v2
i
= ln σ2 + ln exp (δ1xi1 + · · · + δkxik) + ln v2i
= ln σ2 + E[ln v2i ]︸ ︷︷ ︸
α0
+δ1xi1 + · · · + δkxik + ln v2i − E[ln v2
i ]︸ ︷︷ ︸ei
ln u2i = α0 + δ1xi1 + · · · + δkxik + ei. (8.8)
For the regression equation (8.8) the assumptions MLR.1-MLR.4
are satisfied. Hence, the OLS estimator for δj is unbiased and
consistent.
342
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
In practice, the u2i ’s in the variance regression (8.8) are replaced
by the squared OLS residuals u2i ’s from the sample regression y =
Xβ+ u of (8.1). The resulting δj’s are used to get the fitted values
hi’s which are inserted into the GLS estimator (8.7) in step II.
• Outline of the FGLS-method:
Step I
a) Regress y on X and compute the residual vector u by OLS
estimation of the original specification (8.1).
b) Calculate ln u2i , i = 1, . . . , n, that are used as regressand in
the variance regression (8.8).
c) Estimate the variance regression (8.8) by OLS.
d) Compute hi = exp(δ1xi1 + · · · + δkxik
), i = 1, . . . , n.
343
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
Step II
The FGLS estimator βFGLS is obtained analogously to the
GLS procedure. The original regression (8.1) is multiplied from
the left with the matrix
P =
h−1/21 · · · 0
... . . . ...
0 · · · h−1/2n
.
This delivers a variant of the transformed regression
y# = X#β + u#. (8.9)
Hence, OLS estimation of (8.9) leads to the FGLS estimator
βFGLS =(X′Ψ
−1X)−1
X′Ψ−1
y, (8.10)
with Ψ−1
= P′P.
344
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
• Estimation properties of the FGLS estimators:
– They are consistent, that is, they converge in probability to the
true parameters for n → ∞plim βFGLS = β.
– The FGLS estimator is asymptotically efficient: For a cor-
rectly specified hi and a sufficiently large sample, the FGLS esti-
mator is preferable to the OLS estimator as the former one has a
lower estimation-variance. (This is plausible, as FGLS also uses
information on the functional form of the heteroskedasticity while
OLS with heteroskedasticity-robust standard errors does not.)
– If the variance function hi is misspecified, then the FGLS esti-
mator is inefficient.
– Be aware that there may be considerable differences between the
345
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
FGLS estimates and the OLS estimates.
• Comparing OLS with heteroskedasticity-robust standard
errors and FGLS
– If you know something about the variance function hi, then
FGLS is preferable. If you have no idea about it, then OLS with
heteroskedasticity-robust standard errors may be better.
– It is always a good idea to run an OLS regression also with
heteroskedasticity-robust standard errors in order to see
whether the significance of parameters depends on the presence
of heteroskedasticity.
– Since any estimator taking into account heteroskedasticity should
be avoided if there is no heteroskedasticity, one should test for
the presence of heteroskedasticity, see Section 9.2.
346
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
• Trade Example Continued
– Consider Model 5 of Section 6.3 and compare OLS estimates,
FGLS estimates, and OLS estimates with heteroskedasticity-robust
standard errors.
– EViews program to run OLS, FGLS with both steps, and OLS
with White standard errors, and scatter plots of residuals against
fitted values for both estimators.
’EViews program for FGLS estimation, Chapter 8 Heteroskedasticity
’RT, 2009_02_26
’requires workfile IMPORTS_KAZAKHSTAN_2004
’ define variables
’ define log of dependent variable
genr log_imp = log(trade_0_d_o)
’ define group of regressors (right hand side variables)
’ Model 5
group rhs_model5_base c log(wdi_gdpusdcr_o) (log(wdi_gdpusdcr_o))^2 log( cepii_dist)
cepii_comcol_rev cepii_comlang_off
347
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
’ potential regressors to add
group rhs_model5_add log(cepii_area_o) log(weo_pop_o) cepii_comlang_ethno_rev
cepii_col45_rev cepii_contig
’ rhs_model5_add (can be added)
group rhs_model5 rhs_model5_base
’ Step I: OLS regression
’ ols regression
equation eq_ols_model5.ls log_imp rhs_model5
’ compute residuals
eq_ols_model5.makeresids res_ols_model5
’ compute fitted values
eq_ols_model5.fit fit_ols_model5
’ plot residuals versus fitted log imports in order to check for heteroskedasticity
group group_ols_res fit_ols_model5 res_ols_model5
group_ols_res.scat
’Step II: FGLS regression
’ square residuals and take logs
genr ln_u_hat_sq = log(res_ols_model5^2)
’ estimate variance equation
348
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
equation eq_h_model5.ls ln_u_hat_sq rhs_model5
’ predicted squared residuals
eq_h_model5.fit ln_u_hat_sq_hat
’ compute exponential of fitted values of variance regression
genr h_hat = exp(ln_u_hat_sq_hat)
’ estimate FGLS using w=h_hat^(-1/2)
equation eq_fgls_model5.ls(w=h_hat^(-1/2)) log_imp rhs_model5
’ compute fitted values based on FGLS
eq_fgls_model5.fit fit_fgls_model5
’ compute residuals based on FGLS
eq_fgls_model5.makeresids res_fgls_model5
’ standardize residuals with weight function
series res_fgls_model5_star = res_fgls_model5*h_hat^(-1/2)
’ plot residuals versus fitted
group group_fgls_res fit_fgls_model5 res_fgls_model5_star
group_fgls_res.scat
’ OLS regression with heteroskedasticity-robust standard errors
equation eq_white_model5.ls(h) log_imp rhs_model5
349
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
– OLS output with usual standard errors
====================================================================
Dependent Variable: LOG_IMP
Method: Least Squares
Sample: 1 55; Included observations: 52
====================================================================
Coefficient Std. Error t-Statistic Prob.
====================================================================
C -85.82647 27.89947 -3.076276 0.0035
LOG(WDI_GDPUSDCR_O) 7.087930 2.186465 3.241731 0.0022
(LOG(WDI_GDPUSDCR_O))^2 -0.111982 0.042366 -2.643219 0.0112
LOG(CEPII_DIST) -0.761431 0.513375 -1.483187 0.1448
CEPII_COMCOL_REV 3.911289 0.673603 5.806523 0.0000
CEPII_COMLANG_OFF 2.312292 1.244898 1.857414 0.0697
====================================================================
R-squared 0.751895 Mean dependent var 15.97292
Adjusted R-squared 0.724927 S.D. dependent var 2.613094
S.E. of regression 1.370499 Akaike info criterion 3.576394
Sum squared resid 86.40031 Schwarz criterion 3.801537
Log likelihood -86.98624 Hannan-Quinn criter. 3.662709
F-statistic 27.88113 Durbin-Watson stat 2.094393
Prob(F-statistic) 0.000000
====================================================================
350
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
– FGLS - Step I: estimate variance regression (8.8)
====================================================================
Dependent Variable: LN_U_HAT_SQ
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Coefficient Std. Error t-Statistic Prob.
====================================================================
C 10.63404 44.84623 0.237122 0.8136
LOG(WDI_GDPUSDCR_O) -0.848029 3.514572 -0.241290 0.8104
(LOG(WDI_GDPUSDCR_O))^2 0.004631 0.068099 0.068007 0.9461
LOG(CEPII_DIST) 0.863174 0.825210 1.046006 0.3010
CEPII_COMCOL_REV -0.946104 1.082764 -0.873786 0.3868
CEPII_COMLANG_OFF 1.187134 2.001077 0.593248 0.5559
====================================================================
R-squared 0.177909 Mean dependent var -0.847962
Adjusted R-squared 0.088551 S.D. dependent var 2.307505
S.E. of regression 2.202971 Akaike info criterion 4.525657
Sum squared resid 223.2417 Schwarz criterion 4.750801
Log likelihood -111.6671 Hannan-Quinn criter. 4.611972
F-statistic 1.990976 Durbin-Watson stat 1.711027
Prob(F-statistic) 0.097787
====================================================================
351
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
– Estimate FGLS - Step II: estimate (8.10)====================================================================
Dependent Variable: LOG_IMP
Method: Least Squares
Sample: 1 55, Included observations: 52
Weighting series: H_HAT^(-1/2)
====================================================================
Coefficient Std. Error t-Statistic Prob.
====================================================================
C -88.19314 22.59938 -3.902459 0.0003
LOG(WDI_GDPUSDCR_O) 7.425860 1.675365 4.432383 0.0001
(LOG(WDI_GDPUSDCR_O))^2 -0.117133 0.030951 -3.784479 0.0004
LOG(CEPII_DIST) -1.107230 0.347434 -3.186879 0.0026
CEPII_COMCOL_REV 3.704807 0.678132 5.463250 0.0000
CEPII_COMLANG_OFF 2.012321 0.989031 2.034640 0.0477
====================================================================
Weighted Statistics
====================================================================
R-squared 0.780089 Mean dependent var 16.88216
Adjusted R-squared 0.756186 S.D. dependent var 10.32249
S.E. of regression 1.034215 Akaike info criterion 3.013329
Sum squared resid 49.20159 Schwarz criterion 3.238472
Log likelihood -72.34654 Hannan-Quinn criter. 3.099643
352
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
F-statistic 32.63518 Durbin-Watson stat 2.577604
Prob(F-statistic) 0.000000
====================================================================
Unweighted Statistics
====================================================================
R-squared 0.746834 Mean dependent var 15.97292
Adjusted R-squared 0.719316 S.D. dependent var 2.613094
S.E. of regression 1.384409 Sum squared resid 88.16300
Durbin-Watson stat 2.055925
====================================================================
353
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
– OLS with heteroskedasticity-robust standard errors====================================================================
Dependent Variable: LOG_IMP
Method: Least Squares
Sample: 1 55, Included observations: 52
White Heteroskedasticity-Consistent Standard Errors & Covariance
====================================================================
Coefficient Std. Error t-Statistic Prob.
====================================================================
C -85.82647 28.84426 -2.975513 0.0046
LOG(WDI_GDPUSDCR_O) 7.087930 2.230037 3.178391 0.0026
(LOG(WDI_GDPUSDCR_O))^2 -0.111982 0.041483 -2.699469 0.0097
LOG(CEPII_DIST) -0.761431 0.456362 -1.668480 0.1020
CEPII_COMCOL_REV 3.911289 0.777904 5.027986 0.0000
CEPII_COMLANG_OFF 2.312292 0.789404 2.929161 0.0053
====================================================================
R-squared 0.751895 Mean dependent var 15.97292
Adjusted R-squared 0.724927 S.D. dependent var 2.613094
S.E. of regression 1.370499 Akaike info criterion 3.576394
Sum squared resid 86.40031 Schwarz criterion 3.801537
Log likelihood -86.98624 Hannan-Quinn criter. 3.662709
F-statistic 27.88113 Durbin-Watson stat 2.094393
Prob(F-statistic) 0.000000
====================================================================
354
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
– Diagnostic plots: (standardized) residuals against fitted values
-4
-3
-2
-1
0
1
2
3
10 12 14 16 18 20 22
FIT_OLS_MODEL5
RE
S_
OL
S_
MO
DE
L5
-4
-3
-2
-1
0
1
2
3
4
8 10 12 14 16 18 20 22
FIT_FGLS_MODEL5
RE
S_
FG
LS
_M
OD
EL
5_
ST
AR
355
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
– Output table for Model 4 and Model 5 using various
estimators (compare Section 4.8):
356
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
Dependent Variable: ln(imports to Kazakhstan)
Independent Variables/Model (4)-OLS (5)-OLS (5)-FGLS
constant -13.027 -85.827 -88.193
(4.726) (27.900) (22.599)
[4.778] [28.844]
ln(gdp) 1.318 7.088 7.426
(0.129) (2.187) (1.675)
[0.158] [2.230]
(ln(gdp))2 — -0.112 -0.117
(0.042) (0.031)
[0.042]
ln(distance) -0.625 -0.761 -1.107
(0.542) (0.513) (0.347)
[0.438] [0.456]
common ”colonizer” since 1945 3.351 3.911 3.705
(0.679) (0.674) (0.678)
[0.782] [0.778]
common official language 2.110 2.312 2.012
(1.319) (1.245) (0.989)
[1.020] [0.789]
Number of observations 52 52 52
R2 0.714 0.752 0.747
Standard error of regression 1.455 1.371 1.384
Sum of squared residuals 99.523 86.40 88.16
AIC 3.6793 3.576
HQ 3.7513 3.663
SC 3.8670 3.802
Notes: OLS or FGLS standard errors in paren-
theses, White standard errors in brackets
357
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
– Results and Interpretation:
∗ OLS and FGLS parameter estimates for ln(distance) and com.
official language are quite different: the FGLS estimates im-
ply less impact of language, more impact of distance. In addi-
tion, log(distance) is statistically significant in the FGLS esti-
mate while insignificant in the OLS estimate with heteroskedas-
ticity-robust standard errors. This makes the FGLS estimates
more plausible.
∗ When taking into account heteroskedasticity, based on FGLS
there is no insignificant parameter at the 5% significance level
and based on heteroskedasticity-robust OLS standard errors
only log(distance) is insignificant.
∗ Comparing the usual OLS standard errors and White standard
errors shows that the multicollinearity problem for the param-
358
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
eter estimates of distance and language decreases.
∗ Inspecting the scatter plots of OLS and standardized FGLS
residuals against fitted values does not automatically suggest
heteroskedasticity. Thus heteroskedasticity tests are useful, see
Section 9.2.
• Cigarette Example (Wooldridge 2009, Example 8.7) (with EViews-
outputs/commands):
Step I
1. OLS-estimationDependent Variable: CIGS, Method: Least Squares, Sample: 1 807
Variable Coefficient Std. Error t-Statistic Prob.
C -3.639855 24.07866 -0.151165 0.8799
LINCOME 0.880268 0.727783 1.209520 0.2268
LCIGPRIC -0.750855 5.773343 -0.130056 0.8966
EDUC -0.501498 0.167077 -3.001596 0.0028
AGE 0.770694 0.160122 4.813155 0.0000
AGE^2 -0.009023 0.001743 -5.176494 0.0000
359
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
RESTAURN -2.825085 1.111794 -2.541016 0.0112
R-squared 0.052737, Adjusted R-squared 0.045632, Akaike info crit. 8.037737
Mean dependent var 8.686493, S.D. dependent var 13.72152, Schwarz criterion 8.078448
S.E. of regression 13.40479, Sum squared resid 143750.7, Log likelihood -3236.227
F-statistic 7.423062, Prob(F-statistic) 0.000000, Durbin-Watson stat 2.012825
2. Save the residuals using genr u hat = resid
3. Taking the logarithm of the squared residuals by using
genr ln u sq = log(u hat^ 2)
360
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
4. Estimation of variance regression (8.8) with OLS yieldsDependent Variable: LN_U_SQ, Method: Least Squares, Sample: 1 807
Variable Coefficient Std. Error t-Statistic Prob.
C -1.920704 2.563034 -0.749387 0.4538
LINCOME 0.291541 0.077468 3.763351 0.0002
LCIGPRIC 0.195421 0.614539 0.317996 0.7506
EDUC -0.079704 0.017784 -4.481656 0.0000
AGE 0.204005 0.017044 11.96927 0.0000
AGE^2 -0.002392 0.000186 -12.89313 0.0000
RESTAURN -0.627012 0.118344 -5.298212 0.0000
R-squared 0.247362, Adjusted R-squared 0.241717, Akaike info crit. 3.557469
Mean dependent var 4.207485, S.D. dependent var 1.638575, Schwarz criterion 3.598179
S.E. of regression 1.426862, Sum squared resid 1628.749, Log likelihood -1428.439
F-statistic 43.82126, Prob(F-statistic) 0.000000, Durbin-Watson stat 2.024587
– Saving the hi, i = 1, . . . , n, using
genr h hat = exp(ln u sq - resid).
361
Intensive Course in Econometrics — Section 8.4 — UR March 2009 — R. Tschernig
Step IIWeighted LS estimate (see Options) with weights h hat^ (-1/2)Dependent Variable: CIGS, Method: Least Squares, Sample: 1 807, Weighting series: WEIGHTS
Variable Coefficient Std. Error t-Statistic Prob.
C 5.635433 17.80314 0.316541 0.7517
LINCOME 1.295240 0.437012 2.963855 0.0031
LCIGPRIC -2.940305 4.460145 -0.659240 0.5099
EDUC -0.463446 0.120159 -3.856953 0.0001
AGE 0.481948 0.096808 4.978378 0.0000
AGE^2 -0.005627 0.000939 -5.989707 0.0000
RESTAURN -3.461064 0.795505 -4.350776 0.0000
Weighted Statistics
R-squared 0.002751, Adjusted R-squared -0.004728, Akaike info crit. 7.765025
Mean dependent var 7.158227, S.D. dependent var 11.66855, Schwarz criterion 7.805736
S.E. of regression 11.69611, Sum squared resid 109439.1, Log likelihood -3126.188
F-statistic 17.05549, Prob(F-statistic) 0.000000, Durbin-Watson stat 2.049719
Unweighted Statistics
R-squared 0.045739, Adjusted R-squared 0.038582, S.E. of regression 13.45421
Mean dependent var 8.686493, S.D. dependent var 13.72152, Sum squared resid 144812.7
(Remark: The Unweighted Statistics are based on the resid-
uals y − XβGLS; see EViews-helpfile.)
– Compare with the OLS estimator based on White standard errors.
362
9 Multiple Regression Analysis: Model Diagnostics
9.1 The RESET Test
RESET Test (regression specification error test)
Idea and implementation:
• If the original model
y = x0β0 + . . . + xkβk + u = x′β + u
363
Intensive Course in Econometrics — Section 9.1 — UR March 2009 — R. Tschernig
satisfies assumption MLR.4 E[u|x0, . . . , xk] = 0, it holds that
E[y|x0, . . . , xk] = x0β0 + . . . + xkβk + u = x′β.
• Then, any further term added to the model should not be significant.
Thus, any nonlinear function of the independent variables should be
insignificant.
• Thus, the null hypothesis of the RESET test is formulated such
that one can test the significance of nonlinear functions of the fit-
ted values y = x′β that are added to the model. Note that the
fitted values are a linear function of the regressors of the original
specification.
364
Intensive Course in Econometrics — Section 9.1 — UR March 2009 — R. Tschernig
• In practice it turned out that for implementing the RESET test it is
sufficient to include quadratic and cubic terms of y only
y = x′β + αy2 + γy3 + ε.
The pair of hypotheses is
H0 : α = 0, γ = 0 (linear model is correctly specified)
H1 : α 6= 0 and/or γ 6= 0.
The null hypothesis is tested using an F test with 2 degrees of
freedom in the numerator and n − k − 3 in the denominator.
• Be aware that the null hypothesis may also be rejected because of
omitting relevant regressor variables.
365
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
9.2 Heteroskedasticity Tests
• As already noted, it does not make sense to “automatically” use the
FGLS estimator. If the errors are homoskedastic, the OLS estimator
with OLS standard errors should be used.
• Thus, one should test if there is statistical evidence for heteroskedas-
ticity.
• In the following, two different test for heteroskedasticity are dis-
cussed: the Breusch-Pagan test and the White test. For both, the
null hypothesis is “homoskedastic errors”.
• Both tests are implemented in EViews 6.0, however, for the White
test without cross terms the level variables are included only in earlier
EViews versions.
366
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
It is assumed that for the multiple linear regression
y = β0 + x1β1 + . . . + xkβk + u
assumptions MLR.1 to MLR.4 hold.
The pair of hypotheses that has to be tested is
H0 : V ar(ui|xi) = σ2 (homoskedasticity),
H1 : V ar(ui|xi) = σi 6= σ2 (heteroskedasticity).
The general idea underlying heteroskedasticity tests is that under the
null hypothesis no regressor should have any explanatory power for
V ar(ui|xi). If the null hypothesis is not true, V ar(ui|xi) can be a
(nearly arbitrary) function of the regressors xj, (1 ≤ j ≤ k).
Note: The Breusch-Pagan test and the White test differ with respect
to the specification of their alternative hypothesis.
367
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
Breusch-Pagan Test
• Idea: Consider the regression
u2i = δ0 + δ1xi1 + · · · + δkxik + vi, i = 1, . . . , n. (9.1)
Under assumptions MLR.1 to MLR.4 the OLS estimator for the δj’s
is unbiased.
The pair of hypotheses is:
H0 : δ1 = δ2 = · · · = δk = 0 versus
H1 : δ1 6= 0 and/or δ2 6= 0 and/or . . .,
since under H0 it holds that E[u2i |X] = δ0.
368
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
• Difference from the previous application of the F test:
– The squares of the errors u2i are by no means normally distributed
since they are squared quantities and thus cannot take negative
values. Hence, the vi cannot be normally distributed and the
F distribution of the F statistic does not hold exactly in finite
samples. However, the central limit theorem (CLT) works here as
well, see Section 5.2, and the F statistic follows approximately
an F distribution in large samples.
– The errors ui are unknown.They can be replaced by the OLS
residuals ui. In doing so, the F test remains asymptotically valid
(proof is formally sophisticated).
369
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
• The R2 version of the test statistic can be used. Note that for a
regression including only a constant, it holds that R2 = 0 since
SSR = SST (there are no regressors that show a variation). Call
the coefficient of variation of the OLS estimation of (9.1) R2u2 then
F =R2
u2/k
(1 − R2u2)/(n − k − 1)
.
The F statistic for testing the joint significance of all regressors is
generally given by the appropriate software.
• H0 is rejected if F exceeds the critical value for a chosen significance
level (or equivalently if the p-value is smaller than the significance
level).
370
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
• Cigarette Example Continued: (from Section 8.4):Dependent Variable: U_HAT_SQ, Method: Least Squares, Sample: 1 807
Variable Coefficient Std. Error t-Statistic Prob.
C -636.3031 652.4946 -0.975185 0.3298
LINCOME 24.63849 19.72180 1.249302 0.2119
LCIGPRIC 60.97656 156.4487 0.389754 0.6968
EDUC -2.384226 4.527535 -0.526606 0.5986
AGE 19.41748 4.339068 4.475034 0.0000
AGE^2 -0.214790 0.047234 -4.547398 0.0000
RESTAURN -71.18138 30.12789 -2.362641 0.0184
R-squared 0.039973, Adjusted R-squared 0.032773, Akaike info crit. 14.63669
Mean dependent var 178.1297, S.D. dependent var 369.3519, Schwarz criterion 14.67740
S.E. of regression 363.2491, Sum squared resid 1.06E+08, Log likelihood -5898.905
F-statistic 5.551687, Prob(F-statistic) 0.000012, Durbin-Watson stat 1.937302
The F statistic for the above H0 is 5.55 and the corresponding p-
value is smaller than 1%. The null hypothesis of homoskedastic
errors thus is rejected at a level of 1%.
371
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
• Note:
– If one conjectures that the heteroskedasticity is caused by specific
variables that have not been included previously, they can be
included in regression (9.1).
– If H0 is not rejected, this does not mean automatically that the
ui’s are homoskedastic. If the specification (9.1) does not con-
tain all relevant variables causing heteroskedasticity, then it may
happen that all δj, j = 1, . . . , k, are jointly insignificant.
– A variant of the Breusch-Pagan test is a test for multiplicative
heteroskedasticity, i.e. the variance is of the form σ2i = σ2 ·
h(x′iβ). If, for example, the case h(·) = exp(·) is assumed, the
test equation ln(u2i ) = ln(σ2) + x′iβ + v results.
372
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
White Test
• Background:
For deriving the asymptotic distribution of the OLS estimator the
assumption of homoskedastic errors MLR.5 is not necessary.
It is enough that the squared errors u2i are uncorrelated with all
regressors and the squares and cross products of the latter.
This can easily be tested using the following regression, where the
errors are already replaced by the residuals:
u2i = δ0 + δ1xi1 + · · · + δkxik
+ δk+1x2i1 + · · · + δJ1
x2ik
+ δJ1+1xi1xi2 + · · · + δJ2xik−1xik
+ vi, i = 1, . . . , n. (9.2)
373
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
• The pair of hypotheses is:
H0 : δj = 0 for all j = 1, 2, . . . , J2, vs. H1 : δj 6= 0 for at least one j.
Again, a F test can be used whose distribution is approximated by
the F distribution (asymptotic distribution).
• With many regressors, it is tedious to implement the F test for (9.2)
manually. However, most software packages provide the White test.
• When implementing the White test, a large number of parameters
has to be estimated if the original model exhibits large k. This is
hardly possible in small samples. Then one only includes the squares
x2ij into the regression and neglects all cross products.
• Note: If the null hypothesis is rejected, this may also be due to
violation of MLR.1 or MLR.4. Then, the original regression is mis-
specified!
374
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
• Cigarette Example Continued:White Heteroskedasticity Test:
F-statistic 2.159257 Probability 0.000905
Obs*R-squared 52.17244 Probability 0.001140
Test Equation: Dependent Variable: RESID^2, Method: Least Squares, Sample: 1 807
Variable Coefficient Std. Error t-Statistic Prob.
C 29374.75 20559.14 1.428793 0.1535
LINCOME -1049.627 963.4360 -1.089462 0.2763
LINCOME^2 -3.941187 17.07122 -0.230867 0.8175
LINCOME*LCIGPRIC 329.8888 239.2417 1.378893 0.1683
LINCOME*EDUC -9.591844 8.047067 -1.191968 0.2336
LINCOME*AGE -3.354564 6.682195 -0.502015 0.6158
LINCOME*(AGE^2) 0.026704 0.073025 0.365689 0.7147
LINCOME*RESTAURN -59.88701 49.69040 -1.205203 0.2285
LCIGPRIC -10340.67 9754.559 -1.060086 0.2894
LCIGPRIC^2 668.5282 1204.316 0.555110 0.5790
LCIGPRIC*EDUC 32.91400 59.06252 0.557274 0.5775
LCIGPRIC*AGE 62.88178 55.29011 1.137306 0.2558
LCIGPRIC*(AGE^2) -0.622372 0.594730 -1.046479 0.2957
LCIGPRIC*RESTAURN 862.1558 720.6219 1.196405 0.2319
EDUC -117.4717 251.2852 -0.467484 0.6403
EDUC^2 -0.290344 1.287605 -0.225492 0.8217
EDUC*AGE 3.617047 1.724659 2.097253 0.0363
EDUC*(AGE^2) -0.035558 0.017664 -2.012988 0.0445
EDUC*RESTAURN -2.896491 10.65709 -0.271790 0.7859
375
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
AGE -264.1467 235.7624 -1.120394 0.2629
AGE^2 3.468605 3.194651 1.085754 0.2779
AGE*(AGE^2) -0.019111 0.028655 -0.666935 0.5050
AGE*RESTAURN -4.933195 10.84029 -0.455080 0.6492
(AGE^2)^2 0.000118 0.000146 0.807552 0.4196
(AGE^2)*RESTAURN 0.038446 0.120459 0.319160 0.7497
RESTAURN -2868.188 2986.776 -0.960296 0.3372
R-squared 0.064650, Adjusted R-squared 0.034709, Akaike info crit. 14.65774
Mean dependent var 178.1297, S.D. dependent var 369.3519, Schwarz criterion 14.80895
S.E. of regression 362.8853, Sum squared resid 1.03E+08, Log likelihood -5888.398
F-statistic 2.159257, Prob(F-statistic) 0.000905, Durbin-Watson stat 1.933288
Result: With the White test H0 is also rejected.
376
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
Trade Example Continued
(from Section 8.4):
• Breusch-Pagan test for heteroskedasticity using OLS residuals====================================================================
Heteroskedasticity Test: Breusch-Pagan-Godfrey
====================================================================
F-statistic 4.139262 Prob. F(5,46) 0.0035
Obs*R-squared 16.13595 Prob. Chi-Square(5) 0.0065
Scaled explained SS 11.61688 Prob. Chi-Square(5) 0.0404
====================================================================
Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Coefficient Std. Error t-Statistic Prob.
====================================================================
C 59.29291 40.51240 1.463575 0.1501
LOG(WDI_GDPUSDCR_O) -4.634815 3.174932 -1.459815 0.1511
(LOG(WDI_GDPUSDCR_O))^2 0.075281 0.061518 1.223720 0.2273
LOG(CEPII_DIST) 1.382417 0.745464 1.854439 0.0701
377
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
CEPII_COMCOL_REV -1.642264 0.978128 -1.678986 0.0999
CEPII_COMLANG_OFF 0.394264 1.807698 0.218103 0.8283
====================================================================
R-squared 0.310307 Mean dependent var 1.661544
Adjusted R-squared 0.235340 S.D. dependent var 2.275813
S.E. of regression 1.990081 Akaike info criterion 4.322395
Sum squared resid 182.1794 Schwarz criterion 4.547538
Log likelihood -106.3823 Hannan-Quinn criter. 4.408709
F-statistic 4.139262 Durbin-Watson stat 2.285868
Prob(F-statistic) 0.003459
====================================================================
378
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
• White test (without cross terms) for heteroskedasticity using OLSresidualsHeteroskedasticity Test: White
====================================================================
F-statistic 4.085786 Prob. F(5,46) 0.0037
Obs*R-squared 15.99159 Prob. Chi-Square(5) 0.0069
Scaled explained SS 11.51295 Prob. Chi-Square(5) 0.0421
====================================================================
Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Coefficient Std. Error t-Statistic Prob.
====================================================================
C 18.75859 10.91313 1.718900 0.0924
(LOG(WDI_GDPUSDCR_O))^2 -0.055924 0.031777 -1.759911 0.0851
((LOG(WDI_GDPUSDCR_O))^2)^2 3.05E-05 2.34E-05 1.307676 0.1975
(LOG(CEPII_DIST))^2 0.090498 0.049960 1.811423 0.0766
CEPII_COMCOL_REV^2 -1.582928 0.992436 -1.594992 0.1176
CEPII_COMLANG_OFF^2 0.291977 1.766624 0.165274 0.8695
====================================================================
R-squared 0.307531 Mean dependent var 1.661544
379
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
Adjusted R-squared 0.232262 S.D. dependent var 2.275813
S.E. of regression 1.994082 Akaike info criterion 4.326412
Sum squared resid 182.9127 Schwarz criterion 4.551555
Log likelihood -106.4867 Hannan-Quinn criter. 4.412726
F-statistic 4.085786 Durbin-Watson stat 2.285412
Prob(F-statistic) 0.003749
====================================================================
380
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
• Breusch-Pagan test for heteroskedasticity using standardized FGLSresiduals====================================================================
Heteroskedasticity Test: Breusch-Pagan-Godfrey
====================================================================
F-statistic 0.683750 Prob. F(5,46) 0.6381
Obs*R-squared 3.597320 Prob. Chi-Square(5) 0.6087
Scaled explained SS 1.922069 Prob. Chi-Square(5) 0.8598
====================================================================
Test Equation:
Dependent Variable: WGT_RESID^2
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Coefficient Std. Error t-Statistic Prob.
====================================================================
C -0.323375 1.044251 -0.309672 0.7582
LOG(WDI_GDPUSDCR_O)*WGT 0.122435 0.229469 0.533560 0.5962
(LOG(WDI_GDPUSDCR_O))^2*WGT -0.010813 0.007708 -1.402801 0.1674
LOG(CEPII_DIST)*WGT 0.678927 0.586156 1.158269 0.2527
CEPII_COMCOL_REV*WGT -0.781193 0.641851 -1.217095 0.2298
CEPII_COMLANG_OFF*WGT 0.299789 1.228329 0.244062 0.8083
====================================================================
381
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
R-squared 0.069179 Mean dependent var 0.946185
Adjusted R-squared -0.031997 S.D. dependent var 1.116472
S.E. of regression 1.134193 Akaike info criterion 3.197887
Sum squared resid 59.17414 Schwarz criterion 3.423031
Log likelihood -77.14507 Hannan-Quinn criter. 3.284202
F-statistic 0.683750 Durbin-Watson stat 1.973137
Prob(F-statistic) 0.638076
====================================================================
382
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
• White test (without cross terms) for heteroskedasticity using FGLSresidualsHeteroskedasticity Test: White
====================================================================
F-statistic 0.538645 Prob. F(6,45) 0.7760
Obs*R-squared 3.484361 Prob. Chi-Square(6) 0.7460
Scaled explained SS 1.861714 Prob. Chi-Square(6) 0.9320
====================================================================
Test Equation:
Dependent Variable: WGT_RESID^2
Method: Least Squares
Sample: 1 55, Included observations: 52
====================================================================
Coefficient Std. Error t-Statistic Prob.
====================================================================
C 0.568776 0.520032 1.093733 0.2799
WGT^2 6.158506 9.676452 0.636443 0.5277
(LOG(WDI_GDPUSDCR_O))^2*WGT^2 -0.014641 0.023638 -0.619379 0.5388
((LOG(WDI_GDPUSDCR_O))^2)^2*WGT^2 6.39E-06 1.41E-05 0.454559 0.6516
(LOG(CEPII_DIST))^2*WGT^2 0.021214 0.022901 0.926322 0.3592
CEPII_COMCOL_REV^2*WGT^2 -0.941910 1.019790 -0.923632 0.3606
CEPII_COMLANG_OFF^2*WGT^2 -0.421654 1.220702 -0.345420 0.7314
====================================================================
383
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
R-squared 0.067007 Mean dependent var 0.946185
Adjusted R-squared -0.057392 S.D. dependent var 1.116472
S.E. of regression 1.148063 Akaike info criterion 3.238680
Sum squared resid 59.31224 Schwarz criterion 3.501347
Log likelihood -77.20568 Hannan-Quinn criter. 3.339380
F-statistic 0.538645 Durbin-Watson stat 1.955059
Prob(F-statistic) 0.775963
====================================================================
Results:
– Note that the specification of the White test without cross terms
follows EViews 6.0 and does not include level terms (in contrast
to (9.2)).
– Both, the Breusch-Pagan and the White test reject the null hy-
pothesis of homoskedastic errors for the OLS residuals at the 1%
significance level. Thus, using OLS with heteroskedasticity-robust
standard errors or FGLS in Section 8.4 was justified.
384
Intensive Course in Econometrics — Section 9.2 — UR March 2009 — R. Tschernig
– Both, the Breusch-Pagan and the White test do not reject the
null hypothesis of homoskedastic standardized errors in the FGLS
framework. Both p-values are above 60%. Thus, the variance
regression in Section 8.4 does not seem to be misspecified.
– In sum, among all models and estimation procedures considered,
the FGLS estimates of Model 5 seem to be the most reliable ones.
Reading: Chapter 8 in Wooldridge (2009) (without Section 8.5 con-
cerning linear probability models).
385
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
9.3 Model Specification II: Useful Tests
9.3.1 Comparing Models with Identical Regressand
Starting point: two non-nested models
(M1) y = x0β0 + . . . + xkβk + u = x′β + u,
(M2) y = z0γ0 + . . . + zmγm + v = z′γ + v,
where k = m does not have to hold.
Decision between (M1) and (M2): using
• information criteria (AIC, SC, HQ, ...),
• encompassing test,
• non-nested F test,
• J test.
386
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
All three tests can be constituted on the encompassing principle.
Encompassing Principle
Let two non-nested models be given:
(M1) y = x′β + u,
(M2) y = z′γ + v.
For clarifying the non-nested relationship between (M1) and (M2), de-
fine
x′ =(w′ x′B
), β =
(βA βB
),
z′ =(w′ z′B
), γ =
(γA γB
),
such that w contains all common regressors
(M1) y = w′βA + x′BβB + u,
(M2) y = w′γA + z′BγB + v.
387
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
Idea of the encompassing principle:
• If (M1) is correctly specified, it must be able to explain the results
of an estimation of (M2) (and vice versa).
• If not, (M1) has to be rejected (and vice versa).
Derivation:
Consider the “artificial nesting model”
(ANM) y = w′a + x′Bbx + z′Bbz + ε, E[ε|w,xB, zB] = 0.
Different settings:
• (ANM) correctly specified model such that (M1) and (M2) are mis-
specified. Model (M2) is estimated.
• (M1) correctly specified model. Model (M2) is estimated.
• (M2) correctly specified model. Model (M1) is estimated.
388
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
In general an omitted variable bias results for all cases.
Details:
• (ANM) correctly specified model such that (M1) and (M2) are mis-
specified. Model (M2) is estimated. ⇒ xB omitted.
E[y|w, zB] = E[w′a + x′Bbx + z′Bbz + ε|w, zB]
= E[w′a|w, zB] + E[x′Bbx|w, zB]
+ E[z′Bbz|w, zB] + E[ε|w, zB]
= w′a + E[x′B|w, zB]bx + z′Bbz + E[ε|w, zB].
For simplicity it is assumed that xB is scalar. Then it holds that
xB = w′q + z′Bp + ν,
E[xB|w, zB] = w′q + z′Bp.
It also holds that
E [E[ε|w,xB, zB]|w, zB] = E[ε|w, zB].
389
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
Since (ANM) is correct, it holds that E[ε|w,xB, zB] = 0 and thus
E[0] = 0 = E[ε|w, zB].
When estimating (M2) instead of (ANM), one gets
E[y|w, zB] = w′a + [w′q + z′Bp]bx + z′Bbz
= w′ [a+qbx]︸ ︷︷ ︸γA
+z′B [bz+pbx]︸ ︷︷ ︸γB
. (9.3)
Note that the biases qbx and pbx are caused by omitting the variable
xB. These effects bias the direct impact of w via a and of zB via
bz on y.
390
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
• (M1) correctly specified model. Model (M2) is estimated.
Then bz = 0 and from (9.3) the following restriction results:
pbx = γB.
Now it can be seen that knowing the correctly specified model (M1)
is enough for deriving model (M2), thus predicting γB or the expec-
tation of the OLS estimator. In other words: Since (M2) is “smaller”
than (M1) with respect to the relevant variables, the behavior of
(M2) can be predicted with the help of (M1) when an unbiased es-
timator is used for the latter. Then one says “(M1) encompasses
(M2)”. (Knowing (M1) is not enough here if (ANM) is the correct
model, bz 6= 0.)
• (M2) correctly specified model. Model (M1) is estimated.
Can be derived just as in the above case.
391
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
Thus, for the null hypothesis “(M1) encompasses (M2)” two equivalent
hypotheses can be tested:
• H0 : pb − γB = 0 - more complicated, no details here. (This
version is often termed encompassing test and sometimes has
advantages in more general models.)
• H0 : bz = 0 in (ANM) - easy: by the help of a non-nested F
test.
Proceeding for more than two alternatives: selection procedure
• Based on this same principle, the remaining model competes with
further alternative models as long as it is not rejected.
• Problem of this principle: it can happen that both null hypotheses
have to be rejected.
392
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
Non-nested F test
Idea and implementation:
• Hypotheses: “H0: model (M1) is correct” versus “H1: model (M1)
incorrect”.
• Again, partition z′ = (w′, z′B), where the kA regressors from w are
contained in x but the kB regressors from zB are not contained.
• Formulate the artificial nesting model (ANM)
y = x′β + z′Bbz + ε.
• Based on this ANM test H0 where
H0 : bz = 0
using an F test with kB degrees of freedom in the numerator and
n − m − kB in the denominator.
393
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
• For the test of (M2) vs. (M1) proceed analogously with partition
x′ = (w′,x′B) ...
J test (Davidson-MacKinnon test)
Idea and implementation:
• For the J test the ANM is formulated such that both (M1) and
(M2) are nested in the ANM:
y = (1 − λ)x′β + λz′γ + ε.
For the case λ = 0 the model (M1) results, for λ = 1 model (M2).
• Problem: λ, β and γ are not identified in the above approach.
• Solution: replace γ by the OLS estimator from (M2) γ.
I.e. test H0 : λ = 0 with test equation y = x′β∗+λyM2+η, where
β∗ = (1 − λ)β and yM2 = z′γ is the fitted value from the OLS
estimation of (M2).
394
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
• For testing whether (M2) is valid, proceed analogously ...
• Interpretation of the logic of the test:
For testing model (M1) it is enlarged by the fitted values of model
(M2); these (i.e. the by the regressors in (M2) explained part of y)
are tested for their significance in the test equation.
• Advantages of the J test compared to the non-nested F test:
– only one single restriction has to be tested,
– higher power, if kB or respectively mB are very large,
– in case of kB = 1 or respectively mB = 1 the tests are equivalent.
395
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
9.3.2 Comparing Models with differing Regressand
Idea and implementation (of the P test):
Example linear model versus log-log alternative
• Step 1: Run an OLS estimation for both models.
• Step 2: Compute the corresponding fitted values
ylin (linear model) and ln(ylog) (log-log model).
• Step 3a: Test the linear approach against the log-log alternative
using the ANM
y =∑
xjβj,lin + δlin[ln(ylin) − ln(ylog)] + u,
by a t test with the null hypothesis
H0 : δlin = 0 (linear model is correct).
396
Intensive Course in Econometrics — Section 9.3 — UR March 2009 — R. Tschernig
• Step 3b: Test the log-log approach against the linear alternative
using the ANM
ln(y) =∑
ln(xj)βj,log + δlog[ylin − exp( ln ylog)] + v,
by a t test with the null hypothesis
H0 : δlog = 0 (log-log model is correct).
Problem: it is possible that both hypotheses are rejected (i.e. another
functional form is relevant) or both cannot be rejected (i.e. the problem
of lacking power or something else).
Note: in this case a comparison using the information criteria is not
possible.
Reading: Chapter 9 in Wooldridge (2009).
397
10 Appendix
10.1 A Condensed Introduction to Probability
Preliminary Statement: The following pages are not considered as de-
terrence, but as supplement to the illustrations found in introductory
textbooks for econometrics. This supplement is intended to explain
the intuition underlying the large amount of definitions and concepts
in probability theory. Nevertheless it is not possible to completely avoid
formulas, although it may take some time to clarify your mind.
I
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
• Sample space, outcome space:
The set Ω contains all possible outcomes of a random experiment.
This set can contain (countably) finite or infinite outcomes.
Examples:
– Urn with 4 balls of different color: Ω = yellow, red, blue, green– Monthly income of a household in the future: Ω = [0,∞)
Remark:
– If there is a finite number of outcomes, they are often denoted
as ωi. For S outcomes, Ω appears as
Ω = ω1, ω2, . . . , ωS.
– If there is an infinite number of outcomes, each one is often
denoted as ω.
II
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
• Event:
Every set of possible outcomes = every subset of the set Ω including
Ω itself.
Examples:
– Urn-example: possible events are for example yellow, red or
red, blue, green.– Household income: possible events are all possible subintervals
and combinations of them, e.g. (0, 5000], [1000, 1001), (400,∞),
4000, and so on.
Remark: By using the general point of view with the ω’s, one has
– for the case of S outcomes: ω1, ω2, ωS, ω3, . . . , ωS, and
so on.
– for the case of infinitely many outcomes located inside an interval
III
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
Ω = (−∞,∞): (a1, b1], [a2, b2), (0,∞), and so on, where the
lower bound always has to be lower or equal the upper bound
(ai ≤ bi).
• Random variable:
A random variable is a function that assigns a real number X(ω) to
each outcome ω ∈ Ω.
Urn example: X(ω1) = 0, X(ω2) = 3, X(ω3) = 17, X(ω4) = 20.
• Density function
– Preliminary statement: As we have already seen, it gets com-
plicated if Ω contains infinitely many outcomes. Consider for
example Ω = [0, 4]. If one wants to compute the probability for
the number π to appear, this probability is equal to zero. If it
were not equal to zero, we had the problem that a sum of all
probabilities for all (infinitely many) numbers could not be equal
IV
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
to 1. What to do?
– A back door is the following trick: Consider the probability for the
outcome of the random variable X being located in the interval
[0, x], with x < 4. This probability can be written as P (X ≤ x).
Now determine how the probability changes by extending the size
of this interval [0, x] by h. The solution to this is: P (X ≤x + h) − P (X ≤ x). By relating this change in probability to
the interval length one gets
P (X ≤ x + h) − P (X ≤ x)
h.
For a decreasing interval length h that approaches zero, one ob-
tains the following limit:
limh→0
P (X ≤ x + h) − P (X ≤ x)
h= f (x).
This limit is called probability density function or shortly
V
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
density function that belongs to the probability function P .
– How to interpret a density function?
By using the sloppy formulation
P (X ≤ x + h) − P (X ≤ x)
h≈ f (x)
and rewriting as
P (X ≤ x + h) − P (X ≤ x) ≈ f (x)h,
one can see that f (x) determines the rate of change for the
probability that X falls into the interval [0, x] if the interval length
is extended by h. Hence, the density function is a rate.
– As the density function is a derivative, we get conversely for our
example ∫ x
0f (u)du = P (X ≤ x) = F (x).
VI
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
Here, F (x) = P (X ≤ x) is called probability distribution
function. Certainly, in this example we get∫ 4
0f (u)du = P (X ≤ 4) = 1.
In general, the integral of the density function over the full support
of the random variable yields a value of 1. Consider for example
X(ω) ∈ R:∫ ∞
−∞f (u)du = P (X ≤ ∞) = 1.
VII
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
• Conditional probability function
Let’s begin with an example:
Let the random variable X ∈ [0,∞) be the payoff in a lottery.
The probability function (distribution function) P (X ≤ x) = F (x)
is the probability for a maximum payoff x. Additionally, we know
that there are two machines (machine A and B) that determine the
payoff.
Question: What is the probability for a maximum payoff of x if
machine A is used?
In other words, what is the probability of interest if the condition
“Machine A is used” is applied? Hence, the probability under con-
sideration is also called conditional probability and written as
P (X ≤ x|A).
VIII
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
Accordingly one writes P (X ≤ x|B), if the condition “Machine B
is used” is applied.
Question: What is the relationship between the unconditional
probability P (X ≤ x) and the conditional probabilities P (X ≤x|A) and P (X ≤ x|B)?
To answer this question one has to clarify what the corresponding
probabilities of using machine A or B are. Denoting these proba-
bilities by P (A) and P (B) we have:
P (X ≤ x) = P (X ≤ x|A)P (A) + P (X ≤ x|B)P (B)
F (x) = F (x|A)P (A) + F (x|B)P (B)
In this example there are two outcomes. The corresponding relation-
IX
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
ship can be extended to n discrete outcomes Ω = A1, A2, . . . , An:F (x) = F (x|A1)P (A1) + F (x|A2)P (A2) + · · · + F (x|An)P (An)
(10.1)
Until now we defined the conditions in terms of events and not
in terms of random variables. An example for the latter one were
if the payoff is determined by only one machine, but where the
mode of operation for this machine is conditioned upon the payoffs’
magnitude Z. In this case, the conditional distribution function
is F (x|Z = z), with Z = z meaning that the random variable
Z exactly takes the value z. For relating the unconditional and
conditional probability we have to replace the sum by an integral,
and the probability of the conditioning event by the corresponding
density function, as Z can have infinitely many values. For our
X
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
example we obtain:
F (x) =
∫ ∞
0F (x|Z = z)f (z)dz =
∫ ∞
0F (x|z)f (z)dz
or generally
F (x) =
∫F (x|Z = z)f (z)dz =
∫F (x|z)f (z)dz (10.2)
Another important property:
If the random variables X and Z are stochastically independent, we
have
F (x|z) = F (x).
• Conditional density function
The conditional density function can be heuristically derived from
the conditional distribution function in the same way as for the case
of the unconditional density function: one simply replaces the uncon-
XI
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
ditional probabilities by conditional probabilities. The conditional
density function arises from
limh→0
P (X ≤ x + h|A) − P (X ≤ x|A)
h= f (x|A).
For finitely many conditions equation (10.1) becomes
f (x) = f (x|A1)P (A1) + f (x|A2)P (A2) + · · · f (x|An)P (An).
The relationship (10.2) turns to
f (x) =
∫f (x|Z = z)f (z)dz =
∫f (x|z)f (z)dz. (10.3)
• Expectation
Consider again the payoff example.
Question: Which payoff would you expect “on average”?
Answer:∫∞0 xf (x)dx. For a payoff paid in n different discrete
amounts, one would expect∑n
i=1 xiP (X = xi) on average. Each
XII
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
possible payoff is multiplied by its probability of entry and added up.
It is not surprising that the result is denoted as expectation.
In general the expectation is defined as
E[X ] =
∫xf (x)dx, continuous X,
E[X ] =∑
xiP (X = xi), discrete X.
XIII
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
• Rules for the expectation e.g. Appendix B in Wooldridge (2009).
1. For each constant c it holds that
E[c] = c.
2. For all constants a and b and all random variables X and Y it
holds that
E[aX + bY ] = aE[X ] + bE[Y ].
3. If the random variables X and Y are independent, it holds that
E[Y X ] = E[Y ]E[X ].
• Conditional expectation
So far we did not care for the machine that was used to create the
payoff. If we are interested in the expected payoff of using machine
XIV
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
A, we have to calculate the conditional expectation
E[X|A] =
∫ ∞
0xf (x|A)dx.
This is easily achieved by replacing the unconditional density f (x) by
the conditional density f (x|A) and stating the condition in the no-
tation of expectations accordingly. Analogously the expected payoff
for machine B is determined as
E[X|B] =
∫ ∞
0xf (x|B)dx.
In general one has for discrete conditioning events
E[X|A] =
∫xf (x|A)dx, continuous X,
E[X|A] =∑
xiP (X = xi|A), discrete X,
XV
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
and for continuous conditions
E[X|Z = z] =
∫xf (x|Z = z)dx, continuous X,
E[X|Z = z] =∑
xiP (X = xi|Z = z), discrete X.
Remark: Frequently, the short versions are used as in Wooldridge
(2009).
E[X|z] =
∫xf (x|z)dx, continuous X,
E[X|z] =∑
xiP (X = xi|z), discrete X.
In accordance to the relationship of unconditional and conditional
probabilities there is a similar relationship for unconditional and con-
ditional expectations. The relationship is
E[X ] = E [E[X|Z]]
which is denoted as law of iterated expectations (LIE).
XVI
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
Sketch of proof:
E[X ] =
∫xf (x)dx
=
∫x
[∫f (x|z)f (z)dz
]dx (insert (10.3))
=
∫ ∫xf (x|z)f (z)dzdx
=
∫ ∫xf (x|z)dx
︸ ︷︷ ︸E[X|z]
f (z)dz (interchange dx and dz)
=
∫E[X|z]f (z)dz
=E [E[X|Z]]
In our example with 2 machines, the law of iterated expectations
XVII
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
yields
E[X ] = E[X|A]P [A] + E[X|B]P (B).
This example also shows that the conditional expectations E[X|A]
and E[X|B] are random variables. If they are weighted by the cor-
responding probabilities of entry P (A) and P (B), they yield E[X ].
Suppose that, prior to the lottery, you only know both conditional
expectations but not which machine is used. Then the expected
payoff is equal to E[X ] and both conditional expectations are con-
sidered as random variables. After knowing what machine is used,
the corresponding conditional expectation is the outcome of the ran-
dom variable. This is a general property of conditional expectations.
XVIII
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
• Rules for conditional expectations
e.g. Appendix B in Wooldridge (2009).
1. For each function c(·) it holds that
E[c(X)|X ] = c(X).
2. For all functions a(·) and b(·) it holds that
E[a(X)Y + b(X)|X ] = a(X)E[Y |X ] + b(X).
3. If the random variables X and Y are independent, it holds that
E[Y |X ] = E[Y ].
4. Law of iterated expectations (LIE)
E[E[Y |X ]] = E[Y ].
5. E[Y |X ] = E[E[Y |X,Z]|X ].
XIX
Intensive Course in Econometrics — Section 10.1 — UR March 2009 — R. Tschernig
6. If it holds that E[Y |X ] = E[Y ], then it also holds that Cov(X,Y ) =
0.
7. If E[Y 2]
< ∞ and E[g(X)2] < ∞ for an arbitrary function
g(·), then the following inequalities hold:
E[Y − E[Y |X ]]2|X ≤ E[Y − g(X)]2|XE[Y − E[Y |X ]]2 ≤ E[Y − g(X)]2.
XX
Intensive Course in Econometrics — Section 10.2 — UR March 2009 — R. Tschernig
10.2 Important Rules of Matrix Algebra
Matrix addition
A =
a11 a12 . . . a1K
a21 a22 . . . a2K
... ... ...
aT1 aT2 . . . aTK
, C =
c11 c12 . . . c1K
c21 c22 . . . c2K
... ... ...
cT1 cT2 . . . cTK
.
If A and C are of the same dimension
A + C =
a11 + c11 a12 + c12 · · · a1K + c1K
a21 + c21 a22 + c22 · · · a2K + c2K
... ... ...
aT1 + cT1 aT2 + cT2 · · · aTK + cTK
.
XXI
Intensive Course in Econometrics — Section 10.2 — UR March 2009 — R. Tschernig
Matrix multiplication
A =
a11 a12 · · · a1K
a21 a22 · · · a2K
... ... ...
aT1 aT2 · · · aTK
, B =
b11 b12 · · · b1L
b21 b22 · · · b2L
... ... ...
bK1 bK2 · · · bKL
.
If the number of columns in A is equal to the number of rows in B,
then the product C = AB is defined and the following equality holds
for every element in C
cij =(
ai1 · · · aiK
)
b1j
...
bKj
= ai1b1j + · · · + aiKbKj =
K∑
l=1
ailblj.
Caution: In general it holds that AB 6= BA.
XXII
Intensive Course in Econometrics — Section 10.2 — UR March 2009 — R. Tschernig
Transpose of a matrix
Given the (2 × 3)-matrix (i.e. 2 rows, 3 columns)
A =
(a11 a12 a13
a21 a22 a23
),
the transpose of A is the (3 × 2)-matrix
A′ =
a11 a21
a12 a22
a13 a23
.
It holds that
(AB)′ = B′A′.
XXIII
Intensive Course in Econometrics — Section 10.2 — UR March 2009 — R. Tschernig
Inverse of a matrix
Let A be the (K × K)-matrix
A =
a11 a12 · · · a1K
a21 a22 · · · a2K
... ... ...
aK1 aK2 · · · aKK
,
then the inverse of A is A−1 and is defined by
AA−1 = A−1A = IK =
1 0 . . . 0
0 1 0
... . . .
0 0 . . . 1
with IK as identity matrix of dimension (K × K).
XXIV
Intensive Course in Econometrics — Section 10.2 — UR March 2009 — R. Tschernig
The matrix A is invertible if the rows respectively columns are linearly
independent. In other words: No row (column) can be described as
linear combination of the other rows (columns). Technically this is
satisfied whenever the determinant of A is unequal to zero.
Frequently, a noninvertible matrix is called singular.
The calculation of an inverse is better left to a computer. Only for
matrices of 2 or 3 columns/rows, the calculation is of moderate com-
plexity. Hence a manual calculation can be useful.
XXV
Intensive Course in Econometrics — Section 10.2 — UR March 2009 — R. Tschernig
Special issue of a (2 × 2) matrix:
For a square (2 × 2) matrix
B =
(b11 b12
b21 b22
)
the determinant is computed as
det(B) = b11b22 − b21b12
and the inverse as
B−1 =1
det(B)
(b22 −b12
−b21 b11
)
=1
b11b22 − b21b12
(b22 −b12
−b21 b11
).
XXVI
Intensive Course in Econometrics — Section 10.2 — UR March 2009 — R. Tschernig
Example:
C =
(0 2
1 −1
), with det(C) = 0 · (−1) − 1 · 2 = −2
C−1 =1
−2
(−1 −2
−1 0
)=
(12 112 0
)
Check:
CC−1 =
(0 2
1 −1
)(12 112 0
)=
(1 0
0 1
)
Reading: As supplement for matrix algebra and its implementation
in the multiple linear regression framework see Appendices D, E.1 in
Wooldridge (2009).
XXVII
Intensive Course in Econometrics — Section 10.3 — UR March 2009 — R. Tschernig
10.3 Rules for Matrix Differentiation
•
c =
c1
c2
...
cT
, w =
w1
w2
...
wT
z = c′w =(
c1 c2 · · · cT
)
w1
w2
...
wT
∂z
∂w= c
XXVIII
Intensive Course in Econometrics — Section 10.3 — UR March 2009 — R. Tschernig
•
A =
a11 a12 · · · a1T
a21 a22 · · · a2T
. . . . . . . . . . . . . . . . .
aT1 aT2 · · · aTT
z = w′Aw =(
w1 w2 · · · wT
)
a11 a12 · · · a1T
a21 a22 · · · a2T
. . . . . . . . . . . . . . . . .
aT1 aT2 · · · aTT
w1
w2
...
wT
∂z
∂w= (A′ + A)w
XXIX
Intensive Course in Econometrics — Section 10.4 — UR March 2009 — R. Tschernig
10.4 Data for Estimating Gravity Equations
Legend for data in gravity data kaz.wf1
• Countries and country codes1 ALB Albania 17 GBR United Kingdom 33 NLD Netherlands
2 ARM Armenia 18 GEO Georgia 34 NOR Norway
3 AUT Austria 19 GER Germany 35 POL Poland
4 AZE Azerbaijan 20 GRC Greece 36 PRT Portugal
5 BEL Belgium and 21 HRV Croatia 37 ROM Romania
Luxembourg
6 BGR Bulgaria 22 HUN Hungary 38 RUS Russia
7 BLR Belarus 23 IRL Ireland 39 SVK Slovakia
8 CAN Canada 24 ISL Iceland 40 SVN Slovenia
9 CHE Switzerland 25 ITA Italy 41 SWE Sweden
10 CYP Cyprus 26 KAZ Kazakhstan 42 TKM Turkmenistan
11 CZE Czech Republic 27 KGZ Kyrgyzstan 43 TUR Turkey
12 DNK Denmark 28 LTU Lithuania 44 UKR Ukraine
13 ESP Spain 29 LVA Latvia 45 USA United States
14 EST Estonia 30 MDA Moldova 46 YUG Serbia and
Montenegro
15 FIN Finland 31 MKD Macedonia
16 FRA France 32 MLT Malta
XXX
Intensive Course in Econometrics — Section 10.4 — UR March 2009 — R. Tschernig
Notes: Table is based on Table 1 in “Explanatory notes on gravity data.wf1 ” .
Countries that feature only as origin countries:BIH Bosnia and Herzegovina
TJK Tajikistan
UZB Uzbekistan
CHN China
HKG Hong Kong
JPN Japan
KOR South Korea
TWN Taiwan
THA Thailand
XXXI
Intensive Course in Econometrics — Section 10.4 — UR March 2009 — R. Tschernig
• Endogenous variable:
– TRADE 0 D O:
Imports of country d from country o (i.e., exports of country o
to country d) in current US dollars
– Commodity classifications: Trade flows are based on aggregating
disaggregate trade flows according to the Standard International
Trade Classification, Revision 3 (SITC, Rev.3) at the lowest ag-
gregation levels (4- or 5-digit). Source: UN COMTRADE
– Without fuels and lubricants (i.e., specifically without petrol and
natural gas products). Cut-off value for underlying disaggregated
trade flows (at SITC Rev.3 5-digit level) is 500 US dollars.
• Explanatory variables:
XXXII
Intensive Course in Econometrics — Section 10.4 — UR March 2009 — R. Tschernig
Origin country
WDI GDPUSDCR O Origin country GDP data; in current US dollars World Bank - World Development Indicato
WDI GDPPCUSDCR O Origin country GDP per capita data; in current US dollars World Bank - World Development Indicato
WEO GDPCR O Destination and origin country GDP data; in current US dollars IMF - World Economic Outlook database
WEO GDPPCCR O Destination and origin country GDP per capita data; in current US dollars IMF - World Economic Outlook database
WEO POP O Origin country population data IMF - World Economic Outlook database
CEPII AREA O area of origin country in km2 CEPII
CEPII COL45 dummy; d and o country have had a colonial relationship after 1945 CEPII
CEPII COL45 REV dummy; revised by “expert knowledge”
CEPII COLONY dummy; d and o country have ever had a colonial link CEPII
CEPII COMCOL dummy; d and o country share a common colonizer since 1945 CEPII
CEPII COMCOL REV dummy; revised by “expert knowledge”
CEPII COMLANG ETHNO dummy; d and o country share a language CEPII
CEPII COMLANG ETHNO REV at least spoken by 9% of each population
CEPII COMLANG OFF dummy; d and o country share common official language CEPII
CEPII CONTIG dummy; d and o country are contiguous (neighboring countries) CEPII
CEPII DISINT O internal distance in origin country CEPII
CEPII DIST geodesic distance between d and o country CEPII
CEPII DISTCAP distance between d and o country based on capitals 0.67√
area/π CEPII
CEPII DISTW weighted distances, see CEPII for details CEPII
CEPII DISTWCES weighted distances, see CEPII for details CEPII
CEPII LAT O latitute of the city CEPII
CEPII LON O longitute of the city CEPII
CEPII SMCTRY REV dummy; d and o country were/are the same country CEPII, revised
ISO O ISO codes in three characters of origin country CEPII
EBRD TFES O EBRD measure of foreign trade and payments liberalisation of o country EBRD
XXXIII
Intensive Course in Econometrics — Section 10.4 — UR March 2009 — R. Tschernig
Destination country
WDI GDPUSDCR D Destination country GDP data; in current US dollars World Bank - World Development Indicators
WDI GDPPCUSDCR D Destination country GDP per capita data; in current US dollars World Bank - World Development Indicators
WEO GDPCR D Destination and origin country GDP data; in current US dollars IMF - World Economic Outlook database
WEO GDPPCCR D Destination and origin country GDP per capita data; in current US dollars IMF - World Economic Outlook database
WEO POP D Destination country population data IMF - World Economic Outlook database
Notes: The EBRD measures reform on a scale between 1 and 4+ (=4.33); 1 represents no or little progress; 2 indicates important
progress; 3 is substantial progress; 4 indicates comprehensive progress, while 4+ indicates countries have reached the standards and
performance norms of advanced industrial countries, i.e., of OECD countries. By construction, this variable is ordered qualitative
rather than cardinal.
• Thanks: to Richard Frensch, Osteuropa-Institut, for providing the
data set.
XXXIV
Intensive Course in Econometrics — Section 10.4 — UR March 2009 — R. Tschernig
• EViews-Commands to extract selected data from main workfile:
– to select observations of countries that export to Kazakhstan:
in workfile: Proc → Copy/Extract from Current Page
→ By Value to New Page or Workfile:
in Sample - observations to copy: @all if (iso d="KAZ"). Objects to
copy: select. Page Destination: select.
– to select observations for one period, e.g. 2004:
as above but: in Sample - observations to copy: 2004 2004
– to select observations for trade flows from Germany to Kaza-
khstan for all periods:
as above but: in Sample - observations to copy: @all if (iso o="KAZ")
and (iso d="GER")
• Websites CEPII
XXXV
Bibliography
Anderson, J. E. & Wincoop, E. v. (2003), ‘Gravity with gravitas: A
solution to the border puzzle’, The American Economic Review
93, 170–192.
Davidson, R. & MacKinnon, J. G. (2004), Econometric Theory and
Methods, Oxford University Press, Oxford.
Fratianni, M. (2007), The gravity equation in international trade, Tech-
nical report, Dipartimento di Economia, Universita Politecnica delle
Marche.
XXXVI
Intensive Course in Econometrics — Section 10.4 — UR March 2009 — R. Tschernig
Pindyck, R. S. & Rubinfeld, D. L. (1998), Econometric models and
economic forecasts, Irwin McGraw-Hill.
Stock, J. H. & Watson, M. W. (2007), Introduction to Econometrics,
Pearson, Boston, Mass.
Wooldridge, J. M. (2009), Introductory Econometrics. A Modern Ap-
proach, 4th edn, Thomson South-Western.
XXXVII