University of Groningen
Latent instrumental variables
Ebbes, P.
IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.
Document version: Publisher's PDF, also known as Version of record
Publication date: 2004
Link to publication in University of Groningen/UMCG research database
Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity. s.n.
Copyright: Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).
Take-down policy: If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.
Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.
Download date: 08-12-2020
Latent Instrumental Variables
– A New Approach to Solve for Endogeneity –
Peter Ebbes
Published by: Labyrinth Publications
P.O. Box 334
2984 AX Ridderkerk
The Netherlands
Tel.: +31 180 463 962
Printed by: Offsetdrukkerij Ridderprint B.V., Ridderkerk
ISBN 90-5335-029-2
© 2004, Peter Ebbes
All rights reserved. No part of this publication may be reproduced, stored in a
retrieval system of any nature, or transmitted in any form or by any means,
electronic, mechanical, now known or hereafter invented, including photocopying or
recording, without prior written permission of the publisher.
Supervisors (promotores): Prof. dr. M. Wedel, Prof. dr. U. Böckenholt, Prof. dr. drs. A. G. M. Steerneman
Assessment committee (beoordelingscommissie): Prof. dr. P. S. H. Leeflang, Prof. dr. P. J. Lenk, Prof. dr. P. H. Franses
Acknowledgements
This dissertation introduces, develops, and illustrates the latent instrumental variables (LIV) approach, a new method to correct for potential endogeneity bias in commonly used linear models. It has several advantages over traditional methods such as ordinary least squares and instrumental variables techniques: it estimates the regression parameters consistently regardless of the presence of regressor-error dependencies, it allows one to test for such dependencies in a straightforward manner, and it does not require observed instrumental variables. This dissertation is the result of four-and-a-half years of work and could not have been completed without the help and support of many people.
First and foremost, I am deeply grateful to my advisors, Michel Wedel, Ulf Böckenholt, and Ton Steerneman, for their continuous guidance and support and for sharing their vast knowledge and experience. In particular, I would like to thank Michel Wedel for introducing me to the (international) marketing community, for showing me what it takes to pursue a career in academia, and for the considerable effort and time he invested in me. As for Ulf Böckenholt,
I am grateful to him for the inspiring and constructive discussions, after which
I always had the feeling of being (back) on a great avenue with a lot of further
work ahead and many interesting new research questions. Ton Steerneman’s
excellent technical skills and his ability to teach me about the ‘abstract world
of statistics’ were essential for the well-being of this project and for enhancing
my understanding of statistics. There was always enough time to talk about
the other aspects of life as well. I am looking forward to collaborating with
them on many future research projects.
Next, I would like to express my appreciation to the members of my Ph.D.
committee for carefully reading my manuscript. This committee consists of
Peter Leeflang (University of Groningen), Philip Hans Franses (Erasmus Uni-
versity Rotterdam), and Peter Lenk (University of Michigan). Their comments,
questions, and suggestions were very constructive.
This research has benefitted from the comments and suggestions made by the
editors and reviewers of papers that were based on parts of this thesis. In ad-
dition, this research gained from suggestions made by seminar participants at
the University of Groningen, Tilburg University, Erasmus University Rotter-
dam, the Durham Business School, McGill University, and the University of
Michigan. In particular, I would like to thank Tom Wansbeek for his effort in
studying earlier versions of our manuscripts and for providing us with valuable
suggestions to improve and position our research. I thank Paul Bekker for his
time in the early phases of my dissertation work, helping me study Jan van der
Ploeg’s thesis.
The department of Marketing and the SOM graduate school at the University
of Groningen provided me with an inspiring research and social environment,
which is one of the most important aspects of completing a dissertation project.
I would like to thank my colleagues for the warm contacts, the keen interest,
and the many pleasant “off-hours” social happenings. Being part of the board
of the GAIOO for two years gave me the opportunity to be actively involved in
Ph.D. student matters, and to organize sufficient social events to stimulate the
interdisciplinary character of doctoral research at the University of Groningen.
Many special thanks go to my paranymphs, Frits Wijbenga and Bart van de Aa,
for their help in organizing my defense and for all the pleasant social and in-
tellectual moments that we shared, in whatever form or combination.
I have special memories of my stay at the Ross School of Business at the Uni-
versity of Michigan. I acknowledge Michel Wedel’s efforts and the financial
support of both the Netherlands Organization for Scientific Research and The
Prince Bernhard Cultural Foundation, which made it possible for me to spend
a considerable amount of time at this top-tier school. In addition, Dirk Pieter
van Donk from the bureau of the SOM graduate school was of great help. My
research and my personal development have benefitted greatly from this stay.
I enjoyed being part of the community and working with faculty members and
doctoral students of the University of Michigan. My special thanks go to Fred
Feinberg for involving me in a challenging project on product line develop-
ment, for helping me with job market issues, and for the pleasant and joyful
talks (among other things) about ‘the Dutch’ and ‘the Americans’. I am grate-
ful to Peter Lenk, who collaborated with me on the Bayesian chapter (chapter
7) of my thesis and whose particular sense of humor made the meetings always
enjoyable. Jie Zhang’s support is greatly acknowledged and appreciated and
I look forward to (finally) starting with our self-selection project on purchase
decisions in on- and offline stores.
My family and friends have always been very supportive, which is essential
to me, and for which I cannot thank them enough. In particular, and most
importantly, the unconditional support of my parents and sister throughout all
these years, their interest in my study and work, and their encouragement in
pursuing my (international) ambitions, are invaluable to me.
Ann Arbor, MI, October 2004.
Contents
1 Introduction
2 Instrumental variables: a survey
2.1 Introduction and bias in OLS
2.1.1 Relevant omitted explanatory variables
2.1.2 Measurement error
2.1.3 Self-selection
2.1.4 The simultaneous equation model
2.1.5 Lagged dependent variables
2.1.6 Bias in OLS when $E(\varepsilon \mid X) \neq 0$
2.2 The IV approach
2.2.1 Considerations when using Instrumental Variables
2.2.2 IV based solutions to the weak instrument problem
2.3 Alternative approaches to solve for regressor-error dependencies
2.4 Conclusions and positioning of research
3 The LIV model
3.1 Introduction
3.2 The LIV model
3.3 Identifiability and information
3.3.1 Identifiability
3.3.2 Information matrix
3.4 A test-statistic to test for regressor-error dependencies
3.5 Monte Carlo experiments
3.5.1 Design of the simulation study: data generation
3.5.2 Results for the simple LIV model ($m = 2$)
3.5.3 Sensitivity analysis: using $m = 3$ and $m = 4$
3.5.4 Results for $\pi$, $\lambda$, $\sigma^2_\nu$, and $\sigma_{\varepsilon\nu}$
3.6 An illustrative example: a simple measurement error model
3.7 Conclusions
Appendix 3A Basic theorems on identifiability of mixtures
Appendix 3B 1st and 2nd order derivatives log-likelihood
4 LIV implementation issues
4.1 Introduction
4.2 Additional regressors and identifiability
4.3 Investigating observed instrumental variables
4.3.1 Testing for weak instruments
4.3.2 Testing for endogenous instruments
4.4 A simulation study
4.4.1 Results for the regression parameters
4.4.2 Results test $H_0$: observed IV is exogenous
4.4.3 Results test $H_0$: observed IV has no effect on $x$
4.4.4 Concluding remarks simulation study
4.5 LIV model diagnostics
4.5.1 Selection of the number of categories of the discrete instrument
4.5.2 Residuals, outliers, and influential observations
4.6 The Hausman-LIV test revised
4.7 Conclusions
Appendix 4A 1st and 2nd order derivatives log-likelihood of the general LIV model
Appendix 4B Simulation results for the exogenous regressor
5 Estimating the return to education using LIV
5.1 Introduction
5.2 Sources of bias in the OLS estimate of the return to education
5.2.1 Ability bias
5.2.2 Measurement error bias
5.2.3 Heterogeneity bias
5.2.4 Optimizing behavior bias
5.3 IV estimation of the returns to education
5.3.1 Institutional features of the schooling system
5.3.2 Family background
5.3.3 Alternative, non-IV approaches
5.4 Empirical results
5.4.1 Data description
5.4.2 LIV results for schooling
5.4.3 Relative biases and comparison with classical IV
5.4.4 Wrap-up
5.5 Conclusions
Appendix 5A Descriptive statistics datasets used
5A.1 NLSY data
5A.2 Brabant data
5A.3 PSID data
Appendix 5B Results optimal LIV model for the three datasets
6 Regressor and random-effects dependencies
6.1 Introduction
6.2 Biases caused by level-2 ($X\alpha$) and level-1 ($X\eta$) dependencies
6.3 The case of level-2 ($X\alpha$) dependencies only
6.3.1 Testing for $X\alpha$-dependencies
6.3.2 Mundlak's approach for $X\alpha$-dependencies
6.3.3 The Hausman-Taylor estimator under $X\alpha$-dependencies
6.4 Limitations in the presence of level-1 ($X\eta$) dependencies
6.5 Testing and solving for $X\eta$-dependencies
6.5.1 External Instruments
6.5.2 Internal instruments: Lewbel's approach
6.6 Discussion and future research
Appendix 6A Classical instrumental variables (IV) estimation
Appendix 6B Estimation for the hierarchical linear model
Appendix 6C Lewbel's instruments in a simple multilevel model
7 A Nonparametric Bayesian LIV approach
7.1 Introduction
7.2 A simple multilevel model with a general latent instrument
7.2.1 The Dirichlet process prior for $\theta_{ij}$
7.2.2 MCMC estimation
7.3 Endogenous subject-level covariates and random coefficients
7.3.1 Estimating the hierarchical model with general latent instruments
7.4 A simulation study
7.4.1 Simulation results for the simple multilevel model
7.4.2 Simulation results for the hierarchical model
7.5 Discussion nonparametric Bayesian LIV approach
Appendix 7A The Dirichlet process
Appendix 7B Full conditionals: the simple multilevel model with general LIV
Appendix 7C Full conditionals: the hierarchical model with general LIV
Appendix 7D Iteration plots
8 Discussion
8.1 Summary and conclusions
8.2 Limitations and future research
8.2.1 Methodological (technical) issues
8.2.2 Substantive issues
Bibliography
Author index
Subject index
Samenvatting (summary in Dutch)
Chapter 1
Introduction
In this thesis we propose a new method to estimate regression coefficients in linear regression models where regressor-error correlations are likely to be present. This method, the Latent Instrumental Variables (LIV) method, utilizes a discrete latent variable model that accounts for dependencies between regressors and the error term. As a result, observed exogenous instrumental variables are not required. In the following chapters we introduce the LIV method and illustrate it on both simulated data and empirical applications. We show that, when regressor-error dependencies are present, the LIV method has desirable properties over existing methods such as ordinary least squares regression and instrumental variables methods. Each chapter is more or less self-contained and based on research articles. Below we present the scope and outline of the thesis.
The starting point of this research is the simple linear regression model given
by
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \tag{1.1}$$
where $y_i$ is the dependent variable, $x_i$ the explanatory variable (regressor), and $\varepsilon_i$ the error term or disturbance with mean zero and variance $\sigma^2$, all independent. The regression parameters $\beta_0$ and $\beta_1$ are the objects of inference. We focus on a situation where the regressor is random and possibly correlated
with the disturbance¹, in which case it is not ‘exogenous’ but ‘endogenous’.
Regressor-error correlations may be the result of several causes and arise in a
wide variety of models, e.g. when relevant explanatory variables are omitted,
when the dependent variable influences the explanatory variable (simultane-
ity), when the sampling process is non-random (self-selection), or when the
explanatory variable is measured with error.
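For model (1.1) the consequence of such a correlation can be stated explicitly. The following is the standard probability-limit argument for the OLS slope, assuming that the sample moments converge to their population counterparts and that $\operatorname{Var}(x_i) > 0$:

$$
\hat\beta_1^{\mathrm{OLS}}
= \frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2}
= \beta_1 + \frac{\frac{1}{n}\sum_i (x_i-\bar x)\,\varepsilon_i}{\frac{1}{n}\sum_i (x_i-\bar x)^2}
\;\xrightarrow{\;p\;}\;
\beta_1 + \frac{\operatorname{Cov}(x_i,\varepsilon_i)}{\operatorname{Var}(x_i)}.
$$

A positive regressor-error covariance therefore pushes the OLS slope above $\beta_1$, and a negative one pushes it below.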
The standard inferential methods are invalid if regressor-error dependencies exist. For instance, the ordinary least squares (OLS) estimator for the regression parameters $(\beta_0, \beta_1)$ is inconsistent: the true effect of the explanatory variable on the dependent variable is systematically over- or underestimated, leading to false conclusions and erroneous decision making. Instrumental variables (IV) methods were developed to overcome these problems and have a long history in econometrics (Bowden and Turkington, 1984; Greene, 2000; Judge et al., 1985). Instruments $z$ are variables that mimic the endogenous regressor $x$ as well as possible but are uncorrelated² with the error term $\varepsilon$. Once ‘valid’ instruments are available, the regression parameters can be consistently estimated via, for instance, two-stage least squares. However, finding exogenous instruments is hard work, and empirical researchers are often confronted with weak instruments. An instrument is ‘weak’ when it is only weakly correlated with the endogenous regressors. If instruments are weak and/or not exogenous, the standard IV estimation and inference procedures are inaccurate and produce “bad results” that are potentially worse than simply ignoring the endogeneity problem and relying on biased OLS. In other words, a small bias in OLS can turn into a large bias when invalid instruments are used (Stock, Wright, and Yogo, 2002; Hahn and Hausman, 2003)³. Besides the problems of potentially weak and/or endogenous instruments, such variables may simply not be available to a researcher, and collecting them is time-consuming and expensive. The main purpose of this research is to develop a new method, the latent instrumental variables (LIV) method, that does not require observed instrumental variables. As such, the difficult task of finding instruments and the inferential issues that arise with poor-quality instruments are circumvented. In fact, the ‘optimal’ LIV instruments are estimated as a “by-product” from the available data.

¹ At least in the weak sense that $\operatorname{plim} \frac{1}{n}\sum_i x_i \varepsilon_i \neq 0$, or that $E(x_i \varepsilon_i) \neq 0$, implying $E(\varepsilon_i \mid x_i) \neq 0$; see e.g. White (2001) or Ferguson (1996).
² The instrument is said to be ‘exogenous’.
³ This was already observed by Sargan in the 1950s; see e.g. Arellano (2002).
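A small simulation can make the OLS inconsistency and the IV remedy concrete. The sketch below is our own illustration, not code from the thesis: the data-generating process, the coefficients ($\beta_0 = 1$, $\beta_1 = 2$, first-stage coefficient 0.8), and all function names are hypothetical. With these values the OLS slope should settle near $2 + 0.7/1.64 \approx 2.43$, while the simple IV slope should be close to 2.

```python
import random
from statistics import fmean


def cov(a, b):
    """Sample covariance (divisor n) of two equally long sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def simulate(n, beta0=1.0, beta1=2.0, seed=1):
    """Hypothetical DGP: y = beta0 + beta1*x + eps, where x and eps share a shock u.

    z shifts x but is independent of eps, so it is a valid (exogenous) instrument.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        ui = rng.gauss(0.0, 1.0)                # unobserved shock, e.g. an omitted variable
        xi = 0.8 * zi + ui                      # endogenous regressor
        ei = 0.7 * ui + rng.gauss(0.0, 0.5)     # error correlated with x through u
        z.append(zi)
        x.append(xi)
        y.append(beta0 + beta1 * xi + ei)
    return z, x, y


def ols_slope(x, y):
    """OLS slope of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Simple IV slope; with one instrument it coincides with 2SLS."""
    return cov(z, y) / cov(z, x)


z, x, y = simulate(50_000)
b_ols = ols_slope(x, y)
b_iv = iv_slope(z, x, y)
print(f"OLS slope: {b_ols:.3f}   IV slope: {b_iv:.3f}   (true beta1 = 2)")
```

Shrinking the first-stage coefficient 0.8 toward zero turns `z` into a weak instrument: the denominator `cov(z, x)` then approaches zero and the IV slope becomes erratic.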
The problems surrounding instrumental variables estimation discussed above are considered in greater detail in chapter 2. The literature review presented in that chapter covers most of the recent studies on weak instruments and contains several references to empirical research (labor economics, marketing, industrial economics) that aims to resolve regressor-error dependencies. Furthermore, we point out a few alternatives to instrumental variables estimation that may be useful in solving regressor-error dependencies. This overview of the literature is a selection of issues that motivate the development of the latent instrumental variables (LIV) method. We conclude chapter 2 by highlighting the relevance and contribution of this research.
In chapter 3 we introduce the latent instrumental variable (LIV) model. It solves regressor-error correlations in linear models by postulating that the instrumental variable is discrete and latent. As a byproduct, the method allows for testing for endogeneity without requiring access to observable instruments. Our simulation results show that the LIV method yields consistent estimates of the model parameters without observable instrumental variables. These estimates are superior to OLS estimates, which are biased when the regressors are not exogenous. The proposed statistic for testing exogeneity is shown to have reasonable power across a wide range of settings. Furthermore, we prove identifiability of all model parameters. We apply the LIV method to an empirical measurement error application in which a laboratory dummy instrumental variable is available. We show that the predicted LIV dummy instrument is identical to this observed laboratory instrument. Hence, the LIV estimate of the regression parameter, obtained without using the observed instrument, is identical to the classical IV estimate that does require the existence of an observed instrument. We conclude that our ‘instrument-free’ approach can be successfully used to estimate regression parameters in the presence of regressor-error correlations, and to test for this dependency without first having to find valid instruments.
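To make the idea of a discrete latent instrument concrete, the sketch below writes down a log-likelihood for a two-category ($m = 2$) LIV-type model, with parameter names echoing those in the contents ($\pi$, $\lambda$, $\sigma^2_\nu$, $\sigma_{\varepsilon\nu}$). The specification is our assumption, not a transcription of chapter 3: $x_i = \pi_{z_i} + \nu_i$ with an unobserved category $z_i$ drawn with probabilities $\lambda$ and $1-\lambda$, $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, and $(\varepsilon_i, \nu_i)$ bivariate normal with covariance $\sigma_{\varepsilon\nu}$, so that $\sigma_{\varepsilon\nu} \neq 0$ makes $x_i$ endogenous. Under these assumptions $(y_i, x_i)$ follows a two-component mixture of bivariate normals. All function names are hypothetical, and the maximization step is not shown.

```python
import math
import random


def bvn_logpdf(dy, dx, var_y, var_x, cov_xy):
    """Log-density of a zero-mean bivariate normal evaluated at (dy, dx)."""
    det = var_y * var_x - cov_xy ** 2
    quad = (var_x * dy ** 2 - 2.0 * cov_xy * dy * dx + var_y * dx ** 2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def liv_loglik(y, x, beta0, beta1, pi, lam, s2e, s2v, sev):
    """Log-likelihood of an assumed two-category LIV-type model (hypothetical sketch).

    x_i = pi[z_i] + nu_i,  y_i = beta0 + beta1 * x_i + eps_i,
    z_i latent with P(z_i = 0) = lam, (eps_i, nu_i) bivariate normal with
    Var(eps) = s2e, Var(nu) = s2v, Cov(eps, nu) = sev (sev != 0: x endogenous).
    """
    # Moments of (y, x) within one latent category.
    var_x = s2v
    cov_xy = beta1 * s2v + sev
    var_y = beta1 ** 2 * s2v + 2.0 * beta1 * sev + s2e
    weights = (lam, 1.0 - lam)
    total = 0.0
    for yi, xi in zip(y, x):
        terms = [
            math.log(w) + bvn_logpdf(yi - beta0 - beta1 * p, xi - p, var_y, var_x, cov_xy)
            for w, p in zip(weights, pi)
        ]
        top = max(terms)  # log-sum-exp over the latent categories
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


def simulate_liv(n, beta0, beta1, pi, lam, s2e, s2v, sev, seed=0):
    """Draw (y, x) from the same assumed model; the latent z is not returned.

    Requires s2e > sev**2 / s2v so that the error covariance matrix is valid.
    """
    rng = random.Random(seed)
    sd_nu = math.sqrt(s2v)
    sd_rest = math.sqrt(s2e - sev ** 2 / s2v)
    y, x = [], []
    for _ in range(n):
        p = pi[0] if rng.random() < lam else pi[1]
        nu = rng.gauss(0.0, sd_nu)
        eps = (sev / s2v) * nu + rng.gauss(0.0, sd_rest)  # gives Cov(eps, nu) = sev
        xi = p + nu
        x.append(xi)
        y.append(beta0 + beta1 * xi + eps)
    return y, x


true = dict(beta0=1.0, beta1=2.0, pi=(-1.0, 1.0), lam=0.5, s2e=1.0, s2v=0.5, sev=0.4)
y, x = simulate_liv(3000, **true)
ll_true = liv_loglik(y, x, **true)
# A misspecified point that ignores the regressor-error covariance.
ll_wrong = liv_loglik(y, x, **{**true, "beta1": 2.5, "sev": 0.0})
print(f"log-likelihood at true values: {ll_true:.1f}, at wrong values: {ll_wrong:.1f}")
```

A full estimator would maximize `liv_loglik` over all parameters, for example with a numerical optimizer; within this assumed model, testing $\sigma_{\varepsilon\nu} = 0$ amounts to testing for regressor-error dependence.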
The method proposed in chapter 3 is extended in chapter 4 to more general settings. We extend the model to a situation where several exogenous regressors are available. Furthermore, we allow for the possibility that observed instrumental variables are available. Using techniques similar to those for the simpler LIV model, we prove that all model parameters can be identified. Importantly, it follows from this proof that the general LIV model remains identified even when the observed instruments have no effect, or only a very small effect, on the endogenous regressor. In such cases, the classical IV model is unidentified or weakly identified, respectively. This identifiability result suggests a straightforward approach, based on existing testing principles, to examine instrument weakness. By similar reasoning, it also suggests a straightforward test of instrument exogeneity (validity). To the best of our knowledge, tests that separately investigate the exogeneity and the weakness of each instrument have not appeared in the literature before. We illustrate both tests by means of a simulation study and show that the proposed tests have reasonable power under a variety of settings. In addition, we propose several diagnostics to complete an LIV analysis. We propose several statistics to choose the number of categories of the discrete LIV instrument. Furthermore, we examine the robustness of the LIV estimates to misspecification of the likelihood and suggest how to examine residuals. We adapt standard methods from regression models to detect outliers and influential observations.
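For comparison with the LIV-based weakness test described above, the sketch below computes the classical first-stage F statistic for a single observed instrument, which equals the squared t statistic of the instrument in the regression of $x$ on $z$. This is the conventional diagnostic, not the test proposed in chapter 4; a widely used rule of thumb treats F values below roughly 10 as a sign of a weak instrument. The simulated first stages and function names are hypothetical.

```python
import random
from statistics import fmean


def first_stage_f(z, x):
    """F statistic for H0: b = 0 in the first stage x = a + b*z + v (one instrument).

    With a single instrument this equals the squared t statistic of b.
    """
    n = len(z)
    mz, mx = fmean(z), fmean(x)
    szz = sum((zi - mz) ** 2 for zi in z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    r2 = szx ** 2 / (szz * sxx)  # R^2 of the first-stage regression
    return (n - 2) * r2 / (1.0 - r2)


def draw_first_stage(n, strength, seed):
    """Hypothetical first stage: x = strength * z + standard normal noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [strength * zi + rng.gauss(0.0, 1.0) for zi in z]
    return z, x


f_strong = first_stage_f(*draw_first_stage(500, strength=0.5, seed=2))
f_weak = first_stage_f(*draw_first_stage(500, strength=0.02, seed=3))
print(f"strong instrument: F = {f_strong:.1f}; weak instrument: F = {f_weak:.1f}")
```

The formula $F = (n-2)R^2/(1-R^2)$ is the usual regression F statistic with one restriction; it only speaks to relevance and says nothing about whether the instrument is exogenous.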
The proposed LIV model, tests, and diagnostics are applied in chapter 5. We examine the effect of education on income, where the variable ‘education’ is potentially endogenous due to omitted ‘ability’ or other causes. We review part of the schooling literature and discuss the problems associated with classical instrumental variables estimation. As will become clear, the classical IV method has produced less than satisfactory solutions for estimating the return to education. Importantly, researchers who use different sets of instruments arrive at different conclusions about the magnitude of the bias in the OLS estimate of the return to education. We examine three empirical datasets. In all three applications, we find an upward bias in the OLS estimates of approximately 7%. Our conclusions agree closely with recent results from studies of twins, which find an upward bias in OLS of about 10% (Card, 1999). Diagnostic evaluations demonstrate that the LIV method provides a satisfactory fit to the data. We also find that, for each of the three datasets, the classical IV estimates of the return to education point to biases in OLS that are not consistent in magnitude. The proposed diagnostics and tests of the validity of the available observed instruments indicate that in two of the three datasets the instruments used are potentially weak and/or endogenous. We conclude that LIV estimates are preferable to classical OLS and IV estimates for understanding the effect of education on income.
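A percentage bias of this kind can be read as the gap between the OLS and LIV estimates relative to the LIV estimate. This definition is an assumption, since the exact formula used in chapter 5 is not given here, and the numbers below are purely hypothetical, chosen only to show the arithmetic:

$$
\text{relative bias} = \frac{\hat\beta_{\mathrm{OLS}} - \hat\beta_{\mathrm{LIV}}}{\hat\beta_{\mathrm{LIV}}} \times 100\%
= \frac{0.080 - 0.075}{0.075} \times 100\% \approx 6.7\%.
$$

A positive value indicates that OLS overstates the return to education.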
In chapter 6 we consider endogeneity problems in multilevel models, i.e. when the data have a hierarchical structure. In such models, the explanatory variables are typically assumed to be independent of the random components at the various levels. However, in many applications this is an unrealistic assumption. When the same cross-sectional units are observed over time, for instance, or when data on siblings or twins are available, multilevel models may in fact be used to solve regressor-error correlations at a lower level. In this chapter we show that much care is required when relying on these methods in actual applications. First, we review methods that can be used to test for different types of dependencies between regressors and random effects. Second, we present results from Monte Carlo studies designed to investigate the performance of these methods. Finally, we discuss estimation methods that can be used when some, but not all, of the assumptions of independence between regressors and random effects are violated. Because current methods are limited in various ways, we also present a list of open problems and suggest solutions for some of them. As we will show, the independence of regressors and random effects has received some attention in the econometrics literature, but this important work has had little impact on current research practice in the social and behavioral sciences.
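The use of repeated observations described above can be illustrated with a simple two-level sketch of our own (not the model or code of chapter 6): a group-level random intercept `a_j` that is correlated with the regressor biases pooled OLS, whereas the within (fixed-effects) estimator, which subtracts group means from $x$ and $y$, removes `a_j`. The data-generating process, group sizes, and function names are hypothetical.

```python
import random


def simulate_two_level(groups, per_group, beta=2.0, seed=7):
    """Hypothetical data: y_ij = 1 + beta*x_ij + a_j + e_ij, with x_ij correlated with a_j."""
    rng = random.Random(seed)
    data = []  # (group id, x, y)
    for j in range(groups):
        a = rng.gauss(0.0, 1.0)                   # group-level random effect
        for _ in range(per_group):
            x = 0.8 * a + rng.gauss(0.0, 1.0)     # regressor depends on a_j
            y = 1.0 + beta * x + a + rng.gauss(0.0, 1.0)
            data.append((j, x, y))
    return data


def slope(xs, ys):
    """OLS slope of ys on xs (with intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_slope(data):
    """Within (fixed-effects) estimator: demean x and y inside each group."""
    totals = {}
    for j, x, y in data:
        sx, sy, n = totals.get(j, (0.0, 0.0, 0))
        totals[j] = (sx + x, sy + y, n + 1)
    xs = [x - totals[j][0] / totals[j][2] for j, x, _ in data]
    ys = [y - totals[j][1] / totals[j][2] for j, _, y in data]
    return slope(xs, ys)


data = simulate_two_level(groups=2000, per_group=5)
pooled = slope([x for _, x, _ in data], [y for _, _, y in data])
fe = within_slope(data)
print(f"pooled OLS slope: {pooled:.3f}, within slope: {fe:.3f} (true beta = 2)")
```

Adding each group's mean of $x$ as an extra regressor (the idea behind Mundlak's approach, listed in the contents) yields the same slope as the within estimator in this linear setting. Neither device helps when $x$ is correlated with the individual-level error `e_ij`, because that dependency survives the within transformation.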
In chapter 7 we take part of the results of chapter 6 a step further and develop nonparametric Bayesian methods (Dey, Müller and Sinha, 1998) to solve regressor-error dependencies at various levels of multilevel models. This method solves some of the problems addressed in chapter 6 and generalizes the standard LIV model in the sense that we do not impose restrictions (discreteness) on the distribution of the instruments. Instead, we let the data determine the appropriate distribution. This is an important advantage, as it does not require an a priori specification of the ‘right’ number of categories of the unobserved discrete instrument. Because we take full advantage of Bayesian estimation methods, the proposed model can readily be adapted and extended to more general and more complex model structures. Furthermore, insight into the small-sample properties of the estimates is more easily obtained, and inference does not rely on asymptotic results. This chapter is still work in progress and the results are preliminary, yet promising. We illustrate the potential usefulness of this approach to regressor-error dependencies and suggest steps for further research.
In chapter 8 we present a discussion of the proposed LIV method and the
results found. Furthermore, we present future research directions.
Chapter 2
Instrumental variables: a survey
2.1 Introduction and bias in OLS
The standard linear regression model $y = X\beta + \varepsilon$ is an important tool in (applied) statistical science to model the effect of a set of explanatory variables on a dependent variable. Here $y = (y_1, \ldots, y_n)'$ denotes the $n \times 1$ vector of observations on the dependent variable, $X \in \mathbb{R}^{n \times k}$ denotes the $n \times k$ matrix of observations on the explanatory variables (regressors), $\beta$ is the unknown $k \times 1$ vector of regression parameters and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ is an unobserved stochastic disturbance. For identifiability it is assumed that $\operatorname{rank} X = k < n$. Although the standard linear regression model is frequently used in cross-sectional applications, in many situations data have a hierarchical structure (see also chapter 6). For instance, when investigating how workplace characteristics affect a worker’s productivity, both workers and firms are units in the analysis. Similarly, hierarchical data arise in the context of panel data, when multiple observations are available on the ‘objects’ under study. This type of data is modeled through multilevel models, panel data models, or hierarchical linear models, which generalize the standard linear regression model (Judge et al., 1985, Wooldridge, 2002, Snijders and Bosker, 1999, Bryk and Raudenbush, 1992, or Greene, 2000).
An important assumption in these models is the independence of the explana-
tory variables (X) and the random error components (ε). In this case the re-
gressors are said to be ‘exogenous’ and are assumed to be determined outside
the model. Failure of this assumption may lead to biased or inconsistent es-
timates for the parameters of interest and therefore to wrong conclusions and
erroneous decision-making.
Unfortunately, in many situations the assumption of independence between regressors and errors is not satisfied. In this case the regressors are often said to be ‘endogenous’. Endogeneity can arise from a number of different sources: (1) relevant omitted variables, (2) measurement error in the regressors, (3) self-selection, (4) simultaneity, and (5) serially correlated errors in the presence of a lagged dependent variable in the set of regressors. Ruud (2000) shows that possibilities (2)-(5) can be viewed as special cases of (1). A similar argument is put forward by Wooldridge (2002), who notes that the distinction among the possible causes of regressor-error correlation is not always clear. Card (1999), for instance, argues that measurement error in the education variable¹ results in a downward bias of the effect of education on income, whereas omitted ability bias may result in a positive bias in OLS. Similarly, Nevo (2000) states that price endogeneity can be generated by a price-setting firm taking unobserved product attributes into account, or can be a result of the mechanics of the consumer’s optimization problem. These causes may reinforce or offset each other to an extent that depends on the empirical context. In the following subsections we briefly illustrate these causes and provide some references to empirical studies in labor economics, marketing, and industrial economics that are confronted with these problems.
2.1.1 Relevant omitted explanatory variables
Card (1999, 2001) and Uusitalo (1999), among others, consider the estimation of the causal effect of education on earnings, where ability is a typical omitted variable. Individuals with higher ability are potentially more successful on the labor market, earning higher wages, while these individuals may also acquire more education. As such, unobserved ability affects both education and earnings, causing a dependency between the regressor ‘education’ and the model error term (see also chapter 5).

¹ ‘Education’ is usually measured by ‘years of schooling’; see also chapter 5.
Marketing modelers are often faced with omitted variables. Wansbeek and Wedel (1999) argue that the assumption of exogenous regressors, including price, is a shortcoming of standard market response models². Shugan (2004) observes an increasing focus of reviewers on endogeneity. The lack of exogeneity of regressors due to the omission of ‘key’ aspects in marketing models is receiving increasing attention in marketing research. As store managers set the marketing mix variables (e.g. price or advertising variables), their decisions are based on (local) market information or product characteristics unknown to the researcher. This unobserved information may affect consumer behavior, which induces a correlation between the error term and the regressors, usually price, in a typical marketing model. Examples of unobserved local market information are competition, word-of-mouth effects, taste changes, local market shares, or coupon availability. For recent omitted variables studies in marketing, see e.g. Villas-Boas and Winer (1999), Chintagunta (2001), Nevo (2001), Petrin and Train (2002), or Vilcassim and Chintagunta (1995).
² Models that relate sales to marketing mix variables.

Following Judge et al. (1985), an omitted-variable model is given by

$$E(y_i \mid x_i, w_i) = x_i'\beta + w_i'\gamma, \qquad (2.1)$$

where the $w_i$'s are the latent or unobserved variables. Conditioning on the observable $x_i$ but omitting $w_i$ gives

$$E(y_i \mid x_i) = x_i'\beta + E(w_i' \mid x_i)\gamma, \qquad (2.2)$$

which differs from $x_i'\beta$ whenever (i) $E(w_i' \mid x_i) \neq 0$ (i.e. when omitted and included regressors are not orthogonal) and (ii) $\gamma \neq 0$ (i.e. when the omitted regressors are relevant). The resulting bias in the OLS estimator of $\beta$, conditional on $X$ and $W$, equals $E(\hat\beta_n^{OLS} - \beta) = \Pi\gamma$, whose magnitude and sign depend on both $\Pi = (X'X)^{-1}X'W$ and $\gamma$. As can be seen, in general all estimated coefficients in $\beta$ are affected by the omission of relevant explanatory variables.
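To make the bias expression concrete, the following minimal Python sketch simulates the omitted-variable model (2.1) with one included regressor $x_i$ and one omitted regressor $w_i$, assuming $E(w_i \mid x_i) = \rho x_i$; the parameter values and the helper names `ols_slope` and `simulate_omitted` are illustrative and not taken from the text. In this scalar case $\Pi$ reduces to the slope of a regression of $w$ on $x$, so the slope of the short regression should match $\beta + \hat\Pi\gamma$.

```python
import random


def ols_slope(x, y):
    """Slope of an OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_omitted(n=50_000, beta=1.0, gamma=2.0, rho=0.5, seed=1):
    """Return the OLS slope that omits w and the prediction beta + Pi_hat * gamma."""
    rng = random.Random(seed)
    x, w, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        wi = rho * xi + rng.gauss(0, 1)          # omitted regressor, correlated with x
        yi = beta * xi + gamma * wi + rng.gauss(0, 1)
        x.append(xi)
        w.append(wi)
        y.append(yi)
    b_short = ols_slope(x, y)                    # regression of y on x only
    pi_hat = ols_slope(x, w)                     # scalar analogue of (X'X)^{-1} X'W
    return b_short, beta + pi_hat * gamma


b_short, predicted = simulate_omitted()
print(f"OLS omitting w: {b_short:.3f}; beta + Pi*gamma: {predicted:.3f}")
```

With these settings both numbers should be close to $1 + 0.5 \cdot 2 = 2$, twice the true $\beta$; calling `simulate_omitted(gamma=0.0)` (an irrelevant omitted variable, so condition (ii) fails) brings the slope back to about $\beta = 1$.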
2.1.2 Measurement error
Measurement error in regressors arises when the variables specified in the regression model differ from the observed measures. This may arise, for instance, from method or instrument error, the absence of a ‘physical’ measure for the true construct (such as IQ, ability, perceptions, or ‘total price’ versus ‘money price’), or incorrectly aggregated and combined measures from different data sources (such as GDP, price inflation or productivity of employees). When the regressors used do not conform to the variables included in regression models, it is unlikely that they are independent of the random components.
As stated before, a good measure of education that corresponds to the qualities that employers are willing to pay for needs to be available when estimating the effect of education on income. It is common practice to use ‘years of schooling completed’ as a measure for ‘total education’. Apart from recall or recording errors in ‘years of schooling completed’, it can be questioned whether this measure fully represents education levels, because individuals may, for instance, educate themselves with evening courses or on-the-job training. Moreover, as most studies in labor economics rely on household interview data, all of the variables are subject to some error (Griliches, 1977). Even if the errors are small, their effect may be magnified if more variables are added in an attempt to control for, e.g., omitted ability bias³ (Card, 1999, 2001, or Griliches, 1977).
³ See also subsection 5.2.2.

Nevo (2000) and Sudhir (2001) argue that the price measure used in estimating aggregate logit demand to infer competition may be measured with error. The price variable used in these studies is often ‘list price’ or an aggregated price measure, whereas the model specification assumes that all consumers face the same product characteristics. However, if consumers face different prices in different stores, regions, or weeks, depending on the data, the price measure used exhibits measurement error. Ideally, the model would be estimated with transaction prices (cf. Sudhir, 2001). Bagozzi, Yi and Nassen (1999) explore measurement error in marketing research data. For instance, questionnaire items or rating scales that are used to measure perceptions, beliefs, attitudes, judgements, or other theoretical constructs are likely to reflect measurement error because of the absence of physical measures corresponding to these variables. Moreover, marketing research data may be subject to method errors such as halo effects⁴, interviewer effects, or social desirability distortions. Their findings suggest that measurement error in marketing data may be large and needs to be corrected for in empirical applications to improve decision making and inferences.
We illustrate the problem of measurement error in the simple bivariate case. Consider

$$y_i = \beta_0 + \beta_1 \chi_i + \varepsilon_i,$$

where $\chi_i$ is the ‘true’ unobserved construct. Instead, $x_i$ is observed, with $x_i = \chi_i + \nu_i$, $E(\varepsilon_i) = E(\nu_i) = 0$, $E(\varepsilon_i^2) = \sigma_\varepsilon^2 > 0$, $E(\nu_i^2) = \sigma_\nu^2 > 0$, and $E(\varepsilon_i\nu_i) = E(\chi_i\varepsilon_i) = E(\chi_i\nu_i) = 0$. These two equations can be combined, giving $y_i = \beta_0 + \beta_1 x_i + \tilde\varepsilon_i$ with $\tilde\varepsilon_i = \varepsilon_i - \beta_1\nu_i$. The OLS estimator of $\beta_1$ is biased towards zero, as $E(\tilde\varepsilon_i x_i) = -\beta_1\sigma_\nu^2 \neq 0$ (for $\beta_1 \neq 0$), which implies that $E(\tilde\varepsilon_i \mid x_i) \neq 0$. For more details, see e.g. Plat (1988), Wansbeek and Meijer (2000), or Carroll, Ruppert and Stefanski (1995).
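The attenuation can be quantified: under the assumptions above, $\operatorname{plim}\hat\beta_1^{OLS} = \beta_1\sigma_\chi^2/(\sigma_\chi^2 + \sigma_\nu^2)$ with $\sigma_\chi^2 = \operatorname{var}(\chi_i)$, a standard errors-in-variables result that follows from $\operatorname{cov}(x_i, y_i) = \beta_1\sigma_\chi^2$ and $\operatorname{var}(x_i) = \sigma_\chi^2 + \sigma_\nu^2$. The Python sketch below checks this by simulation, assuming normally distributed $\chi_i$, $\nu_i$ and $\varepsilon_i$; the parameter values and function names are hypothetical choices for illustration.

```python
import random


def ols_slope(x, y):
    """Slope of an OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def attenuation_demo(n=100_000, beta0=0.5, beta1=2.0, var_chi=1.0, var_nu=0.5, seed=7):
    """Return the OLS slope on the mismeasured x and its theoretical probability limit."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        chi = rng.gauss(0, var_chi ** 0.5)       # true, unobserved construct
        nu = rng.gauss(0, var_nu ** 0.5)         # measurement error
        eps = rng.gauss(0, 1)                    # equation error
        ys.append(beta0 + beta1 * chi + eps)
        xs.append(chi + nu)                      # observed, error-ridden regressor
    reliability = var_chi / (var_chi + var_nu)
    return ols_slope(xs, ys), beta1 * reliability


b1_hat, b1_limit = attenuation_demo()
print(f"OLS slope: {b1_hat:.3f}; attenuated limit: {b1_limit:.3f}; true beta1: 2.0")
```

Here the reliability ratio is $1/1.5$, so the OLS slope should settle near $4/3$ rather than the true value 2.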
⁴ A problem that arises in data collection when there is carry-over from one judgement to another (source: www.marketingpower.com).
2.1.3 Self-selection
The problem of self-selection arises when individuals tend to select themselves into a certain state, like union vs. non-union member (Vella and Verbeek, 1998), or treated vs. not treated (Angrist, Imbens and Rubin, 1996), on the basis of economic, or other, usually unknown, considerations. For instance, Angrist (1990) considers the effect of Vietnam veteran status on civilian income to investigate whether these veterans ought to be compensated by the US government for their possible loss of personal income caused by serving in the army. However, civilian earnings cannot simply be compared by Vietnam veteran status, because certain individuals with fewer civilian opportunities are more likely to enlist than others, and such individuals would have earned less income regardless of serving in the army.
Hamilton and Nickerson (2003) give an overview of endogenous decision-making in strategic management, where managers often choose between several competing organizational strategies not ‘randomly’ but based on expectations and experience. Similarly, data collected on the internet may suffer from self-selection. Certain individuals are more likely to be on the internet and are therefore more likely to fill in the web survey, to click on the web pages or to purchase products online. If these unobserved individual characteristics also influence web behavior, preferences or perceptions⁵, then part of the effect of these latent characteristics is falsely attributed to internet usage. One could argue that these individuals would have reacted differently regardless of their frequency of being on the internet. These issues are important, for instance, when investigating purchase quantity decisions in online stores versus brick-and-mortar (“offline”) stores, whether or not to buy a certain product category, or whether or not to buy a certain brand, given category and shopping environment.
⁵ Or our phenomenon under study.

To illustrate, a simple self-selection model is given by

$$y_i = \begin{cases} x_i'(\beta + \delta) + \varepsilon_i & \text{if } i \in I, \\ x_i'\beta + \varepsilon_i & \text{if } i \in II, \end{cases}$$

where I and II denote a certain state (e.g. treated vs. non-treated, or web user versus non-web user). More compactly,

$$y_i = x_i'\beta + d_i x_i'\delta + \varepsilon_i,$$

with $d_i = 1$ if $i \in I$ and $d_i = 0$ otherwise. From this representation it can be seen that $d_i$ acts as a dummy regressor (interacted with $x_i$), and standard estimation fails when $E(\varepsilon_i \mid d_i) \neq 0$. This condition is plausibly met in the examples given in the previous paragraph, because the unobserved characteristics that drive selection into state I also enter $\varepsilon_i$. For more details on self-selection problems, see e.g. Vella (1998), Wooldridge (2002), or Bowden and Turkington (1984).
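A small simulation can illustrate why $E(\varepsilon_i \mid d_i) \neq 0$ breaks OLS. The Python sketch below uses a minimal, hypothetical setup: $x_i = 1$ (intercept only), units select into state I when an unobserved $u_i > 0$, and $\varepsilon_i = \rho u_i + \sqrt{1-\rho^2}\,e_i$ with independent standard normal $u_i$ and $e_i$. OLS of $y_i$ on $(1, d_i)$ then equals the difference in group means, which converges to $\delta + \rho\,[E(u \mid u > 0) - E(u \mid u \le 0)] = \delta + 2\rho\sqrt{2/\pi}$ rather than to $\delta$. Names and parameter values are illustrative.

```python
import math
import random


def selection_demo(n=100_000, beta=1.0, delta=1.0, rho=0.5, seed=3):
    """OLS estimate of delta (difference in group means) under self-selection on u."""
    rng = random.Random(seed)
    state_one, state_two = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                                   # unobserved selection driver
        eps = rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        d = 1 if u > 0 else 0                                 # unit selects into state I
        y = beta + d * delta + eps                            # model with x_i = 1
        (state_one if d else state_two).append(y)
    # With x_i = 1, OLS of y on (1, d) gives the difference in group means.
    return sum(state_one) / len(state_one) - sum(state_two) / len(state_two)


delta_hat = selection_demo()
delta_limit = 1.0 + 2 * 0.5 * math.sqrt(2 / math.pi)       # about 1.80, not 1.0
print(f"OLS delta: {delta_hat:.3f}; theoretical limit: {delta_limit:.3f}")
```

With `rho=0.0` selection is unrelated to $\varepsilon_i$ and the estimate returns to $\delta = 1$.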
2.1.4 The simultaneous equation model
Ordinary (or hierarchical) regression analysis is not appropriate when the right-hand-side variables are determined simultaneously with the dependent variable. However, it is often hard to rule out such feedback loops. Examples are an economic agent making choices regarding education or labor market participation (Card, 1999, 2001), or the price-setting behavior of firms interacting with competitors. Several studies consider simultaneity in prices and demand for markets with differentiated products, given a structure for competition. Price setting by firms that respond to, e.g., unobserved product characteristics such as coupon availability, national advertising, shelf-space (al)location, and other retail environment characteristics, or to competitors’ (re)actions, causes endogeneity. Berry’s (1994) approach to price endogeneity in aggregate models using instrumental variables has been widely applied and adapted. For instance, Nevo (2001) estimates a structural demand-supply model for the ready-to-eat cereal industry; Besanko, Gupta and Jain (1998) consider a scanner-data application for two categories, yoghurt and catsup; Berry, Levinsohn and Pakes (1995) and Sudhir (2001) develop a market equilibrium model with competitive pricing for an automobile market to investigate automobile pricing and competition. Using a simpler model for demand, Gasmi, Laffont and Vuong (1992) model collusive behavior on price and advertising in a soft-drink market.
A simple supply and demand model for a product or good is given by

$$y_t^d = (x_t^d)'\beta^d + \gamma^d p_t + \varepsilon_t^d,$$
$$y_t^s = (x_t^s)'\beta^s + \gamma^s p_t + \varepsilon_t^s,$$

where the variables in $x_t^d$ are factors that affect the demand or behavior of consumers, whereas the variables in $x_t^s$ only influence the behavior of producers. The price $p_t$ is determined such that $y_t^d = y_t^s = y_t$. When the demand equation $y_t^d = (x_t^d)'\beta^d + \gamma^d p_t + \varepsilon_t^d$ is estimated, it cannot be assumed that $E(\varepsilon_t^d \mid p_t) = 0$, because price is determined simultaneously with the quantity demanded: unobserved positive shocks in demand or competitor (re)actions shift the demand curve upward, implying a higher equilibrium price (ceteris paribus) (van der Ploeg, 1997, or Asher, 1983). In this case, OLS cannot be used to estimate the parameters of the demand equation. For more technical details, see e.g. Judge et al. (1985) or Davidson and MacKinnon (1993).
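The simultaneity bias can be reproduced with a hypothetical market in which $x_t^s$ contains a single cost shifter $z_t$ and all other covariates are dropped (assumed values: $\gamma^d = -1$, $\gamma^s = 1$, standard normal shocks). Solving $y_t^d = y_t^s$ for $p_t$ makes the price depend on $\varepsilon_t^d$, so the OLS slope of $y_t$ on $p_t$ does not recover $\gamma^d$. Because $z_t$ shifts supply but is excluded from the demand equation, the ratio $\operatorname{cov}(z_t, y_t)/\operatorname{cov}(z_t, p_t)$ does recover $\gamma^d$; this anticipates the IV approach of Section 2.2. The Python sketch is illustrative only, and its names are hypothetical.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def market_demo(n=100_000, gamma_d=-1.0, gamma_s=1.0, b_s=-1.0, a_d=10.0, a_s=0.0, seed=11):
    """Simulate equilibrium (p_t, y_t); return the OLS and IV estimates of gamma_d."""
    rng = random.Random(seed)
    p, y, z = [], [], []
    for _ in range(n):
        zt = rng.gauss(0, 1)                       # cost shifter: enters supply only
        eps_d, eps_s = rng.gauss(0, 1), rng.gauss(0, 1)
        # Market clearing y^d = y^s, solved for the equilibrium price.
        pt = (a_d - a_s - b_s * zt + eps_d - eps_s) / (gamma_s - gamma_d)
        p.append(pt)
        y.append(a_d + gamma_d * pt + eps_d)       # observed quantity lies on the demand curve
        z.append(zt)
    ols = sample_cov(p, y) / sample_cov(p, p)
    iv = sample_cov(z, y) / sample_cov(z, p)
    return ols, iv


ols_gamma, iv_gamma = market_demo()
print(f"OLS: {ols_gamma:.3f}; IV with cost shifter: {iv_gamma:.3f}; true gamma_d: -1.0")
```

With these values $\operatorname{var}(p_t) = 0.75$ and $\operatorname{cov}(p_t, \varepsilon_t^d) = 0.5$, so the OLS slope tends to $-1 + 0.5/0.75 = -1/3$, while the IV ratio tends to $-1$.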
2.1.5 Lagged dependent variables
The presence of lagged dependent variables in the set of regressors violates the exogeneity assumption when serial correlation is present. It is well known that OLS should then not be used; see, for instance, White (2001). Consider

$$y_t = x_t'\beta_1 + y_{t-1}\beta_2 + \varepsilon_t, \qquad \varepsilon_t = \phi\varepsilon_{t-1} + v_t, \qquad (2.3)$$

where, e.g., $y_t$ are the sales at time $t$, $x_t$ are promotional activities at time $t$, and $y_{t-1}$ is included to represent lagged effects of promotional activities held in the past. Suppose that the $v_t$ are i.i.d., $|\phi| < 1$, $|\beta_2| < 1$, $E(v_t) = 0$, $v_t$ is independent of $y_{t-1}$ and $x_t$, and assume that all second-order moments exist. Now $\varepsilon_t y_{t-1} = \phi\varepsilon_{t-1}y_{t-1} + v_t y_{t-1}$, so that $E(\varepsilon_t y_{t-1}) = \phi E(\varepsilon_{t-1}y_{t-1})$. Furthermore, $E(\varepsilon_t y_t) = x_t'\beta_1 E(\varepsilon_t) + \beta_2 E(y_{t-1}\varepsilon_t) + \operatorname{var}(\varepsilon_t)$. Using stationarity, so that $E(\varepsilon_{t-1}y_{t-1}) = E(\varepsilon_t y_t)$, it follows that

$$E(y_{t-1}\varepsilon_t) = \frac{\phi}{1 - \phi\beta_2}\operatorname{var}(\varepsilon_t),$$

and $E(\varepsilon_t \mid y_{t-1}) \neq 0$ unless $\phi = 0$. Davidson and MacKinnon (1993, p. 681) make a stronger statement and argue that the OLS estimator is biased in all models when lagged dependent variables are present (yet consistent when $\phi = 0$).
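The moment formula can be checked numerically. The Python sketch below simulates (2.3) under the stated assumptions, with the hypothetical simplification $x_t'\beta_1 = 0$ so that both $y_t$ and $\varepsilon_t$ are stationary with mean zero, and compares the sample average of $y_{t-1}\varepsilon_t$ with $\phi\operatorname{var}(\varepsilon_t)/(1 - \phi\beta_2)$, using $\operatorname{var}(\varepsilon_t) = \sigma_v^2/(1 - \phi^2)$ for a stationary AR(1) process. Parameter values and names are illustrative.

```python
import random


def lagged_dv_demo(n=200_000, phi=0.5, beta2=0.5, sigma_v=1.0, burn_in=1_000, seed=5):
    """Return the sample mean of y_{t-1} * eps_t and its theoretical value."""
    rng = random.Random(seed)
    y_prev, eps_prev = 0.0, 0.0
    total = 0.0
    for t in range(n + burn_in):
        eps = phi * eps_prev + rng.gauss(0, sigma_v)    # AR(1) disturbance
        y = beta2 * y_prev + eps                         # x_t' beta_1 set to zero
        if t >= burn_in:                                 # skip start-up transients
            total += y_prev * eps                        # accumulate y_{t-1} * eps_t
        y_prev, eps_prev = y, eps
    var_eps = sigma_v ** 2 / (1 - phi ** 2)              # stationary AR(1) variance
    theory = phi / (1 - phi * beta2) * var_eps
    return total / n, theory


sample_moment, theory = lagged_dv_demo()
print(f"sample E(y_(t-1) eps_t): {sample_moment:.3f}; formula: {theory:.3f}")
```

For $\phi = \beta_2 = 0.5$ the formula gives about 0.889; rerunning with `phi=0.0` should give a sample moment close to zero, in line with the text.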
In certain situations explanatory variables may ‘act’ as lagged dependent variables, which can easily be overlooked. This is illustrated by Gönül, Kim and Shi (2000), who examine the effect of sending out catalogues on the probability of buying products from that catalogue. The mailing variable and other customer-targeted promotional activities are often functions of past sales, which implicitly introduces problems of the nature described above.
2.1.6 Bias in OLS when $E(\varepsilon \mid X) \neq 0$
From the preceding subsections it can be concluded that regressor-error dependencies may exist in many different applications. It follows immediately that the OLS estimator, given by $\hat\beta_n^{OLS} = \beta + (X'X)^{-1}X'\varepsilon$, where $E(\varepsilon) = 0$, is biased when $E(\varepsilon \mid X) \neq 0$, and it loses its attractiveness as an estimator. Similarly, in the absence of heteroscedasticity and autocorrelation, the usual degrees-of-freedom-corrected estimator of the error variance based on the OLS residuals is unbiased when $E(\varepsilon \mid X) = 0$; see e.g. Verbeek (2000, p. 19). Otherwise the true value can be expected to be underestimated, since, on average, conditioning reduces the variance of the variable subject to the conditioning (cf. Greene, 2000, p. 81).

Unfortunately, the bias in the OLS estimates does not shrink as the sample size grows. More specifically, the OLS estimates are inconsistent, with $\operatorname{plim} \hat\beta_n^{OLS} \neq \beta$ and $\operatorname{plim} \hat\sigma_{n,OLS}^2 < \sigma^2$, but this inconsistency can be removed, at least in large samples, by using instrumental variables (White, 2001, or Ferguson, 1996). The instrumental variables (IV) approach is discussed next.
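As a hypothetical numerical illustration of both inconsistencies, let $x_i = v_i$ and $\varepsilon_i = \rho v_i + \sqrt{1-\rho^2}\,e_i$ with independent standard normal $v_i$ and $e_i$, so that $\sigma^2 = \operatorname{var}(\varepsilon_i) = 1$ and $\operatorname{cov}(x_i, \varepsilon_i) = \rho$. Then $\operatorname{plim}\hat\beta^{OLS} = \beta + \rho$ and $\operatorname{plim}\hat\sigma^2_{OLS} = 1 - \rho^2 < \sigma^2$: the part of $\varepsilon$ that moves with $x$ is attributed to the slope and disappears from the residuals. The Python sketch below checks this for $\rho = 0.6$; these values and names are assumptions for illustration, not taken from the text.

```python
import random


def ols_with_endogeneity(n=100_000, beta=1.0, rho=0.6, seed=2):
    """OLS slope and degrees-of-freedom corrected error variance when cov(x, eps) = rho."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        v = rng.gauss(0, 1)
        eps = rho * v + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)   # var(eps) = 1
        x.append(v)                                             # regressor correlated with eps
        y.append(beta * v + eps)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    ssr = sum((c - a0 - b * a) ** 2 for a, c in zip(x, y))
    return b, ssr / (n - 2)                                     # s^2 with k = 2 parameters


b_hat, s2_hat = ols_with_endogeneity()
print(f"OLS slope: {b_hat:.3f} (true 1.0); s^2: {s2_hat:.3f} (true sigma^2 = 1.0)")
```

Both estimates settle near $1.6$ and $0.64$ respectively, and increasing `n` does not move them toward the true values.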
2.2 The IV approach
The instrumental variable (IV) method assumes that a set of variables $Z$, called instrumental variables, is available. These instruments should be uncorrelated with the error term $\varepsilon$, i.e. $E(\varepsilon \mid Z) = 0$, and explain part of the variability in the endogenous regressors. This implies that the instruments $Z$ cannot have a direct effect on $y$ (the instruments $Z$ are ‘exogenous’). The standard IV regression model is obtained by augmenting the standard linear regression model with a model for the endogenous regressors and the instruments, namely

$$y = X\beta + \varepsilon, \qquad X = Z\Pi + V, \qquad (2.4)$$

where $y$, $X$, and $\beta$ are defined as before, $Z$ is an $n \times q$ matrix containing the instrumental variables, and $V$ is an $n \times k$ matrix containing the error terms. The matrix $\Pi$ represents the effect of the instruments on the endogenous regressors. The exogenous variables in $X$ are assumed to appear in $Z$ as well and should not be omitted (Wooldridge, 2002). For identifiability it is assumed that $q \geq k$ and $\operatorname{rank} Z = q < n$. The correlation between $X$ and $\varepsilon$, i.e. the degree of endogeneity, arises from nonzero covariances between $\varepsilon$ and $V$. The errors are assumed to have mean zero. It can be seen from (2.4) that the endogenous regressors are ‘split’ into an exogenous part and an endogenous part. This IV model is a special case of the simultaneous equation model (SEM), which is well known in econometrics. The most common estimators for $\beta$ are the 2SLS estimator (or a method of moments estimator) and the limited information maximum likelihood (LIML) estimator, which is in fact the maximum likelihood estimator of (2.4). 2SLS is most frequently used because it is available in many standard computing packages.
Once instruments are available, the IV estimator is given by

$$\hat\beta_n^{IV} = (X'P_Z X)^{-1}X'P_Z y, \qquad (2.5)$$

where $P_Z = Z(Z'Z)^{-1}Z'$. It is consistent and approximately normally distributed for large $n$ when (i) $\operatorname{plim}(1/n)Z'\varepsilon = 0$, and (ii) both $\operatorname{plim}(1/n)Z'Z$ and $\operatorname{plim}(1/n)Z'X$ exist and have full column rank. Unbiasedness of the IV estimator is discussed in the next subsection. One often relies on large-sample analysis in examining this estimator because its expected value does not exist when the number of instruments equals the number of explanatory variables (cf. Wooldridge, 2002, p. 101). Standard inferential procedures can be employed to learn about the model parameters or to test hypotheses (Bowden and Turkington, 1984, or White, 2001). The maximum likelihood (LIML) estimator can be computed with a little more effort and, provided that the instruments are not too weak, the 2SLS and LIML estimators have the same asymptotic properties (Davidson and MacKinnon, 1993, van der Ploeg, 1997, Kleibergen and Zivot, 2003).
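Formula (2.5) can be computed as two OLS steps, because $P_Z X$ contains the fitted values from regressing each column of $X$ on $Z$, and $(X'P_Z X)^{-1}X'P_Z y = ((P_Z X)'(P_Z X))^{-1}(P_Z X)'y$ since $P_Z$ is symmetric and idempotent. The dependency-free Python sketch below does exactly that; the data-generating values (one endogenous regressor, a constant, and two instruments $z_1$, $z_2$) and all function names are hypothetical. As required above, the exogenous constant in $X$ also appears in $Z$.

```python
import random


def solve(a, b):
    """Solve a s = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]          # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):                            # back substitution
        s[r] = (m[r][n] - sum(m[r][c] * s[c] for c in range(r + 1, n))) / m[r][r]
    return s


def ols(X, y):
    """OLS coefficients (X'X)^{-1} X'y, with X given as a list of rows."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def two_sls(y, X, Z):
    """2SLS as in (2.5): OLS of y on the first-stage fitted values P_Z X."""
    fitted_cols = []
    for j in range(len(X[0])):
        pi_j = ols(Z, [row[j] for row in X])                  # first stage for column j
        fitted_cols.append([sum(p * zv for p, zv in zip(pi_j, zrow)) for zrow in Z])
    X_hat = [list(r) for r in zip(*fitted_cols)]              # rows of P_Z X
    return ols(X_hat, y)


def iv_demo(n=20_000, seed=4):
    rng = random.Random(seed)
    X, Z, y = [], [], []
    for _ in range(n):
        z1, z2, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        eps = 0.8 * v + 0.6 * rng.gauss(0, 1)                # cov(eps, V) != 0
        x = 0.5 * z1 + 0.5 * z2 + v                          # X = Z Pi + V
        X.append([1.0, x])
        Z.append([1.0, z1, z2])                              # constant is its own instrument
        y.append(1.0 + 2.0 * x + eps)                        # true beta = (1, 2)
    return ols(X, y), two_sls(y, X, Z)


b_ols, b_2sls = iv_demo()
print("OLS:", [round(b, 3) for b in b_ols], "2SLS:", [round(b, 3) for b in b_2sls])
```

Because the instruments here are strong, the 2SLS slope should lie near 2, while the OLS slope drifts to about $2 + 0.8/1.5 \approx 2.53$. In practice one would use a linear-algebra library rather than the hand-written `solve`, which is included only to keep the sketch self-contained.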
2.2.1 Considerations when using Instrumental Variables
The problem in empirical applications is how or where to find ‘valid’ instruments. In general, there are no clear guidelines, and instruments may not be easy to obtain. Moreover, it can be very expensive to obtain additional data. As such, instruments are often chosen on ad hoc grounds or even on the basis of availability, resulting in potentially invalid instruments. The condition $E(\varepsilon \mid Z) = 0$ requires that there is no direct association between the instruments and the dependent variable (other than through $X$), which is debatable in many empirical situations.
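The role of the condition $E(\varepsilon \mid Z) = 0$ can be seen in a hypothetical just-identified example with one regressor and one instrument, in which $y$ may also depend directly on $z$ through a coefficient $\alpha$. For a single instrument the IV estimator reduces to $\operatorname{cov}(z, y)/\operatorname{cov}(z, x)$, whose probability limit is $\beta + \alpha/\pi$ when $x = \pi z + v$; a direct effect of the instrument thus contaminates the estimate no matter how large the sample. The Python sketch below uses illustrative parameter values and names.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def iv_with_direct_effect(alpha, n=100_000, beta=1.0, pi=0.5, seed=9):
    """Single-instrument IV estimate when z may also affect y directly (alpha != 0)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi, vi = rng.gauss(0, 1), rng.gauss(0, 1)
        eps = 0.5 * vi + rng.gauss(0, 1)         # x is endogenous through v
        xi = pi * zi + vi
        # alpha is a direct path from z to y, so the composite error alpha*z + eps
        # is correlated with z and the exclusion condition fails.
        y.append(beta * xi + alpha * zi + eps)
        z.append(zi)
        x.append(xi)
    return sample_cov(z, y) / sample_cov(z, x)


valid_iv = iv_with_direct_effect(alpha=0.0)
invalid_iv = iv_with_direct_effect(alpha=0.2)
print(f"valid instrument: {valid_iv:.3f}; direct effect alpha=0.2: {invalid_iv:.3f}")
```

Even a modest direct effect ($\alpha = 0.2$) shifts the estimate from about 1 to about $1 + 0.2/0.5 = 1.4$, and the distortion grows as the instrument gets weaker (smaller $\pi$).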
Wooldridge (2002, p. 88), for instance, discusses the (in)validity of the draft lottery number instrument used in Angrist (1990) to estimate the effect of Vietnam veteran status on personal income. Although the draft lottery number appears to be random, individuals who are more likely to be drafted may choose to obtain more education to increase the chance of obtaining a draft postponement, or employers may be more willing to invest in educating and training individuals who are unlikely to be drafted. Bound, Jaeger and Baker (1995) question the exogeneity of the quarter of birth instruments used by Angrist and Krueger (1991), who estimate the effect of schooling on income. They present evidence of a weak correlation between quarter of birth and wages, independent of the effect of quarter of birth on education, that is sufficiently strong to affect the IV results. Card (1999, 2001) provides more extensive summaries of debates on the validity of family background variables, like parental education, and institutional features of the schooling system, like the presence of a nearby college, as instruments for the endogenous regressor schooling; see also chapter 5. In estimating demand, lagged prices or promotional variables are often used as instruments in marketing response models, but these are not valid, for instance, when reference prices exist⁶ and are historically formed (cf. Bronnenberg and Mahajan, 2001). Yang, Chen and Allenby (2003) note that lagged prices may not be appropriate for reasons such as forward buying and stockpiling. Moreover, treating lagged variables as ‘exogenous’ is itself a potential source of endogeneity (see also Arellano, 2002, p. 455). Nevo (2001) used price data from other markets as instrumental variables for price, but notes that these instruments are invalid when common (national) demand shocks occur, or when advertising or promotion activities are coordinated across markets. This is more likely when the same manufacturer or retailer is active in several markets. Although cost drivers may be potential instrumental variables for price, Nevo (2000, p. 546) concludes that these are rarely observed, while proxies for cost usually do not exhibit sufficient variation.
⁶ The reference price is the ‘expected price’ of a product. Several studies have found asymmetric effects when the perceived price differs from the reference price. The effect of the reference price on demand depends (among other things) on the convenience during the buying process, on the familiarity of the brand, and on the type of store where the product is bought (Leeflang, 1994) (in Dutch).

Exogeneity of instruments is only one of the two criteria for an instrument to be valid; in addition, available instruments may be weak in the sense that they are poorly correlated with the endogenous regressors. Stock, Wright and Yogo (2002) state: “Empirical researchers often confront weak instruments. Finding exogenous instruments is hard work, and the features that make an IV plausibly exogenous [...] can also work to make the instrument weak”. Unfortunately, the statistical properties of IV estimators, and of inferential procedures based on them, turn out to be sensitive to the choice and validity of the instruments, even for large sample sizes. Consequently, researchers who study the same substantive question but use different instruments may reach different conclusions. In the following we review some recent results on the problem of weak instruments that have appeared in the econometric literature. Most of the following discussion is developed for the linear (non-hierarchical) regression model for cross-sectional studies; see Stock, Wright and Yogo (2002), and Hahn and Hausman (2003) for more details⁷.
Weak instruments
Recent results in the econometric literature have shown that the presence of weak instruments not only reduces the precision of the estimates, but may also lead to biased and inconsistent estimates, with biases potentially larger than those of OLS. Furthermore, standard asymptotic approximations break down (Staiger and Stock, 1997, Bound, Jaeger and Baker, 1995, Hahn and Hausman, 2002, or Kleibergen and Zivot, 2003). As a consequence, standard hypothesis tests and confidence intervals are unreliable. Weak instruments may arise when the instruments have little explanatory power for the endogenous regressors or when the number of instruments is large (cf. Hahn and Hausman, 2002, 2003). In the following we discuss three potential pitfalls of IV estimation in the presence of weak instruments: (1) the finite-sample bias of 2SLS, (2) situations where the instruments are potentially correlated with $\varepsilon$, i.e. they are not exogenous, and (3) the poor asymptotic approximation to the sampling distribution of IV estimators.
In finite samples the IV estimator (2SLS) is biased in the same direction as
OLS. This fact is often unnoted in empirical studies. Even when E(ε|Z) = 0,
7In a survey article on Sargan’s work on instrumental variables estimation, Arellano (2002)observes that “Many of the themes [on instrumental variables estimation] that appeared [...] inthe econometrics literature of the 1980s and 1990s were presented in a surprisingly mature wayin Sargan’s 1958 and 1959 articles”.
![Page 34: University of Groningen Latent instrumental variables ... · My family and friends have always been very supportive, which is essential to me, and for which I cannot thank them enough](https://reader036.vdocuments.us/reader036/viewer/2022071016/5fcf99aee4ce126723624333/html5/thumbnails/34.jpg)
20 Chapter 2 Instrumental variables: a survey
$\hat\beta^{IV}_n = \beta + (X'P_ZX)^{-1}X'P_Z\varepsilon$ is, in general, biased, since $E[(X'P_ZX)^{-1}X'P_Z\varepsilon] \neq 0$. This bias arises because the coefficients $\Pi$ in (2.4) are not observed. If we had observed $Z\Pi$, an OLS regression of $y$ on $Z\Pi$ would be unbiased. Instead, an estimate of $\Pi$ has to be obtained from a regression of $X$ on $Z$. Hahn and Hausman (2002, 2003), Buse (1992), Stock, Wright and Yogo (2002), and Bound, Jaeger and Baker (1995) show that this finite sample bias⁸ is a function of (among other things) the number of instruments, which suggests that augmenting the set of instruments increases the bias of the estimator. However, as Buse (1992) shows, the bias only becomes proportionally larger when the number of instruments grows faster than the explained variance of the endogenous regressors. As a consequence, adding important or strong instruments does not necessarily increase the bias; adding less important instruments, or having weak instruments, will, however, undoubtedly lead to more biased results. Bound,
Jaeger and Baker (1995) and Hahn and Hausman (2003) show that the bias is inversely related to the $F$-statistic (Fisher statistic) of the regression of the endogenous explanatory variable on the instruments. These results suggest that the (partial) $R^2$ and $F$-statistic of the first-stage regression (i.e. the regression of $X$ on $Z$ in (2.4)) are useful as rough guides to the quality of IV estimates and should routinely be reported (cf. Bound, Jaeger and Baker, 1995). The LIML estimator is known to have no finite moments and thicker tails than 2SLS, but it is generally less sensitive to the addition of superfluous instruments (cf. Kleibergen and Zivot, 2003). Nevertheless, when the IVs are weak, even LIML may not solve the problem (cf. Hahn and Hausman, 2003).
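To make the finite-sample bias argument concrete, the following Monte Carlo sketch (Python with NumPy; the design, parameter values and function names are our own illustrative assumptions, not taken from the studies cited) draws data with one endogenous regressor, computes 2SLS and the first-stage $F$-statistic, and compares three weak but relevant instruments with the same three instruments augmented by twenty irrelevant ones. Under this design one would expect the mean 2SLS bias to move toward the OLS bias, and the mean first-stage $F$ to fall, once the irrelevant instruments are added.

```python
import numpy as np

def tsls(y, x, Z):
    """2SLS slope for one endogenous regressor (mean-zero data, no intercept)."""
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first-stage fit P_Z x
    return (x_hat @ y) / (x_hat @ x)                   # (x'P_Z x)^{-1} x'P_Z y

def first_stage_f(x, Z):
    """F-statistic for H0: all first-stage coefficients are zero (no intercept)."""
    n, q = Z.shape
    fit = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    resid = x - fit
    return (fit @ fit / q) / (resid @ resid / (n - q))

def simulate(n_relevant, n_irrelevant, pi=0.15, n=200, beta=1.0, rho=0.8,
             reps=2000, seed=0):
    """Mean 2SLS bias and mean first-stage F over Monte Carlo replications.

    Hypothetical design: x = Z pi + v, y = beta x + eps, corr(v, eps) = rho,
    with n_relevant instruments of strength pi and n_irrelevant pure-noise ones.
    """
    rng = np.random.default_rng(seed)
    q = n_relevant + n_irrelevant
    coefs = np.r_[np.full(n_relevant, pi), np.zeros(n_irrelevant)]
    bias, fstat = [], []
    for _ in range(reps):
        Z = rng.standard_normal((n, q))
        v = rng.standard_normal(n)
        eps = rho * v + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        x = Z @ coefs + v
        y = beta * x + eps
        bias.append(tsls(y, x, Z) - beta)
        fstat.append(first_stage_f(x, Z))
    return np.mean(bias), np.mean(fstat)

bias_few, f_few = simulate(3, 0)     # three weak but relevant instruments
bias_many, f_many = simulate(3, 20)  # the same three plus twenty irrelevant ones
print(f"3 instruments:  mean bias {bias_few:.3f}, mean first-stage F {f_few:.1f}")
print(f"23 instruments: mean bias {bias_many:.3f}, mean first-stage F {f_many:.1f}")
```

The exact numbers depend on the seed and the design; the sketch only illustrates the direction of the effects discussed above.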
A second problem associated with weak instruments is that the IV estimator can be more inconsistent than OLS when the instrument is correlated with $\varepsilon$, i.e. when it is itself endogenous. Bound, Jaeger and Baker (1995) show that the inconsistency of IV relative to OLS equals (assuming for simplicity that $k = q = 1$)
⁸ They find that, for one endogenous regressor, the expectation does not exist when only one instrument is available. See also Wooldridge (2002), who states that the number of moments that exist equals the number of overidentifying restrictions.
2.2 The IV approach 21
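$$\frac{\operatorname{plim} \hat\beta^{IV}_n - \beta}{\operatorname{plim} \hat\beta^{OLS}_n - \beta} = \frac{\rho_{z,\varepsilon}/\rho_{x,\varepsilon}}{\rho_{x,z}},$$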
where $\rho_{x,z}$ denotes the correlation between $x$ and $z$, the other terms being defined similarly. When the instrument is weak, $\rho_{x,z} \to 0$, so even a small correlation between $z$ and $\varepsilon$ can produce a large relative inconsistency in the IV estimator, making the inconsistency of IV potentially larger than that of OLS⁹.
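The relative-inconsistency formula can be checked with a short calculation. The sketch below (an illustration under our own assumptions: jointly normal $z$, $x$ and $\varepsilon$ with unit variances, no intercept) evaluates the closed form and compares it with a large simulated sample. With $\rho_{x,z} = 0.1$, $\rho_{z,\varepsilon} = 0.05$ and $\rho_{x,\varepsilon} = 0.3$, the formula gives $(0.05/0.3)/0.1 \approx 1.67$: an instrument whose correlation with the error is only one sixth of that of the regressor still leaves IV about two-thirds more inconsistent than OLS.

```python
import numpy as np

def relative_inconsistency(rho_xz, rho_ze, rho_xe):
    """(plim b_IV - beta) / (plim b_OLS - beta) for k = q = 1."""
    return (rho_ze / rho_xe) / rho_xz

def simulated_ratio(rho_xz, rho_ze, rho_xe, n=1_000_000, beta=1.0, seed=22):
    """Large-sample check with (z, x, eps) jointly normal, unit variances, mean zero."""
    rng = np.random.default_rng(seed)
    corr = np.array([[1.0, rho_xz, rho_ze],
                     [rho_xz, 1.0, rho_xe],
                     [rho_ze, rho_xe, 1.0]])          # variable order: z, x, eps
    z, x, eps = rng.multivariate_normal(np.zeros(3), corr, size=n).T
    y = beta * x + eps
    b_iv = (z @ y) / (z @ x)
    b_ols = (x @ y) / (x @ x)
    return (b_iv - beta) / (b_ols - beta)

formula = relative_inconsistency(0.1, 0.05, 0.3)
simulated = simulated_ratio(0.1, 0.05, 0.3)
print(f"formula {formula:.3f}, simulation {simulated:.3f}")
```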
Thirdly, if the instruments are weak, classical (first-order) asymptotic approximations are poor, even in large samples. This is illustrated by (among others) Nelson and Startz (1990). As they note, conventional wisdom suggests that when the instruments are weak, the classical asymptotic variance matrix will be large and the asymptotic distribution of $\hat\beta^{IV}_n$ will be dispersed. However, they also show that the asymptotic distribution is a very poor approximation to the exact finite-sample density (which is bimodal, fat-tailed, and concentrated closer to the probability limit of least squares than to the true value). As the asymptotic variance of $\hat\beta^{IV}_n$ decreases, i.e. as the instruments become stronger, the classical approximation improves. As a consequence, with weak instruments, inferential procedures based on classical asymptotic results are unreliable. Although finite-sample methods could be used in these situations, their use in practice is limited by restrictive assumptions, computationally intractable distributions, or the absence of a clear framework for testing or constructing confidence intervals. The weak instruments problem is not confined to “small samples” and cannot be ignored in large samples. This is illustrated by Bound, Jaeger and Baker (1995), who show that for the Angrist and Krueger (1991) study it is possible to obtain similar results when artificial random (dummy) instrumental variables are used, despite a sample size of 329,500 observations.
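A small simulation in the spirit of the Bound, Jaeger and Baker (1995) illustration (the sample size, number of dummies and error correlation below are hypothetical choices, not their data) shows why irrelevant instruments are dangerous: when the regressor does not depend on the instruments at all, the 2SLS estimates cluster around the OLS estimate rather than around the true coefficient.

```python
import numpy as np

def tsls(y, x, Z):
    """2SLS slope for one endogenous regressor (no intercept)."""
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    return (x_hat @ y) / (x_hat @ x)

def random_dummy_iv(n=1000, q=50, beta=1.0, rho=0.8, reps=200, seed=20):
    """Median 2SLS and OLS estimates when the q 0/1 'instruments' are pure noise."""
    rng = np.random.default_rng(seed)
    iv, ols = [], []
    for _ in range(reps):
        Z = rng.integers(0, 2, size=(n, q)).astype(float)   # random dummies, unrelated to x
        v = rng.standard_normal(n)
        eps = rho * v + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        x = v                                                # x does not depend on Z
        y = beta * x + eps
        iv.append(tsls(y, x, Z))
        ols.append((x @ y) / (x @ x))
    return np.median(iv), np.median(ols)

iv_median, ols_median = random_dummy_iv()
print(f"median 2SLS {iv_median:.3f}, median OLS {ols_median:.3f}, true beta 1.0")
```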
⁹ See also Hahn and Hausman (2003), section V.
Examining instrument validity
As the (asymptotic) properties of IV estimators are sensitive to the choice of valid instruments, regardless of the sample size, measures of ‘weakness’ are desirable. Several recent studies suggest routinely reporting the $F$-statistic and $R^2$ of the first-stage regression. Stock, Wright and Yogo (2003), for instance, suggest that the first-stage $F$-statistic should be larger than 10 for 2SLS inference to be reliable. Furthermore, Bowden and Turkington (1984) argue that one should find instruments that maximize all of the canonical correlations with $X$. Staiger and Stock (1997) develop a data-based measure of the relative bias, where large values should alert the researcher to potential problems of correlation between the instruments and the random components. Bowden and Turkington (1984) and Verbeek (2000) (among others) present a test for instrument admissibility when $q > k$ (overidentified). If the test rejects, there is sample evidence against the joint validity of the instruments, although it is not possible to determine which one is incorrect. The method in Bowden and Turkington can be used to examine whether an additional set of instruments is admissible, but this test does not address potential weakness of the instruments. In fact, Hahn and Hausman (2003) argue that this test rejects too often when weak instruments are present, which is a major drawback since it is often used to test economic theory embodied in the model.
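The two diagnostics mentioned above, the first-stage $F$-statistic and a test of overidentifying restrictions, can be sketched as follows. The overidentification test shown is the familiar Sargan form, $nR^2$ from regressing the 2SLS residuals on all instruments, which is asymptotically $\chi^2(q-k)$ under the null; whether it matches the exact variant in Bowden and Turkington (1984) or Verbeek (2000) is an assumption. The data-generating process and function names are hypothetical.

```python
import numpy as np

def ols_resid(y, X):
    """Residuals from an OLS regression of y on the columns of X."""
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

def first_stage_f(x, Z):
    """F-test that all excluded instruments have zero first-stage coefficients."""
    n, q = Z.shape
    ones = np.ones((n, 1))
    rss_r = np.sum(ols_resid(x, ones) ** 2)                  # intercept only
    rss_u = np.sum(ols_resid(x, np.hstack([ones, Z])) ** 2)  # intercept + instruments
    return ((rss_r - rss_u) / q) / (rss_u / (n - q - 1))

def sargan(y, x, Z):
    """n * R^2 of 2SLS residuals on all instruments; chi-square(q - 1) under H0."""
    n, q = Z.shape
    ones = np.ones((n, 1))
    X = np.hstack([ones, x[:, None]])                 # intercept + endogenous regressor
    W = np.hstack([ones, Z])                          # intercept + excluded instruments
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]  # first-stage fitted values
    b = np.linalg.lstsq(X_hat, y, rcond=None)[0]      # 2SLS coefficients
    u = y - X @ b                                     # structural residuals
    e = ols_resid(u, W)
    r2 = 1.0 - (e @ e) / np.sum((u - u.mean()) ** 2)
    return n * r2, q - 1

rng = np.random.default_rng(21)
n = 2000
Z = rng.standard_normal((n, 3))
v, e = rng.standard_normal((2, n))
x = Z.sum(axis=1) + v
y_valid = x + 0.5 * v + e                       # all three instruments exogenous
y_invalid = x + 0.5 * v + e + 0.5 * Z[:, 0]     # first instrument enters the error
print("first-stage F:", round(first_stage_f(x, Z), 1))
print("Sargan (valid):  ", sargan(y_valid, x, Z))
print("Sargan (invalid):", sargan(y_invalid, x, Z))
```

With the three valid instruments the Sargan statistic should look like a draw from $\chi^2(2)$, whereas letting the first instrument enter the error term typically produces a very large value. As noted above, the test cannot say which instrument is at fault, and it says nothing about instrument strength.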
Hahn and Hausman (2002) have recently developed a test for the validity of instrumental variables that jointly addresses exogeneity and strength. It is based on the general Hausman specification test approach (Hausman, 1978) and adopts the second-order asymptotic approximations of Bekker (1994). The idea is to compare forward and backward 2SLS estimators, which are shown to be equivalent under the null hypothesis that conventional asymptotics is valid. The test statistic is fairly simple to compute and is shown to have a $t$ distribution under the null hypothesis. Rejection of the null hypothesis may indicate either a failure of the orthogonality assumption of the instruments or that the instruments are weak. Hahn and Hausman (2002) suggest a two-step approach based on this test to decide whether 2SLS, LIML, or neither should be used.
In chapter 4 we propose another method that can be used to investigate the
validity of observed instruments, which is based on the LIV model and exist-
ing test principles. Contrary to the Hahn and Hausman test, it can be used to
separately investigate either instrument weakness or instrument endogeneity,
or both. Furthermore, if the instruments are found to be invalid, the estimates of the regression parameters can still be used, because the LIV results neither rely on the quality of, nor require access to, observed instruments. Our simulation evidence suggests that, unlike the classical test of overidentifying restrictions, this approach does not suffer from size problems in the presence of weak instruments.
Choosing the (number of) instruments
The finite sample bias in IV estimators is a function of the number of instru-
ments, which suggests that one should not include too many, although identifi-
cation requires that at least as many instruments as endogenous regressors are
included ($q \geq k$). Furthermore, increasing the number of instrumental variables results in a loss of degrees of freedom, and the first-stage regression ($X$ on $Z$) suffers from overfitting. Sargan (1958) concludes that “if the first few instrumental variables are well chosen, there is usually no improvement, and even a deterioration, in the confidence regions as the number of instrumental variables is increased beyond three or four”. Moreover, in line with the results presented above, he notes that “estimates [may] have large biases if the number of instrumental variables becomes too large” (p. 400). In contrast to these finite-sample results, large-sample theory shows that an IV estimator with one more instrument is at least as efficient, which suggests that we can add as many instruments as we please without doing worse (see e.g. Davidson and MacKinnon, 1993, pp. 220-224).
Bowden and Turkington (1984) suggest performing a principal components analysis on $Z'Z$ and choosing the first $p$ principal components as instruments. This approach, however, does not address the correlation of $X$ and $Z$, i.e. the strength of the instruments. Donald and Newey (2001) developed a mean-squared error criterion that can be minimized to choose a set of instrumental variables. They find that this method of choosing instruments generally yields an improvement in performance. In the leading cases LIML outperforms 2SLS, although 2SLS performs better in situations of little endogeneity. For the weak instruments case, there is a clear tendency to use fewer instruments.
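The following sketch illustrates the limitation noted for the principal components approach. The construction (eigenvectors of $Z'Z$, without centring, as in the text) and the artificial data are our own assumptions: one instrument column has large variance but no relation to $x$, the other has small variance but drives $x$. The leading component then picks up the irrelevant column.

```python
import numpy as np

def principal_component_instruments(Z, p):
    """Scores on the p leading eigenvectors of Z'Z (no centring, following the text)."""
    eigval, eigvec = np.linalg.eigh(Z.T @ Z)        # eigenvalues in ascending order
    leading = np.argsort(eigval)[::-1][:p]
    return Z @ eigvec[:, leading]

rng = np.random.default_rng(10)
n = 5000
z_noise = 10.0 * rng.standard_normal(n)   # high-variance but irrelevant column
z_rel = rng.standard_normal(n)            # low-variance but relevant column
Z = np.column_stack([z_noise, z_rel])
x = z_rel + rng.standard_normal(n)        # endogenous regressor depends on z_rel only

pc1 = principal_component_instruments(Z, 1)[:, 0]
corr_pc1 = np.corrcoef(pc1, x)[0, 1]
corr_rel = np.corrcoef(z_rel, x)[0, 1]
print(f"corr(first component, x) = {corr_pc1:.3f}, corr(z_rel, x) = {corr_rel:.3f}")
```

Because the components are chosen from $Z$ alone, nothing prevents the retained components from being weak instruments; criteria such as that of Donald and Newey (2001) use information on $X$ as well.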
Testing regressor-disturbance problems
Given the potential pitfalls of IV results and the difficulty of finding instruments at all, one would like to test for regressor-error correlation a priori. Unfortunately, it is not possible to examine $X'\varepsilon$ directly, as $\varepsilon$ is unobserved and the OLS residuals $\hat\varepsilon$ satisfy $X'\hat\varepsilon = 0$ by construction¹⁰. In order to test for endogeneity, valid instrumental variables are required. A test based on the general procedure of Hausman (1978) can then be used. This test is based on the difference between $\hat\beta^{OLS}_n$ and $\hat\beta^{IV}_n$, and Hausman proposed a test statistic that has approximately a $\chi^2$ distribution under the null hypothesis. A drawback of this procedure is that external instruments have to be available in order to compute $\hat\beta^{IV}_n$, and the researcher may then conclude that the instruments obtained were not needed after all. Furthermore, this test is potentially sensitive to weak instruments (see e.g. Staiger and Stock, 1997, or Bowden and Turkington, 1984, for more details). In fact, because of the bias, the Hausman test may incorrectly fail to reject the use of the OLS estimator (cf. Hahn and Hausman, 2003). In chapter 3 we propose an instrument-free test that solves this circular problem. We show that this test has reasonable power over a wide variety of settings.
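For the simplest case of one regressor and one instrument, the Hausman contrast can be sketched as below. This is an illustrative version under our own assumptions (mean-zero data, no intercept, homoskedastic errors, and the OLS error variance used in both variance terms so that their difference is non-negative); it is not necessarily the exact statistic used in the references. Under the null of exogeneity it is compared with a $\chi^2(1)$ critical value (3.84 at the 5% level).

```python
import numpy as np

def hausman_single(y, x, z):
    """Hausman contrast for one regressor and one instrument (mean-zero data, no intercept)."""
    n = len(y)
    b_ols = (x @ y) / (x @ x)
    b_iv = (z @ y) / (z @ x)
    s2 = np.sum((y - b_ols * x) ** 2) / (n - 1)    # OLS error variance (efficient under H0)
    xpzx = (z @ x) ** 2 / (z @ z)                   # x'P_z x, never larger than x'x
    var_diff = s2 * (1.0 / xpzx - 1.0 / (x @ x))    # Var(b_IV) - Var(b_OLS) under H0
    return (b_iv - b_ols) ** 2 / var_diff

rng = np.random.default_rng(18)
n = 1000
z, v, e = rng.standard_normal((3, n))
x = z + v
h_exog = hausman_single(x + e, x, z)                   # error unrelated to x
h_endog = hausman_single(x + 0.8 * v + 0.6 * e, x, z)  # error correlated with x via v
print(f"H (exogenous) = {h_exog:.2f}, H (endogenous) = {h_endog:.2f}; 5% critical value 3.84")
```

Note that the statistic can only be computed once an instrument `z` is available, which is exactly the circularity described above.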
2.2.2 IV based solutions to the weak instrument problem
Hahn and Hausman (2003), and Stock, Wright and Yogo (2002) surveyed most
of the econometric literature on solutions to the weak instruments problem in
empirical applications. In the following we present a brief summary, since
¹⁰ An exception is testing for $X'\alpha = 0$ in random intercept models, where $\alpha = (\alpha_1, \ldots, \alpha_n)'$ are the unit-specific random intercepts, as a test statistic is readily available (chapter 6).
most of the technicalities, and the large number of results that have appeared in the literature, are beyond the scope of this thesis.
As mentioned before, first-order asymptotic approximations are poor in the presence of weak instruments. Several studies have presented improved asymptotic approximations to finite-sample distributions in this situation. Staiger and Stock (1997) developed an alternative asymptotic framework that models the coefficients of the first-stage regression as local to zero, i.e. the instruments are only weakly correlated with the endogenous regressors, without assuming normality. In this framework they showed that if the instruments are weak, the 2SLS and LIML estimators have nonstandard asymptotic distributions and are not consistent, with the bias being less problematic for LIML than for 2SLS, particularly in small samples. Furthermore, they obtain results on the properties of various inferential procedures (such as $t$ tests, coverage rates of confidence intervals, and tests of overidentifying restrictions). Bekker (1994) developed an asymptotic approximation for models with normal errors in which both the number of instruments and the sample size increase. Simulation evidence shows that these asymptotics provide good approximations for moderate and large numbers of instruments, and that LIML is to be preferred over the standard IV estimator. However, Bekker’s results apply only to normal cases and do not capture the nonnormality observed in the exact finite-sample density; see also Staiger and Stock (1997).
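The local-to-zero idea of Staiger and Stock (1997) can be mimicked by letting the first-stage coefficients shrink at rate $1/\sqrt{n}$, so that the amount of information in the instruments (the concentration parameter $n\pi'\pi$) stays fixed as the sample grows. In the hypothetical sketch below, with normal errors and parameter values of our own choosing, the median 2SLS bias is then roughly the same at $n = 200$ and $n = 20{,}000$ instead of vanishing, which is the sense in which 2SLS is not consistent in this framework.

```python
import numpy as np

def median_tsls_bias(n, c=1.0, q=5, rho=0.8, reps=400, seed=19):
    """Median 2SLS bias when every first-stage coefficient equals c / sqrt(n)."""
    rng = np.random.default_rng(seed)
    pi = np.full(q, c / np.sqrt(n))          # local-to-zero: n * pi'pi = q * c**2 is fixed
    bias = []
    for _ in range(reps):
        Z = rng.standard_normal((n, q))
        v = rng.standard_normal(n)
        eps = rho * v + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        x = Z @ pi + v
        y = x + eps                          # true beta = 1
        x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
        bias.append((x_hat @ y) / (x_hat @ x) - 1.0)
    return np.median(bias)

bias_small = median_tsls_bias(200)
bias_large = median_tsls_bias(20_000)
print(f"median bias: n = 200 -> {bias_small:.3f}, n = 20000 -> {bias_large:.3f}")
```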
Besides work on finding better alternatives to first-order asymptotics, several fully robust hypothesis tests and methods have been developed to construct confidence sets for $\beta$ that have approximately the correct size and coverage rates under weak instruments. One such robust test of $\beta = \beta_0$ is the Anderson-Rubin statistic (Anderson and Rubin, 1949), which is not affected by the degree of underidentification. However, it may lack power because of a loss of degrees of freedom when the number of instruments is large. The $K$ statistic (Kleibergen, 2002) has similar asymptotic properties with a minimal number of degrees of freedom. Bekker and Kleibergen (2003) investigate the finite-sample distribution of this statistic under normality. Other tests have been proposed as well; see e.g. Staiger and Stock (1997). Stock, Wright and Yogo (2002)
present results of power comparisons for several tests under different conditions. Given the duality between hypothesis tests and confidence sets, the robust tests can be inverted to obtain confidence intervals. When the instruments are weak, these sets can have infinite volume, indicating that the data simply contain limited information for making inferences about $\beta$ (cf. Stock, Wright and Yogo, 2002).
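A minimal sketch of the Anderson-Rubin test and its inversion into a confidence set over a grid of candidate values follows. Assumptions: no intercept or included exogenous regressors, simulated data, and a 5% critical value of about 3.86 for $F(1, 499)$ hard-coded to avoid an extra dependency. With a single instrument the statistic is exactly zero at the just-identified IV estimate, so that value always belongs to the set.

```python
import numpy as np

def anderson_rubin(y, x, Z, beta0):
    """AR statistic for H0: beta = beta0 (no intercept); F(q, n - q) under normal errors."""
    n, q = Z.shape
    u = y - x * beta0                                    # structural error implied by H0
    fitted = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]    # P_Z u
    resid = u - fitted                                   # M_Z u
    return (fitted @ fitted / q) / (resid @ resid / (n - q))

def ar_confidence_set(y, x, Z, grid, crit):
    """Invert the test: keep every grid value of beta0 that the AR test does not reject."""
    return np.array([b for b in grid if anderson_rubin(y, x, Z, b) <= crit])

rng = np.random.default_rng(17)
n = 500
z, v, e = rng.standard_normal((3, n))
x = 1.0 * z + v                          # a reasonably strong instrument
y = 1.0 * x + 0.8 * v + 0.6 * e          # true beta = 1, x endogenous through v
Z = z[:, None]

grid = np.linspace(-10.0, 10.0, 2001)
accepted = ar_confidence_set(y, x, Z, grid, crit=3.86)   # ~5% critical value of F(1, 499)
b_iv = (z @ y) / (z @ x)
print(f"IV estimate {b_iv:.3f}; AR set spans [{accepted.min():.2f}, {accepted.max():.2f}]")
```

Replacing `x = 1.0 * z + v` with a regressor that does not depend on `z` typically makes the accepted set run into the ends of the grid, the finite-grid counterpart of the unbounded sets mentioned above.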
The previous methods for carrying out tests or constructing confidence intervals do not readily provide point estimates of $\beta$. In addition, they may be difficult to compute. Several alternatives to 2SLS have been proposed that are intended to be more robust and reliable when the instruments are weak. Second-order unbiased estimators, such as LIML or Nagar estimators, are often suggested as robust alternatives. These estimators, however, do not have finite-sample moments, which may present a problem in empirical situations (Hahn and Hausman, 2003). Other alternatives are Jackknife Instrumental Variables (Angrist, Imbens and Krueger, 1999), the Fuller-k estimator (Fuller, 1977), or bias-adjusted 2SLS (Donald and Newey, 2001). Stock, Wright and Yogo (2002) find that these partially robust estimators provide relatively reliable alternatives to 2SLS in applications with weak instruments. However, Hahn and Hausman (2003) recommend, based on Monte Carlo evidence, extreme caution in using “no moment” estimators (LIML or Nagar). Considering mean-squared error and interquartile range (IQR) measures, they conclude that 2SLS, jackknife 2SLS, and Fuller-based estimators perform best, and state that “instrument pessimism seems overstated for 2SLS, which may be why 2SLS often performs better than expected in terms of MSE in the weak instrument situation”. The specification test suggested by Hahn and Hausman (2002) may be used to decide among the alternatives. Both Stock, Wright and Yogo (2002) and Hahn and Hausman (2003) stress that most of the analysis in the weak instruments literature is conditional on instrument exogeneity. Failure of the exogeneity restriction, in particular in combination with weak instruments, leads to additional complications and to situations in which OLS may do better than the remedies against weak instruments suggested above (see also section 4.4).
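LIML, Fuller-k, 2SLS and OLS all belong to the k-class, $\hat\beta(\kappa) = [X'(I - \kappa M_Z)X]^{-1}X'(I - \kappa M_Z)y$, which makes them easy to compare in code. The sketch below assumes one endogenous regressor, no included exogenous regressors, and the common Fuller choice $a = 1$; the simulated data are hypothetical. LIML's $\kappa$ is the smallest root of $\det(W'W - \kappa W'M_ZW) = 0$ with $W = [y, x]$, which is always at least one.

```python
import numpy as np

def _annihilate(Z, a):
    """M_Z a = a - P_Z a (works for vectors and matrices)."""
    return a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]

def k_class(y, x, Z, kappa):
    """[x'(I - kappa M_Z) x]^{-1} x'(I - kappa M_Z) y, one regressor, no intercept."""
    mx = _annihilate(Z, x)
    return (x @ y - kappa * (mx @ y)) / (x @ x - kappa * (mx @ x))

def liml_kappa(y, x, Z):
    """Smallest root of det(W'W - kappa W'M_Z W) = 0 with W = [y, x]."""
    W = np.column_stack([y, x])
    WMW = _annihilate(Z, W).T @ W
    return np.min(np.linalg.eigvals(np.linalg.solve(WMW, W.T @ W)).real)

def fuller(y, x, Z, a=1.0):
    """Fuller-k with kappa = kappa_LIML - a / (n - q); a = 1 is a common choice."""
    n, q = Z.shape
    return k_class(y, x, Z, liml_kappa(y, x, Z) - a / (n - q))

rng = np.random.default_rng(11)
n, q = 500, 4
Z = rng.standard_normal((n, q))
v = rng.standard_normal(n)
x = Z @ np.full(q, 0.3) + v
y = 1.0 * x + 0.8 * v + 0.6 * rng.standard_normal(n)

estimates = {
    "OLS (kappa=0)": k_class(y, x, Z, 0.0),
    "2SLS (kappa=1)": k_class(y, x, Z, 1.0),
    "LIML": k_class(y, x, Z, liml_kappa(y, x, Z)),
    "Fuller (a=1)": fuller(y, x, Z),
}
for name, value in estimates.items():
    print(f"{name:15s} {value:.3f}")
```

Writing the estimators as one function of $\kappa$ makes clear that they differ only in how strongly the residual variation $M_Z x$ is removed.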
2.3 Alternative approaches to solve for regressor-error dependencies
In some applications the nature of the data generating process, or the suspected cause of endogeneity itself, suggests suitable instruments or even a different estimation approach. Wooldridge (2002) suggests three other solutions to omitted variable problems, including the proxy-variable OLS method (p. 63) and the use of indicators of the unobservables (pp. 105-107), which requires IV estimation¹¹; the latter method also applies to measurement error models. Furthermore, observing the same cross-sectional units over time and applying fixed effects estimation can also eliminate endogeneity due to omitted variables, if the endogeneity arises from time-invariant sources; see also chapter 6, or the example given by Verbeek (2000) (p. 312). Card (1999) presents an overview of studies using sibling and twin data to estimate the return to education and argues that omitted ability is eliminated when computing within-family estimators. Stern (2004) uses data on multiple job offers to postdoctoral students and a fixed-effects approach to estimate the relation between wages and the scientific orientation of organizations. His results suggest a negative relation between science and income, which is biased upward when the unobserved quality of researchers is not controlled for. For measurement error models, autoregressive models, and simultaneous equation models, the data generating process may suggest suitable instruments. It is beyond the scope of this thesis to review all the literature on these topics. For measurement error models we refer to e.g. Wansbeek and Meijer (2000), Carroll, Ruppert and Stefanski (1995), or Bowden and Turkington (1984) for extensive overviews. These models can be estimated using IV techniques, for instance by using other (potentially) mismeasured variables as instruments (White, 2001). Another method is based on Wald (1940), which assumes that the observations can be divided into groups.
¹¹ The indicator IV solution differs from the classical IV solution discussed previously. The indicator IV solution assumes the existence of a possibly mismeasured proxy for the missing variable $w$ that needs to be instrumented, whereas the classical IV solution leaves the omitted variable $w$ in the error term, so that all elements of $x$ correlated with $w$ need to be instrumented. See also Petrin and Train’s (2002) control function approach and the discussion in Chintagunta, Dube and Goh (2004) (p. 6).
This classification should be independent of the error terms and should discriminate between high and low values of the unobservable true construct (see also Madansky, 1959, or chapter 3). Similarly, higher-order lags may serve as instruments for a model that has lagged dependent variables as regressors in the presence of serial correlation. In simultaneous equation models, exogenous variables that are not included in the equation of interest can often serve as instruments and are readily available (see e.g. Greene, 2000).
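The fixed-effects argument made earlier in this section can be illustrated with a simulated panel (a hypothetical design of our own): an omitted, time-invariant unit effect $\alpha_i$ is correlated with $x$, so pooled OLS is biased, while demeaning within each unit removes $\alpha_i$ and with it the endogeneity.

```python
import numpy as np

def within_estimator(y, x, unit):
    """Fixed-effects slope: demean y and x within each unit, then OLS without intercept."""
    counts = np.bincount(unit)
    def demean(a):
        return a - (np.bincount(unit, weights=a) / counts)[unit]
    yd, xd = demean(y), demean(x)
    return (xd @ yd) / (xd @ xd)

rng = np.random.default_rng(14)
n_units, n_periods, beta = 500, 5, 1.0
unit = np.repeat(np.arange(n_units), n_periods)     # unit index of each observation
alpha = rng.standard_normal(n_units)                 # unobserved, time-invariant effect
x = alpha[unit] + rng.standard_normal(unit.size)     # regressor correlated with alpha
y = beta * x + alpha[unit] + rng.standard_normal(unit.size)

xc, yc = x - x.mean(), y - y.mean()
b_pooled = (xc @ yc) / (xc @ xc)     # pooled OLS: alpha sits in the error term
b_within = within_estimator(y, x, unit)
print(f"pooled OLS {b_pooled:.3f}, within {b_within:.3f}")
```

The within transformation only helps because the omitted variable is constant over time; a time-varying omitted variable would remain in the demeaned error.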
In the following we briefly consider three other interesting methods to solve for
endogeneity that have recently appeared in the literature: (1) Lewbel’s method,
(2) methods that model demand, cost and competition, and (3) spatial econo-
metrics.
Lewbel’s method. Lewbel (1997) showed that for measurement error models instruments can be constructed from the available data by exploiting higher-order moments. Hence, observed exogenous instrumental variables are not required. Erickson and Whited (2002) extend this method and propose a two-step generalized method of moments estimator for an errors-in-variables model with multiple mismeasured regressors. Consistent estimation requires, among other things, that measurement and equation errors are independent and have moments of every order, but no assumptions have to be made about distributional forms. Hence, the information contained in third- and higher-order moments of the data is fully exploited to identify the regression parameters.
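The idea of using higher moments can be seen in its simplest form in a third-moment estimator of the Geary type, which is equivalent to IV with the constructed instrument $(x - \bar x)(y - \bar y)$. The sketch below is our own illustration under stated assumptions (one mismeasured regressor with a skewed true value $x^*$, and measurement and equation errors that have mean zero and are independent of $x^*$ and of each other); it is not the Lewbel (1997) or Erickson and Whited (2002) estimator itself. With centred variables, $E[x^2 y] = \beta E[x^{*3}]$ and $E[x y^2] = \beta^2 E[x^{*3}]$, so their ratio recovers $\beta$ as long as $x^*$ is skewed.

```python
import numpy as np

def third_moment_estimator(y, x):
    """IV slope with the constructed instrument z = (x - mean x)(y - mean y)."""
    xc, yc = x - x.mean(), y - y.mean()
    z = xc * yc
    return (z @ yc) / (z @ xc)      # = sum(xc * yc**2) / sum(xc**2 * yc)

rng = np.random.default_rng(15)
n, beta = 200_000, 2.0
x_true = rng.exponential(1.0, n) - 1.0           # skewed latent regressor, mean zero
x = x_true + 0.7 * rng.standard_normal(n)         # observed with measurement error
y = beta * x_true + rng.standard_normal(n)

xc, yc = x - x.mean(), y - y.mean()
b_ols = (xc @ yc) / (xc @ xc)                     # attenuated toward zero
b_moment = third_moment_estimator(y, x)
print(f"OLS {b_ols:.3f}, third-moment IV {b_moment:.3f}")
```

Estimators of this kind can be imprecise in smaller samples, since they rely on sample third moments; the large `n` here is a deliberate choice.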
This interesting approach was developed for measurement error applications, but may be applicable to more general regressor-error dependency models as well. In appendix 6C we show for a simple linear multilevel model how these ideas can potentially be extended to more ‘general’ endogeneity applications. Depending on the empirical situation, it may provide an easy way to construct instruments from the available data and, hence, deserves more attention.
Methods that model demand, cost and competition. Several studies have attempted to solve for ‘price endogeneity’ in markets with differentiated products. Price is endogenously determined by the interaction of demand and supply. The idea is to solve this form of endogeneity by jointly modeling demand and supply equations, using a profit maximization model. Berry (1994) and Berry, Levinsohn and Pakes (1995) develop a market equilibrium model, based on a logit demand function, that is adapted to make it suitable for traditional instrumental variables estimation. This method is applicable to aggregate data, disaggregate data, or a combination of both. The resulting system is obtained by aggregating a discrete choice model of individual consumer behavior, which is combined with a cost function. These two models are embedded in a system of price-setting firms in differentiated markets. Joint estimation potentially yields more efficient estimates than focusing on the demand side only, with instruments for price. Furthermore, the system provides detailed information on cost structures and the nature of competition. Using equilibrium models, however, places greater demands on the data, and incorrect specification of the firms’ behavior could lead to biased estimates¹². This approach has been widely applied and adapted, with differences in e.g. data aggregation, type of heterogeneity, or method of estimation; see, for instance, Besanko, Gupta and Jain (1998), Besanko, Dube and Gupta (2000), Nevo (2001), or Sudhir (2001). Most studies employ an instrumental-variables based simultaneous equations estimation procedure, which is a generalized method of moments estimator. However, in contrast to homogeneous goods models, in differentiated markets most of the exogenous variables in the model are product characteristics affecting both cost and demand. Traditional exclusion restrictions therefore cannot be used to form instruments. Sudhir (2001) (section 4.2) and Nevo (2001) (section 4.3), for instance, discuss this in more detail and report that their instruments are of potentially poor quality; see also the discussion in Berry (2003) (section 1).
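For the plain logit demand model underlying Berry (1994), the step that makes traditional IV estimation possible is the inversion of observed market shares into mean utilities, $\ln s_j - \ln s_0 = \delta_j = x_j\beta + \alpha p_j + \xi_j$, after which price can be instrumented in a linear regression. The sketch below simulates such markets; the parameter values, the price equation and the use of a cost shifter $w$ as an excluded instrument are hypothetical assumptions (as noted above, suitable exclusion restrictions are often hard to find in practice), and it omits the supply side and any random-coefficients extension.

```python
import numpy as np

rng = np.random.default_rng(16)
M, J = 2000, 3                                  # markets and products per market
const, beta_x, alpha = 1.0, 1.0, -2.0           # hypothetical demand parameters
x = rng.standard_normal((M, J))                 # observed product characteristic
w = rng.standard_normal((M, J))                 # cost shifter, excluded from demand
xi = rng.standard_normal((M, J))                # unobserved quality, seen by firms
p = 1.0 + 0.5 * w + 0.5 * xi + 0.5 * rng.standard_normal((M, J))  # price reacts to xi
delta = const + beta_x * x + alpha * p + xi     # mean utilities

# Logit shares with an outside good whose mean utility is normalised to zero
expd = np.exp(delta)
denom = 1.0 + expd.sum(axis=1, keepdims=True)
shares, outside = expd / denom, 1.0 / denom

# Inversion for the plain logit: ln s_j - ln s_0 = delta_j
delta_hat = np.log(shares) - np.log(outside)

def tsls(y, X, W):
    """2SLS coefficients: regress y on the projection of X onto the instruments W."""
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

ones = np.ones(M * J)
X = np.column_stack([ones, x.ravel(), p.ravel()])
W = np.column_stack([ones, x.ravel(), w.ravel()])
b_ols = np.linalg.lstsq(X, delta_hat.ravel(), rcond=None)[0]
b_iv = tsls(delta_hat.ravel(), X, W)
print("OLS price coefficient:", round(b_ols[2], 3))
print("IV price coefficient: ", round(b_iv[2], 3))
```

Because firms set prices knowing $\xi$, OLS pulls the price coefficient toward zero, while instrumenting with the cost shifter recovers it in this design.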
¹² See e.g. the discussion in Yang, Chen and Allenby (2003) (section 2.3) or Dube and Chintagunta (2003).
Recently, Draganska and Jain (2004) proposed a new maximum likelihood based method for the simultaneous estimation of supply and demand. Their proposed algorithm uses individual-level data for a heterogeneous demand model, together with a supply equation derived from the profit-maximizing behavior of firms, assuming a Bertrand-Nash equilibrium. The resulting likelihood cannot be maximized straightforwardly because the equilibrium model is highly nonlinear. Their estimation procedure is based on simulating prices and choice probabilities to solve for the market equilibrium; the resulting smoothed empirical distribution can then be used for maximization. Alternatively, Yang, Chen and Allenby (2003) (with discussions) proposed a Bayesian approach to estimate a simultaneous heterogeneous demand and supply model. The method incorporates consumer heterogeneity and allows for a wide variety of supply model specifications. The advantage of their approach is that it can handle nonlinear model structures and allows for exact small-sample inference. See also Chintagunta, Dube and Goh (2004) for a recent overview and discussion.
Spatial econometrics. Recently, two marketing studies have appeared that address the endogeneity of marketing mix variables using spatial dependencies in observed market data, where there is no or only limited time variation. These dependencies arise because economic agents are spatially organized, or have similar store profiles. Bronnenberg and Mahajan (2001) identify correlations between marketing mix variables and the error term by imposing a measurable spatial structure on the random terms in the model. This spatial structure is a consequence of unobserved actions of retailers whose trade territories consist of multiple neighboring markets. Bronnenberg and Mahajan construct a spatial map using geographic proximities. By accounting for this spatial structure in an econometric model, it is possible to correct and test for the effect of unobserved retailer behavior. Their results for Mexican food items suggest that unobserved components of the dependent variables are related to the marketing mix variables. Van Dijk et al. (2004) consider the estimation of shelf-space elasticities based on endogenous shelf-space data. Estimation of shelf-space elasticities is hampered by minimal (time) variation in shelf-space measures. The authors build on the work of Bronnenberg and Mahajan and propose to model the correlation between shelf space and the random terms using a spatial structure based on similarities in store, consumer, and competitor characteristics. Their results for frequently bought daily care products provide face-valid shelf-space elasticity estimates, and their model outperforms a model with a spatial structure based on geographic proximities in terms of predictive validity. Since retailers generally decide on shelf space based on store, customer and competitor characteristics, the similarity of two geographically close stores is in this case expected to be lower than the similarity of two stores with similar profiles in distinct regions.
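A spatial structure based on similarity rather than geography can be represented by a weight matrix. The sketch below is a generic illustration, not the specification of Bronnenberg and Mahajan (2001) or Van Dijk et al. (2004): it builds a row-standardised $k$-nearest-neighbour matrix $W$ from hypothetical store-profile scores and draws spatially correlated errors from the common spatial autoregressive error process $\varepsilon = (I - \lambda W)^{-1}u$.

```python
import numpy as np

def knn_weight_matrix(profiles, k):
    """Row-standardised k-nearest-neighbour weights from Euclidean profile distances."""
    diff = profiles[:, None, :] - profiles[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)                 # a store is not its own neighbour
    n = profiles.shape[0]
    W = np.zeros((n, n))
    nearest = np.argsort(dist, axis=1)[:, :k]      # indices of the k most similar stores
    W[np.arange(n)[:, None], nearest] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def spatial_errors(W, lam, rng):
    """Draw eps = (I - lam W)^{-1} u, u ~ N(0, I); |lam| < 1 keeps I - lam W invertible."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - lam * W, rng.standard_normal(n))

rng = np.random.default_rng(12)
profiles = rng.standard_normal((200, 3))   # hypothetical store/consumer/competitor scores
W = knn_weight_matrix(profiles, k=5)
eps = spatial_errors(W, lam=0.6, rng=rng)
```

Swapping `profiles` for geographic coordinates gives the proximity-based alternative, which is the comparison the text describes.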
2.4 Conclusions and positioning of research
It is clear from this review that traditional instrumental variable methods, which rely on economic theory or intuition to find additional observable instruments, suffer from at least two problems: (i) in many situations no such variables are available, and (ii) even when they are available, the performance of the inferential procedures relies critically on the quality of these variables. In particular, the latter has recently been the topic of several studies in econometrics. Although many important contributions to the weak instrument problem have been made, the problem of potentially endogenous instruments has not yet been solved (see e.g. the concluding remarks of Hahn and Hausman, 2003, or Stock, Wright and Yogo, 2002). For most empirical researchers the question of where to find suitable instruments is still open, and there is usually not much choice when selecting instrumental variables. Without valid instrumental variables at hand, classical instrumental variables estimation techniques cannot be relied on. Furthermore, there is a bit of a dilemma: theory suggests that the best instruments are variables that are highly correlated with the endogenous regressors. However, the more highly correlated they are, the less defensible is the claim that these variables themselves are uncorrelated with the disturbances (cf. Greene, 2000, p. 375).
The latent instrumental variables (LIV) method proposed in the next chapter at-
tempts to solve this circular problem. Similar to the classical IV model in (2.4),
we assume that the endogenous regressor can be separated into an exogenous
part and an endogenous part. However, we propose to model the exogenous
part as an unobserved discrete variable, which is a nuisance parameter. We
prove that the model parameters are identified through the likelihood. Hence,
observed instrumental variables are not required to estimate the regression pa-
rameters. In econometrics, instruments frequently take the form of categor-
ical variables and, in addition, continuous instruments are often transformed
into dummy variables (van de Ploeg, 1997, Bowden and Turkington, 1984, or
Verbeek, 2000). We show that the parameters in the LIV model can be identified and estimated through maximum likelihood methods. As a by-product, ‘optimal’ LIV instruments are estimated from the data, and regressor-error dependencies can be tested for straightforwardly without needing observed instruments at hand. Furthermore, the proposed likelihood framework allows for straightforward extensions to different applications.
The LIV approach shares some features with two methods developed in the measurement error literature. First, Wald (1940) and Madansky (1959) assume that the data are divided into two groups according to certain (statistical) criteria. A straight line can then be fitted through the two group means, because a line is determined by two points. If the grouping criteria are satisfied, the fitted line can be shown to be a consistent estimate of the true line. Randomly assigning the observations to two groups, for instance, or simply assigning the observations with high $x$ values to one group and those with low $x$ values to the other group, does not provide valid groupings. Ideally, group construction should be based on some knowledge of the pattern of the underlying variation (cf. Bowden and Turkington, 1984). The LIV model does not require an a priori grouping of the data but estimates such a grouping simultaneously with the other parameters using mixture modeling techniques. Second, Lewbel’s idea to construct instruments from the available data, thereby solving the circular problem of needing observed instrumental variables at hand, is similar to the motivation of the LIV model. Lewbel (1997), and Erickson and Whited (2002), propose method-of-moments based estimators and show that, under certain higher-order moment conditions, instrumental variables can be obtained from the available data. Hence, instruments are constructed based on ‘statistical’ moment conditions, and the resulting variables will generally not correspond to
an economic theory or interpretation. Although these methods were developed for measurement error applications, we believe that, like the LIV model, they are more generally applicable; this, however, requires further research (see appendix 6C, subsection 6.5.2, or subsection 8.2.2).
In contrast, we propose a likelihood-based approach, which constitutes a very general framework that can easily be adapted to more general situations, for instance to the Bayesian setting in chapter 7. Furthermore, the predicted LIV instruments can be used to investigate the nature of the endogeneity more thoroughly, since these instruments are estimated from the available data rather than constructed from higher-order moment assumptions that may or may not be valid. The likelihood approach has desirable optimality properties and can be expected to be more efficient than method-of-moments estimation. We agree with Yang, Chen and Allenby (2003), who state that “likelihood-based inference offers a distinct advantage over a method-of-moments approach because it makes precise statements about the probability of the observed data. In a likelihood-based analysis, the researcher is confronted with the correspondence between the model and the data, and cannot fit a model that is not supported by the data”. Moreover, the LIV model belongs to the class of mixture models, which are often employed to estimate probability density functions and can be seen as a flexible and robust approach to approximating them. Kim, Menzefricke and Feinberg (2004), for instance, provide evidence that mixtures of normals are a simple and effective way of density estimation, in particular in a Bayesian framework. Titterington, Smith and Makov (1985) note that finite mixtures (of normals) have often been used in robustness studies to investigate non-normal conditions in ‘normal’ inference, or to provide a procedure that reduces the influence of outlying observations. We therefore expect some of these properties to carry over to the LIV model, and we will show that the LIV results are relatively insensitive to different choices of the shape of the distribution of the data and, hence, to the (non)existence of higher-order moments.
In the next chapter we introduce the simple LIV model, which is further developed in subsequent chapters.
Chapter 3
The LIV model
3.1 Introduction
In applying the classical linear regression model, the assumption that $E(\varepsilon \mid x) = 0$, with $x \in \mathbb{R}^{k \times 1}$, may not hold. As a consequence, the OLS estimator is biased and inconsistent. A more or less standard approach in econometrics to obtain consistent estimates is to find instrumental variables $z$ (see for example Bowden and Turkington, 1984). Instruments mimic the troublesome regressors but are uncorrelated with the error term (i.e. $E(\varepsilon \mid z) = 0$, $z \in \mathbb{R}^{g \times 1}$). Provided that $g \ge k$, the regression coefficients can be estimated by 2SLS or LIML (see the standard textbooks, e.g. Greene, 2000).
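To make the classical IV baseline concrete, here is a minimal Python/NumPy sketch of 2SLS with one observed dummy instrument. It is an illustration under assumed values (β0 = 1, β1 = 2, group means ±1.5, unit error variances with Cov(ε, ν) = 0.5), not code from this thesis. The name `two_sls` is hypothetical; it regresses $y$ on the first-stage fitted values, which reproduces the 2SLS formula $(X'P_ZX)^{-1}X'P_Zy$.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS: regress y on the projection of X onto the column space of Z.

    Requires at least as many instruments as regressors (g >= k).
    """
    if Z.shape[1] < X.shape[1]:
        raise ValueError("need at least as many instruments as regressors (g >= k)")
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]    # first-stage fitted values
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]     # second stage

# Illustrative data: one endogenous regressor and one OBSERVED dummy instrument
rng = np.random.default_rng(1)
n = 100_000
z = rng.integers(0, 2, n)                               # observed dummy instrument
eps, nu = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], n).T
x = np.where(z == 1, 1.5, -1.5) + nu                    # x correlates with eps through nu
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_2sls = two_sls(y, X, Z)
print(beta_ols[1], beta_2sls[1])
```

With these assumed values the OLS slope is pulled upward (its probability limit is 2 + 0.5/3.25), while the 2SLS slope is close to 2. The LIV method discussed below targets the same correction without observing `z`.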
In empirical work it is not always obvious whether it is necessary to search for instruments and, if so, where to find them. Thus, one would like to test a priori whether $E(\varepsilon \mid x) = 0$ holds. However, since the OLS residuals always satisfy $X'e = 0$ by construction, it is fruitless to use the OLS estimates for that purpose. One way to test for exogeneity is through the use of instruments. Once instruments have been found, a Hausman test can be applied to determine post hoc whether they were actually needed (see e.g. Bowden and Turkington, 1984). This method has two drawbacks: instruments need to be available, and once they are available, they may be weak and/or correlated with the error (i.e., $E(\varepsilon \mid z) = 0$ may not hold). Several authors have examined problems associated with weak instruments (Bound, Jaeger, and Baker, 1995, Staiger and Stock, 1997, or van der Ploeg, 1997), and results on asymptotics, efficiency bounds and tests for instrument validity for IV models are available (Staiger and Stock, 1997, Hahn and Hausman, 2002, Hahn, 2002).
We propose an “instrument-free” approach to estimate regression parameters in situations with potential regressor-error correlations. As this method does not rely on observable instruments, issues such as the availability, validity, or weakness of the instruments are circumvented. The proposed method, which we call the ‘latent instrumental variable (LIV) method’, uses a latent variable model to account for dependencies between the regressors and the error. The method introduces an (unobserved) discrete binary variable to decompose $x$ into a systematic part that is uncorrelated with $\varepsilon$ and a part that is possibly correlated with $\varepsilon$.
Although the idea of latent instruments is new, discrete instruments have been used before. Observable instruments are frequently categorical. For example, in the measurement error literature, grouping methods have been used to construct instruments based on the method of Wald (cf. Madansky, 1959). Van der Ploeg (1997) uses instruments generated by an a priori grouping of the data to apply the group asymptotics developed by Bekker (1994). Lewbel (1997) proposed a method to construct internal instruments for regressors with measurement errors by taking simple functions of the model data. In his approach, therefore, no additional external instruments are needed to identify and estimate the regression parameters. Under nonnormality, the use of third-order moments of the data identifies the model parameters, and 2SLS or a GMM approach can be shown to yield consistent estimators (Erickson, 2001, or Wansbeek and Meijer, 2000). In our LIV model the parameters are not identified by the first two moments, but they can be identified by the likelihood. However, as Hennig (2000) observes, identifiability of mixture models (the class of models to which the LIV model belongs) is not straightforward, and we have to be careful in claiming identifiability.
This chapter is organized as follows. In section 3.2 we introduce the LIV
model, and in section 3.3 we prove identifiability and present results on the information matrix. Furthermore, we suggest a method based on a Hausman test (Hausman, 1978) to test directly for regressor-error dependencies without needing observed instrumental variables. This instrument-free test can be used to assess a priori the presence of regressor-error correlations (section 3.4). The model estimators and the test statistic are evaluated in a simulation study (section 3.5). We demonstrate that this latent instrumental variable method yields approximately unbiased¹ results for the regression parameters over a wide range of regressor-error correlations and several distributions of the instruments. Furthermore, the test statistic has reasonable power across these settings. In section 3.6 we illustrate the LIV model empirically for a measurement error application where an observed ‘natural’ (laboratory) discrete instrument is available. We show that in this case the LIV estimate and the IV estimate computed with this natural instrument coincide. Section 3.7 concludes this chapter.
3.2 The LIV model
The structural form of the assumed LIV model is given by
$$
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i,\\
x_i &= \pi' z_i + \nu_i,
\end{aligned}
\tag{3.1}
$$
with $i = 1, \ldots, n$ and $\pi$ an $(m \times 1)$-vector of category means. Here we assume a single unobserved categorical discrete instrument $z$. In subsection 3.3.1 we show that in this case the categorical instrument should have at least two categories, which is in accordance with van der Ploeg’s (1997) result for the standard IV model with discrete instruments. It is assumed that $z$ is independent of the error terms $(\varepsilon, \nu)$, which are specified to follow a joint normal distribution $F$ with mean zero and variance-covariance matrix
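$$
\Sigma = \begin{bmatrix} \sigma_\varepsilon^2 & \sigma_{\varepsilon\nu} \\ \sigma_{\varepsilon\nu} & \sigma_\nu^2 \end{bmatrix}. \tag{3.2}
$$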
¹ Since we have no formal proof of the unbiasedness or consistency of the LIV estimator (see also chapter 8 for a discussion), we sometimes use ‘approximately’ unbiased or consistent to denote a result from Monte Carlo studies.
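As a sketch of how data from model (3.1)-(3.2) can be generated, the Python function below draws the latent category, the correlated errors, and then $x$ and $y$. The function name `simulate_liv` and all parameter values are illustrative assumptions, not the author's code. The check at the end shows why OLS fails here: its slope converges to $\beta_1 + \sigma_{\varepsilon\nu}/\mathrm{Var}(x)$, where $\mathrm{Var}(x) = \lambda(1-\lambda)(\pi_1-\pi_2)^2 + \sigma_\nu^2 = 3.25$ for these values.

```python
import numpy as np

def simulate_liv(n, beta0, beta1, pi, lam, Sigma, rng):
    """Draw n observations (y, x) from model (3.1).

    pi    : category means pi_1..pi_m of the latent instrument
    lam   : category probabilities lambda_1..lambda_m (summing to one)
    Sigma : 2x2 covariance matrix of (eps, nu), as in (3.2)
    The latent labels z are returned only for checking; LIV never uses them.
    """
    pi = np.asarray(pi, dtype=float)
    z = rng.choice(pi.size, size=n, p=lam)                # one multinomial draw per i
    eps, nu = rng.multivariate_normal([0.0, 0.0], Sigma, size=n).T
    x = pi[z] + nu                                        # x_i = pi' z_i + nu_i
    y = beta0 + beta1 * x + eps                           # y_i = beta0 + beta1 x_i + eps_i
    return y, x, z

rng = np.random.default_rng(3)
y, x, _ = simulate_liv(200_000, 1.0, 2.0, [-1.5, 1.5], [0.5, 0.5],
                       [[1.0, 0.5], [0.5, 1.0]], rng)
slope_ols = np.polyfit(x, y, 1)[0]
# Population OLS slope: beta1 + sigma_eps_nu / Var(x) = 2 + 0.5 / 3.25
print(slope_ols)
```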
If we had observed the instruments, $z$ would separate the sample into $m$ groups, with known category membership for each observation. We assume, however, that the category indicators are unknown a priori and have a multinomial distribution with parameters $(n, \lambda)$, where here $n = 1$ (one draw per observation) and $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)'$, with $\lambda_j > 0$ and $\sum_j \lambda_j = 1$. Conditionally on category $j = 1, 2, \ldots, m$, the reduced form distribution corresponding to (3.1) has mean
$$
\mu_j = \begin{pmatrix} \beta_0 + \beta_1 \pi_j \\ \pi_j \end{pmatrix} \tag{3.3}
$$
and variance-covariance matrix
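$$
\Omega = \begin{bmatrix} \beta_1^2 \sigma_\nu^2 + 2\beta_1 \sigma_{\varepsilon\nu} + \sigma_\varepsilon^2 & \beta_1 \sigma_\nu^2 + \sigma_{\varepsilon\nu} \\ \beta_1 \sigma_\nu^2 + \sigma_{\varepsilon\nu} & \sigma_\nu^2 \end{bmatrix}. \tag{3.4}
$$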
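The moments (3.3) and (3.4) follow from writing the within-category deviations as $y_i - \beta_0 - \beta_1\pi_j = \varepsilon_i + \beta_1\nu_i$ and $x_i - \pi_j = \nu_i$. The sketch below codes the two formulas literally and compares them with sample moments from one simulated category. The helper name `component_moments` and the numbers ($\beta_0 = 1$, $\beta_1 = 2$, $\pi_j = 1.5$, $\sigma_{\varepsilon\nu} = 0.5$) are illustrative assumptions.

```python
import numpy as np

def component_moments(beta0, beta1, pi_j, Sigma):
    """Mean (3.3) and covariance (3.4) of (y, x) given category j."""
    Sigma = np.asarray(Sigma, dtype=float)
    s_e2, s_en, s_nu2 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
    mu = np.array([beta0 + beta1 * pi_j, pi_j])
    Omega = np.array([
        [beta1**2 * s_nu2 + 2 * beta1 * s_en + s_e2, beta1 * s_nu2 + s_en],
        [beta1 * s_nu2 + s_en, s_nu2],
    ])
    return mu, Omega

rng = np.random.default_rng(2)
Sigma = [[1.0, 0.5], [0.5, 1.0]]
eps, nu = rng.multivariate_normal([0, 0], Sigma, 200_000).T
x = 1.5 + nu                      # every draw belongs to the category with pi_j = 1.5
y = 1.0 + 2.0 * x + eps
mu, Omega = component_moments(1.0, 2.0, 1.5, Sigma)
print(mu, [y.mean(), x.mean()])   # formula vs. sample mean
print(Omega, np.cov(y, x))        # formula vs. sample covariance
```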
The *simple* LIV model assumes the existence of a dummy instrument, i.e. $m = 2$. The assumption of a single dummy instrument prevents overfitting and adds tractability to our specification. Furthermore, it is known from the IV literature that the number of IVs should not be too large (Bowden and Turkington, 1984, Buse, 1992, or Bound, Jaeger, and Baker, 1995). Moreover, we will show below that the simple LIV model is robust against misspecification of the true number of categories of the instrument and performs well for several types of distributions of $x$. If $m = 2$, then conditionally on category $j = 1, 2$, the reduced form distribution corresponding to (3.1) is
$$
\mathcal{L}(y_i, x_i \mid z_i = e_j) = N_2(\mu_j, \Omega) \tag{3.5}
$$
with mean (3.3) and variance (3.4), and where $e_1 = (1, 0)'$ and $e_2 = (0, 1)'$.
If $f_j$ denotes the bivariate normal probability density function conditional on $z_i = e_j$, then the unconditional (marginal) probability density function of $(y_i, x_i)$ is
$$
f(y_i, x_i) = \lambda f_1(y_i, x_i) + (1 - \lambda) f_2(y_i, x_i). \tag{3.6}
$$
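The mixture density (3.6) can be evaluated directly with SciPy's bivariate normal. The sketch below is an assumption-laden illustration (function name `liv_density` and all parameter values are hypothetical), not the estimation code used in this chapter. It builds $\Omega$ as $A\Sigma A'$ with $A = \begin{pmatrix}1 & \beta_1\\ 0 & 1\end{pmatrix}$, which reproduces (3.4), and checks on a fine grid that the density integrates to about one.

```python
import numpy as np
from scipy.stats import multivariate_normal

def liv_density(y, x, beta0, beta1, pi, lam, Sigma):
    """Marginal density (3.6) of (y, x) for the simple LIV model (m = 2)."""
    A = np.array([[1.0, beta1], [0.0, 1.0]])
    Omega = A @ np.asarray(Sigma, dtype=float) @ A.T      # equals (3.4)
    pts = np.stack([y, x], axis=-1)                       # last axis holds (y, x)
    f1 = multivariate_normal.pdf(pts, mean=[beta0 + beta1 * pi[0], pi[0]], cov=Omega)
    f2 = multivariate_normal.pdf(pts, mean=[beta0 + beta1 * pi[1], pi[1]], cov=Omega)
    return lam * f1 + (1.0 - lam) * f2

# Riemann-sum check that (3.6) integrates to (approximately) one
dy = dx = 0.05
Y, X = np.meshgrid(np.arange(-30, 35, dy), np.arange(-9, 9, dx), indexing="ij")
dens = liv_density(Y, X, 1.0, 2.0, [-1.5, 1.5], 0.4, [[1.0, 0.5], [0.5, 1.0]])
total = dens.sum() * dy * dx
print(total)
```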
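Thus, $f(y_i, x_i)$ is a mixture of bivariate homoscedastic normal distributions, and it has expectation²
$$
\mu_{y,x} = \begin{pmatrix} \beta_0 + \beta_1\big(\lambda\pi_1 + (1-\lambda)\pi_2\big) \\ \lambda\pi_1 + (1-\lambda)\pi_2 \end{pmatrix}
$$
and variance-covariance matrix³
$$
\Omega_{y,x} = \Omega + \lambda(1-\lambda)(\pi_1 - \pi_2)^2 (\beta_1, 1)'(\beta_1, 1). \tag{3.7}
$$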
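Formula (3.7) is the law of total variance (footnote 3) specialised to two components: the between-group term reduces to $\lambda(1-\lambda)(\pi_1-\pi_2)^2$ times the outer product of $(\beta_1, 1)$. The sketch below computes the moments both ways and confirms they agree. Function names and parameter values are illustrative assumptions.

```python
import numpy as np

def moments_closed_form(beta0, beta1, pi, lam, Sigma):
    """Mean and covariance of the two-component LIV mixture, as in (3.7)."""
    A = np.array([[1.0, beta1], [0.0, 1.0]])
    Omega = A @ np.asarray(Sigma, dtype=float) @ A.T
    pbar = lam * pi[0] + (1 - lam) * pi[1]
    mean = np.array([beta0 + beta1 * pbar, pbar])
    b = np.array([beta1, 1.0])
    cov = Omega + lam * (1 - lam) * (pi[0] - pi[1]) ** 2 * np.outer(b, b)
    return mean, cov

def moments_total_variance(beta0, beta1, pi, lam, Sigma):
    """Same moments via E[Var(U|V)] + Var(E[U|V]) (footnotes 2 and 3)."""
    A = np.array([[1.0, beta1], [0.0, 1.0]])
    Omega = A @ np.asarray(Sigma, dtype=float) @ A.T      # within-group covariance
    mus = [np.array([beta0 + beta1 * p, p]) for p in pi]  # group means (3.3)
    w = [lam, 1 - lam]
    mean = sum(wj * m for wj, m in zip(w, mus))
    between = sum(wj * np.outer(m - mean, m - mean) for wj, m in zip(w, mus))
    return mean, Omega + between

args = (1.0, 2.0, [-1.5, 1.5], 0.3, [[1.0, 0.5], [0.5, 1.0]])
m1, c1 = moments_closed_form(*args)
m2, c2 = moments_total_variance(*args)
print(np.allclose(m1, m2), np.allclose(c1, c2))
```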
The parameters to be estimated are $\beta_0$, $\beta_1$, $\pi_1$, $\pi_2$, $\Sigma$, and $\lambda$. These parameters are identified in our model, as we demonstrate in subsection 3.3.1.
For estimation of the parameters, assume that a sample of $n$ i.i.d. observations $(y_i, x_i)$ is available; we do *not* require observed instruments $z_i$. The method of maximum likelihood can be used to estimate the model parameters. The likelihood function is the product of (3.6) across the observations. The resulting (log-)likelihood equations, however, are nonlinear and have no closed-form solution. Therefore we maximize the likelihood function with the quasi-Newton numerical optimization routines (the BFGS method) provided with the GAUSS package (Aptech, 2000). In the following section we discuss some statistical properties of the LIV model.
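The chapter's estimates were computed in GAUSS. As a hedged stand-in, the sketch below maximizes the likelihood built from (3.6) with SciPy's BFGS implementation and numerical gradients. The parametrization is our own assumption, not the author's: a Cholesky factor keeps $\Sigma$ positive definite and a logistic transform keeps $\lambda$ in $(0, 1)$. The start values (OLS for $\beta$, quartiles of $x$ for $\pi$) are also an arbitrary choice; in practice several starts are advisable because mixture likelihoods can have local maxima. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def unpack(theta):
    """Map an unconstrained 8-vector to (beta0, beta1, pi, Sigma, log weights)."""
    b0, b1, p1, p2, log_d1, off, log_d2, a = theta
    L = np.array([[np.exp(log_d1), 0.0], [off, np.exp(log_d2)]])
    Sigma = L @ L.T                                # Cov(eps, nu), positive definite
    log_w = np.array([-np.logaddexp(0.0, -a),      # log(lambda), lambda = logistic(a)
                      -np.logaddexp(0.0, a)])      # log(1 - lambda)
    return b0, b1, np.array([p1, p2]), Sigma, log_w

def neg_loglik(theta, y, x):
    """Minus the log-likelihood: sum over i of log (3.6), negated."""
    b0, b1, pi, Sigma, log_w = unpack(theta)
    A = np.array([[1.0, b1], [0.0, 1.0]])
    Omega = A @ Sigma @ A.T                        # reduced-form covariance (3.4)
    Oinv = np.linalg.inv(Omega)
    logdet = np.linalg.slogdet(Omega)[1]
    comp = np.empty((y.size, 2))
    for j in range(2):
        R = np.column_stack([y - b0 - b1 * pi[j], x - pi[j]])   # deviations from (3.3)
        quad = np.einsum("ij,jk,ik->i", R, Oinv, R)
        comp[:, j] = log_w[j] - np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * quad
    return -logsumexp(comp, axis=1).sum()

def start_values(y, x):
    """Crude data-driven start: OLS for beta, x quartiles for pi, lambda = 0.5."""
    X = np.column_stack([np.ones_like(x), x])
    b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]
    p1, p2 = np.quantile(x, [0.25, 0.75])
    s_e = np.std(y - b0 - b1 * x)
    s_nu = np.std(x) / 2.0
    return np.array([b0, b1, p1, p2, np.log(s_e), 0.0, np.log(s_nu), 0.0])

def fit_liv(y, x):
    theta0 = start_values(y, x)
    res = minimize(neg_loglik, theta0, args=(y, x), method="BFGS")
    return res, theta0

# Illustrative data with an unobserved dummy instrument
rng = np.random.default_rng(4)
n = 2000
z = rng.integers(0, 2, n)
eps, nu = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], n).T
x = np.array([-1.5, 1.5])[z] + nu
y = 1.0 + 2.0 * x + eps
res, theta0 = fit_liv(y, x)
b0_hat, b1_hat, pi_hat, Sigma_hat, log_w_hat = unpack(res.x)
print(b1_hat, theta0[1])      # LIV slope vs. OLS slope
```

The latent labels `z` are generated only to simulate the data; `fit_liv` sees `y` and `x` alone. Because the mixture is invariant to swapping the two components, `pi_hat` may come out in either order.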
3.3 Identifiability and information
In this section we discuss some statistical properties of the LIV model. First, we prove the identifiability of all the parameters of the LIV model. Identifiability is proved for a general number of categories and normal regressor-error distributions. Furthermore, we discuss the estimation of the information matrix.
² Use $E(U) = E[E(U \mid V)]$.
³ Use $\mathrm{Var}(U) = E[\mathrm{Var}(U \mid V)] + \mathrm{Var}(E[U \mid V])$.
3.3.1 Identifiability
The parameters $\pi$ and $\sigma_\nu^2$ are identified using first- and second-order moments, but the parameters $\beta_0$, $\beta_1$, $\sigma_\varepsilon^2$, and $\sigma_{\varepsilon\nu}$ are not. However, these parameters become identified by considering finite mixtures. Let
$$
\mathcal{F} = \{F(x, \theta),\ \theta \in \Theta,\ x \in \mathbb{R}^d\}
$$
be the class of $d$-dimensional distribution functions from which mixtures are to be formed. Here $\Theta$ is a Borel measurable set in $\mathbb{R}^q$, and $\theta$ is formed from the elements of the mean (3.3) and variance (3.4). For the simple LIV model above, $d = 2$, $q = 5$, and $F(\cdot, \theta)$ is a bivariate normal c.d.f.
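The class of finite mixtures $\mathcal{H}$ generated by $\mathcal{F}$ is defined as
$$
\mathcal{H} = \Big\{ H(x) : H(x) = \sum_{j=1}^m \psi_j F(x, \theta_j),\ \psi_j > 0,\ \sum_{j=1}^m \psi_j = 1,\ F(x, \theta_j) \in \mathcal{F}\ \forall j,\ m = 1, 2, \ldots;\ x \in \mathbb{R}^d \Big\}. \tag{3.8}
$$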
So, $\mathcal{H}$ is the convex hull of $\mathcal{F}$. For the sake of simplicity, we use some abbreviations: $F(x, \theta_j)$ will be written as $F_j(x)$ or just $F_j$, and the corresponding mixture as $H = \sum_{j=1}^m \psi_j F_j$. We use definition 3.1 for the identifiability of mixtures in $\mathcal{H}$. Here we are interested in pure mixtures ($\psi_j > 0$) of order $m = 1, 2, \ldots$.
**Definition 3.1** Suppose $H$ and $H'$ are any two members of $\mathcal{H}$, given by
$$
H = \sum_{j=1}^m \psi_j F_j, \qquad H' = \sum_{j=1}^{m'} \psi'_j F'_j. \tag{3.9}
$$
$\mathcal{H}$ is *identifiable* if and only if $H \equiv H'$ implies that $m = m'$ and that the order of summation can be chosen such that $\psi_j = \psi'_j$ and $F_j = F'_j$, $j = 1, \ldots, m$.
Stated differently, $H \in \mathcal{H}$ is identifiable if there is a unique solution (up to a permutation of subscripts) of the identity defining $H$ in (3.9). Several theorems on the identifiability of finite mixtures are available, and linear independence of the members of $\mathcal{F}$ is the key to answering the question (cf. Titterington, Smith, and Makov, 1985). Core papers on this issue are Teicher (1963) and Yakowitz and Spragins (1968), whose most important results we present in appendix 3A.
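The qualifier “up to a permutation of subscripts” is unavoidable: relabeling the components leaves the mixture unchanged, so no data can distinguish the two labelings. The short sketch below illustrates this label switching for a two-component bivariate normal mixture with a common covariance, as in the LIV model. Parameter values are arbitrary illustrations and `mixture_pdf` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_pdf(pts, weights, means, cov):
    """Density of sum_j weights[j] * N2(means[j], cov) at pts (shape (..., 2))."""
    return sum(w * multivariate_normal.pdf(pts, mean=m, cov=cov)
               for w, m in zip(weights, means))

rng = np.random.default_rng(5)
pts = rng.normal(size=(100, 2))
cov = np.array([[7.0, 2.5], [2.5, 1.0]])
means = [np.array([-2.0, -1.5]), np.array([4.0, 1.5])]
h = mixture_pdf(pts, [0.3, 0.7], means, cov)
h_swapped = mixture_pdf(pts, [0.7, 0.3], means[::-1], cov)   # same mixture, relabeled
h_other = mixture_pdf(pts, [0.7, 0.3], means, cov)           # genuinely different mixture
print(np.allclose(h, h_swapped), np.allclose(h, h_other))    # True False
```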
Several studies prove that certain families $\mathcal{F}$ of $d$-dimensional c.d.f.’s, for instance Gaussian c.d.f.’s, generate identifiable finite mixtures. As we show below, these results do not carry over directly to the LIV model, and we have to prove identifiability in two steps. Similarly, related work by Hennig (2000) on the identifiability of mixtures of regressions cannot be extended straightforwardly, because of structural differences between his framework and ours and because of differing model assumptions (in Hennig’s context at least $\sigma_{\varepsilon\nu} \neq 0$ is required, whereas $\sigma_{\varepsilon\nu} = 0$ is also of interest in our model).
Let
$$
\mathcal{F}_{\beta,\Sigma} = \{F \mid F \text{ is a bivariate normal c.d.f. on } \mathbb{R}^2 \text{ of the pair } (y_i, x_i) \text{ with mean and variance } (\mu(\beta,\pi), \Omega(\beta,\Sigma)),\ \pi \in \mathbb{R}\}, \tag{3.10}
$$
where $\mu(\beta, \pi) = (\beta_0 + \beta_1\pi, \pi)'$ and $\Omega(\beta, \Sigma) = \Omega$ as in (3.4), be the class of general LIV models in which $\beta$ and $\Sigma$ are known, and let $\mathcal{H}_{\beta,\Sigma}$ be the set of all pure finite mixtures of order $m$ of the class $\mathcal{F}_{\beta,\Sigma}$. We consider general $m > 1$; for the simple LIV model we used $m = 2$. We apply standard results on the identifiability of finite mixtures to establish identifiability of the class $\mathcal{H}_{\beta,\Sigma}$ in terms of the $\pi$’s and the mixture probabilities. However, the identifiability of the parameters $\beta$ and $\Sigma$ does not follow immediately. In fact, we are not seeking the identifiability of $\mathcal{H}_{\beta,\Sigma}$ but that of the larger class
$$
\mathcal{G} = \bigcup_{\beta,\Sigma} \mathcal{H}_{\beta,\Sigma}. \tag{3.11}
$$
In the following, we first prove the identifiability of the class $\mathcal{H}_{\beta,\Sigma}$. Subsequently, we use this result to prove identifiability of $\mathcal{G}$, which is the class of LIV models. Identifiability of $\mathcal{G}$ is equivalent to identifiability of the parameters in the general LIV model.
*Proof of identifiability of $\mathcal{H}_{\beta,\Sigma}$.* Let $F_{j,X}$ be the marginal distribution function of $F_j$ for $X$. More specifically, $F_{j,X}(x) = \lim_{y \to \infty} F_j(y, x)$. From (3.1) and (3.3) it can be seen that $X$ has mean $\pi_j$ and variance $\omega_{22} = \sigma_\nu^2$. Hence all $F_{j,X}$, $j = 1, 2, \ldots$, are normal distribution functions with different location parameters but with the same variance.
Since we assume for the moment that $\beta$ and $\Sigma$ are known, identifiability of the class $\mathcal{H}_{\beta,\Sigma}$ is established if there is a unique solution of $F(y, x) = \sum_{j=1}^m a_j F_j(y, x)$ in terms of $a_j$ and $\pi_j$ for $m = 1, 2, \ldots$. But now we only have to look at the marginal distribution of $X$, since this distribution contains all the remaining unknown parameters. The marginal c.d.f. of $X$ is a finite mixture of $N(\pi, \sigma_\nu^2)$ distribution functions with $\pi \in \mathbb{R}$, where $\sigma_\nu^2$ is fixed for the moment. According to proposition 1 of Teicher (1963), or proposition 2 of Yakowitz and Spragins (1968) (see appendix 3A), such mixtures are identifiable. It follows that there is a unique solution in terms of $m$, $a_j$, and $\pi_j$, for $j = 1, 2, \ldots, m$, for any $F \in \mathcal{H}_{\beta,\Sigma}$.
In the preceding we assumed $\beta$ and $\Sigma$ known, which will not be the case in general. But the previous result can be used to prove identifiability of the larger class $\mathcal{G} = \bigcup_{\beta,\Sigma} \mathcal{H}_{\beta,\Sigma}$, which is the union across all possible values of $(\beta, \Sigma)$. If every distribution in this class has a unique solution in terms of its unknown parameters, then the parameters of the general LIV model are identified (including the relative class sizes).
*Proof of identifiability of $\mathcal{G}$.* In the following we prove that $\mathcal{G}$ is also identified, i.e. we prove the following theorem.
**Theorem 3.1** Assume that $m \ge 2$. Then $\mathcal{H}_{\beta,\Sigma}$ is identified for all $\beta$ and positive semi-definite $\Sigma$ $\iff$ $\mathcal{G}$ is identified.
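*Proof* (⇒) Let $F, G \in \mathcal{G}$ such that $F \equiv G$, where
$$
F = \sum_{i=1}^m a_i F_i \in \mathcal{H}_{\beta,\Sigma}, \qquad G = \sum_{j=1}^k b_j G_j \in \mathcal{H}_{\delta,\Psi},
$$
and $a_1, \ldots, a_m$ and $b_1, \ldots, b_k$ are the positive mixing proportions, and the distributions $F_1, \ldots, F_m \in \mathcal{F}_{\beta,\Sigma}$ and $G_1, \ldots, G_k \in \mathcal{F}_{\delta,\Psi}$ are different in terms of their means and variances, i.e.
$$
F_i \text{ is the c.d.f. of } N_2\big(\mu(\beta, \pi_i), \Omega(\beta, \Sigma)\big), \qquad G_i \text{ is the c.d.f. of } N_2\big(\mu(\delta, \gamma_i), \Omega(\delta, \Psi)\big). \tag{3.12}
$$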
By definition 3.1, we need to show that $F \equiv G$ implies $m = k$, $a_i = b_i$, and $F_i = G_i$ for $i = 1, \ldots, k$, possibly after relabeling.
$F \equiv G$ implies that
$$
\sum_{i=1}^m a_i F_i = \sum_{j=1}^k b_j G_j. \tag{3.13}
$$
Both $F$ and $G$ have unique representations (up to a permutation of indices) in terms of $m$, $a_i$, and $\pi_i$, respectively $k$, $b_j$, and $\gamma_j$, because $\mathcal{H}_{\beta,\Sigma}$ and $\mathcal{H}_{\delta,\Psi}$ are assumed to be identified (⇒). Hence, given $\beta$ and $\Sigma$ ($\delta$ and $\Psi$), there is no other representation in terms of $m$, $a_i$, and $\pi_i$ ($k$, $b_j$, and $\gamma_j$) that yields $F$ ($G$). In addition, (3.13) equates two finite mixtures of bivariate normal distribution functions. According to proposition 2 of Yakowitz and
Spragins (1968) such mixtures are identifiable; hence we must have $m = k$, $a_i = b_i$, and $F_i = G_i$ (possibly after relabeling). But $F_i = G_i$ implies that $\mu(\beta, \pi_i) = \mu(\delta, \gamma_i)$ and $\Omega(\beta, \Sigma) = \Omega(\delta, \Psi)$. Thus,
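$$\beta_0 + \beta_1 \pi_i = \delta_0 + \delta_1 \gamma_i \tag{3.14}$$
$$\pi_i = \gamma_i \tag{3.15}$$
$$\beta_1^2 \sigma_\nu^2 + \sigma_\varepsilon^2 + 2\beta_1 \sigma_{\varepsilon\nu} = \delta_1^2 \psi_\nu^2 + \psi_\varepsilon^2 + 2\delta_1 \psi_{\varepsilon\nu} \tag{3.16}$$
$$\beta_1 \sigma_\nu^2 + \sigma_{\varepsilon\nu} = \delta_1 \psi_\nu^2 + \psi_{\varepsilon\nu} \tag{3.17}$$
$$\sigma_\nu^2 = \psi_\nu^2 \tag{3.18}$$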
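for $i = 1, \ldots, m$. Since the $F_i = G_i$ are different for $i = 1, \ldots, m$, we have $\pi_i \neq \pi_j$ and $\gamma_i \neq \gamma_j$ for all $i \neq j$. Substituting (3.15) into (3.14) gives $(\beta_0 - \delta_0) + (\beta_1 - \delta_1)\pi_i = 0$ for every $i$; because $m \ge 2$, this linear equation holds at two or more distinct points $\pi_i$, which forces $\beta_0 = \delta_0$ and $\beta_1 = \delta_1$. Subsequently, from (3.18), (3.17), and (3.16), in that order, we have $\sigma_\nu^2 = \psi_\nu^2$, $\sigma_{\varepsilon\nu} = \psi_{\varepsilon\nu}$, and $\sigma_\varepsilon^2 = \psi_\varepsilon^2$. So $F \in \mathcal{G}$ has a unique representation, and $\mathcal{G}$ is thus identified.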
The reverse direction (⇐) follows immediately (i.e. a subset of an identified set must be identified as well).
To conclude, from theorem 3.1 it follows that if $m \ge 2$ and all the group means $\pi_j$, $j = 1, \ldots, m$, are different, the parameters of the LIV model in (3.1) with normally distributed errors are identifiable, including the mixture probabilities.
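The last step of the proof is constructive. Once the component means and the common covariance $\Omega$ are pinned down, (3.14)-(3.18) can be solved in closed form: $\beta_1$ is the slope through two distinct component means, then $\sigma_\nu^2 = \omega_{22}$, $\sigma_{\varepsilon\nu} = \omega_{12} - \beta_1\sigma_\nu^2$, and $\sigma_\varepsilon^2 = \omega_{11} - \beta_1^2\sigma_\nu^2 - 2\beta_1\sigma_{\varepsilon\nu}$. The sketch below codes this inversion and a round trip from illustrative structural values. Both function names are hypothetical, and the check that the first two means differ mirrors the requirement $\pi_i \neq \pi_j$.

```python
import numpy as np

def reduced_form(beta0, beta1, pi, Sigma):
    """Structural LIV parameters -> component means (3.3) and covariance (3.4)."""
    pi = np.asarray(pi, dtype=float)
    A = np.array([[1.0, beta1], [0.0, 1.0]])
    Omega = A @ np.asarray(Sigma, dtype=float) @ A.T
    return beta0 + beta1 * pi, pi, Omega

def structural_from_reduced(mu_y, mu_x, Omega):
    """Invert (3.14)-(3.18) using the first two components."""
    if np.isclose(mu_x[0], mu_x[1]):
        raise ValueError("group means must differ for beta to be identified")
    beta1 = (mu_y[0] - mu_y[1]) / (mu_x[0] - mu_x[1])   # slope through two points
    beta0 = mu_y[0] - beta1 * mu_x[0]
    s_nu2 = Omega[1, 1]                                 # (3.18)
    s_en = Omega[0, 1] - beta1 * s_nu2                  # (3.17)
    s_e2 = Omega[0, 0] - beta1**2 * s_nu2 - 2 * beta1 * s_en   # (3.16)
    return beta0, beta1, np.array([[s_e2, s_en], [s_en, s_nu2]])

mu_y, mu_x, Om = reduced_form(1.0, 2.0, [-1.5, 1.5], [[1.0, 0.5], [0.5, 1.0]])
b0, b1, S = structural_from_reduced(mu_y, mu_x, Om)
print(b0, b1, S)   # recovers 1.0, 2.0 and the original Sigma (up to rounding)
```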
3.3.2 Information matrix
The Fisher information matrix measures the ‘information’ in a sample. Moreover, its inverse gives the lower bound on the variance of unbiased estimators (the Cramér-Rao lower bound), and it plays a role in determining the asymptotic distribution of the maximum likelihood estimator. The information matrix is defined as
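$$
I(\theta) = -E\left[\frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\right] = E\left[\left(\frac{\partial \ln L(\theta)}{\partial\theta}\right)\left(\frac{\partial \ln L(\theta)}{\partial\theta'}\right)\right], \tag{3.19}
$$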
provided that this expression is well defined. From the mixture literature (e.g. Titterington, Smith, and Makov, 1985) it is known that the information loss is larger when the mixing weights are unbalanced or when the component densities are not well separated. In such cases, larger samples or adding a small portion of fully categorized data may result in large improvements (Titterington, Smith, and Makov, 1985, or Redner and Walker, 1984). In some situations the calculation of the information matrix can be simplified by lower-dimensional numerical (quadrature) integration, or by existing tables of quantities that approximate the information matrix quite accurately. In general, however, the derivatives of the log-likelihood are complicated nonlinear functions of the data whose expected values are unknown.
Using the law of large numbers, the information matrix can be estimated by evaluating the actual (not expected) second-order derivatives of the log-likelihood (i.e. the Hessian at the maximum likelihood estimate). A second estimate can be obtained from the first-order derivatives (which are needed anyway to solve the likelihood equations). This estimator of the information matrix is known as the outer product of gradients (OPG) estimator (e.g. Greene, 2000, or Davidson and MacKinnon, 1993). In the following we examine the first- and second-order derivatives of the log-likelihood function of the LIV model in (3.1) more closely.
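The two estimators just described can be sketched generically. The Python code below computes the Hessian-based and OPG estimates of the information matrix from any per-observation log-likelihood, using central finite differences. This is a swapped-in numerical technique chosen for brevity; the chapter itself uses the analytical derivatives derived below. The toy check uses a normal sample with $\theta = (\mu, \log\sigma)$, whose per-observation information at the truth is $\mathrm{diag}(1, 2)$ when $\sigma = 1$. All function names are hypothetical.

```python
import numpy as np

def scores(loglik_i, theta, h=1e-6):
    """n x p matrix of per-observation scores via central differences."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = h
        cols.append((loglik_i(theta + e) - loglik_i(theta - e)) / (2.0 * h))
    return np.column_stack(cols)

def info_opg(loglik_i, theta):
    """Outer product of gradients: sum_i s_i s_i'."""
    S = scores(loglik_i, theta)
    return S.T @ S

def info_hessian(loglik_i, theta, h=1e-4):
    """Minus the numerical Hessian of the total log-likelihood."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size

    def total(t):
        return loglik_i(t).sum()

    H = np.empty((p, p))
    for a in range(p):
        for b in range(p):
            ea = np.zeros(p)
            ea[a] = h
            eb = np.zeros(p)
            eb[b] = h
            H[a, b] = (total(theta + ea + eb) - total(theta + ea - eb)
                       - total(theta - ea + eb) + total(theta - ea - eb)) / (4.0 * h * h)
    return -H

# Toy check on a N(mu, sigma^2) sample with theta = (mu, log sigma)
rng = np.random.default_rng(6)
data = rng.normal(size=10_000)

def loglik_i(theta):
    mu, log_s = theta
    return -0.5 * np.log(2 * np.pi) - log_s - 0.5 * ((data - mu) / np.exp(log_s)) ** 2

theta_hat = np.array([data.mean(), np.log(data.std())])   # closed-form MLE
I_hess = info_hessian(loglik_i, theta_hat) / data.size
I_opg = info_opg(loglik_i, theta_hat) / data.size
print(I_hess)   # approximately diag(1, 2)
print(I_opg)    # approximately diag(1, 2); noisier than the Hessian version
```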
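Let
$$
L(\theta \mid y, x) = \prod_{i=1}^n f(y_i, x_i \mid \theta), \tag{3.20}
$$
where the parameter $\theta = (\theta_1, \theta_2, \theta_3, \theta_4)'$ is defined as follows: $\theta_1 = (\beta_0, \beta_1)'$, $\theta_2 = (\pi_1, \ldots, \pi_m)'$, $\theta_3 = (\sigma_\varepsilon^2, \sigma_{\varepsilon\nu}, \sigma_\nu^2)'$, and $\theta_4 = (\lambda_1, \ldots, \lambda_m)'$, with $\sum_{j=1}^m \lambda_j = 1$, and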
$$
f(y_i, x_i \mid \theta) = \sum_{j=1}^m \lambda_j f_j(y_i, x_i \mid \theta_1, \theta_{2j}, \theta_3). \tag{3.21}
$$
In this notation, $\theta_1$ contains the (regression) parameters that do not vary with $j$, and $\theta_2$ contains the parameters that depend on $j$, except the class sizes $\lambda_j$, which are contained in $\theta_4$. The vector $\theta_3$ consists of the elements of $\Sigma$. We also write $f_{i|j}$ instead of $f_j(y_i, x_i \mid \theta_1, \theta_{2j}, \theta_3)$ and $f_i$ instead of $f(y_i, x_i \mid \theta)$.
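The general structure of the first-order derivatives of the log-likelihood of the LIV model (gradient) with respect to $\theta_1$, $\theta_2$, and $\theta_3$ is
$$
\frac{\partial}{\partial\theta_h} \log L(\theta) = \sum_{i=1}^n \frac{1}{f_i} \sum_{j=1}^m \lambda_j \frac{\partial f_{i|j}}{\partial\theta_h}, \tag{3.22}
$$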
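where $\theta_h$ is an element of $\theta_1$, $\theta_2$, or $\theta_3$. For the elements of $\theta_4$ (i.e. the group sizes) we have, for $l = 1, \ldots, m-1$,
$$
\frac{\partial}{\partial\theta_{4l}} \log L(\theta) = \sum_{i=1}^n \frac{f_{i|l} - f_{i|m}}{f_i}, \tag{3.23}
$$
since $\lambda_m = 1 - \sum_{j=1}^{m-1} \lambda_j$. In appendix 3B we present more detailed results when the errors have a joint normal distribution.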
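Analytical gradients such as (3.23) are easy to get wrong, so a finite-difference check is a cheap safeguard. The sketch below verifies (3.23) for $m = 2$, where $\lambda_1 = \lambda$ and $\lambda_2 = 1 - \lambda$, by comparing it with a central difference of the log-likelihood in $\lambda$ alone. The data, parameter values, and function names are illustrative assumptions; the component densities are the bivariate normals of (3.5).

```python
import numpy as np
from scipy.stats import multivariate_normal

def component_pdfs(y, x, beta0, beta1, pi, Sigma):
    """n x 2 matrix of f_{i|j}, the bivariate normal densities (3.5)."""
    A = np.array([[1.0, beta1], [0.0, 1.0]])
    Omega = A @ np.asarray(Sigma, dtype=float) @ A.T
    pts = np.column_stack([y, x])
    return np.column_stack([
        multivariate_normal.pdf(pts, mean=[beta0 + beta1 * p, p], cov=Omega)
        for p in pi
    ])

def loglik_lambda(lam, F):
    """log L as a function of lambda_1 = lam only (lambda_2 = 1 - lam)."""
    return np.log(lam * F[:, 0] + (1.0 - lam) * F[:, 1]).sum()

def score_lambda(lam, F):
    """Equation (3.23) with m = 2: sum_i (f_{i|1} - f_{i|2}) / f_i."""
    f = lam * F[:, 0] + (1.0 - lam) * F[:, 1]
    return ((F[:, 0] - F[:, 1]) / f).sum()

rng = np.random.default_rng(7)
x = rng.normal(scale=2.0, size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)
F = component_pdfs(y, x, 1.0, 2.0, [-1.5, 1.5], [[1.0, 0.5], [0.5, 1.0]])
h = 1e-6
analytic = score_lambda(0.3, F)
numeric = (loglik_lambda(0.3 + h, F) - loglik_lambda(0.3 - h, F)) / (2 * h)
print(analytic, numeric)   # the two should agree closely
```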
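The general structure of the mixed partial derivatives of the log-likelihood of the LIV model (Hessian) with respect to the elements of $\theta_1$, $\theta_2$, and $\theta_3$ is given by
$$
\begin{aligned}
\frac{\partial^2}{\partial\theta_h\,\partial\theta_k} \log L(\theta) &= \frac{\partial}{\partial\theta_h} \sum_{i=1}^n \frac{1}{f_i} \sum_{j=1}^m \lambda_j \frac{\partial f_{i|j}}{\partial\theta_k} \\
&= \sum_{i=1}^n \left[ -\left(\sum_{l=1}^m \frac{\lambda_l}{f_i} \frac{\partial f_{i|l}}{\partial\theta_h}\right)\left(\sum_{j=1}^m \frac{\lambda_j}{f_i} \frac{\partial f_{i|j}}{\partial\theta_k}\right) + \sum_{j=1}^m \frac{\lambda_j}{f_i} \frac{\partial^2 f_{i|j}}{\partial\theta_h\,\partial\theta_k} \right],
\end{aligned}
\tag{3.24}
$$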
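where $\theta_h$ and $\theta_k$ are elements of $\theta_1$, $\theta_2$, or $\theta_3$. The second-order derivatives with respect to the group sizes $\theta_4$ have the following structure:
$$
\frac{\partial^2}{\partial\theta_h\,\partial\theta_{4l}} \log L(\theta) = \frac{\partial}{\partial\theta_h}\left\{\frac{\partial}{\partial\theta_{4l}} \log L(\theta)\right\} = \sum_{i=1}^n \frac{f_i \dfrac{\partial}{\partial\theta_h}\{f_{i|l} - f_{i|m}\} - \{f_{i|l} - f_{i|m}\} \dfrac{\partial}{\partial\theta_h} f_i}{(f_i)^2}, \tag{3.25}
$$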
where the results from the gradients can be used. In appendix 3B we present
more detailed results on the derivation of the second-order partial derivatives
for the normal case.
The gradient vector and Hessian matrix can be programmed along with the log-likelihood function, and numerical optimization techniques can be used to find the maximum likelihood estimates of the parameters of the LIV model. We found that using these analytical expressions considerably increases the speed of convergence of the numerical optimization routine. Furthermore, the final results were found to be more stable than those obtained with numerical approximations of the gradient and Hessian. This holds in particular for the Hausman test in the next section.
3.4 A test-statistic to test for regressor-error dependencies
We propose to apply a Hausman test (see Greene, 2000) directly to test for exogeneity of the regressor, based on the parameter estimates $\hat\beta_{LIV}$. The null hypothesis is that both the OLS and LIV estimates are consistent; the alternative hypothesis states that only LIV is consistent. The Hausman-LIV statistic is defined as
$$
H_{LIV} = (\hat\beta_{LIV} - \hat\beta_{OLS})' \hat\Sigma_{HLIV}^{-1} (\hat\beta_{LIV} - \hat\beta_{OLS}), \tag{3.26}
$$
where $\hat\Sigma_{HLIV}$ is the estimated asymptotic covariance of the difference $\hat\beta_{LIV} - \hat\beta_{OLS}$. Hausman shows that, because OLS is efficient under the null hypothesis, this covariance can be computed by subtracting the
estimated asymptotic OLS covariance matrix from the estimated asymptotic LIV covariance matrix. The latter can be obtained from the outer product of the gradients, from the Hessian matrix, or, in case of potential misspecification, from White’s misspecification-consistent covariance matrix (White, 1982). We found that the estimated asymptotic covariance matrix based on the analytical first- and second-order derivatives (see subsection 3.3.2) is more stable and gives more accurate results than one based on a numerical approximation of the gradient or Hessian. Under the null hypothesis, the statistic asymptotically follows a $\chi^2(1)$ distribution.
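Once the LIV and OLS estimates and their estimated asymptotic covariance matrices are available, (3.26) is a few lines of arithmetic. The sketch below computes the statistic and its asymptotic $\chi^2$ p-value with degrees of freedom equal to the number of compared coefficients (one in the scalar case above). The input numbers are hypothetical and for illustration only. The sketch also refuses to proceed when $\hat V_{LIV} - \hat V_{OLS}$ is not positive definite, a situation that can occur in finite samples and for which the difference formula gives no valid test.

```python
import numpy as np
from scipy.stats import chi2

def hausman_liv(b_liv, b_ols, V_liv, V_ols):
    """Hausman statistic (3.26) and its asymptotic chi-square p-value.

    Uses Var(b_liv - b_ols) = V_liv - V_ols, which relies on OLS being
    efficient under the null; the difference must be positive definite.
    """
    d = np.atleast_1d(np.asarray(b_liv, dtype=float) - np.asarray(b_ols, dtype=float))
    V = np.atleast_2d(np.asarray(V_liv, dtype=float) - np.asarray(V_ols, dtype=float))
    if np.any(np.linalg.eigvalsh(V) <= 0):
        raise ValueError("V_liv - V_ols is not positive definite")
    H = float(d @ np.linalg.solve(V, d))
    return H, float(chi2.sf(H, df=d.size))

# Hypothetical slope estimates and variances, for illustration only
H, p = hausman_liv(2.00, 2.15, 0.0020, 0.0005)
print(H, p)   # H is 15 up to rounding; p is about 1e-4, so exogeneity is rejected
```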
The proposed Hausman-LIV test has a great practical advantage over classical IV methods. In the classical case, one would first need to find good observable instruments, after which a test can be performed to investigate whether or not the instruments were needed. If the test does not reject the null hypothesis, the instrumental variables are simply discarded, since the OLS estimator is used in that case. Moreover, weak and/or endogenous instruments will bias the test, leading to false conclusions. The LIV variant of the test circumvents this circular problem, and observed instrumental variables are not required to perform it.
3.5 Monte Carlo experiments
This section presents the results of a Monte Carlo experiment demonstrating that the proposed simple LIV model and Hausman-LIV test are well suited to identifying and resolving regressor-error dependencies. Even when the true number of categories of the instrument is larger than two, and for various distributions of the endogenous regressor, the LIV estimates are approximately consistent and the power⁴ of the test appears to be satisfactory.
We present the results as follows. First we discuss the results for the main parameters of interest, $\beta_1$ and $\sigma^2_\varepsilon$. We start with the results for fitting the simple LIV model, i.e. assuming a dummy instrument ($m = 2$), and show that in all cases the parameters of interest can be recovered well, in contrast with OLS. We then discuss the results of the Hausman-LIV test. Subsequently, we present a sensitivity analysis for the simple LIV model by assuming that the unobserved discrete instrument has three or four categories. As will become clear, in most cases this has no significant impact on the results for $\beta_1$ and $\sigma^2_\varepsilon$, nor on the power of the Hausman-LIV test. This is an important result, since it shows that the exact choice of the number of categories has little impact on the final outcomes. Finally, in subsection 3.5.4 we present the results for the $\pi$'s, the $\lambda$'s, $\sigma^2_\nu$, and $\sigma_{\varepsilon\nu}$.

⁴ The power of a test is the probability that it rejects the null hypothesis when the null hypothesis is false (cf. Weisstein, 2004c).
3.5.1 Design of the simulation study: data generation
In the simulation study the data were generated as follows. The error terms $(\varepsilon, \nu)$ are drawn from a bivariate normal distribution with unit variances. The endogenous regressor $x$ was constructed by varying the correlation between $x$ and $\varepsilon$ and the true number of instruments $m$. We considered three specifications for $m$: 2, 4, and 8, using equal group sizes $1/m$. This results in a bimodal distribution for the endogenous $x$ with $m = 2$ (bim2) and two unimodal distributions with $m = 4$ (unim4) and $m = 8$ (unim8). Furthermore, we consider two other distributions for the instruments, both with $m = 8$ support points, resulting in a bimodal distribution (bim8) and a skewed distribution (skew8). The values for $m$, $\sigma_{\varepsilon\nu}$, and $\pi_1, \ldots, \pi_m$ are chosen such that the mean of $x$ is zero, its variance is 2.5, and the correlation between $x$ and $\varepsilon$ is $0, 0.1, \ldots, 0.5$. Since in all simulations the endogenous regressor has mean zero, the OLS estimate of the constant is unbiased and can be used as an estimate for $\beta_0$; hence, we omit further details on $\beta_0$ in the following. Under the null hypothesis of no regressor–error dependency, the Hausman-LIV test statistic for the regression coefficient has a $\chi^2(1)$ distribution. Data were generated for 1000 observations and 250 Monte Carlo replications. To maximize the likelihood function we use the quasi-Newton numerical optimization routines (the BFGS method) provided with the GAUSS package (Aptech, 2000). For computing the gradient and Hessian we use the analytical expressions discussed in subsection 3.3.2 and derived in appendix 3B.

Figure 3.1: Bias plots for $\beta_1$ for the simple LIV model, where 1: OLS, 2: bimodal $m = 2$ (bim2), 3: bimodal $m = 8$ (bim8), 4: skewed $m = 8$ (skew8), 5: unimodal $m = 4$ (unim4), and 6: unimodal $m = 8$ (unim8).
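The bim2 design and a fit of the simple LIV model can be sketched in a few lines of NumPy. Two assumptions are ours. First, the true values $\beta_0 = \beta_1 = 1$ are hypothetical, since this section does not state them. Second, the estimation route differs from the one described above. Instead of maximizing the likelihood with BFGS, the sketch uses the fact that with $m = 2$ each latent group gives $(x, y)$ a bivariate normal distribution with mean $(\pi_j, \beta_0 + \beta_1 \pi_j)$ and a covariance matrix common to both groups. The LIV likelihood can therefore be maximized by EM for a two-component, tied-covariance Gaussian mixture and mapped back, with $\beta_1$ the slope through the two component means. This reparameterization is ours, not the author's GAUSS implementation, and the function names are hypothetical.

```python
import numpy as np

def simulate_bim2(n, rho, beta0=1.0, beta1=1.0, seed=0):
    """Simulate the bim2 design; beta0 = beta1 = 1 are hypothetical true values."""
    rng = np.random.default_rng(seed)
    s_en = rho * np.sqrt(2.5)                      # sigma_eps_nu, so Corr(x, eps) is about rho
    cov = [[1.0, s_en], [s_en, 1.0]]               # (eps, nu) with unit variances
    eps, nu = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    z = rng.integers(0, 2, size=n)                 # latent dummy instrument, group sizes 1/2
    x = np.where(z == 1, 1.23, -1.23) + nu         # pi_1 = -pi_2 = -1.23 as in table 3.3
    y = beta0 + beta1 * x + eps
    return x, y, z

def fit_simple_liv(x, y, n_iter=1000, tol=1e-9):
    """Fit the m = 2 LIV model as a tied-covariance bivariate normal mixture by EM."""
    X = np.column_stack([x, y])
    n = X.shape[0]
    r = np.zeros((n, 2))
    r[:, 1] = x > np.median(x)                     # start: split the sample at the median of x
    r[:, 0] = 1.0 - r[:, 1]
    ll_old = -np.inf
    for _ in range(n_iter):
        # M-step: group sizes, component means of (x, y), common covariance
        nk = r.sum(axis=0)
        lam = nk / n
        mu = (r.T @ X) / nk[:, None]
        S = sum((r[:, [k]] * (X - mu[k])).T @ (X - mu[k]) for k in range(2)) / n
        # E-step: posterior group probabilities (log-sum-exp for stability)
        Sinv = np.linalg.inv(S)
        _, logdet = np.linalg.slogdet(S)
        logp = np.empty((n, 2))
        for k in range(2):
            d = X - mu[k]
            quad = np.einsum("ij,jk,ik->i", d, Sinv, d)
            logp[:, k] = np.log(lam[k]) - 0.5 * (quad + logdet) - np.log(2 * np.pi)
        top = logp.max(axis=1, keepdims=True)
        lse = top[:, 0] + np.log(np.exp(logp - top).sum(axis=1))
        r = np.exp(logp - lse[:, None])
        ll = lse.sum()
        if ll - ll_old < tol:
            break
        ll_old = ll
    # Map the mixture parameters back to the LIV parameters
    b1 = (mu[1, 1] - mu[0, 1]) / (mu[1, 0] - mu[0, 0])
    b0 = mu[0, 1] - b1 * mu[0, 0]
    s2_nu = S[0, 0]
    s_en = S[0, 1] - b1 * s2_nu
    s2_eps = S[1, 1] - b1**2 * s2_nu - 2 * b1 * s_en
    return {"beta0": b0, "beta1": b1, "pi": mu[:, 0], "lam": lam,
            "s2_nu": s2_nu, "s_en": s_en, "s2_eps": s2_eps, "post": r}

x, y, z = simulate_bim2(n=1000, rho=0.5)
fit = fit_simple_liv(x, y)
b_ols = np.polyfit(x, y, 1)[0]                     # OLS slope, biased upward here
print(f"OLS: {b_ols:.3f}  LIV: {fit['beta1']:.3f}  s2_eps: {fit['s2_eps']:.3f}")
```

EM is used here only because it is short and needs no derivatives. It also makes the link between the LIV slope and the group means explicit.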
3.5.2 Results for the simple LIV model ($m = 2$)
Results for $\beta_1$ and $\sigma^2_\varepsilon$
Figure 3.1 shows the bias plots for $\beta_1$, estimated by OLS and by the LIV method, for the six different correlations between $x$ and $\varepsilon$ and for data generated with $m = 2, 4, 8$. Two observations are in order. First, increasing the degree of endogeneity decreases the amount of uncertainty in the LIV estimator. This result is expected, as the proposed method is designed for situations with endogeneity. In the case of a perfectly exogenous regressor, OLS provides the best linear unbiased estimator and outperforms LIV, but OLS performs worse as the correlation between $x$ and $\varepsilon$ increases. Second, when the instrument has four or eight categories and a unimodal distribution (unim4 and unim8), some efficiency is lost with the LIV approach, since the model is misspecified under these conditions. Moreover, the larger the true number of categories, the closer the distribution of $x$ is to a normal distribution with mean 0 and variance 2.5. From theorem 3.1 we know that a normal distribution for the unobserved instrument results in an unidentified model. Furthermore, less well separated mixture components may lead to lower efficiency, as became clear from the results on information in mixture models (see subsection 3.3.2). However, when the true instrument has an obvious grouped structure, as in the case of the two bimodal distributions (bim2 and bim8) and the skewed distribution (skew8), it is well approximated by the assumed discrete instrument. In these cases, the LIV model represents the true instruments quite accurately, resulting in more efficient estimates. In all cases, the LIV estimator appears to be consistent.
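The link between endogeneity and precision can be made concrete for the bim2 design. Under the simple LIV model, latent group $j$ gives $(x, y)$ a bivariate normal distribution with mean $(\pi_j, \beta_0 + \beta_1 \pi_j)$ and a covariance matrix common to both groups, since $x = \pi_j + \nu$ and $y = \beta_0 + \beta_1 x + \varepsilon$. A short calculation (ours, not taken from the text) gives the Mahalanobis distance between the two components as $\Delta = |\pi_2 - \pi_1| / \sqrt{\sigma^2_\nu - \sigma^2_{\varepsilon\nu}/\sigma^2_\varepsilon}$. It measures the spread of the group means relative to the part of $\nu$ not explained by $\varepsilon$. A stronger $\varepsilon$–$\nu$ correlation therefore pulls the components further apart, which offers one way to read the finding that LIV becomes more precise as endogeneity increases. The sketch computes $\Delta$ both from the full covariance matrix and from the closed form. The function names are hypothetical.

```python
import numpy as np

def component_separation(pi1, pi2, beta1, s2_nu, s2_eps, s_en):
    """Mahalanobis distance between the two (x, y) components of a simple (m = 2) LIV model."""
    # Common within-group covariance of (x, y) when x = pi_j + nu and y = beta0 + beta1 * x + eps
    c = beta1 * s2_nu + s_en
    cov_xy = np.array([[s2_nu, c],
                       [c, beta1**2 * s2_nu + 2 * beta1 * s_en + s2_eps]])
    d = (pi2 - pi1) * np.array([1.0, beta1])       # difference between the component means
    return float(np.sqrt(d @ np.linalg.solve(cov_xy, d)))

def separation_closed_form(pi1, pi2, s2_nu, s2_eps, s_en):
    """Same distance as |pi2 - pi1| / sd(nu | eps); it does not depend on beta1."""
    return abs(pi2 - pi1) / np.sqrt(s2_nu - s_en**2 / s2_eps)

# bim2 design: pi = -/+1.23, unit error variances, sigma_eps_nu = rho * sqrt(2.5)
for rho in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]:
    delta = component_separation(-1.23, 1.23, 1.0, 1.0, 1.0, rho * np.sqrt(2.5))
    print(f"rho = {rho:.1f}: Delta = {delta:.2f}")
```

In the bim2 design, $\Delta$ grows from 2.46 at $\rho_{x\varepsilon} = 0$ to about 4.02 at $\rho_{x\varepsilon} = 0.5$.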
Figure 3.2 presents the results for $\sigma^2_\varepsilon$. Broadly similar conclusions hold as for $\beta_1$. The OLS estimator for $\sigma^2_\varepsilon$ is the 'best' estimator when $\rho_{x\varepsilon} = 0$. For $\rho_{x\varepsilon} = 0.1$ and $\rho_{x\varepsilon} = 0.2$ the bias of the OLS estimator is small, but it becomes more substantial for larger regressor–error correlations, where the OLS estimator exhibits a significant downward bias.
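The direction and size of the OLS biases follow from standard probability limits: $\operatorname{plim} b_{OLS} = \beta_1 + \mathrm{Cov}(x,\varepsilon)/\mathrm{Var}(x)$ and $\operatorname{plim} s^2 = \sigma^2_\varepsilon - \mathrm{Cov}(x,\varepsilon)^2/\mathrm{Var}(x) = \sigma^2_\varepsilon (1 - \rho^2_{x\varepsilon})$. The sketch evaluates these for the design of subsection 3.5.1 ($\mathrm{Var}(x) = 2.5$, $\sigma^2_\varepsilon = 1$) and checks them against a large simulated sample. The true values $\beta_0 = \beta_1 = 1$ are hypothetical. Jointly normal $(x, \varepsilon)$ is used for the check because the limits depend only on second moments.

```python
import numpy as np

def ols_limits(rho, var_x=2.5, var_eps=1.0, beta1=1.0):
    """Probability limits of the OLS slope and residual variance when Corr(x, eps) = rho."""
    cov_xe = rho * np.sqrt(var_x * var_eps)
    slope = beta1 + cov_xe / var_x                 # upward bias for rho > 0
    resid_var = var_eps - cov_xe**2 / var_x        # = var_eps * (1 - rho**2): downward bias
    return slope, resid_var

# Large-sample check with jointly normal (x, eps); hypothetical beta0 = beta1 = 1
rng = np.random.default_rng(1)
rho, n = 0.5, 200_000
cov = np.array([[2.5, rho * np.sqrt(2.5)], [rho * np.sqrt(2.5), 1.0]])
x, eps = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
y = 1.0 + 1.0 * x + eps
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
s2 = np.sum((y - X @ b) ** 2) / (n - 2)            # OLS residual variance
print(ols_limits(rho), (b[1], s2))
```

For $\rho_{x\varepsilon} = 0.5$ the limits are a slope of about 1.316 (a bias of 0.316) and a residual variance of 0.75, i.e. $\sigma^2_\varepsilon$ is understated by 25%.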
Figure 3.2: Bias plots for $\sigma^2_\varepsilon$ for the simple LIV model, where 1: OLS, 2: bimodal $m = 2$ (bim2), 3: bimodal $m = 8$ (bim8), 4: skewed $m = 8$ (skew8), 5: unimodal $m = 4$ (unim4), and 6: unimodal $m = 8$ (unim8).
For the skewed and the two bimodal distributions, the LIV results are very precise even in a situation with no endogeneity. For the two symmetric unimodal cases (unim4 and unim8), the distribution of the estimates for $\sigma^2_\varepsilon$ across the simulations has a small positive skew. As the correlation between $x$ and $\varepsilon$ increases, the variance of the estimate for $\sigma^2_\varepsilon$ across the simulations becomes slightly higher in all cases, unlike the results for $\beta_1$. This is to be expected, since a larger correlation between $x$ and $\varepsilon$ implies that $y$ and $E(y|x)$ are further apart, which adversely affects the precision of an estimate for $\sigma^2_\varepsilon$. Nevertheless, the LIV estimates for $\sigma^2_\varepsilon$ appear to be approximately consistent.
Results for the Hausman test for exogeneity
Table 3.1 shows the results for the Hausman-LIV test. The degrees of endogeneity are presented row-wise, and each entry represents the fraction of rejections of the null hypothesis. As expected, increasing the correlation between $x$ and $\varepsilon$ increases the number of times the null hypothesis is rejected. Comparing the two bimodal distributions, the test performs slightly better for bim2, in which case the number of instruments is correctly specified, although the results are very close⁵. Comparing the two unimodal distributions (unim4 and unim8), the test tends to reject the null hypothesis somewhat too often for $m = 8$ when the true correlation is zero. As before, this is caused by the loss of efficiency due to misspecification and the approximation to a normal distribution. When the true instrument has a skewed distribution, the power of the Hausman-LIV test for $\rho_{x\varepsilon} > 0$ is higher than for any other distribution that we investigated. In this case, however, the test is also too liberal for zero regressor–error correlations, because the model is misspecified. Overall, the power of the test is highest for the bimodal distributions and the skewed distribution, and lowest for the unimodal distributions. If the instrument has a bimodal or a skewed distribution, the two groups imposed on the endogenous $x$ by the simple LIV model are a more adequate representation, allowing for precise LIV estimates.
⁵ We did not report the standard deviations. These can easily be computed as $\sqrt{f(1-f)/l}$, where $f$ is the reported fraction and $l$ is the total number of simulation runs.
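The formula in footnote 5 is easy to apply to the entries of tables 3.1 and 3.2. As an addition of ours (not in the text), the same binomial reasoning gives a rough normal-approximation z-score for an empirical size against its nominal level, using the standard error under the null. The function names are hypothetical.

```python
import math

def rejection_se(f, l=250):
    """Monte Carlo standard error of a rejection fraction f based on l replications."""
    return math.sqrt(f * (1.0 - f) / l)

def size_z(f, alpha, l=250):
    """Rough z-score of an observed rejection rate f against the nominal size alpha."""
    return (f - alpha) / math.sqrt(alpha * (1.0 - alpha) / l)

# Table 3.1, rho = 0, alpha = 0.05 (empirical size of the test)
for name, f in {"bim2": 0.04, "bim8": 0.06, "skew8": 0.08, "unim4": 0.06, "unim8": 0.07}.items():
    print(f"{name}: f = {f:.2f}  se = {rejection_se(f):.3f}  z = {size_z(f, 0.05):+.2f}")
```

For skew8 this gives $z \approx 2.2$, in line with the remark that the test is somewhat too liberal there. With only 250 replications, differences of one or two percentage points from the nominal 5% are within Monte Carlo noise.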
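Table 3.1: Power of the Hausman-LIV test using the simple LIV model for various degrees of endogeneity, for nominal sizes $\alpha = 0.50$, 0.05, and 0.01.

| $\alpha$ | $\rho_{x\varepsilon}$ | bim2 | bim8 | skew8 | unim4 | unim8 |
|---|---|---|---|---|---|---|
| 0.50 | 0 | 0.53 | 0.51 | 0.54 | 0.47 | 0.54 |
| | 0.1 | 0.92 | 0.86 | 0.96 | 0.71 | 0.59 |
| | 0.2 | 1.00 | 1.00 | 1.00 | 0.95 | 0.86 |
| | 0.3 | 1.00 | 1.00 | 1.00 | 0.99 | 0.97 |
| | 0.4 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | 0.5 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.05 | 0 | 0.04 | 0.06 | 0.08 | 0.06 | 0.07 |
| | 0.1 | 0.44 | 0.40 | 0.56 | 0.20 | 0.13 |
| | 0.2 | 0.95 | 0.97 | 0.99 | 0.52 | 0.42 |
| | 0.3 | 1.00 | 1.00 | 1.00 | 0.85 | 0.76 |
| | 0.4 | 1.00 | 1.00 | 1.00 | 1.00 | 0.97 |
| | 0.5 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.01 | 0 | 0.02 | 0.01 | 0.01 | 0.02 | 0.02 |
| | 0.1 | 0.22 | 0.22 | 0.37 | 0.08 | 0.06 |
| | 0.2 | 0.87 | 0.82 | 0.96 | 0.27 | 0.24 |
| | 0.3 | 1.00 | 1.00 | 1.00 | 0.68 | 0.61 |
| | 0.4 | 1.00 | 1.00 | 1.00 | 0.97 | 0.94 |
| | 0.5 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |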
The results of our simulation studies for the Hausman-LIV test suggest that the test has reasonable power across a wide range of regressor–error correlations and for different kinds of instrument distributions. Furthermore, the proposed test and estimation procedure work well even if the number of instruments is misspecified. We find the test to be fairly robust under misspecification, with a small bias towards rejecting the null hypothesis somewhat too often. These results are obtained without requiring observed instrumental variables.
3.5.3 Sensitivity analysis: using $m = 3$ and $m = 4$
The simple LIV model assumes a dummy instrument. We saw that, regardless of the true number of categories of the instrument and the shape of the distribution of the endogenous $x$, the main parameters are estimated approximately consistently. Here we illustrate, for the skewed (skew8), bimodal (bim8), and unimodal (unim8) distributions from the previous subsections, the effect of estimating the LIV model with $m = 3$ and $m = 4$ categories. The results demonstrate that the conclusions for the main parameters ($\beta_1$ and $\sigma^2_\varepsilon$) and for the Hausman-LIV test are fairly robust to different choices of $m$, which alleviates the burden of choosing a value for $m$.
The results for $\beta_1$ for six different regressor–error correlations are shown in figure 3.3. In each figure, the first panel shows the results for the bimodal distribution when the LIV model is estimated for $m = 2, 3$, and 4, the second panel gives the results for the skewed distribution, and the third panel shows the results for the unimodal distribution. For all three distributions the true number of instruments is eight. Changing the number of categories of the instrument clearly has hardly any effect on the estimation results for $\beta_1$, and the bias is approximately zero in all cases. The results for the unimodal distribution are slightly more sensitive to the choice of $m$, in particular when $\rho_{x\varepsilon} = 0$. As before, a larger value of $\rho_{x\varepsilon}$ results in more precise estimates. For $\sigma^2_\varepsilon$ we also found that the results are fairly robust to different choices of $m$, and we omit further details.
Results for the power of the Hausman test are presented⁶ in table 3.2. Note that the $m = 2$ columns are copied from table 3.1. For the bimodal and skewed cases, the power of the Hausman-LIV test is fairly stable across different choices of $m$. For the bimodal case with $m = 4$ and the skewed case with $m = 2$ the power is slightly lower, although the differences are small. For the unimodal distribution, the test performs worse for $m = 3$ and $m = 4$, in particular when $\rho_{x\varepsilon}$ is small. At $\alpha = 0.05$ it rejects a true null hypothesis in 12% ($m = 3$) and 24% ($m = 4$) of the replications, compared with 7% for $m = 2$. For $\rho_{x\varepsilon} = 0.3$ and $0.4$ its power is somewhat lower than for $m = 2$. This can be explained by the results for $\beta_1$, which showed larger amounts of uncertainty in the estimates than for $m = 2$ (figure 3.3). In this case, taking $m = 2$ or increasing the sample size would improve the performance of the test.

⁶ For the sake of clarity we omitted $\alpha = 0.50$ from the table.

Figure 3.3: Bias plots for $\beta_1$ in the LIV model for the 1: skewed, 2: bimodal, and 3: unimodal cases, with true $m = 8$, estimated with $m = 2, 3, 4$, respectively.
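Table 3.2: Power of the Hausman-LIV test for different choices of $m$.

| Distribution | $\rho_{x\varepsilon}$ | $\alpha = 0.05$: $m = 2$ | $m = 3$ | $m = 4$ | $\alpha = 0.01$: $m = 2$ | $m = 3$ | $m = 4$ |
|---|---|---|---|---|---|---|---|
| bim8 | 0 | 0.06 | 0.06 | 0.09 | 0.01 | 0.02 | 0.02 |
| | 0.1 | 0.40 | 0.37 | 0.38 | 0.22 | 0.18 | 0.20 |
| | 0.2 | 0.97 | 0.99 | 0.88 | 0.82 | 0.99 | 0.72 |
| | 0.3 | 1.00 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 |
| | 0.4 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | 0.5 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 0.99 |
| skew8 | 0 | 0.08 | 0.06 | 0.07 | 0.01 | 0.02 | 0.01 |
| | 0.1 | 0.56 | 0.63 | 0.62 | 0.37 | 0.42 | 0.40 |
| | 0.2 | 0.99 | 1.00 | 0.99 | 0.96 | 0.99 | 0.96 |
| | 0.3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | 0.4 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | 0.5 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| unim8 | 0 | 0.07 | 0.12 | 0.24 | 0.02 | 0.05 | 0.18 |
| | 0.1 | 0.13 | 0.17 | 0.24 | 0.06 | 0.08 | 0.14 |
| | 0.2 | 0.42 | 0.43 | 0.47 | 0.24 | 0.25 | 0.29 |
| | 0.3 | 0.76 | 0.73 | 0.68 | 0.61 | 0.55 | 0.55 |
| | 0.4 | 0.97 | 0.93 | 0.91 | 0.94 | 0.89 | 0.85 |
| | 0.5 | 1.00 | 0.99 | 1.00 | 1.00 | 0.99 | 0.99 |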
When the LIV model is estimated with $m = 3$ or $m = 4$, we sometimes found degenerate solutions. A degenerate solution occurs when two (or more) estimated category means ($\pi_j$) are equal, or when one (or more) categories contain no observations ($\lambda_j = 0$). In such a case, the determinant of the Hessian matrix evaluated at the maximum likelihood estimate is zero. Degeneracy of a fitted model can be expected to occur more often when the components of the mixture distribution overlap strongly. Results from, among others, Redner and Walker (1984) illustrate that if the sample is from a mixture of poorly separated components (using the Mahalanobis distance as a criterion), the estimation problem becomes more difficult, and impractically large samples may be needed to obtain moderately precise estimates of the category means and sizes. For the LIV model⁷ this does not present a problem for the main parameters, since in such a case a nondegenerate solution with fewer categories can be used. This can be done without harm, as the previous results illustrate that the estimates for the main parameters ($\beta_1$ and $\sigma^2_\varepsilon$) are hardly affected by changing the number of categories.
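A fitted solution can be screened for the two kinds of degeneracy just described. The helper below is hypothetical and its tolerances are our assumptions. Because an exactly zero Hessian determinant rarely occurs in floating-point arithmetic, the sketch checks the condition number of the Hessian instead.

```python
import numpy as np

def is_degenerate(pi, lam, hessian=None, mean_tol=1e-3, size_tol=1e-3, cond_tol=1e10):
    """Flag a fitted LIV solution as degenerate (tolerances are illustrative assumptions)."""
    pi = np.sort(np.asarray(pi, dtype=float))
    lam = np.asarray(lam, dtype=float)
    if np.any(np.diff(pi) < mean_tol):             # two (or more) category means coincide
        return True
    if np.any(lam < size_tol):                     # a category is (almost) empty
        return True
    if hessian is not None and np.linalg.cond(hessian) > cond_tol:  # (near-)singular Hessian
        return True
    return False

# Average m = 3 estimates for skew8 at rho = 0 (table 3.4): not degenerate
print(is_degenerate([-0.49, 1.99, 4.64], [0.84, 0.12, 0.04]))
```

Solutions flagged in this way would be refitted with fewer categories, in line with the strategy described above.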
3.5.4 Results for $\pi$, $\lambda$, $\sigma^2_\nu$, and $\sigma_{\varepsilon\nu}$
For the sake of simplicity and overview, we postponed the discussion of the results for the other parameters ($\pi$, $\lambda$, $\sigma^2_\nu$, and $\sigma_{\varepsilon\nu}$) until now. In a LIV analysis, interest will usually be in the parameters $\beta_1$ and $\sigma^2_\varepsilon$, since the linear regression model is the object of study. However, the other parameters can give valuable insight. We give the results of the simple LIV model for the bimodal distribution with $m = 2$ (bim2), in which case the number of categories is correctly specified. We also present the results for these parameters for $m = 2, 3$, and 4 for the skewed distribution with $m = 8$ (skew8), and for the other cases report only substantially different or noteworthy outcomes⁸.
Table 3.3 presents the results for the bimodal $m = 2$ distribution (means and standard deviations across the 250 Monte Carlo simulations), for which the simple LIV model is correctly specified. The group means and sizes are estimated approximately without bias. Similar results hold for the variance $\sigma^2_\nu$ and the covariance $\sigma_{\varepsilon\nu}$. As for $\beta_1$, a higher degree of endogeneity results in lower standard deviations of the estimates.
⁷ The degenerate solutions are left out of the analysis. The largest number of degenerate solutions was found for the unimodal distribution (unim8) when estimated with $m = 4$ (about 35% of the Monte Carlo replications).

⁸ The results can be obtained from the author upon request.

Table 3.3: Results for the bimodal $m = 2$ case (means across the 250 replications, standard deviations in parentheses). The true values are $\pi_1 = -\pi_2 = -1.23$, $\lambda_1 = \lambda_2 = 0.5$, $\sigma^2_\nu = 1$, and the true values for $\sigma_{\varepsilon\nu}$ are given in the last column.

| $\rho_{x\varepsilon}$ | $\pi_1$ | $\pi_2$ | $\lambda_1$ | $\lambda_2$ | $\sigma^2_\nu$ | $\sigma_{\varepsilon\nu}$ | true $\sigma_{\varepsilon\nu}$ |
|---|---|---|---|---|---|---|---|
| 0 | -1.23 (0.08) | 1.23 (0.08) | 0.50 (0.03) | 0.50 (0.03) | 1.00 (0.08) | 0.01 (0.10) | 0 |
| 0.1 | -1.22 (0.08) | 1.23 (0.07) | 0.50 (0.03) | 0.50 (0.03) | 1.00 (0.07) | 0.16 (0.09) | 0.16 |
| 0.2 | -1.23 (0.06) | 1.22 (0.06) | 0.50 (0.02) | 0.50 (0.02) | 0.99 (0.07) | 0.31 (0.09) | 0.32 |
| 0.3 | -1.22 (0.06) | 1.23 (0.06) | 0.50 (0.02) | 0.50 (0.02) | 1.00 (0.06) | 0.48 (0.08) | 0.47 |
| 0.4 | -1.23 (0.06) | 1.22 (0.05) | 0.50 (0.02) | 0.50 (0.02) | 1.01 (0.06) | 0.64 (0.06) | 0.63 |
| 0.5 | -1.23 (0.05) | 1.23 (0.05) | 0.50 (0.02) | 0.50 (0.02) | 1.00 (0.05) | 0.79 (0.06) | 0.79 |

For the skewed distribution with $m = 8$, the true number of categories is larger than the numbers used in estimation ($m = 2, 3, 4$), in which case the estimates for $\pi$ and $\lambda$ cannot be 'consistent'. Tables 3.4 and 3.5 show the results for $m = 2, 3, 4$ for $\pi$ and $\lambda$, and for $\sigma_{\varepsilon\nu}$ and $\sigma^2_\nu$, respectively, when the distribution of $x$ is skewed. Two comments are in order. First, across the simulations we found that the observed means $\bar{x} = \frac{1}{n}\sum_i x_i$ and variances $s^2_x = \frac{1}{n}\sum_i (x_i - \bar{x})^2$ are (almost) equal to the estimated means and variances

$$\mu_x = \sum_j \lambda_j \pi_j, \qquad \sigma^2_x = \sum_j \lambda_j (\pi_j - \mu_x)^2 + \sigma^2_\nu,$$

regardless of the choice of $m$. This means that the estimated value of $\sigma^2_\nu$ decreases when $m$ increases, because of a trade-off between 'within-group' and 'between-group' variance (see table 3.5). Second, for larger correlations between $x$ and $\varepsilon$, the difference between the largest and smallest estimated group means gets smaller. This is the effect of the covariance on the estimated group means, and it illustrates, for instance, that the two equations cannot be estimated independently of each other. As a consequence, the between-group variance is smaller and the estimated value of $\sigma^2_\nu$ is higher for larger values of $\rho_{x\varepsilon}$, which holds in particular for $m = 4$ (see table 3.5).
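The moment identities above can be evaluated directly from a set of fitted mixture parameters. The sketch below (the function name is hypothetical) applies them to the $m = 2$ averages for skew8 at $\rho_{x\varepsilon} = 0$ from tables 3.4 and 3.5. Because these entries are averages over replications rounded to two decimals, the result only approximately reproduces the true mean 0 and variance 2.5. For larger $m$ the rounding errors in the $\lambda_j$ (which then no longer sum exactly to one) distort the result more.

```python
import numpy as np

def implied_moments(pi, lam, s2_nu):
    """Mean and variance of x implied by a fitted LIV mixture: mu_x and sigma^2_x."""
    pi, lam = np.asarray(pi, dtype=float), np.asarray(lam, dtype=float)
    mu_x = float(lam @ pi)
    var_x = float(lam @ (pi - mu_x) ** 2 + s2_nu)  # between-group plus within-group variance
    return mu_x, var_x

# skew8, rho = 0, m = 2: pi and lambda from table 3.4, sigma^2_nu from table 3.5
print(implied_moments([-0.34, 3.64], [0.92, 0.08], 1.33))
```

Here the between-group part contributes about 1.17 and $\sigma^2_\nu$ the remaining 1.33. This illustrates how a coarse grouping shifts variance into the within-group term.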
Regardless of the choice of $m$, the mean and variance of $x$ are estimated approximately without bias by $\mu_x$ and $\sigma^2_x$. Furthermore, the parameters $\pi$ and $\lambda$ are estimated consistently if the number of categories in the LIV model is correctly specified. Importantly, however, the results for $\beta_1$ and $\sigma^2_\varepsilon$ are approximately unbiased regardless of the choice of $m$. Hence, whether or not the number of categories used in estimation equals the true number $m$ does not present a problem in estimating the effect of $x$ on $y$ in the presence of regressor–error correlations.
In this section we presented the results of a simulation study investigating the performance of the simple latent instrumental variable (LIV) model and the power of the Hausman-LIV test. We showed that the regression coefficients can be estimated approximately without bias and that the Hausman-LIV test has reasonable power, without requiring observed instrumental variables. Furthermore, the results are fairly robust to different specifications of $m$. In the next section the simple LIV model is applied to a measurement error problem. In this application a natural grouping variable exists, which is likely to be a 'perfect' instrumental variable since it was obtained in a laboratory setting. We show that in this situation the LIV model gives results identical to those of the classical IV estimator, but without using the observed dummy instrument. The simulation results and this empirical result illustrate that the LIV model can be used successfully in applications where perfect instrumental variables are not available.
3.6 An illustrative example: a simple measurement error model
The data we use in this section are taken from Madansky (1959). Madansky considers several methods to fit a straight line when both variables are subject to error. One method is based on Wald's grouping method (see also Bowden and Turkington, 1984), which can be viewed as an IV method. The advantage of this dataset is that a 'natural' discrete instrument comes with the experiment. This makes a comparison between the LIV method and the classical IV method in an empirical setting of great interest, since the 'true' instrument is known.

Table 3.4: Results for the skewed ($m = 8$) case, estimated with $m = 2, 3, 4$; standard deviations in parentheses.

| $\rho_{x\varepsilon}$ | $m$ | $\pi_1$ | $\pi_2$ | $\pi_3$ | $\pi_4$ | $\lambda_1$ | $\lambda_2$ | $\lambda_3$ | $\lambda_4$ |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | -0.34 (0.05) | 3.64 (0.25) | | | 0.92 (0.01) | 0.08 (0.01) | | |
| | 3 | -0.49 (0.08) | 1.99 (0.50) | 4.64 (0.42) | | 0.84 (0.04) | 0.12 (0.04) | 0.04 (0.01) | |
| | 4 | -0.94 (0.43) | 0.51 (0.54) | 2.85 (0.59) | 5.14 (0.48) | 0.54 (0.20) | 0.36 (0.18) | 0.08 (0.02) | 0.03 (0.01) |
| 0.1 | 2 | -0.34 (0.04) | 3.66 (0.27) | | | 0.92 (0.01) | 0.08 (0.01) | | |
| | 3 | -0.50 (0.07) | 1.91 (0.45) | 4.60 (0.38) | | 0.84 (0.04) | 0.12 (0.04) | 0.04 (0.01) | |
| | 4 | -1.00 (0.51) | 0.46 (0.58) | 2.72 (0.64) | 5.01 (0.57) | 0.51 (0.24) | 0.38 (0.21) | 0.08 (0.03) | 0.03 (0.01) |
| 0.2 | 2 | -0.33 (0.05) | 3.68 (0.25) | | | 0.91 (0.01) | 0.09 (0.01) | | |
| | 3 | -0.48 (0.07) | 2.01 (0.44) | 4.64 (0.35) | | 0.84 (0.03) | 0.12 (0.03) | 0.04 (0.01) | |
| | 4 | -0.89 (0.55) | 0.63 (0.60) | 2.92 (0.64) | 5.10 (0.52) | 0.59 (0.22) | 0.31 (0.20) | 0.07 (0.03) | 0.03 (0.01) |
| 0.3 | 2 | -0.34 (0.05) | 3.60 (0.23) | | | 0.91 (0.01) | 0.09 (0.01) | | |
| | 3 | -0.48 (0.06) | 2.01 (0.35) | 4.58 (0.31) | | 0.85 (0.03) | 0.11 (0.02) | 0.04 (0.01) | |
| | 4 | -0.84 (0.39) | 0.56 (0.57) | 2.73 (0.57) | 4.93 (0.44) | 0.59 (0.22) | 0.31 (0.20) | 0.07 (0.03) | 0.03 (0.01) |
| 0.4 | 2 | -0.35 (0.04) | 3.52 (0.25) | | | 0.91 (0.01) | 0.09 (0.01) | | |
| | 3 | -0.47 (0.05) | 1.97 (0.28) | 4.52 (0.28) | | 0.85 (0.02) | 0.11 (0.02) | 0.04 (0.01) | |
| | 4 | -0.63 (0.24) | 0.93 (0.49) | 2.87 (0.49) | 4.88 (0.32) | 0.73 (0.13) | 0.18 (0.11) | 0.06 (0.02) | 0.03 (0.01) |
| 0.5 | 2 | -0.35 (0.04) | 3.40 (0.23) | | | 0.91 (0.01) | 0.09 (0.01) | | |
| | 3 | -0.47 (0.04) | 1.89 (0.18) | 4.43 (0.22) | | 0.85 (0.02) | 0.11 (0.01) | 0.04 (0.01) | |
| | 4 | -0.55 (0.05) | 1.13 (0.28) | 2.83 (0.37) | 4.76 (0.25) | 0.79 (0.03) | 0.12 (0.02) | 0.05 (0.01) | 0.03 (0.01) |

Table 3.5: Results for the skewed $m = 8$ case; standard deviations in parentheses. The true value of $\sigma^2_\nu$ is 1 and the true values of $\sigma_{\varepsilon\nu}$ are the same as in table 3.3.

| $\rho_{x\varepsilon}$ | $m = 2$: $\sigma^2_\nu$ | $\sigma_{\varepsilon\nu}$ | $m = 3$: $\sigma^2_\nu$ | $\sigma_{\varepsilon\nu}$ | $m = 4$: $\sigma^2_\nu$ | $\sigma_{\varepsilon\nu}$ |
|---|---|---|---|---|---|---|
| 0 | 1.33 (0.08) | 0.01 (0.08) | 1.08 (0.08) | 0.01 (0.07) | 0.85 (0.12) | 0.01 (0.07) |
| 0.1 | 1.34 (0.08) | 0.15 (0.07) | 1.07 (0.09) | 0.16 (0.07) | 0.86 (0.13) | 0.15 (0.07) |
| 0.2 | 1.33 (0.07) | 0.32 (0.07) | 1.08 (0.09) | 0.32 (0.07) | 0.90 (0.12) | 0.32 (0.07) |
| 0.3 | 1.34 (0.07) | 0.47 (0.07) | 1.10 (0.07) | 0.47 (0.06) | 0.93 (0.09) | 0.47 (0.06) |
| 0.4 | 1.34 (0.07) | 0.63 (0.07) | 1.10 (0.07) | 0.63 (0.06) | 0.99 (0.07) | 0.64 (0.06) |
| 0.5 | 1.35 (0.07) | 0.78 (0.07) | 1.10 (0.06) | 0.78 (0.06) | 1.02 (0.06) | 0.78 (0.06) |
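Wald's grouping method, which the text notes can be viewed as an IV method, estimates the slope as the line through the two group means, $(\bar{y}_1 - \bar{y}_0)/(\bar{x}_1 - \bar{x}_0)$. With a binary grouping variable $z$ it coincides exactly with the simple IV estimator $\mathrm{Cov}(z, y)/\mathrm{Cov}(z, x)$. The sketch checks this on synthetic data with a hypothetical true slope of 3; Madansky's data are not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def wald_grouping(x, y, z):
    """Wald grouping estimator of the slope with a binary grouping variable z (0/1)."""
    x, y, z = map(np.asarray, (x, y, z))
    g1, g0 = z == 1, z == 0
    return (y[g1].mean() - y[g0].mean()) / (x[g1].mean() - x[g0].mean())

def iv_slope(x, y, z):
    """Simple IV slope Cov(z, y) / Cov(z, x) with a single instrument z."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# Synthetic illustration (not Madansky's data): x is observed with measurement error
rng = np.random.default_rng(0)
n = 50
z = np.repeat([0, 1], n // 2)                      # two 'heats', 25 observations each
x_true = 2.0 * z + rng.normal(size=n)
x = x_true + rng.normal(scale=0.5, size=n)         # observed x = true x + measurement error
y = 3.0 * x_true + rng.normal(size=n)              # hypothetical true slope 3
print(wald_grouping(x, y, z), iv_slope(x, y, z))
```

The two numbers agree up to floating-point rounding. For a binary $z$ both sample covariances reduce to $\frac{n_0 n_1}{n(n-1)}$ times the corresponding difference in group means, and this common factor cancels in the ratio.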
The dataset contains a random sample of 50 measurements of the yield strength of artillery shells ($x$) and of the hardness of the shells ($y$). Artillery shells are projectiles for large guns, whose quality depends, among other things, on the hardness of the steel of which they are made. A low hardness, for instance, may cause premature explosions in or near the muzzle of the gun, and projectiles made of low-quality metal are less effective against armor⁹. The yield strength was measured by pulling a piece of steel from two sides for a period of time and converting the new dimensions into a measure of yield strength. The (Brinell) hardness was obtained by making a dent in each shell with a device from which the hardness could be read from a dial. The shells were manufactured from two different heats of steel, which constitutes the 'natural' data-grouping criterion (25 observations in each group). This variable can serve as an instrumental variable, as the different manufacturing temperatures do not affect the subsequent measurement of yield strength or hardness (Madansky, 1959). We do not discuss the technicalities of measuring $y$ and $x$, which are beyond our scope. Madansky argues that measurement errors in yield strength are due to inhomogeneity of the steel and other errors in the measurement process (human or measuring-instrument errors). This dataset is of particular interest because its laboratory instrument allows a direct comparison of the classical IV estimator and the LIV estimator.

⁹ Source: www.civilwarartillery.com and www.winterwar.com
Table 3.6: Results for the Madansky measurement error data ($n = 50$); standard errors in parentheses.

| | OLS | IV | LIV |
|---|---|---|---|
| $\hat\beta$ | 3.288 (0.426) | 3.204 (0.440) | 3.204 (0.431) |
| Hausman test | - | 0.551 | 1.403 |
The results for OLS, IV, and the simple LIV model are given in table 3.6. The IV estimate of the effect of yield strength on hardness, with the temperature dummy as the natural instrument, is $\hat\beta_{IV} = 3.204$. The simple LIV estimate (with $m = 2$) is also 3.204, i.e. exactly identical, although it is obtained without using the observed instrument. More importantly, when the a posteriori classification found by the LIV model is compared to the 'natural' dummy instrument (high heat versus low heat), the LIV classification and the observed classification turn out to be identical, i.e. we are able to predict the observed instrument exactly (see table 3.7). All posterior probabilities for the two categories of the latent instrument are either zero or one. In both cases, the Hausman test does not indicate a significant bias in the OLS estimate, which is $\hat\beta_{OLS} = 3.288$. The similarity between LIV and IV with an observed 'natural' dummy instrument illustrates the power of the LIV method: the LIV method, which does not rely on an observed instrument, gives results similar to those obtained when a perfect instrument is observed. A 'natural' instrument, however, will rarely be available in empirical studies in economics or marketing¹⁰.
Table 3.7: Predicted instrument versus observed instrument ($n = 50$).

| Predicted instrument (LIV) | Observed: low heat | Observed: high heat | Total |
|---|---|---|---|
| Group 1 | 25 | 0 | 25 |
| Group 2 | 0 | 25 | 25 |
| Total | 25 | 25 | 50 |
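A table like table 3.7 cross-tabulates the modal posterior assignment against the observed heat. The sketch below builds such a table and also checks a property that follows from writing the $m = 2$ LIV model as a two-component mixture with a common covariance matrix. This reparameterization is ours, not the author's derivation. At a stationary point of the likelihood, the LIV slope is the slope through the posterior-weighted group means of $(x, y)$. When all posterior probabilities are 0 or 1 and match the observed groups, it therefore reduces to Wald's grouping estimator, i.e. to IV with the dummy instrument. This offers one possible explanation for the identical IV and LIV estimates in table 3.6. The data are synthetic and the function names are hypothetical. LIV group labels are arbitrary, so in practice columns may need to be swapped.

```python
import numpy as np

def modal_groups(post):
    """A posteriori classification: assign each observation to its most probable latent group."""
    return np.argmax(post, axis=1)

def crosstab(pred, obs, k=2):
    """k x k counts: rows = predicted (LIV) group, columns = observed group."""
    table = np.zeros((k, k), dtype=int)
    np.add.at(table, (pred, obs), 1)
    return table

def component_mean_slope(x, y, post):
    """Slope through the posterior-weighted group means of (x, y)."""
    w = post / post.sum(axis=0)                    # each column of weights sums to one
    mx, my = w.T @ x, w.T @ y
    return (my[1] - my[0]) / (mx[1] - mx[0])

# Synthetic example where all posterior probabilities are 0 or 1
rng = np.random.default_rng(0)
obs = np.repeat([0, 1], 25)                        # observed low/high heat
x = 2.0 * obs + rng.normal(size=50)
y = 3.0 * x + rng.normal(size=50)
post = np.eye(2)[obs]                              # hard posteriors matching the observed groups
print(crosstab(modal_groups(post), obs))
wald = (y[obs == 1].mean() - y[obs == 0].mean()) / (x[obs == 1].mean() - x[obs == 0].mean())
print(component_mean_slope(x, y, post), wald)
```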
Although IV and LIV give similar results, the results raise a few questions. First, both IV and LIV indicate an upward bias in OLS, whereas the classical measurement error model predicts a downward bias. The upward bias we find, however, is not significant in either case. Second, precise estimation of the variance–covariance matrix of the LIV estimate is problematic in samples of this size. LIV is slightly more efficient than IV¹¹ when the Hessian matrix, evaluated at the maximum likelihood estimate, is used¹² to compute the standard errors. In this example the yield strength has an obvious group structure, and a lack of knowledge of a priori group membership (i.e. of the observed instrument) does not result in a large loss of information. Furthermore, since the LIV model estimates variances and covariances simultaneously, it may give more efficient results than IV. However, when the outer product of gradients is used to estimate the standard deviation, the classical IV estimator is more efficient (in this case the estimated standard error of $\hat\beta_{LIV}$ is 0.531).

¹⁰ In this particular case another estimator for $\beta$ can be constructed, if we can assume that $X = 0$ implies $Y = 0$, so that the intercept is zero (cf. Madansky, 1954). When this is true, a consistent estimate for $\beta$ is given by $\bar{y}/\bar{x} = 3.434$ (0.04). Fixing the intercept to zero in the LIV model yields 3.432 (0.433), which is similar to, but much less efficient than, Madansky's simple estimator.

¹¹ Madansky reports a standard error of 0.22 for the grouping method, which is much smaller than our result. For least squares he reports 0.47. We do not know exactly how these estimates were obtained.

¹² Greene (2000) states that the available estimators of the asymptotic covariance matrix usually give different results in small samples, but that the Hessian is preferable when a sample is small or moderately sized.
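The covariance estimators mentioned in this chapter (inverse negative Hessian, inverse outer product of gradients, and White's sandwich) can differ noticeably in small samples. To keep the sketch short, it uses the i.i.d. normal model for $(\mu, \sigma^2)$ with analytic scores and Hessian rather than the LIV likelihood. The same three formulas apply to any likelihood once the per-observation scores and the Hessian are available. The function name is hypothetical.

```python
import numpy as np

def normal_mle_covariances(x):
    """Hessian-based, OPG-based and sandwich covariance estimates for (mu, sigma^2) at the MLE."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mu = x.mean()
    s2 = np.mean((x - mu) ** 2)                    # ML variance estimate
    e = x - mu
    # Per-observation scores of log N(x_i | mu, s2)
    scores = np.column_stack([e / s2, -0.5 / s2 + 0.5 * e**2 / s2**2])
    # Analytic Hessian of the total log-likelihood (off-diagonal is zero at the MLE up to rounding)
    H = np.array([[-n / s2, -e.sum() / s2**2],
                  [-e.sum() / s2**2, n / (2 * s2**2) - np.sum(e**2) / s2**3]])
    B = scores.T @ scores                          # outer product of gradients
    Hinv = np.linalg.inv(H)
    return -Hinv, np.linalg.inv(B), Hinv @ B @ Hinv

rng = np.random.default_rng(3)
sample = rng.exponential(size=40)                  # skewed data, so the three estimates differ
for name, cov in zip(["Hessian", "OPG", "sandwich"], normal_mle_covariances(sample)):
    print(name, np.sqrt(np.diag(cov)))             # standard errors of (mu, sigma^2)
```

For correctly specified models all three estimators are consistent. Their small-sample differences mirror the gap between 0.431 (Hessian) and 0.531 (outer product of gradients) reported above for $\hat\beta_{LIV}$.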
As we saw, the small size of the dataset does not allow very precise estimation of the relation between the yield strength and hardness of artillery shells; more data points would be needed for precise inference. Nevertheless, this example shows that even in a small dataset the LIV method yields results similar to IV when an observed 'natural' dummy instrument is available. In the absence of such an instrument, however, the IV estimator cannot be used, whereas LIV can still be applied and, as the simulations in section 3.5 suggest, gives approximately unbiased results.
3.7 Conclusions
Searching for valid instruments is a long-standing problem in estimating IV
models in economics that account for regressor-error problems. In addition,
the identification of regressor-error correlation has been impossible without
such valid instruments. Our proposed instrument-free approach presents a
practical solution to this circular problem: it can be used to estimate regres-
sion parameters and test for regressor-error correlations without the necessity
of first finding valid instruments.
In this chapter we introduced the LIV model. We proved that the LIV model is identified through the likelihood, which is a necessary condition for the existence of a consistent estimator. Furthermore, we discussed estimation of the information matrix. The Monte Carlo studies show that the model yields unbiased results for several types of distributions for $x$, and outperforms OLS whenever there exists a correlation between $x$ and $\varepsilon$. The Hausman-LIV test detects departures from independence of regressor and model error with reasonable power across a wide range of regressor-error correlations. In the case of severe model violations, the Hausman-LIV test becomes too strict: it rejects the null hypothesis too often when it is true. As a result, in applications of this test researchers may search unnecessarily for manifest instruments in a small fraction of cases. However, we feel that this is a small price to pay in view of the simplicity and ease of implementing the proposed test. Importantly, the test results and estimates for $\beta_1$ are robust against different choices for the number of categories of the instrument. Finally, we analyzed a measurement error application with the LIV model. We showed that the LIV model gives results similar to those of the classical IV estimator when a ‘natural’ dummy instrument exists.
The results in this chapter are convergent and add credibility to the LIV approach. The LIV model presents a solution to the circular problem of searching for valid instruments in empirical studies. The model and the Hausman-LIV test are fairly simple and easy to implement, and the results in this chapter illustrate their usefulness across a wide variety of problems. In the next chapter we consider several extensions of the LIV model. Furthermore, we suggest diagnostics to examine model fit.
Appendix 3A Basic theorems on identifiability of mixtures
Let $\mathcal{F}$ and $\mathcal{H}$ be defined as in subsection 3.3.1. Yakowitz and Spragins (1968) prove the following theorem.

Theorem [Yakowitz and Spragins, 1968]. A necessary and sufficient condition that the class $\mathcal{H}$ of all finite mixtures of the family $\mathcal{F}$ be identifiable is that $\mathcal{F}$ be a linearly independent set over the field of real numbers.

A corollary to this theorem is the following.

Corollary [Yakowitz and Spragins, 1968]. A necessary and sufficient condition that the class $\mathcal{H}$ of all finite mixtures of the family $\mathcal{F}$ be identifiable is that the image of $\mathcal{F}$ under any vector isomorphism on span($\mathcal{F}$) be linearly independent in the image space.

This corollary tends to be easier to apply than the theorem itself, as it allows us to work in terms of generating functions, which are often more convenient to handle than the corresponding c.d.f.'s. Yakowitz and Spragins (1968) use this corollary to prove that the family $\mathcal{F}$ of $n$-dimensional Gaussian c.d.f.'s generates identifiable finite mixtures (their proposition 2).

For detailed proofs and extensions to other distributions, we refer to Yakowitz and Spragins (1968) or Teicher (1963). For a recent overview and discussion, see Hennig (2000).
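The theorem turns identifiability into a question of linear independence. A small numerical illustration (not a proof, and restricted to univariate normal c.d.f.'s with hypothetical means) is sketched below in Python: if the matrix $M_{rk} = F_k(t_r)$ of $m$ c.d.f.'s evaluated at $m$ points is nonsingular, then $\sum_k c_kF_k \equiv 0$ forces $Mc = 0$ and hence $c = 0$. The helpers `normal_cdf` and `determinant` are our own.

```python
import math


def normal_cdf(t, mu, sigma=1.0):
    """Univariate normal c.d.f. with mean mu and standard deviation sigma, evaluated at t."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


means = [-1.0, 0.0, 1.5]     # hypothetical, distinct component means
points = [-1.0, 0.5, 2.0]    # hypothetical evaluation points
M = [[normal_cdf(t, mu) for mu in means] for t in points]
print(determinant(M))        # nonzero: these three c.d.f.'s are linearly independent
```

Giving two components the same mean makes two columns of $M$ equal and the determinant collapses to (numerically) zero, which is why the mixture components are required to be distinct.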
Appendix 3B 1st and 2nd order derivatives log-likelihood
Define $z_{i|j} = (y_i - \beta_0 - \beta_1\pi_j,\ x_i - \pi_j)'$ and assume that the errors of the LIV model in (3.1) have a bivariate normal distribution. Then, using (3.3) and (3.4), we have for $f_j(y_i, x_i \mid \theta_1, \theta_{2j}, \theta_3)$ the following:

$$
f_j(y_i, x_i \mid \theta_1, \theta_{2j}, \theta_3) = \frac{1}{2\pi\sqrt{|\Omega|}}\exp\left(-\frac{1}{2}\,z_{i|j}'\,\Omega^{-1}z_{i|j}\right), \tag{3B.1}
$$

where $\theta_1$, $\theta_2$, and $\theta_3$ are defined in subsection 3.3.2. The quadratic form $z_{i|j}'\Omega^{-1}z_{i|j}$ can be rewritten as the quotient $q_i(\theta_1, \theta_{2j}, \theta_3)/d(\theta_3)$ with

$$
\begin{aligned}
q_i(\theta_1, \theta_{2j}, \theta_3) &= \left(y_i - \mu^y_{i|j}\right)^2\sigma_\nu^2 + \left(x_i - \mu^x_{i|j}\right)^2\left(\beta_1^2\sigma_\nu^2 + \sigma_\varepsilon^2 + 2\beta_1\sigma_{\varepsilon\nu}\right)\\
&\quad - 2\left(y_i - \mu^y_{i|j}\right)\left(x_i - \mu^x_{i|j}\right)\left(\beta_1\sigma_\nu^2 + \sigma_{\varepsilon\nu}\right),\\
d(\theta_3) &= \sigma_\varepsilon^2\sigma_\nu^2 - \sigma_{\varepsilon\nu}^2,
\end{aligned}
\tag{3B.2}
$$

and $\mu^y_{i|j} = \beta_0 + \beta_1\pi_j$ and $\mu^x_{i|j} = \pi_j$. Now (3B.1) can be written as

$$
f_j(y_i, x_i \mid \theta_1, \theta_{2j}, \theta_3) = \frac{\exp\left[-q_i(\theta_1, \theta_{2j}, \theta_3)/\{2d(\theta_3)\}\right]}{2\pi\sqrt{d(\theta_3)}}. \tag{3B.3}
$$

To simplify notation we will write $f_{i|j}$ for $f_j(y_i, x_i \mid \theta_1, \theta_{2j}, \theta_3)$, $f_i$ for $f(y_i, x_i \mid \theta)$, $q_{i|j}$ for $q_i(\theta_1, \theta_{2j}, \theta_3)$, and $d$ for $d(\theta_3)$.
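As a quick consistency check on (3B.1) to (3B.3), the Python sketch below evaluates one component density $f_{i|j}$ through $q_{i|j}/d$ and compares it with the textbook bivariate normal density written directly in terms of $\Omega$. Reading (3B.2) as $z'\Omega^{-1}z = z'\operatorname{adj}(\Omega)z/d$ gives $\Omega_{22} = \sigma_\nu^2$, $\Omega_{12} = \beta_1\sigma_\nu^2 + \sigma_{\varepsilon\nu}$ and $\Omega_{11} = \beta_1^2\sigma_\nu^2 + \sigma_\varepsilon^2 + 2\beta_1\sigma_{\varepsilon\nu}$, which is what the code uses. Function names and parameter values are hypothetical, chosen only for illustration.

```python
import math


def q_and_d(y, x, beta0, beta1, pi_j, s_e2, s_v2, s_ev):
    """Return q_{i|j} and d of (3B.2) for one observation and one category j."""
    u = y - (beta0 + beta1 * pi_j)  # y_i - mu^y_{i|j}
    v = x - pi_j                    # x_i - mu^x_{i|j}
    q = (u ** 2 * s_v2
         + v ** 2 * (beta1 ** 2 * s_v2 + s_e2 + 2 * beta1 * s_ev)
         - 2 * u * v * (beta1 * s_v2 + s_ev))
    d = s_e2 * s_v2 - s_ev ** 2
    return q, d


def component_density(y, x, beta0, beta1, pi_j, s_e2, s_v2, s_ev):
    """f_{i|j} written as in (3B.3)."""
    q, d = q_and_d(y, x, beta0, beta1, pi_j, s_e2, s_v2, s_ev)
    return math.exp(-q / (2 * d)) / (2 * math.pi * math.sqrt(d))


def bivariate_normal_density(y, x, mean, omega):
    """Direct bivariate normal density with mean vector and 2x2 covariance omega."""
    (o11, o12), (_, o22) = omega
    det = o11 * o22 - o12 ** 2
    u, v = y - mean[0], x - mean[1]
    quad = (o22 * u ** 2 - 2 * o12 * u * v + o11 * v ** 2) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


# Hypothetical parameter values, for illustration only.
beta0, beta1, pi_j = 1.0, 2.0, 0.5
s_e2, s_v2, s_ev = 1.0, 0.8, 0.3
omega = [[beta1 ** 2 * s_v2 + s_e2 + 2 * beta1 * s_ev, beta1 * s_v2 + s_ev],
         [beta1 * s_v2 + s_ev, s_v2]]

f_via_q = component_density(2.3, 0.9, beta0, beta1, pi_j, s_e2, s_v2, s_ev)
f_direct = bivariate_normal_density(2.3, 0.9, (beta0 + beta1 * pi_j, pi_j), omega)
print(f_via_q, f_direct)  # the two values agree up to rounding
```

The check also confirms that $d(\theta_3)$ equals $|\Omega|$, so the normalizing constants in (3B.1) and (3B.3) coincide.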
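First order partial derivatives (Gradient)

First derivatives with respect to elements of $\theta_1$ (fixed parameters)

We have

$$
\frac{\partial}{\partial\theta_{1s}}\log L(\theta) = \sum_{i=1}^n\frac{1}{f_i}\sum_{j=1}^m\lambda_j\frac{\partial f_{i|j}}{\partial\theta_{1s}}, \tag{3B.4}
$$

where

$$
\frac{\partial f_{i|j}}{\partial\theta_{1s}} = -\frac{f_{i|j}\,q'_{i|j(1s)}}{2d}, \tag{3B.5}
$$

with $q'_{i|j(1s)} = \partial q_{i|j}/\partial\theta_{1s}$, and $\theta_{11} = \beta_0$ and $\theta_{12} = \beta_1$. It follows that:

$$
\begin{aligned}
\frac{\partial q_{i|j}}{\partial\beta_0} &= -2\sigma_\nu^2\left(y_i - \mu^y_{i|j}\right) + 2\left(x_i - \mu^x_{i|j}\right)\left(\beta_1\sigma_\nu^2 + \sigma_{\varepsilon\nu}\right),\\
\frac{\partial q_{i|j}}{\partial\beta_1} &= -2\pi_j\sigma_\nu^2\left(y_i - \mu^y_{i|j}\right) + 2\left(x_i - \mu^x_{i|j}\right)^2\left\{\sigma_\nu^2\beta_1 + \sigma_{\varepsilon\nu}\right\}\\
&\quad - 2\left(x_i - \mu^x_{i|j}\right)\left\{-\pi_j\left(\beta_1\sigma_\nu^2 + \sigma_{\varepsilon\nu}\right) + \left(y_i - \mu^y_{i|j}\right)\sigma_\nu^2\right\}.
\end{aligned}
$$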
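The two expressions above can be checked numerically. The Python sketch below (hypothetical helper names and parameter values) compares them with central finite differences of $q_{i|j}$ from (3B.2); since $q_{i|j}$ is quadratic in $\beta_0$ and in $\beta_1$, the central differences are exact up to rounding error.

```python
def q_value(y, x, beta0, beta1, pi_j, s_e2, s_v2, s_ev):
    """q_{i|j} of (3B.2)."""
    u = y - (beta0 + beta1 * pi_j)
    v = x - pi_j
    return (u ** 2 * s_v2
            + v ** 2 * (beta1 ** 2 * s_v2 + s_e2 + 2 * beta1 * s_ev)
            - 2 * u * v * (beta1 * s_v2 + s_ev))


def dq_dbeta(y, x, beta0, beta1, pi_j, s_e2, s_v2, s_ev):
    """Analytic derivatives of q_{i|j} with respect to beta0 and beta1."""
    u = y - (beta0 + beta1 * pi_j)
    v = x - pi_j
    d_b0 = -2 * s_v2 * u + 2 * v * (beta1 * s_v2 + s_ev)
    d_b1 = (-2 * pi_j * s_v2 * u
            + 2 * v ** 2 * (s_v2 * beta1 + s_ev)
            - 2 * v * (-pi_j * (beta1 * s_v2 + s_ev) + u * s_v2))
    return d_b0, d_b1


# Hypothetical observation and parameter values.
fixed = dict(y=2.3, x=0.9, pi_j=0.5, s_e2=1.0, s_v2=0.8, s_ev=0.3)
b0, b1, h = 1.0, 2.0, 1e-5

num_b0 = (q_value(beta0=b0 + h, beta1=b1, **fixed)
          - q_value(beta0=b0 - h, beta1=b1, **fixed)) / (2 * h)
num_b1 = (q_value(beta0=b0, beta1=b1 + h, **fixed)
          - q_value(beta0=b0, beta1=b1 - h, **fixed)) / (2 * h)
ana_b0, ana_b1 = dq_dbeta(beta0=b0, beta1=b1, **fixed)
print(num_b0, ana_b0)
print(num_b1, ana_b1)
```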
First derivatives with respect to elements of $\theta_2$ (group means)

We have for $\theta_{2l}$, $l = 1, \ldots, m$,

$$
\frac{\partial}{\partial\theta_{2l}}\log L(\theta) = \sum_{i=1}^n\frac{\lambda_l}{f_i}\frac{\partial f_{i|l}}{\partial\theta_{2l}}, \tag{3B.6}
$$

since $\partial f_{i|j}/\partial\theta_{2l} = 0$ for $l \neq j$. Taking this expression a few steps further, we get

$$
\frac{\partial f_{i|l}}{\partial\theta_{2l}} = -\frac{f_{i|l}\,q'_{i|l(2)}}{2d}, \tag{3B.7}
$$

where the $q'_{i|l(2)}$ are equal to

$$
\frac{\partial q_{i|l}}{\partial\theta_{2l}} = 2\left(\sigma_{\varepsilon\nu}\left(y_i - \mu^y_{i|l}\right) - \left(\sigma_\varepsilon^2 + \beta_1\sigma_{\varepsilon\nu}\right)\left(x_i - \mu^x_{i|l}\right)\right),
$$

where $\theta_{2l} = \pi_l$, $l = 1, \ldots, m$.
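A minimal sketch that evaluates (3B.6) with (3B.7) for a small hypothetical sample and a two-category latent instrument, and compares the result with a central finite difference of the mixture log-likelihood $\log L(\theta) = \sum_i\log\sum_j\lambda_jf_{i|j}$. The data, parameter values, and function names are illustrative assumptions, not taken from the thesis.

```python
import math


def comp_density(y, x, beta0, beta1, pi_j, s_e2, s_v2, s_ev):
    """Return f_{i|j} of (3B.3) together with u = y - mu^y, v = x - mu^x and d."""
    u, v = y - (beta0 + beta1 * pi_j), x - pi_j
    q = (u ** 2 * s_v2 + v ** 2 * (beta1 ** 2 * s_v2 + s_e2 + 2 * beta1 * s_ev)
         - 2 * u * v * (beta1 * s_v2 + s_ev))
    d = s_e2 * s_v2 - s_ev ** 2
    return math.exp(-q / (2 * d)) / (2 * math.pi * math.sqrt(d)), u, v, d


def mixture_density(y, x, beta0, beta1, pis, lams, s_e2, s_v2, s_ev):
    """f_i = sum_j lambda_j f_{i|j}."""
    return sum(lam * comp_density(y, x, beta0, beta1, p, s_e2, s_v2, s_ev)[0]
               for p, lam in zip(pis, lams))


def loglik(data, beta0, beta1, pis, lams, s_e2, s_v2, s_ev):
    """Mixture log-likelihood log L = sum_i log f_i."""
    return sum(math.log(mixture_density(y, x, beta0, beta1, pis, lams, s_e2, s_v2, s_ev))
               for y, x in data)


def dloglik_dpi(data, ell, beta0, beta1, pis, lams, s_e2, s_v2, s_ev):
    """Analytic derivative of log L with respect to pi_ell via (3B.6) and (3B.7)."""
    total = 0.0
    for y, x in data:
        f_i = mixture_density(y, x, beta0, beta1, pis, lams, s_e2, s_v2, s_ev)
        f_il, u, v, d = comp_density(y, x, beta0, beta1, pis[ell], s_e2, s_v2, s_ev)
        dq = 2 * (s_ev * u - (s_e2 + beta1 * s_ev) * v)   # dq_{i|l} / dpi_l
        total += lams[ell] / f_i * (-f_il * dq / (2 * d))  # (3B.7) inserted in (3B.6)
    return total


data = [(2.3, 0.9), (0.4, -0.7), (3.1, 1.6), (1.2, 0.1)]   # hypothetical (y_i, x_i)
par = dict(beta0=1.0, beta1=2.0, lams=[0.4, 0.6], s_e2=1.0, s_v2=0.8, s_ev=0.3)
pis, h = [-0.5, 1.0], 1e-6
analytic = dloglik_dpi(data, 0, pis=pis, **par)
numeric = (loglik(data, pis=[pis[0] + h, pis[1]], **par)
           - loglik(data, pis=[pis[0] - h, pis[1]], **par)) / (2 * h)
print(analytic, numeric)
```

Only the component $l$ enters the sum, which is exactly the simplification $\partial f_{i|j}/\partial\theta_{2l} = 0$ for $j \neq l$ used in (3B.6).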
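First derivatives with respect to elements of $\theta_3$ (variances and covariance)

We have

$$
\frac{\partial}{\partial\theta_{3s}}\log L(\theta) = \sum_{i=1}^n\frac{1}{f_i}\sum_{j=1}^m\lambda_j\frac{\partial f_{i|j}}{\partial\theta_{3s}}, \tag{3B.8}
$$

where $\partial f_{i|j}/\partial\theta_{3s}$ is equal to

$$
\frac{\partial f_{i|j}}{\partial\theta_{3s}} = f_{i|j}\,\frac{q_{i|j}\,d'_{(3s)} - d\left(q'_{i|j(3s)} + d'_{(3s)}\right)}{2d^2}, \tag{3B.9}
$$

with $q'_{i|j(3s)} = \partial q_{i|j}/\partial\theta_{3s}$, $d'_{(3s)} = \partial d/\partial\theta_{3s}$, $\theta_{31} = \sigma_\varepsilon^2$, $\theta_{32} = \sigma_\nu^2$, and $\theta_{33} = \sigma_{\varepsilon\nu}$, and

$$
\begin{aligned}
\frac{\partial q_{i|j}}{\partial\sigma_\varepsilon^2} &= \left(x_i - \mu^x_{i|j}\right)^2,\\
\frac{\partial q_{i|j}}{\partial\sigma_\nu^2} &= \left(y_i - \mu^y_{i|j}\right)^2 + \beta_1^2\left(x_i - \mu^x_{i|j}\right)^2 - 2\left(y_i - \mu^y_{i|j}\right)\left(x_i - \mu^x_{i|j}\right)\beta_1,\\
\frac{\partial q_{i|j}}{\partial\sigma_{\varepsilon\nu}} &= 2\left[\beta_1\left(x_i - \mu^x_{i|j}\right)^2 - \left(y_i - \mu^y_{i|j}\right)\left(x_i - \mu^x_{i|j}\right)\right],
\end{aligned}
\tag{3B.10}
$$

and

$$
\frac{\partial d}{\partial\sigma_\varepsilon^2} = \sigma_\nu^2,\qquad \frac{\partial d}{\partial\sigma_\nu^2} = \sigma_\varepsilon^2,\qquad\text{and}\qquad \frac{\partial d}{\partial\sigma_{\varepsilon\nu}} = -2\sigma_{\varepsilon\nu}. \tag{3B.11}
$$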
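The sketch below checks (3B.9) to (3B.11) numerically: it computes $\partial f_{i|j}/\partial\theta_{3s}$ for $\theta_3 = (\sigma_\varepsilon^2, \sigma_\nu^2, \sigma_{\varepsilon\nu})$ from the analytic expressions and compares each with a central finite difference of $f_{i|j}$. As before, helper names and parameter values are hypothetical.

```python
import math


def f_q_d(y, x, beta0, beta1, pi_j, theta3):
    """f_{i|j}, q_{i|j}, d and the residuals u, v for theta3 = (s_e2, s_v2, s_ev)."""
    s_e2, s_v2, s_ev = theta3
    u, v = y - (beta0 + beta1 * pi_j), x - pi_j
    q = (u ** 2 * s_v2 + v ** 2 * (beta1 ** 2 * s_v2 + s_e2 + 2 * beta1 * s_ev)
         - 2 * u * v * (beta1 * s_v2 + s_ev))
    d = s_e2 * s_v2 - s_ev ** 2
    f = math.exp(-q / (2 * d)) / (2 * math.pi * math.sqrt(d))
    return f, q, d, u, v


def df_dtheta3(y, x, beta0, beta1, pi_j, theta3):
    """Analytic gradient of f_{i|j} with respect to theta3, using (3B.9)-(3B.11)."""
    s_e2, s_v2, s_ev = theta3
    f, q, d, u, v = f_q_d(y, x, beta0, beta1, pi_j, theta3)
    dq = (v ** 2,                                            # (3B.10): sigma_e^2
          u ** 2 + beta1 ** 2 * v ** 2 - 2 * u * v * beta1,  # sigma_nu^2
          2 * (beta1 * v ** 2 - u * v))                      # sigma_e,nu
    dd = (s_v2, s_e2, -2 * s_ev)                             # (3B.11)
    return [f * (q * ddk - d * (dqk + ddk)) / (2 * d ** 2) for dqk, ddk in zip(dq, dd)]


obs = (2.3, 0.9, 1.0, 2.0, 0.5)   # hypothetical y, x, beta0, beta1, pi_j
theta3, h = [1.0, 0.8, 0.3], 1e-6
analytic = df_dtheta3(*obs, theta3)
numeric = []
for s in range(3):
    up, down = list(theta3), list(theta3)
    up[s] += h
    down[s] -= h
    numeric.append((f_q_d(*obs, up)[0] - f_q_d(*obs, down)[0]) / (2 * h))
print(analytic)
print(numeric)
```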
Estimating the model via numerical optimization

When the log-likelihood function obtained from (3.20) is maximized via numerical optimization techniques, the estimates obtained for the variances and the group sizes $\lambda_j$, $j = 1, \ldots, m$, do not necessarily satisfy the constraints $\sigma_\varepsilon^2 \geq 0$, $\sigma_\nu^2 \geq 0$, $\sigma_{\varepsilon\nu}^2 \leq \sigma_\varepsilon^2\sigma_\nu^2$, $0 < \lambda_j < 1$, $j = 1, \ldots, m$, and $\sum_j\lambda_j = 1$. This can be circumvented by optimizing the log-likelihood function not for these parameters directly, but for the transformed parameters (say) $a$, $b$, $c$ and $\tilde\lambda_j$, $j = 1, \ldots, m-1$, with

$$
\begin{bmatrix} a & b\\ 0 & c \end{bmatrix} = \operatorname{chol}(\Sigma),
$$

where ‘chol’ denotes the Cholesky decomposition, implying that

$$
\sigma_\varepsilon^2 = a^2,\qquad \sigma_{\varepsilon\nu} = ab,\qquad \sigma_\nu^2 = b^2 + c^2,
$$

and

$$
\tilde\lambda_j = \ln\left(\frac{\lambda_j}{\lambda_m}\right)
$$

for $j = 1, \ldots, m-1$ and $\tilde\lambda_m = 0$, implying

$$
\lambda_j = \frac{\exp(\tilde\lambda_j)}{1 + \sum_{k=1}^{m-1}\exp(\tilde\lambda_k)}
$$

for $j = 1, \ldots, m-1$ and $\lambda_m = 1 - \sum_{k=1}^{m-1}\lambda_k$. These expressions can be substituted in (3.21). The derivatives should now be taken with respect to $\theta_3 = (a, b, c)$ and $\theta_4 = (\tilde\lambda_1, \ldots, \tilde\lambda_{m-1})$. This does not affect the general expressions for $\partial\log L(\theta)/\partial\theta_3$ in (3B.8) and (3B.9), but the derivatives in (3B.10) and (3B.11) need to be taken with respect to $a$, $b$ and $c$. The first order derivatives of the log-likelihood with respect to the elements of $\theta_4$ given in (3.23) have to be changed for $\theta_4 = (\tilde\lambda_1, \ldots, \tilde\lambda_{m-1})$ as follows:
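$$
\begin{aligned}
\frac{\partial}{\partial\theta_{4j}}\log L(\theta) &= \sum_{i=1}^n\frac{1}{f_i}\frac{\partial}{\partial\theta_{4j}}\sum_{l=1}^m\lambda_l(\tilde\lambda_1, \ldots, \tilde\lambda_{m-1})\,f_{i|l}\\
&= \sum_{i=1}^n\frac{1}{f_i}\Biggl\{\sum_{l=1,\,l\neq j}^{m-1}\left[-\left(f_{i|l} - f_{i|m}\right)\frac{\exp(\tilde\lambda_l + \tilde\lambda_j)}{\left(1 + \sum_{k=1}^{m-1}\exp(\tilde\lambda_k)\right)^2}\right]\\
&\qquad + \left(f_{i|j} - f_{i|m}\right)\frac{\exp(\tilde\lambda_j)\left(1 + \sum_{k=1,\,k\neq j}^{m-1}\exp(\tilde\lambda_k)\right)}{\left(1 + \sum_{k=1}^{m-1}\exp(\tilde\lambda_k)\right)^2}\Biggr\}.
\end{aligned}
\tag{3B.12}
$$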
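The reparametrization and (3B.12) can be sketched as follows, under hypothetical inputs. `to_constrained` maps $(a, b, c, \tilde\lambda_1, \ldots, \tilde\lambda_{m-1})$ to $(\sigma_\varepsilon^2, \sigma_{\varepsilon\nu}, \sigma_\nu^2, \lambda)$ using the Cholesky and log-ratio formulas above (assuming the upper-triangular convention $\Sigma = R'R$ with $R = [a\ b;\ 0\ c]$ implied by those formulas), so any real inputs satisfy the constraints. `dlogf_dlamtilde` evaluates the summand of (3B.12) for one observation given its component densities $f_{i|1}, \ldots, f_{i|m}$, and a central finite difference serves as the check. All names are our own.

```python
import math


def group_sizes(lam_tilde):
    """lambda_j = exp(lt_j) / (1 + sum_k exp(lt_k)) for j < m, and lambda_m = 1 / (1 + sum_k exp(lt_k))."""
    denom = 1.0 + sum(math.exp(t) for t in lam_tilde)
    return [math.exp(t) / denom for t in lam_tilde] + [1.0 / denom]


def to_constrained(a, b, c, lam_tilde):
    """Map (a, b, c, lam_tilde) to (s_e2, s_ev, s_v2, lambda) with Sigma = R'R, R = [[a, b], [0, c]]."""
    return a * a, a * b, b * b + c * c, group_sizes(lam_tilde)


def log_fi(f_comp, lam_tilde):
    """log f_i = log sum_l lambda_l f_{i|l} for one observation with component densities f_comp."""
    return math.log(sum(lam * f for lam, f in zip(group_sizes(lam_tilde), f_comp)))


def dlogf_dlamtilde(f_comp, lam_tilde, j):
    """One observation's contribution to (3B.12): d log f_i / d lam_tilde_j (0-based j < m - 1)."""
    e = [math.exp(t) for t in lam_tilde]
    denom2 = (1.0 + sum(e)) ** 2
    f_m = f_comp[-1]
    total = sum(-(f_comp[ell] - f_m) * e[ell] * e[j] / denom2
                for ell in range(len(e)) if ell != j)
    rest = 1.0 + sum(e[k] for k in range(len(e)) if k != j)
    total += (f_comp[j] - f_m) * e[j] * rest / denom2
    f_i = sum(lam * f for lam, f in zip(group_sizes(lam_tilde), f_comp))
    return total / f_i


# Constraints hold for arbitrary real inputs (hypothetical values).
s_e2, s_ev, s_v2, lam = to_constrained(0.7, -1.2, 0.4, [0.3, -2.0])
print(s_e2, s_ev, s_v2, lam)

# Finite-difference check of (3B.12) for one observation with m = 3.
f_comp = [0.12, 0.03, 0.20]          # hypothetical f_{i|1}, f_{i|2}, f_{i|3}
lam_tilde, h = [0.3, -2.0], 1e-6
for j in range(len(lam_tilde)):
    up, down = list(lam_tilde), list(lam_tilde)
    up[j] += h
    down[j] -= h
    numeric = (log_fi(f_comp, up) - log_fi(f_comp, down)) / (2 * h)
    print(j, dlogf_dlamtilde(f_comp, lam_tilde, j), numeric)
```

Because the optimizer then works on an unconstrained space, no explicit bounds or equality constraints have to be passed to it; the price is that the gradient must be taken through the transformation, as in (3B.12).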
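Second order partial derivatives (Hessian)

In the following we examine the second order derivatives in more detail:

$$
\begin{array}{ccccc}
\dfrac{\partial^2}{\partial\theta_{1s}^2} & \dfrac{\partial^2}{\partial\theta_{1s}\partial\theta_{1t}} & \dfrac{\partial^2}{\partial\theta_{1s}\partial\theta_{2t}} & \dfrac{\partial^2}{\partial\theta_{1s}\partial\theta_{3t}} & \dfrac{\partial^2}{\partial\theta_{1s}\partial\theta_{4t}}\\[2ex]
 & \dfrac{\partial^2}{\partial\theta_{2s}^2} & \dfrac{\partial^2}{\partial\theta_{2s}\partial\theta_{2t}} & \dfrac{\partial^2}{\partial\theta_{2s}\partial\theta_{3t}} & \dfrac{\partial^2}{\partial\theta_{2s}\partial\theta_{4t}}\\[2ex]
 & & \dfrac{\partial^2}{\partial\theta_{3s}^2} & \dfrac{\partial^2}{\partial\theta_{3s}\partial\theta_{3t}} & \dfrac{\partial^2}{\partial\theta_{3s}\partial\theta_{4t}}\\[2ex]
 & & & \dfrac{\partial^2}{\partial\theta_{4s}^2} & \dfrac{\partial^2}{\partial\theta_{4s}\partial\theta_{4t}}
\end{array}
\tag{3B.13}
$$

where $s$ and $t$ indicate the different elements in $\theta_1$, $\theta_2$, $\theta_3$, and $\theta_4$. Because the second order partial derivatives of the log-likelihood are continuous (except at some boundary points where $d(\theta_3) = 0$), the mixed partial derivatives are equal (i.e. $\partial^2 f(x,y)/\partial x\partial y = \partial^2 f(x,y)/\partial y\partial x$, see Apostol, 1969), so only the blocks listed in (3B.13) are needed. We do not give the expressions for the second order derivatives of the $q_{i|j}$'s and $d$ here; they can be derived easily from (3B.2) and the first order partial derivatives of $q_{i|j}$ and $d$ above.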
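Second order partial derivatives elements of $\theta_1$ (fixed effects)

The second order partial derivative $\partial^2/\partial\theta_{1s}\partial\theta_{1t}$ can be computed using (3.24), where

$$
\frac{\partial^2 f_{i|j}}{\partial\theta_{1s}\partial\theta_{1t}} = \frac{\partial}{\partial\theta_{1s}}\left\{\frac{\partial f_{i|j}}{\partial\theta_{1t}}\right\} = -\left\{\frac{\partial f_{i|j}}{\partial\theta_{1s}}\right\}\frac{\partial q_{i|j}}{\partial\theta_{1t}}\frac{1}{2d} - \frac{f_{i|j}}{2d}\frac{\partial^2 q_{i|j}}{\partial\theta_{1s}\partial\theta_{1t}},
$$

which gives, on using (3B.5),

$$
\begin{aligned}
\frac{\partial^2\log L(\theta)}{\partial\theta_{1s}\partial\theta_{1t}} = \sum_{i=1}^n\Biggl[&-\left(\sum_{l=1}^m\frac{\lambda_lf_{i|l}}{2df_i}\frac{\partial q_{i|l}}{\partial\theta_{1s}}\right)\left(\sum_{j=1}^m\frac{\lambda_jf_{i|j}}{2df_i}\frac{\partial q_{i|j}}{\partial\theta_{1t}}\right)\\
&+ \sum_{j=1}^m\frac{\lambda_jf_{i|j}}{4d^2f_i}\left\{\frac{\partial q_{i|j}}{\partial\theta_{1s}}\frac{\partial q_{i|j}}{\partial\theta_{1t}} - 2d\frac{\partial^2 q_{i|j}}{\partial\theta_{1s}\partial\theta_{1t}}\right\}\Biggr].
\end{aligned}
\tag{3B.14}
$$

The second order partial derivative with respect to the same element of $\theta_1$ (i.e. $s = t$) is almost equal to (3B.14) but can be simplified a bit more, namely

$$
\begin{aligned}
\frac{\partial^2\log L(\theta)}{\partial\theta_{1s}^2} = \sum_{i=1}^n\Biggl[&-\left(\sum_{j=1}^m\frac{\lambda_jf_{i|j}}{2df_i}\frac{\partial q_{i|j}}{\partial\theta_{1s}}\right)^2\\
&+ \sum_{j=1}^m\frac{\lambda_jf_{i|j}}{4d^2f_i}\left\{\left(\frac{\partial q_{i|j}}{\partial\theta_{1s}}\right)^2 - 2d\frac{\partial^2 q_{i|j}}{\partial\theta_{1s}^2}\right\}\Biggr].
\end{aligned}
\tag{3B.15}
$$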
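The second order partial derivative of the log-likelihood with respect to the elements of $\theta_1$ and $\theta_2$ is given by

$$
\frac{\partial^2\log L(\theta)}{\partial\theta_{1s}\partial\theta_{2l}} = \sum_{i=1}^n -\left(\sum_{j'=1}^m\frac{\lambda_{j'}}{f_i}\frac{\partial f_{i|j'}}{\partial\theta_{1s}}\right)\left[\frac{\lambda_l}{f_i}\frac{\partial f_{i|l}}{\partial\theta_{2l}}\right] + \sum_{i=1}^n\frac{\lambda_l}{f_i}\frac{\partial^2 f_{i|l}}{\partial\theta_{1s}\partial\theta_{2l}}, \tag{3B.16}
$$

with

$$
\frac{\partial^2 f_{i|l}}{\partial\theta_{1s}\partial\theta_{2l}} = \frac{\partial}{\partial\theta_{1s}}\left\{\frac{\partial f_{i|l}}{\partial\theta_{2l}}\right\} = \frac{f_{i|l}}{2d}\left\{\frac{1}{2d}\frac{\partial q_{i|l}}{\partial\theta_{1s}}\frac{\partial q_{i|l}}{\partial\theta_{2l}} - \frac{\partial^2 q_{i|l}}{\partial\theta_{1s}\partial\theta_{2l}}\right\}.
$$

The result follows using (3B.5) and (3B.7) for $s = 1, 2$ and $l = 1, \ldots, m$.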
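For $\partial^2\log L(\theta)/\partial\theta_{1s}\partial\theta_{3t}$, $s = 1, 2$ and $t = 1, 2, 3$, we first need to compute

$$
\begin{aligned}
\frac{\partial^2 f_{i|j}}{\partial\theta_{1s}\partial\theta_{3t}} &= \frac{\partial}{\partial\theta_{1s}}\left\{\frac{f_{i|j}}{2d^2}\left[q_{i|j}\frac{\partial d}{\partial\theta_{3t}} - d\left(\frac{\partial q_{i|j}}{\partial\theta_{3t}} + \frac{\partial d}{\partial\theta_{3t}}\right)\right]\right\}\\
&= \frac{f_{i|j}}{2d^2}\left\{\left(\frac{3d - q_{i|j}}{2d}\right)\frac{\partial q_{i|j}}{\partial\theta_{1s}}\frac{\partial d}{\partial\theta_{3t}} + \frac{1}{2}\frac{\partial q_{i|j}}{\partial\theta_{1s}}\frac{\partial q_{i|j}}{\partial\theta_{3t}} - d\frac{\partial^2 q_{i|j}}{\partial\theta_{1s}\partial\theta_{3t}}\right\},
\end{aligned}
\tag{3B.17}
$$

from which it follows that

$$
\begin{aligned}
\frac{\partial^2\log L(\theta)}{\partial\theta_{1s}\partial\theta_{3t}} = {}&\sum_{i=1}^n\frac{-1}{(f_i)^2}\left(\sum_{j'=1}^m -\frac{\lambda_{j'}f_{i|j'}}{2d}\frac{\partial q_{i|j'}}{\partial\theta_{1s}}\right)\times\sum_{j=1}^m\frac{\lambda_jf_{i|j}}{2d^2}\left[q_{i|j}\frac{\partial d}{\partial\theta_{3t}} - d\left(\frac{\partial q_{i|j}}{\partial\theta_{3t}} + \frac{\partial d}{\partial\theta_{3t}}\right)\right]\\
&+ \sum_{i=1}^n\frac{1}{f_i}\sum_{j=1}^m\frac{\lambda_jf_{i|j}}{2d^2}\left\{\left(\frac{3d - q_{i|j}}{2d}\right)\frac{\partial q_{i|j}}{\partial\theta_{1s}}\frac{\partial d}{\partial\theta_{3t}} + \frac{1}{2}\frac{\partial q_{i|j}}{\partial\theta_{1s}}\frac{\partial q_{i|j}}{\partial\theta_{3t}} - d\frac{\partial^2 q_{i|j}}{\partial\theta_{1s}\partial\theta_{3t}}\right\}.
\end{aligned}
\tag{3B.18}
$$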
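The second order partial derivative of the LIV log-likelihood with respect to the elements of $\theta_1$ and $\theta_4$ is obtained using (3.25),

$$
\frac{\partial^2\log L(\theta)}{\partial\theta_{1s}\partial\theta_{4l}} = \sum_{i=1}^n\frac{f_i\dfrac{\partial}{\partial\theta_{1s}}\left(f_{i|l} - f_{i|m}\right) - \left(f_{i|l} - f_{i|m}\right)\dfrac{\partial f_i}{\partial\theta_{1s}}}{(f_i)^2}, \tag{3B.19}
$$

where $\partial f_{i|l}/\partial\theta_{1s}$ and $\partial f_{i|m}/\partial\theta_{1s}$ are given in (3B.5) and $\partial f_i/\partial\theta_{1s} = \sum_{j=1}^m\lambda_j\,\partial f_{i|j}/\partial\theta_{1s}$, for $s = 1, 2$ and $l = 1, \ldots, m-1$.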
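Second order partial derivatives elements of $\theta_2$ (group means)

The structure of the second order partial derivatives of the log-likelihood with respect to the elements of $\theta_2$ is similar to the results for $\theta_1$. In fact, several simplifications can be made because $\partial f_{i|j}/\partial\theta_{2l}$ and $\partial q_{i|j}/\partial\theta_{2l}$ are both zero for $j \neq l$, $j = 1, \ldots, m$. For $\partial^2\log L(\theta)/\partial\theta_{2l}^2$ we have

$$
\frac{\partial^2\log L(\theta)}{\partial\theta_{2l}^2} = \sum_{i=1}^n\frac{-1}{(f_i)^2}\left(\lambda_l\frac{\partial f_{i|l}}{\partial\theta_{2l}}\right)^2 + \sum_{i=1}^n\frac{\lambda_l}{f_i}\frac{\partial^2 f_{i|l}}{\partial\theta_{2l}^2}, \tag{3B.20}
$$

where $\partial f_{i|l}/\partial\theta_{2l}$ is given in (3B.7) and

$$
\frac{\partial^2 f_{i|l}}{\partial\theta_{2l}^2} = \frac{\partial}{\partial\theta_{2l}}\left\{-\frac{f_{i|l}}{2d}\frac{\partial q_{i|l}}{\partial\theta_{2l}}\right\} = \frac{f_{i|l}}{2d}\left\{\frac{1}{2d}\left(\frac{\partial q_{i|l}}{\partial\theta_{2l}}\right)^2 - \frac{\partial^2 q_{i|l}}{\partial\theta_{2l}^2}\right\},
$$

for $l = 1, \ldots, m$. The second order partial derivative with respect to different elements in $\theta_2$ can be simplified, since for $k \neq l$ we have $\partial^2 f_{i|j}/\partial\theta_{2k}\partial\theta_{2l} = 0$ for all $j = 1, \ldots, m$: for $j \neq l$ the first derivative with respect to $\theta_{2l}$ already vanishes, and for $j = l$

$$
\frac{\partial^2 f_{i|l}}{\partial\theta_{2k}\partial\theta_{2l}} = \frac{\partial}{\partial\theta_{2k}}\left\{-\frac{f_{i|l}}{2d}\frac{\partial q_{i|l}}{\partial\theta_{2l}}\right\} = 0,
$$

from which it follows that

$$
\frac{\partial^2\log L(\theta)}{\partial\theta_{2k}\partial\theta_{2l}} = \sum_{i=1}^n -\left(\frac{\lambda_kf_{i|k}}{2df_i}\frac{\partial q_{i|k}}{\partial\theta_{2k}}\right)\left(\frac{\lambda_lf_{i|l}}{2df_i}\frac{\partial q_{i|l}}{\partial\theta_{2l}}\right). \tag{3B.21}
$$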
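The second order partial derivatives with respect to $\theta_{2l}$ and $\theta_{3t}$ are

$$
\frac{\partial^2\log L(\theta)}{\partial\theta_{2l}\partial\theta_{3t}} = \sum_{i=1}^n -\left(\frac{\lambda_l}{f_i}\frac{\partial f_{i|l}}{\partial\theta_{2l}}\right)\sum_{j=1}^m\frac{\lambda_j}{f_i}\frac{\partial f_{i|j}}{\partial\theta_{3t}} + \sum_{i=1}^n\frac{\lambda_l}{f_i}\frac{\partial^2 f_{i|l}}{\partial\theta_{2l}\partial\theta_{3t}}, \tag{3B.22}
$$

because $\partial^2 f_{i|j}/\partial\theta_{2l}\partial\theta_{3t} = 0$ for $j \neq l$, and equal to

$$
\frac{f_{i|l}}{2d^2}\left\{\left(\frac{3d - q_{i|l}}{2d}\right)\frac{\partial q_{i|l}}{\partial\theta_{2l}}\frac{\partial d}{\partial\theta_{3t}} + \frac{1}{2}\frac{\partial q_{i|l}}{\partial\theta_{2l}}\frac{\partial q_{i|l}}{\partial\theta_{3t}} - d\frac{\partial^2 q_{i|l}}{\partial\theta_{2l}\partial\theta_{3t}}\right\}
$$

for $j = l$, similar to (3B.17). Using (3.25) we have

$$
\frac{\partial^2\log L(\theta)}{\partial\theta_{2k}\partial\theta_{4l}} = \sum_{i=1}^n\frac{f_i\dfrac{\partial}{\partial\theta_{2k}}\left\{f_{i|l} - f_{i|m}\right\} - \left\{f_{i|l} - f_{i|m}\right\}\dfrac{\partial f_i}{\partial\theta_{2k}}}{(f_i)^2}, \tag{3B.23}
$$

where

$$
\frac{\partial f_i}{\partial\theta_{2k}} = \frac{\partial}{\partial\theta_{2k}}\sum_{j=1}^m\lambda_jf_{i|j} = \lambda_k\frac{\partial f_{i|k}}{\partial\theta_{2k}},
$$

which is given in (3B.7), and

$$
\frac{\partial\left\{f_{i|l} - f_{i|m}\right\}}{\partial\theta_{2k}} =
\begin{cases}
0 & k \neq l, m,\\
\partial f_{i|l}/\partial\theta_{2l} & k = l,\\
-\,\partial f_{i|m}/\partial\theta_{2m} & k = m.
\end{cases}
$$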
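Second order partial derivatives elements of $\theta_3$ (variances-covariance)

Here we derive the second order partial derivatives of the log-likelihood with respect to different elements in $\theta_3$. The second order partial derivatives with respect to the same elements of $\theta_3$ can be derived in a similar way and are not given here. We have

$$
\begin{aligned}
\frac{\partial^2 f_{i|j}}{\partial\theta_{3s}\partial\theta_{3t}} = \frac{\partial}{\partial\theta_{3s}}\left\{\frac{\partial f_{i|j}}{\partial\theta_{3t}}\right\} = {}&\frac{\partial f_{i|j}}{\partial\theta_{3s}}\,\frac{q_{i|j}\frac{\partial d}{\partial\theta_{3t}} - d\left(\frac{\partial q_{i|j}}{\partial\theta_{3t}} + \frac{\partial d}{\partial\theta_{3t}}\right)}{2d^2}\\
&+ f_{i|j}\,\frac{\partial}{\partial\theta_{3s}}\left[\frac{q_{i|j}\frac{\partial d}{\partial\theta_{3t}} - d\left(\frac{\partial q_{i|j}}{\partial\theta_{3t}} + \frac{\partial d}{\partial\theta_{3t}}\right)}{2d^2}\right],
\end{aligned}
\tag{3B.24}
$$

where the latter factor

$$
\begin{aligned}
\frac{\partial}{\partial\theta_{3s}}\left[\frac{q_{i|j}\frac{\partial d}{\partial\theta_{3t}} - d\left(\frac{\partial q_{i|j}}{\partial\theta_{3t}} + \frac{\partial d}{\partial\theta_{3t}}\right)}{2d^2}\right] = \frac{1}{2d^2}\Bigl\{&\left(q_{i|j} - d\right)\frac{\partial^2 d}{\partial\theta_{3s}\partial\theta_{3t}} + \frac{\partial q_{i|j}}{\partial\theta_{3t}}\frac{\partial d}{\partial\theta_{3s}} + \frac{\partial q_{i|j}}{\partial\theta_{3s}}\frac{\partial d}{\partial\theta_{3t}}\\
&- \frac{2q_{i|j} - d}{d}\frac{\partial d}{\partial\theta_{3s}}\frac{\partial d}{\partial\theta_{3t}} - d\frac{\partial^2 q_{i|j}}{\partial\theta_{3s}\partial\theta_{3t}}\Bigr\},
\end{aligned}
$$

and (3B.24) becomes

$$
\begin{aligned}
\frac{\partial^2 f_{i|j}}{\partial\theta_{3s}\partial\theta_{3t}} = \frac{f_{i|j}}{4d^4}\Bigl\{&\left(3d^2 + q_{i|j}^2 - 6dq_{i|j}\right)\frac{\partial d}{\partial\theta_{3s}}\frac{\partial d}{\partial\theta_{3t}} + \left(3d^2 - dq_{i|j}\right)\left(\frac{\partial d}{\partial\theta_{3s}}\frac{\partial q_{i|j}}{\partial\theta_{3t}} + \frac{\partial q_{i|j}}{\partial\theta_{3s}}\frac{\partial d}{\partial\theta_{3t}}\right)\\
&+ d^2\frac{\partial q_{i|j}}{\partial\theta_{3s}}\frac{\partial q_{i|j}}{\partial\theta_{3t}} + 2d^2\left(q_{i|j} - d\right)\frac{\partial^2 d}{\partial\theta_{3s}\partial\theta_{3t}} - 2d^3\frac{\partial^2 q_{i|j}}{\partial\theta_{3s}\partial\theta_{3t}}\Bigr\}.
\end{aligned}
\tag{3B.25}
$$

Then combining (3.24), (3B.9), and (3B.25) gives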
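$$
\begin{aligned}
\frac{\partial^2\log L(\theta)}{\partial\theta_{3s}\partial\theta_{3t}} = \sum_{i=1}^n\Biggl(&-\left[\sum_{j'=1}^m\frac{\lambda_{j'}f_{i|j'}}{2d^2f_i}\left\{q_{i|j'}\frac{\partial d}{\partial\theta_{3s}} - d\left(\frac{\partial q_{i|j'}}{\partial\theta_{3s}} + \frac{\partial d}{\partial\theta_{3s}}\right)\right\}\right]\\
&\quad\times\left[\sum_{j=1}^m\frac{\lambda_jf_{i|j}}{2d^2f_i}\left\{q_{i|j}\frac{\partial d}{\partial\theta_{3t}} - d\left(\frac{\partial q_{i|j}}{\partial\theta_{3t}} + \frac{\partial d}{\partial\theta_{3t}}\right)\right\}\right]\\
&+ \sum_{j=1}^m\frac{\lambda_jf_{i|j}}{4d^4f_i}\Bigl\{\left(3d^2 + q_{i|j}^2 - 6dq_{i|j}\right)\frac{\partial d}{\partial\theta_{3s}}\frac{\partial d}{\partial\theta_{3t}} + \left(3d^2 - dq_{i|j}\right)\left(\frac{\partial d}{\partial\theta_{3s}}\frac{\partial q_{i|j}}{\partial\theta_{3t}} + \frac{\partial q_{i|j}}{\partial\theta_{3s}}\frac{\partial d}{\partial\theta_{3t}}\right)\\
&\qquad + d^2\frac{\partial q_{i|j}}{\partial\theta_{3s}}\frac{\partial q_{i|j}}{\partial\theta_{3t}} + 2d^2\left(q_{i|j} - d\right)\frac{\partial^2 d}{\partial\theta_{3s}\partial\theta_{3t}} - 2d^3\frac{\partial^2 q_{i|j}}{\partial\theta_{3s}\partial\theta_{3t}}\Bigr\}\Biggr),
\end{aligned}
\tag{3B.26}
$$

for $s, t = 1, 2, 3$, $s \neq t$. The results for $s = t$ can be obtained in a similar way, and the result has the same form as (3B.26). The second order partial derivatives with respect to $\theta_{3s}$, $s = 1, 2, 3$, and $\theta_{4l}$, $l = 1, \ldots, m-1$, are equal to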
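$$
\begin{aligned}
\frac{\partial^2\log L(\theta)}{\partial\theta_{3s}\partial\theta_{4l}} = \sum_{i=1}^n\frac{1}{2d^2f_i}\Biggl\{&\Bigl(f_{i|l}q_{i|l} - f_{i|m}q_{i|m} - d\left(f_{i|l} - f_{i|m}\right)\Bigr)\frac{\partial d}{\partial\theta_{3s}} - d\left(f_{i|l}\frac{\partial q_{i|l}}{\partial\theta_{3s}} - f_{i|m}\frac{\partial q_{i|m}}{\partial\theta_{3s}}\right)\\
&- \frac{f_{i|l} - f_{i|m}}{f_i}\sum_{j=1}^m\lambda_jf_{i|j}\left[q_{i|j}\frac{\partial d}{\partial\theta_{3s}} - d\left(\frac{\partial q_{i|j}}{\partial\theta_{3s}} + \frac{\partial d}{\partial\theta_{3s}}\right)\right]\Biggr\}.
\end{aligned}
\tag{3B.27}
$$

Here we applied formula (3.25).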
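Second order partial derivatives elements of $\theta_4$ (group sizes)

The second order partial derivatives for the group sizes are given by

$$
\frac{\partial^2\log L(\theta)}{\partial\theta_{4k}\partial\theta_{4l}} =
\begin{cases}
\displaystyle\sum_{i=1}^n -\frac{\left(f_{i|l} - f_{i|m}\right)^2}{f_i^2} & \text{if } k = l,\\[3ex]
\displaystyle\sum_{i=1}^n -\frac{\left(f_{i|l} - f_{i|m}\right)\left(f_{i|k} - f_{i|m}\right)}{f_i^2} & \text{if } k \neq l,
\end{cases}
\tag{3B.28}
$$

for $k, l = 1, \ldots, m-1$.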
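(3B.28) is stated for the untransformed group sizes $\theta_4 = (\lambda_1, \ldots, \lambda_{m-1})$ with $\lambda_m = 1 - \sum_{k<m}\lambda_k$, as in (3.25). The Python sketch below compares it with second-order central finite differences of $\log L$ for a small hypothetical matrix of component densities $f_{i|j}$ (rows are observations, columns are categories); the function names are our own.

```python
import math


def loglik_sizes(f, lam_free):
    """log L as a function of (lambda_1, ..., lambda_{m-1}); f[i][j] holds f_{i|j}."""
    lam = list(lam_free) + [1.0 - sum(lam_free)]      # lambda_m = 1 - sum of the others
    return sum(math.log(sum(w * fij for w, fij in zip(lam, row))) for row in f)


def hessian_sizes(f, lam_free):
    """Analytic second derivatives (3B.28)."""
    lam = list(lam_free) + [1.0 - sum(lam_free)]
    m1 = len(lam_free)
    hess = [[0.0] * m1 for _ in range(m1)]
    for row in f:
        f_i = sum(w * fij for w, fij in zip(lam, row))
        for k in range(m1):
            for j in range(m1):
                hess[k][j] -= (row[j] - row[-1]) * (row[k] - row[-1]) / f_i ** 2
    return hess


def numeric_hessian(fun, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function of a list x."""
    n = len(x)
    out = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for j in range(n):
            def shifted(dk, dj):
                z = list(x)
                z[k] += dk
                z[j] += dj
                return fun(z)
            out[k][j] = (shifted(h, h) - shifted(h, -h)
                         - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return out


# Hypothetical component densities: 4 observations, m = 3 categories.
f = [[0.12, 0.03, 0.20],
     [0.05, 0.18, 0.09],
     [0.22, 0.11, 0.02],
     [0.07, 0.07, 0.15]]
lam_free = [0.3, 0.45]
analytic = hessian_sizes(f, lam_free)
numeric = numeric_hessian(lambda z: loglik_sizes(f, z), lam_free)
print(analytic)
print(numeric)
```

Since $\log L$ depends on these $\lambda$'s only through terms linear inside each logarithm, the Hessian is a negative sum of outer products, so it is negative semi-definite in this parametrization.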
Chapter 4
LIV implementation issues
4.1 Introduction
In this chapter we extend the LIV model proposed in the previous chapter and
we present several diagnostics to facilitate an LIV analysis. In section 4.2 we
include several exogenous regressors in the model and allow for the possibility
that observed instrumental variables (IVs) are available. Both are important
generalizations for empirical work. We prove in section 4.2 that the parameters of the extended LIV model are identifiable, using an approach similar to that of subsection 3.3.1.
When a researcher has access to observed instrumental variables, an important
question is whether these instrumental variables are valid. Valid instruments
explain a considerable amount of the variance of the endogenous regressor
and have no direct effect on the dependent variable. Unfortunately, the per-
formance of the classical IV estimator, which has been used extensively in
empirical applications, critically relies on the quality of observed instruments,
see chapter 2. Classical IV models are identified using additional variables in the form of instruments that are constructed on the basis of a priori grounds, such as economic sense or intuition. The assumptions made, however, are often questionable; see for instance Card's (1999, 2001) discussion on the validity of the instrumental variables used in estimating the return to schooling. Unfortunately, examining instrument validity is not straightforward in a classical IV model. In section 4.3 we propose a new approach based on Wald tests (see e.g. Greene, 2000) to investigate the validity of observed instrumental variables, and we show by means of a simulation study that the proposed tests have reasonable power in identifying weak or endogenous instruments. If the
observed instruments are found to be valid, a classical IV estimator (2SLS or
LIML) can be used, or the observed IVs can be combined with a latent discrete
instrument in the LIV model, yielding potentially more efficient estimates. If
the null hypothesis of having valid observed IVs is rejected, then the classical
IV estimator is known to be biased but the LIV estimator can still be used to
make valid inferences.
Thirdly, in carrying out an LIV analysis several implementation issues need to be addressed. Here we propose diagnostics to choose the number of categories $m$, to examine the LIV residuals, and to identify outliers or influential observations. In the previous chapter we showed using synthetic data that the existence of a latent dummy instrumental variable ($m = 2$) allows for consistent estimation of the regression parameters, and we performed a sensitivity analysis using $m = 3$ or $m = 4$. It was shown that the main results and conclusions are robust against different choices of $m$. In empirical studies one may wish to choose one particular value for $m$, and we present several diagnostics that can be used for this purpose. Furthermore, the normality assumption made to compute the maximum likelihood estimates may be invalid. We investigate in a simulation study the sensitivity of the LIV estimates to misspecification of the distribution of the error terms, and we find that the results for the regression parameters are fairly robust against a misspecified likelihood. Furthermore, we propose a way to compute ‘LIV residuals’ that can be used to investigate the normality assumption of the disturbances and to examine heteroscedasticity. Outliers and influential observations may present a problem in estimation because of their large influence on the results, and their presence in large numbers may indicate that the model fails to capture important aspects of the data. The available regression diagnostics (e.g. Fox, 1991; Belsley, Kuh, and Welsch, 1980; or Cook and Weisberg, 1982) are not directly applicable, but can be extended in a straightforward manner (see also Wang et al., 1996). We address these issues in section 4.5.
Finally, in section 4.6 we present another method to test for regressor-error
dependencies (section 3.4). This method is based on Wald’s test-principle and
can be obtained as a by-product of an IV analysis. In section 4.7 we conclude
this chapter.
4.2 Including exogenous regressors in the LIV model and identifiability
The simple LIV model in (3.1) is extended by including additional exogenous regressors and instrumental variables as follows

$$
\begin{aligned}
y_i &= \beta_0 + \beta_1x_i + x_{2i}'\beta_2 + \varepsilon_i,\\
x_i &= \pi'z_i + z_{2i}'\gamma_2 + \nu_i,
\end{aligned}
\tag{4.1}
$$
where $i = 1, \ldots, n$. The $l_1\times1$ vector $x_{2i}$ contains the observations on the exogenous regressors, and $z_{2i}$ is the $l_2\times1$ vector of observations on the exogenous instruments. The regression parameter $\beta_2$ is an $l_1\times1$ vector of unknowns and represents the effect of $x_{2i}$ on $y_i$. Similarly, the unknown vector $\gamma_2$ has dimension $l_2\times1$ and denotes the effect of the exogenous regressors $z_{2i}$ on the endogenous variable $x_i$. As before, $\pi$ is an $m\times1$ vector of category means and $z_i$ is the unobserved discrete instrument with $m$ categories, which have sizes $\lambda_j > 0$, $j = 1, \ldots, m$, where $\sum_{j=1}^m\lambda_j = 1$. The errors $(\varepsilon_i, \nu_i)$ are independently and identically distributed according to a bivariate normal distribution with mean zero and variance-covariance matrix (3.2). The vector $z_{2i}$ contains the elements of $x_{2i}$ and, in addition, possibly other exogenous regressors that do not have a direct effect on $y_i$, i.e. $z_{2i} = (x_{2i}', x_{3i}')'$ and $l_2 \geq l_1$. As will be shown later, $z_{2i}$ cannot contain a constant term. The variables in $x_{3i}$ can be interpreted as the ‘traditional’ instrumental variables. We define $X_2 = (x_{21}', \ldots, x_{2n}')'$, $X_3 = (x_{31}', \ldots, x_{3n}')'$, and $Z_2 = [X_2\ X_3]$.
It can be seen that the general LIV model is a mixture of normal distribution functions. Conditionally on group $j$ and the sets $x_{2i}$ and $x_{3i}$, the reduced form mean is given by

$$
\begin{aligned}
\mu^y_{i|j} &= \beta_0 + \beta_1\pi_j + x_{2i}'\beta_2 + z_{2i}'\gamma_2\beta_1,\\
\mu^x_{i|j} &= \pi_j + z_{2i}'\gamma_2,
\end{aligned}
\tag{4.2}
$$

and the variance is equal to (3.4). The unconditional mean of $(y_i, x_i)$ is

$$
\mu_{yx} = \begin{pmatrix} \beta_0 + \beta_1\sum_{j=1}^m\lambda_j\pi_j + x_{2i}'\beta_2 + z_{2i}'\gamma_2\beta_1\\ \sum_{j=1}^m\lambda_j\pi_j + z_{2i}'\gamma_2 \end{pmatrix},
$$

and the unconditional variance-covariance matrix is given by¹

$$
\Omega_{yx} = \Omega + \begin{pmatrix}\beta_1\pi'\\ \pi'\end{pmatrix}\operatorname{var}(z_i)\begin{pmatrix}\beta_1\pi'\\ \pi'\end{pmatrix}',
$$

where $\operatorname{var}(z_i) = \operatorname{diag}(\lambda) - \lambda\lambda'$, $\lambda = (\lambda_1, \ldots, \lambda_m)'$ (e.g. Weisstein, 2004a). In the following we apply an approach similar to that of subsection 3.3.1 to prove identifiability of all parameters of the LIV model in (4.1). Identifiability is now conditional on a set of observations $S_i = (x_{2i}, x_{3i}) = z_{2i}$, $i = 1, \ldots, n$.
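The unconditional variance-covariance matrix can be checked numerically. The pure-Python sketch below (hypothetical values for $\beta_1$, $\pi$, $\lambda$ and $\Omega$; function names are our own) computes $\Omega + B\operatorname{var}(z_i)B'$ with $B = (\beta_1\pi, \pi)'$ and $\operatorname{var}(z_i) = \operatorname{diag}(\lambda) - \lambda\lambda'$, and compares it with the law of total variance from footnote 1, $E[\operatorname{var}(y,x|z)] + \operatorname{var}[E(y,x|z)]$. The terms in $x_{2i}$ and $z_{2i}$ are fixed given $S_i$, so they shift the mean but do not enter the covariance and are omitted.

```python
def mixture_cov_formula(beta1, pi, lam, omega):
    """Omega + B var(z) B' with B = [beta1 * pi'; pi'] and var(z) = diag(lam) - lam lam'."""
    m = len(pi)
    var_z = [[(lam[j] if j == k else 0.0) - lam[j] * lam[k] for k in range(m)]
             for j in range(m)]
    B = [[beta1 * p for p in pi], list(pi)]          # 2 x m
    out = [row[:] for row in omega]
    for r in range(2):
        for c in range(2):
            out[r][c] += sum(B[r][j] * var_z[j][k] * B[c][k]
                             for j in range(m) for k in range(m))
    return out


def mixture_cov_total_variance(beta1, pi, lam, omega):
    """E[var | z] + var[E | z]; group means are (beta1 * pi_j, pi_j) up to a common shift."""
    means = [(beta1 * p, p) for p in pi]
    mbar = [sum(w * mu[r] for w, mu in zip(lam, means)) for r in range(2)]
    out = [row[:] for row in omega]
    for r in range(2):
        for c in range(2):
            out[r][c] += sum(w * (mu[r] - mbar[r]) * (mu[c] - mbar[c])
                             for w, mu in zip(lam, means))
    return out


beta1, pi, lam = 2.0, [-1.0, 0.5, 2.0], [0.2, 0.5, 0.3]   # hypothetical
omega = [[5.4, 1.9], [1.9, 0.8]]                          # hypothetical Omega
print(mixture_cov_formula(beta1, pi, lam, omega))
print(mixture_cov_total_variance(beta1, pi, lam, omega))
```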
Let $\beta = (\beta_0, \beta_1, \beta_2')'$ and define the set

$$
\mathcal{F}_{\beta,\gamma_2,\Sigma,S_i} = \Bigl\{F_{S_i}\ \Big|\ F_{S_i}\text{ is a bivariate normal c.d.f. on }\mathbb{R}^2\text{ of the pair }(y_i, x_i)\text{ with mean and variance }\bigl(\mu_i(\beta, \pi, \gamma_2), \Omega(\beta, \Sigma)\bigr),\ \pi\in\mathbb{R}\Bigr\}, \tag{4.3}
$$

where $\mu_i(\beta, \pi, \gamma_2) = (\beta_0 + \beta_1\pi + x_{2i}'\beta_2 + z_{2i}'\gamma_2\beta_1,\ \pi + z_{2i}'\gamma_2)'$ and $\Omega(\beta, \Sigma) = \Omega$ as in (3.4). This defines the class of general LIV models with given $\beta$, $\gamma_2$ and $\Sigma$. Let us now focus on the mixture distribution obtained from (4.3). It is defined by the parameters $\pi = (\pi_1, \ldots, \pi_m)'$, where $\pi_j\in\mathbb{R}$ and $\pi_i \neq \pi_j$ for $i \neq j$, and $\lambda = (\lambda_1, \ldots, \lambda_m)'$ with $\lambda_j > 0$, $\sum_j\lambda_j = 1$, where $m = 1, 2, \ldots$. According to the mixture distribution, the outcome $\pi_j$ occurs with probability $\lambda_j$.

1 Use the reduced form of (4.1), the relation $\operatorname{var}(y, x) = E[\operatorname{var}(y, x|z)] + \operatorname{var}[E(y, x|z)]$ and $\operatorname{var}(a'X) = a'\operatorname{var}(X)a$.

Let $H_{\beta,\gamma_2,\Sigma,S_i;\pi,\lambda}$ be a mixture from $\mathcal{F}_{\beta,\gamma_2,\Sigma,S_i}$ of order $m$, defined by the parameters $\pi$ and $\lambda$, for $i = 1, \ldots, n$. For the complete dataset we define the class

$$
\mathcal{H}_{\beta,\gamma_2,\Sigma,S} = \Bigl\{H_{\beta,\gamma_2,\Sigma,S;\pi,\lambda}\ \Big|\ H_{\beta,\gamma_2,\Sigma,S;\pi,\lambda} = \bigotimes_{i=1}^n H_{\beta,\gamma_2,\Sigma,S_i;\pi,\lambda},\ \pi_j\in\mathbb{R},\ \pi_i \neq \pi_j\text{ for }i \neq j;\ \lambda_j > 0,\ \sum_{j=1}^m\lambda_j = 1,\text{ for }i, j = 1, \ldots, m,\ m = 1, 2, \ldots\Bigr\},
$$

where $\bigotimes$ denotes the independent product of distributions (the observations $(y_i, x_i)$ are independently, but not identically, distributed). We consider $m = 1$ or $m > 1$ depending on whether an observed instrumental variable is available. Identifiability of the general LIV model is established by proving that the class $\mathcal{G}_S = \bigcup_{\beta,\gamma_2,\Sigma}\mathcal{H}_{\beta,\gamma_2,\Sigma,S}$ is identifiable. We first prove identifiability of $\mathcal{H}_{\beta,\gamma_2,\Sigma,S}$.
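Proof of identifiability of $\mathcal{H}_{\beta,\gamma_2,\Sigma,S}$. The set $\mathcal{H}_{\beta,\gamma_2,\Sigma,S}$ is identifiable if and only if each $H_{\beta,\gamma_2,\Sigma,S;\pi,\lambda}\in\mathcal{H}_{\beta,\gamma_2,\Sigma,S}$ has a unique representation in terms of the mixing proportions $\lambda_j$ and the $\pi_j$'s, and, equivalently, each $H_{\beta,\gamma_2,\Sigma,S_i;\pi,\lambda}$, $i = 1, \ldots, n$, has a unique representation. Hence, we can apply the same reasoning to prove identifiability of $\mathcal{H}_{\beta,\gamma_2,\Sigma,S}$ as we used to prove identifiability of $\mathcal{H}_{\beta,\Sigma}$ in subsection 3.3.1.

More specifically, define $\tilde y_i = y_i - x_{2i}'\beta_2 - z_{2i}'\gamma_2\beta_1$ and $\tilde x_i = x_i - z_{2i}'\gamma_2$, so that in category $j$ the pair $(\tilde y_i, \tilde x_i)$ has mean $(\beta_0 + \beta_1\pi_j,\ \pi_j)'$. Given $\beta$, $\gamma_2$, $\Sigma$, and $S$, the model for $(\tilde y_i, \tilde x_i)$ is similar to (3.1), and, hence, $H_{\beta,\gamma_2,\Sigma,S_i;\pi,\lambda}$ has a unique representation. Therefore, the set $\mathcal{H}_{\beta,\gamma_2,\Sigma,S}$ is identifiable.

Proof of identifiability of $\mathcal{G}_S$. In the following we prove theorem 4.1.

Theorem 4.1. $\mathcal{H}_{\beta,\gamma_2,\Sigma,S}$ is identifiable for all $\beta$, $\gamma_2$, and $\Sigma$ positive semi-definite $\iff$ $\mathcal{G}_S$ is identifiable.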
Proof (⇒) Let $F_S, G_S\in\mathcal{G}_S$ such that $F_S \equiv G_S$, with

$$
F_S = \bigotimes_{i=1}^n F^{S_i}\in\mathcal{H}_{\beta,\gamma_2,\Sigma,S},\qquad G_S = \bigotimes_{i=1}^n G^{S_i}\in\mathcal{H}_{\delta,\zeta_2,\Psi,S},
$$

where $F^{S_i} = \sum_{j=1}^m a_jF^{S_i}_j$ and $G^{S_i} = \sum_{l=1}^k b_lG^{S_i}_l$, $a_1, \ldots, a_m$ and $b_1, \ldots, b_k$ are the mixing proportions, the distributions $F^{S_i}_1, \ldots, F^{S_i}_m\in\mathcal{F}_{\beta,\gamma_2,\Sigma,S_i}$ are all different, and $G^{S_i}_1, \ldots, G^{S_i}_k\in\mathcal{F}_{\delta,\zeta_2,\Psi,S_i}$ are all different, where

$$
\begin{aligned}
F^{S_i}_j &\text{ is the c.d.f. of } N_2\bigl(\mu_i(\beta, \pi_j, \gamma_2),\ \Omega(\beta_1, \Sigma)\bigr),\\
G^{S_i}_l &\text{ is the c.d.f. of } N_2\bigl(\mu_i(\delta, \tau_l, \zeta_2),\ \Omega(\delta_1, \Psi)\bigr).
\end{aligned}
$$

By definition 3.1, identifiability of $\mathcal{G}_S$ follows if it is proven that $F_S$ can be written uniquely in terms of the parameters $m$, $\beta$, $\gamma_2$, $\Sigma$, $\pi_j$, and $a_j$, $j = 1, \ldots, m$ (modulo permutation). $F_S \equiv G_S$ implies that $F^{S_i} = G^{S_i}$ for $i = 1, \ldots, n$, since $F_S \equiv G_S$ if and only if the marginals for each $i = 1, \ldots, n$ are identical. Both $\mathcal{H}_{\beta,\gamma_2,\Sigma,S}$ and $\mathcal{H}_{\delta,\zeta_2,\Psi,S}$ constitute identified sets (by assumption), and, hence, $\sum_{j=1}^m a_jF^{S_i}_j$ and $\sum_{l=1}^k b_lG^{S_i}_l$ both have unique representations in terms of $\pi$ and $\tau$, the mixing proportions $a$ and $b$, and the numbers of components $m$ and $k$. So, given $\beta$, $\gamma_2$, $\Sigma$ ($\delta$, $\zeta_2$, $\Psi$) and $S_i$, there are no two sets of parameters $\pi$, $a$, and $m$ ($\tau$, $b$, and $k$) that lead to the same distribution function $F^{S_i}$ ($G^{S_i}$), which is just a mixture of bivariate normal distributions. Using this and identifiability of bivariate normal mixtures (e.g. appendix 3A), $F_S \equiv G_S$ implies that $m = k$, $a_j = b_j$ and $F^{S_i}_j = G^{S_i}_j$, modulo permutation. Subsequently, $F^{S_i}_j = G^{S_i}_j$ implies that $\mu_i(\beta, \pi_j, \gamma_2) = \mu_i(\delta, \tau_j, \zeta_2)$ and $\Omega(\beta_1, \Sigma) = \Omega(\delta_1, \Psi)$. Combining for all $i = 1, \ldots, n$, and writing $\gamma_2 = (\gamma_{21}', \gamma_{22}')'$ and $\zeta_2 = (\zeta_{21}', \zeta_{22}')'$, we have
$$
\begin{aligned}
(\beta_0 + \beta_1\pi_j)\iota_n + [X_2\ X_3]\begin{pmatrix}\beta_2 + \gamma_{21}\beta_1\\ \beta_1\gamma_{22}\end{pmatrix} &= (\delta_0 + \delta_1\tau_j)\iota_n + [X_2\ X_3]\begin{pmatrix}\delta_2 + \zeta_{21}\delta_1\\ \delta_1\zeta_{22}\end{pmatrix} &&\text{(4.4)}\\
\pi_j\iota_n + Z_2\gamma_2 &= \tau_j\iota_n + Z_2\zeta_2 &&\text{(4.5)}\\
\beta_1^2\sigma_\nu^2 + \sigma_\varepsilon^2 + 2\beta_1\sigma_{\varepsilon\nu} &= \delta_1^2\psi_\nu^2 + \psi_\varepsilon^2 + 2\delta_1\psi_{\varepsilon\nu} &&\text{(4.6)}\\
\beta_1\sigma_\nu^2 + \sigma_{\varepsilon\nu} &= \delta_1\psi_\nu^2 + \psi_{\varepsilon\nu} &&\text{(4.7)}\\
\sigma_\nu^2 &= \psi_\nu^2 &&\text{(4.8)}
\end{aligned}
$$

for $j = 1, \ldots, m$. We need to prove that $\beta_s = \delta_s$, $s = 0, 1, 2$, $\gamma_2 = \zeta_2$, $\sigma_\varepsilon^2 = \psi_\varepsilon^2$, $\sigma_{\varepsilon\nu} = \psi_{\varepsilon\nu}$, and $\sigma_\nu^2 = \psi_\nu^2$.
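From (4.5) we have that $(\pi_j - \tau_j)\iota_n = Z_2(\zeta_2 - \gamma_2)$, for $j = 1, \ldots, m$. Since the right-hand side does not depend on $j$, $\pi_j - \tau_j = c$ for some scalar $c$ and all $j$, so that $c\,\iota_n = Z_2(\zeta_2 - \gamma_2)$, or

$$
\begin{bmatrix}\iota_n & Z_2\end{bmatrix}\begin{pmatrix}c\\ \gamma_2 - \zeta_2\end{pmatrix} = 0,
$$

where $Z_2 = [X_2\ X_3]$. If $[\iota_n\ Z_2]$ has full column rank, then $c = 0$, and it follows that $\pi_j = \tau_j$ and $\zeta_2 = \gamma_2$. (This is why $z_{2i}$ cannot contain a constant term: a constant column in $Z_2$ would duplicate $\iota_n$.)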
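The full column rank condition on $[\iota_n\ Z_2]$ can be illustrated with a small pure-Python sketch: with hypothetical data for one column of $X_2$ and one of $X_3$, the matrix $[\iota_n\ Z_2]$ has full column rank, but adding a constant column to $Z_2$ makes it rank deficient. The helper `column_rank` (Gaussian elimination) and the data are our own assumptions.

```python
def column_rank(rows, tol=1e-10):
    """Rank of a matrix given as a list of rows, via Gaussian elimination."""
    a = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]), default=None)
        if pivot is None or abs(a[pivot][col]) < tol:
            continue                      # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(n_rows):
            if r != rank:
                factor = a[r][col] / a[rank][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank


# Hypothetical exogenous data: one column in X2 and one in X3, n = 5.
x2 = [0.3, 1.1, -0.4, 2.0, 0.7]
x3 = [1.5, -0.2, 0.9, 0.4, -1.1]

good = [[1.0, a, b] for a, b in zip(x2, x3)]        # [iota_n  Z2]
bad = [[1.0, 1.0, a, b] for a, b in zip(x2, x3)]    # Z2 also contains a constant
print(column_rank(good), len(good[0]))   # full column rank
print(column_rank(bad), len(bad[0]))     # rank deficient
```

In the rank-deficient case $c$ and the coefficient on the constant column can trade off against each other, so the category means $\pi_j$ would only be identified up to a common shift.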
Using this result with (4.4), we obtain
(β0+ β1π j )ιn + [X2 X3]
(β2+ γ21β1β1γ22
)=
(δ0+ δ1π j )ιn + [X2 X3]
(δ2+ γ21δ1δ1γ22
)
⇐⇒[ιn X2 X3
](δ0− β0)+ π j (δ1− β1)
δ2− β2+ (δ1− β1)γ21(δ1− β1)γ22
= 0.
Again, if [ιn Z2] has full column rank, then
![Page 98: University of Groningen Latent instrumental variables ... · My family and friends have always been very supportive, which is essential to me, and for which I cannot thank them enough](https://reader036.vdocuments.us/reader036/viewer/2022071016/5fcf99aee4ce126723624333/html5/thumbnails/98.jpg)
84 Chapter 4 LIV implementation issues
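$$(\delta_0 - \beta_0) + \pi_j(\delta_1 - \beta_1) = 0 \tag{4.9}$$

$$(\delta_2 - \beta_2) + (\delta_1 - \beta_1)\gamma_{21} = 0 \tag{4.10}$$

$$(\delta_1 - \beta_1)\gamma_{22} = 0, \tag{4.11}$$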
for $j = 1, \ldots, m$. By definition all $\pi_j$'s are different. If $m > 1$, then (4.9) yields $\beta_1 = \delta_1$ and $\beta_0 = \delta_0$. Subsequently, from (4.10) it follows that $\delta_2 = \beta_2$, and (4.11) is satisfied regardless of the value of $\gamma_{22}$. If $m = 1$, it can be seen that $\gamma_{22} \neq 0$ is needed to establish identifiability (this situation is identical to classical IV, and it means that instruments $x_{3i}$ that explain part of the variance in $x_i$ need to be available).
Since $\beta_1 = \delta_1$, equality of the variances and covariances follows from (4.6)-(4.8): (4.8) gives $\sigma_\nu^2 = \psi_\nu^2$, then (4.7) gives $\sigma_{\varepsilon\nu} = \psi_{\varepsilon\nu}$, and finally (4.6) gives $\sigma_\varepsilon^2 = \psi_\varepsilon^2$. We conclude that any $F^S \in G^S$ has a unique representation, and hence $G^S$ constitutes an identified set. The reverse direction of the proof ($\Leftarrow$) follows immediately (i.e. a subset of an identified set must be identified as well).
From theorem 4.1 it can be concluded that all parameters of the general LIV model given in (4.1) are identifiable, assuming that the errors have a bivariate normal distribution. The following remarks are in order:
1. When $x_{2i} \subset z_{2i}$ and $\gamma_{22} \neq 0$ (i.e. there is a valid set of instruments $x_{3i}$ available), $m$ may be equal to 1. In this case, the LIV estimate is identical to the classical LIML estimate. When $x_{2i} = z_{2i}$ (i.e. there is no valid set of instruments available), $m > 1$ is required for identifiability.
2. The model parameters are also identified when the elements of $\gamma_2$ corresponding to $x_2$ are zero. This implies that the regressors $x_2$ are independent of the endogenous regressor $x$, i.e. there is no multicollinearity between $x$ and $x_2$. In practice this is unlikely to be the case, and to avoid biased estimates caused by imposing false restrictions, it is not advisable to restrict $\gamma_2$ (partly) to zero (see also Wooldridge (2002), p. 91).
3. From the above proof it follows that $[\iota_n\;X_2\;X_3]$ should have full column rank. This implies that $n$ has to be larger than $m$.
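The rank condition in remark 3 can be checked numerically before estimation. The sketch below is not part of the original text; the helper name `has_full_column_rank` is hypothetical, and NumPy's `matrix_rank` uses an SVD-based tolerance, so nearly collinear designs may still be reported as full rank.

```python
import numpy as np


def has_full_column_rank(*blocks, tol=None):
    """Return True if [iota_n, X2, X3, ...] has full column rank.

    Each block is an (n,) or (n, k) array of regressors/instruments; a
    column of ones (iota_n) is prepended automatically.
    """
    n = blocks[0].shape[0]
    design = np.column_stack([np.ones(n), *blocks])
    return np.linalg.matrix_rank(design, tol=tol) == design.shape[1]
```

For instance, an $X_3$ column that is constant (collinear with $\iota_n$) or a rescaled copy of a column of $X_2$ makes the check fail, and the argument based on (4.5) no longer pins down $\gamma_2$.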
In appendix 4A we discuss how the first- and second-order derivatives presented in subsection 3.3.2 can be extended to the more general model. The general LIV model is estimated by maximizing the log-likelihood obtained from (4.1). Next we argue that this model can be used to investigate the validity of available observed instrumental variables.
4.3 Investigating observed instrumental variables
In chapter 2 we discussed classical instrumental variables estimation and the problems associated with using potentially weak and endogenous instruments. In a classical framework, the assumptions that the instruments are strong and exogenous cannot be tested straightforwardly. For instance, when an instrument is weak it explains none (or at most a small amount) of the variance of the endogenous regressor. In this case, classical asymptotic theory gives poor approximations to the finite-sample distribution, and standard test procedures are not applicable (see also chapter 2). If an observed instrument is not truly exogenous, the IV estimator (2SLS or LIML) is biased (see e.g. Bound, Jaeger, and Baker, 1995). In this case, the instrument has a direct effect on the dependent variable, in addition to the usual indirect effect through the endogenous regressor. Here we propose a simple approach based on the Wald test principle to investigate whether weak instruments are present (subsection 4.3.1). Furthermore, we propose a test for the presence of endogenous instruments (subsection 4.3.2).
4.3.1 Testing for weak instruments
In section 4.2 we proved that the general LIV model is identifiable whenever $m > 1$, and hence also when the observed instrumental variables have no effect on the endogenous regressor (i.e. the observed instrumental variables are weak). A Wald test can be used to test whether $\gamma_{22} = 0$. In general, the Wald test statistic for the validity of $r$ linear restrictions, i.e. $H_0: R\theta = q$, is given by

$$W = (R\hat\theta - q)'\,[R\,\widehat{\mathrm{var}}(\hat\theta)\,R']^{-1}\,(R\hat\theta - q),$$

and has
approximately a $\chi^2$-distribution with $r$ degrees of freedom under $H_0$ (Greene, 2000). In our case, $\theta$ contains the elements of $\gamma_{22}$, $R = I_{l_2 - l_1}$, $q = 0$, and $\mathrm{var}(\hat\theta)$ can be estimated using the methods in subsection 3.3.2.
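As an illustration (not the author's code), the Wald statistic and its asymptotic $\chi^2_r$ p-value can be computed as below. The estimate $\hat\theta$ and its covariance matrix are assumed to come from the fitted LIV model (e.g. the inverse Hessian); the upper-tail probability is computed with a plain power series for the regularized incomplete gamma function so that only NumPy is needed, and in practice a library routine such as SciPy's `chi2.sf` would be the natural choice.

```python
import math

import numpy as np


def chi2_sf(w: float, df: int) -> float:
    """P(chi2_df > w) via the power series of the regularized lower incomplete gamma."""
    if w <= 0.0:
        return 1.0
    a, x = df / 2.0, w / 2.0
    if x > 500.0:
        return 0.0  # tail probability is below double precision for moderate df
    term = total = 1.0 / a
    k = 1
    while term > 1e-16 * total and k < 10_000:
        term *= x / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def wald_test(theta_hat, cov_hat, R, q):
    """W = (R theta - q)' [R V R']^{-1} (R theta - q) and its asymptotic p-value."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    q = np.asarray(q, dtype=float).reshape(-1)
    diff = R @ theta_hat - q
    middle = R @ np.asarray(cov_hat, dtype=float) @ R.T
    W = float(diff @ np.linalg.solve(middle, diff))
    return W, chi2_sf(W, R.shape[0])  # r = number of restrictions = rows of R
```

With a single instrument, $R = I_1$ and $q = 0$; for a hypothetical estimate $\hat\gamma_{22} = 0.12$ with estimated variance $0.0025$, `wald_test([0.12], [[0.0025]], np.eye(1), [0.0])` gives $W = 5.76$.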
The simulation studies presented below show that this test should be used with a conservative significance level $\alpha$ in order to be effective. Furthermore, it is advisable to accompany this test with more traditional measures, such as the $R^2$ and the $F$-statistic of the regression of the endogenous regressor on the set of instrumental variables (see also subsection 4.5.2 and chapter 2).
4.3.2 Testing for endogenous instruments
An endogenous instrument is correlated with the error term of the regression equation, in which case it has a direct effect on the dependent variable. That is, the error term also contains an effect, say $e$, of the observed instruments, in addition to the usual unobserved effects. The 'total' error is in this case given by $u = e(x_3) + \varepsilon$. We propose the following procedure to investigate possible instrument endogeneity.
Instruments that are suspected to be correlated with the error term should be included in the main regression equation. Subsequently, a Wald test can be used to investigate whether or not they have a non-zero effect on the dependent variable. To be more specific, suppose that $\tilde l_2 \le l_2 - l_1$ instrumental variables in (4.1) are possibly endogenous. Including these variables in the main regression equation yields

$$\begin{aligned} y_i &= \beta_0 + \beta_1 x_i + x_{2i}'\beta_2 + \tilde x_{3i}'\beta_3 + \varepsilon_i \\ x_i &= \pi' z_i + z_{2i}'\gamma_2 + \nu_i, \end{aligned} \tag{4.12}$$

where all variables are defined as before, and $\tilde x_{3i}$ is an $\tilde l_2 \times 1$ vector containing the elements of $x_{3i}$ that are possibly endogenous. The null hypothesis of instrument exogeneity, $H_0: \beta_3 = 0$, can be tested using the Wald approach discussed in the previous subsection.
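When $\theta$ is the full parameter vector of (4.12), the restriction $H_0: \beta_3 = 0$ corresponds to a selection matrix $R$ that picks out the $\beta_3$ entries. The sketch below is illustrative only: the parameter names and numbers are hypothetical, and the covariance matrix would in practice come from the fitted LIV model.

```python
import numpy as np


def selection_restrictions(param_names, tested):
    """Build R and q for H0: theta[tested] = 0, with theta ordered as param_names."""
    index = {name: k for k, name in enumerate(param_names)}
    R = np.zeros((len(tested), len(param_names)))
    for row, name in enumerate(tested):
        R[row, index[name]] = 1.0  # one row per restricted parameter
    return R, np.zeros(len(tested))


def wald_statistic(theta_hat, cov_hat, R, q):
    """Quadratic form (R theta - q)' [R V R']^{-1} (R theta - q)."""
    diff = R @ theta_hat - q
    return float(diff @ np.linalg.solve(R @ cov_hat @ R.T, diff))


# Hypothetical estimates for (beta0, beta1, beta2, beta3) and their covariance
names = ["beta0", "beta1", "beta2", "beta3"]
theta_hat = np.array([1.02, 1.98, 0.26, 0.10])
cov_hat = np.diag([0.01, 0.001, 0.002, 0.0016])
R, q = selection_restrictions(names, ["beta3"])
W = wald_statistic(theta_hat, cov_hat, R, q)
```

Here $W = (0.10/0.04)^2 = 6.25$, which exceeds 3.841, the 95% quantile of $\chi^2_1$, so in this made-up example exogeneity of the instrument would be rejected at $\alpha = 0.05$.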
4.4 A simulation study
In this section we present the results of a simulation study to demonstrate that the general LIV model in (4.1) can be used successfully to estimate the regression parameters and the error variance in the presence of an endogenous regressor. Furthermore, we show that the proposed Wald tests have reasonable power to detect invalid instruments.
Data were generated using model (4.12) with $\beta_0 = 1$, $\beta_1 = 2$, and $\beta_2 = 0.25$. Furthermore, we assumed that $x_2$ also had a moderate effect on $x$ by taking $\gamma_{2x_2} = -0.25$. Throughout the simulations, we took a bimodal distribution with two categories for the latent instrument. To investigate the performance of the proposed tests we assumed that one potentially weak and/or endogenous instrument is available. Its effect $\gamma_{2z_2}$ on the endogenous regressor is specified as 0, 0.1, 0.2, 0.3, 0.4, and 0.5, where $\gamma_{2z_2} = 0$ means that the instrument has no effect on $x$, and 0.5 indicates that it has a relatively strong effect. The value of $\sigma_\nu^2$ is chosen such that $\mathrm{var}(x) = 3$ in all cases. The correlation between $x_{3i}$ and the total error term $u_i = \beta_3 x_{3i} + \varepsilon_i$ is controlled via $\beta_3$, and its values are chosen such that the correlation coefficient is equal to 0 (i.e. the instrument is exogenous), 0.05, 0.10, 0.15, and 0.20. The total variance of the error $u_i$ is fixed to 1 by adjusting the value of $\sigma_\varepsilon^2$. The covariance $\sigma_{\varepsilon\nu}$ is adjusted such that the correlation between the endogenous regressor $x_i$ and $u_i$ is equal to 0.5 in all cases. Hence, across the $5 \times 6 = 30$ settings, the bias in OLS will, on average, be the same, and provides a benchmark against which the results can be compared. We took $n = 1000$ and a total of 250 simulated datasets. In the following we first present the OLS, IV, and LIV results for the regression coefficients $\beta_1$ and $\beta_2$. Subsequently, we discuss the power of the proposed Wald tests to investigate instrument validity.
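The design above can be sketched as a data-generating function. Several details are assumptions not stated in the text: $x_2$ and $x_3$ are independent standard normals, the two latent categories are equally likely with hypothetical means $\pi = (0, 2)$, and $(\varepsilon, \nu)$ is bivariate normal and independent of $x_2$, $x_3$ and the category. Under these assumptions $\beta_3 = \rho_{zu}$, and $\sigma_\nu^2$, $\sigma_\varepsilon^2$ and $\sigma_{\varepsilon\nu}$ follow in closed form from the targets $\mathrm{var}(x) = 3$, $\mathrm{var}(u) = 1$ and $\mathrm{corr}(x, u) = 0.5$.

```python
import numpy as np


def simulate_liv_data(n, gamma_z2, rho_zu, *, beta=(1.0, 2.0, 0.25), gamma_x2=-0.25,
                      pi=(0.0, 2.0), var_x=3.0, rho_xu=0.5, seed=0):
    """Draw (y, x, x2, x3, u) from model (4.12) with one observed instrument x3.

    Assumed (not from the text): x2, x3 ~ N(0, 1) independent, two equally likely
    latent categories with means `pi`, and (eps, nu) bivariate normal and
    independent of x2, x3 and the category.
    """
    rng = np.random.default_rng(seed)
    b0, b1, b2 = beta
    x2 = rng.standard_normal(n)
    x3 = rng.standard_normal(n)
    category = rng.integers(0, 2, size=n)            # latent discrete instrument
    pi_i = np.asarray(pi, dtype=float)[category]
    var_pi = (pi[1] - pi[0]) ** 2 / 4.0              # variance of a 50/50 two-point variable
    sigma2_nu = var_x - var_pi - gamma_x2**2 - gamma_z2**2      # gives var(x) = var_x
    beta3 = rho_zu                                   # corr(x3, u) = beta3 as var(x3) = var(u) = 1
    sigma2_eps = 1.0 - beta3**2                      # gives var(u) = 1
    sigma_eps_nu = rho_xu * np.sqrt(var_x) - gamma_z2 * beta3   # gives corr(x, u) = rho_xu
    cov = np.array([[sigma2_eps, sigma_eps_nu], [sigma_eps_nu, sigma2_nu]])
    if sigma2_nu <= 0 or np.linalg.det(cov) <= 0:
        raise ValueError("settings imply an invalid error covariance matrix")
    eps, nu = rng.multivariate_normal(np.zeros(2), cov, size=n).T
    x = pi_i + gamma_x2 * x2 + gamma_z2 * x3 + nu
    u = beta3 * x3 + eps                             # total error; x3 enters y directly if beta3 != 0
    y = b0 + b1 * x + b2 * x2 + u
    return y, x, x2, x3, u
```

Under these assumptions, regressing $y$ on $(1, x, x_2)$ by OLS gives a slope on $x$ whose probability limit exceeds $\beta_1$ by $\mathrm{cov}(x, u)/(\mathrm{var}(x) - \gamma_{2x_2}^2) = 0.5\sqrt{3}/2.9375 \approx 0.295$, the same in every $(\gamma_{2z_2}, \rho_{zu})$ cell.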
4.4.1 Results for the regression parameters
In table 4.1 we present the means and standard deviations of the biases in the estimated values for $\beta_1$ by LIV, OLS, and 2SLS. We estimated the LIV model by including the observed instrumental variable in the equation for $y$ and the equation for $x$. The 2SLS results are obtained using only the observed instrument.

Table 4.1: Means and standard deviations (in parentheses) of the bias in the estimates for $\beta_1$.

| Method | $\rho_{zu}$ | $\gamma_{2z_2} = 0$ | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
|---|---|---|---|---|---|---|---|
| LIV | 0 | -0.003 (0.033) | -0.003 (0.034) | -0.002 (0.033) | 0.000 (0.032) | -0.001 (0.029) | 0.002 (0.027) |
| LIV | 0.05 | -0.003 (0.031) | 0.003 (0.033) | 0.000 (0.034) | 0.000 (0.034) | -0.001 (0.030) | 0.000 (0.031) |
| LIV | 0.1 | -0.001 (0.033) | 0.000 (0.035) | 0.002 (0.030) | -0.004 (0.030) | 0.002 (0.030) | 0.001 (0.027) |
| LIV | 0.15 | -0.001 (0.030) | -0.004 (0.033) | -0.001 (0.033) | -0.003 (0.029) | 0.001 (0.032) | 0.003 (0.030) |
| LIV | 0.2 | 0.000 (0.031) | 0.001 (0.031) | -0.002 (0.033) | -0.001 (0.030) | -0.001 (0.029) | -0.005 (0.028) |
| OLS | 0 | 0.295 (0.014) | 0.295 (0.015) | 0.294 (0.015) | 0.294 (0.014) | 0.296 (0.014) | 0.294 (0.014) |
| OLS | 0.05 | 0.295 (0.015) | 0.296 (0.015) | 0.296 (0.014) | 0.294 (0.014) | 0.294 (0.015) | 0.295 (0.015) |
| OLS | 0.1 | 0.295 (0.015) | 0.293 (0.016) | 0.295 (0.015) | 0.294 (0.014) | 0.295 (0.014) | 0.294 (0.014) |
| OLS | 0.15 | 0.294 (0.014) | 0.293 (0.015) | 0.293 (0.015) | 0.294 (0.013) | 0.295 (0.015) | 0.296 (0.015) |
| OLS | 0.2 | 0.294 (0.015) | 0.294 (0.015) | 0.295 (0.014) | 0.293 (0.014) | 0.295 (0.015) | 0.294 (0.015) |
| IV | 0 | 0.242 (0.833) | -0.007 (0.455) | -0.027 (0.182) | -0.011 (0.121) | -0.003 (0.082) | -0.004 (0.070) |
| IV | 0.05 | 0.058 (2.444) | 0.498 (0.479) | 0.253 (0.162) | 0.158 (0.097) | 0.131 (0.076) | 0.100 (0.060) |
| IV | 0.1 | -0.216 (6.241) | 0.966 (0.610) | 0.520 (0.152) | 0.337 (0.099) | 0.245 (0.066) | 0.199 (0.053) |
| IV | 0.15 | -0.800 (8.279) | 1.474 (0.907) | 0.814 (0.237) | 0.503 (0.105) | 0.374 (0.070) | 0.305 (0.056) |
| IV | 0.2 | 1.821 (10.794) | 1.848 (1.616) | 1.058 (0.294) | 0.687 (0.130) | 0.500 (0.080) | 0.398 (0.058) |
It can be seen that, in all cases, the average bias of the LIV estimate for $\beta_1$ is approximately zero, whereas the bias in OLS is approximately 0.29 (a 15% upward bias). Hence, the general LIV model gives approximately unbiased results in the presence of regressor-error dependencies, regardless of the quality (and the availability) of the observed instrument. The LIV results are slightly more efficient when the observed instrument is stronger (i.e. when $\gamma_{2z_2}$ is larger) and when the observed instrument has a larger direct effect on $y$ (i.e. when $\rho_{zu}$ is larger). We note that the LIV results for $\beta_1$ will be biased if the observed instrument $x_3$ is wrongfully omitted from the main equation.
From the 2SLS results it can be seen that when the instrument is exogenous ($\rho_{zu} = 0$) and not too weak (e.g. $\gamma_{2z_2} > 0.2$), the 2SLS estimate for $\beta_1$ is approximately unbiased, and its standard deviation is smaller when the instrument used is stronger. In all other cases, the 2SLS method yields biased results, and the bias is larger when the correlation between $z$ and $u$ is higher and when the instrument used is weaker². As can be seen, in many cases the results for 2SLS are more biased than OLS, an observation that was also made by Bound, Jaeger, and Baker (1995).
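The 2SLS pattern can be reproduced qualitatively with a short simulation sketch. Everything about the data-generating process below is an assumption made for illustration (independent standard normal $x_2$ and $x_3$, no latent category, hypothetical error covariances), not the design behind table 4.1. Under these assumptions the just-identified 2SLS estimator of $\beta_1$ converges to $\beta_1 + \rho_{zu}/\gamma_{2z_2}$, so even a slightly endogenous instrument produces a large bias when it is weak.

```python
import numpy as np


def tsls(y, X, Z):
    """Two-stage least squares: regress X on Z, then y on the fitted values of X."""
    coef_first, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ coef_first
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


def iv_bias_demo(gamma_z2, rho_zu, n=400_000, seed=1):
    """Bias of the 2SLS estimate of beta1 = 2 in a hypothetical design."""
    rng = np.random.default_rng(seed)
    x2 = rng.standard_normal(n)
    x3 = rng.standard_normal(n)
    # (eps, nu) correlated, so x is endogenous even when rho_zu = 0
    cov = np.array([[1.0 - rho_zu**2, 0.5], [0.5, 1.5]])
    eps, nu = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    x = -0.25 * x2 + gamma_z2 * x3 + nu
    u = rho_zu * x3 + eps            # var(u) = 1 and corr(x3, u) = rho_zu
    y = 1.0 + 2.0 * x + 0.25 * x2 + u
    ones = np.ones(n)
    X = np.column_stack([ones, x, x2])
    Z = np.column_stack([ones, x3, x2])  # x3 is the single excluded instrument
    return tsls(y, X, Z)[1] - 2.0
```

For example, `iv_bias_demo(0.5, 0.2)` should be close to $0.2/0.5 = 0.4$ and `iv_bias_demo(0.5, 0.0)` close to zero. As $\gamma_{2z_2}$ approaches zero the ratio explodes; moreover, the just-identified IV estimator is known to have no finite moments, which is consistent with reporting medians for the weakest columns.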
The biases found in the estimates for $\beta_2$ show a similar pattern to the results for $\beta_1$, and these can be found in appendix 4B. The simulation results presented in this subsection illustrate the problems associated with classical IV estimation in the presence of weak or endogenous instruments, and indicate that its results cannot be relied upon without a proper investigation of the validity of the instruments used. In all cases, however, the LIV model gives approximately unbiased results. In the next subsections we present the results of investigating the validity of the available observed instrumental variable using the Wald test approach proposed in section 4.3.

² In fact, the first two columns of the 2SLS results, which correspond to a situation with a weak instrument, report the median bias and the IQR across the simulation results, since the mean and standard deviation in these situations gave extreme results because of the presence of many large outliers.
4.4.2 Results test $H_0$: observed IV is exogenous
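Table 4.2: Power of test for instrument exogeneity ($H_0: \rho_{zu} = 0$).

| $\alpha$ | $\rho_{zu}$ | $\gamma_{2z_2} = 0$ | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
|---|---|---|---|---|---|---|---|
| 0.5 | 0 | 0.44 | 0.49 | 0.48 | 0.52 | 0.50 | 0.56 |
| 0.5 | 0.05 | 0.76 | 0.87 | 0.81 | 0.81 | 0.82 | 0.76 |
| 0.5 | 0.1 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.99 |
| 0.5 | 0.15 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.5 | 0.2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.05 | 0 | 0.05 | 0.04 | 0.03 | 0.06 | 0.06 | 0.07 |
| 0.05 | 0.05 | 0.32 | 0.35 | 0.34 | 0.27 | 0.34 | 0.30 |
| 0.05 | 0.1 | 0.84 | 0.88 | 0.90 | 0.86 | 0.82 | 0.84 |
| 0.05 | 0.15 | 0.99 | 1.00 | 1.00 | 0.99 | 1.00 | 0.99 |
| 0.05 | 0.2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.01 | 0 | 0.02 | 0.01 | 0.01 | 0.02 | 0.02 | 0.01 |
| 0.01 | 0.05 | 0.12 | 0.17 | 0.18 | 0.13 | 0.15 | 0.14 |
| 0.01 | 0.1 | 0.71 | 0.65 | 0.73 | 0.69 | 0.59 | 0.63 |
| 0.01 | 0.15 | 0.98 | 0.99 | 0.99 | 0.97 | 0.98 | 0.96 |
| 0.01 | 0.2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |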
Table 4.2 gives the fractions of rejections of the null hypothesis of instrument exogeneity, as computed by the proposed Wald test³. It can be seen that, under $H_0: \rho_{zu} = 0$, the empirical size of the test is fairly close to the nominal sizes $\alpha = 0.50$, 0.05, and 0.01. For $\alpha = 0.5$ and 0.05 the test rejects slightly less often than the nominal size when the instrument is weak ($\gamma_{2z_2} < 0.3$), and tends to reject slightly too often when the instrument is stronger ($\gamma_{2z_2} \ge 0.3$). If the null hypothesis is not rejected, the LIV model can be re-estimated with the observed instrumental variable excluded from the main equation, but this is not necessary. The probability of detecting an endogenous instrument increases when $\rho_{zu}$ gets larger, and the power to detect $\rho_{zu} > 0$ is not much affected by the strength of the instrument, although it is slightly lower for an instrument that has no effect on $x$, and for the strongest instrument. When the observed instrument in the LIV model is relatively strong compared to the unobserved discrete instrument, the group structure becomes more contaminated. This may have a negative effect on the precision of the estimates and, therefore, may reduce the power of the test. Nevertheless, in all cases the proposed test has reasonable power to detect instrument endogeneity.

³ Based on the Hessian matrix.
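When comparing empirical rejection fractions with nominal sizes, it helps to keep the Monte Carlo error in mind. The rough check below is not from the original text: it uses the normal approximation to the binomial, so that with $R$ replications and true size $\alpha$ the rejection fraction has standard error $\sqrt{\alpha(1-\alpha)/R}$. The approximation is crude for $\alpha = 0.01$ with only 250 replications.

```python
import math


def mc_rejection_band(alpha: float, n_rep: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band for an empirical rejection rate when the true size is alpha."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_rep)  # binomial standard error
    return alpha - z * se, alpha + z * se
```

With 250 replications, for instance, the band for $\alpha = 0.05$ is roughly 0.023 to 0.077 and for $\alpha = 0.5$ roughly 0.44 to 0.56, so deviations of a few percentage points from the nominal size are within simulation noise.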
The bias in the 2SLS estimates that arises from using endogenous instruments illustrates the importance of examining instrument exogeneity. However, the 2SLS model is exactly identified in our case, and it is not possible to test for instrument exogeneity within the classical IV framework (tests of overidentifying restrictions require more instruments than endogenous regressors). As was shown, it is straightforward to use the LIV model for this purpose, which is an important extension of LIV. In the next subsection we present the results of testing for the presence of an instrument that does not explain any of the variance of the endogenous regressor (i.e. the instrument is weak).
4.4.3 Results test $H_0$: observed IV has no effect on $x$
In table 4.3 we present the results of testing⁴ for the presence of an instrument that has no effect on the endogenous regressor $x$. For an exogenous instrument (i.e. $\rho_{zu} = 0$) that has no effect on $x$ (i.e. $\gamma_{2z_2} = 0$), we find that the fractions of rejections of the proposed test are close to the nominal sizes $\alpha = 0.5$, 0.05, and 0.01. When the instrument is endogenous (i.e. $\rho_{zu} > 0$), the test still performs well under $H_0$, but the rejection fractions are sometimes larger and sometimes smaller than the nominal sizes. Furthermore, the test has high power to detect departures from $H_0: \gamma_{2z_2} = 0$, regardless of whether the instrument used is exogenous or not.

⁴ Based on the Hessian matrix.

Table 4.3: Power of test for a zero-effect instrument ($H_0: \gamma_{2z_2} = 0$).

| $\alpha$ | $\rho_{zu}$ | $\gamma_{2z_2} = 0$ | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
|---|---|---|---|---|---|---|---|
| 0.5 | 0 | 0.52 | 0.94 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.5 | 0.05 | 0.52 | 0.92 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.5 | 0.1 | 0.46 | 0.94 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.5 | 0.15 | 0.56 | 0.93 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.5 | 0.2 | 0.52 | 0.94 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.05 | 0 | 0.05 | 0.60 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.05 | 0.05 | 0.08 | 0.53 | 0.99 | 1.00 | 1.00 | 1.00 |
| 0.05 | 0.1 | 0.04 | 0.57 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0.05 | 0.15 | 0.05 | 0.62 | 0.98 | 1.00 | 1.00 | 1.00 |
| 0.05 | 0.2 | 0.07 | 0.59 | 0.99 | 1.00 | 1.00 | 1.00 |
| 0.01 | 0 | 0.01 | 0.35 | 0.97 | 1.00 | 1.00 | 1.00 |
| 0.01 | 0.05 | 0.02 | 0.33 | 0.97 | 1.00 | 1.00 | 1.00 |
| 0.01 | 0.1 | 0.01 | 0.32 | 0.97 | 1.00 | 1.00 | 1.00 |
| 0.01 | 0.15 | 0.00 | 0.38 | 0.96 | 1.00 | 1.00 | 1.00 |
| 0.01 | 0.2 | 0.00 | 0.34 | 0.96 | 1.00 | 1.00 | 1.00 |

The advantage of the LIV approach in examining the weakness of observed instrumental variables is that identifiability of the LIV model does not depend on the strength of the instruments. A test to investigate whether the instruments explain part of the variance of $x$ is not readily available in the classical IV framework, due to lack of identifiability. Instead, instrument weakness is usually examined via the $R^2$ of the first-stage regression of $x$ on the set of instrumental variables. We note, however, that this $R^2$ also reflects the effect of the exogenous regressors $x_2$ on $x$, since $x_2$ is typically included in the set of instruments. Hence, the researcher may conclude that the instruments are strong enough while, in fact, most of the contribution to the $R^2$ comes from the exogenous regressors and not from the instruments.
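The point about the first-stage $R^2$ can be illustrated with a small sketch that contrasts the overall first-stage $R^2$ with the partial $R^2$ and $F$-statistic of the excluded instruments $x_3$ only. These are standard classical diagnostics rather than part of the LIV method, and the function names are hypothetical.

```python
import numpy as np


def r_squared(y, X):
    """Centered R^2 of an OLS regression of y on X (X should contain an intercept)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def first_stage_diagnostics(x, x2, x3):
    """Overall first-stage R^2, and partial R^2 and F for the excluded instruments x3."""
    n = x.shape[0]
    x2 = x2.reshape(n, -1)
    x3 = x3.reshape(n, -1)
    ones = np.ones((n, 1))
    full = np.hstack([ones, x2, x3])        # x on exogenous regressors and instruments
    restricted = np.hstack([ones, x2])      # x on exogenous regressors only
    r2_full = r_squared(x, full)
    r2_restricted = r_squared(x, restricted)
    k_excl = x3.shape[1]
    df_resid = n - full.shape[1]
    partial_r2 = (r2_full - r2_restricted) / (1.0 - r2_restricted)
    f_stat = ((r2_full - r2_restricted) / k_excl) / ((1.0 - r2_full) / df_resid)
    return r2_full, partial_r2, f_stat
```

In a hypothetical design with $x = 1.5x_2 + 0.05x_3 + \text{noise}$ (unit-variance noise and standard normal $x_2$, $x_3$), the population first-stage $R^2$ is about 0.69, while the partial $R^2$ of $x_3$ is only about 0.0025.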
It can be seen from the simulation studies that the proposed test has fairly high power to detect departures from $H_0: \gamma_{2z_2} = 0$. This is, however, not necessarily an advantage in this case, since a rejection of the null hypothesis suggests that the instrument used has a nonzero effect on $x$, and the researcher may be tempted to use it in a classical IV regression. However, table 4.1 shows that the 2SLS results for $\gamma_{2z_2} = 0.1$ and 0.2 still suffer from a possible bias and, importantly, large standard deviations. Although the instrument has a nonzero effect, it is still rather weak, and results from using such an instrument should be interpreted with extreme caution. Hence, in applying this test, we would rather falsely accept $H_0$ than falsely reject it. We therefore recommend using this test only with conservative significance levels (i.e. $\alpha \le 0.01$), in particular for large sample sizes, and in combination with traditional measures of instrument weakness.
4.4.4 Concluding remarks simulation study
The simulation study illustrates the potential usefulness of the LIV model in estimating the regression parameters of a linear model in the presence of an endogenous regressor and several exogenous regressors. Furthermore, we introduced two new tests to examine instrument validity. We presented simulation results for a situation where the OLS estimates for the regression coefficients are biased and IV or LIV estimation is desirable.
We showed that the proposed tests have reasonable power in detecting 'invalid' instruments for a wide variety of settings. However, the results of the test to investigate whether the instrument has a zero effect on the endogenous regressor suggest a more conservative choice of $\alpha$. Nevertheless, the simulation study illustrates that the LIV model can be used successfully to examine the validity of a set of observed instrumental variables. Importantly, the results clearly point out the problems with classical IV estimation when the instrument used is endogenous or weak, and in particular when it is both weak and endogenous. The bias that arises in the 2SLS estimates from using invalid instruments can be larger than the bias in OLS. Hence, in the absence of good-quality instrumental variables, 2SLS may actually do more harm than good, in which case it is better to simply ignore the endogeneity of $x$ and use the biased OLS estimates (see also the remarks of Shugan, 2004). The simulation study presented here, however, illustrates that the LIV model can be used successfully to estimate the regression parameters, because it does not require the availability of observed instruments.
We also examined the performance of the LIV model and the two tests in the absence of regressor-error dependencies, in which case OLS is the best linear unbiased estimator. As in the previous chapter, we found that the LIV results exhibit more variability across the simulated datasets than in a situation where $x$ is endogenous. The performance of the tests to examine instrument validity does not differ substantially from the results reported in tables 4.2 and 4.3, but the power to detect a non-zero effect of the instrument on $x$ is slightly lower when $\gamma_{2z_2} = 0.1$ than reported in table 4.3. Importantly, we find that the 2SLS estimates for the regression parameters are biased and have large standard deviations when invalid instruments are used, even though no regressor-error dependencies are present and OLS yields unbiased results. Obviously, it is important to test for regressor-error dependencies to decide whether OLS can be used, but these results make clear that a test based on classical IV estimates with invalid instruments is likely to lead to false conclusions. In section 4.6 we discuss testing for regressor-error dependencies using the model in (4.1) without relying on observed instruments.
In the following we present several diagnostic tools that can be used to complete a LIV analysis of empirical data.
4.5 LIV model diagnostics
4.5.1 Selection of the number of categories of the discrete instrument
In empirical studies, it has to be decided how many categories of the discrete instrument are needed, i.e. how large $m$ should be. From classical IV estimation it is known that the number of instruments should not be too large, since the finite-sample bias is a function of the number of instruments (see also chapter 2). Besides, a large set of instruments reduces the degrees of freedom, and the first-stage regression ($X$ on $Z$) can overfit the data (Buse, 1992; Bound, Jaeger and Baker, 1995; Bowden and Turkington, 1984). Standard model selection methods, like AIC, CAIC, or BIC, are often found to overestimate the number of groups. Furthermore, they are ad hoc, and their performance may depend on the usage context (Naik, Shi and Tsai, 2003; Andrews and Currim, 2003; Biernacki, Celeux and Govaert, 2000). Naik, Shi and Tsai (2003) argue that information criteria like AIC are designed for selecting regressors, but not groups. Indirect support is provided by a recent study of Andrews and Currim (2003), who note that AIC3 performs well in finding groups in finite mixture regression models. The integrated classification likelihood (ICL) criterion (Biernacki, Celeux and Govaert, 2000) has also been shown to be suitable for selecting the number of components in mixture models. Since our aim is to select the number of categories for the discrete instrument, i.e. the number of groups that best represents the endogenous regressor, and given the importance of not overestimating the number of components, we prefer the ICL criterion, which is more conservative than the other statistics.
The ICL criterion is a modification of BIC. Biernacki, Celeux and Govaert (2000) find that BIC often overestimates the true number of clusters in mixture models. They suggest choosing the model that maximizes the integrated complete-data likelihood, and show that the resulting ICL criterion is essentially the BIC statistic penalized by the subtraction of the mean entropy $-2\sum_i\sum_j z_{ij}\log p_{ij}$, where $p_{ij}$ are the posterior probabilities that observation $i$ comes from category $j$, and $z_{ij} = 1$ whenever $p_{ij} = \max_j p_{ij}$, and zero otherwise. It follows that if the categories are not well separated, this term has a large value and BIC is penalized more severely. If the groups found by the LIV model are not well separated, the situation resembles one in classical IV where the instruments are weak. Furthermore, overfitting in terms of the number of groups results in using too many instruments, which is undesirable because degrees of freedom are lost, reducing efficiency. Further support for this strategy is provided in chapter 3, where it is shown that the LIV results are relatively insensitive to an under-specification of the true number of instruments.
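Given a fitted model for a candidate $m$, the ICL value can be computed from its log-likelihood, its number of free parameters and its posterior probabilities $p_{ij}$. The sketch below is an assumption-laden illustration, not the author's code: it writes BIC on the "smaller is better" scale $-2\log L + k\log n$, so the entropy-type term $-2\sum_i\sum_j z_{ij}\log p_{ij}$ is added (on the "larger is better" scale used in the text it is subtracted). Fitting the LIV model itself is not shown.

```python
import numpy as np


def icl_bic(loglik, n_params, posteriors):
    """Return (BIC, ICL), both on the 'smaller is better' scale.

    posteriors: (n, m) array of p_ij; z_ij = 1 for the modal category of
    observation i, so sum_j z_ij log p_ij = log max_j p_ij.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    n = posteriors.shape[0]
    bic = -2.0 * loglik + n_params * np.log(n)
    entropy_term = -2.0 * np.sum(np.log(posteriors.max(axis=1)))
    return bic, bic + entropy_term
```

With perfectly separated categories (all modal posteriors equal to 1) ICL coincides with BIC; with completely fuzzy posteriors of 0.5 for $m = 2$, the penalty is $2n\log 2$. The candidate $m$ with the smallest ICL would then be preferred.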
Although the ICL criterion can be used in determining the number of instruments, we emphasize that several choices of $m$ should be examined. When the estimated regression coefficients are substantially different for these choices of $m$, or when the diagnostic measures point towards solutions that lead to substantially different conclusions, the final results have to be interpreted with caution. In this case, it may prove useful to consider different model specifications by changing the set of exogenous regressors. In general, however, we advocate guarding against over-fitting by using the extra penalization term proposed by Biernacki, Celeux and Govaert (2000), which penalizes the BIC statistic more severely when the instruments yield posterior groupings that are fuzzy.
4.5.2 Residuals, outliers, and influential observations
In this subsection we extend several diagnostics, originally proposed for the classical regression model (Fox, 1991; Belsley, Kuh and Welsch, 1980; Cook and Weisberg, 1982), to the LIV case. Outliers and influential observations can be problematic because they may influence estimation results, and their presence in large numbers may indicate that the model failed to capture important aspects of the data. Analyzing residuals can reveal important information for assessing model assumptions. Although maximum likelihood is still approximately valid in all but small samples, highly non-normal residuals, in particular skewed and bimodal ones, are to be distrusted.
Strength of the instruments. In applying classical IV estimation, it is recommended to report the $R^2$ or $F$-statistic from the regression of the endogenous regressors on the instrumental variables (Bound, Jaeger and Baker, 1995). In section 4.4 we saw that when the instruments explain only a small part of the variation of the endogenous regressors, the instruments are weak, and using the IV results is not advisable. Instruments can be computed as a byproduct of the LIV results by computing a posteriori category membership using Bayes' theorem. In addition to the tests proposed in section 4.3, these estimates can be 'treated' as observed instruments and used to compute the $R^2$ as a diagnostic. If observed instruments are available, this $R^2$ can be compared to the $R^2$ from classical IV.
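A minimal sketch of this diagnostic for the basic model without additional regressors ($y = \beta_0 + \beta_1 x + \varepsilon$, $x = \pi_j + \nu$) is given below. It assumes the LIV estimates are already available; the function names, and the choice to regress $x$ on an intercept and the posterior probabilities of categories $2, \ldots, m$, are illustrative assumptions. Within category $j$, $(y, x)$ is bivariate normal with mean $(\beta_0 + \beta_1\pi_j, \pi_j)$ and with the variance and covariance terms that appear on the left-hand sides of (4.6)-(4.8).

```python
import numpy as np


def posterior_membership(y, x, beta0, beta1, pi, probs, s2_eps, s_eps_nu, s2_nu):
    """p_ij = a_j f_j(y_i, x_i) / sum_k a_k f_k(y_i, x_i) via Bayes' theorem."""
    pi = np.asarray(pi, dtype=float)
    cov = np.array([
        [beta1**2 * s2_nu + s2_eps + 2 * beta1 * s_eps_nu, beta1 * s2_nu + s_eps_nu],
        [beta1 * s2_nu + s_eps_nu, s2_nu],
    ])
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    # deviations from each category mean, shape (n, m, 2)
    d = np.stack([y[:, None] - (beta0 + beta1 * pi), x[:, None] - pi], axis=-1)
    quad = np.einsum("nmi,ij,nmj->nm", d, inv, d)
    log_f = -np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad   # bivariate normal log-density
    log_w = np.log(np.asarray(probs, dtype=float)) + log_f
    log_w -= log_w.max(axis=1, keepdims=True)                 # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)


def instrument_r2(x, post):
    """R^2 of x regressed on an intercept and the posteriors of categories 2..m."""
    Z = np.column_stack([np.ones(len(x)), post[:, 1:]])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    return 1.0 - resid @ resid / np.sum((x - x.mean()) ** 2)
```

Dropping the first posterior column avoids perfect collinearity with the intercept, since the posteriors of each observation sum to one.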
Analyzing residuals. In most empirical work it is reasonable to assume bivariate normality of the residuals. If some other distribution generated the data, the MLE based on joint normality might lose its appeal. The model residuals from (4.1), however, can be examined to investigate the normality assumption of the disturbances. Furthermore, they can be used to detect potential outliers and to examine heteroscedasticity. Defining the LIV residuals, however, is not a straightforward matter, since the complete model is a bivariate mixture model (see also Wang et al., 1996). We look at two types of residuals: the conditional residuals and IV-type residuals.
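Conditional residuals. One way to examine residuals is to look at the conditional distribution of $y$ given $x$ in the LIV model (for the sake of notational simplicity we omit other exogenous regressors here). Conditional on category $j$ and assuming normality, we have⁵ $(y \mid x, j) \sim N(\mu_{y|x,j}, \sigma^2_{y|x,j})$, where the conditional mean of $y \mid x, j$ is

$$E(y_i \mid x_i, j) = \Big(\beta_0 - \frac{\sigma_{\varepsilon\nu}}{\sigma_\nu^2}\pi_j\Big) + \Big(\beta_1 + \frac{\sigma_{\varepsilon\nu}}{\sigma_\nu^2}\Big)x_i \tag{4.13}$$

$$E(y_i \mid x_i, j) = \beta_0 + \beta_1 x_i + \frac{\sigma_{\varepsilon\nu}}{\sigma_\nu^2}(x_i - \pi_j), \tag{4.14}$$

with $x_i - \pi_j = \nu_i$, and $\mathrm{var}(y \mid x, j) = \sigma_\varepsilon^2 - \sigma_{\varepsilon\nu}^2/\sigma_\nu^2$.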
This conditional distribution yields two valuable insights. Firstly, the resid-
ual ei = yi − yi , with yi = µyi |xi , j, is equal toei = (β0 − β0) + (β1 −
β1)xi − (σεν/σ 2ν )νi + εi , with νi = xi − xi = (π j − π j )+ νi . Because of the
presence ofνi , normality ofεi cannot be examined via this type of residuals.
Secondly, the above conditional mean has an interesting interpretation. If we consider the OLS model $y = \alpha_0 + \alpha_1 x + \varepsilon$, where $\sigma_{x\varepsilon} \neq 0$, the mean of $x$ is $\mu_x$ and its variance is $\sigma^2_x$, then the probability limit of the OLS estimator of $\alpha_0$ is $\alpha_0 - (\sigma_{x\varepsilon}/\sigma^2_x)\mu_x$ and that of $\alpha_1$ is $\alpha_1 + \sigma_{x\varepsilon}/\sigma^2_x$, which resembles (4.13) in form. This shows that treating $x$ as given and estimating the mean in (4.13) with least squares yields inconsistent results for $\beta_0$ and $\beta_1$, unless $\sigma_{x\varepsilon} = 0$. From the simulation studies we know that the LIV model yields consistent results in both cases, i.e. whether or not $\sigma_{x\varepsilon} = 0$ (similar observations can be made for the mean that is unconditional with respect to $j$, $E(y_i \mid x_i)$). Since it is not known a priori to which category or group individual $i$ belongs, we use the a posteriori group probabilities $p_{ij}$ and compute the residuals as $e_i = y_i - \hat y_i$, where

$$\hat y_i = \sum_{j=1}^{m} p_{ij}\,\hat\mu_{y_i|x_i,j}.$$

⁵ Use the result: if $(v, w) \sim N_2(\mu_v, \mu_w, \sigma^2_v, \sigma^2_w, \rho)$, with $\sigma_{vw} = \rho\sigma_v\sigma_w$, then $f_v(v) = N(\mu_v, \sigma^2_v)$, $f_w(w) = N(\mu_w, \sigma^2_w)$, and $f(v \mid w) = N(\alpha_0 + \alpha_1 w, \sigma^2_v(1 - \rho^2))$, with $\alpha_0 = \mu_v - \alpha_1\mu_w$ and $\alpha_1 = \sigma_{vw}/\sigma^2_w$.
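The posterior-weighted conditional residual can be sketched as follows. This is a minimal sketch assuming the simple LIV model without further regressors: the joint density of $(y_i, x_i)$ given category $j$ is bivariate normal with mean $(\beta_0 + \beta_1\pi_j, \pi_j)$, and the posterior probability $p_{ij}$ is proportional to the class probability times that density. The function names and the class probabilities `lam` are our own; in practice the arguments would be the LIV maximum likelihood estimates, whereas the usage below plugs in the true simulation values as stand-ins.

```python
import numpy as np


def bvn_logpdf(dy, dx, vy, vx, cyx):
    """Log-density of a bivariate normal with variances vy, vx and covariance cyx,
    evaluated at deviations (dy, dx) from its mean."""
    det = vy * vx - cyx**2
    quad = (vx * dy**2 - 2.0 * cyx * dy * dx + vy * dx**2) / det
    return -np.log(2.0 * np.pi) - 0.5 * np.log(det) - 0.5 * quad


def conditional_residuals(y, x, beta0, beta1, pi, lam, s2_eps, s2_nu, s_en):
    """e_i = y_i - sum_j p_ij * mu_{y|x,j}, with p_ij the posterior class probabilities."""
    # Moments of (y, x) given category j in the simple LIV model.
    vy = beta1**2 * s2_nu + 2.0 * beta1 * s_en + s2_eps
    vx = s2_nu
    cyx = beta1 * s2_nu + s_en
    m = len(pi)
    logw = np.column_stack([
        np.log(lam[k]) + bvn_logpdf(y - beta0 - beta1 * pi[k], x - pi[k], vy, vx, cyx)
        for k in range(m)
    ])
    logw -= logw.max(axis=1, keepdims=True)    # guard against underflow
    p = np.exp(logw)
    p /= p.sum(axis=1, keepdims=True)          # a posteriori probabilities p_ij
    mu = np.column_stack([
        beta0 + beta1 * x + (s_en / s2_nu) * (x - pi[k]) for k in range(m)
    ])                                         # conditional means (4.14)
    y_hat = (p * mu).sum(axis=1)
    return y - y_hat, p


# Usage on simulated data; the true parameters stand in for ML estimates.
rng = np.random.default_rng(1)
n = 50_000
pi, lam = np.array([-3.0, 3.0]), np.array([0.5, 0.5])
beta0, beta1, s2_eps, s2_nu, s_en = 1.0, 2.0, 1.0, 1.0, 0.5
j = rng.choice(2, size=n, p=lam)
eps, nu = rng.multivariate_normal([0.0, 0.0], [[s2_eps, s_en], [s_en, s2_nu]], size=n).T
x = pi[j] + nu
y = beta0 + beta1 * x + eps
e, p = conditional_residuals(y, x, beta0, beta1, pi, lam, s2_eps, s2_nu, s_en)
print(e.mean(), e.var())   # near 0 and near s2_eps - s_en**2 / s2_nu = 0.75
```

The categories are well separated here, so the posteriors are close to 0 or 1 and the residual variance approaches the within-category value; with overlapping categories the mixing over $j$ adds extra spread.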
IV-type residuals. In classical IV estimation with observed instruments, the residual is estimated as $y - X\hat\beta$ and not as $y - \hat X\hat\beta$; see e.g. Greene (2000) or Pagan (1984). Using the latter residuals results in an incorrect estimate of the standard errors. Applying this result to LIV gives $e_i = y_i - \hat y_i = (\beta_0 - \hat\beta_0) + (\beta_1 - \hat\beta_1)x_i + \varepsilon_i$, where $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ and the estimated values are the maximum likelihood estimates from LIV. Note that there is no 'direct' effect of $\hat\nu_i$. Furthermore, there is no need to use the a posteriori group memberships. Unfortunately, we found this type of residual to be misleading in detecting heteroscedasticity⁶.
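The classical IV point above can be checked numerically. The sketch below uses a hypothetical data-generating process with one observed instrument $z$, $\mathrm{cov}(\nu, \varepsilon) = 0.5$ and $\beta = (1, 2)$; it computes the 2SLS estimate and compares the residual variance based on $X$ with the one based on the first-stage fitted values $\hat X = P_Z X$. Only the former estimates $\sigma^2_\varepsilon = 1$; the latter absorbs $\beta_1\nu$ and here comes out near $\beta_1^2\sigma^2_\nu + 2\beta_1\sigma_{\varepsilon\nu} + \sigma^2_\varepsilon = 7$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
z = rng.standard_normal(n)                                  # observed instrument
eps = rng.standard_normal(n)
nu = 0.5 * eps + np.sqrt(0.75) * rng.standard_normal(n)     # cov(nu, eps) = 0.5
x = z + nu                                                  # endogenous regressor
y = 1.0 + 2.0 * x + eps

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]            # first stage: P_Z X
beta_iv = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)         # 2SLS estimate

s2_iv = np.mean((y - X @ beta_iv) ** 2)         # residual based on the original X
s2_naive = np.mean((y - X_hat @ beta_iv) ** 2)  # residual based on fitted X_hat
print(beta_iv, s2_iv, s2_naive)
```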
In table 4.4 we examine the robustness of LIV against misspecification of the error distribution (see also Honore and Hu, 2004). Furthermore, we investigate whether the residuals proposed above can effectively be used to detect departures from normality. We used three different specifications for the distribution of $\varepsilon_i$: (1) a normal distribution, (2) a $\chi^2_1$ distribution, and (3) a $t_3$ distribution⁷, all normalized to have mean 0 and variance 1. The error term for the endogenous regressor $x$ was computed as $\nu_i = a\varepsilon_i + bu_i$, where $u_i$ is drawn from a normal distribution with mean 0 and variance 1, and $a$ and $b$ are chosen⁸ such that the variance of $\nu_i$ is 1 and the correlation between $x$ and $\varepsilon_i$ is 0, 0.1, 0.2, 0.3, 0.4, and 0.5. We assumed the presence of two other exogenous regressors. The error term $\varepsilon_i$ accounts for approximately 25% of the variance in $y$. We take a moderately sized sample ($n = 1000$) and use 250 Monte Carlo simulations. The results in table 4.4 are the mean biases (i.e. true value minus estimated value) and the standard deviations of the biases across the simulations for $\beta_1$, $\sigma^2_\varepsilon$, and $\sigma_{\varepsilon\nu}$. The LIV results for the regression parameter are fairly insensitive⁹ to misspecification of the error distribution for $\varepsilon_i$. The variance components $\sigma^2_\varepsilon$ and $\sigma_{\varepsilon\nu}$ are also estimated without noticeable bias, but at the cost of lower efficiency. We present the results for the OLS estimator for comparison; in particular for high degrees of regressor-error dependence, the standard deviations for the OLS estimator are much larger than in the normal case. In table 4.5 we illustrate the effect of misspecifying the error distribution on the Hausman-LIV test at the 5% level. Under $H_0$ the test rejects too often for the fat-tailed $t$-distribution and the skewed $\chi^2_1$ distribution (an empirical size of 0.09 instead of the nominal 0.05), but its power to detect a nonzero $\rho_{x,\varepsilon}$ is not much lower than for the normal distribution. In all cases we investigated the skewness, the kurtosis, and QQ plots of the 'IV-type' residuals, which clearly indicated that the disturbances were fat-tailed or skewed. This investigation can be accompanied by the test suggested in Greene (2000).

Table 4.4: Effect of misspecifying the regressor-error distribution on biases and standard errors. Entries are mean biases with standard deviations in parentheses.

| Distribution | Estimator | $\beta_1$ ($\rho_{x\varepsilon}=0$) | $\sigma^2_\varepsilon$ ($\rho_{x\varepsilon}=0$) | $\sigma_{\varepsilon\nu}$ ($\rho_{x\varepsilon}=0$) | $\beta_1$ ($\rho_{x\varepsilon}=0.5$) | $\sigma^2_\varepsilon$ ($\rho_{x\varepsilon}=0.5$) | $\sigma_{\varepsilon\nu}$ ($\rho_{x\varepsilon}=0.5$) |
|---|---|---|---|---|---|---|---|
| Normal | OLS | 0.001 (0.020) | -0.005 (0.042) | | 0.345 (0.013) | 0.299 (0.030) | |
| Normal | LIV | -0.003 (0.044) | -0.004 (0.043) | 0.008 (0.092) | 0.002 (0.025) | 0.001 (0.064) | 0.003 (0.011) |
| $\chi^2_1$ | OLS | 0.000 (0.020) | 0.007 (0.122) | | 0.346 (0.030) | 0.304 (0.059) | |
| $\chi^2_1$ | LIV | 0.000 (0.041) | 0.007 (0.122) | 0.000 (0.095) | -0.004 (0.027) | 0.010 (0.123) | 0.006 (0.017) |
| Student $t_3$ | OLS | -0.002 (0.018) | 0.033 (0.132) | | 0.342 (0.039) | 0.306 (0.083) | |
| Student $t_3$ | LIV | 0.001 (0.046) | 0.032 (0.132) | -0.002 (0.106) | 0.001 (0.025) | 0.008 (0.177) | 0.006 (0.020) |

⁶ The problem in using this residual can be described as follows. If, for instance, $x$ is positively correlated with $\varepsilon$, the OLS line will be biased upward (too steep). Since LIV estimates the correct value on average, the estimated LIV line will be flatter. Consequently, in this case the difference $y - \hat y$ will be more positive for the larger values of $x$ and more negative for the smaller values, i.e. a positive correlation is observed between $x$ and $e$.

⁷ For the $t_3$ distribution, approximately 5% of the generated datasets contained a very extreme observation that caused numerical under- or overflows. These observations were discarded.

⁸ The Cholesky decomposition can be used for that purpose.

⁹ We did not consider here the potential multimodality of the log-likelihood function due to misspecification of the error distribution, nor the effect of different starting values for the numerical optimization routine, but note that this may be an issue in the case of heavily skewed, fat-tailed, or bimodal errors.
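Table 4.5: Effect of misspecifying the error distribution on the power of the Hausman-LIV test at $\alpha = 0.05$.

| $\rho_{x,\varepsilon}$ | Normal | $\chi^2_1$ | $t_3$ |
|---|---|---|---|
| 0 | 0.05 | 0.09 | 0.09 |
| 0.1 | 0.46 | 0.51 | 0.43 |
| 0.2 | 0.99 | 0.96 | 0.92 |
| 0.3 | 1.00 | 1.00 | 1.00 |
| 0.4 | 1.00 | 1.00 | 1.00 |
| 0.5 | 1.00 | 1.00 | 1.00 |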
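The error design of this simulation study, together with the moment checks applied to the residuals, can be sketched as follows. The normalizations are standard facts ($\chi^2_1$ has mean 1 and variance 2; $t_3$ has mean 0 and variance 3). The choice of $a$ and $b$ is our reading of footnote 8: $(a, \sqrt{1-a^2})$ is the second row of the Cholesky factor of $\begin{pmatrix}1 & a\\ a & 1\end{pmatrix}$, and to target $\mathrm{corr}(x, \varepsilon) = \rho$ we assume that the systematic part of $x$ (variance `var_sys`) is independent of $\varepsilon$ and $u$, so that $a = \rho\,\mathrm{sd}(x)$. The moment function is applied to the errors themselves, which the IV-type residuals approximate when the LIV estimates are close to the truth; the test from Greene (2000) is not reproduced.

```python
import numpy as np


def draw_errors(dist, n, rng):
    """Errors normalized to mean 0 and variance 1."""
    if dist == "normal":
        return rng.standard_normal(n)
    if dist == "chi2":
        return (rng.chisquare(1, n) - 1.0) / np.sqrt(2.0)   # chi2_1: mean 1, variance 2
    if dist == "t3":
        return rng.standard_t(3, n) / np.sqrt(3.0)          # t_3: mean 0, variance 3
    raise ValueError(f"unknown distribution: {dist}")


def endogenous_error(eps, rho, var_sys, rng):
    """nu = a*eps + b*u with var(nu) = 1 and corr(x, eps) = rho, where
    x = (systematic part with variance var_sys, independent of eps and u) + nu."""
    a = rho * np.sqrt(var_sys + 1.0)        # corr(x, eps) = a / sd(x)
    if not 0.0 <= a < 1.0:
        raise ValueError("rho too large for this var_sys")
    b = np.sqrt(1.0 - a**2)                 # Cholesky: (a, b) has unit length
    return a * eps + b * rng.standard_normal(eps.size)


def skew_kurt(e):
    """Sample skewness and excess kurtosis, e.g. of the IV-type residuals."""
    zs = (e - e.mean()) / e.std()
    return float(np.mean(zs**3)), float(np.mean(zs**4) - 3.0)


rng = np.random.default_rng(3)
n = 200_000
systematic = rng.choice([-1.0, 1.0], size=n)   # e.g. a two-category latent instrument
results = {}
for dist in ("normal", "chi2", "t3"):
    eps = draw_errors(dist, n, rng)
    nu = endogenous_error(eps, 0.5, 1.0, rng)
    x = systematic + nu
    results[dist] = (np.corrcoef(x, eps)[0, 1], nu.var(), *skew_kurt(eps))
    print(dist, np.round(results[dist], 3))
```

For the normal errors the skewness and excess kurtosis should be near 0; for $\chi^2_1$ they are near their population values of about 2.8 and 12, and for $t_3$ the sample kurtosis is very unstable because the fourth moment does not exist.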
In summary, this simulation study shows that the LIV method is fairly robust against misspecification of the error term, at least in large samples. Under such misspecification, the maximum likelihood LIV estimator may no longer be fully efficient, and alternative, more efficient estimators may exist (see e.g. the discussion in Honore and Hu, 2004). One possible caveat of the LIV model in the presence of a severely misspecified error distribution, such as the $\chi^2_1$ distribution, is that the log-likelihood may be multimodal. In such a case, the starting values for numerically optimizing the log-likelihood become important, since the LIV model may wrongly mix on the skewed error distribution instead of on the latent instrument. We do not attempt to solve that problem here and leave it for further research, since lack of normality can be effectively examined by using the IV-type residuals to compute kurtosis and skewness. Furthermore, heteroscedasticity can be detected by using the conditional residuals from (4.13), i.e. by examining scatterplots of the residuals versus the explanatory variables and the predicted values. The presence of heteroscedasticity does not affect the consistency or unbiasedness of the regression parameter estimates, but it leads to a loss in efficiency. To correct the estimated standard errors for heteroscedasticity, White's (1980) method can be used. A more detailed discussion of potential strategies to remedy heteroscedasticity or non-normality can be found in Fox (1991). Finally, outliers can be identified by examining standardized versions of the above residuals, or by using the methods presented below.
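White's (1980) correction can be sketched for OLS as the HC0 estimator $(X'X)^{-1}\left(\sum_i e_i^2 x_i x_i'\right)(X'X)^{-1}$. For a likelihood such as that of LIV, the usual analogue is the "sandwich" $A^{-1}BA^{-1}$, with $A$ the negative Hessian and $B$ the sum of outer products of the per-observation scores; the text does not specify how White's method is applied to LIV, so the generic `sandwich_cov` function below is our assumption. The usage shows that, for a normal linear model with $\sigma^2$ held fixed, the generic sandwich reproduces HC0 exactly; the data are hypothetical, with an error spread that grows with $|x|$.

```python
import numpy as np


def sandwich_cov(neg_hessian, scores):
    """A^{-1} B A^{-1}: A is the negative Hessian of the log-likelihood and
    B the sum of outer products of per-observation scores (rows of `scores`)."""
    A_inv = np.linalg.inv(neg_hessian)
    B = scores.T @ scores
    return A_inv @ B @ A_inv


def white_hc0(X, e):
    """White (1980) heteroscedasticity-consistent covariance of OLS (HC0)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * e[:, None] ** 2).T @ X       # sum_i e_i^2 x_i x_i'
    return XtX_inv @ meat @ XtX_inv


rng = np.random.default_rng(4)
n = 2_000
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + (0.5 + np.abs(x)) * rng.standard_normal(n)   # heteroscedastic errors
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = e @ e / (n - X.shape[1])

V_classical = s2 * np.linalg.inv(X.T @ X)
V_white = white_hc0(X, e)
# The same matrix via the generic sandwich, with normal-likelihood scores
# x_i e_i / s2 and negative Hessian X'X / s2 (sigma^2 held fixed at s2).
V_sandwich = sandwich_cov(X.T @ X / s2, X * e[:, None] / s2)
print(np.sqrt(np.diag(V_classical)), np.sqrt(np.diag(V_white)))
```

Because the error variance rises with $|x|$ here, the heteroscedasticity-consistent standard error of the slope is noticeably larger than the classical one.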
Analyzing influential observations and outliers. Since the standard closed-form expressions of outlier diagnostics available for OLS cannot be generalized to LIV, we propose to approximate the jackknife LIV estimate $\hat\theta_{(i)}$ by a few numerical optimization steps, with the maximum likelihood estimate of the complete sample as starting value (Cook and Weisberg, 1982, Belsley, Kuh and Welsch, 1980, or Fahrmeir and Tutz, 1994). Once these estimates are available, we propose to use the following measures to determine the influence of observation $i$ on:

1. the likelihood, measured by the likelihood distance $\mathrm{LD}(i) = 2[LL(\hat\theta) - LL(\hat\theta_{(i)})]$, where $LL(\theta)$ denotes the value of the log-likelihood for the complete sample at the point $\theta$.
2. the estimated parameters, measured by Cook's distance $\mathrm{CD}(i) = (\hat\theta - \hat\theta_{(i)})' H(\hat\theta)(\hat\theta - \hat\theta_{(i)})$, where $H(\hat\theta)$ is the Hessian evaluated at $\hat\theta$.
3. the estimated covariance matrix, measured by $\mathrm{COVRATIO1}(i) = \det[\hat V(\hat\theta_{(i)})]/\det[\hat V(\hat\theta)]$, where $\hat V(\theta)$ denotes the estimated variance-covariance matrix for $\theta$.
4. the estimated covariance matrix of $(\varepsilon, \nu)$, given by $\mathrm{COVRATIO2}(i) = \det[\hat\Omega_{(i)}]/\det[\hat\Omega]$, where $\Omega$ is given in (3.4). Because $\Omega$ is essential in correcting for endogeneity, this measure may point towards observations having a large effect on the relation between $\varepsilon$ and $\nu$.
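The procedure above can be sketched on a toy model where every step can be checked. We assume, for illustration only, a linear regression with known unit error variance, so that $LL(\theta) = -\frac12\sum_i(y_i - x_i'\theta)^2$ up to a constant; the LIV log-likelihood would replace `loglik`, `grad` and `hess`. We read $H$ in $\mathrm{CD}(i)$ as the negative Hessian (observed information), so that $\mathrm{CD}(i) \ge 0$; that reading is our assumption. COVRATIO2 is omitted because the toy model has no $\Omega$. The helper names and the planted outlier are hypothetical.

```python
import numpy as np


# Toy model (assumption): linear regression with unit error variance.
def loglik(theta, X, y):
    r = y - X @ theta
    return -0.5 * r @ r


def grad(theta, X, y):
    return X.T @ (y - X @ theta)


def hess(theta, X, y):
    return -X.T @ X


def approx_jackknife(theta_hat, X, y, i, steps=2):
    """Approximate theta_(i) by Newton steps on the sample without observation i,
    starting from the full-sample ML estimate theta_hat."""
    keep = np.arange(len(y)) != i
    theta = theta_hat.copy()
    for _ in range(steps):
        theta = theta - np.linalg.solve(hess(theta, X[keep], y[keep]),
                                        grad(theta, X[keep], y[keep]))
    return theta


rng = np.random.default_rng(5)
n = 50
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)
X[0, 1], y[0] = 4.0, 1.0            # planted high-leverage outlier (hypothetical)

theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
info = -hess(theta_hat, X, y)       # observed information (negative Hessian)
V = np.linalg.inv(info)
LD, CD, COVRATIO1 = np.empty(n), np.empty(n), np.empty(n)
for i in range(n):
    th_i = approx_jackknife(theta_hat, X, y, i)
    d = theta_hat - th_i
    keep = np.arange(n) != i
    LD[i] = 2.0 * (loglik(theta_hat, X, y) - loglik(th_i, X, y))   # complete-sample LL
    CD[i] = d @ info @ d
    V_i = np.linalg.inv(-hess(th_i, X[keep], y[keep]))
    COVRATIO1[i] = np.linalg.det(V_i) / np.linalg.det(V)
print(np.argsort(CD)[::-1][:3])     # the planted observation 0 should rank first
```

Because this toy log-likelihood is quadratic, one Newton step already gives the exact leave-one-out estimate and LD(i) equals CD(i); for the LIV likelihood neither holds exactly, which is why a few steps are suggested.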
Our experience based on simulation studies¹⁰ is that the four measures mentioned above, together with an examination of the residuals, can be fruitfully applied to detect outliers and influential observations. We propose examining the ranking of the largest values of LD(i) or CD(i), and of |COVRATIO1(i) − 1| and |COVRATIO2(i) − 1|, where large jumps between subsequent observations indicate potentially influential or outlying observations.
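The ranking rule can be sketched as below: sort a diagnostic in decreasing order and inspect the gaps between consecutive ranks, using $|\mathrm{COVRATIO}(i) - 1|$ for the covariance ratios so that large increases and large decreases rank alike. The function names and the diagnostic values are hypothetical, and what counts as a "large" jump is left to judgment, as in the text.

```python
import numpy as np


def rank_with_gaps(values, top=10):
    """Indices of the `top` largest values, those values in decreasing order,
    and the gaps between consecutive ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)[::-1][:top]
    ranked = values[order]
    gaps = ranked[:-1] - ranked[1:]
    return order, ranked, gaps


def covratio_distance(covratio):
    """|COVRATIO(i) - 1|."""
    return np.abs(np.asarray(covratio, dtype=float) - 1.0)


# Hypothetical diagnostic values for a handful of observations.
cd = [0.10, 5.00, 0.20, 0.15, 0.30]
order, ranked, gaps = rank_with_gaps(cd, top=5)
print(order, gaps)   # observation 1 stands out: a gap of 4.7 to the next value
order_cr, _, _ = rank_with_gaps(covratio_distance([0.98, 1.60, 1.03, 0.70]), top=4)
print(order_cr)
```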
The next section introduces another test for regressor-error correlation that may give more stable results than the earlier proposed Hausman-LIV test.
4.6 The Hausman-LIV test revised
The Hausman-LIV test proposed in the previous chapter was shown to have reasonable power across a wide range of regressor-error correlations and for several distributions of the endogenous regressor. The Hausman-LIV test compares the LIV estimate with the OLS estimate, and this difference is used to construct a test statistic which has a $\chi^2_1$ distribution under the null hypothesis. When the difference is substantial enough to be important, the probability that the test rejects should be large; otherwise the test does not provide much information. This test uses the complete vector of estimated regression coefficients $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2')'$ for OLS and LIV and the corresponding estimated variance-covariance matrices. However, a simpler test that focuses only on the covariance between $\varepsilon_i$ and $\nu_i$ can be constructed. A test of no regressor-error correlation is equivalent to testing $H_0: \sigma_{\varepsilon\nu} = 0$. This hypothesis is a linear restriction on the parameter vector and can be tested using the same Wald test¹¹ as in section 4.3. The potential advantage of this test is that it relies on fewer parameters.
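The Wald test of $H_0: \sigma_{\varepsilon\nu} = 0$ is a special case of the general statistic $W = (R\hat\theta - r)'(R\hat V R')^{-1}(R\hat\theta - r)$, which is $\chi^2_q$ under $H_0$ with $q$ the number of restrictions. The sketch below is a minimal version of that formula; the parameter vector, its ordering $(\beta_0, \beta_1, \sigma^2_\varepsilon, \sigma_{\varepsilon\nu})$ and the covariance matrix are hypothetical stand-ins for LIV output. For one restriction, the $\chi^2_1$ tail probability equals $P(|Z| > \sqrt{w}) = \mathrm{erfc}(\sqrt{w/2})$, so no statistics library is needed.

```python
import math

import numpy as np


def wald_test(theta_hat, cov, R, r):
    """Wald statistic for H0: R theta = r; chi-squared with rows(R) df under H0."""
    R = np.atleast_2d(np.asarray(R, dtype=float))
    diff = R @ theta_hat - np.asarray(r, dtype=float)
    W = float(diff @ np.linalg.solve(R @ cov @ R.T, diff))
    return W, R.shape[0]


def chi2_1_pvalue(w):
    """P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(w / 2.0))


# Hypothetical LIV output: theta = (beta0, beta1, sigma2_eps, sigma_eps_nu).
theta_hat = np.array([0.9, 2.1, 1.0, 0.12])
cov = np.diag([0.01, 0.004, 0.003, 0.05**2])
W, df = wald_test(theta_hat, cov, R=[0, 0, 0, 1], r=[0.0])
print(W, df, chi2_1_pvalue(W))   # W = (0.12 / 0.05)^2 = 5.76, p about 0.016
```

With a single restriction the statistic is simply the squared $t$-ratio of $\hat\sigma_{\varepsilon\nu}$, which makes clear why this test involves fewer estimated quantities than a comparison of the full coefficient vectors.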
¹⁰ Not reported here.

¹¹ In fact, the Hausman test is also based on a Wald statistic; see e.g. Greene (2000).
We found that the Hausman-LIV statistic may take negative values when other exogenous regressors $x_2$ are present, because one or more eigenvalues of the difference matrix $\Sigma_{LIV}$ in (3.26) were negative. In such a case, one would falsely conclude that LIV is 'more efficient' than OLS. We did not find, however, that the estimated standard deviation of $\hat\beta_1$ is smaller (i.e. more efficient) for LIV than for OLS; rather, the estimated covariances between $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ may be larger, and the estimated standard deviations of $\hat\beta_0$ and $\hat\beta_2$ may be smaller using LIV. In these cases, the difference matrix $\Sigma_{LIV}$ is indefinite, and the resulting Hausman-LIV statistic is not necessarily larger than or equal to zero¹². It is not uncommon to find a non-positive definite matrix when applying the Hausman test within a classical IV framework. In fact, unless the set of regressors and the set of instrumental variables have no variables in common, the ordinary inverse of the estimated asymptotic covariance matrix in the Hausman test will not exist. In other situations, this singularity may be a finite-sample problem (cf. Greene, 2000). In the classical IV framework one normally uses $\Sigma^a_{IV} = [(X'P_ZX)^{-1} - (X'X)^{-1}]^{-1}/s^2$ to compute the Hausman statistic, where $s^2$ is the common estimator of $\sigma^2_\varepsilon$, based either on IV or on OLS. Another approach is to subtract the estimated variance-covariance matrices for IV and OLS, i.e. $\Sigma^b_{IV} = [s^2_{IV}(X'P_ZX)^{-1} - s^2_{OLS}(X'X)^{-1}]^{-1}$, which is, according to Dhrymes (2003), a 'naive' application of the Hausman test. However, the Hausman-LIV test uses the latter approach (see section 3.4), which, given Dhrymes' conclusion, may explain part of the problems we observe. Besides, the explicit presence of the exogenous regressors $x_2$ in the equation for $x$ may induce a higher estimated covariance between $\hat\beta_1$ and $\hat\beta_2$. That some of the estimated standard deviations are smaller than the corresponding OLS estimates is because the LIV model includes a model for the endogenous regressor, which may give an efficiency advantage for some of the estimated regression parameters of the exogenous regressors.
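The two constructions of the Hausman statistic discussed above can be contrasted in a short sketch. The function below computes $d'D^{+}d$, with $d$ the difference of the estimates and $D$ the difference of their covariance matrices; a Moore-Penrose generalized inverse is used because $D$ may be singular (our choice, in the spirit of the remark on the non-existence of the ordinary inverse). Case (1) uses hypothetical, separately estimated covariance matrices whose difference is indefinite, which yields a negative statistic. Case (2) uses a common $s^2$ as in $\Sigma^a_{IV}$: since $X'P_ZX \le X'X$ in the matrix sense, $(X'P_ZX)^{-1} - (X'X)^{-1}$ is positive semidefinite, and because the constant appears in both $X$ and $Z$ it is also singular.

```python
import numpy as np


def hausman(b_eff, b_cons, V_eff, V_cons):
    """d' D^+ d with d = b_cons - b_eff and D = V_cons - V_eff.
    b_eff: efficient under H0 (e.g. OLS); b_cons: consistent under H1 (LIV or IV)."""
    d = np.asarray(b_cons, dtype=float) - np.asarray(b_eff, dtype=float)
    D = np.asarray(V_cons, dtype=float) - np.asarray(V_eff, dtype=float)
    eig = np.linalg.eigvalsh(D)                  # D is symmetric
    stat = float(d @ np.linalg.pinv(D) @ d)
    return stat, eig


# (1) Separately estimated covariance matrices (hypothetical): D may be indefinite.
stat, eig = hausman([1.0, 0.5], [1.0, 1.0],
                    V_eff=np.diag([0.01, 0.03]), V_cons=np.diag([0.03, 0.02]))
print(stat, eig)        # D = diag(0.02, -0.01): one negative eigenvalue, stat = -25

# (2) Common s^2, as in Sigma^a_IV: the difference matrix is positive semidefinite.
rng = np.random.default_rng(6)
n = 500
z = rng.standard_normal(n)
x = z + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
s2 = 1.0
D_a = s2 * (np.linalg.inv(X.T @ P_Z @ X) - np.linalg.inv(X.T @ X))
eig_a = np.linalg.eigvalsh(D_a)
print(eig_a)            # one eigenvalue ~0 (constant in X and Z), the other positive
```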
In table 4.6 we compare the performance of the two tests. We use the same skewed, bimodal, and unimodal distributions as in the previous chapter, where the true number of categories of the instrument is eight. We include two exogenous regressors: one has a moderate effect on the endogenous regressor and the other has no effect. The mean of $x$ is zero, the total variance of $x$ is equal to 3, and the regressor-error correlations are taken to be 0, 0.1, 0.2, 0.3, 0.4, and 0.5. The LIV model is estimated with two categories. We used $n = 1000$ and 250 Monte Carlo simulations. Table 4.6 presents the fraction of rejections of the null hypothesis across the simulation runs. Both tests perform about equally well, but the Wald-based test has slightly larger power to detect endogeneity of $x$. Both tests are close to their nominal sizes under the null hypothesis and are less powerful for the unimodal distribution. The same observation was made in the previous chapter and can be explained by the efficiency loss due to misspecifying the true number of categories of the latent instrument and by the less well separated mixture components of the unimodal distribution. The power of both tests is higher than in the results reported in table 3.1 for bim8, skew8, and unim8, because two exogenous regressors now explain part of the variance in $y$. We found that the Hausman-LIV test performs better when the estimated variance-covariance matrix of $\hat\beta$ is based on the Hessian matrix. Furthermore, for lower regressor-error correlations, the difference matrix $\Sigma_{LIV}$ is more often non-positive definite. An indefinite difference matrix does not necessarily yield a negative Hausman-LIV statistic, and the results suggest that the Hausman-LIV test nevertheless has reasonable power and a size close to its nominal value. However, the proposed alternative test, based on Wald's principle, gives results that are at least as good as those of the Hausman-LIV test, without the problem of potentially ambiguous results. These results suggest using the Wald test statistic in empirical applications, but we recommend computing both and comparing their results¹³.

Table 4.6: Power of the Hausman test (H) and Wald test (W) for exogeneity.

| Distribution | $\rho_{x,\varepsilon}$ | H, $\alpha = 0.5$ | W, $\alpha = 0.5$ | H, $\alpha = 0.05$ | W, $\alpha = 0.05$ | H, $\alpha = 0.01$ | W, $\alpha = 0.01$ |
|---|---|---|---|---|---|---|---|
| bim8 | 0 | 0.45 | 0.45 | 0.05 | 0.05 | 0.02 | 0.02 |
| bim8 | 0.1 | 0.91 | 0.92 | 0.51 | 0.54 | 0.31 | 0.32 |
| bim8 | 0.2 | 1.00 | 1.00 | 0.98 | 0.98 | 0.92 | 0.92 |
| bim8 | 0.3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| bim8 | 0.4 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| bim8 | 0.5 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| skew8 | 0 | 0.49 | 0.49 | 0.04 | 0.04 | 0.01 | 0.01 |
| skew8 | 0.1 | 0.98 | 0.98 | 0.68 | 0.69 | 0.47 | 0.51 |
| skew8 | 0.2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| skew8 | 0.3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| skew8 | 0.4 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| skew8 | 0.5 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| unim8 | 0 | 0.55 | 0.55 | 0.06 | 0.08 | 0.03 | 0.04 |
| unim8 | 0.1 | 0.64 | 0.65 | 0.18 | 0.21 | 0.09 | 0.11 |
| unim8 | 0.2 | 0.93 | 0.93 | 0.55 | 0.57 | 0.30 | 0.38 |
| unim8 | 0.3 | 1.00 | 1.00 | 0.90 | 0.92 | 0.78 | 0.82 |
| unim8 | 0.4 | 1.00 | 1.00 | 0.99 | 1.00 | 0.96 | 0.99 |
| unim8 | 0.5 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

¹² E.g. Greene (2000), pp. 46-49.

¹³ In subsection 8.2.2 we elaborate upon another test for endogeneity that does not require observed instruments, the Lagrange multiplier test, which can potentially be computed straightforwardly with standard computing packages because it operates under the restricted model ($\sigma_{\varepsilon\nu} = 0$). This test is asymptotically equivalent to the tests presented here (Greene, 2000).

4.7 Conclusions

The simple LIV model proposed in chapter 3 adequately solves for regressor-error correlations in linear models without requiring observed instrumental variables. This is a great advantage over existing methods that rely on observed instrumental variables, in particular in empirical applications where good instruments are not available. In this chapter we propose several methods to extend the simple LIV model.

Firstly, we include several exogenous regressors and observed instrumental variables in the model. This is an important generalization, since in most empirical applications additional regressors are part of the model. We extend the identifiability proof in subsection 3.3.1 and prove that all model parameters are identified for the more general LIV model. Furthermore, observed instrumental variables can be included in the LIV model, and, importantly, Wald-based test results are easily obtained to investigate their validity. To the best of our knowledge, this is not straightforwardly possible in a classical IV framework because of identifiability problems. The proposed Wald-based tests have reasonable power in detecting weak and/or endogenous instruments. Our results suggest using a conservative significance level for the Wald test of a zero effect of the instruments on the endogenous regressor. Instruments that are found to be 'valid' can be used either on their own, to compute a 2SLS or LIML estimate, or they can be included in the LIV model and used jointly with the latent discrete instrument to obtain an LIV-IV estimate. If the observed instruments are 'invalid', the classical instrumental variables estimates have to be distrusted and the LIV estimates are the most appropriate to use.
In section 4.5 we propose several diagnostics that can be used to complete an LIV analysis. In empirical applications, the number of categories of the unobserved instrument has to be chosen. The AIC3, BIC, or ICL criteria can be used for this purpose. The ICL criterion is equal to BIC with an additional penalty for fuzzy clustering. As such, it tends to be more conservative, which avoids overfitting and adds tractability to our specification. We emphasize that several specifications of $m$ should be examined, and that conclusions that differ substantially across AIC3, BIC, and ICL are to be distrusted. The residuals can be analyzed to investigate the normality assumption of the errors. When the errors have a non-normal distribution, the LIV estimates may not be fully efficient. The results of the simulation study in subsection 4.5.2 illustrate that the LIV model is fairly robust against fat-tailed and/or skewed error distributions, which is a desirable property. However, further research is required to investigate this in more detail, in particular the presence of severely skewed, fat-tailed, or multimodal errors and the possible multimodality of the LIV log-likelihood surface in such cases. Heteroscedasticity and outliers or influential observations can be dealt with in a similar way, and the methods we propose were found to be effective in identifying heteroscedasticity and outlying or influential observations. Finally, in section 4.6 we discuss another test for regressor-error correlation, because the Hausman-LIV test was found to give possibly ambiguous results in the presence of several exogenous regressors. This new test is a Wald test and focuses on the covariance between the error terms. We recommend computing both tests in empirical applications; similar conclusions should inspire confidence in the results.
In the next chapter we apply the LIV method to estimate the return to education in three empirical applications. We show that reasonable estimates, which are to be preferred over standard OLS and classical IV estimates of the return to education, are obtained without using observed instruments. The direction and magnitude of the bias in the OLS estimate across the three applications, as indicated by the IV estimate, depend on the type of instruments used. Our results are more consistent, they are in accordance with the traditional ability bias argument, and they resemble recent results from twin studies more closely. Furthermore, the LIV diagnostics proposed in this chapter indicate that the LIV model assumptions are reasonable.
Appendix 4A 1st and 2nd order derivatives of the log-likelihood of the general LIV model
The 1st and 2nd order derivatives of the simple LIV model presented in subsection 3.3.2 can easily be extended to a more general situation where $l_1$ exogenous regressors $x_{2i}$ and $l_2 - l_1$ observed instrumental variables $x_{3i}$ are available.

Let $z_{2i} = (x_{2i}', x_{3i}')'$. The expected value of $(y_i, x_i)$ in (3B.2) now becomes

$$\mu^y_{i|j} = \beta_0 + \beta_1\pi_j + x_{2i}'\beta_2 + z_{2i}'\gamma_2\beta_1,$$
$$\mu^x_{i|j} = \pi_j + z_{2i}'\gamma_2, \qquad (4A.1)$$

where $\beta_2$ is an $l_1 \times 1$ vector, $\gamma_2$ is an $l_2 \times 1$ vector, $x_{2i}$ is an $l_1 \times 1$ vector, and $z_{2i}$ is an $l_2 \times 1$ vector. The structure of the first- and second-order derivatives does not change with the inclusion of $x_{2i}$ and $x_{3i}$, but (4A.1) should be used to compute the gradient and Hessian instead of (3B.2). In addition, the results in appendix 3B need to be augmented with the derivatives of the log-likelihood with respect to the elements of $\beta_2$ and $\gamma_2$, which both have a structure similar to that of the elements in $\theta_1$. Furthermore, the derivative of $q_{i|j}$ with respect to $\beta_1$ changes due to the presence of the product of $z_{2i}'\gamma_2$ and $\beta_1$ in $\mu^y_{i|j}$. Letting $\theta_1 = (\beta_0, \beta_1, \beta_2', \gamma_2')'$, we obtain

$$\frac{\partial q_{i|j}}{\partial \beta_1} = -2\sigma^2_\nu(y_i - \mu^y_{i|j})(\pi_j + z_{2i}'\gamma_2) + 2(x_i - \mu^x_{i|j})^2(\sigma^2_\nu\beta_1 + \sigma_{\varepsilon\nu}) - 2(x_i - \mu^x_{i|j})\left\{-(\beta_1\sigma^2_\nu + \sigma_{\varepsilon\nu})(\pi_j + z_{2i}'\gamma_2) + (y_i - \mu^y_{i|j})\sigma^2_\nu\right\} \qquad (4A.2)$$

$$\frac{\partial q_{i|j}}{\partial \beta_{2l}} = x_{2il}\,\frac{\partial q_{i|j}}{\partial \beta_0} \qquad (4A.3)$$

$$\frac{\partial q_{i|j}}{\partial \gamma_{2k}} = z_{2ik}\,\frac{\partial q_{i|j}}{\partial \pi_j}, \qquad (4A.4)$$

for $l = 1, \ldots, l_1$ and $k = 1, \ldots, l_2$. Similar calculations can be done for the second-order derivatives.
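Analytic derivatives such as (4A.2) to (4A.4) are easy to check against central finite differences. Appendix 3B, which defines $q_{i|j}$, is not part of this excerpt, so the sketch assumes that $q_{i|j}$ is the bivariate-normal quadratic form of $(y_i, x_i)$ in category $j$, multiplied by the determinant $\sigma^2_\varepsilon\sigma^2_\nu - \sigma^2_{\varepsilon\nu}$ (a constant that does not depend on $\beta$, $\gamma$, or $\pi_j$). Under that assumption, differentiating $q$ reproduces (4A.2) term by term. Parameter values and function names are hypothetical. Because $q$ is quadratic in each of these parameters, central differences are exact up to rounding.

```python
import numpy as np


# ASSUMPTION: q_{i|j} = s2_nu*dy^2 - 2*cov_yx*dy*dx + var_y*dx^2, i.e. the quadratic
# form of (y_i, x_i) given class j scaled by det = s2_e*s2_nu - s_en^2.
def means(p, x2, z2):
    mu_x = p["pi"] + z2 @ p["g2"]
    mu_y = p["b0"] + p["b1"] * p["pi"] + x2 @ p["b2"] + (z2 @ p["g2"]) * p["b1"]
    return mu_y, mu_x                                        # (4A.1)


def q(p, y, x, x2, z2):
    mu_y, mu_x = means(p, x2, z2)
    dy, dx = y - mu_y, x - mu_x
    var_y = p["b1"] ** 2 * p["s2_nu"] + 2 * p["b1"] * p["s_en"] + p["s2_e"]
    cov_yx = p["b1"] * p["s2_nu"] + p["s_en"]
    return p["s2_nu"] * dy**2 - 2 * cov_yx * dy * dx + var_y * dx**2


def dq_db1(p, y, x, x2, z2):
    """Equation (4A.2)."""
    mu_y, mu_x = means(p, x2, z2)
    dy, dx = y - mu_y, x - mu_x
    m = p["pi"] + z2 @ p["g2"]
    c = p["b1"] * p["s2_nu"] + p["s_en"]
    return (-2 * p["s2_nu"] * dy * m + 2 * dx**2 * c
            - 2 * dx * (-c * m + dy * p["s2_nu"]))


def num_deriv(f, p, key, idx=None, h=1e-5):
    """Central finite difference of f with respect to p[key] (or p[key][idx])."""
    def shifted(s):
        p2 = {k: np.array(v, dtype=float) for k, v in p.items()}   # copies
        if idx is None:
            p2[key] = p2[key] + s
        else:
            p2[key][idx] += s
        return f(p2)
    return (shifted(h) - shifted(-h)) / (2 * h)


rng = np.random.default_rng(7)
l1, l2 = 2, 3
p = {"b0": 0.5, "b1": 1.5, "b2": rng.standard_normal(l1), "g2": rng.standard_normal(l2),
     "pi": -0.8, "s2_e": 1.2, "s2_nu": 0.9, "s_en": 0.3}
x2 = rng.standard_normal(l1)
z2 = np.concatenate([x2, rng.standard_normal(l2 - l1)])      # z2 = (x2', x3')'
y_i, x_i = 1.7, -0.4


def f(par):
    return q(par, y_i, x_i, x2, z2)


print(dq_db1(p, y_i, x_i, x2, z2), num_deriv(f, p, "b1"))         # (4A.2)
print(num_deriv(f, p, "b2", 0), x2[0] * num_deriv(f, p, "b0"))    # (4A.3)
print(num_deriv(f, p, "g2", 1), z2[1] * num_deriv(f, p, "pi"))    # (4A.4)
```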
Appendix 4B Simulation results for the exogenous regressor

Table 4B.1: Means and standard deviations of the bias in the estimates for $\beta_2$. Columns give the value of $\gamma_{2z_2}$; standard deviations are in parentheses.

| Estimator | $\rho_{zu}$ | $\gamma_{2z_2} = 0$ | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
|---|---|---|---|---|---|---|---|
| LIV | 0 | 0.000 (0.036) | -0.002 (0.033) | -0.002 (0.035) | -0.001 (0.031) | 0.000 (0.032) | 0.002 (0.035) |
| LIV | 0.05 | -0.003 (0.035) | 0.003 (0.034) | 0.001 (0.032) | 0.003 (0.032) | 0.001 (0.033) | 0.003 (0.032) |
| LIV | 0.1 | -0.001 (0.033) | -0.003 (0.031) | 0.000 (0.032) | -0.001 (0.033) | 0.000 (0.035) | 0.000 (0.033) |
| LIV | 0.15 | 0.000 (0.033) | -0.004 (0.035) | 0.000 (0.032) | -0.002 (0.035) | -0.003 (0.034) | 0.002 (0.031) |
| LIV | 0.2 | 0.001 (0.031) | 0.001 (0.030) | 0.000 (0.031) | 0.000 (0.031) | -0.001 (0.032) | -0.004 (0.031) |
| OLS | 0 | 0.077 (0.030) | 0.072 (0.028) | 0.074 (0.028) | 0.073 (0.027) | 0.074 (0.027) | 0.076 (0.028) |
| OLS | 0.05 | 0.070 (0.030) | -0.076 (0.028) | 0.074 (0.029) | 0.076 (0.028) | 0.074 (0.029) | 0.076 (0.027) |
| OLS | 0.1 | 0.073 (0.027) | 0.069 (0.026) | 0.073 (0.027) | 0.073 (0.027) | 0.074 (0.030) | 0.074 (0.028) |
| OLS | 0.15 | 0.074 (0.028) | 0.072 (0.028) | 0.072 (0.028) | 0.072 (0.030) | 0.071 (0.029) | 0.074 (0.028) |
| OLS | 0.2 | 0.074 (0.027) | 0.075 (0.026) | 0.074 (0.027) | 0.073 (0.028) | 0.072 (0.028) | 0.071 (0.025) |
| IV | 0 | 0.069 (0.223) | -0.005 (0.119) | -0.008 (0.062) | -0.003 (0.045) | 0.000 (0.039) | 0.000 (0.039) |
| IV | 0.05 | 0.024 (0.571) | 0.130 (0.131) | 0.063 (0.048) | 0.042 (0.039) | 0.033 (0.035) | 0.027 (0.033) |
| IV | 0.1 | -0.013 (1.490) | 0.230 (0.177) | 0.129 (0.051) | 0.085 (0.037) | 0.061 (0.035) | 0.050 (0.032) |
| IV | 0.15 | -0.167 (1.974) | 0.357 (0.241) | 0.202 (0.072) | 0.124 (0.040) | 0.090 (0.032) | 0.076 (0.030) |
| IV | 0.2 | 0.326 (2.500) | 0.473 (0.348) | 0.263 (0.091) | 0.172 (0.050) | 0.123 (0.037) | 0.098 (0.030) |
Chapter 5
Estimating the return to education using LIV
5.1 Introduction
We apply the methods developed in the previous chapters to three empirical datasets to estimate the return to education in terms of income. Education is an important topic in public debates. Over the past decades, much research has been conducted to estimate the causal effect of education on earnings; see, for instance, Griliches (1977), Card (1999, 2001), or Uusitalo (1999). Most of these studies have focused on estimating a version of the following linear regression equation:

$$y_i = \beta_0 + \beta_1 S_i + X_i\beta_2 + \varepsilon_i, \qquad (5.1)$$

where $y_i$ is the logarithm of a measure of earnings, $S_i$ is a measure of education, and $X_i$ is a collection of other explanatory variables assumed to influence $y_i$. $\beta_1$ measures the effect of education on income and is expected to be positive. The disturbances $\varepsilon_i$ represent all other influences not explicitly accounted for.
If the disturbances are distributed independently of the explanatory variables $S_i$ and $X_i$, the simple OLS estimator can be used to estimate $\beta_1$. However, the independence assumption may not be realistic, and in that case the OLS estimator can be shown to be biased (e.g. Greene, 2000). Four major potential sources of bias (ability bias, measurement error bias, heterogeneity bias, and optimizing behavior bias) have been identified in the literature on the relationship between education and income; each of them is discussed in section 5.2. As will become clear from this discussion, there is little agreement on the direction and magnitude of the potential bias in the OLS estimator of the return to education. This situation is not surprising in view of the many sources of potential regressor-error dependence, each with its own specific impact on the direction and magnitude of the bias in OLS. A further complicating factor is that these causes may offset or reinforce each other.
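As an illustration of one of these sources, ability bias, the sketch below simulates a hypothetical wage equation in which unobserved ability raises both schooling and log earnings. Omitting ability moves the OLS slope from $\beta_1$ towards $\beta_1 + \gamma\,\mathrm{cov}(S, A)/\mathrm{var}(S)$, the standard omitted-variable result. All parameter values are invented for the illustration and do not describe any of the datasets analyzed in this chapter.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
beta1, gamma, delta = 0.08, 0.10, 1.0       # hypothetical: return, ability effect, ability -> schooling
ability = rng.standard_normal(n)            # unobserved, variance 1
S = 12.0 + delta * ability + 2.0 * rng.standard_normal(n)       # var(S) = delta^2 + 4
log_wage = 1.0 + beta1 * S + gamma * ability + 0.5 * rng.standard_normal(n)

slope_ols = np.polyfit(S, log_wage, 1)[0]   # ability omitted from the regression
bias_theory = gamma * delta / (delta**2 + 4.0)   # gamma * cov(S, A) / var(S)
print(slope_ols, beta1 + bias_theory)       # both close to 0.10
```

Other sources, such as measurement error in $S_i$, push the OLS estimate in the opposite direction, which is why the net sign of the bias is not known a priori.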
One way to circumvent problems of endogeneity is to find instruments and apply two-stage least squares or limited information maximum likelihood estimation techniques (see e.g. Bowden and Turkington, 1984, Verbeek, 2000, or Greene, 2000). Instruments are variables that mimic the troublesome regressors as well as possible but are uncorrelated with the error term. Hence, instrumental variables cannot have a direct effect on the dependent variable. In practice it is not obvious how or where to find valid instruments. Furthermore, instruments are often weak, i.e. they explain only a small part of the variance of the endogenous regressor. This may result in estimates that are even more biased than the OLS estimates (Staiger and Stock, 1997, Bound, Jaeger and Baker, 1995); see also sections 4.3 and 4.4. In section 5.3 we discuss the problems of IV estimation for the model given in (5.1) and show that instrumental variable estimation of the return to education is not a straightforward exercise.
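Instrument weakness is commonly gauged by the first-stage F statistic for the excluded instruments, with values below about 10 treated as a warning sign (a rule of thumb usually attributed to Staiger and Stock, 1997). The sketch below computes that statistic by comparing restricted and unrestricted first-stage regressions; the function name and the simulated strong and weak instruments are hypothetical.

```python
import numpy as np


def first_stage_F(x, Z_excl, W):
    """F statistic for the excluded instruments Z_excl (n x k) in the first-stage
    regression of the endogenous regressor x on [W, Z_excl]."""
    def rss(A):
        coef = np.linalg.lstsq(A, x, rcond=None)[0]
        r = x - A @ coef
        return r @ r

    full = np.column_stack([W, Z_excl])
    k = Z_excl.shape[1]
    rss_r, rss_u = rss(W), rss(full)
    return ((rss_r - rss_u) / k) / (rss_u / (x.size - full.shape[1]))


rng = np.random.default_rng(9)
n = 1_000
W = np.ones((n, 1))                  # included exogenous regressors (constant only)
z = rng.standard_normal((n, 1))      # excluded instrument
v = rng.standard_normal(n)
F_strong = first_stage_F(z[:, 0] + v, z, W)
F_weak = first_stage_F(0.02 * z[:, 0] + v, z, W)
print(F_strong, F_weak)              # F_weak is typically far below 10
```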
Card (1999, 2001) surveys several empirical studies on the return to education and finds regression estimates ranging from about 0.03 to 0.14. Quite often, the OLS estimates were not found to be statistically different from the instrumental variable estimates. As suggested above and discussed in more detail in section 5.3, instrumental variables estimates in these kinds of studies are potentially biased as well, because the instruments used are possibly weak and endogenous. We find empirical evidence for this in section 5.4. Recent evidence from twin studies suggests an upward bias in the OLS estimator of about 10%-15% (cf. Card, 1999). The major advantage of using data on twins is that no observed instruments are required, as the within-family estimator can be used (see also subsection 5.3.3).
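The idea behind the within-family estimator can be sketched with simulated twin pairs. We assume, for illustration only, that both twins share the same family ability $a_f$, which affects both schooling and earnings; differencing within each pair removes $a_f$, so the regression of $\Delta y$ on $\Delta S$ (without intercept) recovers $\beta_1$, whereas pooled OLS does not. Parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(10)
n_fam = 50_000
beta1, gamma = 0.08, 0.10                         # hypothetical return and ability effect
a = rng.standard_normal(n_fam)                    # family ability, shared by both twins
S = 12.0 + a[:, None] + rng.standard_normal((n_fam, 2))          # schooling of twins 1 and 2
y = 1.0 + beta1 * S + gamma * a[:, None] + 0.5 * rng.standard_normal((n_fam, 2))

slope_pooled = np.polyfit(S.ravel(), y.ravel(), 1)[0]   # OLS ignoring family structure
dS, dy = S[:, 0] - S[:, 1], y[:, 0] - y[:, 1]            # within-family differences
slope_within = (dS @ dy) / (dS @ dS)                    # a_f cancels in the differences
print(slope_pooled, slope_within)                       # about 0.13 versus about 0.08
```

The sketch ignores measurement error in schooling; differencing removes the shared ability term but also removes much of the true variation in $S$, so any reporting error carries more weight in $\Delta S$ than in $S$.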
The latent instrumental variable (LIV) method proposed in the previous chapters provides approximately unbiased estimates of the model parameters without relying on observed instruments. It is based on the assumption that a discrete latent variable splits the endogenous regressor $x$ into an exogenous component and an endogenous component that is correlated with the error term. In section 5.4 we estimate the return to education for three datasets using LIV. Using the previously proposed diagnostics, we show that the LIV model is not particularly sensitive to outliers and fits the datasets fairly well. The instruments that LIV estimates from the data are shown to be much stronger than the available observed instruments that are typically used in these applications. Furthermore, we investigate the validity of the available observed instrumental variables in these datasets using the methods proposed in section 4.3. We find considerable evidence that in two of the three applications the observed instruments are weak and/or endogenous. Overall, the LIV approach yields results that are more consistent than the classical IV results. We find a moderate upward bias in OLS of approximately 7%, which is close to recent results from twin studies and supports the ability bias hypothesis. On the other hand, the bias in OLS, as indicated by the classical IV estimates, ranges from −80% to +30% across the three applications, which illustrates that opposite answers may be obtained if different sets of instrumental variables are used to address the same substantive research question. Section 5.5 presents a summary of our findings.
5.2 Sources of bias in the OLS estimate of the return to education
5.2.1 Ability bias
Much work has focused on whether the presence of a so-called ‘ability bias’ overstates the true causal effect of education on earnings (e.g.
114 Chapter 5 Estimating the return to education using LIV
Angrist and Krueger, 1991; Harmon and Walker, 1995; Verbeek, 2000). ‘Ability’ can be seen as an omitted variable that enables (certain) individuals to obtain more income. In this case, the true model is $y_i = \beta_0 + \beta_1 S_i + \rho A_i + \varepsilon_i$, where $A_i$ denotes ‘ability’ and $\rho$ is the effect of ‘ability’ on income, which is expected to be positive. We assume for the moment that no other explanatory variables are present. The probability limit of the OLS estimator for $\beta_1$, when ‘ability’ is omitted, is¹
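$$\operatorname{plim}\hat\beta_1^{OLS} = \operatorname{plim}\frac{\sum_{i=1}^{n}(S_i-\bar S)(y_i-\bar y)}{\sum_{i=1}^{n}(S_i-\bar S)^2} = \beta_1 + \rho\,\frac{\sigma_{SA}}{\sigma_S^2},$$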
where $\sigma_{SA}$ denotes the covariance between education and ability, and $\sigma_S^2$ is the variance of $S$. When individuals with higher ability choose to obtain more education ($\sigma_{SA} > 0$), the effect of education on income is overstated, because with $\rho > 0$ part of the effect of unobserved ability is falsely attributed to education. As such, exogenous shocks to education levels will have less effect on individual wages than the OLS regression model predicts, and education seems more valuable than it actually is.
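The omitted-ability formula can be checked with a small simulation. The Python/NumPy sketch below uses purely hypothetical values ($\beta_1 = 0.08$, $\rho = 0.05$, and a schooling equation $S_i = 12 + A_i + e_i$ with $\operatorname{var}(e_i) = 3$, so that $\sigma_{SA} = 1$ and $\sigma_S^2 = 4$); none of these numbers come from the datasets analyzed in this chapter.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(0)
n = 200_000
beta0, beta1, rho = 1.0, 0.08, 0.05               # hypothetical true parameters

A = rng.normal(0.0, 1.0, n)                        # unobserved 'ability'
S = 12.0 + A + rng.normal(0.0, np.sqrt(3.0), n)    # cov(S, A) = 1, var(S) = 4
y = beta0 + beta1 * S + rho * A + rng.normal(0.0, 0.3, n)

b_ols = ols_slope(S, y)                            # regression that omits A
b_theory = beta1 + rho * 1.0 / 4.0                 # beta1 + rho * sigma_SA / sigma_S^2
print(f"OLS slope: {b_ols:.4f}, plim from the formula: {b_theory:.4f}")
```

With a large sample the OLS slope settles near 0.0925 rather than the true 0.08, i.e. an upward bias of $\rho\sigma_{SA}/\sigma_S^2$.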
5.2.2 Measurement error bias
Although ‘ability’ bias may induce a positive bias in OLS, error in the measurement of the education variable $S_i$ may result in downward biases. Often, the only data available to measure education is ‘years of schooling’. However, it is questionable whether ‘years of schooling’ adequately measures ‘total education’. Griliches (1977) shows that if the measures for education are imperfect, OLS estimates can have a large downward bias. This bias is magnified (even if the measurement error is small) when more variables are included in the model. This can be seen as follows (Griliches, 1977). Let the true wage-education equation be
$$y_i = \beta_1 S_i^* + X_i\beta_2 + \varepsilon_i,$$
¹ Substitute the expression for $y_i$ in the expression for $\hat\beta_1^{OLS}$, collect terms, and use the law of large numbers and Slutsky’s theorem (see Ferguson, 1996).
with $\beta_1 = 0.1$ and $\beta_2 = 0.01$, say. $S_i^*$ is the true but unobserved level of education. The observed level of schooling $S_i$ is measured with error, such that $S_i = S_i^* + u_i$, with $u_i$ a random term independent of $S_i^*$ and $\varepsilon_i$. Let $\lambda = \sigma_u^2/\sigma_S^2$ be the fraction of the observed variance in schooling that is due to measurement error, and assume that $X_i$ (e.g. ability or other explanatory variables) is measured without error. Regressing $y$ on $S$, while ignoring $X$, gives
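$$\hat\beta_1^{OLS} = \frac{\sum_{i=1}^{n}(S_i-\bar S)(y_i-\bar y)}{\sum_{i=1}^{n}(S_i-\bar S)^2} = \beta_1 - \beta_1\frac{\sum_{i=1}^{n}(S_i-\bar S)(u_i-\bar u)}{\sum_{i=1}^{n}(S_i-\bar S)^2} + \beta_2\frac{\sum_{i=1}^{n}(S_i-\bar S)(X_i-\bar X)}{\sum_{i=1}^{n}(S_i-\bar S)^2} + \frac{\sum_{i=1}^{n}(S_i-\bar S)(\varepsilon_i-\bar\varepsilon)}{\sum_{i=1}^{n}(S_i-\bar S)^2},$$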
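which has probability limit
$$\operatorname{plim}\hat\beta_1^{OLS} = \beta_1 - \beta_1\frac{\sigma_{Su}}{\sigma_S^2} + \beta_2\frac{\sigma_{SX}}{\sigma_S^2},$$
where the covariance $\sigma_{Su} = \sigma_u^2$, since $S^*$ is independent of $u$ (so that $\sigma_{Su}/\sigma_S^2 = \lambda$), and $\sigma_{SX}$ is the covariance between $S$ and $X$. Suppose that 10% of the observed variance in schooling is due to measurement error, i.e. $\lambda = 0.1$, that $\sigma_S = 3$, $\sigma_X = 15$, and that the correlation between $X$ and $S$ is $\rho_{XS} = 0.5$. Then the inconsistency of $\hat\beta_1^{OLS}$ follows from
$$\operatorname{plim}\hat\beta_1^{OLS} = 0.1 - 0.1\times 0.1 + 0.01\times\frac{3\times 15\times 0.5}{9} = 0.115,$$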
from which it can be seen that the simple OLS estimator is biased upward by 15%. If the additional explanatory variable $X$ is added, the probability limit for $\hat\beta_1^{OLS}$ becomes (see Judge et al., 1985, p. 708)
$$\operatorname{plim}\hat\beta_1^{OLS} = \beta_1 - \lambda\frac{\beta_1}{1-R_{SX}^2} = 0.1 - \frac{0.1\times 0.1}{0.75} \approx 0.087,$$
where $R_{SX}$ is the multiple correlation coefficient between $S$ and $X$ (here $R_{SX}^2 = \rho_{XS}^2 = 0.25$), and OLS exhibits a downward bias of 13%. It can be seen that: (1) measurement error in
$S$ induces a negative bias in the OLS estimator for $\beta_1$; (2) this negative bias can be offset by an upward bias due to the omission of $X$, so that the total bias in $\hat\beta_1^{OLS}$ may be positive; and (3) adding explanatory variables $X$ that are correlated with the systematic components of schooling increases the measurement error bias in $\hat\beta_1^{OLS}$, even when the additional variables do not explain much of the variance in the observed (log) wages (Griliches, 1977).
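The two probability limits of the numerical example can be reproduced with a few lines of arithmetic. This Python sketch only evaluates the formulas above with the stated values ($\beta_1 = 0.1$, $\beta_2 = 0.01$, $\lambda = 0.1$, $\sigma_S = 3$, $\sigma_X = 15$, $\rho_{XS} = 0.5$); the function names are illustrative, and the second function assumes a single regressor $X$, so that $R_{SX}^2$ is the squared simple correlation.

```python
def plim_short(beta1, beta2, lam, sigma_s, sigma_x, rho_xs):
    """plim of the OLS slope when y is regressed on mismeasured S only (X omitted)."""
    sigma_sx = rho_xs * sigma_s * sigma_x            # cov(S, X)
    return beta1 - beta1 * lam + beta2 * sigma_sx / sigma_s**2

def plim_long(beta1, lam, r2_sx):
    """plim of the OLS coefficient on S when X is included (Judge et al., 1985)."""
    return beta1 - lam * beta1 / (1.0 - r2_sx)

beta1, beta2, lam = 0.1, 0.01, 0.1
short = plim_short(beta1, beta2, lam, sigma_s=3.0, sigma_x=15.0, rho_xs=0.5)
long_ = plim_long(beta1, lam, r2_sx=0.5**2)          # single X: R^2_SX = rho_XS^2

for label, b in [("X omitted", short), ("X included", long_)]:
    print(f"{label}: plim = {b:.4f}, bias = {100 * (b - beta1) / beta1:+.1f}%")
# X omitted: plim = 0.1150, bias = +15.0%
# X included: plim = 0.0867, bias = -13.3%
```

The sign flip between the two lines is exactly points (1) and (2) above: omitting $X$ masks the attenuation caused by measurement error.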
5.2.3 Heterogeneity bias
Heterogeneity in the regression coefficients of (5.1) is a third source of potential bias in the OLS estimates. People differ with respect to their marginal return to education, their marginal cost of education, and their tastes or beliefs. The return to education is then not a single parameter but a random variable that may also vary with the background characteristics of individuals. Unobserved heterogeneity might induce a dependency between $S$ and $\varepsilon$. This can be seen as follows, where we omit other explanatory variables and make some simplifying assumptions on the distribution of $S_i$ to present a constructive example (see also Card, 1999, 2001). Let
$$y_i = \beta_{0i} + \beta_{1i} S_i + \varepsilon_i, \qquad (5.2)$$
with $\beta_{0i} = \beta_0 + u_{0i}$ and $\beta_{1i} = \beta_1 + u_{1i}$, such that $E(\beta_{0i}) = \beta_0$ and $E(\beta_{1i}) = \beta_1$. Biases in the OLS estimator for schooling arise when the unobserved heterogeneity $(u_{0i}, u_{1i})$ is correlated with schooling $S_i$. Following Card (1999, 2001), let
$$u_{0i} = \beta_{0i} - \beta_0 = \lambda(S_i - \mu_S) + \nu_{0i},$$
$$u_{1i} = \beta_{1i} - \beta_1 = \psi(S_i - \mu_S) + \nu_{1i},$$
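where $\nu_{0i}$ and $\nu_{1i}$ are independent of each other and of $\varepsilon_i$. We also assume that $E(\nu_{0i}\mid S_i) = E(\nu_{1i}\mid S_i) = 0$, and $\mu_S = E(S_i)$. Now (note that this $\lambda$ is unrelated to the measurement-error fraction of subsection 5.2.2)
$$\lambda = \frac{\operatorname{cov}(\beta_{0i}, S_i)}{\operatorname{var}(S_i)} \quad\text{and}\quad \psi = \frac{\operatorname{cov}(\beta_{1i}, S_i)}{\operatorname{var}(S_i)},$$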
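and $y_i = \tilde\beta_0 + (\beta_1 + \lambda - \psi\mu_S)S_i + \psi S_i^2 + \nu_{0i} + \nu_{1i}S_i + \varepsilon_i$, where $\tilde\beta_0 = \beta_0 - \lambda\mu_S$. It can be shown that (see also Card, 1999, 2001)²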
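$$\hat\beta_1^{OLS} = \frac{\sum_{i=1}^{n}(S_i-\bar S)(y_i-\bar y)}{\sum_{i=1}^{n}(S_i-\bar S)^2} = \beta_1 + \lambda - \psi\mu_S + \psi\,\frac{\sum_{i=1}^{n} S_i^3 - n\bar S\,(S\circ S)}{\sum_{i=1}^{n}(S_i-\bar S)^2} + \frac{\sum_{i=1}^{n}\nu_{1i}S_i^2 - n\bar S\,(S\circ\nu_1)}{\sum_{i=1}^{n}(S_i-\bar S)^2} + \frac{\sum_{i=1}^{n}S_i(\nu_{0i}+\varepsilon_i) - n\bar S\,(\bar\nu_0+\bar\varepsilon)}{\sum_{i=1}^{n}(S_i-\bar S)^2},$$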
and, by using the weak law of large numbers (multiply numerators and denominators by $1/n$)³ and Slutsky’s theorem (Ferguson, 1996),
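$$\operatorname{plim}\hat\beta_1^{OLS} = \beta_1 + \lambda - \psi\mu_S + \psi\,\operatorname{plim}\frac{\frac{1}{n}\sum_{i=1}^{n} S_i^3 - \bar S\,(S\circ S)}{\frac{1}{n}\sum_{i=1}^{n}(S_i-\bar S)^2}.$$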
The latter fraction is equal to the regression coefficient of $S_i^2$ on $S_i$, which has probability limit $2\mu_S$ (it is assumed that the distribution of $S_i$ is symmetric)⁴. Hence,
$$\operatorname{plim}\hat\beta_1^{OLS} = \beta_1 + \lambda + \psi\mu_S. \qquad (5.3)$$
This relation generalizes the conventional analysis of ability bias (Griliches, 1977). If $\beta_{1i} = \beta_1$ for all $i$, i.e. there is no heterogeneity in the schooling coefficient, then $\psi = 0$ and the inconsistency of the OLS estimate for $\beta_1$ equals $\lambda$, which can be interpreted as the conventional ability bias. If both intercept and slope vary across individuals, i.e. $\lambda \neq 0$ and $\psi \neq 0$, the OLS estimator for $\beta_1$ is subject to the additional bias term $\psi\mu_S$. According to Card (1999), people with higher returns to education tend to acquire more schooling ($\psi > 0$), and hence
² We write $X\circ Y = (1/n)\sum_{i=1}^{n} x_i y_i$, i.e. $X\circ Y$ is the sample mean of the products $x_i y_i$.
³ $E(\nu_{1i}\mid S_i) = 0$ implies that $E(\nu_{1i}S_i^2) = E[E(\nu_{1i}S_i^2\mid S_i)] = E[S_i^2\,E(\nu_{1i}\mid S_i)] = 0$.
⁴ It can be shown that $\operatorname{cov}(X, X^2) = EX^3 - \mu_X\sigma_X^2 - \mu_X^3$ and $E(X-\mu_X)^3 = EX^3 - 3\mu_X EX^2 + 2\mu_X^3 = EX^3 - \mu_X^3 - 3\mu_X\sigma_X^2$, as $EX^2 = \sigma_X^2 + \mu_X^2$. If $X$ is symmetric, $E(X-\mu_X)^3 = 0$, and it follows that $\operatorname{cov}(X, X^2) = 2\mu_X\sigma_X^2$.
a cross-sectional regression of earnings on schooling yields an upward-biased estimate of the average marginal return to schooling $\beta_1$. Verbeek (2000), on the other hand, argues that the individual-specific returns to schooling are potentially higher for individuals with low levels of schooling ($\psi < 0$), and hence a downward bias in the OLS estimate for $\beta_1$ is to be expected. For now, we do not favor one interpretation over the other, but emphasize that in both situations the OLS estimator is potentially biased, either upward or downward⁵.
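Equation (5.3) can be checked by simulating Card's set-up. The Python sketch below uses hypothetical values for $\beta_0$, $\beta_1$, $\lambda$, $\psi$ and a normal (hence symmetric) distribution for $S_i$; with $\psi > 0$ the OLS slope should settle near $\beta_1 + \lambda + \psi\mu_S$ rather than near the average return $\beta_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
beta0, beta1 = 1.0, 0.08          # hypothetical average intercept and average return
lam, psi = 0.01, 0.002            # hypothetical lambda and psi
mu_s, sd_s = 12.0, 2.0            # S_i ~ Normal(12, 4): symmetric, as assumed in the text

S = rng.normal(mu_s, sd_s, n)
u0 = lam * (S - mu_s) + rng.normal(0.0, 0.1, n)    # intercept heterogeneity u_0i
u1 = psi * (S - mu_s) + rng.normal(0.0, 0.01, n)   # slope heterogeneity u_1i
y = (beta0 + u0) + (beta1 + u1) * S + rng.normal(0.0, 0.3, n)

Sc = S - S.mean()
b_ols = float(Sc @ (y - y.mean()) / (Sc @ Sc))     # simple OLS slope of y on S
b_53 = beta1 + lam + psi * mu_s                    # equation (5.3)
print(f"OLS slope: {b_ols:.4f}, (5.3): {b_53:.4f}, average return: {beta1}")
```

Setting `psi` to a negative value in this sketch illustrates Verbeek's case, in which the term $\psi\mu_S$ pulls the OLS estimate downward.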
5.2.4 Optimizing behavior bias
Finally, a fourth source of possible bias in the OLS estimator of the schooling equation is the optimizing behavior of individuals. This is discussed to some extent in Griliches (1977) and Card (1999, 2001). Schooling can be regarded as the result of optimizing behavior of individuals or households. Individuals try to reach an optimal schooling decision by maximizing ‘wealth’ or ‘utility’ based on anticipated earnings, which depend on schooling, ability, other unknown factors, and certain (opportunity) costs of schooling (depending on, for instance, interest rates and tuition fees). Garen (1984) views this as a self-selection problem with a continuous choice variable. Uusitalo (1999) and Harmon and Walker (1995) argue that this behavior could induce a positive bias in the OLS estimator, whereas Griliches (1977) shows that it might just as well lead to a downward bias. A more extensive discussion is beyond the scope of this manuscript, and we refer to the aforementioned works.
5.3 IV estimation of the returns to education
Given the divergent and a priori unknown sources of potential regressor-error
dependencies in estimating the return to education, it is not an easy task to find
appropriate instruments that alleviate regressor-error dependencies in model
⁵ Card (1999, 2001) shows for model (5.2) that the presence of measurement error in $S_i$ biases the OLS estimate for $\beta_1$ towards zero, implying that a relatively small amount of measurement error may (partly) offset a modest upward ability bias, depending on the magnitudes of $\lambda$ and $\psi\mu_S$. This can also be seen from the example in subsection 5.2.2.
(5.1). Card (1999, 2001) gives an overview of recent studies that use instru-
mental variables to estimate the return to schooling. He distinguishes two sets
of instrumental variables that are commonly used: (1) those that are based
on institutional features of the school system and (2) those that are based on
family background characteristics, both of which are discussed next.
5.3.1 Institutional features of the schooling system
When instrumental variables based on institutional features are used, the resulting IV estimates are approximately 30% higher than the corresponding OLS results. This finding is at odds with the common view in the literature that OLS suffers from an upward ability bias. Card (1999, 2001) provides four explanations. Firstly, instruments based on institutional features of the schooling system may not be truly exogenous, since the instruments may have a direct effect on earnings. For instance, ‘college proximity’ is sometimes used as an instrument (see e.g. section 5.4), but it may have a direct effect on earnings: families that place a strong emphasis on education may choose to live near a college, and their children may have higher abilities and/or motivation to achieve labor market success (cf. Verbeek, 2000). Bound and Jaeger (1996) argue that the quarter-of-birth dummy instruments used in Angrist and Krueger’s (1991) study may have a direct association with the dependent variable, as their results suggest that the association between quarter of birth and earnings is too strong to be fully explained by school attendance laws. Card (1999)⁶ argues that IV estimates based on schooling reforms (treatments), such as changes in compulsory school attendance laws, are biased further upward compared to OLS because of unobserved differences between the characteristics of the treated and non-treated groups, since these reform treatments are often not random. Bound, Jaeger and Baker (1995) argue that Angrist and Krueger use a large number of weak instruments and show that, in finite samples, IV estimates based on weak instruments are biased in the same direction as OLS.
⁶ Section 3.4 (pp. 1819-1822), and p. 1841.
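The finite-sample result of Bound, Jaeger and Baker (1995) can be illustrated with a small Monte Carlo experiment. The Python sketch below is a generic illustration under assumed values (20 instruments with first-stage coefficients of 0.02 each, n = 1000, β = 0.1, and correlated errors that make x endogenous); it is not a replication of Angrist and Krueger (1991). All variables have mean zero, so the intercept is omitted.

```python
import numpy as np

def tsls_slope(x, y, Z):
    """2SLS slope for one endogenous regressor, no intercept: (x'P_Z y) / (x'P_Z x)."""
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)   # first-stage regression of x on Z
    x_hat = Z @ coef                                # fitted values P_Z x
    return float(x_hat @ y / (x_hat @ x))

rng = np.random.default_rng(2)
n, k, beta, reps = 1000, 20, 0.1, 200               # hypothetical design
pi = np.full(k, 0.02)                               # 20 individually weak instruments

ols, tsls = [], []
for _ in range(reps):
    Z = rng.normal(size=(n, k))
    v = rng.normal(size=n)
    u = 0.8 * v + rng.normal(size=n)                # cov(u, v) = 0.8 makes x endogenous
    x = Z @ pi + v                                  # first stage
    y = beta * x + u                                # structural equation
    ols.append(float(x @ y / (x @ x)))
    tsls.append(tsls_slope(x, y, Z))

mean_ols, mean_tsls = float(np.mean(ols)), float(np.mean(tsls))
print(f"true beta {beta}, mean OLS {mean_ols:.3f}, mean 2SLS {mean_tsls:.3f}")
```

In this design the average 2SLS estimate lies much closer to the (biased) OLS estimate than to the true β, which is the pattern the weak-instrument argument predicts.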
Secondly, the downward bias in OLS can be a result of error in the measure-
ment of education. However, the strength of this effect is doubtful in view of
Card (1999) who argues that it is unlikely that measurement error alone can
account for the large positive gap between the IV and OLS estimates.
Thirdly, factors like compulsory schooling or schooling availability are most
likely to affect individuals who otherwise would have had relatively low school-
ing. If, because of potential heterogeneity, these individuals have higher than
average marginal returns to schooling, then instruments based on these vari-
ables tend to recover the returns to education for a subset of individuals with
relatively high returns to education, resulting in estimates higher than OLS.
Uusitalo (1999) notes in this respect that the presence of heterogeneity in the returns-to-education coefficient yields an additional error term $\nu_{1i}S_i$. Since the instrument $Z_i$ is correlated with $S_i$, it cannot, in general, be uncorrelated with the error term of the wage equation.
5.3.2 Family background
The second type of instrumental variables commonly used are instruments based on family background characteristics, for instance measures of the education levels of family members. The use of these variables as instruments is motivated by the fact that children’s education tends to be highly correlated with their parents’ education. However, Card (1999) shows that when the OLS estimator is biased upward because of unobserved ability, the bias in the IV estimator is at least as large, and potentially larger, depending on the strength of the instruments and their possible direct effect on the dependent variable. Hence, if the OLS estimator is biased upward, one would expect an IV estimator based on family background to be biased upward even more. For a more detailed discussion we refer to Card (1999).
5.3.3 Alternative, non-IV approaches
Although using instrumental variables is a common way to solve for omit-
ted ‘ability’ bias, some other methods are available (see e.g. Harmon and
Walker, 1995). If good measures for ‘ability’ are available, they can be included as proxy variables in the wage equation (5.1). Then, if no other sources of regressor-error dependencies or measurement error problems remain, standard OLS techniques can be applied to estimate the return-to-schooling effect. This approach, however, is questionable. Empirical results using this approach suggest an upward ‘ability’ bias in least-squares estimates (see Blackburn and Neumark, 1993; Wooldridge, 2002). Griliches (1977), however, argues that it is difficult to find a good measure for ability. Proxies for unobserved ability are often measures of IQ. If, however, unobserved ability has no relation with IQ but is instead related to, say, ‘motivation’, any proxy for ability based on IQ induces large measurement error biases in the OLS estimator, while unobserved ability may still not be accounted for (see also the discussion in subsection 5.2.2).
A particularly powerful approach to address regressor-error dependencies in schooling models is to use data on twins (or siblings) (Card, 1999). This approach attempts to eliminate possible omitted variable biases by assuming that some of the unobserved factors (e.g. ability or motivation) are identical within families (or twin/sibling pairs). In this case, within-pair differences in schooling and earnings can be exploited to estimate the effect of education on wages. Card (1999) gives an overview of several studies that use twin data. He concludes that under the assumption that identical twins have identical abilities, the within-family estimator gives a consistent estimate of the average marginal return to schooling. Furthermore, this estimator can be corrected for measurement error. Card (1999) concludes from his survey that the OLS estimator has a slight upward bias of the order of 10% to 15%.
A drawback of these methods is the (possible) lack of generalizability to non-twins and the potential failure of the identical-abilities assumption for identical twins and siblings. If the assumption does not hold, twin studies might still overestimate, to some extent, the effect of education on earnings. In a recent study, Hertz (2003) also finds that the OLS results are biased upward. His results are based on various measurement-error-corrected within-family estimators for South African households.
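The logic of the within-family estimator can be sketched in a few lines. In this hypothetical Python simulation, both twins of a pair share the same ability $A$, which raises schooling and wages; a pooled regression therefore overstates $\beta_1$, whereas regressing the within-pair wage difference on the within-pair schooling difference removes $A$. All parameter values are assumptions, and the sketch ignores measurement error, which in practice pushes the within-pair estimate towards zero.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(3)
pairs, beta1, rho = 50_000, 0.08, 0.05                      # hypothetical values

A = rng.normal(size=pairs)                                   # ability shared within a pair
S = 12.0 + A[:, None] + rng.normal(0.0, 1.5, (pairs, 2))     # schooling of twin 1 and twin 2
y = 1.0 + beta1 * S + rho * A[:, None] + rng.normal(0.0, 0.3, (pairs, 2))

b_pooled = ols_slope(S.ravel(), y.ravel())                   # ignores the family structure
b_within = ols_slope(S[:, 0] - S[:, 1], y[:, 0] - y[:, 1])   # differencing removes A
print(f"pooled OLS {b_pooled:.4f}, within-twin {b_within:.4f}, true {beta1}")
```

The pooled estimate is close to $\beta_1 + \rho\,\sigma_{SA}/\sigma_S^2 \approx 0.095$, while the within-pair estimate is close to 0.08.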
The review of the literature on estimating the return to education in this and
the previous sections demonstrates that IV estimation has produced a less than
satisfactory solution to the endogeneity problem of the schooling effect. In the
next section we present the LIV estimates for model (5.1) for three empirical
applications and compare these estimates with classical IV (2SLS). We show
that the LIV results are more stable across the three datasets and are more in
line with recent evidence from twin studies. In addition, the LIV model fits the
data fairly well, based on the diagnostics in section 4.5. We argue that the LIV
estimates are to be preferred to the classical IV results in these applications.
5.4 Empirical results
In this section we present the results of three applications that examine the effect of education on income. Each application is based on previously published data. First we briefly describe the three datasets; a more detailed description can be found in appendix 5A. We then estimate model (5.1) with latent instrumental variables and compare these results with the traditional IV and OLS estimates. Furthermore, we investigate the available observed instruments thoroughly, and conclude that the LIV results are to be preferred over IV and OLS.
5.4.1 Data description
NLSY data
The first dataset is a sample of 3010 men taken from the US National Longitudinal Survey of Young Men (NLSY) from 1976. This dataset is analyzed in Card (1995) and Verbeek (2000). The dataset contains several exogenous variables and one dummy instrumental variable measuring the presence of a nearby college, i.e. an instrument based on institutional features of the school system.
Brabant data
The second dataset was originally sampled in 1952 from the Dutch province
‘Noord-Brabant’. Thirty years later the same individuals were contacted to
collect data on, among other things, educational level, income, and social back-
ground statistics. The labor market information used here is from 1983, and
the dataset used contains observations on 833 men who had reached a stable labor market position. As with the NLSY dataset, several exogenous explanatory variables are available. We have two instrumental variables, the educational levels of the respondent’s father and mother, i.e. family background characteristics (see also Hartog, 1988, for a more detailed description of the data).
PSID data
The third dataset contains data on 424 working, married white women between the ages of 30 and 60 in 1975, and comes from the University of Michigan Panel Study of Income Dynamics (PSID), analyzed in Wooldridge (2002) and Mroz (1987). The labor market information is from 1975. This dataset has several exogenous variables. The available instruments are family background variables: the education levels of the respondent’s father, mother, and husband. For more details on the datasets and the regressors and instruments used, we refer to appendix 5A.
The three datasets differ in various key aspects (sample size, region, sex of respondents, year of labor market information), which makes a direct comparison of the estimated regression coefficients inappropriate. However, we compute the relative bias in OLS with respect to the LIV and IV estimates, which, as will become clear, allows for a straightforward comparison of the results across the three applications. The assumption of discrete levels of the latent variable in LIV may well correspond to the existence of discrete levels of schooling, underlying the measured education variables, that are free of measurement error and that represent the levels of education one would obtain regardless of ability; the method, however, is not predicated on that interpretation. Alternatively, as one
reviewer of this study pointed out, LIV can conceptually be interpreted as identifying ‘latent twins’ and applying an analogue of the twin estimator.
5.4.2 LIV results for schooling
We analyze all datasets with the LIV model and methods presented in chapters 3 and 4. We estimate the LIV model for m = 2, ..., 5 and with the inclusion of extra exogenous variables. We emphasize that the LIV model does not require the availability of instrumental variables, and the results in this subsection are obtained without using the available observed instruments mentioned in the previous subsection. In subsection 5.4.3 these are included in the model in order to examine their validity, using the methods in section 4.3. We also present here the results for the standard OLS estimator, the IV estimator, and LIV model fit diagnostics, but postpone a detailed discussion of the IV results until subsection 5.4.3.
Estimated coefficients
Table 5.1: Results of OLS, IV and LIV for the schooling coefficient for the three datasets. LIVx means that the LIV model is estimated with m = x categories. Standard errors are in parentheses.

| $\hat\beta_S$ | OLS | IV | LIV2 | LIV3 | LIV4 | LIV5 |
|---|---|---|---|---|---|---|
| NLSY | 0.074 (0.0035) | 0.133 (0.0518) | 0.050 (0.0099) | 0.065 (0.0041) | 0.068 (0.0040) | 0.069 (0.0040) |
| Brabant | 0.043 (0.0044) | 0.056 (0.0075) | 0.040 (0.0051) | 0.042 (0.0049) | 0.040 (0.0049) | – |
| PSID | 0.102 (0.0139) | 0.073 (0.0321) | 0.134 (0.0282) | 0.099 (0.0160) | 0.099 (0.0153) | 0.096 (0.0142) |
In table 5.1 we present the results for the estimated schooling coefficients for the datasets using OLS, IV, and LIV. It can be seen that for all but one of the specifications of m in the LIV model (denoted by LIVx), the resulting estimate for the schooling coefficient is below the OLS estimate, indicating a small upward bias in the
OLS estimate. In contrast, the direction of the bias implied by the IV results using the observed instruments differs across datasets, and we discuss this in more detail in subsection 5.4.3. The only downward bias found by LIV is for the PSID data when m = 2. This can be expected if one of the dummy regressors is identical or nearly identical to the unobserved two-category instrument. In this case, there is (almost) perfect multicollinearity in the second stage of the LIV model and the parameters are only weakly identified. This also explains why the results for m = 2 have larger standard errors than could have been expected, and why relatively large improvements in model fit occur for m > 2. In these applications several dummy regressors are present (see appendix 5A). In addition, the PSID dataset is the smallest we have, and the likelihood surface may be less smooth in this case. For the Brabant data the maximum-likelihood solution is degenerate at m = 5, and no estimate for LIV5 is given in table 5.1. We also found degenerate solutions for m > 5 for the PSID data. Here the LIV method indicates that the number of instruments (number of categories) should not be too large. Overall, the LIV results are fairly stable for different choices of m, and we consider choosing the ‘best’ m next.
Choosing the number of categories of the latent instrument
As argued in the previous chapter, we choose among the different values for m by looking at the ICL criterion, and for comparison we also present AIC3 and BIC in table 5.2. For the NLSY data the ICL statistic attains its minimum at m = 4, and AIC3 and BIC at m = 5. For the Brabant dataset ICL yields m = 2, and AIC3 and BIC yield m = 4. All three measures give m = 5 for the PSID data. In accordance with recent evidence on the performance of the ‘classical’ selection criteria, we also find in two of the three cases that AIC3 and BIC point to a larger number of categories for the discrete instrument than ICL. Importantly, it can be seen from table 5.1 and from the results in appendix 5B that the estimated regression coefficients and the estimates for the schooling equation are not very different for the optimal values of m indicated by ICL and by AIC3/BIC. As we will show, this result also holds for testing for (absence of) endogeneity. In the following we will only consider the LIV results for the optimal number of categories for the latent instrument, i.e. m = 4 for NLSY, m = 2 for Brabant, and m = 5 for PSID.

Table 5.2: Computed values for BIC, AIC3 and ICL. Boldface values indicate the minimum value (row-wise), and a ∗ denotes a degenerate solution.

| Data | Criterion | m = 2 | m = 3 | m = 4 | m = 5 |
|---|---|---|---|---|---|
| NLSY | BIC | 5832.06 | 5404.04 | 5309.55 | **5291.73** |
| | AIC3 | 5751.91 | 5313.86 | 5209.36 | **5181.52** |
| | ICL | 6942.75 | 5703.59 | **5611.37** | 5995.09 |
| Brabant | BIC | 1867.02 | 1837.67 | **1835.73** | 1849.18∗ |
| | AIC3 | 1799.967 | 1763.174 | **1753.774** | 1759.774∗ |
| | ICL | **1931.07** | 1974.67 | 1990.93 | 2004.44∗ |
| PSID | BIC | 1164.49 | 1023.99 | 1005.42 | **905.26** |
| | AIC3 | 1103.498 | 956.894 | 932.227 | **825.97** |
| | ICL | 1199.23 | 1042.04 | 1020.93 | **914.55** |
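The selection rule (choose the m that minimizes the criterion) can be made explicit with the values of table 5.2. This Python sketch only re-reads the published numbers; it does not recompute the criteria, and treating the degenerate Brabant solution at m = 5 as inadmissible is our assumption (it does not change any choice here).

```python
# Criterion values copied from Table 5.2; list positions correspond to m = 2, 3, 4, 5.
table_5_2 = {
    "NLSY":    {"BIC":  [5832.06, 5404.04, 5309.55, 5291.73],
                "AIC3": [5751.91, 5313.86, 5209.36, 5181.52],
                "ICL":  [6942.75, 5703.59, 5611.37, 5995.09]},
    "Brabant": {"BIC":  [1867.02, 1837.67, 1835.73, 1849.18],
                "AIC3": [1799.967, 1763.174, 1753.774, 1759.774],
                "ICL":  [1931.07, 1974.67, 1990.93, 2004.44]},
    "PSID":    {"BIC":  [1164.49, 1023.99, 1005.42, 905.26],
                "AIC3": [1103.498, 956.894, 932.227, 825.97],
                "ICL":  [1199.23, 1042.04, 1020.93, 914.55]},
}
M_VALUES = [2, 3, 4, 5]
degenerate = {"Brabant": {5}}          # solutions marked with * in Table 5.2

def best_m(values, exclude=frozenset()):
    """Return the number of categories m with the smallest criterion value."""
    candidates = [(v, m) for v, m in zip(values, M_VALUES) if m not in exclude]
    return min(candidates)[1]

choice = {data: {crit: best_m(vals, degenerate.get(data, frozenset()))
                 for crit, vals in crits.items()}
          for data, crits in table_5_2.items()}
for data, picks in choice.items():
    print(data, picks)
```

The output reproduces the choices stated in the text: ICL selects m = 4, 2 and 5, while AIC3 and BIC select m = 5, 4 and 5 for NLSY, Brabant and PSID, respectively.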
Testing for endogeneity
Table 5.3 shows the relative bias⁷ in the estimated regression coefficient for schooling with respect to OLS for the IV and optimal LIV results. Furthermore, the results of the tests for absence of endogeneity are presented. We present the results for IV (2SLS) as well, but discuss the IV estimates and the instruments used in more detail later on. The test statistics for LIV are computed without using the observed instrumental variables. The Hausman test is based on comparing the complete vectors $\hat\beta^{OLS}$ and $\hat\beta^{LIV}$ (and $\hat\beta^{IV}$) (see appendix 5B for the estimates of the complete vector of regression coefficients). The Wald test examines the covariance between the error terms of the main regression equation and the equation for the endogenous schooling variable (see section 4.6). This test cannot be computed for 2SLS.
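The Hausman statistic, in its textbook form, compares an estimator that is consistent under both hypotheses with one that is efficient under the null of no endogeneity. The Python sketch below implements that generic form with a pseudo-inverse; the pseudo-inverse is our choice and not necessarily how the statistics in table 5.3 were computed. As a rough check, using only the NLSY schooling coefficients and standard errors from table 5.1 gives about 1.30, close to the 1.31 reported for IV in table 5.3, which is based on the complete coefficient vectors.

```python
import numpy as np

def hausman(b_cons, v_cons, b_eff, v_eff):
    """Hausman statistic d'(V_cons - V_eff)^+ d with d = b_cons - b_eff.

    b_cons, v_cons: estimator consistent under H0 and H1 (e.g. IV or LIV) and its covariance;
    b_eff, v_eff:   estimator efficient under H0 of no endogeneity (OLS) and its covariance.
    A pseudo-inverse is used because V_cons - V_eff may be singular.
    """
    d = np.atleast_1d(np.asarray(b_cons, float) - np.asarray(b_eff, float))
    dv = np.atleast_2d(np.asarray(v_cons, float) - np.asarray(v_eff, float))
    return float(d @ np.linalg.pinv(dv) @ d)

# Scalar illustration: NLSY schooling coefficient, IV versus OLS (Table 5.1).
h_iv = hausman(0.133, 0.0518**2, 0.074, 0.0035**2)
print(f"H (schooling coefficient only): {h_iv:.2f}")
```

With one endogenous regressor, the statistic is compared with a $\chi^2_1$ critical value.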
Overall, we see that the differences between LIV and OLS are not large, which
7We computed this percentage as 100× (1− βLIV1 /βOLS
1 ) and 100× (1− β IV1 /β
OLS1 ).
![Page 141: University of Groningen Latent instrumental variables ... · My family and friends have always been very supportive, which is essential to me, and for which I cannot thank them enough](https://reader036.vdocuments.us/reader036/viewer/2022071016/5fcf99aee4ce126723624333/html5/thumbnails/141.jpg)
5.4 Empirical results 127
Table 5.3: Relative biases with respect to OLS and results for Hausman- and Wald-tests to test for endogeneity (based on Hessian).
Data     Estimator         %1      H-test   W-test
NLSY     IV                -79.9   1.31     -
         Opt. LIV m = 4    7.9     9.28     9.31
         Opt. LIV m = 5    6.5     7.18     7.19
Brabant  IV                -30.1   4.35     -
         Opt. LIV m = 2    7.0     0.97     0.97
         Opt. LIV m = 4    7.0     2.63     2.63
PSID     IV                27.8    0.95     -
         Opt. LIV m = 5    5.5     4.20     4.53
is also indicated by the Hausman and Wald tests (presented in the last two
columns of table 5.3)⁸. Both tests lead to similar conclusions. The optimal LIV
solutions for the NLSY and PSID data indicate a significant upward bias in
OLS, but for the Brabant data the LIV estimate of β1 (for m = 2) is not
significantly different from OLS. For the Brabant data, the classical IV
estimator instead indicates a significant downward bias in OLS.
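The decisions above can be reproduced from the reported statistics alone. The following is a minimal Python sketch that converts the Hausman statistics of table 5.3 into p-values and compares them with the 5% critical value of 3.84. It assumes one degree of freedom, as footnote 8 implies, and uses the closed form P(χ²₁ > x) = erfc(√(x/2)). It only post-processes the published numbers and does not recompute the tests themselves.

```python
import math

CRITICAL_5PCT = 3.84  # 5% critical value of a chi-square(1) distribution (footnote 8)


def chi2_1_pvalue(stat: float) -> float:
    """Upper-tail probability P(X > stat) for X ~ chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hausman statistics as reported in table 5.3.
hausman = {
    ("NLSY", "IV"): 1.31,
    ("NLSY", "LIV m=4"): 9.28,
    ("NLSY", "LIV m=5"): 7.18,
    ("Brabant", "IV"): 4.35,
    ("Brabant", "LIV m=2"): 0.97,
    ("Brabant", "LIV m=4"): 2.63,
    ("PSID", "IV"): 0.95,
    ("PSID", "LIV m=5"): 4.20,
}

for (data, estimator), stat in hausman.items():
    verdict = "reject H0" if stat > CRITICAL_5PCT else "do not reject H0"
    print(f"{data:8s} {estimator:8s} H = {stat:5.2f}  p = {chi2_1_pvalue(stat):.3f}  {verdict}")
```

The same function applies unchanged to the W-test column.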
Before discussing the classical IV results in more detail, we first examine
various diagnostics for the LIV estimates presented above. We only report the
results for the LIV model selected by the (preferred) ICL criterion. For the
NLSY and Brabant data, we additionally report the results for the model
selected by AIC3 when these are substantially different. We note that for the
Brabant data the R² measure for the strength of the LIV instrument, as
discussed in subsection 4.5.2, is substantially better for m = 4 than for
m = 2. This is discussed in more detail in subsection 5.4.3.
⁸The critical 5% value of a χ²₁ distribution is 3.84.
128 Chapter 5 Estimating the return to education using LIV
Diagnostics: outliers, influential observations, normality and heteroscedasticity
We examined the various diagnostics presented in the previous section to in-
vestigate the fit of the (optimal) LIV model and to identify potential outliers
and influential observations. Residual plots for the three datasets are given in
figure 5.1.
For the NLSY data (n = 3010), residual checks did not reveal heteroscedastic-
ity, and the residuals had a skewness of -0.28 and a kurtosis of 3.5. All
standardized residuals were smaller than 4.5 in absolute value. The diagnostics
for outliers and influential observations did not identify highly unusual data.
For the PSID data (n = 424) there is evidence of weak heteroscedasticity
for the variable ‘experience’, but this effect is rather small. The residuals are
slightly skewed (-0.26) and leptokurtic⁹ (kurtosis is 5.1). One observation
was identified as influential, but no outliers are present. When this
observation is removed, results and conclusions do not change, and all
standardized residuals are smaller than 4 in absolute value.
As for the PSID data, the results for the Brabant data (n = 833) indicate slight
evidence of weak heteroscedasticity, here for the dummy variable indicating
whether the father was self-employed at age 12. For this dataset, the residuals
are more skewed (skewness is -1.25) and more leptokurtic (kurtosis is 12.7).
Moreover, the examination of potential outliers and influential data identifies
three observations that clearly do not ‘fit’ the rest of the data. We
re-estimated the model without these three observations and found that the
estimates and test statistics are not strongly affected. The Hausman and Wald
statistics for the m = 4 solution, however, now become 3.47 and 3.48,
respectively, which are both significant at α = 0.10. After omission of these
outlying data, the residuals are less skewed and less leptokurtic. All but four
of the standardized residuals are smaller than 4.5 in absolute value, with a
maximum of 5.9. We note that
⁹A distribution with a high peak is called leptokurtic (Weisstein, 2004b); its kurtosis exceeds 3, the value for a normal distribution.
Figure 5.1: Residuals.
the measures for skewness and kurtosis are based on higher order moments,
which are known to be sensitive to outliers.
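The skewness and kurtosis values reported above are moment-based. The following minimal Python sketch is our own illustration, not the thesis's code. It computes both measures from a set of residuals and flags large standardized residuals. Standardization here simply divides by the residual standard deviation and ignores leverage, which is a simplification compared with regression-based standardized residuals. Appending a single gross outlier to an otherwise symmetric sample shows how strongly the fourth moment reacts.

```python
import math
from typing import List, Sequence


def central_moment(xs: Sequence[float], k: int) -> float:
    """k-th central moment with divisor n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs: Sequence[float]) -> float:
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs: Sequence[float]) -> float:
    """Non-excess kurtosis: 3 for a normal distribution, above 3 means leptokurtic."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


def large_standardized(resid: Sequence[float], threshold: float = 4.5) -> List[int]:
    """Indices of residuals more than `threshold` standard deviations from the mean.

    Simplification: divides by the overall residual standard deviation and
    ignores leverage, unlike regression-based standardized residuals.
    """
    mean = sum(resid) / len(resid)
    sd = math.sqrt(central_moment(resid, 2))
    return [i for i, r in enumerate(resid) if abs(r - mean) / sd > threshold]


# One gross outlier added to a symmetric sample inflates the fourth moment.
clean = [-2.0, -1.0, 0.0, 1.0, 2.0] * 20
dirty = clean + [15.0]
print(f"kurtosis clean: {kurtosis(clean):.2f}, with one outlier: {kurtosis(dirty):.2f}")
print("flagged indices:", large_standardized(dirty))
```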
5.4.3 Relative biases and comparison with classical IV
From the relative percentage bias of OLS with respect to the optimal LIV and
IV estimates in table 5.3 (the column indicated by %1), it can be seen that the
LIV method reveals an upward bias in OLS ranging from 5.5% to 8%. When
traditional IV is used, the conclusions differ greatly across the three studies,
ranging from an ≈80% downward bias to an ≈30% upward bias in OLS.
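The relative bias in the %1 column follows the formula of footnote 7. The following is a small Python sketch that applies it to the rounded schooling coefficients listed in appendix 5B. Because those coefficients are rounded to three decimals, the output differs slightly from table 5.3. For example, NLSY gives about 8.1% rather than 7.9% for LIV, and PSID gives 5.9% rather than 5.5%. These differences are expected rather than a discrepancy in the method.

```python
def relative_bias(beta_alt: float, beta_ols: float) -> float:
    """Relative bias of OLS in percent: 100 * (1 - beta_alt / beta_ols).

    Positive values indicate an upward bias in OLS relative to beta_alt.
    """
    return 100.0 * (1.0 - beta_alt / beta_ols)


# Rounded schooling coefficients from appendix 5B.
estimates = {
    "NLSY":    {"OLS": 0.074, "IV": 0.133, "LIV": 0.068},  # LIV with m = 4
    "Brabant": {"OLS": 0.043, "IV": 0.056, "LIV": 0.040},  # LIV with m = 2
    "PSID":    {"OLS": 0.102, "IV": 0.073, "LIV": 0.096},  # LIV with m = 5
}

for data, b in estimates.items():
    print(f"{data:8s} LIV: {relative_bias(b['LIV'], b['OLS']):6.1f}%   "
          f"IV: {relative_bias(b['IV'], b['OLS']):6.1f}%")
```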
For the NLSY data, the IV estimate of the return to schooling, based on a
dummy for college proximity, is about 80% higher than OLS and equal to
approximately 0.13 (0.052). For the Brabant data, the IV estimate is
0.056 (0.008), which is also substantially higher (≈30%) than OLS. Here
the instruments are the education levels of the respondents’ parents. Using
similar instrumental variables, we find an upward bias of ≈30% in the OLS
estimate for the PSID data. In all cases, the IV estimate has a substantially
larger standard deviation than the OLS estimate. The instability of the 2SLS
results and the high standard deviations may be a result of weak and/or
endogenous instruments, which is investigated below.
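The contrast between OLS and IV can be illustrated with a small simulation. The following sketch uses a hypothetical data-generating process, chosen as an assumption and not taken from any of the three datasets. An omitted "ability" variable raises both schooling and wages, so OLS is biased upward. A just-identified IV estimator, using an instrument that is independent of ability, recovers the true coefficient up to sampling noise. Only the Python standard library is used.

```python
import random
import statistics


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Simple-regression OLS slope of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV estimator with one instrument: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


random.seed(1)
n, beta = 20_000, 0.10
z = [random.gauss(0.0, 1.0) for _ in range(n)]        # instrument, independent of ability
ability = [random.gauss(0.0, 1.0) for _ in range(n)]  # omitted variable
x = [0.5 * zi + ai + random.gauss(0.0, 1.0)           # "schooling"
     for zi, ai in zip(z, ability)]
y = [beta * xi + 0.5 * ai + random.gauss(0.0, 1.0)    # "log wage"
     for xi, ai in zip(x, ability)]

print(f"OLS: {ols_slope(x, y):.3f}  IV: {iv_slope(z, x, y):.3f}  true: {beta}")
```

With these settings, OLS should come out near 0.1 + 0.5/2.25 ≈ 0.32 and IV near 0.10. Shrinking the first-stage coefficient 0.5 toward zero, i.e. making the instrument weaker, makes the IV estimate much more variable across seeds, which mirrors the weak-instrument problem discussed in the text.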
For the Brabant and PSID data we had more instruments available than
necessary for identification, allowing us to examine the sensitivity of IV to
different choices of the set of instruments¹⁰. When only mother’s education is
used as an instrument in the Brabant data, i.e. the number of instruments is
decreased, the estimated return to education becomes 0.059 (0.009), which
is slightly higher than the estimate obtained using the full set of instruments.
Conversely, the PSID data also contain husband’s education as an additional
variable. When this variable is added to the set of instruments, i.e. the number
of instruments is increased, the IV estimate becomes 0.065 (0.023). This value
is about 11% lower than the IV result for the smaller set of instruments. These
results indicate that, in particular for the PSID data, the
¹⁰This can be seen as changing the number of categories of the latent instrument in the LIV model.
2SLS estimates may be sensitive to different choices for the set of instruments.
In the following we examine the strength of the available observed instruments.
Strength of the available observed instruments
Table 5.4: Strength of the observed versus the predicted LIV instruments. Instruments NLSY: ‘Nearc’; Brabant: ‘FatherEd’ and ‘MotherEd’, respectively; PSID: ‘husbanded’, ‘mothered’, and ‘fathered’, respectively (based on Hessian).
Data     Method    ΔR²      γ2z21    γ2z22    γ2z23    Test
NLSY     Obs IV    0.0029   0.34                       –
                            (0.11)
         LIV4 IV   0.7503   0.15                       6.83
                            (0.06)
         LIV5 IV   0.7976   0.16                       7.85
                            (0.06)
Brabant  Obs IV    0.0922   1.08     1.27              –
                            (0.18)   (0.22)
         LIV2 IV   0.3906   0.89     1.02              104.41
                            (0.16)   (0.19)
         LIV4 IV   0.5247   0.61     0.93              48.48
                            (0.14)   (0.21)
PSID     Obs IV    0.3225   0.34     0.12     0.10     –
                            (0.03)   (0.03)   (0.03)
         LIV5 IV   0.8312   0.04     -0.01    0.02     14.87
                            (0.01)   (0.01)   (0.01)
Table 5.4 shows the results of the diagnostics proposed in chapter 4 for
examining the strength of the observed instruments. Investigating the strength
of the observed instruments is important when using classical IV (2SLS)
estimation. We examine the R² of the first-stage regression for the observed
and the optimal LIV instruments (see subsection 4.5.2), and discuss the results
of the test proposed in subsection 4.3.1. Here we used all available observed
instruments¹¹
¹¹In fact, we estimate model (4.12). These estimation results are also used in the next subsection to examine the exogeneity of the observed instruments. Including the observed instruments
and present conclusions only for the optimal m given in table 5.2.
The third column of table 5.4 reports the difference between two R² values.
The first is the R² of the regression of schooling on the exogenous explanatory
variables plus the available observed instruments (or, for LIV, the optimal LIV
instruments). The second is the R² of the regression of schooling on the
exogenous explanatory variables only. Hence, a large increase in R² indicates
that the instruments explain a substantial part of the variance in the
endogenous schooling variable. The observed instrument ‘Nearc’ for the NLSY
data in particular appears to be weak. The family background instruments
(Brabant and PSID data) explain a larger part of the variance in schooling,
especially for the PSID dataset. However, the increase in R² is in all cases
substantially larger when the optimal LIV instruments are used. The optimal
LIV instruments thus do a much better job of explaining the variance of x than
the available observed instruments. These findings explain the loss of
efficiency of the 2SLS estimates of the regression coefficients in table 5.1. The
estimated IV standard deviations there are (0.052), (0.008), and (0.032), which
are, respectively, 14.8, 1.7, and 2.3 times as large as the OLS standard
deviations. In line with the stronger LIV instruments, the estimated standard
deviations of the (optimal) LIV estimates are only 1.14, 1.16, and 1.02 times
the OLS estimated standard deviations.
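The increase in first-stage R² reported in the third column of table 5.4 is the difference between two regression R² values. The following is a minimal, dependency-free Python sketch of that computation. The helper names are hypothetical, and a real analysis would use a numerical linear-algebra library. The sketch fits both first-stage regressions by least squares and reports the increase in R² attributable to the instruments, on a toy dataset rather than the thesis data.

```python
from typing import List, Sequence


def r_squared(columns: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """R² of an OLS regression of y on an intercept plus the given columns.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting; adequate for a handful of regressors.
    """
    n = len(y)
    X: List[List[float]] = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # pivot row
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coef = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(coef, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(exog, instruments, x) -> float:
    """First-stage R² with instruments minus R² with exogenous regressors only."""
    return r_squared(list(exog) + list(instruments), x) - r_squared(exog, x)


# Toy data (not from the thesis): x depends on a regressor w and a binary instrument z.
w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
x = [wi + 2.0 * zi for wi, zi in zip(w, z)]
print(round(incremental_r2([w], [z], x), 4))  # → 0.3135
```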
The results of the Wald test for a zero effect of the observed instruments on
schooling are given in the last column of table 5.4. The coefficients reported
in the columns γ2z2 are the estimated direct effects of the observed instruments
on schooling. For instance, the NLSY data show that individuals who lived near
a college have slightly more education, and the Brabant and PSID data show
that parents’ education is positively related to their children’s years of
schooling. Under the null hypothesis H0: γ2z2 = 0, the test statistic has
approximately a χ²-distribution with 1, 2, and 3 degrees of freedom for,
respectively, the NLSY, Brabant, and PSID
data. It can be seen that in all cases the null hypothesis is rejected, indicat-
yields the following relative biases (%1): for the NLSY data 7.9% (m = 4) and 6.6% (m = 5), for the Brabant data 6.6% (m = 2) and 9.4% (m = 4), and for the PSID data 9.0% (m = 5).
ing that the observed instruments have a non-zero direct effect on schooling12.
For the NLSY data, however, the P-values are 0.009 (m = 4) and 0.005
(m = 5). Given the remarks in subsection 4.3.1 and the substantial sample size
of this dataset, this provides evidence that the instrument used is considerably
weak. For the Brabant and PSID data we also tested H0 for each instrument
separately. Only for the instrument ‘mothered’ in the PSID data was the null
hypothesis of zero effect on schooling not rejected. Although in all cases the
null hypothesis of a (joint) zero effect is rejected at α = 0.01, the incremental
R² values were not large, in particular for the NLSY data. Here the available
instrument seems to be weak. However, in all cases the optimal LIV
instruments were found to be substantially stronger than the available observed
instruments. The exogeneity of the observed instruments is considered next.
Examining exogeneity of available observed instruments
Table 5.5: Results of the endogeneity test for the available observed instruments. Instruments NLSY: ‘Nearc’; Brabant: ‘FatherEd’ and ‘MotherEd’, respectively; PSID: ‘husbanded’, ‘mothered’, and ‘fathered’, respectively (based on Hessian).
Data     Method            β2z21    β2z22    β2z23    Test
NLSY     Opt. LIV m = 4    0.022                      1.810
                           (0.02)
         Opt. LIV m = 5    0.021                      1.760
                           (0.02)
Brabant  Opt. LIV m = 2    0.015    0.055             6.075
                           (0.02)   (0.03)
         Opt. LIV m = 4    0.017    0.058             6.920
                           (0.02)   (0.03)
PSID     Opt. LIV m = 5    -0.019   -0.006   0.000    2.731
                           (0.01)   (0.01)   (0.01)
In table 5.5 we present the results of the Wald-test for testing whether the co-
efficient of the direct effect of the observed instruments on the dependent vari-
¹²The 2SLS model is not identified under the null hypothesis.
able (wage) is zero. A nonzero effect would violate the exogeneity assumption
of the instrument. We emphasize that the test in the previous subsection (table
5.4) tested whether the effect of the instruments on the endogenous regressor
was zero.
The estimated coefficients are reported in the columns indicated by β2z2, and
the values of the test statistic are given in the last column. The estimated
coefficients β2z2 are the direct effects of the instruments on the dependent
variable, wage. The degrees of freedom of the χ² null distribution are, as
before, 1, 2, and 3 for, respectively, the NLSY, Brabant, and PSID data. For the
Brabant data, the null hypothesis of no (joint) direct effect of the observed
instruments on the dependent variable is rejected at α = 0.05. This suggests
that these instrumental variables are not exogenous, i.e. parental education
levels have a significant positive effect on the respondent’s wage. The
performance of the 2SLS estimator relies critically on the exogeneity of the
instruments used, and these results suggest that the 2SLS estimates for the
Brabant data should be distrusted. For the other two datasets there was no
evidence of significant non-zero effects of the observed instruments on the
dependent variable.
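Both Wald tests above refer the statistic to a χ² distribution with 1, 2, or 3 degrees of freedom, one per observed instrument. For these small degrees of freedom the upper tail has a closed form, so the p-values can be reproduced with the Python standard library alone. The following sketch recovers, for example, the NLSY p-values of about 0.009 and 0.005 quoted earlier. It only post-processes the statistics reported in tables 5.4 and 5.5 and is not the thesis's own code.

```python
import math


def chi2_pvalue(stat: float, df: int) -> float:
    """Upper-tail probability P(X > stat) for X ~ chi-square(df), df in {1, 2, 3}.

    Uses closed forms of the regularized upper incomplete gamma function.
    """
    half = stat / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df == 2:
        return math.exp(-half)
    if df == 3:
        return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * stat / math.pi) * math.exp(-half)
    raise ValueError("closed forms implemented only for df = 1, 2, 3")


# Degrees of freedom = number of observed instruments per dataset.
DF = {"NLSY": 1, "Brabant": 2, "PSID": 3}

# Zero effect of the observed instruments on schooling (table 5.4).
strength = [("NLSY", 6.83), ("NLSY", 7.85), ("Brabant", 104.41),
            ("Brabant", 48.48), ("PSID", 14.87)]
# Zero direct effect of the observed instruments on log wage (table 5.5).
exogeneity = [("NLSY", 1.810), ("NLSY", 1.760), ("Brabant", 6.075),
              ("Brabant", 6.920), ("PSID", 2.731)]

for label, rows in (("strength", strength), ("exogeneity", exogeneity)):
    for data, stat in rows:
        p = chi2_pvalue(stat, DF[data])
        print(f"{label:10s} {data:8s} stat = {stat:7.3f}  p = {p:.4f}")
```

For larger degrees of freedom, a library routine such as SciPy's chi-square survival function would replace the closed forms.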
5.4.4 Wrap-up
Our results illustrate the difficulties associated with IV estimation in these
applications. The conclusions for the three datasets regarding the magnitude
and sign of the bias in the OLS estimate for schooling differ widely, even with
a similar set of instruments. The results on the validity of the available
observed instruments in the previous subsections may explain part of this
variability.
First, the instrument ‘Nearc’ for the NLSY data was found to be the weakest
available instrumental variable, inducing only a minor increase in the R² of
the first-stage regression. The simulation results in subsection 4.4.1 showed
that weak instruments result in large swings in the 2SLS
estimates¹³, which may account for the large downward bias in the OLS
estimate found for the NLSY data (see also Card, 1999, 2001). Accordingly,
the 2SLS estimator suffers from large standard deviations in these cases.
Second, the family background instruments used for the Brabant data were
found to have a direct effect on the dependent variable. In that case, the bias in
the 2SLS estimates may have the same sign as, and potentially a larger
magnitude than, the bias in OLS. This explains the downward bias in OLS
found in the Brabant application when using 2SLS (see also Bound, Jaeger,
and Baker, 1995). Finally, our results suggest that the instruments for the PSID
data are the ‘best’ among the three applications, although there is evidence
that they are somewhat weak. The 2SLS results for the PSID data do indicate an
upward bias in OLS, but its magnitude is much larger than for the LIV model.
Card (1999) argues that, in the presence of omitted ability, instruments based
on family background characteristics are likely to be endogenous, which was
supported by our findings for the Brabant data. We did not find evidence for
this hypothesis in the PSID data, where the education levels of the
respondent’s husband, father, and mother are the available observed
instruments. One explanation is that the power of the test is lower for the
PSID data because of the smaller sample size. Furthermore, the PSID data
contain labor market information on women obtained in 1976, a period in
which education, income, and other labor market issues may have been less of
a concern for women than for their male siblings. Hence, family education can
be expected to have a lower correlation with the respondents’ data.
These results contrast with the optimal LIV solutions, which are more efficient
than standard IV because the LIV method optimally estimates the instruments
from the available data. Furthermore, the best available evidence from the
latest studies on identical twins suggests a small upward bias, on the order of
10–15%, in the OLS estimator (cf. Card, 1999). This is not supported by the
standard IV estimates from the three datasets analyzed here. Our estimates
¹³In fact, for the weak-instrument cases, the results in table 4.1 report the median bias and IQR instead of the mean bias and standard deviation because of the presence of too many outliers.
have the same order of magnitude as those found in the twin studies, but do not
fully recover the 10% difference. A reason for this result might be that
estimating the model by simple OLS generally yields only a modest fit (the
OLS results presented in table 5.1 have R² values of 29%, 23%, and 17%,
respectively), i.e. the regressors do not explain a large part of the variance in
wage. The fact that LIV finds a smaller positive bias might also indicate that
part of the positive ability bias is offset by negative biases due to, e.g.,
measurement error or heterogeneity, which are expected to be smaller in the
twin studies¹⁴. Further, the twin studies may still contain a limited amount of
unobserved ability if the abilities of twins and siblings differ.
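The numerical example in footnote 14 can be written out explicitly. Assuming, as the footnote does, that the bias components simply add to the true effect, and applying the relative-bias formula of footnote 7 with the true effect in place of the alternative estimate, gives:

```latex
\begin{aligned}
\hat\beta_S^{\mathrm{OLS}} &= \beta_S + b_{\mathrm{ability}} + b_{\mathrm{meas}},
  \qquad \text{bias}_{\%} = 100\left(1 - \frac{\beta_S}{\hat\beta_S^{\mathrm{OLS}}}\right),\\
b_{\mathrm{meas}} = -2:\quad \hat\beta_S^{\mathrm{OLS}} &= 10 + 4 - 2 = 12,
  \qquad \text{bias}_{\%} = 100\left(1 - \tfrac{10}{12}\right) \approx 16.7\%,\\
b_{\mathrm{meas}} = -1:\quad \hat\beta_S^{\mathrm{OLS}} &= 10 + 4 - 1 = 13,
  \qquad \text{bias}_{\%} = 100\left(1 - \tfrac{10}{13}\right) \approx 23.1\%.
\end{aligned}
```

Less offsetting measurement error thus produces a larger apparent upward bias. This is the mechanism the text invokes for why LIV may find a smaller bias than the twin studies.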
5.5 Conclusions
The studies of Card (1999, 2001) clearly indicate the difficulties associated
with applying standard IV methods to estimate the returns to education. The
results are often counterintuitive and differ across studies. Furthermore, in
many instances it is questionable whether the instruments used were ‘valid’.
Unfortunately, classical instrumental variable methods do not allow
straightforward testing of the validity of a specific instrument. We show that
the LIV method can be successfully applied to solve these problems. The OLS
estimates are found to be biased upward by about 7%. Equally important, the
available observed instruments that have been used seem to be mostly
inadequate, and produce results that are both more biased than the OLS results
and much less efficient (see table 5.6).
The advantage of the LIV approach is that no observable instruments are needed.
Furthermore, once estimates have been obtained, endogeneity can be tested for
in a straightforward manner. We showed that the estimates are similar across
the different specifications of m and across three different datasets. For the
¹⁴For instance, let the true schooling effect be βS = 10. With a +4 ability bias and a −2 measurement error bias, the OLS estimate of the schooling effect is 12, resulting in an upward bias of ≈16.7% in OLS. With less measurement error, e.g. −1, the OLS estimate is 13, which yields an upward bias of ≈23%.
Table 5.6: Summary of the main conclusions of the LIV results for the effect of education on earnings.
Data                     NLSY        Brabant      PSID
Sample size              3010        833          424
Rel. bias OLS by LIV     7.9%        7%           5.5%
Rel. bias OLS by IV      -79.9%      -30.1%       27.8%
Test for endogeneity     +           -            +
Instrument strength      weak        moderate     moderate
Instrument exogeneity    exogenous   endogenous   exogenous
NLSY and the PSID datasets we find significant evidence of an ‘ability’ bias¹⁵.
Furthermore, the standard errors of the estimates are much smaller than those
for standard IV, and not much larger than those for OLS. Because of the
relatively large number of observations in the NLSY data, the power of the
Hausman and Wald tests can be expected to be larger for this dataset. When
using LIV to test for endogeneity, we recommend datasets of substantial size to
ensure reasonable power. We proposed several diagnostics that may complement
an analysis using LIV. We find no evidence that the LIV models used for the
three applications fit the data poorly. In particular, given the large sample
sizes of the three datasets, the small deviations from the assumptions that were
found may not pose a problem for inference.
The bias in the OLS estimator that we found, 6–8% for all three datasets, is
somewhat smaller than, but still close to, the numbers reported in Card (1999)
for the twin studies. This shows considerable convergent validity and lends
additional credibility to the LIV approach. In the next chapter we consider the
estimation of multilevel models in the presence of endogenous regressors.
¹⁵After deleting three outliers, the tests used also point to a significant (α = 0.10) upward bias in OLS for the Brabant data (subsection 5.4.2).
Appendix 5A Descriptive statistics datasets used
Figure 5A.1: Histogram of ‘Schooling’.
5A.1 NLSY data
The total sample consists of 3010 men taken from the National Longitudinal Survey of Young Men (see Verbeek, 2000, and Card, 1995)¹⁶. In this survey, a group of individuals aged 14–24 years has been followed since 1966. The labour market information used is from 1976. In that year, the individuals had on average a little more than 13 years of schooling, with a maximum of 18 years (see figure 5A.1). The average working experience was about 8.86 years (in 1976 these men were aged 24–34), with an average hourly wage rate of $5.77. The variables used can be found in table 5A.1. In estimation we used the schooling values centered around the mean.
5A.2 Brabant data
The initial dataset¹⁷ used in this paper consisted of 839 observations, but we deleted 5 observations with very low wages (log hourly wages < 0). Another observation with an extremely large reported wage (more than 9 × IQR from the median) was also removed. These data were collected in 1983 in the Netherlands’ southern province of Noord-Brabant. At that time the average age of the men in the sample was about 43. This cohort was subject to compulsory schooling until 12 years of age. The schooling measure used is the number of post-compulsory years of schooling, on average 4.35 years (see figure 5A.1). The average hourly wage was Dfl. 16.72, and the individuals had, on average, 25 years of work experience at the time of the survey. See table 5A.2 for more information on the variables used. As before, we centered the schooling variable around the mean.
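The two Brabant exclusion rules can be expressed as a small filter. The following Python sketch is a hypothetical reconstruction. The thesis does not state which quartile definition was used, whether the IQR rule was applied to wages or to log wages, or in which order the two rules were applied. Those choices are assumptions here: log wages are used purely for illustration, and the quartiles come from `statistics.quantiles` with its default method.

```python
import statistics
from typing import List


def clean_log_wages(log_wages: List[float], k: float = 9.0) -> List[float]:
    """Hypothetical cleaning rule mirroring the Brabant description.

    1) drop observations with a log hourly wage below 0;
    2) drop observations more than k * IQR away from the median.
    The quartile method (statistics.quantiles default) is an assumption.
    """
    kept = [w for w in log_wages if w >= 0.0]
    median = statistics.median(kept)
    q1, _, q3 = statistics.quantiles(kept, n=4)
    iqr = q3 - q1
    return [w for w in kept if abs(w - median) <= k * iqr]


sample = [2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 9.0, -0.2]
print(clean_log_wages(sample))
```

A multiple as large as 9 × IQR only removes gross errors, such as the single extreme wage mentioned above, and leaves ordinary skewness in the wage distribution untouched.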
5A.3 PSID data
As for the Brabant dataset, we removed a few observations prior to data analysis: four observations had obviously lower (log) wages (≪ −1) than the rest and were not used for estimation. These data come from the University of Michigan Panel Study of Income Dynamics (PSID)¹⁸, obtained in 1976 (also used in Mroz, 1987). The sample consists of working married white women who were aged between 30 and 60 in 1975. They earned on average $4.18 per hour. The women reported an average of 12.7 years of schooling (see figure 5A.1) and a little over 13 years of labor market experience. For a detailed description of the variables used, see table 5A.3; for estimation, the schooling variable was mean-centered.
¹⁶http://www.econ.kuleuven.ac.be/GME/.
¹⁷We thank Hans van Ophem (University of Amsterdam) for making this dataset available to us.
¹⁸http://mitpress.mit.edu/Wooldridge-EconAnalysis.
Table 5A.1: Descriptive statistics NLSY dataset (n = 3010).
Variable           Description                                            Mean    Std.
Regressors
constant (β0)      Model constant                                         -       -
schooling (β1)     Years of schooling in 1976                             13.26   2.68
experience (β2)    Potential experience                                   8.86    4.14
black (β3)         Equals 1 if black                                      0.23    0.42
smsa (β4)          Equals 1 if lived in metropolitan area in 1976         0.71    0.45
south (β5)         Equals 1 if lived in south in 1976                     0.40    0.49
Dependent
log wage           Logarithm of hourly wage                               6.26    0.44
Instruments
Nearc              Grew up near a 4 year college                          0.68    0.47
Table 5A.2: Descriptive statistics Brabant dataset (n = 833).
Variable           Description                                            Mean    Std.
Regressors
constant (β0)      Model constant                                         -       -
schooling (β1)     Years of schooling after age 12                        4.35    4.00
experience (β2)    Potential experience                                   25.52   4.19
nr. children (β3)  Number of children present at age 12                   4.91    2.68
av. mark (β4)      Average school mark in final year of primary education 5.62    1.42
anti-social (β5)   Equals 1 if comes from antisocial background           0.10    0.29
fself (β6)         Equals 1 if father is self employed at age 12          0.31    0.46
Dependent
log wage           Logarithm of hourly wage                               2.70    0.42
Instruments
Father Ed.         Education level father                                 2.35    0.70
Mother Ed.         Education level mother                                 2.22    0.54
                   (levels: 1–6, higher categories = higher education)
Table 5A.3: Descriptive statistics PSID dataset (n = 424).
Variable           Description                                            Mean    Std.
Regressors
constant (β0)      Model constant
schooling (β1)     Years of schooling                                     12.66   2.29
experience (β2)    Actual labor market experience                         13.09   8.05
kidslt6 (β3)       Number of children younger than 6                      0.14    0.39
kidsgr6 (β4)       Number of children older than 6                        1.34    1.32
unempl (β5)        Unemployment rate in county of residence               8.54    3.04
city (β6)          Equals 1 if lives in SMSA                              0.64    0.48
nwincome (β7)      Family income less total income wife / 1000            18.992  10.62
Dependent
log wage           Logarithm of hourly wage                               1.22    0.67
Instruments
father ed.         Years of schooling father                              8.80    3.57
mother ed.         Years of schooling mother                              9.24    3.37
husband ed.        Years of schooling husband                             12.50   3.02
Appendix 5B Results optimal LIV model for the three datasets
Table 5B.1: NLSY data. Results for OLS, IV, and optimal LIV. Here β0 is the constant, β1 is ‘schooling’, β2 is ‘experience’, β3 is ‘black’, β4 is ‘smsa’, and β5 is ‘south’.
                  β0       β1       β2       β3       β4       β5       σ²ε
OLS               6.262    0.074    0.039   -0.188    0.165   -0.129    0.142
                 (0.007)  (0.004)  (0.002)  (0.018)  (0.016)  (0.015)
IV                6.262    0.133    0.040   -0.103    0.109   -0.100    0.164
                 (0.007)  (0.052)  (0.002)  (0.077)  (0.051)  (0.030)
Opt. LIV m = 4    6.262    0.068    0.039   -0.197    0.170   -0.132    0.142
                 (0.007)  (0.004)  (0.002)  (0.018)  (0.016)  (0.015)
Opt. LIV m = 5    6.262    0.069    0.039   -0.196    0.169   -0.132    0.142
                 (0.007)  (0.004)  (0.002)  (0.018)  (0.016)  (0.015)
Table 5B.2: NLSY data. Optimal LIV results for the estimated group probabilities πj, group sizes λj (rows labelled λj), and γ2. Here γ21 is ‘age’, γ22 is ‘black’, γ23 is ‘smsa’, and γ24 is ‘south’.
         π1      π2      π3      π4      π5      γ21     γ22     γ23     γ24     σ²ν    ρεν
m = 4   -8.50   -4.70   -1.03    2.96            0.01   -0.60    0.29   -0.14    1.10   0.091
        (0.34)  (0.12)  (0.03)  (0.04)          (0.01)  (0.07)  (0.06)  (0.06)
λj       0.01    0.07    0.59    0.34
m = 5   -8.25   -4.53   -1.07    2.31    3.90   -0.01   -0.61    0.28   -0.09    0.84   0.091
        (0.29)  (0.10)  (0.03)  (0.08)  (0.13)  (0.01)  (0.07)  (0.06)  (0.06)
λj       0.01    0.07    0.56    0.24    0.12
Table 5B.3: Brabant data. Results for OLS, IV, and optimal LIV. Here β0 is the constant, β1 is ‘schooling’, β2 is ‘experience’, β3 is ‘nr. children’, β4 is ‘av. school mark’, β5 is ‘anti-social’, and β6 is ‘fself’.
                  β0       β1       β2       β3       β4       β5       β6       σ²ε
OLS               2.701    0.043    0.004    0.004    0.033   -0.141    0.008    0.133
                 (0.013)  (0.004)  (0.004)  (0.005)  (0.010)  (0.029)  (0.045)
IV                2.701    0.056   -0.004    0.005    0.011   -0.160    0.051    0.137
                 (0.013)  (0.008)  (0.006)  (0.005)  (0.015)  (0.031)  (0.050)
Opt. LIV m = 2    2.701    0.040    0.005    0.003    0.038   -0.137   -0.001    0.132
                 (0.013)  (0.005)  (0.004)  (0.005)  (0.011)  (0.029)  (0.046)
Opt. LIV m = 4    2.701    0.040    0.006    0.003    0.039   -0.136   -0.004    0.132
                 (0.013)  (0.005)  (0.004)  (0.005)  (0.011)  (0.029)  (0.045)
Table 5B.4: Brabant data. Optimal LIV results for the estimated group probabilities πj, group sizes λj (rows labelled λj), and γ2. Here γ21 is ‘age’, γ22 is ‘nr. children’, γ23 is ‘av. school mark’, γ24 is ‘anti-social’, and γ25 is ‘fself’.
         π1      π2      π3      π4      γ21     γ22     γ23     γ24     γ25     σ²ν    ρεν
m = 2   -1.00    6.38                    0.32   -0.02    0.71    0.82   -1.66    4.39   0.057
        (0.10)  (0.30)                  (0.03)  (0.03)  (0.07)  (0.19)  (0.28)
λj       0.86    0.14
m = 4   -1.62    2.63    6.92   11.17    0.36    0.00    0.61    0.59   -1.48    2.35   0.105
        (0.10)  (0.32)  (0.45)  (0.81)  (0.03)  (0.03)  (0.06)  (0.17)  (0.23)
λj       0.72    0.19    0.08    0.01
Table 5B.5: PSID data. Results for OLS, IV, and optimal LIV. Here β0 is the constant, β1 is ‘schooling’, β2 is ‘experience’, β3 is ‘kidslt6’, β4 is ‘kidsgr6’, β5 is ‘unempl’, β6 is ‘city’, and β7 is ‘nwincome’.

                  β0       β1       β2       β3       β4       β5       β6       β7       σ²ε
OLS               1.218    0.102    0.014    0.004   -0.007   -0.002    0.029    0.004    0.376
                  (0.030)  (0.014)  (0.004)  (0.079)  (0.025)  (0.010)  (0.065)  (0.003)
IV                1.218    0.073    0.014    0.030   -0.012    0.000    0.038    0.005    0.380
                  (0.030)  (0.032)  (0.004)  (0.083)  (0.025)  (0.010)  (0.066)  (0.004)
Opt. LIV (m=5)    1.218    0.096    0.014    0.010   -0.008   -0.001    0.031    0.004    0.369
                  (0.029)  (0.014)  (0.004)  (0.078)  (0.024)  (0.010)  (0.065)  (0.003)
Table 5B.6: PSID data. Optimal LIV results for the estimated group probabilities πj, group sizes λj (in italics), and γ2. Here γ21 is ‘experience’, γ22 is ‘kidslt6’, γ23 is ‘kidsgr6’, γ24 is ‘unempl’, γ25 is ‘city’, and γ26 is ‘nwincome’.

         π1      π2      π3      π4      π5      γ21     γ22     γ23     γ24     γ25     γ26     σ²ν     ρεν
m=5     -5.25   -2.97   -0.64    1.39    3.76    0.00    0.11    0.00    0.00    0.05    0.01    0.22    0.092
        (0.11)  (0.09)  (0.03)  (0.10)  (0.06)  (0.00)  (0.07)  (0.02)  (0.01)  (0.06)  (0.00)
λj       0.04    0.07    0.60    0.09    0.19
Chapter 6
Regressor and random-effects dependencies in multilevel models¹
6.1 Introduction
In many situations data have a hierarchical structure. For example, when it
is investigated how workplace characteristics affect worker productivity, both
workers and firms are units in the analysis. Similarly, hierarchical data arise
in the context of panel research, when multiple observations are available on
the ‘objects’ under study. Typically, these types of data are analyzed with
multilevel or hierarchical linear models. The model we consider is given by
$$y_{ij} = X_{ij}'\beta + Z_i'\gamma + \alpha_i + \eta_{ij}, \qquad (6.1)$$
where $y_{ij}$ is the dependent variable, $X_{ij} \in \mathbb{R}^{k \times 1}$ are level-one or individual-specific regressors, $Z_i \in \mathbb{R}^{l \times 1}$ contains level-two or group-specific regressors², $\eta_{ij}$ is a random (error) component with $E(\eta_{ij}) = 0$ and $\mathrm{var}(\eta_{ij}) = \sigma^2_\eta$, and where $i = 1, \ldots, n$ and $j = 1, \ldots, n_i$. Throughout this chapter, matrices are printed in
¹ This chapter is published as Ebbes, Böckenholt and Wedel (2004).
² Previously we denoted the matrix of instrumental variables by Z. Here we denote the matrix of instruments by V; see e.g. section 6.5.
capitals and scalars as lowercase. Greek symbols denote unobserved parameters that are to be estimated. The unit-specific intercept αi may be specified to be random (with $E(\alpha_i) = 0$ and $\mathrm{var}(\alpha_i) = \sigma^2_\alpha$) or fixed, depending on the context of the study and the types of inferences to be drawn (Verbeek, 2000, Judge et al., 1985, Wooldridge, 2002, Bryk and Raudenbush, 2002, Snijders and Bosker, 1999).
In the modeling of hierarchical data structures it is frequently assumed that the explanatory variables X and Z are independent of the random (error) components. If independence holds, the regressors are said to be ‘exogenous’ (or determined outside the model). However, in many applications it is unrealistic to assume that regressors and random components are independent. For the model given in (6.1) we consider two types of independence:
1. Level-2 independence, or Xα- and Zα-independence, and
2. level-1 independence, or Xη- and Zη-independence.
This chapter shows that even in the presence of modest dependencies, regression effects can be biased substantially. Different approaches for testing the independence assumption are presented and illustrated with the help of simulation studies.
Importantly, the independence assumptions can be violated easily. Examples include (1) relevant omitted variables (Card, 1999, 2001, Uusitalo, 1999, or Spencer and Fielding, 1998a, 1998b), (2) measurement error in the regressors (Plat, 1988, Bagozzi, Yi and Nassen, 1999, Wansbeek and Meijer, 2000, or Carroll, Ruppert and Stefanski, 1995), (3) self-selection³ (Hamilton and Nickerson, 2003), (4) simultaneity (White, 2001, or Greene, 2000), and (5) serially correlated errors in the presence of lagged dependent variables (White, 2001, or Ruud, 2000).

³ The problem of self-selection arises when individuals tend to select themselves into a certain state, such as union vs. non-union member (e.g. Vella and Verbeek, 1998) or treated vs. not treated (e.g. Angrist, Imbens and Rubin, 1996), on the basis of economic or other, usually unknown, considerations.

In the standard (single-level) regression model, the ordinary least squares (OLS) estimator can be written as $\hat\beta_{OLS} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon$, where $E(\varepsilon) = 0$. If the assumption of independence of regressors and errors does not hold (i.e. when $E(\varepsilon \mid X) \neq 0$), the second term no longer has expectation zero, and it follows immediately that the OLS estimator is biased. This bias can be reduced, at least in large samples, by using instrumental variables estimation techniques (Bowden and Turkington, 1984, White, 2001, or Wooldridge, 2002). Instrumental variables (IVs) should be uncorrelated with the error term ε and should explain part of the variability in the endogenous regressors X. Once such instruments are available, consistent estimates for the regression parameters can be obtained. Furthermore, Hausman-like tests (Hausman, 1978) can be used to test for regressor-error dependencies in this standard linear regression model. The general idea of this approach is to compare two estimators: one that is consistent under both the null hypothesis of regressor-error independence and the alternative hypothesis, and one that is consistent (and efficient) only under the null hypothesis. The null hypothesis is rejected when a significant difference between these two estimators is found (cf. Verbeek, 2000)⁴.
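To make the OLS bias argument concrete, the following Python sketch simulates a hypothetical single-level design in which x = v + φ and φ is correlated with ε, so that E(ε | x) ≠ 0. The names `simulate`, `ols_slope`, and `iv_slope` are illustrative and not part of the original study; the instrument `v` is assumed to be independent of the error, and `iv_slope` is the textbook just-identified IV ratio cov(v, y)/cov(v, x).

```python
import random


def simulate(n: int, rho: float, beta0: float = 1.0, beta1: float = 2.0, seed: int = 1):
    """Hypothetical single-level design: x = v + phi, with corr(phi, eps) = rho."""
    rng = random.Random(seed)
    v, x, y = [], [], []
    for _ in range(n):
        vi = rng.gauss(0.0, 1.0)    # instrument, independent of eps
        eps = rng.gauss(0.0, 1.0)   # regression error with E(eps) = 0
        phi = rho * eps + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)  # var(phi) = 1
        xi = vi + phi               # endogenous regressor when rho != 0
        v.append(vi)
        x.append(xi)
        y.append(beta0 + beta1 * xi + eps)
    return v, x, y


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    return cov(x, y) / cov(x, x)


def iv_slope(v, x, y):
    """Just-identified IV slope: cov(v, y) / cov(v, x)."""
    return cov(v, y) / cov(v, x)


if __name__ == "__main__":
    v, x, y = simulate(20_000, rho=0.5)
    print(f"OLS slope: {ols_slope(x, y):.3f}   IV slope: {iv_slope(v, x, y):.3f}")
```

With ρ = 0.5 this design gives cov(x, ε) = 0.5 and var(x) = 2, so the OLS slope should settle near 2 + 0.5/2 = 2.25 in large samples, while the IV slope should settle near the true value 2.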
In multilevel models, additional random components reflect the nesting structure in the data. Hence, an investigation of the independence of explanatory variables and random terms becomes even more important. Because of the potentially severe consequences when these independence assumptions are violated, they need to be tested explicitly in any application of multilevel models. The literature suggests performing the following diagnostic steps when endogeneity is suspected; they also serve as a roadmap for the remainder of this chapter. First, a diagnostic check to examine Xα-independence is readily available for multilevel models, based on the work by Hausman and Taylor (1981). Fixed-effects (FE) estimation gives an unbiased estimate for β in model (6.1) regardless of violation of Xα-independence, whereas random-effects (RE) estimation yields biased estimates (see section 6.2). If the test proposed by Hausman and Taylor (1981), which is based on the Hausman test (Hausman, 1978) and which we denote by the Hα-test, does not reject the independence hypothesis, both fixed- and random-effects estimation for β can be used. Once it is rejected, only fixed-effects estimation yields consistent results, provided the regressors are independent of the level-1 random components. We show how the inclusion of group means can be used to examine Xα-dependencies (Mundlak, 1978). We present the Hausman-Taylor (HT) estimator as an alternative to fixed-effects estimation, which is potentially more efficient and which, in contrast to the fixed-effects estimator, can be used to estimate level-2 effects. The Hα-test, Mundlak’s π approach, and the Hausman-Taylor estimator are discussed in section 6.3.

⁴ See Appendix 6A for a more detailed explanation of instrumental variables techniques and the Hausman test.
However, these steps should be considered with caution. As will be shown in section 6.4, the performance of the Hα-test relies on the independence of regressors and level-1 random components. Unfortunately, endogeneity at this level often cannot be ruled out a priori. Although this type of endogeneity is often ignored, level-1 independence is a crucial assumption in using standard multilevel estimators. As a first diagnostic check, one should carefully consider whether or not level-1 independence can be assumed on theoretical grounds. If not, IV estimation techniques can be adopted to estimate the regression parameters in model (6.1) (Bowden and Turkington, 1984, Wooldridge, 2002). Several different multilevel IV estimators can be derived, depending on the specific assumptions about the exact form of the endogeneity problem (see appendix 6B). This approach is illustrated and discussed in section 6.5. To test for level-1 independence, another test based on the general approach of Hausman (1978) can be constructed. We refer to this test as the Hη-test and illustrate its usefulness in section 6.5.
The diagnostic steps for investigating independence assumptions in two-level multilevel models are presented in table 6.1. For the sake of simplicity, no distinction is made between level-1 and level-2 regressors. Three types of violations of regressor-error independence (cases (ii) to (iv)) can now be distinguished. For each case, the table specifies the sections in which the tests and estimators are discussed in more detail.

Table 6.1: Overview of diagnostic tests to determine independence between regressors and random effects in a linear two-level regression model, where ‘yes’ (‘no’) means that the specific independence assumption is (not) satisfied.

Case    E(Xα) = 0   E(Xη) = 0   Section          Table                 Test                          Estimators
(i)     Yes         Yes         6.2, 6.5         6.2, 6.7              Hα or Mundlak’s π, and Hη     FE or RE
(ii)    No          Yes         6.2, 6.3         6.2, 6.3, 6.4         Hα or Mundlak’s π, and Hη     FE or HT
(iii)   Yes         No          6.2, 6.4, 6.5    6.2, 6.5, 6.6, 6.7    Hη                            External IV
(iv)    No          No          6.2, 6.4, 6.5    6.2, 6.5, 6.7         Hη                            External IV
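The logic of table 6.1 can be written down as a small lookup, sketched below in Python. The encoding and the function name `recommend_from_test_outcomes` are hypothetical. Consulting Hη before trusting Hα follows the argument of sections 6.4 and 6.5 that the Hα-test and Mundlak’s π are unreliable when level-1 dependencies are present; it is one reading of the table, not a procedure prescribed by the original text.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    tests: str
    estimators: str


# Rows of table 6.1, keyed by (E(X alpha) = 0 holds, E(X eta) = 0 holds).
TABLE_6_1 = {
    (True, True): Recommendation("H_alpha or Mundlak's pi, and H_eta", "FE or RE"),   # case (i)
    (False, True): Recommendation("H_alpha or Mundlak's pi, and H_eta", "FE or HT"),  # case (ii)
    (True, False): Recommendation("H_eta", "External IV"),                            # case (iii)
    (False, False): Recommendation("H_eta", "External IV"),                           # case (iv)
}


def recommend_from_test_outcomes(h_eta_rejects: bool, h_alpha_rejects: bool) -> str:
    """Hypothetical decision rule built on table 6.1.

    The level-1 test (H_eta) is consulted first: under X-eta dependence the
    H_alpha test and Mundlak's pi are unreliable (section 6.4), so their
    outcome is ignored in that branch.
    """
    if h_eta_rejects:
        # Cases (iii) and (iv) both call for external instruments.
        return TABLE_6_1[(True, False)].estimators
    if h_alpha_rejects:
        return TABLE_6_1[(False, True)].estimators  # case (ii)
    return TABLE_6_1[(True, True)].estimators       # case (i)


if __name__ == "__main__":
    for eta_rej in (False, True):
        for alpha_rej in (False, True):
            print(f"H_eta rejects={eta_rej}, H_alpha rejects={alpha_rej}: "
                  f"{recommend_from_test_outcomes(eta_rej, alpha_rej)}")
```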
6.2 Biases caused by level-1 (Xη)– and level-2 (Xα)–dependencies
The parameters β in the multilevel model given in (6.1) can be estimated by fixed- or random-effects methods (Verbeek, 2000, Baltagi, 2001, Goldstein, 1995, Longford, 1993). We do not discuss the estimators here, but details can be found in appendix 6B. To illustrate the effects of Xα- and Xη-dependencies under fixed-effects and random-effects estimation, consider table 6.2, which summarizes the simulation results for the model

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \alpha_i + \eta_{ij}, \qquad (6.2)$$

where $i = 1, \ldots, 150$, $j = 1, \ldots, 10$, $\alpha_i \sim N(0, \sigma^2_\alpha)$ and $\eta_{ij} \sim N(0, \sigma^2_\eta)$, and the following four cases are specified: (i) ρ(x, α) = ρ(x, η) = 0, (ii) ρ(x, α) = 0.3 and ρ(x, η) = 0, (iii) ρ(x, α) = 0 and ρ(x, η) = 0.3, and (iv) ρ(x, α) = ρ(x, η) = 0.3. The table presents means and standard deviations computed across 250 replications.

Table 6.2: Results of simulation study to examine the bias of the fixed-effects (FE) and random-effects (RE) estimators under level-1 and level-2 endogeneity. True values: β0 = 10, β1 = 2, σ²η = 1, and σ²α = 1.

            Case (i)       Case (ii)      Case (iii)     Case (iv)
FE   β0     –              –              –              –
     β1     1.99 (0.04)    2.00 (0.04)    2.43 (0.04)    2.42 (0.04)
RE   β0     10.04 (0.21)   8.87 (0.29)    7.88 (0.20)    5.79 (0.30)
     β1     1.99 (0.04)    2.23 (0.05)    2.42 (0.04)    2.84 (0.06)
     σ²α    1.01 (0.14)    0.10 (0.02)    0.99 (0.13)    0.00 (0.00)
     σ²η    1.00 (0.04)    1.00 (0.04)    0.90 (0.03)    0.91 (0.04)

As expected, both the fixed-effects and random-effects estimators yield unbiased results for β1 and unbiased estimates for the variances when the regressor is truly exogenous (case (i)). The fixed-effects estimator cannot estimate the constant β0 (nor the effects of other level-2 variables); unbiased results for these parameters are obtained with the random-effects estimator. If ρ(x, α) = 0.3 and ρ(x, η) = 0 (case (ii)), the random-effects estimate of β1 is biased upward and σ²α exhibits a severe downward bias, but fixed-effects estimation of β1 remains unbiased and an unbiased estimate for σ²η can be obtained. If ρ(x, α) = 0 and ρ(x, η) = 0.3 (case (iii)), both the fixed-effects and the random-effects estimators yield biased results for the regression parameters, and similar conclusions hold for σ²η. However, σ²α can be estimated consistently in this case. Finally, if all independence assumptions are violated (case (iv)), both fixed-effects and random-effects estimation yield biased results for the regression parameters. The bias in the fixed-effects estimator for case (iv) does not depend on the presence of the Xα-dependency, whereas random-effects estimation yields an even larger bias in this case. In all replications, the estimate of σ²α was negative and was therefore set to 0. The bias in the estimate of σ²η is approximately equal to its bias for case (iii).
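A minimal Python sketch of the fixed-effects (within) estimator under cases (ii) and (iii) is given below. It assumes a hypothetical data-generating process for x with unit variance, x = ρxα·αi + ρxη·ηij + √(1 − ρxα² − ρxη²)·uij, which need not match the design behind table 6.2, so the numbers will not reproduce that table exactly. Pooled OLS is included only for contrast; a random-effects (GLS) estimator is not implemented here.

```python
import random


def simulate_panel(n_groups, n_per, rho_xa, rho_xe, beta0=10.0, beta1=2.0, seed=0):
    """Simulate model (6.2) with sigma2_alpha = sigma2_eta = 1.

    Hypothetical design: x = rho_xa*alpha + rho_xe*eta + sqrt(1 - rho_xa^2 - rho_xe^2)*u,
    so var(x) = 1, corr(x, alpha) = rho_xa and corr(x, eta) = rho_xe.
    Returns a list of groups, each a list of (x, y) pairs.
    """
    rng = random.Random(seed)
    scale_u = (1.0 - rho_xa ** 2 - rho_xe ** 2) ** 0.5
    data = []
    for _ in range(n_groups):
        alpha = rng.gauss(0.0, 1.0)
        group = []
        for _ in range(n_per):
            eta = rng.gauss(0.0, 1.0)
            x = rho_xa * alpha + rho_xe * eta + scale_u * rng.gauss(0.0, 1.0)
            group.append((x, beta0 + beta1 * x + alpha + eta))
        data.append(group)
    return data


def within_slope(data):
    """Fixed-effects (within) estimator of beta1: OLS on group-demeaned x and y."""
    sxx = sxy = 0.0
    for group in data:
        mx = sum(x for x, _ in group) / len(group)
        my = sum(y for _, y in group) / len(group)
        for x, y in group:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
    return sxy / sxx


def pooled_ols_slope(data):
    """Pooled OLS slope that ignores the grouping, shown for contrast."""
    obs = [pair for group in data for pair in group]
    mx = sum(x for x, _ in obs) / len(obs)
    my = sum(y for _, y in obs) / len(obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    sxy = sum((x - mx) * (y - my) for x, y in obs)
    return sxy / sxx


if __name__ == "__main__":
    for case, (rho_xa, rho_xe) in {"(ii)": (0.3, 0.0), "(iii)": (0.0, 0.3)}.items():
        data = simulate_panel(150, 10, rho_xa, rho_xe)
        print(case, f"FE={within_slope(data):.3f}", f"pooled OLS={pooled_ols_slope(data):.3f}")
```

Under this particular design the within transformation removes αi, so in case (ii) the within slope should be close to 2, whereas in case (iii) the demeaned regressor remains correlated with the demeaned ηij and the within slope should drift to roughly 2.3.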
These results clearly indicate that one should carefully consider whether to use random-effects estimation if there are reasons to believe that independence may not hold. In model (6.2), even a moderate (positive) correlation between x and α induces an (upward) bias of approximately 10% in the random-effects estimator for β1 and an approximately 90% downward bias in σ²α. Dependencies between the regressors and α can be accommodated by using fixed-effects estimation. However, failure to correct for dependencies between the regressor and η leads to biases in both the random-effects and the fixed-effects estimators: a moderate positive correlation between the regressor and η induces a significant upward bias in both the fixed- and random-effects estimates for β1. Finally, if the regressor is correlated with both αi and ηij, the bias in the random-effects estimator for the regression parameters is even larger, and under case (iv) it would incorrectly be concluded that no random effects are present in the data. The following section focuses on the case where only Xα- but no Xη-dependencies are present.
6.3 The case of level-2 (Xα) dependencies only
6.3.1 Testing for Xα–dependencies
In this section we first discuss two test statistics to examine Xα-dependencies. It is assumed that no Xη-dependency is present. If the hypothesis of Xα-independence is rejected, we present and illustrate alternative estimators.
Hausman and Taylor (1981) show that the multilevel structure of the data, together with the availability of an estimator that is consistent regardless of the correlation between the regressors and αi (provided X and η are independent), facilitates tests for this type of endogeneity in model (6.1) using the general idea of a Hausman test (Hausman, 1978). This Hausman test statistic can be computed as follows:

$$H_\alpha = (\hat\beta_{FE} - \hat\beta_{RE})'\,\hat\Sigma^{-1}(\hat\beta_{FE} - \hat\beta_{RE}), \qquad (6.3)$$

where $\hat\Sigma$ is an estimate of the covariance matrix of $\hat\beta_{FE} - \hat\beta_{RE}$, computed as $\widehat{\mathrm{cov}}(\hat\beta_{FE}) - \widehat{\mathrm{cov}}(\hat\beta_{RE})$. The resulting test statistic Hα can be shown to have (asymptotically) a chi-square distribution under the null hypothesis of independence of X, Z and αi, with degrees of freedom equal to the number of elements of β being compared. If the null hypothesis is rejected, the fixed-effects estimator should be used. A great advantage of multilevel over single-level applications is the possibility to test for regressor-error dependencies of this type. This is not possible in single-level applications, as there is no estimator that is consistent under both the null and the alternative hypothesis when IVs are not available.
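For a single coefficient, Σ̂ in (6.3) is a scalar and the statistic reduces to a squared difference divided by a variance difference. The Python sketch below (the function name `hausman_scalar` is hypothetical) computes it together with the upper-tail probability of a chi-square distribution with one degree of freedom, using the identity P(χ²₁ > h) = erfc(√(h/2)). The multivariate version in (6.3) needs a matrix inverse (or a generalized inverse) instead of a division.

```python
import math


def hausman_scalar(b_consistent, var_consistent, b_efficient, var_efficient):
    """Scalar Hausman statistic H = (b1 - b0)^2 / (var(b1) - var(b0)) and its chi-square(1) p-value.

    b_consistent: estimate consistent under H0 and H1 (e.g. fixed effects);
    b_efficient: estimate consistent and efficient only under H0 (e.g. random effects).
    """
    var_diff = var_consistent - var_efficient
    if var_diff <= 0:
        raise ValueError("var(consistent) - var(efficient) must be positive")
    h = (b_consistent - b_efficient) ** 2 / var_diff
    p_value = math.erfc(math.sqrt(h / 2.0))  # P(chi-square with 1 df > h)
    return h, p_value


if __name__ == "__main__":
    # Made-up illustrative inputs, not estimates from this chapter.
    h, p = hausman_scalar(2.00, 0.04 ** 2, 1.95, 0.03 ** 2)
    print(f"H = {h:.3f}, p = {p:.3f}")
```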
6.3.2 Mundlak’s approach for Xα–dependencies
One approach for investigating potential correlations between X and the random effects αi is to model the dependence between αi and the regressors explicitly. Mundlak (1978) suggests the inclusion of group means by specifying $\alpha_i = \bar{X}_i'\pi + \xi_i$, where $\bar{X}_i$ contains the group means of the level-1 regressors. Snijders and Bosker (1999) argue that the inclusion of group means as explanatory variables in multilevel models can yield interesting substantive results. It can be shown that the test proposed by Hausman and Taylor (1981) and Mundlak’s approach are closely related and, in fact, yield numerically identical results (Baltagi, 2001, pp. 65-72). Modeling this dependence explicitly allows for unbiased random-effects estimation of β, regardless of whether X and α are independent. This approach is attractive when fixed-effects estimation is undesirable but X and αi cannot be assumed independent. However, this procedure does not yield unbiased estimates for the level-2 effects/parameters (γ and σ²α).
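The following Python sketch illustrates the group-mean idea under stated assumptions: it simulates model (6.2) with a hypothetical design in which corr(x, α) = rho_xa, adds the group mean x̄i as an extra regressor, and fits the augmented equation by pooled OLS (not by random-effects GLS, as in the text). For this OLS version, the Frisch-Waugh-Lovell theorem implies that the coefficient on x equals the within (fixed-effects) slope exactly, which the sketch lets you check; the coefficient on x̄i plays the role of π. All function names are illustrative.

```python
import random


def simulate_panel(n_groups, n_per, rho_xa, beta0=10.0, beta1=2.0, seed=0):
    """Model (6.2) with sigma2_alpha = sigma2_eta = 1 and corr(x, alpha) = rho_xa (hypothetical design)."""
    rng = random.Random(seed)
    groups, x, y = [], [], []
    for g in range(n_groups):
        alpha = rng.gauss(0.0, 1.0)
        for _ in range(n_per):
            x_ij = rho_xa * alpha + (1.0 - rho_xa ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
            groups.append(g)
            x.append(x_ij)
            y.append(beta0 + beta1 * x_ij + alpha + rng.gauss(0.0, 1.0))
    return groups, x, y


def group_means(values, groups):
    """Group mean of `values`, expanded back to one entry per observation."""
    sums, counts = {}, {}
    for val, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + val
        counts[g] = counts.get(g, 0) + 1
    return [sums[g] / counts[g] for g in groups]


def solve(a, b):
    """Solve a small dense system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def ols(columns, y):
    """Pooled OLS coefficients from the normal equations X'X b = X'y (columns of X as lists)."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


def mundlak_ols(groups, x, y):
    """Regress y on (1, x, group mean of x); returns (b0, b1, pi)."""
    xbar = group_means(x, groups)
    b0, b1, pi = ols([[1.0] * len(x), x, xbar], y)
    return b0, b1, pi


def within_slope(groups, x, y):
    """Fixed-effects (within) slope, for comparison."""
    xbar, ybar = group_means(x, groups), group_means(y, groups)
    sxx = sum((xi - mx) ** 2 for xi, mx in zip(x, xbar))
    sxy = sum((xi - mx) * (yi - my) for xi, mx, yi, my in zip(x, xbar, y, ybar))
    return sxy / sxx


if __name__ == "__main__":
    for label, rho in (("(i)", 0.0), ("(ii)", 0.3)):
        g, x, y = simulate_panel(150, 10, rho)
        b0, b1, pi = mundlak_ols(g, x, y)
        print(label, f"b1={b1:.3f}", f"within={within_slope(g, x, y):.3f}", f"pi={pi:.2f}")
```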
These methods are illustrated in table 6.3, where we present the results for the Hα-test and Mundlak’s approach. The data were simulated according to the same design as in the previous simulation study (model (6.2)). When there is no regressor-error dependency (case (i)), the proportion of replications in which the Hα-test rejects the null hypothesis is very close to the nominal significance level of 5%. With a correlation between x and α of 0.3, the null hypothesis of no level-2 (Xα) dependency is rejected in all replications. The same conclusions follow from Mundlak’s π, which is significantly different from zero for ρx,α = 0.3 but not for ρx,α = 0. Furthermore, random-effects estimation in Mundlak’s model yields unbiased estimates of the level-1 predictor, but the constant (and other potential level-2 predictors) cannot be estimated unbiasedly. The same holds for the variance σ²α, whereas σ²η can be estimated unbiasedly. In the next section we present a more satisfying solution to the problem when Xα-dependencies, but no Xη-dependencies, are present⁵.

Table 6.3: Results Hα-test and Mundlak’s approach (αi = π x̄i + ξi). True values: β0 = 10, β1 = 2, σ²η = 1, and σ²α = 1.

                 Case (i)       Case (ii)
Hα-test          4%             100%
Mundlak  β0      9.90 (1.85)    -11.17 (0.86)
         β1      1.99 (0.04)    2.00 (0.04)
         π       0.03 (0.38)    4.23 (0.18)
         σ²α     1.01 (0.14)    0.10 (0.02)
         σ²η     1.00 (0.04)    1.00 (0.04)
6.3.3 The Hausman-Taylor estimator under Xα–dependencies
Although Mundlak’s approach allows for random-effects estimation, it does not yield unbiased results for the level-2 (group-specific) variables. As a solution, Hausman and Taylor (1981) suggested an estimator that consistently and efficiently estimates both level-1 and level-2 parameters. It requires a priori knowledge about which of the level-1 and level-2 regressors are uncorrelated with the random components. Let $X_{ij} = [X_{1ij} : X_{2ij}]$ and $Z_i = [Z_{1i} : Z_{2i}]$, where the variables in sets X1 and Z1 are assumed to be uncorrelated with αi, and all regressors are assumed to be independent of ηij. The idea is that $X_{1ij}$ and $Z_{1i}$ serve as their own instruments, $X_{2ij} - \bar{X}_{2i}$ can be used as instruments for $X_{2ij}$ (as in the fixed-effects approach), and $\bar{X}_{1i}$ serves as instrument for $Z_{2i}$. To identify all the regression parameters, the number of variables contained in set X1 needs to be at least as large as the number of variables in set Z2. An attractive feature of the Hausman-Taylor estimator is that no external instruments (i.e. variables that are not included in the main regression equation) are needed, as this estimator constructs instruments from the available data (‘internal’ instruments). More recent studies suggest modifications of the Hausman-Taylor estimator to improve efficiency; see Arellano and Bover (1995).

⁵ Manchanda, Rossi and Chintagunta (2004) apply a ‘generalization’ of models developed by Chamberlain (1980, 1984) that are related to Mundlak’s approach. Their method applies to situations where the levels of marketing mix variables are chosen with potential knowledge of the sales response parameters, and the regressors are potentially correlated with the random effects of all the market response model parameters. They obtain results for the effect of sales calls made to physicians on their prescription behavior for specific drugs.

Table 6.4: Results Hausman-Taylor (HT) estimator for ρx2,α = 0.3 and ρz2,α = 0.3, but no level-1 dependencies (case (ii)). True values: β0 = 10, β1 = β2 = γ1 = γ2 = 2, σ²η = 1, and σ²α = 1.

        FE             RE             HT
β0      –              9.25 (0.34)    10.06 (0.97)
β1      2.00 (0.05)    1.59 (0.08)    2.00 (0.05)
β2      2.00 (0.04)    2.38 (0.07)    2.00 (0.04)
γ1      –              1.22 (0.17)    2.02 (0.44)
γ2      –              2.40 (0.11)    1.97 (0.44)
σ²η     1.00 (0.04)    1.00 (0.04)    1.00 (0.04)
σ²α     –              0.01 (0.01)    1.13 (0.36)
Table 6.4 illustrates the Hausman-Taylor estimator. The model previously used to generate the data is extended as follows:

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \gamma_1 z_{1i} + \gamma_2 z_{2i} + \alpha_i + \eta_{ij}, \qquad (6.4)$$

for $i = 1, \ldots, 150$ and $j = 1, \ldots, 10$. We specify x1 and z1 to be independent of the random components; x2 and z2 are related to α (ρx2,α = ρz2,α = 0.3) but independent of ηij (i.e. case (ii)). Table 6.4 contains the means and standard deviations of the estimated parameters computed across 250 simulation replications. The fixed-effects (FE) estimator yields consistent results for the level-1 effects, but provides no estimates for the level-2 effects. The random-effects (RE) estimator yields biased results for all regression parameters and σ²α, which is in agreement with the results in table 6.2. The Hausman-Taylor estimator uses the additional information that x1 and z1 are exogenous. These ‘internal’ instruments can be used to estimate all regression parameters consistently. Furthermore, an approximately unbiased estimate for σ²α can be obtained. With all three estimators, σ²η is estimated without bias.
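The construction of the ‘internal’ instruments described above can be sketched in a few lines of Python. The function `ht_instruments` below is hypothetical: it only assembles the instrument matrix (X1 and Z1 themselves, the within deviations of X2, and the group means of X1) and checks the order condition; the subsequent IV/GLS steps of the actual Hausman-Taylor estimator are not implemented. Treating the constant as a column of Z1 is an assumption of this sketch.

```python
def group_means(col, groups):
    """Group mean of `col`, expanded back to one entry per observation."""
    sums, counts = {}, {}
    for val, g in zip(col, groups):
        sums[g] = sums.get(g, 0.0) + val
        counts[g] = counts.get(g, 0) + 1
    return [sums[g] / counts[g] for g in groups]


def ht_instruments(x1_cols, x2_cols, z1_cols, n_z2, groups):
    """Assemble Hausman-Taylor-style internal instruments, one row per observation.

    x1_cols, x2_cols, z1_cols: lists of columns, each a list over observations.
    n_z2: number of level-2 regressors assumed correlated with alpha_i.
    """
    # Order condition: at least as many X1 variables as Z2 variables.
    if len(x1_cols) < n_z2:
        raise ValueError("order condition fails: need len(X1) >= len(Z2)")
    cols = [list(c) for c in x1_cols] + [list(c) for c in z1_cols]  # exogenous: own instruments
    for col in x2_cols:                                              # X2 minus its group mean
        means = group_means(col, groups)
        cols.append([val - m for val, m in zip(col, means)])
    for col in x1_cols:                                              # group means of X1, for Z2
        cols.append(group_means(col, groups))
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    groups = [0, 0, 1, 1]
    x1 = [1.0, 3.0, 5.0, 7.0]
    x2 = [2.0, 4.0, 6.0, 10.0]
    const = [1.0, 1.0, 1.0, 1.0]  # constant treated as a Z1 column
    for row in ht_instruments([x1], [x2], [const], n_z2=1, groups=groups):
        print(row)
```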
The Hausman-Taylor estimator is very attractive, as it does not require external instruments. We agree with Verbeek (2000) that, despite this obvious advantage, the method has played a surprisingly minor role in empirical work. In practice one does not know which X’s and Z’s are independent of α, but it is possible to test this assumption (Hausman and Taylor, 1981).
In this section we assumed independence of regressors and ηij. Unfortunately, the methods presented in this section become unreliable and yield incorrect conclusions in the presence of Xη-dependencies. Similar observations were made for the fixed- and random-effects estimators in section 6.2. This is illustrated and discussed in the following section.
6.4 Limitations in the presence of level-1 (Xη)–dependencies
This section considers two problems in using the methods discussed so far. First, as noted when discussing the results in table 6.2, both random-effects and fixed-effects estimation fail when endogeneity arises from level-1 dependencies (cases (iii) and (iv)). Second, we show that the Hα-test, Mundlak’s approach, and the Hausman-Taylor estimator, although successful in testing and solving for Xα-dependencies, also break down in this case.
In section 6.2 we illustrated the consequences of using the fixed-effects and the random-effects estimators when regressors are correlated with the lowest-level error term ηij: even a small correlation between x and η in model (6.2) induced biases in both the fixed- and random-effects estimators. Similar limitations apply to the Hα-test and Mundlak’s approach discussed in subsections 6.3.1 and 6.3.2. Based on model (6.2), the simulation results in table 6.5 illustrate this situation.

Table 6.5: Results Hα-test and Mundlak’s approach (αi = π x̄i + ξi). True values: β0 = 10, β1 = 2, σ²η = 1, and σ²α = 1.

                 Case (iii)     Case (iv)
Hα-test          6%             100%
Mundlak  β0      8.00 (1.95)    -13.34 (0.20)
         β1      2.42 (0.04)    2.42 (0.04)
         π       -0.03 (0.39)   4.25 (0.05)
         σ²α     0.99 (0.13)    0.00 (0.00)
         σ²η     0.90 (0.03)    0.91 (0.04)

First, a situation with endogeneity at the first level (Xη-dependency) but no Xα-dependency cannot be detected by the Hα-test and Mundlak’s approach (case (iii)). This is not surprising, as these tests are not designed to investigate this hypothesis. However, the estimates for both β1 and σ²η are still significantly biased due to the Xη-dependency. Researchers who are not aware of potential endogeneity problems at the first level may incorrectly conclude from these tests that either fixed- or random-effects estimation can be used although, in fact, both methods yield biased results. When Xα- and Xη-dependencies are both present (case (iv)), the Hα-test and Mundlak’s π diagnose the Xα-dependency. Given that an Xα-dependency is detected, one would now use fixed-effects estimation (or the Hausman-Taylor estimator). However, table 6.2 showed that in the presence of Xη-dependencies the fixed-effects estimates for β are biased. The researcher in this case correctly concludes that Xα-dependencies are present, but misses the Xη-dependencies and, hence, still uses biased estimates.
The same fallacious conclusion follows from the Hausman-Taylor estimator based on internal instrumental variables, as can be seen from table 6.6. These results are based on model (6.4), where x1 and z1 are specified to be independent of all random components, but x2 and z2 are correlated with ηij (but not with αi). The table shows that both the fixed-effects and the random-effects estimators yield biased results. The Hα-test does not diagnose this type of endogeneity: it rejects the null hypothesis in 15% of the cases, whereas the nominal rate is 5%. Thus, importantly, this test too often indicates an Xα-dependency while in fact there is none, as the dependency is caused by correlation between X and η. The Hausman-Taylor estimator is also biased in general, but because x1 is truly exogenous it is a valid instrument for z2, and the Hausman-Taylor estimate for γ2 is unbiased. The bias in the estimate for σ²η is small, as is the one observed for σ²α.

Table 6.6: Results Hausman-Taylor (HT) estimator for ρx2,η = 0.3 and ρz2,η = 0.3. True values: β0 = 10, β1 = β2 = γ1 = γ2 = 2, σ²η = 1, and σ²α = 1.

          FE             RE             HT
β0        –              8.48 (0.54)    9.64 (1.27)
β1        1.58 (0.06)    1.57 (0.06)    1.58 (0.06)
β2        2.21 (0.03)    2.21 (0.02)    2.21 (0.03)
γ1        –              1.59 (0.13)    1.78 (0.23)
γ2        –              2.19 (0.08)    1.99 (0.21)
σ²η       0.95 (0.04)    0.95 (0.04)    0.95 (0.04)
σ²α       –              0.95 (0.13)    1.06 (0.19)
Hα-test   –              15%            –
We conclude that when endogenous regressors are present at the lowest level of the hierarchical model, caused by correlations between X and η, all available tests and estimators presented in section 6.3 yield invalid inferences. In the next section we discuss possible solutions to this problem.
6.5 Testing and solving for Xη–dependencies
6.5.1 External Instruments
We consider potential remedies, in the form of ‘classical’ IV methods, for the situation where Xη-dependencies are present. These methods are similar to the Hausman-Taylor estimator but require the availability of ‘external’ instruments.
External instrumental variables are desirable for consistent estimation when Xη-dependencies are present in the data (Bowden and Turkington, 1984, Wooldridge, 2002). The main ideas behind these estimators are similar to those of classical IV estimators developed for cross-sectional situations, with an additional step to account for nonspherical disturbances due to the hierarchical structure (Wooldridge, 2002). Two multilevel IV estimators that yield consistent estimates of the parameters in model (6.1) in the presence of Xη-dependencies are the (multilevel) two- and three-stage least squares (SLS) estimators (see appendix 6B), where the latter estimator takes the random error component structure into account, yielding a potentially more efficient estimator (Im et al., 1999, Wooldridge, 2002, Bowden and Turkington, 1984).
In the following, we use the multilevel 2SLS estimator to illustrate the usefulness of external IV estimators when the Xη-independence assumption is violated. We also show how this estimate can be used to construct another Hausman-based test (the Hη-test) to test for Xη-independence. The results of this test can be used to decide whether fixed-effects, random-effects, or Hausman-Taylor estimation, or multilevel (external) IV estimators, should be used.
Using model (6.2), we illustrate the multilevel IV estimator with one level-1 instrument. The endogenous regressor is now simulated as $x_{ij} = c + v_{ij} + \varphi_{ij}$, where $\rho_{\varphi,\eta} = 0.3$, $c$ is a constant, and $v_{ij}$ is the instrument, generated independently of all error terms. In addition, a Hausman-based test is computed that compares the multilevel IV estimate of $(\beta_0, \beta_1)$ with the random–effects estimate of $(\beta_0, \beta_1)$ (or the fixed–effects estimate of $\beta_1$). The results are presented in table 6.7.

Table 6.7: Results of multilevel IV for case (iii) and case (iv) violations. True values: $\beta_0 = 10$, $\beta_1 = 2$, $\sigma^2_\eta = 1$, and $\sigma^2_\alpha = 1$.

| Case | | (i) | (iii) | (iv) |
|---|---|---|---|---|
| Hη-test | | 3.2% | 96.4% | 100% |
| Multilevel IV | $\beta_0$ | 10.00 (0.21) | 9.98 (0.14) | 9.98 (0.20) |
| | $\beta_1$ | 1.99 (0.07) | 2.00 (0.07) | 2.01 (0.07) |
| | $\sigma^2_\varepsilon$ | 1.99 (0.12) | 1.98 (0.14) | 1.99 (0.15) |

Note that in table 6.7 we estimate $\sigma^2_\varepsilon$, the variance of $\varepsilon_{ij} = \alpha_i + \eta_{ij}$, from the residuals of the IV regression. The table shows that once valid external instruments are available, we obtain approximately unbiased estimates of the model parameters. Furthermore, these estimates are approximately unbiased regardless of Xα–independence (case (iii) vs. (iv)). The Hη-test based on these estimates detects both case (iii) and case (iv) endogeneity, indicating that the multilevel (external) IV estimators should be used. A disadvantage of this method is that it is less efficient than the fixed– and random–effects estimators. Furthermore, valid instruments that have no direct effect on $y$ and explain a substantial part of the variance in $x$ must be available, which is often a limitation in empirical work.
Although external IVs can be useful for dealing with Xη–dependencies, it should be noted that IV estimators can be seriously biased in small samples and may exhibit poor asymptotic properties when weak instruments are used. An instrument is said to be ‘weak’ when it explains none or only a small part of the variance in the endogenous regressor (i.e. it is only weakly correlated with it). A considerable literature investigates the potential pitfalls of IV estimation with weak instruments and offers several recommendations for dealing with these problems (Staiger and Stock, 1997, Bound, Jaeger and Baker, 1995, Nelson and Startz, 1990, and Kleibergen and Zivot, 2003).
To address the problem of weak instruments, Hahn and Hausman (2002) recently developed a test for the validity of instruments. Their approach is also based on the general Hausman specification test (Hausman, 1978). The test statistic is fairly simple to compute and is shown to have a t distribution under the null hypothesis. Rejection of the null hypothesis may indicate either a failure of the orthogonality assumption of the instruments or that the instruments are weak. Hahn and Hausman (2002) suggest a two-step approach, based on this test, to decide which IV estimator, if any, should be used. This approach may provide a helpful guard against weak instruments. Furthermore, it is relatively straightforward to use, and it could prevent the researcher from relying on results obtained with weak instruments.
However, although IV methods are attractive in theory, they can be difficult to apply in practice because it may prove difficult to locate ‘good’ IVs, as indicated by the Hahn and Hausman test. As a possible solution, we next consider Lewbel’s (1997) method for computing instrumental variables from the data at hand and demonstrate that this method could potentially be extended to multilevel models with general Xη–dependencies.
6.5.2 Internal instruments: Lewbel’s approach
Lewbel (1997) provides a method for constructing internal instruments when Xη–dependencies exist. This approach was originally proposed in the context of measurement error models, but we argue that it is also useful in the context of general regressor-error correlation (see also appendix 6C). To the best of our knowledge, the construction of internal instruments from the available data in multilevel models with Xη–dependencies has not been addressed before. Lewbel’s (1997) idea is based on the observation that when the endogenous regressor in model (6.2) has a skewed distribution, the following transformations of the available data may yield valid instruments:
$$
\begin{aligned}
v_{1ij} &= (y_{ij} - \bar{y})(x_{ij} - \bar{x})\\
v_{2ij} &= (y_{ij} - \bar{y})^2\\
v_{3ij} &= (x_{ij} - \bar{x})^2 \qquad (6.5)
\end{aligned}
$$
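A minimal sketch of how the instruments in (6.5) might be constructed and plugged into a 2SLS regression. The data-generating process (an exponentially distributed, hence skewed, component θ plus a normal component ν correlated with η, and independent random intercepts) and all parameter values are illustrative assumptions, not the design behind table 6.8; the helper names `lewbel_instruments` and `two_sls` are hypothetical.

```python
import numpy as np


def two_sls(y, X, Z):
    """2SLS: (X' P_Z X)^{-1} X' P_Z y, computed without forming the n-by-n matrix P_Z."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first stage: P_Z X
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)


def lewbel_instruments(y, x):
    """Internal instruments v1, v2, v3 of (6.5), built around the sample means."""
    yd, xd = y - y.mean(), x - x.mean()
    return np.column_stack([yd * xd, yd ** 2, xd ** 2])


rng = np.random.default_rng(0)
n, m = 1000, 10                                  # groups, observations per group (assumed)
beta0, beta1 = 10.0, 2.0
g = np.repeat(np.arange(n), m)                   # group index i of each observation
alpha = rng.normal(0.0, 1.0, n)[g]               # random intercepts alpha_i
nu, eta = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], n * m).T
theta = rng.exponential(2.0, n * m)              # skewed component of the regressor
x = theta + nu                                   # x is correlated with eta through nu
y = beta0 + beta1 * x + alpha + eta

X = np.column_stack([np.ones(n * m), x])
Z = np.column_stack([np.ones(n * m), lewbel_instruments(y, x)])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_lewbel = two_sls(y, X, Z)
print("OLS    (beta0, beta1):", b_ols.round(3))
print("Lewbel (beta0, beta1):", b_lewbel.round(3))
```

In this design the OLS slope should be pulled upward by $\mathrm{cov}(\nu, \eta)$, whereas 2SLS with the constructed instruments should land near $\beta_1 = 2$. Including a constant among the instruments means only the variation of $v_{1ij}, v_{2ij}, v_{3ij}$ around their means is used.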
The results in table 6.8 illustrate the internal instrumental variable approach for model (6.2) and compare it with the external instrumental variable approach of the previous subsection. The same simulated data as in table 6.7 were used, where the endogenous regressor was generated as $x_{ij} = c + v_{ij} + \varphi_{ij}$, with $\rho_{\varphi,\eta} = 0.3$ and $v_{ij}$ the exogenous instrument. We compare Lewbel’s approach with the benchmark in which the $v_{ij}$ are assumed to be observed instruments. Thus the multilevel IV results in table 6.8 are the same as in table 6.7 and were obtained by using the $v_{ij}$ as ‘observed’ instruments, whereas the Lewbel approach uses the constructed instruments in (6.5) instead. Table 6.8 indicates that the Lewbel IVs may yield approximately unbiased results. Using these IVs is less efficient than using the true observed IVs, which is not surprising because the former use less information. Nevertheless, the Lewbel approach appears quite promising, since it provides a way to construct instruments from the available data. These instruments can be used either alone or to augment a set of existing instruments in order to improve efficiency.
6.6 Discussion and future research
Although the previous discussion may suggest that dependencies between regressors and random components can be adequately addressed in multilevel models, much care is required in using these methods in actual applications. First, the estimation methods and test procedures to solve and test for Xα–dependencies rely critically on the independence of $X$ and $\eta$. Second, methods that rely on IVs are known to be biased in small samples, and standard asymptotic results break down when instruments are weak (i.e. poorly correlated with $X$). This holds in particular for the IV-based methods to solve for Xη–dependencies and for the Hausman-Taylor estimator to solve for Xα–dependencies.

Table 6.8: Results of multilevel and Lewbel’s internal IVs for cases (iii) and (iv) compared with (i). True values: $\beta_0 = 10$, $\beta_1 = 2$, and $\sigma^2_\alpha = \sigma^2_\eta = 1$.

| Case | | (i) | (iii) | (iv) |
|---|---|---|---|---|
| Multilevel IV | $\beta_0$ | 10.00 (0.21) | 9.98 (0.14) | 9.98 (0.20) |
| | $\beta_1$ | 1.99 (0.07) | 2.00 (0.07) | 2.01 (0.07) |
| | $\sigma^2_\varepsilon$ | 1.99 (0.12) | 1.98 (0.14) | 1.99 (0.15) |
| Lewbel IV | $\beta_0$ | 10.05 (0.75) | 9.86 (0.61) | 9.81 (0.58) |
| | $\beta_1$ | 1.98 (0.28) | 2.05 (0.23) | 2.07 (0.22) |
| | $\sigma^2_\varepsilon$ | 2.04 (0.16) | 2.00 (0.17) | 1.97 (0.21) |
Although issues concerning the validity and the number of instrumental variables have primarily been investigated in cross-sectional applications, they are clearly relevant for multilevel applications as well. For instance, when in the simulation study of table 6.4 the instrument $x_{1i\cdot}$ is only weakly correlated with the endogenous regressor $z_2$, as in $z_{2i} = 0.01 \times x_{1i\cdot} + 0.01 \times z_{1i} + \zeta_i$, where $\zeta_i$ is a random component correlated with $\alpha_i$, and all other input parameters are unchanged, the Hausman-Taylor estimator yields $\gamma_2 = 3.45\ (21.72)$ and $\sigma^2_\alpha = 238.01\ (2611.22)$. Similar observations can be made for the ‘external’ multilevel IV estimates of table 6.7. To deal with these problems, Bound, Jaeger and Baker (1995) suggest that the $R^2$ or the $F$ statistic of the regression of the endogenous regressors on the instruments serves as a rough guide to the quality of the instruments and should routinely be reported. The Hahn and Hausman (2002) test, or the method suggested by Donald and Newey (2001) for choosing the number of instruments, could potentially be extended to serve as a guide for identifying and selecting ‘valid’ instruments for the Hausman-Taylor estimator or multilevel IV estimators.
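The rough guide of Bound, Jaeger and Baker (1995) can be computed as in the sketch below: the $R^2$ and the $F$ statistic for the joint significance of the excluded instruments in a first-stage regression on a constant and the instruments. This is a sketch only; it ignores the multilevel error structure (no clustering), and the function name and simulated coefficients are hypothetical. A commonly cited rule of thumb (Staiger and Stock, 1997) treats a first-stage $F$ below about 10 as a warning sign.

```python
import numpy as np


def first_stage_strength(x, Z):
    """R^2 and F statistic from regressing x on a constant and the excluded instruments Z."""
    N, q = Z.shape
    A = np.column_stack([np.ones(N), Z])
    resid = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
    r2 = 1.0 - resid @ resid / np.sum((x - x.mean()) ** 2)
    F = (r2 / q) / ((1.0 - r2) / (N - q - 1))   # test of all q instrument coefficients = 0
    return r2, F


rng = np.random.default_rng(1)
N = 1000
v = rng.normal(size=N)                          # candidate instrument
x_strong = 1.00 * v + rng.normal(size=N)        # instrument explains about half of var(x)
x_weak = 0.01 * v + rng.normal(size=N)          # instrument is nearly irrelevant
for label, x in [("strong", x_strong), ("weak", x_weak)]:
    r2, F = first_stage_strength(x, v[:, None])
    print(f"{label:6s}  R2 = {r2:.4f}  F = {F:.1f}")
```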
Further, it is often suggested in cross-sectional applications to use the ‘limited information maximum likelihood’ (LIML) estimator instead of least squares estimators, since it is found to be less sensitive to weak instruments (e.g. Davidson and MacKinnon, 1993, Staiger and Stock, 1997). To the best of our knowledge, this issue has not been addressed for multilevel models, but it may deserve attention, because LIML may lead to improved results for the Hausman-Taylor estimator or the multilevel IV estimators discussed in section 6.5.
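For reference, LIML can be written as a k-class estimator $\hat\beta_\kappa = [X'(I - \kappa M_Z)X]^{-1}X'(I - \kappa M_Z)y$, whose $\kappa$ is the smallest eigenvalue of $(Y_0'M_ZY_0)^{-1}(Y_0'M_WY_0)$. Here $Y_0$ stacks the dependent variable and the endogenous regressors, $W$ holds the included exogenous regressors, and $Z = [W, Z_{ex}]$ contains all instruments; $\kappa = 0$ gives OLS and $\kappa = 1$ gives 2SLS. The sketch is a cross-sectional, homoskedastic version (not a multilevel LIML), and the function names and simulated design are hypothetical.

```python
import numpy as np


def resid(A, B):
    """Residuals from regressing the columns of B on A, i.e. M_A B."""
    return B - A @ np.linalg.lstsq(A, B, rcond=None)[0]


def k_class(y, X, Z, kappa):
    """k-class estimator: kappa = 0 gives OLS, kappa = 1 gives 2SLS."""
    MX = resid(Z, X)                                  # M_Z X
    A = X.T @ X - kappa * (X.T @ MX)
    b = X.T @ y - kappa * (MX.T @ y)
    return np.linalg.solve(A, b)


def liml(y, Yend, W, Zex):
    """LIML for y = Yend*beta + W*gamma + u; returns (coefficients [beta, gamma], kappa)."""
    Z = np.column_stack([W, Zex])
    Y0 = np.column_stack([y, Yend])
    S_Z = Y0.T @ resid(Z, Y0)                         # Y0' M_Z Y0
    S_W = Y0.T @ resid(W, Y0)                         # Y0' M_W Y0
    kappa = np.min(np.linalg.eigvals(np.linalg.solve(S_Z, S_W)).real)
    X = np.column_stack([Yend, W])
    return k_class(y, X, Z, kappa), kappa


rng = np.random.default_rng(2)
N = 5000
Zex = rng.normal(size=(N, 3))                         # three excluded instruments (assumed)
u, e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], N).T
x = Zex @ np.array([0.5, 0.3, 0.2]) + e               # endogenous regressor
y = 1.0 + 2.0 * x + u
W = np.ones((N, 1))                                   # included exogenous: intercept only
b_liml, kappa = liml(y, x[:, None], W, Zex)
b_2sls = k_class(y, np.column_stack([x, W]), np.column_stack([W, Zex]), 1.0)
print("LIML slope:", b_liml[0].round(3), " 2SLS slope:", b_2sls[0].round(3), " kappa:", kappa.round(4))
```

In the just-identified case (one excluded instrument per endogenous regressor), $\kappa = 1$ and LIML coincides with 2SLS; the two estimators only differ when the model is over-identified.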
Finally, the Lewbel approach has been shown to yield consistent results for a simple multilevel model with Xη–dependency. This method deserves more attention and could potentially be powerful in situations where no instruments, or only weak ones, are available. The performance of this method depends critically on its underlying assumptions, as shown in Lewbel (1997) and Wansbeek and Meijer (2000). Most importantly, the method may be sensitive to outliers because it relies on third-order moments. Furthermore, the constructed instruments are weak when the distribution of the endogenous regressor is not strongly skewed, and it is well known that IV estimators can then be seriously biased (Staiger and Stock, 1997, Bound, Jaeger and Baker, 1995, Wansbeek and Meijer, 2000). Additional work is therefore needed to determine the exact conditions under which this approach can be used effectively in multilevel applications.
In some applications where endogeneity arises, however, the nature of the data-generating process itself suggests suitable instruments. This holds in particular for measurement error models, autoregressive models, and simultaneous equation models. Possible approaches for measurement error models are discussed by Wansbeek and Meijer (2000), Carroll, Ruppert and Stefanski (1995), and Bowden and Turkington (1984). These models can be estimated using IV techniques, for instance by using other (potentially) mismeasured variables as instruments (see White, 2001). Another method, due to Wald (1940), assumes that the observations can be divided into groups. This classification should be independent of the error terms and discriminate between high and low values of the unobservable true construct (see also Madansky, 1959). Lewbel’s (1997) idea presented in subsection 6.5.2 was originally proposed to solve measurement error problems. We showed, however, that this approach can be fruitfully applied to general IV problems as well and deserves more attention. In autoregressive models, one can often use lagged dependent or independent variables as instruments (see for instance White, 2001, or Wooldridge, 2002). Similarly, in simultaneous equation models, instruments for each equation can be obtained from the set of exogenous variables excluded from that equation (e.g. Greene, 2000).
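A sketch of Wald’s (1940) grouping idea for a classical measurement error model: the observations are split into a ‘high’ and a ‘low’ group by an auxiliary variable assumed to be independent of both errors, and the slope is taken through the two group means. This is equivalent to IV with a binary instrument. The auxiliary variable `s`, the function name, and the parameter values are illustrative assumptions.

```python
import numpy as np


def wald_grouping(y, x, high):
    """Wald-type grouping estimator: slope through the means of the 'high' and 'low' groups."""
    low = ~high
    b1 = (y[high].mean() - y[low].mean()) / (x[high].mean() - x[low].mean())
    b0 = y.mean() - b1 * x.mean()
    return b0, b1


rng = np.random.default_rng(3)
N = 10_000
xi = rng.normal(size=N)                       # unobserved true construct
x = xi + rng.normal(size=N)                   # observed with measurement error
y = 1.0 + 2.0 * xi + rng.normal(size=N)
s = xi + rng.normal(scale=0.5, size=N)        # auxiliary variable, independent of both errors
high = s > np.median(s)                       # classification into high and low values

b0_w, b1_w = wald_grouping(y, x, high)
b1_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(f"Wald slope = {b1_w:.3f}, OLS slope = {b1_ols:.3f}")
```

Here the OLS slope is attenuated toward $2 \cdot \mathrm{var}(\xi)/\mathrm{var}(x) = 1$, while the grouping estimator targets the true slope, provided the classification is unrelated to both error terms.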
Our discussion did not address estimation methods for (general) random coefficient models and nonlinear models (such as probit or logit models) with endogenous regressors. In both cases, however, similar reasoning applies as for linear (random intercept) models. Bowden and Turkington (1984) discuss IV approaches for nonlinear models with additive disturbances (i.e. $y = g(\theta, x) + \varepsilon$). Techniques developed for linear models, in particular (generalized) method of moments (GMM) techniques, can be used to estimate these models. Blundell and Powell (2001a,b) investigate endogeneity issues in several generalizations of the linear model. These authors discuss the extent to which commonly used methods for linear models can be applied to the generalized models and show that the applicability of these methods depends on the structural form.
Random coefficient models assume that differences between the level-2 objects are reflected not only by different intercepts, as in model (6.1), but also by different slope coefficients. These models can be written as $y_i = X_i\beta_i + \eta_i$, where $\beta_i = \beta + \mu_{\beta,i}$, with $\mu_{\beta,i}$ a random component having mean zero, $E(\mu_{\beta,i}\mu_{\beta,i}') = \Delta$, and $E(\mu_{\beta,i}\mu_{\beta,j}') = 0$ for $i = 1, \ldots, n$ and $j \neq i$. As for random intercept models, the choice between a fixed–effects approach (in fact a seemingly unrelated regression framework) and a random–effects approach (a random-coefficients framework) depends on potential correlation between the random coefficients and the explanatory variables. If such dependencies are present, which is sometimes referred to as ‘heterogeneity bias’, the random–effects estimator of $\beta$ is biased and a fixed–effects approach should be used. Pudney (1978) provides a test of the null hypothesis that the explanatory variables are not correlated with the random coefficients. This test is based on the sample covariance between the (standard) least squares estimators of $\beta_i$ and the means of the explanatory variables for each individual (see also Chamberlain, 1982).
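The ingredient of Pudney’s (1978) test can be sketched as follows: per-group OLS slopes, the group means of the regressor, and their sample covariance across groups. This is not Pudney’s test statistic itself (its standardization is not reproduced here); the random-slope design and the function names are hypothetical.

```python
import numpy as np


def slope_mean_covariance(y, x, g):
    """Sample covariance across groups between per-group OLS slopes and group means of x."""
    slopes, means = [], []
    for k in np.unique(g):
        xk, yk = x[g == k], y[g == k]
        xc = xk - xk.mean()
        slopes.append(xc @ (yk - yk.mean()) / (xc @ xc))   # OLS slope within group k
        means.append(xk.mean())
    return np.cov(slopes, means)[0, 1]


def simulate(rng, dependent, n=300, m=20):
    """Random slopes beta_i = 2 + mu_i, with mu_i optionally correlated with the level of x."""
    g = np.repeat(np.arange(n), m)
    c = rng.normal(size=n)                                  # group-level location of x
    mu = (0.5 * c if dependent else 0.0) + rng.normal(scale=0.5, size=n)
    x = c[g] + rng.normal(size=n * m)
    y = 1.0 + (2.0 + mu)[g] * x + rng.normal(size=n * m)
    return y, x, g


rng = np.random.default_rng(4)
cov_dep = slope_mean_covariance(*simulate(rng, dependent=True))
cov_ind = slope_mean_covariance(*simulate(rng, dependent=False))
print(f"covariance with dependence: {cov_dep:.3f}, without: {cov_ind:.3f}")
```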
In general, we conclude that much remains to be done before problems of endogeneity in multilevel models can be adequately addressed. We showed that even small violations of the independence structure result in biased estimates of the parameters of interest. In table 6.1 we distinguished four cases of (in)dependence relations among the level-1 regressors and the random components, without distinguishing between level-1 and level-2 regressors. If this distinction is introduced, 15 instead of three possible cases of violations of the independence assumptions emerge. Each of these combinations could lead to different biases in the estimators discussed in this chapter. Although it is possible to apply the methods presented here to the various cases, detailed studies are necessary to assess their performance in practice. Clearly, endogeneity problems require much more attention than they receive in current applications of multilevel models.
In the next chapter we introduce a nonparametric Bayesian latent instrumental variables approach to estimate models with regressor-error dependencies at various stages of the model. The nonparametric Bayes model can be applied to multilevel models and does not impose restrictions on the distribution of the latent instrument. Hence, it generalizes the LIV model introduced in chapters 3 and 4.
Appendix 6A Classical instrumental variables (IV) estimation
The (single-level) standard linear regression model for $n$ observations is given by

$$y = X\beta + \varepsilon, \qquad (6A.1)$$

where $X \in \mathbb{R}^{n\times k}$ contains the regressors, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ are the unobserved, independently and identically distributed errors with mean 0 and variance $\sigma^2_\varepsilon I_n$, and $y$ is an $n \times 1$ vector of dependent variables. The ordinary least squares (OLS) estimator is given by $\hat\beta^{OLS}_n = (X'X)^{-1}X'y$. If $E(\varepsilon|X) = 0$, $\hat\beta^{OLS}_n$ is unbiased and BLUE (e.g. White, 2001).
In large samples, instrumental variables techniques can be used when this assumption is not met. Instrumental variables (IVs), collected in a matrix $V \in \mathbb{R}^{n\times m}$, should be uncorrelated with the error term $\varepsilon$, i.e. $E(\varepsilon|V) = 0$, meaning that the instruments cannot have a direct effect on $y$ (external instruments). Furthermore, the instruments should explain part of the variability in the endogenous regressors. Once instruments are available (and $m \geq k$), two-stage least squares techniques, for example, can be used to obtain better estimates of $\beta$. The ‘classical’ IV estimator for model (6A.1) is computed as $\hat\beta^{IV}_n = (X'P_V X)^{-1}X'P_V y$, where $P_V = V(V'V)^{-1}V'$ (Bowden and Turkington, 1984, White, 2001, or Greene, 2000).
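A direct transcription of the ‘classical’ IV formula, forming $P_V$ explicitly (reasonable only for small $n$). In the just-identified case ($m = k$) it reduces to $(V'X)^{-1}V'y$, which the check after the block confirms numerically. The simulated design and the function name are illustrative assumptions.

```python
import numpy as np


def iv_estimator(y, X, V):
    """Classical IV: (X' P_V X)^{-1} X' P_V y with P_V = V (V'V)^{-1} V'."""
    P_V = V @ np.linalg.solve(V.T @ V, V.T)          # n-by-n projection onto the columns of V
    return np.linalg.solve(X.T @ P_V @ X, X.T @ P_V @ y)


rng = np.random.default_rng(6)
n = 1000
v = rng.normal(size=n)                                # external instrument
e, u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], n).T
x = v + e                                            # endogenous: corr(e, u) = 0.7
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
V = np.column_stack([np.ones(n), v])
b_iv = iv_estimator(y, X, V)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("IV :", b_iv.round(3), " OLS:", b_ols.round(3))
```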
When ‘valid’ instruments are available, a Hausman test (Hausman, 1978) can be used to test for regressor-error dependencies in model (6A.1). Under $H_0: E(\varepsilon|X) = 0$, the Hausman test statistic, computed as

$$H = (\hat\beta^{IV}_n - \hat\beta^{OLS}_n)'\,\Sigma^{-1}(\hat\beta^{IV}_n - \hat\beta^{OLS}_n), \qquad \Sigma = \mathrm{Var}(\hat\beta^{IV}_n) - \mathrm{Var}(\hat\beta^{OLS}_n),$$

has asymptotically a $\chi^2$ distribution. For a more detailed discussion of how to obtain an estimate of $\Sigma$ and how to determine the degrees of freedom (d.f.), see Greene (2000).
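A sketch of the Hausman statistic restricted to the coefficient of the single endogenous regressor, so that $\Sigma$ is a scalar and d.f. = 1. Both variances use the same OLS-based estimate of $\sigma^2$ under $H_0$, which keeps $\Sigma$ positive. With the intercept included, the full $2 \times 2$ difference matrix would be singular here (the constant is its own instrument), which is one reason the degrees of freedom need care. The design and names are illustrative assumptions; under $H_0$ the statistic would be compared with a $\chi^2(1)$ critical value (3.84 at the 5% level).

```python
import numpy as np


def hausman_slope(y, X, V, j=1):
    """Hausman statistic for coefficient j, comparing IV with OLS (d.f. = 1)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b_ols = XtX_inv @ X.T @ y
    X_hat = V @ np.linalg.lstsq(V, X, rcond=None)[0]          # P_V X
    XPX_inv = np.linalg.inv(X_hat.T @ X)                      # (X' P_V X)^{-1}
    b_iv = XPX_inv @ X_hat.T @ y
    resid = y - X @ b_ols
    s2 = resid @ resid / (len(y) - X.shape[1])                # sigma^2 estimated under H0
    var_diff = s2 * (XPX_inv[j, j] - XtX_inv[j, j])           # Var(b_iv) - Var(b_ols)
    return (b_iv[j] - b_ols[j]) ** 2 / var_diff


def simulate(rng, rho, n=1000):
    """One instrument v; rho is the correlation between the first-stage error and u."""
    v = rng.normal(size=n)
    e, u = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], n).T
    x = v + e
    y = 1.0 + 2.0 * x + u
    return y, np.column_stack([np.ones(n), x]), np.column_stack([np.ones(n), v])


rng = np.random.default_rng(7)
H_exog = hausman_slope(*simulate(rng, rho=0.0))
H_endog = hausman_slope(*simulate(rng, rho=0.7))
print(f"H (exogenous x) = {H_exog:.2f}, H (endogenous x) = {H_endog:.2f}")
```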
Appendix 6B Estimation for the hierarchical linear model
The parameters $\beta$ in the multilevel model (6.1) can be estimated by either fixed–effects methods (which treat the $\alpha_i$ as fixed parameters for $i = 1, \ldots, n$) or random–effects methods (which treat the $\alpha_i$ as draws from a distribution). The fixed–effects estimator of $\beta$, also known as the within-groups or covariance estimator, can be computed as a simple regression on the transformed equation (6B.1), obtained by averaging (6.1) across $j$ for every $i$ and subtracting the result from (6.1):

$$y_{ij} - \bar{y}_i = (X_{ij} - \bar{X}_i)\beta + (\eta_{ij} - \bar{\eta}_i), \qquad (6B.1)$$
where $\bar{y}_i = (1/n_j)\sum_j y_{ij}$, and similarly for $\bar{X}_i$ and $\bar{\eta}_i$. Now $\alpha_i$ and $Z_i\gamma$ drop out, and thus $\gamma$ is not identifiable from (6B.1). An alternative is to replace all group effects by dummy variables and apply OLS to the equation $y_{ij} = \sum_k \alpha_k d_{kij} + X_{ij}\beta + \eta_{ij}$, where $d_{kij} = 1$ if $k = i$ and 0 otherwise. The resulting estimator of $\beta$ is known as the least squares dummy variable (LSDV) estimator and is exactly identical to the fixed–effects estimator of $\beta$ from (6B.1). The OLS estimator in (6B.1) is consistent and unbiased if the constructed regressors $X_{ij} - \bar{X}_i$ are independent of the constructed errors $\eta_{ij} - \bar{\eta}_i$. This implies that $E(X_{ij}\eta_{il}) = 0$ for all $i, j, l$.
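A sketch contrasting the within transformation (6B.1) with the LSDV regression on group dummies; the two slope estimates should coincide up to rounding. The design introduces Xα-dependence (the regressor shares the group effect), under which pooled OLS is biased but both fixed-effects versions are not. Names and values are illustrative assumptions.

```python
import numpy as np


def group_demean(a, g):
    """Subtract group means (over j, within each group i) from every row of a."""
    out = np.array(a, dtype=float)
    for k in np.unique(g):
        idx = g == k
        out[idx] -= out[idx].mean(axis=0)
    return out


rng = np.random.default_rng(8)
n, m = 200, 5
g = np.repeat(np.arange(n), m)
alpha = rng.normal(size=n)
x = alpha[g] + rng.normal(size=n * m)           # x correlated with alpha_i
y = 2.0 * x + alpha[g] + rng.normal(size=n * m)

X = x[:, None]
b_within = np.linalg.lstsq(group_demean(X, g), group_demean(y, g), rcond=None)[0]
D = (g[:, None] == np.arange(n)).astype(float)  # group dummies d_kij
b_lsdv = np.linalg.lstsq(np.column_stack([D, X]), y, rcond=None)[0]
b_pooled = np.linalg.lstsq(np.column_stack([np.ones(n * m), X]), y, rcond=None)[0]
print("within:", b_within.round(4), " LSDV slope:", b_lsdv[-1:].round(4),
      " pooled OLS slope:", b_pooled[1].round(4))
```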
The random–effects estimator provides an important alternative under the assumption that the $\alpha_i$’s are i.i.d. random variables. Now $\varepsilon_{ij} = \alpha_i + \eta_{ij}$ in (6.1) is the composite (random) error term. The OLS estimator of $\beta$ and $\gamma$ is consistent and unbiased, but not fully efficient. Combining all observations, we can rewrite model (6.1) as

$$y = X\beta + Z\gamma + \varepsilon = W\delta + \varepsilon, \qquad (6B.2)$$
where $W = [X : Z]$ and $\delta = (\beta', \gamma')'$, and the other symbols are defined accordingly by stacking. For known $\Omega = \mathrm{Var}(\varepsilon_i)$, with $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{in_j})'$, the generalized least squares (GLS) estimator of $\beta$ and $\gamma$,

$$\hat\delta^{GLS}_n = \left(W'(I_n \otimes \Omega^{-1})W\right)^{-1} W'(I_n \otimes \Omega^{-1})y,$$

is efficient. However, when $\Omega$ is not known, it needs to be estimated, yielding a feasible GLS estimator. A feasible GLS estimator can be obtained in several ways; we use the method explained in Verbeek (2000, p. 317). The GLS estimator can be shown to be a weighted average of the fixed–effects estimator computed from (6B.1) and the so-called between estimator, which is the OLS estimator in the model

$$\bar{y}_i = \bar{X}_i\beta + Z_i\gamma + \alpha_i + \bar{\eta}_i \qquad (6B.3)$$

for $i = 1, \ldots, n$. The latter estimator ignores the within-group information and exploits only differences between groups. For more details on the computation of the weighting matrix, see Verbeek (2000), Hsiao (1986), or Baltagi (2001). Several other random–effects estimation procedures for model (6.1) are available, including the iterative GLS (IGLS) approach, (restricted) maximum likelihood (REML), and Bayesian procedures (see Goldstein, 1995, or Longford, 1993).
From standard OLS results, it follows that the between estimator of $\beta$ and $\gamma$ from (6B.3) is consistent and unbiased when the constructed regressors $\bar{X}_i$ and $Z_i$ are independent of $\alpha_i$ and $\bar{\eta}_i$. The fixed–effects estimator from (6B.1) is consistent and unbiased when $E(X_{ij}\eta_{il}) = 0$ for all $i, j, l$. If both conditions hold, the random–effects estimator of $\beta$ and $\gamma$ is consistent and unbiased.
In the simulation studies, the variance of $\eta_{ij}$ was estimated as the sum of the squared residuals from model (6B.1) divided by $n(m-1) - (k + l)$, where $n_j = m$ for all $j$ in our case. The variance of $\alpha_i$ is estimated as $\hat\sigma^2_\alpha = \hat\sigma^2_B - \frac{1}{m}\hat\sigma^2_\eta$, where $\hat\sigma^2_B$ is estimated as the sum of the squared residuals from (6B.3) divided by $n$.
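A sketch of one feasible GLS route for a balanced panel sorted by group (so that reshaping to $n \times m$ works): $\hat\sigma^2_\eta$ from the within residuals, $\hat\sigma^2_\alpha = \hat\sigma^2_B - \hat\sigma^2_\eta/m$ from the between regression (6B.3), and then OLS on quasi-demeaned data with weight $\lambda = 1 - \sqrt{\hat\sigma^2_\eta/(\hat\sigma^2_\eta + m\hat\sigma^2_\alpha)}$. The quasi-demeaning step is a standard textbook form of random-effects GLS, used here as an assumption; it is not necessarily the exact computation in Verbeek (2000). The sketch has no level-2 regressors $Z$, so the within degrees of freedom are taken as $n(m-1) - k$; names and values are hypothetical.

```python
import numpy as np


def re_fgls(y, X, n, m):
    """Feasible GLS random-effects estimator for a balanced panel sorted by group.

    X holds the level-1 regressors without a constant; a constant is added internally.
    Returns (coefficients [const, slopes...], sigma2_eta, sigma2_alpha).
    """
    k = X.shape[1]
    y_bar = y.reshape(n, m).mean(axis=1)                 # group means of y
    X_bar = X.reshape(n, m, k).mean(axis=1)              # group means of X
    # Within regression (6B.1): residual variance gives sigma^2_eta
    y_w = y - np.repeat(y_bar, m)
    X_w = X - np.repeat(X_bar, m, axis=0)
    b_w = np.linalg.lstsq(X_w, y_w, rcond=None)[0]
    s2_eta = np.sum((y_w - X_w @ b_w) ** 2) / (n * (m - 1) - k)
    # Between regression (6B.3) on group means: residual variance gives sigma^2_B
    A_b = np.column_stack([np.ones(n), X_bar])
    e_b = y_bar - A_b @ np.linalg.lstsq(A_b, y_bar, rcond=None)[0]
    s2_B = e_b @ e_b / n
    s2_alpha = max(s2_B - s2_eta / m, 0.0)               # truncate at zero if negative
    # Quasi-demeaning: lam = 0 gives pooled OLS, lam -> 1 approaches the within estimator
    lam = 1.0 - np.sqrt(s2_eta / (s2_eta + m * s2_alpha))
    y_s = y - lam * np.repeat(y_bar, m)
    X_s = np.column_stack([np.full(n * m, 1.0 - lam), X - lam * np.repeat(X_bar, m, axis=0)])
    coef = np.linalg.lstsq(X_s, y_s, rcond=None)[0]
    return coef, s2_eta, s2_alpha


rng = np.random.default_rng(9)
n, m = 500, 5
g = np.repeat(np.arange(n), m)                           # observations sorted by group
x = rng.normal(size=(n * m, 1))                          # exogenous level-1 regressor
y = 10.0 + 2.0 * x[:, 0] + rng.normal(size=n)[g] + rng.normal(size=n * m)
coef, s2_eta, s2_alpha = re_fgls(y, x, n, m)
print("beta:", coef.round(3), " sigma2_eta:", round(s2_eta, 3), " sigma2_alpha:", round(s2_alpha, 3))
```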
Multilevel instrumental variables estimators
Two IV estimators that yield consistent estimates of the parameters in model (6B.2) in the presence of Xη–dependencies are the multilevel 2SLS estimator, given by

$$\hat\delta^{2SLS}_n = (W'P_V W)^{-1}W'P_V y, \qquad (6B.4)$$

where $P_V = V(V'V)^{-1}V'$, and the multilevel 3SLS estimator, given by

$$\hat\delta^{3SLS}_n = (W'\tilde{P}_V W)^{-1}W'\tilde{P}_V y, \qquad (6B.5)$$

with $\tilde{P}_V = V\left(V'(I_n \otimes \hat\Omega)V\right)^{-1}V'$, where $\hat\Omega$ can be estimated from the residuals of a 2SLS estimation. As in appendix 6A, $V$ is a set of (external) instruments. For more details, see Wooldridge (2002), Im et al. (1999), or Bowden and Turkington (1984).
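A sketch of (6B.4) and (6B.5) for a balanced panel sorted by group, written in the equivalent form $(W'VS^{-1}V'W)^{-1}W'VS^{-1}V'y$ with $S = V'V$ for 2SLS and $S = V'(I_n \otimes \hat\Omega)V$ for 3SLS, so that no $nm \times nm$ matrix is formed. Here $\hat\Omega$ is an unrestricted $m \times m$ covariance estimate from the 2SLS residuals; imposing the random-effects structure $\sigma^2_\eta I_m + \sigma^2_\alpha \iota\iota'$ would be an alternative. The design loosely mimics section 6.5.1 ($x_{ij} = c + v_{ij} + \varphi_{ij}$ with $\rho_{\varphi,\eta} = 0.3$), but adds a second instrument so the model is over-identified (in the just-identified case both estimators reduce to simple IV); $c = 1$ and all other values are assumptions.

```python
import numpy as np


def gmm_iv(y, W, V, S):
    """(W'V S^{-1} V'W)^{-1} W'V S^{-1} V'y; S = V'V gives 2SLS."""
    VW, Vy = V.T @ W, V.T @ y
    A = VW.T @ np.linalg.solve(S, VW)
    b = VW.T @ np.linalg.solve(S, Vy)
    return np.linalg.solve(A, b)


def multilevel_2sls_3sls(y, W, V, n, m):
    """2SLS (6B.4), then 3SLS (6B.5) with Omega estimated from the 2SLS residuals."""
    d2 = gmm_iv(y, W, V, V.T @ V)
    U = (y - W @ d2).reshape(n, m)                   # row i holds the residuals of group i
    omega = U.T @ U / n                              # unrestricted m-by-m estimate of Var(eps_i)
    Vg = V.reshape(n, m, -1)
    S = np.einsum("ijq,jk,ikr->qr", Vg, omega, Vg)   # V'(I_n kron Omega)V
    return d2, gmm_iv(y, W, V, S)


rng = np.random.default_rng(10)
n, m = 500, 5
g = np.repeat(np.arange(n), m)
alpha = rng.normal(size=n)[g]
phi, eta = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], n * m).T
v1, v2 = rng.normal(size=(2, n * m))                 # external level-1 instruments
x = 1.0 + v1 + 0.5 * v2 + phi                        # endogenous through phi
y = 10.0 + 2.0 * x + alpha + eta

W = np.column_stack([np.ones(n * m), x])
V = np.column_stack([np.ones(n * m), v1, v2])
d2, d3 = multilevel_2sls_3sls(y, W, V, n, m)
print("2SLS:", d2.round(3), " 3SLS:", d3.round(3))
```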
Appendix 6C Lewbel’s instruments in a simple multilevel model
In this appendix we extend Lewbel’s method (Lewbel, 1997) to a multilevel application without measurement error. We argue that, under certain conditions, instruments can be constructed from the observed data that can be used with, for instance, two-stage least squares to obtain consistent estimates of the regression parameters. We apply the results derived here in subsection 6.5.2. Consider the following multilevel model:

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \alpha_i + \eta_{ij} \qquad (6C.1)$$
$$x_{ij} = \theta_{ij} + \nu_{ij}, \qquad (6C.2)$$

where $i = 1, \ldots, n$, $j = 1, \ldots, m$, $\alpha_i \sim N(0, \sigma^2_\alpha)$, $\eta_{ij} \sim N(0, \sigma^2_\eta)$, and $\nu_{ij} \sim N(0, \sigma^2_\nu)$. The unobserved component $\theta_{ij}$ has mean $\mu_\theta$ and variance $\sigma^2_\theta$, and is independent of $\alpha_i$, $\eta_{ij}$, and $\nu_{ij}$. Furthermore, we assume that $E\,\eta_{ij}\alpha_i = 0$. The regressor $x_{ij}$ cannot be assumed independent of the random components, i.e. $E\,\nu_{ij}\eta_{ij} = \sigma_{\nu\eta} \neq 0$ and/or $E\,\nu_{ij}\alpha_i = \sigma_{\nu\alpha} \neq 0$, for all $i, j$. Let
$$\tilde{y}_{ij} = y_{ij} - E\,y_{ij}, \qquad \tilde{x}_{ij} = x_{ij} - E\,x_{ij}, \qquad \tilde{\theta}_{ij} = \theta_{ij} - E\,\theta_{ij}, \qquad (6C.3)$$
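and consider the following transformation of the model given in (6C.1) and (6C.2) (i.e. subtract $E\,y_{ij}$ and $E\,x_{ij}$ from (6C.1) and (6C.2)):

$$\tilde{y}_{ij} = \beta_1\tilde{x}_{ij} + \alpha_i + \eta_{ij} \qquad (6C.4)$$
$$\tilde{x}_{ij} = \tilde{\theta}_{ij} + \nu_{ij}. \qquad (6C.5)$$

The reduced form is given by

$$\tilde{y}_{ij} = \beta_1\tilde{\theta}_{ij} + \beta_1\nu_{ij} + \alpha_i + \eta_{ij} \qquad (6C.6)$$
$$\tilde{x}_{ij} = \tilde{\theta}_{ij} + \nu_{ij}. \qquad (6C.7)$$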
OLS (or GLS) estimation of (6C.4) fails because $\tilde{x}$ is not independent of the random components. Lewbel proposes the following instrumental variables:

$$z_{1ij} = \tilde{x}_{ij}\tilde{y}_{ij} \qquad (6C.8)$$
$$z_{2ij} = \tilde{y}^2_{ij} \qquad (6C.9)$$
$$z_{3ij} = \tilde{x}^2_{ij} \qquad (6C.10)$$

These IVs should satisfy the following conditions:

1. $E\,z_{sij}\alpha_i = 0$;
2. $E\,z_{sij}\eta_{ij} = 0$;
3. $E\,z_{sij}\tilde{x}_{ij} \neq 0$,

with $s = 1, 2, 3$. In the following we check these conditions for $s = 1$.
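Proof: $E\,z_{1ij}\alpha_i = 0$

We have

$$E\{z_{1ij}\alpha_i\} = E\{\tilde{x}_{ij}\tilde{y}_{ij}\alpha_i\} = E\{\beta_1\tilde{\theta}^2_{ij}\alpha_i + \beta_1\tilde{\theta}_{ij}\nu_{ij}\alpha_i + \tilde{\theta}_{ij}\alpha^2_i + \tilde{\theta}_{ij}\eta_{ij}\alpha_i + \beta_1\nu_{ij}\tilde{\theta}_{ij}\alpha_i + \beta_1\nu^2_{ij}\alpha_i + \nu_{ij}\alpha^2_i + \nu_{ij}\eta_{ij}\alpha_i\}. \qquad (6C.11)$$

Since $\theta_{ij}$ is independent of $\alpha_i$, $\eta_{ij}$, and $\nu_{ij}$, so are $\tilde{\theta}_{ij}$ and $\tilde{\theta}^2_{ij}$. Each term containing $\tilde{\theta}_{ij}$ or $\tilde{\theta}^2_{ij}$ therefore factors into the expectation of the $\theta$-part times the expectation of the remaining random components, and it vanishes because either $E\,\tilde{\theta}_{ij} = 0$ or the remaining factor ($\alpha_i$) has mean zero. Thus we arrive at

$$E\{z_{1ij}\alpha_i\} = E\{\beta_1\nu^2_{ij}\alpha_i + \nu_{ij}\alpha^2_i + \nu_{ij}\eta_{ij}\alpha_i\}. \qquad (6C.12)$$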
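Using the law of iterated expectations, $E(x) = E\{E(x|y)\}$, and the standard result for bivariate normally distributed variables that $\mathcal{L}(y|x) = N(\gamma_0 + \gamma_1 x, \sigma^2_y(1-\rho^2))$ with $\gamma_0 = \mu_y - \gamma_1\mu_x$ and $\gamma_1 = \sigma_{xy}/\sigma^2_x$, it follows that the expectation in (6C.12) is zero. For instance,

$$E\{\beta_1\nu^2_{ij}\alpha_i\} = E\{E(\beta_1\nu^2_{ij}\alpha_i|\nu_{ij})\} = E\{\beta_1\nu^2_{ij}E(\alpha_i|\nu_{ij})\} = E\{\beta_1\nu^2_{ij}(\sigma_{\alpha\nu}/\sigma^2_\nu)\nu_{ij}\} = E\{c\,\nu^3_{ij}\} = 0,$$

since the third moment of a zero-mean normally distributed random variable is zero. The term $E\{\nu_{ij}\alpha^2_i\}$ follows in the same way by conditioning on $\alpha_i$. For $E\{\nu_{ij}\eta_{ij}\alpha_i\}$, condition on $(\nu_{ij}, \eta_{ij})$: assuming $(\alpha_i, \eta_{ij}, \nu_{ij})$ are jointly normal, $E(\alpha_i|\nu_{ij}, \eta_{ij})$ is linear in $\nu_{ij}$ and $\eta_{ij}$, so the term reduces to third-order moments of zero-mean jointly normal variables, which vanish.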
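Proof: $E\,z_{1ij}\eta_{ij} = 0$

Similar arguments as in the previous case yield

$$E\{z_{1ij}\eta_{ij}\} = E\{\tilde{x}_{ij}\tilde{y}_{ij}\eta_{ij}\} = E\{\beta_1\tilde{\theta}^2_{ij}\eta_{ij} + \beta_1\tilde{\theta}_{ij}\nu_{ij}\eta_{ij} + \tilde{\theta}_{ij}\eta_{ij}\alpha_i + \tilde{\theta}_{ij}\eta^2_{ij} + \beta_1\nu_{ij}\tilde{\theta}_{ij}\eta_{ij} + \beta_1\nu^2_{ij}\eta_{ij} + \nu_{ij}\alpha_i\eta_{ij} + \nu_{ij}\eta^2_{ij}\} = 0. \qquad (6C.13)$$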
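Proof: $E\,z_{1ij}\tilde{x}_{ij} \neq 0$

Finally, for the third condition we get

$$
\begin{aligned}
E\{z_{1ij}\tilde{x}_{ij}\} &= E\{\tilde{y}_{ij}\tilde{x}^2_{ij}\}\\
&= E\{\beta_1\tilde{\theta}^3_{ij} + \beta_1\nu_{ij}\tilde{\theta}^2_{ij} + \alpha_i\tilde{\theta}^2_{ij} + \eta_{ij}\tilde{\theta}^2_{ij} + 2\beta_1\tilde{\theta}^2_{ij}\nu_{ij} + 2\beta_1\nu^2_{ij}\tilde{\theta}_{ij}\\
&\qquad + 2\tilde{\theta}_{ij}\nu_{ij}\alpha_i + 2\tilde{\theta}_{ij}\nu_{ij}\eta_{ij} + \beta_1\nu^2_{ij}\tilde{\theta}_{ij} + \beta_1\nu^3_{ij} + \nu^2_{ij}\alpha_i + \eta_{ij}\nu^2_{ij}\}\\
&= \beta_1 E\,\tilde{\theta}^3_{ij}, \qquad (6C.14)
\end{aligned}
$$

which is nonzero as long as the third-order moment of $\tilde{\theta}$ does not vanish and $\beta_1 \neq 0$.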
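The three conditions for $z_{1ij}$ can be checked numerically. The sketch below draws $(\alpha, \eta, \nu)$ jointly normal with $E\,\eta\alpha = 0$ but $\sigma_{\nu\eta}, \sigma_{\nu\alpha} \neq 0$, and takes $\theta$ either exponential with scale 1 (skewed, $E\,\tilde\theta^3 = 2$) or normal (symmetric). It uses one draw per observation and ignores the grouping, which suffices for these per-observation moments; all values are illustrative assumptions. With the skewed $\theta$, the last sample moment should be close to $\beta_1 E\,\tilde\theta^3 = 4$; with the symmetric $\theta$ it should be close to zero, which is the weak-instrument situation discussed in section 6.6.

```python
import numpy as np


def z1_moments(theta, beta1, rng):
    """Sample analogues of E z1*alpha, E z1*eta and E z1*x~ for z1 = x~ * y~ (Appendix 6C)."""
    N = theta.size
    cov = [[1.0, 0.0, 0.3],            # (alpha, eta, nu): E eta*alpha = 0,
           [0.0, 1.0, 0.3],            # sigma_nu,eta = 0.3 and sigma_nu,alpha = 0.3
           [0.3, 0.3, 1.0]]
    alpha, eta, nu = rng.multivariate_normal(np.zeros(3), cov, N).T
    x_t = (theta - theta.mean()) + nu                 # x~ = theta~ + nu, as in (6C.5)
    y_t = beta1 * x_t + alpha + eta                   # y~ = beta1 x~ + alpha + eta, as in (6C.4)
    z1 = x_t * y_t
    return np.mean(z1 * alpha), np.mean(z1 * eta), np.mean(z1 * x_t)


rng = np.random.default_rng(11)
N, beta1 = 1_000_000, 2.0
skewed = z1_moments(rng.exponential(1.0, N), beta1, rng)
symmetric = z1_moments(rng.normal(1.0, 1.0, N), beta1, rng)
print("skewed theta   :", np.round(skewed, 3))
print("symmetric theta:", np.round(symmetric, 3))
```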
In order to estimate $\beta_1$ (say $\hat\beta^{Lewbel}_{1n}$), one first has to obtain consistent estimates of $E\,y_{ij}$ and $E\,x_{ij}$ ($= E\,\theta_{ij}$) to make the $z_{1ij}$’s operational. These estimates are easily obtained from the sample means of $y$ and $x$. An estimate of $\beta_0$ can then be computed using $E\,y_{ij} = \beta_0 + \beta_1 E\,x_{ij}$, which gives $\hat\beta^{Lewbel}_{0n} = \bar{y} - \hat\beta^{Lewbel}_{1n}\bar{x}$.

Note that similar calculations can be done for the instruments $z_{2ij} = \tilde{y}^2_{ij}$ and $z_{3ij} = \tilde{x}^2_{ij}$.
Chapter 7
A Nonparametric Bayesian LIV approach
7.1 Introduction
In this chapter we introduce two important extensions of the standard LIV model presented in chapters 3 and 4. First, we generalize the multinomial distribution of the latent instrument to a general distribution $G$. Second, we consider endogeneity in two commonly used multilevel models and suggest how the method and results of this chapter may improve on the Hausman-Taylor approach discussed in the previous chapter. In addition, the method proposed here can be applied to situations with more than one endogenous variable. We present a Bayesian framework that can be applied to a wide variety of models and allows for exact finite-sample inference. The methodological results and the simulation studies that we present are promising, yet further research is required, and we suggest several steps for it.
The latent instrumental variable (LIV) model introduced in the previous chapters can be used to estimate linear regression models in which one regressor is correlated with the error term. The LIV approach assumes the existence of a discrete instrument with unobserved category membership. Hence, the latent instrument has a multinomial distribution, and, assuming normally distributed error terms, the unconditional probability distribution of the observations $(y, x)$ is a mixture of $m$ bivariate normal distribution functions. As a result, the LIV method does not require the availability of observed instrumental variables to estimate the regression parameters when regressor-error correlations are suspected. This is an advantage over classical instrumental variables estimation, since in many applications instruments are not available. Moreover, available instruments may be of poor quality (weak or endogenous), and substantive conclusions about the same phenomenon may differ when other instruments are used (see e.g. chapters 2 and 5).
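To make the mixture statement concrete, assume (as a sketch of the LIV setup of chapters 3 and 4, not a transcription of it) $y = \beta x + \varepsilon$ and $x = \pi_k + \nu$ for an observation in latent category $k$, with $(\varepsilon, \nu)$ bivariate normal. Conditional on category $k$, $(y, x)$ is then bivariate normal with mean $(\beta\pi_k, \pi_k)$ and a covariance matrix that is the same for every category, so unconditionally $(y, x)$ is a mixture of $m$ bivariate normals weighted by the category probabilities. The function below computes these mixture parameters and the simulation compares them with one category’s sample moments; all names and values are hypothetical.

```python
import numpy as np


def liv_mixture(beta, pi, probs, s2_eps, s2_nu, s_eps_nu):
    """Weights, component means and common covariance of (y, x) under the sketched LIV model."""
    pi = np.asarray(pi, dtype=float)
    means = np.column_stack([beta * pi, pi])                 # row k: (E y, E x) in category k
    cov = np.array([[beta**2 * s2_nu + 2 * beta * s_eps_nu + s2_eps, beta * s2_nu + s_eps_nu],
                    [beta * s2_nu + s_eps_nu, s2_nu]])
    return np.asarray(probs, dtype=float), means, cov


rng = np.random.default_rng(12)
beta, pi, probs = 2.0, [-1.0, 0.5, 3.0], [0.3, 0.5, 0.2]
s2_eps, s2_nu, s_eps_nu = 1.0, 1.0, 0.5                      # regressor-error covariance 0.5
weights, means, cov = liv_mixture(beta, pi, probs, s2_eps, s2_nu, s_eps_nu)

N = 600_000
k = rng.choice(len(pi), size=N, p=probs)                     # latent (unobserved) category
eps, nu = rng.multivariate_normal([0.0, 0.0], [[s2_eps, s_eps_nu], [s_eps_nu, s2_nu]], N).T
x = np.asarray(pi)[k] + nu
y = beta * x + eps
in_0 = k == 0
print("theoretical mean, category 0:", means[0], " empirical:", [y[in_0].mean(), x[in_0].mean()])
print("theoretical covariance:\n", cov, "\nempirical, category 0:\n", np.cov(y[in_0], x[in_0]))
```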
In this chapter we relax the assumption that the unobserved instrument has a multinomial distribution with $m$ categories. Instead, we let the data determine the ‘best’ distribution $G$ of the unobserved instrument. As such, this approach is potentially more efficient than the standard LIV model, because the distribution of the instrument is fully estimated from the data and is not limited to an assumed multinomial distribution with $m$ groups. Moreover, the number of categories of the unobserved instrument is an unknown parameter that is determined by the data, so no tests for the number of groups are required. Furthermore, we consider endogeneity issues in more general multilevel models. Ebbes, Bockenholt and Wedel (2004) argue that endogeneity in multilevel models is more complex because of the presence of random components at various levels. Traditional methods (fixed-effects estimation, random-effects estimation, Mundlak’s approach, or the Hausman-Taylor estimator) are shown to be limited in various ways, and the authors present a list of open problems, some of which are addressed here (see also chapter 6).
We propose a nonparametric Bayesian approach to estimate the regression parameters and the distribution of the unobserved instrument simultaneously. Nonparametric Bayes models were originally proposed to relax the parametric assumptions often made in standard hierarchical linear models (Escobar, 1994, Escobar and West, 1998, Ibrahim and Kleinman, 1998). At the heart of hierarchical models are assumptions on the distributions of various model parameters, which can often be questioned, and results are found to be sensitive to the assumed forms of these distributions. Nonparametric Bayesian models provide a way to relax these parametric assumptions by using Dirichlet processes. A Dirichlet process is used as a prior distribution on a family of distributions, i.e. a prior distribution is specified on the space of all possible distribution functions. Before Escobar’s (1994) results on nonparametric Bayes models, applications of these methods were limited by computational difficulties. Escobar solved this problem by developing an MCMC sampling algorithm that is fairly easy to implement. Since then, several studies have used nonparametric Bayes techniques in, for instance, density estimation or hierarchical modeling. Dey, Muller and Sinha (1998) give an overview of recent developments in this area.
One advantage of a Bayesian nonparametric model for the distribution of the unobserved instrument is that it avoids determining the correct number of mixing components post hoc. At the same time, it retains the ability to recover a variety of distributions within a unified modeling framework (cf. Kim, Menzefricke and Feinberg, 2004). Moreover, whatever the true distribution of the instruments, the Bayes estimate converges to it (Ferguson, 1973). Antoniak (1974) shows that clustering is inherent to the Dirichlet process: there is a positive probability that the number of distinct support points found is smaller than the number of sample points.
In section 7.2 we briefly introduce the nonparametric Bayes approach for a simple multilevel model with a latent instrument that solves for level-1 dependencies. We discuss the model specification and the Dirichlet process prior, and we present an estimation scheme. Next, in section 7.3, we extend this idea to a general hierarchical model in which individual-level covariates may be endogenous (i.e. level-2 endogeneity). We show that an unobserved instrument with a Dirichlet process prior can be incorporated in the same way as for the simple multilevel model. Simulation results are presented in section 7.4. Section 7.5 contains the conclusions and a discussion of the results. We also propose steps for further research.
7.2 A simple multilevel model with a general latent instrument
We consider the following simple multilevel model
$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \varepsilon_{ij}, \tag{7.1}$$
where $i = 1, \ldots, n$, $j = 1, \ldots, m_i$, and $n_t = \sum_i m_i$ is the total number of observations. The error term $\varepsilon_{ij}$ has mean zero and variance $\sigma_\varepsilon^2$. We do not assume that $E(x_{ij}\varepsilon_{ij})$ equals zero. For simplicity, we omit further regressors and random terms. These can easily be included, because the Dirichlet process for the unobserved instrument is unaffected and MCMC estimation is conditional on all other parameters and observations. In the terminology of chapter 6, we consider level-1 dependencies here. Level-1 dependencies are more difficult to address than level-2 dependencies. Potential remedies for them require external instrumental variables, which may be unavailable or of poor quality (subsection 6.5.1).
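The endogeneity of $x_{ij}$ is modeled as follows
$$\begin{aligned} y_{ij} &= \beta_0 + \beta_1 x_{ij} + \varepsilon_{ij} \\ x_{ij} &= \theta_{ij} + \nu_{ij}, \end{aligned} \tag{7.2}$$
where $\theta_{ij}$ is the unobserved instrument and is treated as a ‘nuisance’ parameter. The regressor $x_{ij}$ is endogenous when $E(x_{ij}\varepsilon_{ij}) \neq 0$. We assume that $x_{ij}$ can be split into an exogenous part $\theta_{ij}$ and an endogenous part $\nu_{ij}$, where the latter is correlated with $\varepsilon_{ij}$. We further assume that $\varepsilon_{ij}$ and $\nu_{ij}$ have mean zero and variance-covariance matrix
$$\Sigma = \begin{bmatrix} \sigma_\varepsilon^2 & \sigma_{\varepsilon\nu} \\ \sigma_{\varepsilon\nu} & \sigma_\nu^2 \end{bmatrix}. \tag{7.3}$$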
Instead of assuming a discrete distribution with $k$ categories for $\theta_{ij}$, we make a very general assumption: the distribution of $\theta_{ij}$ follows a Dirichlet process (Ferguson, 1973; Antoniak, 1974). More specifically, the unobserved instruments are independently and identically distributed as $G$, and we do not assume a specific parametric form for $G$. Instead, $G$ has a Dirichlet process prior, denoted by $DP(\alpha, G_0)$. Here $\alpha > 0$ is a concentration parameter, and $G_0$ is the baseline prior distribution with density $g_0$. A Dirichlet process prior places a distribution (probability measure) on the space of all distribution functions for $G$. For large values of $\alpha$, $G$ is very likely to be close to $G_0$. For small values of $\alpha$, the mass of $G$ is likely to be concentrated on a few atoms. In fact, the support of the Dirichlet process is the class of all distribution functions, which is very large. The nonparametric model therefore lets the data produce a $G$ that is skewed, has ‘shoulders’, is multimodal, or has any other shape that differs from $G_0$ (cf. MacEachern, 1998). Appendix 7A gives the definition and some technical details of the Dirichlet process. For more details on recent results, estimation, and applications, see Dey, Muller and Sinha (1998).
The distribution of $\theta_{ij}$ is specified as follows
$$\begin{aligned} \theta_{ij} \mid G &\sim G \\ G \mid \alpha, \lambda &\sim DP\big(\alpha, G_0(\cdot \mid \lambda)\big), \end{aligned} \tag{7.4}$$
where the $\theta_{ij}$’s are independent and $G_0$ is the baseline prior defined by the parameter $\lambda$. As stated above, whatever the true distribution $G$ is, its nonparametric Bayes estimate converges to it, so the choice of $G_0$ is not critical. We take $G_0$ to be a normal distribution with mean $\mu_g$ and variance $\sigma_g^2$. The Dirichlet process prior for $G$ is a probability distribution on the space of all possible distributions for the unobserved instrument, and $G_0$ can be seen as its location parameter (see also appendix 7A). The parameter $\alpha$ acts as a precision parameter (Escobar, 1994; Escobar and West, 1998). When $\alpha$ is very large, a distribution $G$ drawn from the Dirichlet process prior is very close to the baseline prior. When $\alpha$ is small, $G$ is not close to the baseline prior and is likely to concentrate on a few distinct atoms. For a sample of $n_t$ values, the expected number of distinct values of $\theta_{ij}$ is approximately $\alpha\log[(n_t+\alpha)/\alpha]$ (Antoniak, 1974; Escobar, 1994). The actual number of distinct values, however, is unknown and is estimated from the data.
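The role of $\alpha$ can be checked numerically. The sketch below uses only the Python standard library. It compares the approximation above with the exact expected number of distinct values among $N$ draws, $\sum_{i=1}^{N}\alpha/(\alpha+i-1)$, a standard result going back to Antoniak (1974). It also simulates the Pólya urn representation of the Dirichlet process and counts the clusters it produces. The function names are hypothetical and chosen for illustration. $G_0$ is assumed continuous, so ties arise only through the urn.

```python
import math
import random


def expected_clusters_exact(alpha: float, n: int) -> float:
    """Exact E[# distinct values] among n draws from DP(alpha, G0), G0 continuous."""
    return sum(alpha / (alpha + i) for i in range(n))


def expected_clusters_approx(alpha: float, n: int) -> float:
    """Large-sample approximation alpha * log((n + alpha) / alpha)."""
    return alpha * math.log((n + alpha) / alpha)


def polya_urn_cluster_count(alpha: float, n: int, rng: random.Random) -> int:
    """Simulate n draws through the Polya urn scheme and return the number of clusters."""
    sizes = []  # sizes[s] = number of draws sharing the s-th distinct value
    for i in range(n):
        # After i draws: open a new value w.p. alpha/(alpha+i),
        # otherwise join existing cluster s w.p. sizes[s]/(alpha+i).
        u = rng.random() * (alpha + i)
        if u < alpha:
            sizes.append(1)
            continue
        u -= alpha
        for s, size in enumerate(sizes):
            if u < size:
                sizes[s] += 1
                break
            u -= size
        else:  # guard against floating-point round-off at the upper end
            sizes[-1] += 1
    return len(sizes)


if __name__ == "__main__":
    rng = random.Random(1)
    for alpha in (0.5, 2.0, 20.0):
        sims = [polya_urn_cluster_count(alpha, 500, rng) for _ in range(50)]
        print(alpha, expected_clusters_approx(alpha, 500),
              expected_clusters_exact(alpha, 500), sum(sims) / len(sims))
```

Small values of $\alpha$ give a handful of clusters and large values give many, which matches the precision interpretation of $\alpha$ given above.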
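Throughout, we assume that the errors are normally distributed. Furthermore, let $z_{ij} = (y_{ij}, x_{ij})'$ and $\mu^s_{z,ij} = (\mathbf{x}_{ij}\beta, \theta_{ij})'$, where $\mathbf{x}_{ij} = (1, x_{ij})$, $\beta = (\beta_0, \beta_1)'$, and $s$ stands for ‘structural’. The density function (likelihood) for the structural model in (7.2) is
$$p(z_{ij}\mid\theta_{ij},\beta,\Sigma) = (2\pi)^{-1}|\Sigma|^{-1/2}\exp\left(-\tfrac{1}{2}(z_{ij}-\mu^s_{z,ij})'\Sigma^{-1}(z_{ij}-\mu^s_{z,ij})\right), \tag{7.5}$$
which can also be written in reduced form as
$$p(z_{ij}\mid\theta_{ij},\beta,\Sigma) = (2\pi)^{-1}|\Omega|^{-1/2}\exp\left(-\tfrac{1}{2}(z_{ij}-\mu^r_{z,ij})'\Omega^{-1}(z_{ij}-\mu^r_{z,ij})\right), \tag{7.6}$$
where $\mu^r_{z,ij} = (\beta_0 + \beta_1\theta_{ij}, \theta_{ij})'$ and $\Omega = B\Sigma B'$ with
$$B = \begin{bmatrix} 1 & \beta_1 \\ 0 & 1 \end{bmatrix}. \tag{7.7}$$
Because $|B| = 1$, we have $|\Omega| = |\Sigma|$, and the two expressions define the same density.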
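As a quick numerical check of (7.5)-(7.7), the following standard-library Python sketch evaluates the structural and reduced-form log densities for a single observation. The reduced-form residual equals $B$ times the structural residual, so the two values should coincide. The helper names and the parameter values in the example are hypothetical.

```python
import math


def bvn_logpdf(r, S):
    """Log density of a zero-mean bivariate normal at r = (r1, r2) with 2x2 covariance S."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    # r' S^{-1} r written out with the explicit 2x2 inverse
    quad = (S[1][1] * r[0] ** 2 - (S[0][1] + S[1][0]) * r[0] * r[1]
            + S[0][0] * r[1] ** 2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def structural_logpdf(y, x, theta, b0, b1, Sigma):
    """(7.5): residuals (eps, nu) = (y - b0 - b1*x, x - theta) have covariance Sigma."""
    return bvn_logpdf((y - b0 - b1 * x, x - theta), Sigma)


def reduced_logpdf(y, x, theta, b0, b1, Sigma):
    """(7.6): z has mean (b0 + b1*theta, theta) and covariance Omega = B Sigma B'."""
    B = [[1.0, b1], [0.0, 1.0]]
    BS = [[sum(B[i][k] * Sigma[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    Omega = [[sum(BS[i][k] * B[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
    return bvn_logpdf((y - b0 - b1 * theta, x - theta), Omega)


if __name__ == "__main__":
    Sigma = [[1.0, 0.6], [0.6, 2.0]]  # [[sigma_eps^2, sigma_eps_nu], [sigma_eps_nu, sigma_nu^2]]
    print(structural_logpdf(3.1, 1.2, 0.4, 1.0, 2.0, Sigma))
    print(reduced_logpdf(3.1, 1.2, 0.4, 1.0, 2.0, Sigma))
```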
In the next subsection we discuss the Dirichlet process prior for $\theta_{ij}$ in more detail.
7.2.1 The Dirichlet process prior for $\theta_{ij}$
Conditional on $\beta$, $\Sigma$, $G_0$, $\alpha$, and $z_{ij} = (y_{ij}, x_{ij})'$, the Dirichlet process prior for $\theta_{ij}$ implies the following conditional posterior distribution (see e.g. Escobar, 1994; Escobar and West, 1998):
$$p(\theta_{ij}\mid\theta_{-ij},\beta,\Sigma,\alpha,G_0,z_{ij}) \sim q_0(ij)\,h(\theta_{ij}\mid\beta,\Sigma,G_0,z_{ij}) + \sum_{l=1}^{n}\sum_{\substack{k=1\\ lk\neq ij}}^{m_l} q_{lk}(ij)\,\delta_{\theta_{lk}}(\theta_{ij}), \tag{7.8}$$
where $\theta_{-ij}$ denotes the $(n_t-1)\times 1$ vector obtained from $\theta = (\theta_{11}, \ldots, \theta_{nm_n})'$ by deleting the $ij$-th element, and $\delta_{\theta_{lk}}(\theta_{ij}) = 1$ if $\theta_{ij} = \theta_{lk}$ and zero otherwise. Furthermore,
• $h(\theta_{ij}\mid\beta,\Sigma,G_0,z_{ij}) \propto p(z_{ij}\mid\theta_{ij},\beta,\Sigma)\,g_0(\theta_{ij}\mid\mu_g,\sigma_g^2)$ is the density of the ‘baseline’ posterior distribution for $\theta_{ij}$;
• $q_0(ij) \propto \alpha\int p(z_{ij}\mid\theta_{ij},\beta,\Sigma)\,g_0(\theta_{ij}\mid\mu_g,\sigma_g^2)\,d\theta_{ij}$, i.e. $q_0(ij)$ is proportional to $\alpha$ times the marginal density of $z_{ij}$, where $\theta_{ij}$ is integrated out under the baseline prior $G_0$;
• $q_{lk}(ij) \propto p(z_{ij}\mid\beta,\Sigma,\theta_{lk})$, i.e. $q_{lk}(ij)$ is proportional to the density of $z_{ij}$ conditional on $\theta_{ij} = \theta_{lk}$;
• and $q_0(ij) + \sum_{l=1}^{n}\sum_{k=1,\,lk\neq ij}^{m_l} q_{lk}(ij) = 1$, i.e. $q_0(ij)$ and the $q_{lk}(ij)$ are normalized to sum to 1.
The intuition behind this scheme is as follows. Suppose observation $ij$ has a relatively large (small) residual when observation $lk$’s value $\theta_{lk}$ is used. Then it is relatively less (more) likely that $\theta_{lk}$ is chosen as the value of observation $ij$’s latent instrument. Furthermore, the smaller the residual of observation $ij$ when its latent instrument is set to $\mu_g$, the greater the probability that observation $ij$ receives a new value for its latent instrument. Note that $G$ has been integrated out at this point. It is therefore conceptually easy to sample from the above distribution using the following scheme:
$$\theta_{ij}\mid\theta_{-ij},\beta,\Sigma,\alpha,G_0,z_{ij}\ \begin{cases} = \theta_{lk} & \text{with probability } q_{lk}(ij),\\ \sim h(\theta_{ij}\mid\beta,\Sigma,G_0,z_{ij}) & \text{with probability } q_0(ij). \end{cases} \tag{7.9}$$
How easy this scheme is to implement depends on three things. The likelihood must be easy to evaluate, the density $h(\theta_{ij}\mid\beta,\Sigma,G_0,z_{ij})$ must have a manageable form, and $q_0(ij)$ must be easy to compute. When $G_0$ is a conjugate prior, which is often not a strong assumption, the marginal distribution of $z_{ij}$ needed for $q_0(ij)$ is known analytically. Otherwise, it may be possible to compute it numerically or to apply methods that avoid computing $q_0$; see Escobar and West (1998).
Antoniak (1974) shows that clustering of the $\theta_{ij}$’s is inherent to the Dirichlet process, i.e. the values $\theta_{ij}$ typically reduce to $\tilde n < n_t$ distinct values, or clusters. These values are drawn from $G_0$, as the Pólya urn¹ representation of the Dirichlet process shows (Escobar, 1994; MacEachern, 1998). This description of the vector $\theta$ leads to viewing the mixture of Dirichlet process model as a mixture model (cf. MacEachern, 1998). We denote the $\tilde n$ distinct values of $\theta_{ij}$ by $\theta_s$, $s = 1, \ldots, \tilde n$. In what follows, the superscript “$-$” indicates that observation $ij$ is left out of the conditioning. Thus $n^-_s \le n_t - 1$ is the number of observations in cluster $s$ with common value $\theta_s$, excluding observation $ij$. Likewise, $\tilde n^-$ is the number of distinct clusters when observation $ij$ is removed (see also MacEachern, 1998). The conditional posterior distribution of $\theta_{ij}$ is then
$$p(\theta_{ij}\mid\theta_{-ij},\beta,\Sigma,\alpha,G_0,z_{ij}) \sim q_0(ij)\,h(\theta_{ij}\mid\beta,\Sigma,G_0,z_{ij}) + \sum_{l=1}^{\tilde n^-} n^-_l\,q_l(ij)\,\delta_{\theta_l}(\theta_{ij}), \tag{7.10}$$
where $q_l(ij)$ is proportional to the likelihood $p(z_{ij}\mid\beta,\Sigma,\theta_l)$ given that observation $ij$ is in cluster $l$. Here $\delta_{\theta_l}(\theta_{ij}) = 1$ if $\theta_{ij} = \theta_l$ and 0 otherwise, and the other elements are defined as before. Equation (7.10) shows that the full conditional posterior distribution of $\theta_{ij}$ is a mixture of two parts. The first is a continuous distribution. The second is a discrete distribution with weights on the distinct values of the latent instrument (excluding $\theta_{ij}$).
We assume that the baseline distribution $G_0$ of the Dirichlet process is a univariate normal distribution with unknown mean $\mu_g$ and unknown variance $\sigma_g^2$. We take conjugate priors for both parameters: a normal distribution for $\mu_g$ and an inverse gamma distribution for $\sigma_g^2$. The parameter $\alpha$ (the dispersion parameter of the Dirichlet process) is important, since it determines how ‘close’ the unknown distribution $G$ gets to $G_0$. If its value is unknown, as in most applications, a prior distribution $p(\alpha)$ can be specified so that the data are used to learn about this parameter; see Escobar and West (1995), West (1992), and Escobar (1994). We follow West (1995) and specify a gamma distribution as the prior for $\alpha$. Finally, for $\beta$ and $\Sigma$ we take a normal prior and an inverted two-dimensional Wishart prior, respectively. The parameters of the prior distributions are known and are chosen to reflect ‘vague’ or little prior information.

¹ Simply stated, the Pólya urn problem concerns an urn that contains (say) $c_i$ balls of color $i$. At each step, a ball is drawn at random from the urn, and the probability that it has color $i$ is proportional to the number of balls of that color in the urn. The ball is then returned together with another ball of the same color, and the process repeats.
The complete nonparametric Bayes LIV model is rather complex. Closed-form expressions cannot be derived for the joint and marginal posterior distributions of $\beta$ and $\Sigma$, or of the other parameters $\mu_g$, $\sigma_g^2$, $\theta$, and $\alpha$. However, Escobar’s (1994) MCMC results can be used to approximate the nonparametric Bayes model. In that approach, the full conditional distribution of each $\theta_{ij}$ is a mixture of $G_0$ and a discrete distribution with weights on the other $\theta_{lk}$’s, $lk \neq ij$; see (7.8) and (7.10). His work has been modified and extended in several ways. The resulting chains are relatively easy to implement and have been shown empirically to move quickly through the parameter space (Dey, Muller and Sinha, 1998). When the priors are conjugate, the full conditional distributions of the other parameters can be derived straightforwardly. We discuss this in more detail next.
7.2.2 MCMC estimation
Depending on the form of the full conditional, we use either (7.5) or (7.6) for the likelihood. The likelihood of the complete sample is computed as $\prod_{i=1}^{n}\prod_{j=1}^{m_i} p(z_{ij}\mid\theta_{ij},\beta,\Sigma)$. We use the following expression for $\Sigma^{-1}$:
$$\Sigma^{-1} = \begin{bmatrix} \dfrac{\sigma_\nu^2}{|\Sigma|} & -\dfrac{\sigma_{\varepsilon\nu}}{|\Sigma|} \\[2mm] -\dfrac{\sigma_{\varepsilon\nu}}{|\Sigma|} & \dfrac{\sigma_\varepsilon^2}{|\Sigma|} \end{bmatrix} = \begin{bmatrix} \sigma^{(11)} & \sigma^{(12)} \\ \sigma^{(21)} & \sigma^{(22)} \end{bmatrix}, \tag{7.11}$$
where $\sigma^{(12)} = \sigma^{(21)}$ and $|\Sigma| = \sigma_\varepsilon^2\sigma_\nu^2 - (\sigma_{\varepsilon\nu})^2$. The joint posterior distribution of the parameters $(\beta, \Sigma, \mu_g, \sigma_g^2, \alpha)$ is proportional to the likelihood times the priors, i.e.
$$p(\beta,\Sigma,\mu_g,\sigma_g^2,\alpha) \propto \prod_{i=1}^{n}\prod_{j=1}^{m_i} p(z_{ij}\mid\theta_{ij},\beta,\Sigma)\,p(\theta_{ij}\mid G) \times p(G\mid G_0,\alpha) \times p(\alpha)\,p(\mu_g)\,p(\sigma_g^2)\,p(\beta)\,p(\Sigma), \tag{7.12}$$
where $p(z_{ij}\mid\theta_{ij},\beta,\Sigma)$ is given in (7.5) or (7.6). This joint posterior density is analytically intractable. Markov chain Monte Carlo (MCMC) methods, however, can generate random draws from it without requiring the joint posterior to be calculated explicitly. The MCMC chain is implemented via the following full conditional distributions of the joint posterior:
1. $p(\Sigma\mid\beta,\alpha,\mu_g,\sigma_g^2,\theta,z)$,
2. $p(\beta\mid\Sigma,\alpha,\mu_g,\sigma_g^2,\theta,z)$,
3. $p(\mu_g\mid\beta,\Sigma,\alpha,\sigma_g^2,\theta,z)$,
4. $p(\sigma_g^2\mid\beta,\Sigma,\alpha,\mu_g,\theta,z)$,
5. $p(\theta_{ij}\mid\theta_{-ij},\beta,\Sigma,\alpha,\mu_g,z)$ for each $i = 1, \ldots, n$, $j = 1, \ldots, m_i$, and
6. $p(\alpha\mid\tilde n, n_t)$,
where $\theta$ and $z$ collect the elements $\theta_{ij}$ and $z_{ij}$, respectively. Suppose the Markov chain settles on a (relatively) small number of distinct values $\theta_s$, $s = 1, \ldots, \tilde n$. New values for $\theta$ are then unlikely to be generated, so the chain gets ‘stuck’ and mixes poorly. To prevent the chain from getting stuck on a few nodes, West, Muller and Escobar (1994) propose ‘remixing’ the values $\theta_s$, $s = 1, \ldots, \tilde n$, after each iteration of the MCMC algorithm (see also Escobar and West, 1998). Let $S = (S_{11}, \ldots, S_{nm_n})$ denote the cluster structure, that is, $S_{ij} = s$ if $\theta_{ij} = \theta_s$ for $i = 1, \ldots, n$, $j = 1, \ldots, m_i$, and $s = 1, \ldots, \tilde n$. Given this configuration, the full conditional distribution $p(\theta_s\mid S,\tilde n,\beta,\Sigma,\mu_g,\sigma_g^2,z)$, $s = 1, \ldots, \tilde n$, is used to generate a new set of values $\theta$. This adds movement to the chain and thereby facilitates convergence. As shown below, the remixing step typically involves drawing $\tilde n$ values for $\theta$ from a distribution whose density is similar to $h$ in (7.10).
Below we specify the full conditional distributions listed above. Appendix 7B gives more detailed information.
Full conditional distribution of $\Sigma$. The full conditional distribution for $\Sigma$ reduces to
$$p(\Sigma\mid\beta,\theta,z) \propto \left[\prod_{i=1}^{n}\prod_{j=1}^{m_i} p(z_{ij}\mid\theta_{ij},\beta,\Sigma)\right]\times p(\Sigma\mid\omega,\Psi), \tag{7.13}$$
from which it follows that $\Sigma$ is sampled from a two-dimensional inverse Wishart distribution with parameters $\omega + n_t$ and $\sum_i\sum_j (z_{ij}-\mu^s_{z,ij})(z_{ij}-\mu^s_{z,ij})' + \Psi$.
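The $\Sigma$ step can be sketched as follows. The sketch assumes the common parameterization in which a two-dimensional $IW(\nu,\Psi)$ has density proportional to $|\Sigma|^{-(\nu+3)/2}\exp(-\mathrm{tr}(\Psi\Sigma^{-1})/2)$, so that $\Sigma^{-1}$ is Wishart with $\nu$ degrees of freedom and scale $\Psi^{-1}$. The chapter does not state its parameterization, so this is an assumption. Under it, the degrees of freedom increase by one for each of the $n_t$ observations. The draw uses the Bartlett decomposition, and the residuals passed in are the structural residuals $z_{ij}-\mu^s_{z,ij} = (y_{ij}-\mathbf{x}_{ij}\beta,\ x_{ij}-\theta_{ij})$. All function names are hypothetical.

```python
import math
import random


def _inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def _mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def _chol2(M):
    """Lower-triangular L with L L' = M for a 2x2 positive-definite M."""
    l11 = math.sqrt(M[0][0])
    l21 = M[1][0] / l11
    return [[l11, 0.0], [l21, math.sqrt(M[1][1] - l21 ** 2)]]


def sample_inv_wishart2(df, scale, rng):
    """Draw a 2x2 Sigma ~ IW(df, scale) (assumed parameterization; needs df > 1)."""
    L = _chol2(_inv2(scale))  # Sigma^{-1} ~ Wishart(df, scale^{-1}) = L A A' L'
    A = [[math.sqrt(rng.gammavariate(df / 2.0, 2.0)), 0.0],               # sqrt(chi2_df)
         [rng.gauss(0.0, 1.0), math.sqrt(rng.gammavariate((df - 1) / 2.0, 2.0))]]
    LA = _mul2(L, A)
    W = _mul2(LA, [[LA[0][0], LA[1][0]], [LA[0][1], LA[1][1]]])  # W = (LA)(LA)'
    return _inv2(W)


def sigma_full_conditional(residuals, omega, Psi):
    """IW parameters (omega + n_t, sum_ij r r' + Psi) from structural residuals r_ij."""
    S = [row[:] for row in Psi]
    for r in residuals:
        for a in range(2):
            for b in range(2):
                S[a][b] += r[a] * r[b]
    return omega + len(residuals), S
```

In a sampler one would call `df, S = sigma_full_conditional(resid, omega, Psi)` and then `sample_inv_wishart2(df, S, rng)`.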
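Full conditional distribution of $\beta$. Similarly, the full conditional distribution of the regression parameters $\beta$ is obtained by combining the likelihood with a normal prior distribution, i.e.
$$p(\beta\mid\Sigma,\theta,z) \propto \left[\prod_{i=1}^{n}\prod_{j=1}^{m_i} p(z_{ij}\mid\theta_{ij},\beta,\Sigma)\right]\times p(\beta\mid\mu_\beta,\Sigma_\beta). \tag{7.14}$$
Hence, $\beta$ is sampled from a normal distribution with mean
$$C^{-1}\left[\sum_{i,j}\mathbf{x}_{ij}'\left(\sigma^{(11)}y_{ij} + \sigma^{(12)}(x_{ij}-\theta_{ij})\right) + \Sigma_\beta^{-1}\mu_\beta\right]$$
and variance-covariance matrix $C^{-1}$, where $C = \sigma^{(11)}\sum_{i,j}\mathbf{x}_{ij}'\mathbf{x}_{ij} + \Sigma_\beta^{-1}$ and $\mathbf{x}_{ij} = (1, x_{ij})$.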
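The $\beta$ step is sketched below in standard-library Python. It builds $C$ and the mean exactly as in the formula above for $\mathbf{x}_{ij} = (1, x_{ij})$ and then draws from the bivariate normal via a Cholesky factor. The data are passed as flat lists over all $ij$. The function names are hypothetical, and the $\sigma^{(12)}$ term is what adjusts the estimate for the endogenous part $x_{ij}-\theta_{ij}$.

```python
import math
import random


def beta_full_conditional(y, x, theta, Sigma, mu_beta, Sigma_beta):
    """Mean and covariance C^{-1} of the normal full conditional of beta = (b0, b1)'."""
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] ** 2
    s11, s12 = Sigma[1][1] / det, -Sigma[0][1] / det          # sigma^(11), sigma^(12)
    dp = Sigma_beta[0][0] * Sigma_beta[1][1] - Sigma_beta[0][1] ** 2
    P = [[Sigma_beta[1][1] / dp, -Sigma_beta[0][1] / dp],      # Sigma_beta^{-1}
         [-Sigma_beta[0][1] / dp, Sigma_beta[0][0] / dp]]
    C = [[P[0][0], P[0][1]], [P[1][0], P[1][1]]]
    b = [P[0][0] * mu_beta[0] + P[0][1] * mu_beta[1],
         P[1][0] * mu_beta[0] + P[1][1] * mu_beta[1]]
    for yi, xi, ti in zip(y, x, theta):
        row = (1.0, xi)                                        # bold x_ij = (1, x_ij)
        w = s11 * yi + s12 * (xi - ti)
        for a in range(2):
            b[a] += row[a] * w
            for c in range(2):
                C[a][c] += s11 * row[a] * row[c]
    dc = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    V = [[C[1][1] / dc, -C[0][1] / dc], [-C[1][0] / dc, C[0][0] / dc]]  # C^{-1}
    mean = [V[0][0] * b[0] + V[0][1] * b[1], V[1][0] * b[0] + V[1][1] * b[1]]
    return mean, V


def draw_beta(mean, V, rng):
    """Draw from N(mean, V) using the 2x2 Cholesky factor of V."""
    l11 = math.sqrt(V[0][0])
    l21 = V[1][0] / l11
    l22 = math.sqrt(V[1][1] - l21 ** 2)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [mean[0] + l11 * e1, mean[1] + l21 * e1 + l22 * e2]
```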
Full conditional of $\theta_{ij}$. A draw for each $\theta_{ij}$ can be obtained from (7.10) in two steps:
1. Sample a proposed ‘cluster’ value $c_{ij}$ from the integers $\{0, 1, \ldots, \tilde n^-\}$, with probabilities proportional to $\{q_0(ij), n^-_1 q_1(ij), \ldots, n^-_{\tilde n^-} q_{\tilde n^-}(ij)\}$.
2. If $c_{ij} \in \{1, \ldots, \tilde n^-\}$, set $\theta_{ij} = \theta_{c_{ij}}$. If $c_{ij} = 0$, draw a new value $\theta_{ij}$ from $h(\theta_{ij}\mid\beta,\Sigma,G_0,z_{ij})$. This is the density of a univariate normal distribution with mean $C^{-1}\left(\sigma^{(22)}x_{ij} + \sigma^{(12)}(y_{ij}-\mathbf{x}_{ij}\beta) + \mu_g/\sigma_g^2\right)$ and variance $C^{-1} = \left(\sigma^{(22)} + 1/\sigma_g^2\right)^{-1}$.
Appendix 7B gives more detail on the form of the probabilities $q_0(ij), q_1(ij), \ldots, q_{\tilde n^-}(ij)$.
Remix $\theta$. Let $J_l$ denote the set of observations for which $\theta_{ij} = \theta_l$, $l = 1, \ldots, \tilde n$. The full conditional of $\theta_l$ is proportional to
$$\left[\prod_{ij\in J_l} p(z_{ij}\mid\theta_l,\beta,\Sigma)\right] g_0(\theta_l\mid\mu_g,\sigma_g^2),$$
for $l = 1, \ldots, \tilde n$; see e.g. West, Muller and Escobar (1994) and Escobar and West (1998). This distribution is derived in much the same way as $h(\theta_{ij}\mid\beta,\Sigma,G_0,z_{ij})$ above. The current values of the $\theta_l$’s are replaced by new values drawn from a univariate normal distribution with mean
$$C^{-1}\left(\sigma^{(22)}\sum_{ij\in J_l} x_{ij} + \sigma^{(12)}\sum_{ij\in J_l}(y_{ij}-\mathbf{x}_{ij}\beta) + \frac{\mu_g}{\sigma_g^2}\right)$$
and variance $C^{-1}$, where $C = n_l\sigma^{(22)} + 1/\sigma_g^2$ and $n_l$ is the number of observations in $J_l$.
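The remixing step can be sketched in the same style. The sketch groups observations by their current cluster label, which plays the role of the configuration $S$, and then redraws each common value $\theta_l$ from the normal full conditional above. The helper names are hypothetical, and the data are again passed as flat lists over all $ij$.

```python
import math
import random


def remix_params(ys, xs, b0, b1, Sigma, mu_g, s2_g):
    """Mean and variance of the full conditional of a cluster value theta_l."""
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] ** 2
    s22, s12 = Sigma[0][0] / det, -Sigma[0][1] / det   # sigma^(22), sigma^(12)
    C = len(xs) * s22 + 1.0 / s2_g                     # n_l sigma^(22) + 1/sigma_g^2
    resid = sum(y - b0 - b1 * x for y, x in zip(ys, xs))
    mean = (s22 * sum(xs) + s12 * resid + mu_g / s2_g) / C
    return mean, 1.0 / C


def remix_all(y, x, labels, b0, b1, Sigma, mu_g, s2_g, rng):
    """Redraw every distinct value; labels[k] = s means observation k has theta = theta_s."""
    members = {}
    for k, s in enumerate(labels):
        members.setdefault(s, []).append(k)
    new_values = {}
    for s, idx in members.items():
        mean, var = remix_params([y[k] for k in idx], [x[k] for k in idx],
                                 b0, b1, Sigma, mu_g, s2_g)
        new_values[s] = rng.gauss(mean, math.sqrt(var))
    return new_values
```

The cluster memberships stay fixed during this step, and only the shared values move. That is why remixing helps a chain that would otherwise stay stuck on a few nodes.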
Full conditional of $\mu_g$. The parameters $\theta_l$ are independent and identically distributed according to $G_0(\cdot\mid\mu_g,\sigma_g^2)$, and $\mu_g$ enters the model only through $G_0$. Hence, the full conditional distribution of $\mu_g$ is (West, Muller and Escobar, 1994)
$$p(\mu_g\mid\sigma_g^2,\theta,\tilde n,z) \propto \left\{\prod_{l=1}^{\tilde n} g_0(\theta_l\mid\mu_g,\sigma_g^2)\right\} p(\mu_g\mid\mu_0,\sigma_0^2), \tag{7.15}$$
where both densities are normal. A new value for $\mu_g$ is therefore drawn from a normal distribution with mean $C^{-1}\left(\sum_l\theta_l/\sigma_g^2 + \mu_0/\sigma_0^2\right)$ and variance $C^{-1}$, where $C = 1/\sigma_0^2 + \tilde n/\sigma_g^2$.
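A minimal standard-library sketch of the $\mu_g$ update follows. The function names are hypothetical. It uses only the distinct values $\theta_l$, one per cluster, and not all $n_t$ instrument values, so it matches the conditioning in (7.15).

```python
import math
import random


def mu_g_full_conditional(theta_distinct, s2_g, mu0, s2_0):
    """Mean and variance of the normal full conditional (7.15) of mu_g."""
    C = 1.0 / s2_0 + len(theta_distinct) / s2_g
    mean = (sum(theta_distinct) / s2_g + mu0 / s2_0) / C
    return mean, 1.0 / C


def draw_mu_g(theta_distinct, s2_g, mu0, s2_0, rng):
    """One Gibbs draw of mu_g given the distinct cluster values."""
    mean, var = mu_g_full_conditional(theta_distinct, s2_g, mu0, s2_0)
    return rng.gauss(mean, math.sqrt(var))
```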
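Full conditional of $\sigma_g^2$. Similarly, the full conditional of $\sigma_g^2$ reduces to
$$p(\sigma_g^2\mid\mu_g,\theta,\tilde n,z) \propto \left\{\prod_{l=1}^{\tilde n} g_0(\theta_l\mid\mu_g,\sigma_g^2)\right\} p(\sigma_g^2\mid c,d), \tag{7.16}$$
where the prior for $\sigma_g^2$ is an inverted gamma distribution. It follows that $\sigma_g^2$ is sampled from an inverted gamma distribution with parameters $c + \tilde n/2$ and $\tfrac{1}{2}\sum_{l=1}^{\tilde n}(\theta_l-\mu_g)^2 + d$.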
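The $\sigma_g^2$ draw is sketched below under an assumed shape-rate parameterization, $IG(a,b)$ with density proportional to $x^{-(a+1)}e^{-b/x}$. Under that assumption, if $X \sim \mathrm{Gamma}(a, \text{rate } b)$ then $1/X \sim IG(a,b)$. Python's `random.gammavariate(alpha, beta)` takes a scale `beta`, so the rate is inverted. The function name is hypothetical.

```python
import random


def draw_sigma2_g(theta_distinct, mu_g, c, d, rng):
    """Draw sigma_g^2 ~ IG(c + K/2, d + 0.5 * sum_l (theta_l - mu_g)^2), K = # distinct values."""
    shape = c + len(theta_distinct) / 2.0
    rate = d + 0.5 * sum((t - mu_g) ** 2 for t in theta_distinct)
    # gammavariate(shape, scale): scale = 1 / rate, then invert to get an IG draw
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)
```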
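Full conditional for $\alpha$. The full conditional distribution of $\alpha$, the ‘dispersion’ parameter of the Dirichlet process, reduces to $p(\alpha\mid\tilde n, n_t)$. When the prior $p(\alpha)$ is a gamma density with parameters $(\tau_\alpha, \gamma_\alpha)$, an exact expression for this full conditional can be obtained. Escobar and West (1995) and West (1992) show that a new value for $\alpha$ can be obtained in two steps:
1. Sample an auxiliary value $\eta$ from $p(\eta\mid\alpha,\tilde n,n_t) \sim \mathrm{Beta}(\alpha+1, n_t)$, i.e. a beta distribution with mean $(\alpha+1)/(\alpha+n_t+1)$.
2. Then sample $\alpha$ from the following mixture of gamma distributions:
$$p(\alpha\mid\eta,\tilde n,n_t) \sim \pi_{\tilde n}\,\mathrm{Gamma}[\tau_\alpha+\tilde n,\ \gamma_\alpha-\log\eta] + (1-\pi_{\tilde n})\,\mathrm{Gamma}[\tau_\alpha+\tilde n-1,\ \gamma_\alpha-\log\eta],$$
where $\dfrac{\pi_{\tilde n}}{1-\pi_{\tilde n}} = \dfrac{\tau_\alpha+\tilde n-1}{n_t(\gamma_\alpha-\log\eta)}$.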
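The two-step update for $\alpha$ is sketched below. It assumes that the second gamma parameter, $\gamma_\alpha - \log\eta$, is a rate, as in Escobar and West's formulation, so the scale passed to Python's `gammavariate` is its inverse. Here `n_distinct` stands for the number of distinct values and `n_total` for $n_t$. The function names are hypothetical.

```python
import math
import random


def alpha_mixture_weight(eta, n_distinct, n_total, tau, gam):
    """pi with pi / (1 - pi) = (tau + n_distinct - 1) / (n_total * (gam - log eta))."""
    odds = (tau + n_distinct - 1.0) / (n_total * (gam - math.log(eta)))
    return odds / (1.0 + odds)


def update_alpha(alpha, n_distinct, n_total, tau, gam, rng):
    """Auxiliary-variable update of the DP precision alpha under a Gamma(tau, rate gam) prior."""
    eta = rng.betavariate(alpha + 1.0, n_total)            # step 1: eta ~ Beta(alpha+1, n_t)
    pi = alpha_mixture_weight(eta, n_distinct, n_total, tau, gam)
    rate = gam - math.log(eta)
    shape = tau + n_distinct if rng.random() < pi else tau + n_distinct - 1.0
    return rng.gammavariate(shape, 1.0 / rate)            # step 2: gammavariate takes a scale
```

Because $\eta \in (0,1)$, the rate $\gamma_\alpha - \log\eta$ always exceeds $\gamma_\alpha$. Both mixture components are therefore proper gamma densities whenever $\tau_\alpha > 0$.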
This completes the specification of the MCMC chain. Escobar (1994) and Escobar and West (1995) prove convergence theorems for MCMC chains that use a Dirichlet process prior. Starting from suitable values, the above scheme can be iterated many times, and after convergence it yields a sample of any size from the posterior distribution. An important question is how many iterations are needed to ensure convergence of the chain; see for instance Cowles and Carlin (1996) or Brooks and Roberts (1998). Simulation results for this model are presented in section 7.4. We now turn to a more general hierarchical regression model.
7.3 Endogenous subject-level covariates and random coefficients
In this section we consider general random coefficients models (e.g. Lenk et al., 1996) with possibly endogenous subject-level covariates. In that case, parameter estimates obtained with standard estimation techniques are no longer guaranteed to be unbiased. This may occur, for instance, when relevant covariates that are correlated with included covariates are omitted, or when some covariates are measured with error. An example is the self-reported measure of knowledge about the microcomputer market that Lenk et al. (1996) use to explain part of the variability of the random regression coefficients. This measure is possibly measured with error, because individuals may find it difficult to express their knowledge adequately in a few statements. Moreover, if the measures used are unrelated or only partly related to the constructs of actual interest, the observed data used in estimation contain measurement error. As far as we know, no other studies consider regressor-error dependencies at this stage of the model. As will become clear, however, standard estimation techniques may yield biased parameter estimates when such endogenous covariates are present. More specifically, here we investigate a more general form of the level-2 dependencies discussed in section 6.1 (see also the discussion of random coefficient models in section 6.6).
The model we consider is a standard linear two-level model with random coefficients. We assume that a set of individual-level covariates is available to explain part of the variance of the random coefficients. The model is given by
$$\begin{aligned} y_{ij} &= x_{ij}'\beta_i + \varepsilon_{ij} \\ \beta_i &= \gamma_c + \gamma z_i + \eta_i, \end{aligned} \tag{7.17}$$
with $i = 1, \ldots, n$ and $j = 1, \ldots, m$ (or $m_i$). The vector $x_{ij}$ contains the explanatory variables (e.g. a design matrix in conjoint analysis) and, like $\beta_i$, is $k\times 1$. The individual-level covariates are $z_i = (z_{1i}', z_{2i}')'$. Here $z_{1i}$ is an $l_1\times 1$ vector of potentially endogenous covariates that are correlated with $\eta_i$ (but not with $\varepsilon_{ij}$), and $z_{2i}$ is an $l_2\times 1$ vector of exogenous covariates. The matrix $\gamma$ is $k\times(l_1+l_2)$, and the constant $\gamma_c$ is a $k\times 1$ vector. We also write $\gamma_0 = (\gamma', \gamma_c')'$, i.e. the matrix $\gamma_0$ represents the effect of the covariates $z_i$ on the regression coefficients $\beta_i$. The model for $\beta_i$ is a (latent) multivariate regression model with $E(\eta_i) = 0$ and $\mathrm{Var}(\eta_i) = \Sigma_{\beta\beta}$. We assume that all $\varepsilon_{ij}$’s are independent across and within subjects, with mean 0 and variance $\sigma^2$. Within-subject independence is not essential, and the results presented here generalize to the case $\mathrm{Var}(\varepsilon_i) = \Sigma$ or $\Sigma_i$.
The nonparametric Bayes LIV approach presented in the previous section can be used in the model for $\beta_i$ to correct for possible biases caused by the endogenous covariates $z_1$. Latent instruments are particularly useful here, since obtaining valid observed instruments at this stage of the model is highly problematic and ambiguous. The latent instruments are included as follows
$$z_{1i} = \theta_i + \alpha z_{2i} + \xi_i, \tag{7.18}$$
where $\theta_i$ is an $l_1\times 1$ vector of unobserved instruments and $\alpha$ is an $l_1\times l_2$ matrix containing the effects of the exogenous covariates on the endogenous covariates². The term $\xi_i$ is an $l_1\times 1$ vector of errors with expectation zero. The dependency between the covariates $z_{1i}$ and $\eta_i$, i.e. the endogeneity, is caused by a nonzero covariance between $\eta_i$ and $\xi_i$. The variance-covariance matrix of $(\eta_i, \xi_i)$ is the $(k+l_1)\times(k+l_1)$ matrix $\Lambda$. It is assumed to be positive definite and contains the block matrices $\Sigma_{\beta\beta}$, $\Sigma_{\beta z_1}$, and $\Sigma_{z_1z_1}$.
As in the previous section, the ‘latent’ instrument $\theta_i$ has an unknown distribution $G$, which has a Dirichlet process prior with parameters $\rho$ and $G_0$. Here $G_0$ is a normal density with $l_1\times 1$ mean vector $\mu_\theta$ and $l_1\times l_1$ variance-covariance matrix $V_\theta$. In the following we give the full conditionals for the MCMC scheme. We use standard conjugate priors for the parameters, specified to represent no or only vague prior knowledge. Appendix 7C gives more details on the derivation of the full conditionals.

² This parameter $\alpha$ should not be confused with the ‘dispersion’ parameter $\alpha$ of the Dirichlet process in the previous section. For the model in this section we use $\rho$ for that purpose.
7.3.1 Estimating the hierarchical model with general latent instruments
The unknown parameters of the model in (7.17) and (7.18) are $\beta_i$, $\sigma^2$, $\gamma_0$, $\theta_i$, $\alpha$, $\Lambda$, $\rho$, $\mu_\theta$, and $V_\theta$. The joint posterior distribution, conditional on the data $X$, $Y$, and $Z$, is formed as the product of the likelihood function, obtained from the first equation of model (7.17), and the prior densities. No closed-form expressions are available for the joint posterior density or the marginal posterior densities. We therefore use MCMC sampling to approximate a sample from the true marginal posterior densities. Assuming normal distributions for the error terms, the full conditional distributions of the parameters with conjugate priors can be obtained in the same way as in section 7.2. Here we generally use multivariate distributions to accommodate a vector $\beta_i$ and possibly several endogenous variables $z_{1i}$. See Escobar (1994) and Dey, Muller and Sinha (1998) for more details on general multivariate nonparametric Bayesian estimation.
The MCMC scheme is completed by iterating the following conditional distributions:
1. $p(\sigma^2\mid\beta_i, i = 1, \ldots, n; Y, X)$;
2. $p(\beta_i\mid\sigma^2,\gamma_0,\Lambda,\theta_i,\alpha; Y, X, Z)$, for $i = 1, \ldots, n$;
3. $p(\Lambda\mid\beta_i,\theta_i, i = 1, \ldots, n, \gamma_0, \alpha; Z)$;
4. $p(\gamma_0\mid\beta_i,\theta_i, i = 1, \ldots, n, \Lambda, \alpha; Z)$;
5. $p(\alpha\mid\beta_i,\theta_i, i = 1, \ldots, n, \Lambda, \gamma_0; Z)$;
6. $p(\theta_i\mid\theta_{-i},\beta_i,\Lambda,\gamma_0,\alpha; Z)$, where $\theta_{-i} = (\theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_n)$, for $i = 1, \ldots, n$;
7. $p(\mu_\theta\mid\theta_l, l = 1, \ldots, \tilde n, V_\theta)$, where the $\theta_l$ are the $\tilde n \le n$ distinct values of $\theta_i$ that generally arise from the clustering structure of the Dirichlet process;
8. $p(V_\theta\mid\theta_l, l = 1, \ldots, \tilde n, \mu_\theta)$; and
9. $p(\rho\mid\tilde n, n)$.
As before, a remixing step is added for the distinct values $\theta_l$, $l = 1, \ldots, \tilde n$. The specific distributions follow; see appendix 7C for more details.
Full conditional distribution of $\sigma^2$. The full conditional for $\sigma^2$ is proportional to the likelihood times the inverse gamma prior distribution for $\sigma^2$ with parameters $\tau_0$ and $\eta_0$. Hence, a new value for $\sigma^2$ is drawn from an inverse gamma distribution with parameters $\tau_0 + nm/2$ and $\tfrac{1}{2}\sum_{i,j}(y_{ij}-x_{ij}'\beta_i)^2 + \eta_0$.
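Full conditional distribution of $\beta_i$. The full conditional for $\beta_i$ can be obtained from
$$p(\beta_i\mid\text{rest},\text{data}) \propto \left[\prod_{j=1}^{m} p_1(y_{ij}\mid\beta_i,\sigma^2)\right] p_k(\beta_i\mid z_{1i},z_{2i},\gamma_0,\Lambda,\theta,\alpha), \tag{7.19}$$
where $p_x$ denotes the $x$-variate normal density. The second conditional distribution in (7.19) is obtained from the joint distribution $p_{k+l_1}(h_i\mid z_{2i},\gamma,\Lambda,\theta,\alpha)$, where $h_i = (\beta_i', z_{1i}')'$ is a $(k+l_1)\times 1$ vector. Hence, $p_k(\beta_i\mid z_{1i},z_{2i},\gamma_0,\Lambda,\theta,\alpha)$ is a $k$-variate normal distribution with mean
$$\mu_{\beta.z_{1i}} = \gamma_c + \gamma z_i + \Sigma_{\beta z_1}\Sigma_{z_1z_1}^{-1}\left(z_{1i} - (\theta_i + \alpha z_{2i})\right), \tag{7.20}$$
and variance-covariance matrix
$$\Sigma_{\beta\beta.z_1} = \Sigma_{\beta\beta} - \Sigma_{\beta z_1}\Sigma_{z_1z_1}^{-1}\Sigma_{\beta z_1}' \tag{7.21}$$
(e.g. theorem 3.6 in Greene, 2000). Let $C = \frac{1}{\sigma^2}\sum_j x_{ij}x_{ij}' + \Sigma_{\beta\beta.z_1}^{-1}$. It follows that the full conditional for $\beta_i$ is a $k$-variate normal density with mean $C^{-1}\left(\frac{1}{\sigma^2}\sum_j x_{ij}y_{ij} + \Sigma_{\beta\beta.z_1}^{-1}\mu_{\beta.z_{1i}}\right)$ and variance-covariance matrix $C^{-1}$.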
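To make (7.20), (7.21) and the final normal update concrete, the sketch below treats the scalar case $k = l_1 = l_2 = 1$, so that every block of $\Lambda$ is a number and $\gamma$ is a pair $(\gamma_1, \gamma_2)$ multiplying $(z_{1i}, z_{2i})$. This simplification is our own, chosen for readability; the general case replaces divisions by matrix inverses. The function names are hypothetical.

```python
def conditional_prior_beta(z1, z2, gamma_c, gamma, theta, alpha, S_bb, S_bz, S_zz):
    """(7.20)-(7.21) with k = l1 = l2 = 1: mean and variance of beta_i given z_1i."""
    # The regression of beta on the residual of the z_1 equation corrects the mean
    mu = gamma_c + gamma[0] * z1 + gamma[1] * z2 + S_bz / S_zz * (z1 - (theta + alpha * z2))
    var = S_bb - S_bz ** 2 / S_zz
    return mu, var


def beta_i_full_conditional(y, x, sigma2, mu_cond, var_cond):
    """Scalar full conditional of beta_i: C = sum x^2 / sigma2 + 1 / var_cond."""
    C = sum(xj * xj for xj in x) / sigma2 + 1.0 / var_cond
    mean = (sum(xj * yj for xj, yj in zip(x, y)) / sigma2 + mu_cond / var_cond) / C
    return mean, 1.0 / C
```

When $\Sigma_{\beta z_1} = 0$ the correction term vanishes, and the conditional prior reduces to the usual $N(\gamma_c + \gamma z_i, \Sigma_{\beta\beta})$ of the random coefficients model.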
Full conditional distribution of $\Lambda$. As before, let $h_i = (\beta_i', z_{1i}')'$ and define $\mu_{h_i} = \bigl((\gamma_c + \gamma z_i)', (\theta_i + \alpha z_{2i})'\bigr)'$. The full conditional for $\Lambda$ is obtained as the product of the joint density of $h_i$ across $i = 1, \ldots, n$ and the density of the inverted Wishart prior distribution with parameters $(c, D)$. Hence, $\Lambda$ is sampled from a $(k+l_1)$-dimensional inverted Wishart distribution with parameters $n + c$ and

$$\sum_{i=1}^{n}(h_i - \mu_{h_i})(h_i - \mu_{h_i})' + D.$$
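A minimal NumPy sketch of the $\Lambda$ draw is given below. To stay within NumPy it uses the textbook construction $W = \sum_{j=1}^{\nu} u_j u_j'$ with $u_j \sim N(0, S^{-1})$, so that $W \sim \text{Wishart}(\nu, S^{-1})$ and $W^{-1} \sim \text{inverted Wishart}(\nu, S)$; this requires an integer number of degrees of freedom $\nu = n + c$. The helper names and the row-wise layout of `H` and `Mu_h` are hypothetical.

```python
import numpy as np


def draw_inv_wishart(df, scale, rng):
    """Draw from an inverted Wishart(df, scale); needs integer df >= dim.

    If u_1..u_df ~ N(0, scale^{-1}), then W = sum_j u_j u_j' ~ Wishart(df, scale^{-1})
    and W^{-1} ~ InvWishart(df, scale), whose mean is scale / (df - dim - 1).
    """
    dim = scale.shape[0]
    if df != int(df) or df < dim:
        raise ValueError("this construction needs an integer df >= dim")
    U = rng.multivariate_normal(np.zeros(dim), np.linalg.inv(scale), size=int(df))
    return np.linalg.inv(U.T @ U)


def draw_Lambda(H, Mu_h, c, D, rng):
    """Full conditional of Lambda: IW(n + c, sum_i (h_i - mu_hi)(h_i - mu_hi)' + D).

    H, Mu_h: (n, k + l1) arrays whose rows are h_i and mu_hi.
    """
    R = H - Mu_h
    return draw_inv_wishart(H.shape[0] + c, R.T @ R + D, rng)
```

For large $n$ the draw concentrates around the empirical covariance of the residuals $h_i - \mu_{h_i}$, as expected from the posterior mean $S/(\nu - (k+l_1) - 1)$.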
Full conditional for $\gamma_0$. The full conditional distribution for $\gamma_0$ can be obtained by vectorizing the model for $(\beta_i, z_{1i})$, i.e. we stack the rows of $\beta_i$ and $z_{1i}$ as follows (see e.g. Lenk, 2001):

$$\beta_i = \operatorname{vec}(\beta_i) = \operatorname{vec}(\gamma_c) + \operatorname{vec}(\gamma z_i) + \operatorname{vec}(\eta_i) = (z_{0i}' \otimes I_k)\gamma_0 + \eta_i, \qquad (7.22)$$

with $z_{0i} = (z_i', 1)'$, an $(l+1) \times 1$ vector, and

$$z_{1i} = \operatorname{vec}(z_{1i}) = \operatorname{vec}(\theta_i) + \operatorname{vec}(\alpha z_{2i}) + \operatorname{vec}(\xi_i) = \theta_i + (z_{2i}' \otimes I_{l_1})\alpha + \xi_i. \qquad (7.23)$$

We write $\tilde z_{0i} = z_{0i}' \otimes I_k$ and $\tilde z_{2i} = z_{2i}' \otimes I_{l_1}$. Furthermore, let $\tilde z_{10i} = z_{1i} - \theta_i - \tilde z_{2i}\alpha$, $\Lambda = \operatorname{var}(\eta_i, \xi_i)$, and³

$$\Lambda^{-1} = \begin{bmatrix} \Lambda^{(11)} & \Lambda^{(12)} \\ \Lambda^{(21)} & \Lambda^{(22)} \end{bmatrix}, \qquad (7.24)$$

where $\Lambda^{(12)} = \Lambda^{(21)\prime}$, $\Lambda^{(11)}$ is $k \times k$, $\Lambda^{(12)}$ is $k \times l_1$, and $\Lambda^{(22)}$ is an $l_1 \times l_1$ matrix. The vectorized system for $\beta_i$ and $z_{1i}$ has a 'standard' multivariate normal form. It follows from appendix 7C that the values for $\gamma_0$ are drawn from a $k(l_1 + l_2 + 1)$-variate normal distribution with mean

$$C^{-1}\Bigl[\sum_i \tilde z_{0i}'\bigl(\Lambda^{(12)}\tilde z_{10i} + \Lambda^{(11)}\beta_i\bigr) + V_\gamma^{-1}m_\gamma\Bigr], \qquad (7.25)$$

and variance-covariance $C^{-1}$, where $C = \sum_i \tilde z_{0i}'\Lambda^{(11)}\tilde z_{0i} + V_\gamma^{-1}$.

³ See Greene (2000), formula (2-74), for a general expression for the inverse of a 2 × 2 partitioned matrix.
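The $\gamma_0$ update amounts to a GLS-type normal draw after the Kronecker vectorization. The sketch below is a minimal NumPy illustration, assuming column-stacking `vec`, so that $(z_{0i}' \otimes I_k)\operatorname{vec}(G) = G z_{0i}$ with $G = [\gamma, \gamma_c]$; the function name, argument names and array layouts are hypothetical, and $\Lambda^{(11)}$, $\Lambda^{(12)}$ are taken as blocks of $\Lambda^{-1}$ as in (7.24).

```python
import numpy as np


def draw_gamma0(beta, Z0, Z10, Lam_inv, k, V_inv, m, rng):
    """Draw gamma_0 from the normal full conditional (7.25).

    beta: (n, k) rows beta_i; Z0: (n, l+1) rows z_0i = (z_i', 1)';
    Z10: (n, l1) rows z~_10i = z_1i - theta_i - z~_2i alpha;
    Lam_inv: inverse of Lambda = var(eta_i, xi_i); V_inv, m: prior precision and mean.
    """
    L11 = Lam_inv[:k, :k]                       # Lambda^(11), k x k
    L12 = Lam_inv[:k, k:]                       # Lambda^(12), k x l1
    C = np.array(V_inv, dtype=float)
    b = C @ m
    for beta_i, z0_i, z10_i in zip(beta, Z0, Z10):
        Zt = np.kron(z0_i[None, :], np.eye(k))  # z~_0i = z_0i' (x) I_k, k x k(l+1)
        C += Zt.T @ L11 @ Zt
        b += Zt.T @ (L12 @ z10_i + L11 @ beta_i)
    C_inv = np.linalg.inv(C)
    return rng.multivariate_normal(C_inv @ b, C_inv)
```

Under this convention the returned vector corresponds to `G.flatten(order="F")`, i.e. the columns of $[\gamma, \gamma_c]$ stacked on top of each other.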
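Full conditional for $\alpha$. Using similar arguments as for the matrix of regression coefficients $\gamma_0$, the full conditional distribution for $\alpha$ is easily obtained after vectorization, i.e., let

$$\beta_{0i} = \eta_i, \qquad \tilde z_{10i} = \tilde z_{2i}\alpha + \xi_i, \qquad (7.26)$$

where $\beta_{0i} = \operatorname{vec}(\beta_i - \gamma_c - \gamma z_i)$, $\tilde z_{10i} = \operatorname{vec}(z_{1i} - \theta_i)$, and $\tilde z_{2i} = z_{2i}' \otimes I_{l_1}$. It follows that the full conditional density for $\alpha$ is a multivariate normal distribution with mean (let $C = \sum_i \tilde z_{2i}'\Lambda^{(22)}\tilde z_{2i} + V_\alpha^{-1}$)

$$C^{-1}\Bigl[\sum_i \tilde z_{2i}'\bigl(\Lambda^{(21)}\beta_{0i} + \Lambda^{(22)}\tilde z_{10i}\bigr) + V_\alpha^{-1}m_\alpha\Bigr], \qquad (7.27)$$

and variance $C^{-1}$.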
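The $\alpha$ update has the same structure as the $\gamma_0$ update, now using the lower blocks $\Lambda^{(21)}$ and $\Lambda^{(22)}$ of $\Lambda^{-1}$. A minimal NumPy sketch, assuming column-stacking `vec` (so $\tilde z_{2i}\operatorname{vec}(A) = A z_{2i}$ for an $l_1 \times l_2$ matrix $A$); names and layouts are hypothetical.

```python
import numpy as np


def draw_alpha(B0, Z10, Z2, Lam_inv, k, V_inv, m, rng):
    """Draw vec(alpha) from the normal full conditional (7.27).

    B0: (n, k) rows beta_0i = beta_i - gamma_c - gamma z_i;
    Z10: (n, l1) rows z~_10i = z_1i - theta_i; Z2: (n, l2) rows z_2i;
    Lam_inv: inverse of Lambda = var(eta_i, xi_i); V_inv, m: prior precision and mean.
    """
    L21 = Lam_inv[k:, :k]                        # Lambda^(21), l1 x k
    L22 = Lam_inv[k:, k:]                        # Lambda^(22), l1 x l1
    l1 = L22.shape[0]
    C = np.array(V_inv, dtype=float)
    b = C @ m
    for b0_i, z10_i, z2_i in zip(B0, Z10, Z2):
        Zt = np.kron(z2_i[None, :], np.eye(l1))  # z~_2i = z_2i' (x) I_l1, l1 x (l1 l2)
        C += Zt.T @ L22 @ Zt
        b += Zt.T @ (L21 @ b0_i + L22 @ z10_i)
    C_inv = np.linalg.inv(C)
    return rng.multivariate_normal(C_inv @ b, C_inv)
```

The $\Lambda^{(21)}\beta_{0i}$ term is what lets the level-two residual $\eta_i$ inform $\alpha$ when $\eta_i$ and $\xi_i$ are correlated.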
Full conditional for $\theta_i$. The structure of the full conditional distributions for each of the unobserved instruments $\theta_i$, $i = 1, \ldots, n$, is derived in the same way as above for the simple linear multilevel model. The full conditional distribution for $\theta_i$ has the following form, with $\theta_{-i} = (\theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_n)$:

$$[\theta_i \mid \theta_{-i}, \text{rest}, \text{data}] \sim q_0(i)\, h(\theta_i \mid \text{rest}, \text{data}) + \sum_{l=1}^{n^-} n_l^- q_l(i)\, \delta_{\bar\theta_l}(\theta_i), \qquad (7.28)$$

where $n^-$ is the number of distinct $\theta_j$'s, $j = 1, \ldots, n$, $j \ne i$, and $n_l^-$ is the number of observations in cluster $l$ when the $i$-th observation is removed. Hence,

1. sample a proposed 'cluster' value $c_i$ from the integers $\{0, 1, \ldots, n^-\}$ with probabilities proportional to $\{q_0(i), n_1^- q_1(i), \ldots, n_{n^-}^- q_{n^-}(i)\}$;
2. if $c_i \in \{1, \ldots, n^-\}$, set $\theta_i = \bar\theta_{c_i}$, and if $c_i = 0$, draw a new value $\theta_i$ from $h(\theta_i \mid \beta_i, \alpha, \gamma_0, \Lambda, \mu_\theta, V_\theta; Z)$, which is a multivariate normal distribution with mean $C^{-1}(\Lambda^{(21)}\beta_{0i} + \Lambda^{(22)}\tilde z_{10i} + V_\theta^{-1}\mu_\theta)$ and variance $C^{-1} = (\Lambda^{(22)} + V_\theta^{-1})^{-1}$, where $\beta_{0i} = \beta_i - \gamma_c - \gamma z_i$ and $\tilde z_{10i} = z_{1i} - \alpha z_{2i}$.

In the appendix we give more details on how to compute the probabilities $\{q_0(i), n_1^- q_1(i), \ldots, n_{n^-}^- q_{n^-}(i)\}$.
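Steps 1 and 2 form a standard Pólya-urn update. The sketch below takes the weights $q_0(i)$ and $q_l(i)$ as given inputs, since their computation is deferred to the appendix; the function names, the `draw_new` callback and the list-of-arrays layout for the distinct values are hypothetical choices for illustration.

```python
import numpy as np


def draw_h(beta0_i, z10_i, L21, L22, V_theta_inv, mu_theta, rng):
    """New value from h(theta_i | ...): N(C^{-1}(L21 b0 + L22 z10 + V^{-1} mu), C^{-1})."""
    C_inv = np.linalg.inv(L22 + V_theta_inv)
    mean = C_inv @ (L21 @ beta0_i + L22 @ z10_i + V_theta_inv @ mu_theta)
    return rng.multivariate_normal(mean, C_inv)


def update_theta_i(q0, q, counts, distinct, draw_new, rng):
    """One Polya-urn update of theta_i (steps 1-2).

    q0: weight q_0(i); q: array of q_l(i); counts: n_l^- with observation i removed;
    distinct: list of the n^- distinct values; draw_new: callable drawing from h.
    Returns the new theta_i and the sampled label c_i (0 means a new cluster).
    """
    w = np.concatenate(([q0], np.asarray(counts) * np.asarray(q)))
    c = int(rng.choice(len(w), p=w / w.sum()))   # step 1: c_i in {0, ..., n^-}
    if c == 0:
        return draw_new(), c                     # step 2: fresh value from h
    return distinct[c - 1], c                    # step 2: join existing cluster
```

In a full sampler, `draw_new` would typically be a closure around `draw_h` with the current $\beta_{0i}$, $\tilde z_{10i}$ and hyperparameters.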
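Remix $\theta$. The remixing density for $\bar\theta_l$, $l = 1, \ldots, \bar n$, is proportional to

$$\prod_{i \in J_l} p(h_i \mid \bar\theta_l, \alpha, \gamma_0, \Lambda; Z)\, g_0(\bar\theta_l \mid \mu_\theta, V_\theta),$$

where $J_l$ is the set of indices of the observations belonging to group $l$. This distribution can be obtained in a similar manner as $h(\theta_i \mid \text{rest}, \text{data})$ in the previous subsection. Hence, the remixing density for $\bar\theta_l$ is a multivariate normal distribution with mean

$$C^{-1}\Bigl(\Lambda^{(21)}\sum_{i \in J_l}\beta_{0i} + \Lambda^{(22)}\sum_{i \in J_l}\tilde z_{10i} + V_\theta^{-1}\mu_\theta\Bigr)$$

and variance $C^{-1}$, with $C = n_l\Lambda^{(22)} + V_\theta^{-1}$, where $n_l$ is the number of observations in group $l$ and $\beta_{0i}$ and $\tilde z_{10i}$ are defined as before.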
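The remixing step redraws each distinct value from the pooled information of its cluster members. A minimal NumPy sketch, assuming cluster memberships are held in an integer `labels` array and that `B0` and `Z10` store $\beta_{0i}$ and $\tilde z_{10i}$ row-wise (both assumptions, as is the function name):

```python
import numpy as np


def remix_theta(B0, Z10, labels, L21, L22, V_theta_inv, mu_theta, rng):
    """Redraw each distinct value bar-theta_l given the members J_l of cluster l.

    B0: (n, k) rows beta_0i; Z10: (n, l1) rows z~_10i = z_1i - alpha z_2i;
    labels: (n,) cluster labels. Returns a dict {label: new bar-theta_l}.
    """
    new = {}
    for l in np.unique(labels):
        members = labels == l
        n_l = members.sum()
        C_inv = np.linalg.inv(n_l * L22 + V_theta_inv)
        rhs = (L21 @ B0[members].sum(axis=0) + L22 @ Z10[members].sum(axis=0)
               + V_theta_inv @ mu_theta)
        new[l] = rng.multivariate_normal(C_inv @ rhs, C_inv)
    return new
```

Because all members of a cluster share one draw, remixing moves whole clusters at once, which typically improves mixing over the one-at-a-time updates of $\theta_i$.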
Full conditional for $\mu_\theta$. The full conditional distribution for $\mu_\theta$ is proportional to

$$\Bigl\{\prod_{l=1}^{\bar n} g_0(\bar\theta_l \mid \mu_\theta, V_\theta)\Bigr\}\, p(\mu_\theta \mid m_\mu, V_\mu), \qquad (7.29)$$

which are both densities of a multivariate normal distribution. Let $C = \bar n V_\theta^{-1} + V_\mu^{-1}$. Hence, $\mu_\theta$ is sampled from a multivariate normal distribution with mean

$$C^{-1}\Bigl(V_\theta^{-1}\sum_{l=1}^{\bar n}\bar\theta_l + V_\mu^{-1}m_\mu\Bigr)$$

and variance $C^{-1}$.
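This is the conjugate normal-normal update, applied to the $\bar n$ distinct values only (not to all $n$ instruments). A minimal NumPy sketch; the function name and the row-wise layout of `theta_bar` are hypothetical.

```python
import numpy as np


def draw_mu_theta(theta_bar, V_theta_inv, m_mu, V_mu_inv, rng):
    """Draw mu_theta from N(C^{-1}(V_theta^{-1} sum_l bar-theta_l + V_mu^{-1} m_mu), C^{-1}).

    theta_bar: (n_bar, l1) array of the distinct values; C = n_bar V_theta^{-1} + V_mu^{-1}.
    """
    n_bar = theta_bar.shape[0]
    C_inv = np.linalg.inv(n_bar * V_theta_inv + V_mu_inv)
    mean = C_inv @ (V_theta_inv @ theta_bar.sum(axis=0) + V_mu_inv @ m_mu)
    return rng.multivariate_normal(mean, C_inv)
```

With a vague prior ($V_\mu^{-1} \approx 0$) the mean reduces to the average of the distinct values.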
Full conditional distribution for $V_\theta$. The prior distribution for $V_\theta$ is an inverted Wishart with parameters $(\tau_v, \Upsilon_v)$. Its full conditional distribution is obtained in a similar way as for $\mu_\theta$ and can be shown to be an inverted Wishart distribution of dimension $l_1$ with parameters $\tau_v + \bar n$ and

$$\sum_{l=1}^{\bar n}(\bar\theta_l - \mu_\theta)(\bar\theta_l - \mu_\theta)' + \Upsilon_v.$$
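A minimal NumPy sketch of the $V_\theta$ draw follows. As an assumption for illustration it uses the Wishart-by-outer-products construction (inverse of $\sum_j u_j u_j'$ with $u_j \sim N(0, S^{-1})$), which requires $\tau_v + \bar n$ to be an integer; the function name is hypothetical.

```python
import numpy as np


def draw_V_theta(theta_bar, mu_theta, tau_v, Upsilon_v, rng):
    """Draw V_theta from IW(tau_v + n_bar, sum_l (bar-theta_l - mu)(bar-theta_l - mu)' + Upsilon_v).

    theta_bar: (n_bar, l1) distinct values. tau_v + n_bar must be an integer here.
    """
    R = theta_bar - mu_theta
    scale = R.T @ R + Upsilon_v
    df = tau_v + theta_bar.shape[0]
    dim = scale.shape[0]
    # W = sum_j u_j u_j' with u_j ~ N(0, scale^{-1}) is Wishart(df, scale^{-1}),
    # so its inverse is inverted Wishart(df, scale).
    U = rng.multivariate_normal(np.zeros(dim), np.linalg.inv(scale), size=int(df))
    return np.linalg.inv(U.T @ U)
```

Note that only the $\bar n$ distinct values enter, so when the Dirichlet process forms few clusters, $V_\theta$ is informed mainly by the prior $(\tau_v, \Upsilon_v)$.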
Full conditional distribution for $\rho$. The full conditional for $\rho$ is derived in the same way as for the dispersion parameter in the simple multilevel model in the previous section. Assuming a gamma prior distribution with parameters $(\tau_\rho, \gamma_\rho)$, we obtain the following scheme for generating a new value for $\rho$:

1. sample an auxiliary value $\eta$ from $p(\eta \mid \rho, \bar n, n) \sim \text{Beta}(\rho + 1, n)$, i.e. a beta distribution with mean $(\rho + 1)/(\rho + n + 1)$;
2. then sample $\rho$ from the following mixture of gammas:
$$p(\rho \mid \eta, \bar n, n) \sim \pi_n\,\text{Gamma}[\tau_\rho + \bar n, \gamma_\rho - \log\eta] + (1 - \pi_n)\,\text{Gamma}[\tau_\rho + \bar n - 1, \gamma_\rho - \log\eta],$$
where $\dfrac{\pi_n}{1 - \pi_n} = \dfrac{\tau_\rho + \bar n - 1}{n(\gamma_\rho - \log\eta)}$.
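The two-step auxiliary-variable scheme for $\rho$ can be sketched as below. This assumes that the second gamma parameter is a rate, which is what the $\gamma_\rho - \log\eta$ update implies; NumPy's `gamma` takes a scale, hence the reciprocal. The function name is hypothetical.

```python
import numpy as np


def draw_rho(rho, n_bar, n, tau_rho, gamma_rho, rng):
    """One auxiliary-variable update of the Dirichlet process precision rho.

    Assumes the Gamma(tau_rho, gamma_rho) prior uses gamma_rho as a rate.
    n_bar: current number of distinct values; n: number of observations.
    """
    eta = rng.beta(rho + 1.0, n)                 # step 1: auxiliary eta
    rate = gamma_rho - np.log(eta)               # > gamma_rho because eta < 1
    odds = (tau_rho + n_bar - 1.0) / (n * rate)  # pi_n / (1 - pi_n)
    pi = odds / (1.0 + odds)
    shape = tau_rho + n_bar if rng.uniform() < pi else tau_rho + n_bar - 1.0
    return rng.gamma(shape, 1.0 / rate)          # step 2: NumPy uses scale = 1/rate
```

Iterating this update with $\bar n$ held fixed gives a Markov chain whose draws stay positive and settle around values of $\rho$ compatible with the observed number of clusters.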
This completes the specification of the MCMC scheme. The convergence arguments given for the MCMC algorithm of the simple nonparametric Bayes LIV model in subsection 7.2.2 apply here as well. In the following section we illustrate the performance of the two proposed models and estimation schemes using synthetic data.
7.4 A simulation study
In this section we discuss the results of two simulation studies to investigate the
performance of the models and estimation algorithms proposed in the previous
two sections. We first discuss the simulation results for the simple nonparamet-
ric Bayes model presented in section 7.2. Here we investigate the performance
of the model for three different choices of the distribution of the latent instrument. Then we present the results for the random coefficients model of section 7.3 and consider situations with one and with two endogenous covariates.
7.4.1 Simulation results for the simple multilevel model
We specified three different distributions for the latent instrument in model (7.2): (1) a discrete distribution with two categories, similar to the bimodal ($m = 2$) case in section 3.5, (2) a continuous gamma distribution, and (3) a $t$ distribution with six degrees of freedom. When the true distribution of $\theta_i$ is discrete, the standard LIV model of chapter 3 is correctly specified (conditional on knowing the true number of categories of the unobserved instrument), and we expect the standard LIV model to outperform the nonparametric Bayesian LIV approach. When the unobserved instrument has a $t$ distribution, the model is only weakly identified, because it is not identified when the instrument is exactly normally distributed.⁴
In all cases the variance of $\theta_{ij}$ equals 1.5, and we normalized its mean to zero. We took an initial sample size of $n = 1000$ and assumed only one observation per individual, i.e. $m_i = 1$ for $i = 1, \ldots, n$. Furthermore, we took $\beta_0 = 1$, $\beta_1 = 2$, $\sigma_\varepsilon^2 = \sigma_\nu^2 = 1$, and $\sigma_{\varepsilon\nu}$ equal to 0, 0.36, and 0.79, representing no, moderate, and severe endogeneity, respectively. In total, we generated 15 datasets. We discarded the first 5000 iterations of the MCMC chain and saved the final 20000; to reduce the autocorrelation in the MCMC draws, we only used every 10th draw. Convergence was monitored with iteration plots. We first discuss the results for the main parameters for the bimodal and gamma distributions and compare the nonparametric Bayes LIV estimates with the classical OLS estimates and the standard LIV estimates. Subsequently, we present our findings when the latent instrument has a $t$ distribution.
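To see why OLS is biased in this design, the following sketch simulates the simple model and computes the OLS slope. It rests on assumptions not stated in this section: the regressor is generated as $x = \theta + \nu$ (instrument plus first-stage error), and the two-point instrument takes the values $\pm\sqrt{1.5}$ with equal probability, which gives mean 0 and variance 1.5. Under these assumptions the probability limit of the OLS slope is $\beta_1 + \sigma_{\varepsilon\nu}/(1.5 + 1)$, i.e. 2.144 and 2.316 for $\sigma_{\varepsilon\nu} = 0.36$ and 0.79. The function names are hypothetical.

```python
import numpy as np


def simulate_liv(n, sigma_en, rng, beta0=1.0, beta1=2.0):
    """Simulate y = beta0 + beta1 x + eps with x = theta + nu (assumed design).

    Assumption: theta = +/- sqrt(1.5) with equal probability (mean 0, variance 1.5);
    (eps, nu) bivariate normal with unit variances and covariance sigma_en.
    """
    theta = np.sqrt(1.5) * rng.choice([-1.0, 1.0], size=n)
    cov = np.array([[1.0, sigma_en], [sigma_en, 1.0]])
    eps, nu = rng.multivariate_normal(np.zeros(2), cov, size=n).T
    x = theta + nu
    y = beta0 + beta1 * x + eps
    return x, y


def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)
```

With a large simulated sample, `ols_slope` lands near these probability limits, in line with the OLS rows of table 7.1, while the true slope is 2.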
⁴ We do not have a formal proof of this conjecture. If the distribution of the unobserved instrument is exactly normal, it is identical to the specification of $G_0$ from which the unobserved instruments are drawn. We can then expect to end up with either $\bar n = 1$ or $\bar n = n_t$, and neither situation is identified. We found support for this using simulated data.
Results for the main parameters for the bimodal and gamma distributions
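Table 7.1: Results for the main parameters for the bimodal distribution in the three situations: $\sigma_{\varepsilon\nu} = 0$ (A), $\sigma_{\varepsilon\nu} = 0.36$ (B), and $\sigma_{\varepsilon\nu} = 0.79$ (C).

| | Method | $\beta_1$ | $\sigma_\varepsilon^2$ | $\sigma_{\varepsilon\nu}$ | $k$ |
|---|---|---|---|---|---|
| A | OLS | 1.99 (0.020) | 1.00 (0.027) | | |
| | Bayes LIV | 2.00 (0.043) | 1.01 (0.029) | -0.02 (0.127) | 79 (42.99) |
| | LIV2 | 2.00 (0.037) | 1.00 (0.028) | -0.02 (0.115) | 2 |
| B | OLS | 2.15 (0.012) | 0.95 (0.046) | | |
| | Bayes LIV | 2.01 (0.030) | 1.00 (0.050) | 0.35 (0.074) | 43 (24.59) |
| | LIV2 | 2.00 (0.029) | 1.00 (0.050) | 0.36 (0.073) | 2 |
| C | OLS | 2.31 (0.015) | 0.76 (0.033) | | |
| | Bayes LIV | 2.00 (0.028) | 1.01 (0.049) | 0.78 (0.030) | 8 (2.26) |
| | LIV2 | 2.00 (0.029) | 1.00 (0.051) | 0.78 (0.031) | 2 |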
The results for the bimodal distribution are presented in table 7.1. We report the means and standard deviations of the estimated parameters, computed across the 15 simulated datasets. For the nonparametric Bayes model we computed the posterior means of $\beta_1$, $\sigma_\varepsilon^2$, $\sigma_{\varepsilon\nu}$, and $k$ across the 2000 saved MCMC iterations and subsequently computed the average and standard deviation of these 15 posterior means. We do not report the results for $\beta_0$, because it was estimated consistently by OLS, since $x_{ij}$ has mean zero in all cases.
It follows from table 7.1 that the simple nonparametric Bayes model gives approximately unbiased results in all cases. The number of clusters $k$ (i.e. the number of distinct values of $\theta_{ij}$) is a parameter of the nonparametric Bayes model and is estimated along with the other parameters, whereas in the standard LIV model $k$ has to be chosen a priori. The high standard deviations of the estimated values of $k$ do not mean that $k$ is estimated imprecisely; rather, a few high values of $k$ appear during the MCMC iterations and inflate its mean and standard deviation (see, for instance, figure 7.1). In particular for $\sigma_{\varepsilon\nu} > 0$, we find that the number of components estimated by the nonparametric Bayes LIV model was closer to the true number of components, which was two.

The OLS results are biased when the regressor is correlated with the error term, i.e. when $\sigma_{\varepsilon\nu} \ne 0$. When $x$ is truly exogenous, OLS is the best alternative, and the classical LIV model, specified with $k = 2$, is slightly more efficient than the nonparametric Bayes model. For nonzero values of $\sigma_{\varepsilon\nu}$, the classical and Bayesian LIV methods give approximately similar results. We note that the classical LIV model is correctly specified in all cases, since the true number of groups is two. Nevertheless, the performance of the nonparametric Bayesian LIV procedure is very encouraging.
Figure 7.1: Iteration plot of $k$ for synthetic dataset no. 15 when $\sigma_{\varepsilon\nu} = 0$.
As for the classical LIV model, in the nonparametric Bayesian LIV model we can easily test whether $\sigma_{\varepsilon\nu} = 0$ (i.e. test for endogeneity) by computing the fractions of the MCMC sample in which $\sigma_{\varepsilon\nu} > 0$ and $\sigma_{\varepsilon\nu} < 0$. For $\sigma_{\varepsilon\nu} = 0$ we found that these fractions were close to 0.5, as they should be, and for both $\sigma_{\varepsilon\nu} = 0.36$ and $0.79$ they were equal to 1 and 0, respectively. Hence, a test for endogeneity can be computed straightforwardly as a byproduct of the MCMC output, using the above procedure based on posterior $P$-values (for a discussion see Meng, 1994, or Sellke, Bayarri and Berger, 2001).
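The endogeneity check described above only needs the saved MCMC draws of $\sigma_{\varepsilon\nu}$. A minimal sketch, with a hypothetical function name, computing the two fractions:

```python
import numpy as np


def endogeneity_fractions(draws):
    """Fractions of MCMC draws of sigma_en that are positive and negative."""
    draws = np.asarray(draws)
    return float(np.mean(draws > 0)), float(np.mean(draws < 0))
```

For example, `endogeneity_fractions([0.3, 0.5, -0.1, 0.2])` gives `(0.75, 0.25)`. Fractions near 0.5 are consistent with $\sigma_{\varepsilon\nu} = 0$, while fractions near 1 and 0 point to positive regressor-error covariance.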
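Table 7.2: Results for the main parameters for a skewed gamma distribution in the three situations: $\sigma_{\varepsilon\nu} = 0$ (A), $\sigma_{\varepsilon\nu} = 0.36$ (B), and $\sigma_{\varepsilon\nu} = 0.79$ (C).

| | Method | $\beta_1$ | $\sigma_\varepsilon^2$ | $\sigma_{\varepsilon\nu}$ | $k$ |
|---|---|---|---|---|---|
| A | OLS | 2.00 (0.023) | 1.01 (0.053) | | |
| | Bayes LIV | 2.00 (0.049) | 1.02 (0.053) | -0.02 (0.087) | 13 (3.05) |
| | LIV2 | 2.00 (0.043) | 1.01 (0.053) | -0.02 (0.072) | 2 |
| | LIV3 | 1.99 (0.031) | 1.01 (0.088) | -0.02 (0.082) | 3 |
| B | OLS | 2.14 (0.018) | 0.95 (0.056) | | |
| | Bayes LIV | 2.00 (0.021) | 1.00 (0.066) | 0.37 (0.057) | 11 (2.45) |
| | LIV2 | 2.00 (0.025) | 1.00 (0.066) | 0.37 (0.074) | 2 |
| | LIV3 | 1.99 (0.024) | 1.00 (0.071) | 0.38 (0.071) | 3 |
| C | OLS | 2.32 (0.028) | 0.74 (0.043) | | |
| | Bayes LIV | 1.99 (0.026) | 1.01 (0.082) | 0.81 (0.075) | 14 (2.31) |
| | LIV2 | 1.99 (0.031) | 1.01 (0.088) | 0.81 (0.096) | 2 |
| | LIV3 | 1.99 (0.029) | 1.00 (0.086) | 0.80 (0.082) | 3 |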
In table 7.2 we present the results for a situation where the unobserved instrument has a continuous, skewed gamma distribution with shape parameter 0.5 and rate (inverse scale) parameter 0.577, i.e. its variance is $0.5/0.577^2 \approx 1.5$. The proposed nonparametric Bayesian LIV model gives unbiased results in all cases, and its results are preferred to the classical LIV results for $\sigma_{\varepsilon\nu} > 0$. Although the classical LIV model with three categories is slightly more efficient than with two categories, it is still less efficient than the nonparametric Bayes LIV model. For the classical LIV model with $k > 3$ we found degenerate solutions in several runs, which indicates that $k = 3$ is more or less the "best" choice. The nonparametric Bayes model thus adapts more easily to a situation where the true distribution of the instrument is continuous rather than discrete. As before, when $\sigma_{\varepsilon\nu} = 0$ the OLS estimate is best and the standard LIV model gives more efficient results than the nonparametric Bayes model. Perhaps surprisingly, for the skewed gamma distribution the estimated number of clusters $k$ is much lower for $\sigma_{\varepsilon\nu} = 0$ and 0.36 than when the true distribution of $\theta_{ij}$ is discrete with two categories (table 7.1). This can be expected if the two components of the discrete distribution are not far apart: the sampled distribution of $\theta$ then resembles a symmetric, unimodal distribution with a flat top, which is approximated by a large number of support points drawn from a normal distribution. Apparently, a skewed gamma distribution is approximated by a mixture of normals with fewer support points. The proposed test for endogeneity gave satisfactory results (posterior $P$-values of 0.45, 1, and 1 for $\sigma_{\varepsilon\nu} = 0$, 0.36, and 0.79).
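The stated variance of 1.5 for the gamma instrument is consistent with shape 0.5 and rate $1/\sqrt{3} \approx 0.577$ (scale $\sqrt{3}$), because a gamma variable has variance shape/rate². The quick NumPy check below centres the draws at zero as in the simulation design. The shape/rate reading is our interpretation of the stated parameters, not code from the thesis.

```python
import numpy as np

shape, rate = 0.5, 1.0 / np.sqrt(3.0)   # rate ~ 0.577, i.e. scale sqrt(3)
rng = np.random.default_rng(2004)
g = rng.gamma(shape, 1.0 / rate, size=1_000_000)  # NumPy takes scale = 1/rate
theta = g - shape / rate                           # centre: E[g] = shape / rate

print(shape / rate**2)        # theoretical variance
print(theta.var())            # sample variance, close to the theoretical value
print(2.0 / np.sqrt(shape))   # theoretical skewness 2/sqrt(shape), about 2.83
```

The skewness of about 2.83 shows why this case is described as heavily skewed.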
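Table 7.3: Results for the main parameters for a $t$ distribution with six degrees of freedom in the three situations: $\sigma_{\varepsilon\nu} = 0$ (A), $\sigma_{\varepsilon\nu} = 0.36$ (B), and $\sigma_{\varepsilon\nu} = 0.79$ (C).

| | Method | $\beta_1$ | $\sigma_\varepsilon^2$ | $\sigma_{\varepsilon\nu}$ | $k$ |
|---|---|---|---|---|---|
| A | OLS | 2.00 (0.006) | 1.00 (0.016) | | |
| | Bayes LIV | 1.99 (0.027) | 1.00 (0.016) | 0.03 (0.066) | 29 (42.7) |
| B | OLS | 2.14 (0.005) | 0.94 (0.018) | | |
| | Bayes LIV | 1.99 (0.028) | 1.00 (0.024) | 0.38 (0.069) | 15 (8.94) |
| C | OLS | 2.32 (0.004) | 0.74 (0.015) | | |
| | Bayes LIV | 2.00 (0.032) | 1.00 (0.058) | 0.80 (0.078) | 19 (6.75) |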
In table 7.3 we present the results for the simulated $t_6$ distribution of the instruments. We found that a sample size of $n_t = 1000$ was not sufficient to estimate the model. In general, the MCMC chain did not converge, indicating non- or under-identification, which was immediately clear from convergence plots for (e.g.) $\sigma_{\varepsilon\nu}$. This can be expected, because the $t$ distribution is close to a normal distribution, for which the model is not identified, and a relatively large sample is needed to represent the tails of the distribution well. The results presented in table 7.3 are for a total sample size of $n_t = 10000$ (obtained, for instance, as $n = 10000$ and $m = 1$, or $n = 1000$ and $m = 10$). As before, we used 15 simulated datasets. We also estimated the standard LIV model on these synthetic data, but in all cases the estimated Hessian matrix had several eigenvalues equal to zero, indicating that the model is not identified. This reveals that the standard LIV model is more sensitive than the nonparametric Bayes approach to the distribution of the instrument when it is close to normal. Table 7.3 shows that the nonparametric Bayes LIV model yields approximately unbiased results, but the relatively large standard deviations indicate that the model is weakly identified. Examination of the iteration plots suggests that the chain has converged. Contrary to the results in the previous tables, the nonparametric Bayes estimates exhibit more variability for larger amounts of endogeneity, although part of this variability is expected to decrease when the number of simulated datasets is increased.
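A standard $t_6$ variable already has variance 1.5, the same variance used for the other two instrument distributions; whether the thesis rescaled its $t_6$ draws is not stated here. Its moments also make the "close to normal" remark concrete: the distribution is symmetric and unimodal like the normal, and differs mainly through heavier tails.

$$\operatorname{Var}(t_\nu) = \frac{\nu}{\nu - 2} = \frac{6}{4} = 1.5, \qquad \text{excess kurtosis}(t_\nu) = \frac{6}{\nu - 4} = 3, \qquad \nu = 6.$$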
The simulation studies presented here illustrate that the nonparametric Bayes approach with a general distribution for the unobserved instrument is powerful in estimating linear models in the presence of regressor-error correlations. We compared the new method with classical OLS and the LIV method proposed in the previous chapters, and we examined two extreme cases: one in which the classical LIV method is correctly specified and one that is close to being unidentified. When the distribution of the instrument is truly discrete, the classical LIV approach performs best, but the nonparametric approach gives approximately similar results. When the true distribution of the instrument is continuous, the nonparametric Bayes approach performs better, which illustrates its flexibility in adapting to the distribution of $\theta$. The standard LIV model could not be estimated when the unobserved instrument has a $t$ distribution. The nonparametric Bayesian LIV method, however, does give approximately unbiased results, although the estimated standard deviations may be rather large. Moreover, the results critically depend on the sample size used, which may present a problem for cross-sectional applications. This may be less of an issue in multilevel studies, where typically several observations per subject are available.
In the following we present the results for the multilevel model described in section 7.3. Here we consider a situation with one endogenous regressor and a situation with two endogenous regressors. Given the problems found with the $t$ distribution for the simple model, we only present results for an unobserved instrument that has a discrete distribution and for one with a continuous skewed distribution. This model cannot be estimated by the standard LIV model developed in the previous chapters.
7.4.2 Simulation results for the hierarchical model
We considered two situations for model (7.17): one with a single endogenous regressor $z_1$, i.e. $l_1 = 1$, and one with two, $l_1 = 2$. In both cases we assumed the presence of one exogenous covariate $z_2$ ($l_2 = 1$) with a small effect ($\alpha$) on the endogenous covariates, one regressor $x$, and a constant, such that $\beta_i$ is of dimension two, for $i = 1, \ldots, n$. We took $n = 500$ and $m = 15$ in all cases. As before, we simulated 15 datasets and considered three situations: no, moderate, and severe endogeneity (the corresponding elements of $\Sigma_{\beta z_1}$ are 0, 0.36, and 0.79, respectively). We first present the results for one endogenous regressor, where the true distribution of $\theta_i$ is a (univariate) discrete distribution with two categories; subsequently we present the results for two endogenous covariates, where the distribution of the unobserved instrument is a (bivariate) skewed gamma distribution. The estimates are compared with results from a standard hierarchical Bayes model as in Lenk et al. (1996).
Results for one endogenous regressor
The results are presented in table 7.4. We only present the results for the regression parameter $\beta_1$, the regression parameters $\gamma$ that correspond to the endogenous covariate $z_1$, and the estimated covariances $\Sigma_{\beta z_1}$. We found that for two of the 15 simulated datasets the estimated value of $k$ equaled 1 in all MCMC draws, in which case the model is not identified. These situations can easily be detected by examining iteration plots of $k$ or $\gamma$ (see the figure in appendix 7D). The estimates of $\gamma$ and $\Sigma_{\beta z_1}$ for the nonparametric Bayes model in table 7.4 are obtained after excluding the non-converged cases.
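Table 7.4: Results for the main parameters for the hierarchical model with one possibly endogenous regressor in three situations: $\Sigma_{\beta z_1} = 0 \times \iota_2$ (A), $\Sigma_{\beta z_1} = 0.36 \times \iota_2$ (B), and $\Sigma_{\beta z_1} = 0.79 \times \iota_2$ (C). The other true values are $\beta_1 = 2$ and $\gamma_{11} = \gamma_{21} = 1$.

| | A: NPB LIV | A: Std HB | B: NPB LIV | B: Std HB | C: NPB LIV | C: Std HB |
|---|---|---|---|---|---|---|
| $\beta_1$ | 2.01 (0.06) | 2.01 (0.06) | 1.99 (0.07) | 1.99 (0.07) | 2.02 (0.08) | 2.02 (0.08) |
| $\gamma_{11}$ | 1.03 (0.06) | 1.00 (0.03) | 1.00 (0.06) | 1.14 (0.03) | 0.99 (0.04) | 1.28 (0.02) |
| $\gamma_{21}$ | 0.99 (0.06) | 1.00 (0.03) | 1.00 (0.08) | 1.15 (0.03) | 1.02 (0.02) | 1.29 (0.02) |
| $\Sigma_{\beta z_1}^{(11)}$ | -0.05 (0.13) | | 0.34 (0.15) | | 0.70 (0.10) | |
| $\Sigma_{\beta z_1}^{(21)}$ | 0.02 (0.13) | | 0.37 (0.20) | | 0.68 (0.08) | |
| $k$ | 13 (3.70) | | 11 (2.74) | | 5 (1.30) | |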
Two results are immediately clear. First, the results for the main regression parameter $\beta_1$ are almost equal for the nonparametric Bayes model and the standard hierarchical Bayes model, regardless of whether a covariate $z$ is endogenous or not. Hence, the results for the main regression parameters $\beta$ are unaffected by the presence of an endogenous covariate $z$ (in this example, however, the covariates $z$ were generated independently of the regressors $x$, i.e. they are not collinear, which may explain this result). Second, when endogeneity is present, the estimated regression parameters $\gamma$ obtained from the standard hierarchical Bayes model are biased, as expected. The nonparametric Bayes model corrects for this and allows for unbiased estimation.
We note that the effective sample size used to estimate the distribution of the latent instrument in this simulation study is 500, whereas in the previous simulation studies it was at least 1000. Furthermore, the regression parameters $\beta_i$ are unobserved and, hence, less information is available to estimate the distribution of the unobserved instrument. Consequently, we typically observe larger standard deviations than for the results in table 7.1.
In all cases $\sigma_\varepsilon^2$, the variance of the main regression equation for $y$, was estimated without bias by both the nonparametric Bayes LIV model and the standard hierarchical Bayes model. Furthermore, although not given in table 7.4, the nonparametric Bayesian LIV estimates of the heterogeneity variances of the regression parameters, $\Sigma_{\beta\beta}$, are approximately unbiased, regardless of the presence of endogenous covariates. In contrast, the heterogeneity variances are severely underestimated by the standard hierarchical Bayes model in the presence of an endogenous covariate. This was also found in chapter 6 for the variance of the random intercept (for instance, table 6.2).
It can be seen from the above results that the bias due to level-2 endogeneity in the standard hierarchical Bayes estimates of $\gamma$ and $\Sigma_{\beta\beta}$ may be quite large. In the following subsection we present the results for a situation with two endogenous covariates.
Results for two endogenous regressors
Since there are a large number of parameters in this model, we only focus on the main parameters. As for the results in the previous subsection with one endogenous covariate, the results for the main regression equation are not affected by the endogeneity of some of the $z$'s (assuming that the $z$'s and the $x$'s are not collinear), and both the nonparametric Bayes LIV model and the standard hierarchical Bayes model give approximately similar results. We therefore only report the results for the elements of $\gamma$ ($\gamma_{11}, \gamma_{12}, \gamma_{21}, \gamma_{22}$) corresponding to the endogenous covariates and for the elements of $\Sigma_{\beta z_1}$. Furthermore, we report the estimated number of clusters $k$.
The results are presented in table 7.5. The nonparametric Bayes LIV model gives unbiased results for the regression parameters $\gamma$ and for the covariances between $(\eta_i, \xi_i)$ in (7.17) and (7.18), which induce the dependency between $\beta_i$ and the elements of $z_{1i}$. Furthermore, the estimates of $\gamma$ from the standard hierarchical Bayes model are biased when the covariates $z_1$ are not exogenous. For instance, for the case with severe endogeneity (C) this bias amounts to approximately 25%, which is considerable.

Table 7.5: Results for the main parameters for the hierarchical model with two possibly endogenous regressors in three situations: $\Sigma_{\beta z_1} = 0 \times I_2$ (A), $\Sigma_{\beta z_1} = 0.36 \times I_2$ (B), and $\Sigma_{\beta z_1} = 0.79 \times I_2$ (C). The other true values are $\gamma_{11} = \gamma_{21} = 1$ and $\gamma_{12} = \gamma_{22} = -1$.

| | A: NPB LIV | A: Std HB | B: NPB LIV | B: Std HB | C: NPB LIV | C: Std HB |
|---|---|---|---|---|---|---|
| $\gamma_{11}$ | 0.98 (0.05) | 1.00 (0.03) | 0.98 (0.05) | 1.12 (0.02) | 1.00 (0.04) | 1.24 (0.04) |
| $\gamma_{12}$ | -1.00 (0.05) | -1.00 (0.03) | -1.00 (0.05) | -0.88 (0.02) | -0.98 (0.05) | -0.74 (0.04) |
| $\gamma_{21}$ | 0.99 (0.07) | 0.99 (0.03) | 0.99 (0.05) | 1.12 (0.02) | 0.99 (0.03) | 1.24 (0.03) |
| $\gamma_{22}$ | -0.99 (0.07) | -0.98 (0.03) | -1.02 (0.04) | -0.89 (0.02) | -1.01 (0.03) | -0.75 (0.03) |
| $\Sigma_{\beta z_1}^{(11)}$ | 0.03 (0.13) | | 0.40 (0.12) | | 0.70 (0.09) | |
| $\Sigma_{\beta z_1}^{(12)}$ | 0.03 (0.09) | | 0.37 (0.11) | | 0.69 (0.10) | |
| $\Sigma_{\beta z_1}^{(21)}$ | 0.01 (0.17) | | 0.38 (0.08) | | 0.71 (0.08) | |
| $\Sigma_{\beta z_1}^{(22)}$ | 0.01 (0.17) | | 0.40 (0.11) | | 0.73 (0.06) | |
| $k$ | 23 (12.71) | | 18 (5.64) | | 21 (3.58) | |
We found that the estimates of the heterogeneity variance components $\Sigma_{\beta\beta}$⁵ are approximately unbiased for the nonparametric Bayes LIV model, but the standard hierarchical Bayes model underestimates these variances by about 40% when there is severe endogeneity.

⁵ Not reported here.
7.5 Discussion nonparametric Bayesian LIV approach
In this chapter we presented preliminary findings on a very general approach to modelling endogeneity in single-level or multilevel models. We proposed a nonparametric Bayes approach that models the distribution of the latent instrument with a Dirichlet process prior. We considered two general multilevel models that may suffer from regressor-error dependencies. One advantage of a Bayesian approach is that it handles complex model structures in a straightforward manner through MCMC estimation. We illustrated that the nonparametric Bayes model for the unobserved instrument proposed in section 7.2 could be adapted to the more general setting of section 7.3 without much effort. Although the technicalities surrounding Dirichlet process priors may be demanding, the approach is flexible and can easily be extended and adapted to other situations with potential regressor-error dependencies.
The simulation results showed that the nonparametric Bayes LIV model gives unbiased results in a variety of settings. We compared the approach proposed in this chapter with the LIV model of chapter 3 and found that the nonparametric Bayes approach outperforms the classical LIV model for non-discrete distributions, because the Dirichlet process prior allows the full distribution of the unobserved instrument to be estimated. The model yields approximately unbiased results even in the extreme case where the unobserved instrument has a $t$ distribution, although a large dataset is then needed. Implementation of the nonparametric Bayes LIV model does not require a priori specification of the number of clusters; this number is estimated as a by-product. Similarly, we found for the random coefficients model of section 7.3 that the nonparametric Bayesian LIV approach can be used successfully to estimate the model parameters in the presence of endogenous covariates. Importantly, the estimates of $\gamma$ obtained from the standard hierarchical Bayes model are strongly biased. In addition, the standard hierarchical Bayes model substantially underestimates the amount of heterogeneity in the regression coefficients. We showed that the nonparametric Bayesian LIV approach can handle situations with more than one endogenous regressor.
Although the results are promising, further research is needed to gain more
insight into the use of a Dirichlet process prior for the distribution of the
unobserved instrument. The simulation studies presented here are informative,
but limited to only three kinds of distributions for the unobserved instrument:
a discrete distribution with two categories, a heavily skewed gamma distribution
and a t distribution with fat tails. We plan to investigate the properties of
the nonparametric Bayes LIV model for a broader range of distributions. The
results above suggest that, when the distribution of the latent instrument is
close to a normal distribution, the results should be interpreted with caution;
this was revealed by examining iteration plots of key parameters. We suggest
investigating convergence issues in detail in empirical applications, since
relying only on iteration plots may be too limited (Cowles and Carlin, 1996,
Brooks and Roberts, 1998). In addition, we plan to investigate whether the
mixing of the MCMC chain can be improved and whether this depends on the amount
of autocorrelation between subsequent MCMC draws. We found that the
computational burden for the nonparametric Bayes LIV model is much larger than
for the standard LIV model.
One interesting application for the nonparametric Bayes LIV approach is to
combine the Dirichlet process prior with the Hausman-Taylor approach pre-
sented in subsection 6.3.3. The Hausman-Taylor approach can be applied
to general random intercept models that suffer from level-two dependencies.
Hausman and Taylor (1981) show that the multilevel structure of the data and
prior knowledge on the exogeneity of part of the available regressors can be
used to construct instrumental variables to estimate the regression parameters.
This method has the advantage that no external instrumental variables are
required. Furthermore, their approach allows for estimation of the effects of
level-two (group-specific) variables, as opposed to, e.g., Mundlak’s approach
or fixed-effects estimation. The Hausman-Taylor estimator, however, was shown
to be of limited use when level-one regressor-error dependencies are present
(see section 6.4). Importantly, in constructing the ‘internal’ instruments,
Hausman and Taylor focus only on whether or not certain regressors can be
assumed independent of the random intercepts. Although this may be a valid
assumption in many applications, their method does not address the strength
of the resulting ‘internal’ instruments, and in this respect the method is
rather ad hoc. We illustrated this important aspect in section 6.6.
Incorporating a generally distributed unobserved instrument in the
Hausman-Taylor model can be done using the results in section 7.2, and may
yield improved results when the proposed ‘internal’ instruments are weak or
when level-one endogeneity may be present.
Furthermore, our nonparametric Bayesian LIV approach can possibly handle
situations where both level-1 and level-2 dependencies are present. Loosely
speaking, the errors in (7.2) can be regarded as the sum of two terms: (1) a
level-2 specific error, and (2) a level-1 specific error. That is, the ‘total’
errors are $\tilde{\varepsilon}_{ij} = \alpha_i + \varepsilon_{ij}$ and
$\tilde{\nu}_{ij} = \tau_i + \nu_{ij}$. The variance-covariance matrix Σ in
(7.3) can be changed accordingly. Level-2 endogeneity arises when
$E(\alpha_i\tau_i) \neq 0$, and level-1 endogeneity when
$E(\varepsilon_{ij}\nu_{ij}) \neq 0$. In both cases
$E(\tilde{\varepsilon}_{ij}\tilde{\nu}_{ij}) \neq 0$, which is similar in form to
the problem considered in section 7.2. It is interesting to investigate this
extension, in particular given the conclusions in the previous chapter.
The standard LIV model in chapter 3 is identified when the unobserved
instrument has at least two categories. When the unobserved instrument has
only one mean (m = 1), the parameters are not identified. In a Dirichlet
process there is a positive probability that the number of distinct values of
the latent instrument θ is smaller than the sample size $n_t$ (i.e., the θ’s
are clustered). Our simulation studies suggest that identification problems
occur when the true distribution of the unobserved instrument gets close to
normal. We found clear indications of lack of convergence of the MCMC output in
such cases, suggesting that the nonparametric Bayes model ‘automatically’
points out a non-identified solution, whose results should be discarded.
Nevertheless, further research is required to investigate this conjecture.
Observations made by Lewbel (1997) or Carroll, Roeder and Wasserman (1999) may
prove helpful. Related to this, we suspect that, if the distribution of the
error $\nu_{ij}$ in (7.2) is non-normal, a normal distribution for the latent
instrument may still be possible (i.e., the model may remain identified).
Appendix 7A The Dirichlet process
Ferguson (1973) introduces Dirichlet process priors as a class of prior distributions on the set of probability distributions on a given sample space. The Dirichlet process is based on the Dirichlet distribution, which is given in definition 7A.1 (Ferguson, 1973).
Definition 7A.1 Let $Z_1, Z_2, \ldots, Z_k$ be independent random variables with $Z_i \sim \text{Gamma}(\alpha_i, 1)$, where $\alpha_i \ge 0$ for all $i$, and $\alpha_i > 0$ for some $i$. Let $Y_i = Z_i / \sum_j Z_j$. Then the distribution of $(Y_1, \ldots, Y_k)$ is a Dirichlet distribution with parameter $(\alpha_1, \ldots, \alpha_k)$.
The probability density for a Dirichlet distribution is defined by

$$
f(y_1, \ldots, y_k \mid \alpha_1, \ldots, \alpha_k) = \frac{\Gamma\big(\sum_{i=1}^{k}\alpha_i\big)}{\prod_{i=1}^{k}\Gamma(\alpha_i)}\Big(\prod_{i=1}^{k} y_i^{\alpha_i - 1}\Big), \qquad \text{(7A.1)}
$$

where $\alpha_1, \ldots, \alpha_k > 0$, $y_1, \ldots, y_k \ge 0$ and $\sum_i y_i = 1$. Let $\alpha = \sum_i \alpha_i$. The mean of a Dirichlet distribution is $E(y_i) = \alpha_i/\alpha$ and the variance is $\operatorname{var}(y_i) = \alpha_i(\alpha - \alpha_i)/\big(\alpha^2(\alpha + 1)\big)$. The Dirichlet distribution generalizes the Beta distribution, which is the special case $k = 2$. It can be used as a prior for the (discrete) probabilities (group sizes) in, e.g., mixture models (see also property 3 below, Ferguson, 1973).
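Definition 7A.1 also gives a direct way to simulate from a Dirichlet distribution. The sketch below normalizes independent Gamma($\alpha_i$, 1) draws and compares the sample moments with the mean and variance formulas above. The function names and the parameter vector (2, 3, 5) are illustrative choices, not taken from the thesis, and the code assumes all $\alpha_i > 0$.

```python
import numpy as np

def sample_dirichlet(alpha, size, rng):
    """Draw `size` Dirichlet(alpha) vectors via Definition 7A.1:
    Z_i ~ Gamma(alpha_i, 1) independently, then Y_i = Z_i / sum_j Z_j."""
    alpha = np.asarray(alpha, dtype=float)
    z = rng.gamma(shape=alpha, scale=1.0, size=(size, alpha.size))
    return z / z.sum(axis=1, keepdims=True)

def dirichlet_moments(alpha):
    """Closed-form mean alpha_i/alpha and variance alpha_i(alpha - alpha_i) / (alpha^2 (alpha + 1))."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    mean = alpha / a0
    var = alpha * (a0 - alpha) / (a0 ** 2 * (a0 + 1.0))
    return mean, var

rng = np.random.default_rng(0)
alpha = [2.0, 3.0, 5.0]
draws = sample_dirichlet(alpha, 200_000, rng)
mean, var = dirichlet_moments(alpha)
print(draws.mean(axis=0), mean)   # simulated vs. exact means
print(draws.var(axis=0), var)     # simulated vs. exact variances
```

The simulated and exact moments should agree to within Monte Carlo error.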
Ferguson’s Dirichlet process is discussed extensively in Ferguson (1973) and Antoniak (1974). Here we present definition 1 from Antoniak:
Definition 7A.2 Let Θ be a set and $\mathcal{A}$ be a σ-field of subsets of Θ. Let ν be a finite, non-null, non-negative, finitely additive measure on $(\Theta, \mathcal{A})$. Then a random probability measure P on $(\Theta, \mathcal{A})$ is a Dirichlet process on $(\Theta, \mathcal{A})$ with parameter ν if, for every $k = 1, 2, \ldots$ and every measurable partition $B_1, \ldots, B_k$ of Θ, the joint distribution of the random probabilities $(P(B_1), \ldots, P(B_k))$ is a Dirichlet distribution with parameters $(\nu(B_1), \ldots, \nu(B_k))$.
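We write $P \in \mathcal{D}(\nu)$. Ferguson obtains the following properties of the Dirichlet process:

1. If $P \in \mathcal{D}(\nu)$ and $A \in \mathcal{A}$, then $E(P(A)) = \nu(A)/\nu(\Theta)$;
2. If $P \in \mathcal{D}(\nu)$ and, conditional on P, $\theta_1, \theta_2, \ldots, \theta_n$ are an i.i.d. sample from P, then $P \mid \theta_1, \theta_2, \ldots, \theta_n \in \mathcal{D}\big(\nu + \sum_{i=1}^{n}\delta_{\theta_i}\big)$, where $\delta_x$ is a measure giving mass one to the point x;
3. If $P \in \mathcal{D}(\nu)$, then P is almost surely discrete.

Furthermore, Ferguson (1973) shows that for a sample X of size 1 from $P \in \mathcal{D}(\nu)$, $P(X \in A) = \nu(A)/\nu(\Theta)$ for $A \in \mathcal{A}$.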
In nonparametric Bayesian applications it is common to specify ν as $\nu = \alpha G_0$, where $G_0$ is a distribution and $\alpha > 0$. The posterior distribution, conditional on the data, that arises from a structure with a Dirichlet process prior is known to be a mixture of Dirichlet processes (Antoniak, 1974, Escobar, 1994). Escobar’s (1994) results on MCMC estimation, however, show that the full conditional distributions are fairly simple to use in parameter estimation.
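Property 2, combined with the result for a sample of size 1, gives the Pólya urn (Blackwell-MacQueen) representation of a sample from $P \in \mathcal{D}(\alpha G_0)$ with P integrated out: $\theta_i$ is a fresh draw from $G_0$ with probability $\alpha/(\alpha + i - 1)$ and otherwise equals one of the earlier θ’s. This is the mechanism that clusters the values of the latent instrument. The sketch below is illustrative only: the standard normal base measure, α = 1 and n = 200 are arbitrary choices, and `expected_clusters` uses the standard formula $E(K_n) = \sum_{i=1}^{n}\alpha/(\alpha + i - 1)$ for the expected number of distinct values.

```python
import numpy as np

def polya_urn(n, alpha, base_sampler, rng):
    """Draw theta_1..theta_n from P ~ D(alpha * G0) with P integrated out.

    theta_i is a new draw from G0 with probability alpha / (alpha + i - 1);
    otherwise it copies one of the i - 1 earlier values, chosen uniformly."""
    theta = np.empty(n)
    for i in range(n):                         # i = number of earlier draws
        if rng.uniform() < alpha / (alpha + i):
            theta[i] = base_sampler(rng)       # open a new cluster
        else:
            theta[i] = theta[rng.integers(i)]  # join an existing value
    return theta

def expected_clusters(n, alpha):
    """E(number of distinct values) = sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))

def g0(r):
    """Hypothetical base measure G0 = N(0, 1)."""
    return r.normal(0.0, 1.0)

rng = np.random.default_rng(1)
ks = [np.unique(polya_urn(200, 1.0, g0, rng)).size for _ in range(500)]
print(np.mean(ks), expected_clusters(200, 1.0))   # far fewer than 200 distinct values
```

Choosing uniformly among earlier draws is the same as choosing an existing value with probability proportional to its cluster size.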
Appendix 7B Full conditionals: the simple multilevel model with general LIV
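Full conditional for Σ. For Σ we take the inverted Wishart prior with parameters ω and Ψ, implying

$$
\begin{aligned}
p(\Sigma \mid \text{rest}, \text{data}) &\propto (\det\Sigma)^{-n/2}\exp\Big[-\frac{1}{2}\sum_i\sum_j(z_{ij} - \mu^s_{z,ij})'\Sigma^{-1}(z_{ij} - \mu^s_{z,ij})\Big] \times |\Sigma|^{-(\omega+2+1)/2}\exp\Big[-\frac{1}{2}\operatorname{tr}\big(\Psi\Sigma^{-1}\big)\Big] \qquad \text{(7B.1)} \\
&\propto |\Sigma|^{-(\omega+n+2+1)/2}\exp\Big[-\frac{1}{2}\operatorname{tr}\Big(\Big(\sum_i\sum_j(z_{ij} - \mu^s_{z,ij})(z_{ij} - \mu^s_{z,ij})' + \Psi\Big)\Sigma^{-1}\Big)\Big],
\end{aligned}
$$

i.e. the full conditional of Σ is an inverted Wishart with parameters ω + n and $\sum_i\sum_j(z_{ij} - \mu^s_{z,ij})(z_{ij} - \mu^s_{z,ij})' + \Psi$.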
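A draw from this full conditional can be sketched as follows. The code assumes that the residuals $z_{ij} - \mu^s_{z,ij}$ are stacked as the rows of an n × 2 array and that ω is an integer, so that a Wishart variate can be built from ω + n outer products of normal vectors; the function names are illustrative. It relies on the fact that if W is Wishart with ω + n degrees of freedom and scale matrix $S^{-1}$, then $W^{-1}$ is inverted Wishart with parameters ω + n and S, which matches the parameterization in (7B.1).

```python
import numpy as np

def draw_inv_wishart(df, scale, rng):
    """One draw from IW(df, scale), density ∝ |X|^{-(df+p+1)/2} exp(-tr(scale X^{-1}) / 2).

    Builds W = sum_k x_k x_k' with x_k ~ N(0, scale^{-1}), k = 1..df (integer df >= p),
    so W ~ Wishart(df, scale^{-1}), and returns W^{-1}."""
    p = scale.shape[0]
    x = rng.multivariate_normal(np.zeros(p), np.linalg.inv(scale), size=df)
    out = np.linalg.inv(x.T @ x)
    return 0.5 * (out + out.T)              # symmetrize against round-off

def draw_sigma(resid, omega, Psi, rng):
    """Full conditional (7B.1): Sigma | rest, data ~ IW(omega + n, sum r r' + Psi),
    where the rows of `resid` are r_ij = z_ij - mu^s_{z,ij}."""
    n = resid.shape[0]
    return draw_inv_wishart(omega + n, resid.T @ resid + Psi, rng)

rng = np.random.default_rng(2)
resid = rng.normal(size=(100, 2))           # stand-in residuals, not real data
Sigma = draw_sigma(resid, omega=4, Psi=np.eye(2), rng=rng)
print(Sigma)
```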
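Full conditional for β. The prior for β is a bivariate normal distribution with mean $\mu_\beta$ and variance $\Sigma_\beta$. Because both likelihood and prior are normal distributions, we focus on the ‘quadratic’ term (kernel). For notational convenience we suppress the subscript ij in the following, and let η = (x − θ) and $\mathbf{x} = (1, x)$; $\sigma^{(ab)}$ denotes the (a, b) element of $\Sigma^{-1}$:

$$
\begin{aligned}
(z - \mu^s_z)'\Sigma^{-1}(z - \mu^s_z)
&= \left(\begin{pmatrix} y \\ x \end{pmatrix} - \begin{pmatrix} \mathbf{x}\beta \\ \theta \end{pmatrix}\right)'
\begin{bmatrix} \sigma^{(11)} & \sigma^{(12)} \\ \sigma^{(12)} & \sigma^{(22)} \end{bmatrix}
\left(\begin{pmatrix} y \\ x \end{pmatrix} - \begin{pmatrix} \mathbf{x}\beta \\ \theta \end{pmatrix}\right) \\
&= \sigma^{(11)}(y - \mathbf{x}\beta)'(y - \mathbf{x}\beta) + \sigma^{(12)}\eta(y - \mathbf{x}\beta) + \sigma^{(12)}\eta(y - \mathbf{x}\beta)' + \eta^2\sigma^{(22)} \\
&\propto \sigma^{(11)}\big(-\beta'\mathbf{x}'y - y\mathbf{x}\beta + \beta'\mathbf{x}'\mathbf{x}\beta\big) - \sigma^{(12)}\eta\mathbf{x}\beta - \sigma^{(12)}\beta'\mathbf{x}'\eta \qquad \text{(7B.2)} \\
&= \sigma^{(11)}\beta'\mathbf{x}'\mathbf{x}\beta - \beta'\mathbf{x}'\big(\sigma^{(11)}y + \sigma^{(12)}\eta\big) - \big(\sigma^{(11)}y + \sigma^{(12)}\eta\big)\mathbf{x}\beta.
\end{aligned}
$$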
By adding the subscripts and the kernel of the prior we get

$$
\beta'\Big(\sigma^{(11)}\sum_{i,j}\mathbf{x}'_{ij}\mathbf{x}_{ij} + \Sigma_\beta^{-1}\Big)\beta - \beta'\Big(\sum_{i,j}\mathbf{x}'_{ij}\big(\sigma^{(11)}y_{ij} + \sigma^{(12)}\eta_{ij}\big) + \Sigma_\beta^{-1}\mu_\beta\Big) - \Big(\mu'_\beta\Sigma_\beta^{-1} + \sum_{i,j}\big(\sigma^{(11)}y_{ij} + \sigma^{(12)}\eta_{ij}\big)\mathbf{x}_{ij}\Big)\beta.
$$

Let $C = \sigma^{(11)}\sum_{i,j}\mathbf{x}'_{ij}\mathbf{x}_{ij} + \Sigma_\beta^{-1}$. It follows that the full conditional for β, given the other parameters and the data, is bivariate normal with mean

$$
C^{-1}\Big(\sum_{i,j}\mathbf{x}'_{ij}\big(\sigma^{(11)}y_{ij} + \sigma^{(12)}\eta_{ij}\big) + \Sigma_\beta^{-1}\mu_\beta\Big),
$$

and variance-covariance $C^{-1}$. Note that if $\Sigma_\beta^{-1} = 0$, the mean of the full conditional for β is similar to an ordinary regression with a correction term for the endogeneity. If there is no endogeneity, i.e. $\sigma^{(12)} = 0$, the ‘standard’ regression model is obtained.
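The mean and covariance above translate directly into code. The sketch below assumes the observations are stacked over i, j into one-dimensional arrays, reads $\sigma^{(ab)}$ off $\Sigma^{-1}$ and uses $\mathbf{x}_{ij} = (1, x_{ij})$; the simulated data (a two-point latent instrument treated as known, β = (1, 2)') are purely illustrative. With $\sigma^{(12)} = 0$ and $\Sigma_\beta^{-1} = 0$ the mean reduces to the OLS estimate, as noted in the text, while with correlated errors the correction term removes the bias that plain OLS shows.

```python
import numpy as np

def beta_full_conditional(y, x, theta, Sigma, mu_beta, Sigma_beta_inv):
    """Mean and covariance of beta | rest, data (bivariate normal, appendix 7B).

    y, x, theta: arrays stacked over i, j; Sigma: 2x2 covariance of (eps, nu).
    Design rows are (1, x_ij) and eta_ij = x_ij - theta_ij."""
    S_inv = np.linalg.inv(Sigma)
    s11, s12 = S_inv[0, 0], S_inv[0, 1]
    X = np.column_stack([np.ones_like(x), x])
    eta = x - theta
    C = s11 * X.T @ X + Sigma_beta_inv
    rhs = X.T @ (s11 * y + s12 * eta) + Sigma_beta_inv @ mu_beta
    cov = np.linalg.inv(C)
    return cov @ rhs, cov

rng = np.random.default_rng(3)
n = 500
theta = rng.choice([-1.0, 1.0], size=n)          # hypothetical two-point latent instrument
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])        # endogenous: cov(eps, nu) = 0.5
err = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
x = theta + err[:, 1]
y = 1.0 + 2.0 * x + err[:, 0]
mean, cov = beta_full_conditional(y, x, theta, Sigma, np.zeros(2), np.zeros((2, 2)))
X = np.column_stack([np.ones(n), x])
ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(mean, ols)   # corrected slope near 2; OLS slope pulled upward
```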
Derivation of the full conditional distribution of $\theta_{ij}$. In the following we derive the expressions for the components of (7.10).
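Derivation of $h(\theta_{ij} \mid \text{rest}, \text{data})$. The density $h(\theta_{ij} \mid \text{rest}, \text{data}) \propto p(z_{ij} \mid \theta_{ij}, \beta, \Sigma)\, g_0(\theta_{ij} \mid \mu_g, \sigma^2_g)$ can be computed in a similar way as the full conditional of β. The kernel of the (structural) likelihood, where $\varepsilon = y - \mathbf{x}\beta$, is

$$
\begin{aligned}
(z - \mu^s_z)'\Sigma^{-1}(z - \mu^s_z) &= \sigma^{(11)}\varepsilon'\varepsilon + \sigma^{(12)}(x - \theta)\varepsilon + \sigma^{(12)}(x - \theta)\varepsilon' + (x - \theta)^2\sigma^{(22)} \\
&\propto -2\sigma^{(12)}\theta\varepsilon + \sigma^{(22)}\theta^2 - 2\sigma^{(22)}x\theta. \qquad \text{(7B.3)}
\end{aligned}
$$

Adding the subscript ij and the kernel of $g_0$ yields

$$
\Big(\sigma^{(22)} + \frac{1}{\sigma^2_g}\Big)\theta^2_{ij} - 2\Big(\sigma^{(22)}x_{ij} + \sigma^{(12)}\varepsilon_{ij} + \frac{\mu_g}{\sigma^2_g}\Big)\theta_{ij}, \qquad \text{(7B.4)}
$$

from which it follows that $h(\theta_{ij} \mid \text{rest}, \text{data})$ is a normal density with mean

$$
C^{-1}\Big(\sigma^{(22)}x_{ij} + \sigma^{(12)}\varepsilon_{ij} + \frac{\mu_g}{\sigma^2_g}\Big),
$$

and variance $C^{-1}$, with $C = \sigma^{(22)} + 1/\sigma^2_g$. Note: when $\sigma^2_g \to \infty$, the location of the distribution $h(\theta_{ij})$ is given by $x_{ij}$ plus an ‘endogeneity’ correction $(\sigma^{(12)}/\sigma^{(22)})\varepsilon_{ij}$ (since $\sigma^{(12)} = -\sigma_{\varepsilon\nu}/|\Sigma|$ and $\sigma^{(22)} = \sigma^2_\varepsilon/|\Sigma|$).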
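The sketch below evaluates the mean and variance of $h(\theta_{ij} \mid \text{rest}, \text{data})$ from (7B.4). As a cross-check it assumes the structural model behind (7.2) is $y = \beta_0 + \beta_1 x + \varepsilon$, $x = \theta + \nu$ (consistent with the reduced-form quantities $z = (y - \beta_0, x)'$ and $\nu = (\beta_1, 1)'$ used in the derivation of $q_0$), and recomputes the same moments from the reduced form with $\Omega = B\Sigma B'$, $B = [[1, \beta_1], [0, 1]]$. All numerical values and names are arbitrary; the vector ν is called `nu_vec` in the code to avoid a clash with the error ν.

```python
import numpy as np

def theta_h_moments(x, eps, Sigma, mu_g, sigma2_g):
    """Mean and variance of h(theta_ij | rest, data), eq. (7B.4).
    eps = y - beta0 - beta1 * x is the structural error; sigma^(ab) come from Sigma^{-1}."""
    S_inv = np.linalg.inv(Sigma)
    s12, s22 = S_inv[0, 1], S_inv[1, 1]
    C = s22 + 1.0 / sigma2_g
    return (s22 * x + s12 * eps + mu_g / sigma2_g) / C, 1.0 / C

Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])    # [[s2_eps, s_eps_nu], [s_eps_nu, s2_nu]]
beta0, beta1, x, y = 0.5, 1.5, 0.8, 2.3
mu_g, sigma2_g = 0.0, 4.0
eps = y - beta0 - beta1 * x
m_s, v_s = theta_h_moments(x, eps, Sigma, mu_g, sigma2_g)

# Same moments from the reduced form z = (y - beta0, x)' ~ N(nu_vec * theta, Omega)
B = np.array([[1.0, beta1], [0.0, 1.0]])
Omega_inv = np.linalg.inv(B @ Sigma @ B.T)
nu_vec = np.array([beta1, 1.0])
z = np.array([y - beta0, x])
kappa2 = nu_vec @ Omega_inv @ nu_vec + 1.0 / sigma2_g
m_r = (nu_vec @ Omega_inv @ z + mu_g / sigma2_g) / kappa2
print(np.isclose(m_s, m_r), np.isclose(v_s, 1.0 / kappa2))   # True True
```

The agreement reflects that $B^{-1}\nu = (0, 1)'$, so $\nu'\Omega^{-1}\nu = \sigma^{(22)}$ and $\nu'\Omega^{-1}z = \sigma^{(22)}x + \sigma^{(12)}\varepsilon$.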
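Derivation of $q_0$. $q_0$ is proportional to $\alpha\int p(z_{ij} \mid \theta_{ij}, \beta, \Sigma)\, dG_0(\theta_{ij} \mid \mu_g, \sigma^2_g)$. In the following we present the steps necessary to integrate $\theta_{ij}$ out of the likelihood function. We focus on the quadratic term of the reduced-form likelihood (7.6). This derivation is conditional on all parameters (except $\theta_{ij}$) and the data. We drop the ij subscript for notational convenience and use $z = (y - \beta_0, x)'$ and $\mu^r_z = \theta\nu$, where $\nu = (\beta_1, 1)'$. Then,

$$
(z - \mu^r_z)'\Omega^{-1}(z - \mu^r_z) + \frac{(\theta - \mu_g)^2}{\sigma^2_g} = z'\Omega^{-1}z - \theta\nu'\Omega^{-1}z - z'\Omega^{-1}\nu\theta + \theta\nu'\Omega^{-1}\nu\theta + \frac{\theta^2}{\sigma^2_g} - 2\frac{\mu_g}{\sigma^2_g}\theta + \frac{\mu^2_g}{\sigma^2_g}.
$$

By letting $\kappa_1 = \nu'\Omega^{-1}z$ and $\kappa_2 = \nu'\Omega^{-1}\nu + 1/\sigma^2_g$, and rearranging terms (terms involving neither θ nor z are absorbed in the factor of proportionality), we get

$$
\kappa_2\theta^2 - 2\Big(\frac{\mu_g}{\sigma^2_g} + \kappa_1\Big)\theta + z'\Omega^{-1}z, \qquad \text{(7B.5)}
$$

which is equal to (take $\kappa_3 = \kappa_1 + \mu_g/\sigma^2_g$)

$$
\kappa_2\Big(\theta - \frac{\kappa_3}{\kappa_2}\Big)^2 - \frac{\kappa_3^2}{\kappa_2} + z'\Omega^{-1}z. \qquad \text{(7B.6)}
$$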
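The first term is the kernel of a normal density in θ; after exponentiation it integrates to a constant that does not depend on z. The remaining terms involve matrices and vectors, so we have to be careful with taking squares, transposing and taking inverses. Let $A = \kappa_2^{-1}$; the remaining terms of (7B.6) are then equal to

$$
\begin{aligned}
z'\Omega^{-1}z - \kappa'_3A\kappa_3 &= z'\Omega^{-1}z - \Big(z'\Omega^{-1}\nu + \frac{\mu_g}{\sigma^2_g}\Big)A\Big(\nu'\Omega^{-1}z + \frac{\mu_g}{\sigma^2_g}\Big) \\
&\propto -z'\Omega^{-1}\nu A\frac{\mu_g}{\sigma^2_g} - \frac{\mu_g}{\sigma^2_g}A\nu'\Omega^{-1}z - z'\Omega^{-1}\nu A\nu'\Omega^{-1}z + z'\Omega^{-1}z \\
&= z'\big(\Omega^{-1} - \Omega^{-1}\nu A\nu'\Omega^{-1}\big)z - z'\Omega^{-1}\nu A\frac{\mu_g}{\sigma^2_g} - \frac{\mu_g}{\sigma^2_g}A\nu'\Omega^{-1}z. \qquad \text{(7B.7)}
\end{aligned}
$$

Let $B = \Omega^{-1} - \Omega^{-1}\nu A\nu'\Omega^{-1} = [\Omega + \nu\sigma^2_g\nu']^{-1}$, where we used expression (2-66b) from Greene (2000). The expression in (7B.7) is the kernel of a multivariate normal density⁶ with variance-covariance matrix $B^{-1} = \Omega + \nu\sigma^2_g\nu'$ and mean $B^{-1}\Omega^{-1}\nu A\mu_g/\sigma^2_g$.

⁶ The general form of the kernel of a multivariate normal distribution with mean $V^{-1}\mu$ and variance-covariance $V^{-1}$ is $(x - V^{-1}\mu)'V(x - V^{-1}\mu) = x'Vx - \mu'x - x'\mu + \mu'V^{-1}\mu$.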
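This latter expression can be simplified to $\nu\mu_g$. This is not immediately clear, but substituting $A = \sigma^2_g - \sigma^2_g\nu'[\Omega + \nu\sigma^2_g\nu']^{-1}\nu\sigma^2_g$ and rearranging terms gives the desired result. Hence, $q_0$ is proportional to $\alpha N_{ij}$, where $N_{ij}$ is the density⁷ of a bivariate normal distribution at $z_{ij}$ with mean $\nu\mu_g$ and variance-covariance $\Omega + \sigma^2_g\nu\nu'$, which is equal to

$$
B\Sigma B' + \sigma^2_g\begin{bmatrix} \beta_1^2 & \beta_1 \\ \beta_1 & 1 \end{bmatrix} = \begin{bmatrix} \beta_1^2(\sigma^2_\nu + \sigma^2_g) + 2\beta_1\sigma_{\varepsilon\nu} + \sigma^2_\varepsilon & \beta_1(\sigma^2_\nu + \sigma^2_g) + \sigma_{\varepsilon\nu} \\ \beta_1(\sigma^2_\nu + \sigma^2_g) + \sigma_{\varepsilon\nu} & \sigma^2_\nu + \sigma^2_g \end{bmatrix}. \qquad \text{(7B.8)}
$$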
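Two steps above are described as not immediately clear: the identity $B = [\Omega + \nu\sigma^2_g\nu']^{-1}$ and the simplification of the mean to $\nu\mu_g$. The numerical check below confirms both for arbitrary parameter values, together with the explicit matrix in (7B.8). It assumes $\Omega = T\Sigma T'$ with $T = [[1, \beta_1], [0, 1]]$, the structural-to-reduced-form transformation implied by (7B.8); the name `T` is ours, chosen to avoid a clash with B.

```python
import numpy as np

# Arbitrary illustrative values (not taken from the thesis)
beta1, mu_g, sigma2_g = 1.5, 0.7, 2.0
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])       # [[s2_eps, s_eps_nu], [s_eps_nu, s2_nu]]
T = np.array([[1.0, beta1], [0.0, 1.0]])          # structural -> reduced-form errors
Omega = T @ Sigma @ T.T
nu = np.array([beta1, 1.0])

Om_inv = np.linalg.inv(Omega)
A = 1.0 / (nu @ Om_inv @ nu + 1.0 / sigma2_g)     # A = kappa_2^{-1}
B = Om_inv - A * np.outer(Om_inv @ nu, nu @ Om_inv)
cov_marg = Omega + sigma2_g * np.outer(nu, nu)
mean_marg = np.linalg.inv(B) @ Om_inv @ nu * A * mu_g / sigma2_g

s2e, sen, s2n = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
closed_form = np.array([
    [beta1**2 * (s2n + sigma2_g) + 2 * beta1 * sen + s2e, beta1 * (s2n + sigma2_g) + sen],
    [beta1 * (s2n + sigma2_g) + sen, s2n + sigma2_g],
])  # right-hand side of (7B.8)

print(np.allclose(B, np.linalg.inv(cov_marg)))    # True: Greene (2-66b) step
print(np.allclose(mean_marg, nu * mu_g))          # True: mean simplifies to nu * mu_g
print(np.allclose(cov_marg, closed_form))         # True: matches (7B.8)
```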
Derivation of $q_j$. This quantity is proportional to $n^-_j$ times $p(z_{ij} \mid \theta_j, \beta, \Sigma)$, $j = 1, \ldots, n^-$, where $n^-$ is the number of distinct θ’s when $\theta_{ij}$ is removed.
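Putting $q_0$, $q_j$ and h together gives a Pólya-urn update of $\theta_{ij}$ in the style of Escobar (1994). Since (7.10) itself is not reproduced in this appendix, the sketch below assumes the standard form: normalize $(q_0, q_1, \ldots, q_{n^-})$, draw a new $\theta_{ij}$ from h with probability $q_0$, and otherwise set $\theta_{ij}$ equal to the selected existing value $\theta_j$. It also assumes that `counts` holds $n^-_j$, the number of other observations currently sharing $\theta_j$. For compactness it works with the reduced form $z = (y - \beta_0, x)' \sim N(\nu\theta, \Omega)$, whose h is algebraically identical to (7B.4). All names are hypothetical.

```python
import numpy as np

def mvn_pdf(z, mean, cov):
    """Multivariate normal density at z."""
    d = z - mean
    quad = d @ np.linalg.solve(cov, d)
    return np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(2.0 * np.pi * cov))

def update_theta_ij(z, nu, Omega, theta_unique, counts, alpha, mu_g, sigma2_g, rng):
    """One urn update of theta_ij given the other thetas (sketch).

    q_0 ∝ alpha * N(z; nu*mu_g, Omega + sigma2_g * nu nu')
    q_j ∝ n^-_j * N(z; nu*theta_j, Omega)"""
    q0 = alpha * mvn_pdf(z, nu * mu_g, Omega + sigma2_g * np.outer(nu, nu))
    qj = [c * mvn_pdf(z, nu * t, Omega) for t, c in zip(theta_unique, counts)]
    probs = np.array([q0] + qj)
    probs /= probs.sum()
    k = rng.choice(probs.size, p=probs)
    if k > 0:
        return float(theta_unique[k - 1])          # join existing cluster j = k
    Om_inv = np.linalg.inv(Omega)                   # otherwise draw a new value from h
    kappa2 = nu @ Om_inv @ nu + 1.0 / sigma2_g
    mean = (nu @ Om_inv @ z + mu_g / sigma2_g) / kappa2
    return float(rng.normal(mean, np.sqrt(1.0 / kappa2)))

rng = np.random.default_rng(5)
nu, Omega = np.array([1.5, 1.0]), np.eye(2)
z = np.array([1.5, 1.0])                            # hypothetical observation at nu * 1.0
print(update_theta_ij(z, nu, Omega, np.array([1.0, -3.0]), np.array([10, 4]),
                      alpha=1.0, mu_g=0.0, sigma2_g=1.0, rng=rng))
```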
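Remixing θ. Using the expression for the posterior distribution of θ, it follows that the derivation of the remixing distribution for θ is similar to the derivation of $h(\theta_{ij})$ above. We have

$$
\sum_{i,j \in J_l}\begin{pmatrix} \varepsilon \\ x - \theta \end{pmatrix}'\begin{bmatrix} \sigma^{(11)} & \sigma^{(12)} \\ \sigma^{(12)} & \sigma^{(22)} \end{bmatrix}\begin{pmatrix} \varepsilon \\ x - \theta \end{pmatrix} \propto \sum_{i,j \in J_l}\big\{-2\sigma^{(12)}\theta\varepsilon + \sigma^{(22)}\theta^2 - 2\sigma^{(22)}x\theta\big\}, \qquad \text{(7B.9)}
$$

and by adding the kernel of $g_0$ and the subscripts ij we obtain

$$
\Big(n_l\sigma^{(22)} + \frac{1}{\sigma^2_g}\Big)\theta^2_l - 2\Big(\sigma^{(22)}\sum_{i,j \in J_l}x_{ij} + \sigma^{(12)}\sum_{i,j \in J_l}\varepsilon_{ij} + \frac{\mu_g}{\sigma^2_g}\Big)\theta_l. \qquad \text{(7B.10)}
$$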
It follows that the (full conditional) posterior remixing distribution for $\theta_l$ is a normal distribution with mean $C^{-1}\big(\sigma^{(22)}\sum_{i,j \in J_l}x_{ij} + \sigma^{(12)}\sum_{i,j \in J_l}\varepsilon_{ij} + \mu_g/\sigma^2_g\big)$ and variance $C^{-1}$, where $C = n_l\sigma^{(22)} + 1/\sigma^2_g$ and $J_l$ is defined in subsection 7.2.2. When $\sigma^2_g \to \infty$ and there is no endogeneity ($\sigma^{(12)} = 0$), the mean of the full conditional distribution for $\theta_l$ is equal to the sample mean of the $x_{ij}$ of the observations in cluster l. The variance is in this case equal to $\sigma^2_\nu/n_l$.

⁷ This result looks very much like the result for a simpler case: $\int f(y \mid \mu, \sigma^2)\, g(\mu \mid \mu_0, \sigma^2_0)\, d\mu$, where y has a normal distribution with mean μ and variance σ² and μ has a normal distribution with mean $\mu_0$ and variance $\sigma^2_0$; we know that this integral yields $y \mid \sigma^2, \mu_0, \sigma^2_0 \sim N(\mu_0, \sigma^2 + \sigma^2_0)$.
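The remixing step for one cluster can be sketched as below. The names are illustrative; `x_l` and `eps_l` hold the $x_{ij}$ and structural errors $\varepsilon_{ij}$ of the observations in $J_l$. The usage lines check the limiting case stated in the text: with $\sigma^{(12)} = 0$ and a very large $\sigma^2_g$, the mean is the cluster average of the $x_{ij}$ and the variance is $\sigma^2_\nu/n_l$.

```python
import numpy as np

def remix_theta(x_l, eps_l, Sigma, mu_g, sigma2_g, rng):
    """Draw theta_l from its remixing full conditional, eq. (7B.10).
    Returns the draw together with the conditional mean and variance."""
    S_inv = np.linalg.inv(Sigma)
    s12, s22 = S_inv[0, 1], S_inv[1, 1]
    n_l = len(x_l)
    C = n_l * s22 + 1.0 / sigma2_g
    mean = (s22 * np.sum(x_l) + s12 * np.sum(eps_l) + mu_g / sigma2_g) / C
    return rng.normal(mean, np.sqrt(1.0 / C)), mean, 1.0 / C

rng = np.random.default_rng(6)
x_l = np.array([0.2, 1.1, 0.7, 0.4])
eps_l = np.array([0.3, -0.2, 0.1, 0.5])
draw, m, v = remix_theta(x_l, eps_l, np.diag([1.0, 0.5]), 0.0, 1e12, rng)
print(np.isclose(m, x_l.mean()), np.isclose(v, 0.5 / len(x_l)))   # True True
```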
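Full conditional for $\mu_g$. From (7.15) we obtain

$$
\sum_l\frac{(\theta_l - \mu_g)^2}{\sigma^2_g} + \frac{(\mu_g - \mu_0)^2}{\sigma^2_0} \propto \mu^2_g\Big(\frac{1}{\sigma^2_0} + \frac{n}{\sigma^2_g}\Big) - 2\Big(\frac{\sum_l\theta_l}{\sigma^2_g} + \frac{\mu_0}{\sigma^2_0}\Big)\mu_g, \qquad \text{(7B.11)}
$$

from which it follows that the full conditional for $\mu_g$ is a normal distribution with mean $C^{-1}\big(\sum_l\theta_l/\sigma^2_g + \mu_0/\sigma^2_0\big)$ and variance $C^{-1}$, with $C = 1/\sigma^2_0 + n/\sigma^2_g$. Note: if $\sigma^2_0 \to \infty$, then the mean of the full conditional distribution is $\sum_l\theta_l/n$ and its variance is $\sigma^2_g/n$.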
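A sketch of the corresponding Gibbs step, with illustrative names and values; n is taken to be the number of distinct cluster values $\theta_l$, as in the sum over l above. The usage lines reproduce the limit $\sigma^2_0 \to \infty$ noted in the text.

```python
import numpy as np

def draw_mu_g(theta_l, sigma2_g, mu0, sigma2_0, rng):
    """Full conditional (7B.11) for mu_g; theta_l holds the n distinct cluster values."""
    n = len(theta_l)
    C = 1.0 / sigma2_0 + n / sigma2_g
    mean = (np.sum(theta_l) / sigma2_g + mu0 / sigma2_0) / C
    return rng.normal(mean, np.sqrt(1.0 / C)), mean, 1.0 / C

rng = np.random.default_rng(7)
theta_l = np.array([-1.2, 0.4, 2.0])
_, m, v = draw_mu_g(theta_l, sigma2_g=3.0, mu0=0.0, sigma2_0=1e12, rng=rng)
print(m, v)   # approximately mean(theta_l) = 0.4 and sigma2_g / n = 1.0
```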
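Full conditional for $\sigma^2_g$. The derivation is similar to the derivation of the full conditional distribution for $\mu_g$. We use (7.16), hence

$$
\frac{1}{\big(\sqrt{\sigma^2_g}\big)^n}\exp\Big[-\frac{1}{2}\sum_{l=1}^{n}\frac{(\theta_l - \mu_g)^2}{\sigma^2_g}\Big]\frac{1}{(\sigma^2_g)^{c+1}}\exp\Big(-\frac{d}{\sigma^2_g}\Big) = \frac{1}{(\sigma^2_g)^{c+n/2+1}}\exp\Big(-\frac{\frac{1}{2}\sum_l(\theta_l - \mu_g)^2 + d}{\sigma^2_g}\Big), \qquad \text{(7B.12)}
$$

from which it follows that the full conditional distribution for $\sigma^2_g$ is an inverted gamma distribution with parameters $c + n/2$ and $\frac{1}{2}\sum_l(\theta_l - \mu_g)^2 + d$.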
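The inverted gamma draw can be generated as the reciprocal of a gamma variate whose rate equals the second parameter. A sketch with illustrative values (c = 2, d = 1 and 20 stand-in cluster values); the usage lines compare the simulated mean with the inverted-gamma mean b/(a − 1).

```python
import numpy as np

def draw_sigma2_g(theta_l, mu_g, c, d, rng):
    """Full conditional (7B.12): inverted gamma with shape c + n/2 and
    scale (1/2) * sum_l (theta_l - mu_g)^2 + d, drawn as 1 / Gamma(shape, rate=scale)."""
    theta_l = np.asarray(theta_l, dtype=float)
    shape = c + theta_l.size / 2.0
    scale = 0.5 * np.sum((theta_l - mu_g) ** 2) + d
    return 1.0 / rng.gamma(shape, 1.0 / scale), shape, scale

rng = np.random.default_rng(8)
theta_l = rng.normal(0.0, 1.5, size=20)          # stand-in cluster values
draws = [draw_sigma2_g(theta_l, 0.0, 2.0, 1.0, rng)[0] for _ in range(20_000)]
_, a, b = draw_sigma2_g(theta_l, 0.0, 2.0, 1.0, rng)
print(np.mean(draws), b / (a - 1.0))              # sample mean vs. inverted-gamma mean
```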
Full conditional for α. The full conditional distribution for the dispersion parameter α of the Dirichlet process can be obtained using the results in, e.g., West (1992) or Escobar and West (1995).
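For concreteness, one common choice is the auxiliary-variable scheme of Escobar and West (1995), sketched below under an assumed Gamma(a, b) prior for α (shape a, rate b), with k the current number of distinct θ’s and n the number of observations: draw η ∼ Beta(α + 1, n), then draw α from a two-component mixture of Gamma(a + k, b − log η) and Gamma(a + k − 1, b − log η) with odds (a + k − 1)/(n(b − log η)). The helper `log_post_alpha` evaluates the target $p(\alpha \mid k) \propto \alpha^{a+k-1}e^{-b\alpha}\Gamma(\alpha)/\Gamma(\alpha + n)$ (Antoniak, 1974), which the usage lines use to check the chain on a grid. The values a = 2, b = 1, k = 5, n = 50 are arbitrary.

```python
import math
import numpy as np

def draw_alpha(alpha, k, n, a, b, rng):
    """Escobar-West (1995) update for the DP precision alpha with a Gamma(a, b) prior (rate b)."""
    eta = rng.beta(alpha + 1.0, n)                 # auxiliary variable
    rate = b - math.log(eta)
    odds = (a + k - 1.0) / (n * rate)
    shape = a + k if rng.uniform() < odds / (1.0 + odds) else a + k - 1.0
    return rng.gamma(shape, 1.0 / rate)

def log_post_alpha(alpha, k, n, a, b):
    """log p(alpha | k) up to a constant: alpha^(a+k-1) e^(-b alpha) Gamma(alpha) / Gamma(alpha + n)."""
    return (a + k - 1.0) * math.log(alpha) - b * alpha + math.lgamma(alpha) - math.lgamma(alpha + n)

rng = np.random.default_rng(9)
a, b, k, n = 2.0, 1.0, 5, 50
alpha, chain = 1.0, []
for _ in range(20_000):
    alpha = draw_alpha(alpha, k, n, a, b, rng)
    chain.append(alpha)

grid = np.linspace(1e-3, 30.0, 30_000)
logw = np.array([log_post_alpha(g, k, n, a, b) for g in grid])
w = np.exp(logw - logw.max())
print(np.mean(chain[1000:]), np.sum(grid * w) / np.sum(w))   # chain mean vs. grid posterior mean
```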
Appendix 7C Full conditionals: the hierarchical model with general LIV
Full conditional for σ² and $\beta_i$. Assuming conjugate prior densities, the full conditional distributions for σ² and $\beta_i$ can be derived easily using standard results from combining a normal likelihood with an inverted gamma distribution, and with a multivariate normal distribution with mean (7.20) and variance-covariance (7.21), respectively.
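Full conditional distribution for Λ. The prior distribution for Λ is an uninformative $(k + l_1)$-dimensional inverted Wishart distribution with parameters c and D. Let $h_i = (\beta'_i, z'_{1i})'$ and $\mu_{hi} = \big((\gamma_c + \gamma z_i)', (\theta_i + \alpha z_{2i})'\big)'$. Now

$$
p(\Lambda \mid \text{rest}, \text{data}) \propto \Big\{\prod_{i=1}^{n}\det(\Lambda)^{-\frac{1}{2}}\exp\Big[-\frac{1}{2}(h_i - \mu_{hi})'\Lambda^{-1}(h_i - \mu_{hi})\Big]\Big\} \times \det(\Lambda)^{-\frac{c+k+l_1+1}{2}}\exp\Big\{-\frac{1}{2}\operatorname{tr}\big(D\Lambda^{-1}\big)\Big\}, \qquad \text{(7C.1)}
$$

from which it follows that the full conditional for Λ is also an inverted Wishart $(k + l_1)$-distribution with parameters n + c and

$$
\sum_{i=1}^{n}(h_i - \mu_{hi})(h_i - \mu_{hi})' + D.
$$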
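Full conditional for the matrix $\gamma_0$. We use the vectorized system in (7.22) and (7.23). The quadratic term of this system, under normality of the errors $(\eta'_i, \xi'_i)'$, is given by (we drop the subscript i for the moment)

$$
\begin{aligned}
&\begin{pmatrix} \beta - z_0\gamma_0 \\ z_{10} \end{pmatrix}'\begin{bmatrix} \Lambda^{(11)} & \Lambda^{(12)} \\ \Lambda^{(21)} & \Lambda^{(22)} \end{bmatrix}\begin{pmatrix} \beta - z_0\gamma_0 \\ z_{10} \end{pmatrix} \\
&\quad\propto \gamma'_0z'_0\Lambda^{(11)}z_0\gamma_0 - \gamma'_0z'_0\big(\Lambda^{(12)}z_{10} + \Lambda^{(11)}\beta\big) - \big(\beta'\Lambda^{(11)} + z'_{10}\Lambda^{(21)}\big)z_0\gamma_0. \qquad \text{(7C.2)}
\end{aligned}
$$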
The prior for $\gamma_0$ is an $(l_1 + l_2 + 1)$-variate normal distribution with mean $m_\gamma$ and variance $V_\gamma$. Adding the subscript i and the prior kernel we obtain

$$
\begin{aligned}
&\gamma'_0\Big(\sum_i z'_{0i}\Lambda^{(11)}z_{0i}\Big)\gamma_0 - \gamma'_0\Big(\sum_i z'_{0i}\big(\Lambda^{(12)}z_{10i} + \Lambda^{(11)}\beta_i\big)\Big) - \Big(\sum_i\big(\beta'_i\Lambda^{(11)} + z'_{10i}\Lambda^{(21)}\big)z_{0i}\Big)\gamma_0 + \gamma'_0V_\gamma^{-1}\gamma_0 - m'_\gamma V_\gamma^{-1}\gamma_0 - \gamma'_0V_\gamma^{-1}m_\gamma \\
&= \gamma'_0\Big[\sum_i z'_{0i}\Lambda^{(11)}z_{0i} + V_\gamma^{-1}\Big]\gamma_0 - \gamma'_0\Big[\sum_i z'_{0i}\big(\Lambda^{(12)}z_{10i} + \Lambda^{(11)}\beta_i\big) + V_\gamma^{-1}m_\gamma\Big] - \Big[\sum_i\big(\beta'_i\Lambda^{(11)} + z'_{10i}\Lambda^{(21)}\big)z_{0i} + m'_\gamma V_\gamma^{-1}\Big]\gamma_0, \qquad \text{(7C.3)}
\end{aligned}
$$

from which it follows that the full conditional distribution for $\gamma_0$ is an $(l_1 + l_2 + 1)$-variate normal distribution with mean

$$
C^{-1}\Big[\sum_i z'_{0i}\big(\Lambda^{(12)}z_{10i} + \Lambda^{(11)}\beta_i\big) + V_\gamma^{-1}m_\gamma\Big], \qquad \text{(7C.4)}
$$

and variance-covariance $C^{-1}$, where $C = \sum_i z'_{0i}\Lambda^{(11)}z_{0i} + V_\gamma^{-1}$.
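Full conditional for the matrix α. Similarly, the quadratic term of the multivariate normal distribution of the system in (7.26) is given by

$$
\begin{aligned}
&\begin{pmatrix} \beta_0 \\ z_{10} - z_2\alpha \end{pmatrix}'\begin{bmatrix} \Lambda^{(11)} & \Lambda^{(12)} \\ \Lambda^{(21)} & \Lambda^{(22)} \end{bmatrix}\begin{pmatrix} \beta_0 \\ z_{10} - z_2\alpha \end{pmatrix} \\
&\quad\propto \alpha'z'_2\Lambda^{(22)}z_2\alpha - \alpha'z'_2\big(\Lambda^{(21)}\beta_0 + \Lambda^{(22)}z_{10}\big) - \big(\beta'_0\Lambda^{(12)} + z'_{10}\Lambda^{(22)}\big)z_2\alpha. \qquad \text{(7C.5)}
\end{aligned}
$$

Adding the kernel of the prior for α (an $l_1l_2$-variate normal distribution) and the subscripts i, we find, analogously to the derivation for $\gamma_0$, that the full conditional density function for α is a multivariate normal distribution with mean (let $C = \sum_i z'_{2i}\Lambda^{(22)}z_{2i} + V_\alpha^{-1}$)

$$
C^{-1}\Big[\sum_i z'_{2i}\big(\Lambda^{(21)}\beta_{0i} + \Lambda^{(22)}z_{10i}\big) + V_\alpha^{-1}m_\alpha\Big] \qquad \text{(7C.6)}
$$

and variance $C^{-1}$.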
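Both (7C.4) and (7C.6) are instances of the same normal update: the coefficient vector enters the ‘own’ block of the partitioned precision matrix through a design matrix, while the other block enters only through a cross term. The sketch below computes the mean and covariance for that generic case and records in the docstring how it maps onto $\gamma_0$ and α. The function names and the stacked layout of D, t and o are assumptions made for illustration. The usage lines check, for random inputs, that the returned mean is the stationary point of the summed quadratic form (7C.2) plus the prior kernel.

```python
import numpy as np

def block_normal_update(D, t, o, P_own, P_cross, m, V_inv):
    """Normal full conditional for g with kernel
    sum_i [g' D_i' P_own D_i g - 2 g' D_i' (P_own t_i + P_cross o_i)] + (g - m)' V_inv (g - m).

    gamma_0, eq. (7C.4): D_i = z_0i, t_i = beta_i, o_i = z_10i,  P_own = Lambda^(11), P_cross = Lambda^(12)
    alpha,   eq. (7C.6): D_i = z_2i, t_i = z_10i,  o_i = beta_0i, P_own = Lambda^(22), P_cross = Lambda^(21)
    Shapes: D (n, p, q), t (n, p), o (n, r), P_own (p, p), P_cross (p, r)."""
    C = np.einsum('ipq,pr,irs->qs', D, P_own, D) + V_inv
    rhs = np.einsum('ipq,ip->q', D, t @ P_own.T + o @ P_cross.T) + V_inv @ m
    cov = np.linalg.inv(C)
    return cov @ rhs, cov

def gamma0_objective(g, D, t, o, Lambda_inv, m, V_inv):
    """Sum over i of the quadratic form (7C.2) plus the prior kernel, as a function of g."""
    res = np.concatenate([t - np.einsum('ipq,q->ip', D, g), o], axis=1)
    return np.einsum('ij,jk,ik->', res, Lambda_inv, res) + (g - m) @ V_inv @ (g - m)

rng = np.random.default_rng(10)
n, p, q, r = 30, 2, 3, 2
D, t, o = rng.normal(size=(n, p, q)), rng.normal(size=(n, p)), rng.normal(size=(n, r))
M = rng.normal(size=(p + r, p + r))
Lambda_inv = M @ M.T + (p + r) * np.eye(p + r)      # random positive-definite precision
m, V_inv = rng.normal(size=q), 0.5 * np.eye(q)
mean, cov = block_normal_update(D, t, o, Lambda_inv[:p, :p], Lambda_inv[:p, p:], m, V_inv)
h = 1e-4
grad = [(gamma0_objective(mean + h * e, D, t, o, Lambda_inv, m, V_inv)
         - gamma0_objective(mean - h * e, D, t, o, Lambda_inv, m, V_inv)) / (2 * h) for e in np.eye(q)]
print(np.round(grad, 6))   # numerically zero: the mean minimizes the quadratic form
```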
Full conditional distribution for $\theta_i$. In the following we derive the probabilities $q_0(i), q_1(i), \ldots, q_{n^-}(i)$ and the density $h(\theta_i \mid \text{data}, \text{rest})$.
Derivation of $q_0$. We proceed in the same way as in the derivation of $q_0$ for the nonparametric Bayes LIV model in section 7.2. The reduced form is equal to (we drop subscript i for the moment)

$$
\beta = \gamma_c + \gamma_1\theta + (\gamma_1\alpha + \gamma_2)z_2 + \gamma_1\xi + \eta, \qquad z_1 = \theta + \alpha z_2 + \xi. \qquad \text{(7C.7)}
$$

Define $u = \gamma_1\xi + \eta$, so that

$$
\begin{pmatrix} u \\ \xi \end{pmatrix} = \begin{pmatrix} \gamma_1\xi + \eta \\ \xi \end{pmatrix} = B\begin{bmatrix} \eta \\ \xi \end{bmatrix}, \qquad \text{(7C.8)}
$$

where

$$
B = \begin{bmatrix} I_k & \gamma_1 \\ 0 & I_{l_1} \end{bmatrix}.
$$
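Define $\tilde{\Lambda} = \operatorname{var}\{(u', \xi')'\} = B\Lambda B'$. The likelihoods for structural and reduced form are equal, since the Jacobian of the transformation is 1. The following is conditional on all the other parameters. Let $\beta_0 = \beta - \gamma_c - (\gamma_1\alpha + \gamma_2)z_2$ and $z_{10} = z_1 - \alpha z_2$, and define $h = (\beta'_0, z'_{10})'$, a $(k + l_1) \times 1$ vector. Furthermore, let $\nu = (\gamma'_1, I_{l_1})'$, a $(k + l_1) \times l_1$ matrix, and $\mu_h = \nu\theta$. Then $q_0 \propto A(h) = \rho\int p(h \mid \theta, \gamma_0, \alpha, \tilde{\Lambda}, \text{data})\, dG_0(\theta \mid \mu_\theta, V_\theta)$, i.e. $q_0$ is proportional to the density that results when θ is integrated out. We examine the kernel of the product of these two multivariate normal densities:

$$
\begin{aligned}
(h - \mu_h)'\tilde{\Lambda}^{-1}(h - \mu_h) + (\theta - \mu_\theta)'V_\theta^{-1}(\theta - \mu_\theta) \propto\; & h'\tilde{\Lambda}^{-1}h - h'\tilde{\Lambda}^{-1}\nu\theta - (\nu\theta)'\tilde{\Lambda}^{-1}h + (\nu\theta)'\tilde{\Lambda}^{-1}\nu\theta + \theta'V_\theta^{-1}\theta \\
& - \mu'_\theta V_\theta^{-1}\theta - \theta'V_\theta^{-1}\mu_\theta. \qquad \text{(7C.9)}
\end{aligned}
$$

Define

$$
\kappa_1 = \nu'\tilde{\Lambda}^{-1}h, \qquad \kappa_2 = \nu'\tilde{\Lambda}^{-1}\nu + V_\theta^{-1}, \qquad \kappa_3 = \kappa_1 + V_\theta^{-1}\mu_\theta. \qquad \text{(7C.10)}
$$

Then it can be shown that (7C.9) is equal to

$$
(\theta - \kappa_2^{-1}\kappa_3)'\kappa_2(\theta - \kappa_2^{-1}\kappa_3) - \kappa'_3\kappa_2^{-1}\kappa_3 + h'\tilde{\Lambda}^{-1}h. \qquad \text{(7C.11)}
$$

The first term is the kernel of a multivariate normal distribution in θ and integrates out to a constant that does not depend on h, so we only need to focus on the latter two terms. Letting $A = \kappa_2^{-1}$ and $B = \tilde{\Lambda}^{-1} - \tilde{\Lambda}^{-1}\nu A\nu'\tilde{\Lambda}^{-1} = (\tilde{\Lambda} + \nu V_\theta\nu')^{-1}$, we find that

$$
h'\tilde{\Lambda}^{-1}h - \kappa'_3\kappa_2^{-1}\kappa_3 \propto h'Bh - h'\tilde{\Lambda}^{-1}\nu AV_\theta^{-1}\mu_\theta - \mu'_\theta V_\theta^{-1}A\nu'\tilde{\Lambda}^{-1}h, \qquad \text{(7C.12)}
$$

from which it can be seen that A(h) is proportional to a $(k + l_1)$-variate normal density with variance $B^{-1}$ and mean $B^{-1}\tilde{\Lambda}^{-1}\nu AV_\theta^{-1}\mu_\theta = \nu\mu_\theta$ (this last equality is not immediately clear, but follows after completing all products).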
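Derivation of $q_j$. $q_j$ is proportional to $n^-_j$ times the density $p(h_i \mid \theta_j, \alpha, \gamma_0, \tilde{\Lambda}, \text{data})$, $j = 1, \ldots, n^-$, which is a $(k + l_1)$-dimensional multivariate normal density with mean $\big((\gamma_c + \gamma_1\theta_j + (\gamma_1\alpha + \gamma_2)z_{2i})', (\theta_j + \alpha z_{2i})'\big)'$ and variance $\tilde{\Lambda}$.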
Derivation of $h(\theta_i \mid \text{rest}, \text{data})$. The density $h(\theta_i \mid \text{rest}, \text{data})$ is proportional to

$$
f(h_i \mid \theta_i, \alpha, \gamma_0, \Lambda, \text{data})\, g_0(\theta_i \mid \mu_\theta, V_\theta),
$$

where $g_0$ is the probability density function of a multivariate normal distribution. We first consider the kernel of f (drop the subscript i, let $\beta_0 = \beta - \gamma_c - \gamma z$ and $z_{10} = z_1 - \alpha z_2$). We have

$$
\begin{aligned}
&\begin{pmatrix} \beta_0 \\ z_{10} - \theta \end{pmatrix}'\begin{bmatrix} \Lambda^{(11)} & \Lambda^{(12)} \\ \Lambda^{(21)} & \Lambda^{(22)} \end{bmatrix}\begin{pmatrix} \beta_0 \\ z_{10} - \theta \end{pmatrix} \\
&\quad\propto \theta'\Lambda^{(22)}\theta - \theta'\big(\Lambda^{(21)}\beta_0 + \Lambda^{(22)}z_{10}\big) - \big(\beta'_0\Lambda^{(12)} + z'_{10}\Lambda^{(22)}\big)\theta, \qquad \text{(7C.13)}
\end{aligned}
$$

and adding the kernel of the distribution $g_0$ gives

$$
\begin{aligned}
&\theta'\Lambda^{(22)}\theta - \theta'\big(\Lambda^{(21)}\beta_0 + \Lambda^{(22)}z_{10}\big) - \big(\beta'_0\Lambda^{(12)} + z'_{10}\Lambda^{(22)}\big)\theta + \theta'V_\theta^{-1}\theta - \mu'_\theta V_\theta^{-1}\theta - \theta'V_\theta^{-1}\mu_\theta \qquad \text{(7C.14)} \\
&= \theta'\big(\Lambda^{(22)} + V_\theta^{-1}\big)\theta - \theta'\big(\Lambda^{(21)}\beta_0 + \Lambda^{(22)}z_{10} + V_\theta^{-1}\mu_\theta\big) - \big(\beta'_0\Lambda^{(12)} + z'_{10}\Lambda^{(22)} + \mu'_\theta V_\theta^{-1}\big)\theta, \qquad \text{(7C.15)}
\end{aligned}
$$

from which it can be seen that (write $C = \Lambda^{(22)} + V_\theta^{-1}$) $h(\theta_i \mid \text{rest}, \text{data})$ is a normal density with mean

$$
C^{-1}\big(\Lambda^{(21)}\beta_{0i} + \Lambda^{(22)}z_{10i} + V_\theta^{-1}\mu_\theta\big)
$$

and variance-covariance $C^{-1}$.
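Because the partitioned blocks $\Lambda^{(21)}$ and $\Lambda^{(22)}$ are easy to mix up, here is a small sketch of the mean and covariance of $h(\theta_i \mid \text{rest}, \text{data})$. The names are hypothetical, and $\Lambda^{-1}$ is assumed to be passed as one $(k + l_1) \times (k + l_1)$ array whose first k rows and columns belong to β. The usage lines check by finite differences that the mean minimizes the sum of (7C.13) and the $g_0$ kernel.

```python
import numpy as np

def theta_i_full_conditional(beta0_i, z10_i, Lambda_inv, mu_theta, V_theta_inv):
    """Mean and covariance of h(theta_i | rest, data), eqs. (7C.13)-(7C.15).
    The first k = len(beta0_i) rows/columns of Lambda_inv correspond to beta."""
    k = beta0_i.size
    L21, L22 = Lambda_inv[k:, :k], Lambda_inv[k:, k:]
    cov = np.linalg.inv(L22 + V_theta_inv)
    mean = cov @ (L21 @ beta0_i + L22 @ z10_i + V_theta_inv @ mu_theta)
    return mean, cov

rng = np.random.default_rng(11)
k, l1 = 3, 2
M = rng.normal(size=(k + l1, k + l1))
Lambda_inv = M @ M.T + (k + l1) * np.eye(k + l1)   # random positive-definite precision
beta0_i, z10_i = rng.normal(size=k), rng.normal(size=l1)
mu_theta, V_theta_inv = rng.normal(size=l1), np.eye(l1)
mean, cov = theta_i_full_conditional(beta0_i, z10_i, Lambda_inv, mu_theta, V_theta_inv)

def kernel(th):
    """(7C.13) plus the kernel of g_0, as a function of theta."""
    r = np.concatenate([beta0_i, z10_i - th])
    return r @ Lambda_inv @ r + (th - mu_theta) @ V_theta_inv @ (th - mu_theta)

h = 1e-4
grad = [(kernel(mean + h * e) - kernel(mean - h * e)) / (2 * h) for e in np.eye(l1)]
print(np.round(grad, 6))   # numerically zero at the conditional mean
```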
Full conditional for $\mu_\theta$. We assume a multivariate normal prior density for $\mu_\theta$ with mean $m_\mu$ and variance $V_\mu$. Using the normal kernels of the prior and $G_0$, and the cluster structure of the Dirichlet process, we have

$$
\begin{aligned}
&\sum_{l=1}^{n}(\theta_l - \mu_\theta)'V_\theta^{-1}(\theta_l - \mu_\theta) + (\mu_\theta - m_\mu)'V_\mu^{-1}(\mu_\theta - m_\mu) \\
&\quad\propto \mu'_\theta\big(nV_\theta^{-1} + V_\mu^{-1}\big)\mu_\theta - \mu'_\theta\Big(V_\theta^{-1}\sum_{l=1}^{n}\theta_l + V_\mu^{-1}m_\mu\Big) - \Big(\sum_{l=1}^{n}\theta'_lV_\theta^{-1} + m'_\mu V_\mu^{-1}\Big)\mu_\theta,
\end{aligned}
$$

from which it can be seen that the full conditional distribution for $\mu_\theta$ is a multivariate normal distribution with mean $C^{-1}\big(V_\theta^{-1}\sum_{l=1}^{n}\theta_l + V_\mu^{-1}m_\mu\big)$ and variance-covariance matrix $C^{-1}$, where $C = nV_\theta^{-1} + V_\mu^{-1}$.
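Full conditional distribution for $V_\theta$. Assuming an inverted Wishart density function with parameters $(\tau_V, \Upsilon_V)$ as prior, and using similar arguments as for $\mu_\theta$, we obtain the following full conditional distribution

$$
\begin{aligned}
p(V_\theta \mid \text{data}, \text{rest}) &\propto \prod_{l=1}^{n}|V_\theta|^{-\frac{1}{2}}\exp\Big\{-\frac{1}{2}(\theta_l - \mu_\theta)'V_\theta^{-1}(\theta_l - \mu_\theta)\Big\} \times |V_\theta|^{-\frac{\tau_V + l_1 + 1}{2}}\exp\Big[-\frac{1}{2}\operatorname{tr}\big(\Upsilon_VV_\theta^{-1}\big)\Big] \qquad \text{(7C.16)} \\
&\propto |V_\theta|^{-\frac{\tau_V + n + l_1 + 1}{2}}\exp\Big[-\frac{1}{2}\operatorname{tr}\Big(\Big(\sum_{l=1}^{n}(\theta_l - \mu_\theta)(\theta_l - \mu_\theta)' + \Upsilon_V\Big)V_\theta^{-1}\Big)\Big],
\end{aligned}
$$

from which it follows that the full conditional for $V_\theta$ is an inverted Wishart distribution of dimension $l_1$ with parameters $\tau_V + n$ and $\sum_{l=1}^{n}(\theta_l - \mu_\theta)(\theta_l - \mu_\theta)' + \Upsilon_V$.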
Full conditional for ρ. As before, we refer to West (1992) or Escobar and West (1995) to construct the full conditional distribution for the ‘dispersion’ parameter ρ.
Appendix 7D Iteration plots
Figure 7D.1: Iteration plots of $\gamma_{11}$ and k for dataset no. 1, which failed to converge (cf. table 7.4).
Chapter 8
Discussion
The primary objective of this thesis is to develop a new method, the latent in-
strumental variables (LIV) method, to solve and test for regressor-error depen-
dencies in linear models. The traditional instrumental variables (IV) method
is limited in its use because it requires the availability of instruments of
decent quality. In many situations such instruments are not available. Moreover,
in applications where instruments are available, the performance of inferential
procedures critically depends on the quality of such variables, and results have
to be interpreted with caution. The proposed LIV method allows for consis-
tent estimation in the presence of regressor-error dependencies and testing for
such dependencies without having observed instrumental variables at hand. In
this chapter we summarize our findings and conclusions. Table 8.1 gives an
overview of the main topics and findings of the chapters. Furthermore, we
provide a discussion of the LIV model and suggest steps for further research.
| Ch. | Subject | Model | Main findings |
|---|---|---|---|
| 2 | Literature review instrumental variables (IV) method | – | • Bias of OLS in presence of Xε–dependency • Possible caveats with classical IV estimation |
| 3 | Simple LIV model and tests for Xε–dependency | Linear model, one endogenous x, an unobserved discrete instrument with m > 1 categories | • Sim. studies for a wide range of settings: regr. par. estimated consistently, proposed tests powerful in detecting Xε–dependency • Results insensitive to misspecification of m • Identification proof |
| 4 | Tests for instrument weakness and implementation issues | Extension of model Ch. 3: add exogenous regressors and observed IVs | • Sim. studies: proposed tests powerful to detect bad quality IVs and endogeneity • LIV model robust against misspecifying the likelihood • Diagnostics to choose m and examine outliers/influential observations |
| 5 | Estimating the return to education | Application of model Ch. 4 | • Results for three empirical datasets • OLS estimate for schooling biased upward (≈ 7%) • Tests indicate bad quality of available observed IVs |
| 6 | Multilevel models and random effects (RE) regressor–dependencies | Several random intercept models | • Tests for RE regressor-dependencies are reviewed • Sim. studies: bad performance of tests and estimation methods in certain situations • Recommendations are made to target these situations |
| 7 | Hierarchical models and endogeneity | Hierarchical models with Dirichlet process prior for unobserved instrument | • Alleviating the discreteness assumption for the latent instrument • Simulation studies promising • Work in progress |

Table 8.1: Overview of the thesis chapters and main findings.

8.1 Summary and conclusions
An important assumption in the linear regression model is independence of
the regressors and the error term. In chapter 2 we presented five situations
in which this assumption is questionable: (i) relevant omitted variables, (ii)
measurement error, (iii) self-selection, (iv) simultaneous equation models, and
(v) lagged dependent variables and autocorrelation. In many empirical
applications one or more of these situations may apply, and standard estimation
procedures for the linear regression model are known to give biased and
inconsistent results. Important examples include estimating the effect of
marketing mix variables in sales response models and estimating the return to
education on income. Studies in marketing and industrial economics (Berry,
Levinsohn and Pakes, 1995, or Besanko, Gupta and Jain, 1998) find that the
estimated price response parameter in choice models is biased towards zero
when endogeneity of prices is ignored. Managerial decisions based on price
response measures that are not corrected for endogeneity are likely to
underestimate the effect of a price change on sales or market share. Similarly,
policy makers who rely on OLS estimates of the return to education (Card,
1999) may be overly optimistic, because the true effect of education on
wages can be expected to be lower. Hence, ignoring endogeneity leads to false
conclusions and erroneous decision making.
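A small simulation makes the bias concrete. The sketch below uses a hypothetical data-generating process, not one from the thesis: an unobserved variable u affects both x and y (an omitted-variable case), and an observed variable w shifts x but is independent of u, so that w is a valid instrument. OLS then overstates the slope, whereas the simple IV estimator $(Z'X)^{-1}Z'y$, the classical method discussed in the next paragraph, recovers it. With the chosen values the OLS probability limit is 2 + 1.5/3 = 2.5.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 100_000
w = rng.normal(size=n)                         # observed instrument (hypothetical)
u = rng.normal(size=n)                         # unobserved confounder, e.g. an omitted variable
x = w + u + rng.normal(size=n)                 # endogenous regressor: cov(x, u) = 1
y = 2.0 * x + 1.5 * u + rng.normal(size=n)     # true slope is 2.0

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), w])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)       # just-identified IV estimator (Z'X)^{-1} Z'y
print(b_ols[1], b_iv[1])                       # OLS slope near 2.5, IV slope near 2.0
```

The IV estimate is only as good as the instrument: if w were weakly related to x or correlated with u, this correction would break down, which is the problem the LIV approach addresses.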
The ‘classical’ instrumental variables (IV) method can be used to estimate models where regressor-error dependencies may be present. This method assumes that an additional set of instruments is available that can be used to separate the endogenous regressors into an exogenous part and an endogenous part. If the instruments are of good quality, the IV estimates are known to be consistent. However, the literature review in chapter 2 points out two problems with classical IV estimation: (i) instruments need to be available, and (ii) the performance of the IV method critically relies on the quality of the instruments used. Despite (ii), instruments are often chosen on the basis of ad-hoc arguments or convenience, because in many empirical applications they are not readily available. Several studies in econometrics have proposed solutions to the problem of weak instruments (Stock, Wright and Yogo, 2002, and Hahn and Hausman, 2003). These studies provide a toolbox of methods and tests that improve on classical IV inference in the presence of weak instruments. Most of them, however, do not address instrument endogeneity and are conditional on the availability of a set of instrumental variables. For empirical problems, the question of how and where to find instruments is still open. The latent instrumental variables (LIV) method presented in chapter 3
addresses this issue at its heart. We propose a new method that does not require the availability of observed instrumental variables. We prove that the LIV model parameters can be identified through the likelihood, and we illustrate the method on synthetic data. The simulation studies show that the LIV model gives consistent results for the regression parameters and that the proposed test for regressor-error dependencies has reasonable power across a wide variety of settings. These results are obtained without having observed instrumental variables at hand. In addition, the LIV model gives results identical to classical IV estimation for a measurement error application where a laboratory dummy instrumental variable is available. Furthermore, we show that the LIV results are rather insensitive to misspecification of the true number of categories of the discrete instrument. These results are important for empirical researchers, because our ‘instrument-free’ approach removes the need to first find good quality instrumental variables when regressor-error dependencies are suspected.
In chapter 4 we extend the simple LIV model by allowing for additional exogenous regressors and possibly available instruments. Furthermore, we discuss several implementation issues that complete an LIV analysis. The identification proof for the more general LIV model suggests two procedures to investigate the validity of instrumental variables: (i) a test for a zero effect of the instrument on the endogenous variable (i.e. whether the instrument is ‘weak’), and (ii) a test for a direct effect of the instrument on the dependent variable (i.e. whether the instrument is exogenous). Our synthetic data results show that the proposed procedures have reasonable power to detect ‘bad’ quality instruments. Furthermore, our results indicate that the LIV estimates for the regression parameters are rather insensitive to misspecification of the true distribution of the error terms. This is to be expected, since the LIV model belongs to the class of mixture models, which are known to adapt flexibly to a broad range of distributions.
The literature review in chapter 5 illustrates the difficulties in estimating the
return to education on income due to potential ability bias and the lack of good
quality instrumental variables. The LIV results for three empirical datasets indicate an upward ability bias of approximately 7%. This number is close to recent results from twin studies (Card, 1999). In contrast, the classical IV results are highly unstable, inconsistent with the traditional ability bias criticism, and suffer from large standard deviations. We investigate the quality of the available instrumental variables in the three datasets and compare them with the ‘optimal’ LIV instruments. We find in two of the three applications that the available instruments are weak and/or endogenous. In all cases the optimal LIV instruments are found to be much stronger and, hence, the LIV results are more efficient than the classical IV results. These findings are convergent and lend credibility to the usefulness of the LIV method in empirical settings.
Chapters 6 and 7 consider endogeneity problems in multilevel models. In many applications the data have a hierarchical structure, which introduces additional error terms and possible endogeneity relations in the model. The model we consider in chapter 6 has two levels, and endogeneity may arise at the individual-specific level (level one) or at the group level (level two). In this chapter we review previous literature on estimating random intercept models in the presence of regressor-error dependencies. Traditional methods for level-two dependencies (fixed-effects estimation, the Hausman-Taylor approach, Mundlak’s approach) are shown to be of limited use in the presence of level-one dependencies. Our results reveal that even small violations of level-one independence may lead to fallacious conclusions when these traditional methods are applied. Moreover, we provide evidence that the problem of weak instruments also applies to multilevel applications, in particular to multilevel methods that solve for level-one dependencies, but also to the Hausman-Taylor approach to address level-two dependencies. We argue that much work needs to be done before problems of endogeneity in multilevel models can be adequately addressed, and we present a list of open problems.
In chapter 7 we address two issues. Firstly, we present a solution for two multilevel models discussed in chapter 6 that may suffer from regressor-error dependencies: the standard (random-intercept) multilevel model and the random coefficient regression model with individual-level covariates to explain part of the heterogeneity variance. Furthermore, we suggest how our results may improve on the standard Hausman-Taylor approach. Secondly, we propose a nonparametric Bayesian method to relax the discreteness assumption on the unobserved instrument. The model can be estimated using Markov Chain Monte Carlo methods. The advantage of a Bayesian approach is that it provides a general framework that can easily be extended to more general models (e.g. choice models or models with several endogenous variables). Moreover, a Bayesian analysis facilitates exact small sample inference. By assuming a Dirichlet process prior for the unobserved instrument, the distribution of the instrument is not tied to a fixed form and can adapt to any distribution. Unlike the standard LIV model, it is not necessary to specify the number of support points of the mixture distribution, since the model estimates the distribution from the available data. We present several simulation studies and show that the results are promising, yet several issues are still open for future research.
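The stick-breaking construction offers one concrete way to see why a Dirichlet process prior removes the need to fix the number of support points in advance. The sketch below draws a single realization of a Dirichlet process, truncated at K atoms. The concentration parameter α = 1, the N(0, 1) base measure, the truncation K = 500, and the helper name `stick_breaking_draw` are our illustrative assumptions; the code shows only the prior on the instrument's distribution and is not the MCMC sampler of chapter 7.

```python
import numpy as np

def stick_breaking_draw(alpha, base_sampler, K, rng):
    """Truncated Dirichlet-process draw G = sum_k w_k * delta(atom_k) (illustrative helper).

    alpha        -- concentration parameter
    base_sampler -- function(rng, size) returning draws from the base measure G0
    K            -- truncation level (number of sticks)
    """
    v = rng.beta(1.0, alpha, size=K)                       # stick-breaking proportions
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    weights = v * remaining                                # w_k = v_k * prod_{j<k} (1 - v_j)
    atoms = base_sampler(rng, K)                           # candidate support points of the instrument
    return weights, atoms

rng = np.random.default_rng(0)
weights, atoms = stick_breaking_draw(
    alpha=1.0,
    base_sampler=lambda r, size: r.normal(0.0, 1.0, size),
    K=500,
    rng=rng,
)

# Only a handful of atoms carry visible mass; their number is random, not fixed in advance.
print("total weight:", weights.sum())
print("atoms with weight > 1%:", int((weights > 0.01).sum()))

# Latent instrument values for 10 observations, drawn from the realized G.
z = rng.choice(atoms, size=10, p=weights / weights.sum())
print(z)
```

Larger values of `alpha` spread the mass over more atoms, so the number of effectively occupied categories is governed by the prior and the data rather than chosen by the analyst.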
8.2 Limitations and future research
There are several issues concerning the LIV method that we did not address in
this thesis. We will discuss the following issues in more detail below:
• Methodological (technical) issues
– Large sample results
– Identification in more general settings
– Testing for a discrete instrument
– Relation with classical IV estimation
• Substantive issues
– Extensions to more than one endogenous variable
– Choice models and more general GLM
– Self-selection problems
– Comparison to Lewbel’s approach and heterogeneous LIV
– Straightforward testing for endogeneity
– Generalizing the unobserved instrument
These issues mostly apply to the standard LIV model introduced in chapters 3 and 4. The Bayesian approach of chapter 7 and the steps for its further development were discussed in section 7.5.
8.2.1 Methodological (technical) issues
Large sample results. Two technical issues that we did not address in this thesis are the consistency of the LIV estimator and its asymptotic distribution, which approximates the finite sample distribution. The simulation studies presented in this thesis indicate that the LIV estimates are consistent, but we have not yet proven this.
The LIV estimates are maximum-likelihood (ML) estimates, and consistency can be examined using basic results from maximum likelihood theory (e.g. Ferguson, 1996). Redner and Walker (1984) and Titterington, Smith and Makov (1985) summarize large sample results for ML estimation in mixture models, the class to which the LIV model belongs. They find that asymptotic theory for mixtures is not always straightforward because of possible singularities in the likelihood surface; moreover, the likelihood may be unbounded. However, Titterington, Smith and Makov (1985) state that the regularity conditions for consistency and asymptotic normality are satisfied in many well-known and commonly occurring cases.
It may be more interesting, however, to investigate whether the regression parameter β, which is not a mixing parameter in the LIV model, can be estimated consistently by maximum likelihood when the fitted model has fewer components than the true model. In other words, can consistency be proven for m = 2, regardless of whether the true value of m is larger than two? The simulation studies in section 3.5 suggest a positive answer to this question. Moreover, in a classical IV framework, if one has a set of strong instruments at hand, adding a few additional instruments does not change the asymptotic results.
Two recent articles (Cheng and Liu, 2001, and Zhu and Zhang, 2004) establish asymptotic theory for comparing nested mixture models, in which case the distribution is represented by a subset of the parameter space. Their results suggest that under certain regularity conditions the ML estimator converges to an arbitrary point in this subset, and that quantities of interest such as means or variances may be estimated consistently even though the distribution is not uniquely represented. These results are supported by our simulation results in section 3.5. Andrews (1999) considers asymptotic theory for extremum estimators (e.g. ML) when a parameter is on the boundary of the parameter space. His results are interesting because he establishes conditions under which the asymptotic distribution of a subvector of the parameter is not affected by the true values of another subvector being on the boundary. For instance, he shows that for a random coefficient model, the quasi-ML estimator of the regression coefficients is asymptotically normal whether or not some of the random coefficient variances are zero. His theory appears to be very general and may be applicable to the LIV model, although the conditions he establishes may be difficult to verify.
The LIV model in reduced form is quite similar to measurement error models, although standard measurement error models assume zero covariance between the errors. As mentioned in section 2.4, the grouping results of Wald (1940) and Madansky (1959) are similar in spirit to the grouping idea of the LIV model. Wald and Madansky assume that a grouping of the data into two groups exists, or can be constructed. Once a ‘valid’ grouping is available, a line can be drawn through the two group means, because a line is determined by two points. This line is estimated consistently under certain conditions (e.g. Neyman and Scott, 1951). Madansky also considers another grouping method from an ANOVA point of view, where $k_i$ observations for $X_i$, $i = 1, \ldots, k$, are available. He shows that the within-group and between-group mean squares can be used to obtain a consistent estimate of β when the grouping is independent of the model error¹; hence, consistency does not depend on k. The LIV model does not assume the prior existence of such a grouping and uses mixture methodology to classify the sample into groups. The results that we found using synthetic data also suggest that consistency of the LIV estimate of β does not depend on the number of categories chosen for the discrete instrument.

¹ See also his discussion of the Housner-Brennan estimate (pp. 189-191).
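Wald's grouping idea can be sketched in a few lines of NumPy. The setup is an assumption made only for illustration: a latent regressor that takes two values, an observed regressor measured with error, and a grouping that is known and independent of the errors. The grouping estimate is the slope of the line through the two group means; OLS on the error-ridden regressor is shown for contrast.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta0, beta1 = 100_000, 1.0, 2.0

# Latent regressor xi takes one of two values; the group label g is assumed known
# and independent of the measurement error and of the equation error.
g = rng.integers(0, 2, size=n)
xi = np.where(g == 1, 1.0, -1.0)
x = xi + rng.normal(0.0, 1.0, n)                  # observed regressor, measured with error
y = beta0 + beta1 * xi + rng.normal(0.0, 1.0, n)

# OLS of y on the error-ridden x is attenuated towards zero.
beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Wald grouping estimate: slope of the line through the two group means.
beta_wald = (y[g == 1].mean() - y[g == 0].mean()) / (x[g == 1].mean() - x[g == 0].mean())

print(f"OLS:  {beta_ols:.3f}")   # roughly beta1 * var(xi) / var(x) = 1.0
print(f"Wald: {beta_wald:.3f}")  # close to beta1 = 2.0
```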
Another model closely related to the LIV model is a measurement error model considered by Kiefer and Wolfowitz (1956), who prove the consistency of the ML estimator in the presence of infinitely many incidental parameters. The model considered is

$$
\begin{aligned}
X_{i1} &= \alpha_i + u_i,\\
X_{i2} &= \theta_1^0 + \theta_2^0\,\alpha_i + \nu_i,
\end{aligned}
\tag{8.1}
$$

where $(\nu_i, u_i)$ have a bivariate normal distribution with mean zero and a covariance matrix with elements $\{d_{11}, d_{12}, d_{22}\}$. They find that the maximum-likelihood estimates of $(\theta_1, \theta_2)$ are strongly consistent, given that $d_{11}$, $d_{22}$, and $d_{11}d_{22} - d_{12}^2$ are bounded away from zero. Reiersøl (1950) proves for normally distributed errors that $\theta_1$ and $\theta_2$ are not identifiable if and only if $X_1$, $X_2$ are constants or normally distributed (cf. Madansky, 1959, p. 180). Something similar was observed in chapter 7 using the nonparametric Bayesian LIV model. Furthermore, the mixture approach for measurement error models advocated by Carroll, Roeder and Wasserman (1999), and the accompanying discussion, may be applicable to our framework as well.
Although we have not proven consistency of the maximum-likelihood estimates for the LIV model introduced in chapters 3 and 4, the simulation studies presented in this thesis suggest that they are consistent. Furthermore, the articles cited above consider similar models, provide intuition for the simulation results, and offer a possible starting point for a formal proof of consistency and asymptotic normality of $\hat{\beta}^{LIV}_n$.
Identification in more general settings. Identification of all LIV model parameters was proven in chapters 3 and 4 assuming a bivariate normal distribution for the error terms (ε, ν). Although a mixture of normals can adapt to a broad class of distributions (Kim, Menzefricke and Feinberg, 2004), it is desirable to generalize the LIV model to allow for non-normally distributed error terms. In some applications, for instance, the normality assumption may be too restrictive, and a more robust or general specification (e.g. t, gamma, logistic, or Gumbel distributions) may be desirable. We found in subsection 4.5.2 that the LIV model appears to be fairly robust against misspecified errors, although severe misspecification of the error distribution of the regression equation may present a problem. In such a case, a more robust error distribution may circumvent the problem.
The existence of a discrete instrument. Identification of the LIV model requires the existence of a discrete instrument with at least two categories. A likelihood framework can then be used to estimate the regression parameters. Two important questions that were not considered in this thesis are: (i) is it possible to test for the existence of a discrete instrument, and (ii) what happens if the category means π in (3.1) for k = 2 are not very distinct, i.e. $\lVert\pi_2 - \pi_1\rVert$ is small?

Recent studies (Cheng and Liu, 2003, and Zhu and Zhang, 2004) have developed tests of a simpler mixture model against a full mixture model, i.e. tests of the form $H_0: \lambda(1-\lambda)\lVert\pi_1 - \pi_2\rVert = 0$ versus $H_1: \lambda(1-\lambda)\lVert\pi_1 - \pi_2\rVert \neq 0$. These tests may be applicable to the LIV model to investigate the assumption that a discrete instrument exists. However, given that mixture models are often used to approximate continuous distributions, we feel that the discreteness assumption, which does not imply that x is discrete, is not limiting in most empirical applications. Moreover, many classical IV studies rely on discrete instruments.
The second question is an important issue in the mixture model literature and is closely related to the information matrix and the Mahalanobis distance between mixture components. It is known that if the mixture components do not separate well, large sample sizes may be required to obtain precise maximum-likelihood estimates (e.g. Redner and Walker, 1984, or Titterington, Smith and Makov, 1985). Something similar was observed in subsection 3.5.3, where we found for synthetic data that using m > 2 in the LIV model increases the occurrence of degenerate solutions. This is not much of an issue in most applications, since the latent category instrument is a ‘nuisance’ parameter rather than of theoretical interest. However, estimation may be problematic if the true distribution of the unobserved instrument consists of only two groups that are not well separated. In this case the model is weakly identified, and this issue is related to (i): the distribution of the latent instrument is then very close to a normal distribution, or to a constant. Deriving the actual information matrix may give some insight into these issues. Furthermore, increasing the sample size and using EM-algorithm estimation may improve estimation results in such situations.
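The role of component separation can be explored with a small EM sketch for the x-margin of the LIV model. With m = 2 categories, $x_i = \pi_{z_i} + \nu_i$ is a two-component normal mixture with a common variance. The code fits that margin only, not the full bivariate LIV likelihood of (y, x), and reports the standardized separation $|\pi_2 - \pi_1|/\sigma_\nu$. The simulated parameter values, starting points, iteration count, and the helper name `em_two_component` are our assumptions.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def em_two_component(x, n_iter=500):
    """EM for x ~ lam*N(pi1, s^2) + (1-lam)*N(pi2, s^2) with a common variance s^2."""
    lam, pi1, pi2, s = 0.5, np.quantile(x, 0.25), np.quantile(x, 0.75), x.std()
    for _ in range(n_iter):
        # E-step: posterior probability (Bayes' rule) that observation i is in category 1.
        a = lam * normal_pdf(x, pi1, s)
        b = (1.0 - lam) * normal_pdf(x, pi2, s)
        r = a / (a + b)
        # M-step: mixing weight, weighted category means, and pooled (common) variance.
        lam = r.mean()
        pi1 = np.sum(r * x) / np.sum(r)
        pi2 = np.sum((1 - r) * x) / np.sum(1 - r)
        s = np.sqrt(np.sum(r * (x - pi1) ** 2 + (1 - r) * (x - pi2) ** 2) / x.size)
    return lam, pi1, pi2, s

rng = np.random.default_rng(2)
n = 20_000
z = rng.random(n) < 0.4                          # latent category, lambda = 0.4
x = np.where(z, -1.5, 1.5) + rng.normal(0.0, 1.0, n)

lam, pi1, pi2, s = em_two_component(x)
separation = abs(pi2 - pi1) / s                  # Mahalanobis distance between the components
print(f"lambda={lam:.3f}, pi1={pi1:.3f}, pi2={pi2:.3f}, sigma={s:.3f}, separation={separation:.2f}")
```

Shrinking the gap between the two simulated means (for example to ±0.3) is one way to explore the weakly identified case discussed above; we make no claim here about the sample size that is then needed.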
Relation with classical IV estimation. The basic LIV model does not assume the existence of observed instrumental variables, and identification is established through the likelihood. The classical IV approach assumes the existence of good quality instrumental variables, and the model parameters can be identified via the first two moments or via the likelihood. Although we argued, and showed in both synthetic and real data examples, that the LIV model results are rather insensitive to different choices for m, to different shapes of the distribution of x, and to a modest misspecification of the likelihood, researchers who have been using the traditional instrumental variables approach (i.e. identification via theory and observed data) may be skeptical about adopting the latent instrumental variables approach. In this research we have not explicitly pursued the relation with classical instrumental variables, because the main goal was to formulate a new method that does not require such instruments (an exception is the study in section 3.6). However, to introduce the LIV method to more traditional IV users, we feel that future research should emphasize the relation between LIV and classical IV. This can be done in one or more of the following four ways.
Firstly, as was shown before, the LIV estimates can be used to obtain an a posteriori clustering of the data using Bayes’ rule, which gives the ‘optimal’ LIV instrument $\hat{Z}$, an $n \times m$ matrix. This instrument matrix can be used to compute a 2SLS estimate of the regression parameters. In a simulation study the following questions can be investigated: (1) are the 2SLS estimates of β using $\hat{Z}$ similar to the LIV estimates, (2) is the optimal LIV instrument $\hat{Z}$ uncorrelated with ε, (3) what is the $R^2$ of a regression of x on $\hat{Z}$ compared to the $R^2$ of a regression of x on the true (discrete) Z, and (4) what is the relation between x and $\hat{x} = \hat{Z}\hat{\pi}^{LIV}$? For the simulation results presented in section 3.5 we find that the 2SLS estimate based on LIV instruments yields results (means and standard deviations) similar to the maximum likelihood (LIV) estimate of β (in most cases the values are exactly identical, but for the unimodal case with eight instruments there are small differences). We also examined the correlation between $\hat{x}$ and ε, and the correlation between $\hat{x}$ and x. We found, on average, that the correlation between $\hat{x}$ and the true errors is approximately zero, while the correlation between $\hat{x}$ and x is much larger than zero. Although these preliminary findings suggest that the LIV predicted instruments are possibly ‘optimal’, because they are not correlated with ε and are of considerable strength, future research is needed to give more conclusive results.
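The 2SLS step with a category-dummy instrument matrix can be sketched as follows. To keep the example short, we assume the latent categories are known and use the true dummy matrix Z in place of the estimated $\hat{Z}$ that Bayes’ rule would deliver from the LIV estimates, so this shows a best case and does not reproduce the simulation in section 3.5. The data-generating values and the helper name `two_sls` are hypothetical; the printed correlation and first-stage $R^2$ correspond to questions (2) and (3) above.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS: beta = (X' P_Z X)^{-1} X' P_Z y, with P_Z the projection onto the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, X, rcond=None)   # first stage: regress each column of X on Z
    X_hat = Z @ coef
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)  # second stage: regress y on fitted X
    return beta, X_hat

rng = np.random.default_rng(3)
n, m = 50_000, 2
beta_true = np.array([1.0, 0.5])                   # intercept, slope

# Correlated errors (eps, nu) create the regressor-error dependency.
eps, nu = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n).T

category = rng.integers(0, m, size=n)              # latent discrete instrument
pi = np.array([-1.0, 1.0])                         # category means of x
x = pi[category] + nu
y = beta_true[0] + beta_true[1] * x + eps

X = np.column_stack([np.ones(n), x])
Z = np.eye(m)[category]                            # n x m dummies; their sum is the intercept column

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_2sls, X_hat = two_sls(y, X, Z)
x_hat = X_hat[:, 1]
r2_first_stage = np.corrcoef(x, x_hat)[0, 1] ** 2

print("OLS: ", beta_ols.round(3))                  # slope biased upward
print("2SLS:", beta_2sls.round(3))                 # close to (1.0, 0.5)
print("corr(x_hat, eps):", round(np.corrcoef(x_hat, eps)[0, 1], 3))
print("first-stage R^2:", round(r2_first_stage, 3))
```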
Secondly, in empirical applications the LIV instruments $\hat{Z}$ can be profiled using (additional) observed data. The results in section 3.6, for instance, illustrate
that the predicted LIV instrument is identical to the laboratory temperature ef-
fect. We have not yet found interpretations for the predicted instruments for
‘schooling’ in chapter 5. However, if an instrument can be given a sensible
interpretation, it may inspire confidence in the results found, or even point out
new theories that can be used in subsequent studies to obtain instrumental vari-
ables.
Thirdly, another empirical validation of the LIV model for schooling applications (chapter 5) can be obtained using twin or sibling data. In twin or sibling studies the schooling parameter is estimated using a fixed-effects estimator, because unobserved ‘ability’ cancels out within families (see also section 5.3.3 and chapter 6). Ideally, both methods should give similar results. In addition, the predictive validity of the estimated LIV model can be examined using the transformed ‘within-family’ data, since differences in years of schooling between twins or siblings are exogenous once the effect of omitted ability is eliminated. However, to assess predictive validity, the schooling variable has to be measured without error, which is questionable in light of recent results from twin studies (e.g. Bonjour et al., 2003, Hertz, 2003, Isacsson, 2004).
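The within-family argument can be sketched with simulated twin pairs. Every number below is hypothetical: ability is shared within a pair and omitted from the wage equation, and schooling is assumed to be measured without error, which is precisely the assumption the twin-study literature questions. Pooled OLS picks up part of the ability effect, while the within-pair difference removes it.

```python
import numpy as np

rng = np.random.default_rng(4)
n_pairs, beta = 50_000, 0.07                       # hypothetical return to a year of schooling

ability = rng.normal(0.0, 1.0, n_pairs)            # shared within a twin pair, unobserved
# Schooling of each twin: shared ability plus a twin-specific shock.
s = 12.0 + 1.5 * ability[:, None] + rng.normal(0.0, 1.0, (n_pairs, 2))
# Log wage: schooling effect plus omitted ability plus noise.
log_wage = 1.0 + beta * s + 0.1 * ability[:, None] + rng.normal(0.0, 0.3, (n_pairs, 2))

# Pooled OLS ignores ability, so the schooling coefficient absorbs part of its effect.
s_all, w_all = s.ravel(), log_wage.ravel()
beta_ols = np.cov(s_all, w_all)[0, 1] / np.var(s_all, ddof=1)

# Within-pair (first-difference) estimator: the shared ability term cancels.
ds = s[:, 0] - s[:, 1]
dw = log_wage[:, 0] - log_wage[:, 1]
beta_fd = np.sum(ds * dw) / np.sum(ds ** 2)

print(f"pooled OLS:       {beta_ols:.4f}")   # above 0.07 (upward ability bias)
print(f"within-twin (FD): {beta_fd:.4f}")    # close to 0.07
```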
Finally, it is interesting to investigate in which situations the LIV model can be used to improve efficiency in standard IV models when ‘valid’ observed instruments are available. Since IV estimates often suffer from large standard deviations, adding an unobserved discrete instrument may improve efficiency. Moreover, in this setting traditional IV users would still identify the model through a priori theory or reasoning. The simulation study in section 4.4 indirectly addresses this issue, and we found that combining observed instruments with a latent discrete instrument may be beneficial.
8.2.2 Substantive issues
Extensions to more than one endogenous variable. Although one right-hand side endogenous variable is the most commonly occurring situation (cf. Hahn and Hausman, 2003), applications may suffer from two or more endogenous regressors. For instance, marketing managers set not only prices based on unobserved information, but also other marketing mix variables such as advertising or shelf-space location (Chintagunta, Kadiyali, and Vilcassim, 2003, Manchanda, Rossi, and Chintagunta, 2004). Furthermore, in estimating the return to schooling it is common to include measures for experience and squared experience that are constructed from ‘years of schooling’, and hence are also endogenous (Verbeek, 2000).
The nonparametric Bayes approach in chapter 7 is applicable to problems with more than one endogenous variable. The standard LIV model in (3.1) can be extended to (say) l endogenous variables by taking $x_i$ to be an $(l \times 1)$ vector and extending the variance-covariance matrix Σ to an $(l+1) \times (l+1)$ matrix. Hence, the more general LIV model is a mixture of $(l+1)$-dimensional multivariate normal distributions. The identification proof has to be modified, and we suspect that a discrete instrument with at least two categories has to exist for each endogenous variable. Consequently, the resulting mixture LIV model has $m \geq 2^l$ categories. Simulation studies and theoretical results need to be obtained before the outlined approach can be applied to empirical problems.
Choice models and generalized linear models (GLM). The models considered in this thesis are simple linear models. However, for many applications the linearity assumption is too restrictive, while endogeneity may still be present. For instance, most studies cited in subsection 2.1.4 and section 2.3 (methods that model demand, cost, and competition) are choice models. An interesting and important extension of the simple LIV model is a generalization to this class of models.
Observed choices can be modeled using a random utility framework, in which the alternative with the highest utility is assumed to be chosen. Let $y_j$ denote the (unobserved) utility derived from choosing alternative $j = 1, \ldots, m$, and let c be the observed choice. Then $c = j$ if $y_j = \max_{l=1,\ldots,m} y_l$. The model for the unobserved utility is just a standard linear model. If the errors are assumed to have a normal distribution and one of the explanatory variables is endogenous, model (3.1) can be augmented with the maximum utility framework to obtain a ‘LIV-probit’ model. Furthermore, the LIV approach can be applied to the type of problems, and the linearization of choice models, introduced by Berry (1994) and Berry, Levinsohn, and Pakes (1995), which has recently generated a stream of subsequent research.
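The decision rule $c = j$ if $y_j = \max_l y_l$ can be simulated directly. In this sketch the deterministic utilities are hypothetical and the errors are independent standard normal, which is a probit-type specification; the code only generates choices and choice shares and does not estimate the ‘LIV-probit’ model.

```python
import numpy as np

def simulate_choices(v, rng, n_draws):
    """Simulated choices in {0, ..., m-1} for deterministic utilities v (length m)."""
    m = v.size
    # Unobserved utilities y_j = v_j + e_j with independent standard normal errors.
    y = v + rng.normal(0.0, 1.0, size=(n_draws, m))
    # The observed choice is the alternative with the highest utility.
    return y.argmax(axis=1)

rng = np.random.default_rng(5)
v = np.array([0.0, 0.5, -0.5])                    # hypothetical deterministic utilities
c = simulate_choices(v, rng, n_draws=100_000)
shares = np.bincount(c, minlength=v.size) / c.size
print("simulated choice shares:", shares.round(3))
```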
However, extending the treatment of endogeneity to general nonlinear models is not straightforward. Dube and Chintagunta (2003) argue that “Characterizing [endogeneity] bias is not straightforward in the context of non-linear models [...] it is unclear how strong the correlation between prices and [the errors] must be to generate statistical bias. [...] It is also unclear how the endogeneity bias will manifest itself in the estimates”. Cramer (2004) considers omitted variables bias in discrete models. He observes that “Even if the omitted variable is orthogonal to the other regressors, its effect shows up in the variance of the disturbance. Since the slope coefficients of discrete models are scaled by the standard deviation, [...] the remaining coefficients are depressed towards zero”. Furthermore, he finds that the omitted variables bias may be larger because of a misspecification of the disturbances. Mullahy (1997) considers dependence between covariates and unobservables in count data models. He observes that the standard assumption that the unobservables enter the parametric structural model in a separable, additive way does not generally hold. Hence, even certain nonlinear IV estimators (e.g. Bowden and Turkington, 1984) may not be consistent. He proposes an alternative approach, based on transforming the basic model, that may be more appropriate. Foster (1997) also notes that traditional instrumental variables estimation does not simply extend to non-linear models. He proposes a non-linear two stage least squares estimator for a logit model, but the comments made by Mullahy (1997) may still apply. See also Blundell and Powell (2001a,b) for a more detailed discussion. From this discussion it becomes clear that extending the LIV approach to general nonlinear models is of great importance, yet nontrivial, because researchers do not agree on how to model endogeneity in such models.
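Cramer's rescaling argument can be written out for a simple probit case. The assumptions are ours: a latent index with one regressor x and coefficient β, an omitted variable w ~ N(0, 1) independent of x with coefficient γ, and a standard normal disturbance u independent of both.

$$
d_i^{*} = \beta x_i + \gamma w_i + u_i, \qquad d_i = \mathbf{1}\{d_i^{*} > 0\},
$$

$$
P(d_i = 1 \mid x_i) = P(\gamma w_i + u_i > -\beta x_i) = \Phi\!\left(\frac{\beta x_i}{\sqrt{1 + \gamma^{2}}}\right),
$$

because $\gamma w_i + u_i \sim N(0, 1 + \gamma^2)$ is symmetric around zero. A probit of d on x alone therefore estimates $\beta/\sqrt{1 + \gamma^2}$, which is closer to zero than β whenever γ ≠ 0, even though w is orthogonal to x.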
Self-selection problems. As discussed before, self-selection issues arise when individuals select themselves into a certain state (treated vs. non-treated, internet user vs. non-user) in a non-random way. A simple self-selection model is given by $y_i = \beta_0 + \beta_1 d_i + \varepsilon_i$, where $d_i$ is zero or one, depending on the ‘state’ of individual i. This model is similar to (3.1) with a single discrete endogenous regressor. The LIV model, however, assumes that the endogenous regressor is a continuous variable. But, using arguments similar to those given above for choice models, the LIV approach can possibly be extended to handle self-selection problems by incorporating a probit model for $x_i$.
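The bias of OLS in this simple self-selection model can be illustrated by simulation. The selection rule is an assumption made for illustration: individual i is in state 1 when an unobserved $u_i$ is positive, and $(\varepsilon_i, u_i)$ are standard bivariate normal with correlation ρ. Because the OLS slope on a binary regressor is the difference in group means of y, its bias has the closed form $2\rho\,E[u \mid u > 0] = 2\rho\sqrt{2/\pi}$, which the code prints next to the estimate. The sketch only illustrates the problem; it is not an LIV estimator for binary regressors.

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta0, beta1, rho = 100_000, 1.0, 0.5, 0.5

# (eps, u) standard bivariate normal with correlation rho: selection on unobservables.
eps, u = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T
d = (u > 0).astype(float)                          # individual i is in state 1 iff u_i > 0
y = beta0 + beta1 * d + eps

X = np.column_stack([np.ones(n), d])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# OLS slope = difference in group means of y, so its bias is
# E[eps | u > 0] - E[eps | u <= 0] = 2 * rho * E[u | u > 0] = 2 * rho * sqrt(2 / pi).
bias_theory = 2.0 * rho * np.sqrt(2.0 / np.pi)
print(f"OLS slope: {beta_ols[1]:.3f}, true beta1 + theoretical bias: {beta1 + bias_theory:.3f}")
```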
Comparison to Lewbel’s approach and heterogeneous LIV. As mentioned in section 2.3, Lewbel’s approach (Lewbel, 1997, Erickson and Whited, 2002) is similar in spirit to the LIV approach, in the sense that it also does not require the availability of observed instrumental variables. Instead, Lewbel proposes to construct instruments from the available data based on higher-order moment restrictions. Subsequently, 2SLS or GMM estimates of the regression parameters can be computed. The identifying conditions for Lewbel’s approach differ from the conditions for the LIV model (see also appendix 6C). Hence, it is interesting to compare the performance of the LIV and Lewbel estimates of β under the different identifying conditions using synthetic data.
For instance, identification of the Lewbel estimator requires that the distribution of the unobserved instrument be asymmetric. The LIV model, however, is not restricted to asymmetric distributions, as shown in, e.g., section 3.5. Second, unlike the LIV model, Lewbel’s approach requires $\beta_1$ in (3.1) to be nonzero, and situations where it is close to zero are weakly identified. On the other hand, the LIV model assumes the existence of a discrete unobserved instrument. If, for instance, the true distribution of the instrument is a skewed gamma distribution, Lewbel’s method can be used, whereas the LIV model is ‘technically’ not identified, because all observations belong to the same group ($m = 1$). However, as stated before, mixture models are commonly used to approximate continuous distributions, a property that also extends to the LIV model. This was illustrated for the nonparametric Bayes model in chapter 7, and the standard LIV model was estimated for a situation where the latent instrument had a skewed gamma distribution. The LIV model in chapters 3 and 4 assumes that the mixture components for $x$ have equal variances. This assumption may be too restrictive to approximate general continuous bivariate densities of $(y, x)$. Hence, an interesting development is to extend the LIV model to the class of heterogeneous mixture models, where the variance $\sigma^2_\nu$ in (3.2) can differ across groups $j = 1, \ldots, m$. Such a model may be very flexible in adapting to a wide range of distributions.
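The following sketch illustrates the heterogeneous extension on the marginal part $x_i = \pi_j + \nu_i$ of the model only. It fits $x$ as a univariate normal mixture with either a common variance (as in chapters 3 and 4) or group-specific variances $\sigma^2_{\nu,j}$. This is a plain EM algorithm. It is not the thesis's estimation code and not the full bivariate LIV likelihood for $(y, x)$. Several choices are assumptions made for the example: the skewed gamma data, $k = 3$, the starting values, and the variance floor. The floor guards against the degenerate zero-variance solutions that make heteroscedastic mixture likelihoods unbounded.

```python
import math
import random


def normal_pdf(v, mean, var):
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_normal_mixture(xs, k, equal_var, iters=100, var_floor=1e-3):
    """EM for a k-group univariate normal mixture for x.

    Returns (weights, means, variances, loglik_trace). equal_var=True mimics the
    common sigma^2_nu of the standard LIV model; False allows one variance per group.
    """
    n = len(xs)
    xs_sorted = sorted(xs)
    means = [xs_sorted[int((j + 0.5) * n / k)] for j in range(k)]  # spread-out start
    overall = sum(xs) / n
    variances = [sum((v - overall) ** 2 for v in xs) / n] * k
    weights = [1.0 / k] * k
    trace = []
    for _ in range(iters):
        # E-step: posterior group-membership probabilities and current log-likelihood.
        post, loglik = [], 0.0
        for v in xs:
            dens = [weights[j] * normal_pdf(v, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            loglik += math.log(total)
            post.append([d / total for d in dens])
        trace.append(loglik)
        # M-step: weights, means, then pooled or group-specific variances (floored).
        nj = [sum(p[j] for p in post) for j in range(k)]
        weights = [c / n for c in nj]
        means = [sum(p[j] * v for p, v in zip(post, xs)) / nj[j] for j in range(k)]
        ss = [sum(p[j] * (v - means[j]) ** 2 for p, v in zip(post, xs)) for j in range(k)]
        if equal_var:
            variances = [max(sum(ss) / n, var_floor)] * k
        else:
            variances = [max(ss[j] / nj[j], var_floor) for j in range(k)]
    return weights, means, variances, trace


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gammavariate(2.0, 1.0) for _ in range(1000)]  # skewed latent instrument
    for equal in (True, False):
        w, m, s2, trace = em_normal_mixture(xs, k=3, equal_var=equal)
        label = "equal variances" if equal else "group-specific variances"
        print(label, "final log-likelihood:", round(trace[-1], 1))
```

On skewed data the group-specific fit will usually reach a higher log-likelihood. Whether that translates into better estimates of $\beta$ is the open question raised above.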
We emphasize that Lewbel’s method for measurement error models has not
yet been extended to models with general regressor-error dependencies. The
results presented in appendix 6C for a general multilevel model have, to the
best of our knowledge, not appeared in the literature before.
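The skewed-gamma case discussed above can be made concrete with a simplified, just-identified version of a Lewbel-type constructed instrument. This sketch is an assumption-laden illustration, not Lewbel's (1997) full 2SLS/GMM procedure. It assumes a single endogenous regressor, no exogenous covariates, a latent instrument drawn from a Gamma(2, 1) distribution, and bivariate normal $(\varepsilon, \nu)$. The instrument $q_i = (x_i - \bar{x})(y_i - \bar{y})$ is built from the data alone. Under these assumptions its covariance with $x$ equals $\beta_1$ times the third central moment of the latent instrument. That is why the approach needs $\beta_1 \neq 0$ and an asymmetric instrument distribution, as noted in the text. The function names are hypothetical.

```python
import random


def simulate_liv(n, beta0, beta1, sigma_eps_nu, seed=0):
    """Hypothetical LIV-type data with a skewed (Gamma(2, 1)) latent instrument.

    x_i = latent_i + nu_i and y_i = beta0 + beta1*x_i + eps_i, where (eps, nu) are
    bivariate normal with unit variances and covariance sigma_eps_nu.
    """
    rng = random.Random(seed)
    ys, xs = [], []
    for _ in range(n):
        latent = rng.gammavariate(2.0, 1.0)  # unobserved, asymmetric instrument
        nu = rng.gauss(0.0, 1.0)
        eps = sigma_eps_nu * nu + (1.0 - sigma_eps_nu ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        x = latent + nu
        xs.append(x)
        ys.append(beta0 + beta1 * x + eps)
    return ys, xs


def ols_and_constructed_iv(ys, xs):
    """Return (OLS slope, IV slope using q_i = (x_i - xbar)(y_i - ybar) as instrument)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    dx = [x - xbar for x in xs]
    dy = [y - ybar for y in ys]
    ols = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    q = [a * b for a, b in zip(dx, dy)]  # instrument built from the data alone
    iv = sum(qi * b for qi, b in zip(q, dy)) / sum(qi * a for qi, a in zip(q, dx))
    return ols, iv


if __name__ == "__main__":
    ys, xs = simulate_liv(100_000, beta0=1.0, beta1=2.0, sigma_eps_nu=0.5)
    ols, iv = ols_and_constructed_iv(ys, xs)
    print(f"OLS: {ols:.3f}  constructed-instrument IV: {iv:.3f}  (true beta1 = 2)")
```

With `sigma_eps_nu = 0.5`, OLS should overshoot $\beta_1 = 2$ by roughly $0.5/3 \approx 0.17$ (the error covariance divided by the variance of $x$). The constructed-instrument estimate should be close to 2. With $\beta_1$ near zero or a symmetric latent instrument, the denominator of the IV slope is close to zero and the estimate becomes unstable.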
Straightforward testing for endogeneity without having instruments. We proposed two tests for endogeneity in standard linear models that do not require observed instruments: a Hausman test (section 3.4) and a Wald test (section 4.6). Both were shown to have reasonable power to detect an endogenous regressor. Another, asymptotically equivalent, test is the Lagrange multiplier test (e.g. Greene, 2000). The potential advantage of this test is that it operates under the restricted model, i.e. when $\sigma_{\varepsilon\nu} = 0$ ($x$ is exogenous). As such, the model parameters $\beta$ and $\sigma^2_\varepsilon$ can be estimated by OLS in a standard statistical package, and estimates for the group means $\pi$, the group sizes $\lambda$, and $\sigma^2_\nu$ can be obtained using standard software for mixture models. Subsequently, the estimated values can be substituted into the gradient vector (evaluated at the restricted parameter vector), which should be a vector of zeros, at least within the range of sample variability, if the restrictions are valid.
The only complicated step is to evaluate the score vector, which is based on the first-order derivatives in appendix 3B. However, once these derivatives are programmed, this test is potentially easy to apply, because it does not require observed instrumental variables, and it may serve as a standard diagnostic tool to investigate endogeneity in linear regression.
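A sketch of how such a test might look under stated assumptions follows. The setup is the LIV model (3.1)-(3.2) with bivariate normal $(\varepsilon_i, \nu_i)$ and equal group variances. Given $x_i$ and group $j$, $y_i$ is then normal with mean $\beta_0 + \beta_1 x_i + (\sigma_{\varepsilon\nu}/\sigma^2_\nu)(x_i - \pi_j)$. By our own working (not taken from appendix 3B, so check it against the derivatives there), the score contribution for $\sigma_{\varepsilon\nu}$ at $\sigma_{\varepsilon\nu} = 0$ is $e_i \hat{\nu}_i / (\sigma^2_\varepsilon \sigma^2_\nu)$. Here $e_i$ is the OLS residual and $\hat{\nu}_i = x_i - \sum_j w_{ij}\pi_j$ uses the posterior group probabilities $w_{ij}$ of the mixture for $x$. Under $H_0$ this score is uncorrelated with the mixture scores but not with the $\beta$-scores. Partialling the latter out gives $LM = (\sum_i e_i r_i)^2 / (\hat{\sigma}^2_\varepsilon \sum_i r_i^2)$, where $r_i$ is the residual from regressing $\hat{\nu}_i$ on $(1, x_i)$. Compare the result with a $\chi^2(1)$ critical value. The function names and simulated data are hypothetical.

```python
import math
import random


def posterior_mean_pi(x, pi, lam, sigma2_nu):
    """E[pi | x] under an equal-variance normal mixture for x (common factors cancel)."""
    dens = [w * math.exp(-0.5 * (x - p) ** 2 / sigma2_nu) for p, w in zip(pi, lam)]
    return sum(d * p for d, p in zip(dens, pi)) / sum(dens)


def lm_endogeneity(ys, xs, pi, lam, sigma2_nu):
    """Sketch of an LM statistic for H0: sigma_eps_nu = 0; approx. chi-square(1) under H0."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    e = [y - b0 - b1 * x for x, y in zip(xs, ys)]  # restricted (OLS) residuals
    s2 = sum(v * v for v in e) / n  # ML estimate of sigma^2_eps under H0
    nu_hat = [x - posterior_mean_pi(x, pi, lam, sigma2_nu) for x in xs]
    # Partial out (1, x): the beta-scores are correlated with the sigma_eps_nu score.
    nbar = sum(nu_hat) / n
    g = sum((x - xbar) * (v - nbar) for x, v in zip(xs, nu_hat)) / sxx
    r = [v - nbar - g * (x - xbar) for x, v in zip(xs, nu_hat)]
    score = sum(ei * ri for ei, ri in zip(e, r))
    return score ** 2 / (s2 * sum(ri * ri for ri in r))


def simulate(n, sigma_eps_nu, seed, pi=(0.0, 3.0), lam=(0.5, 0.5)):
    """Hypothetical LIV data: two latent groups, sigma2_nu = sigma2_eps = 1, beta = (1, 2)."""
    rng = random.Random(seed)
    ys, xs = [], []
    for _ in range(n):
        group_mean = pi[0] if rng.random() < lam[0] else pi[1]
        nu = rng.gauss(0.0, 1.0)
        eps = sigma_eps_nu * nu + math.sqrt(1.0 - sigma_eps_nu ** 2) * rng.gauss(0.0, 1.0)
        x = group_mean + nu
        xs.append(x)
        ys.append(1.0 + 2.0 * x + eps)
    return ys, xs


if __name__ == "__main__":
    # For brevity the true mixture parameters are plugged in; in practice use ML
    # estimates from mixture software fitted to x alone, as described in the text.
    for c in (0.0, 0.5):
        ys, xs = simulate(3000, c, seed=1)
        print("sigma_eps_nu =", c, "LM =", round(lm_endogeneity(ys, xs, [0.0, 3.0], [0.5, 0.5], 1.0), 2))
```

With a single group ($m = 1$), $\hat{\nu}_i$ is linear in $x_i$, so $r_i \equiv 0$ and the statistic is undefined. This is consistent with the lack of identification in that case.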
Generalizing the unobserved instrument. Finally, an interesting empirical question is whether the exogenous part of the endogenous regressor (i.e. the unobserved discrete instrument) can be profiled and given an interpretation. We elaborated on this before and suggested examining the posterior classifications. Alternatively, one can investigate this formally by using a concomitant mixture model (Wedel and Kamakura, 2000), in which the prior group sizes $\lambda$ are made dependent on individual-level covariates, i.e.

$$\lambda_{j|i} = \frac{\exp(\gamma_{0j} + v_i'\gamma_j)}{\sum_{l=1}^{k} \exp(\gamma_{0l} + v_i'\gamma_l)}, \qquad (8.2)$$
for $j = 1, \ldots, k$. The parameter $\gamma_j$ represents the effect of the concomitant variables $v_i$ on the prior probabilities $\lambda_j$. As such, each observation has its own prior probability $\lambda_{j|i}$ of belonging to the $j$-th group of the discrete instrument. This generalizes the standard LIV model, where all observations have the same prior probabilities $\lambda_j$. An important question is under which conditions the inclusion of concomitant variables yields improved results. For instance, if the $v_i$ are observed instrumental variables, this approach may give more efficient results than classical IV estimation and simple LIV estimation.
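Equation (8.2) is a multinomial-logit (softmax) link from the concomitant variables to the prior group probabilities. Below is a minimal sketch with hypothetical parameter values. We assume one reference group has its parameters fixed at zero. This is a common identification choice, not one stated in the text. Setting every $\gamma_j = 0$ removes the dependence on $v_i$ and recovers common priors $\lambda_j$, as in the standard LIV model.

```python
import math


def concomitant_priors(v, gamma0, gamma):
    """Prior group probabilities lambda_{j|i} of eq. (8.2) for one observation.

    v: covariate vector v_i; gamma0[j]: intercept gamma_{0j}; gamma[j]: coefficient
    vector gamma_j. Fixing one group's parameters at zero (assumed) identifies the model.
    """
    scores = [g0 + sum(a * b for a, b in zip(v, g)) for g0, g in zip(gamma0, gamma)]
    top = max(scores)  # subtract the maximum before exponentiating, for numerical stability
    expd = [math.exp(s - top) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]


if __name__ == "__main__":
    gamma0 = [0.0, 0.5, -0.2]                     # group 1 is the reference group
    gamma = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]  # hypothetical covariate effects
    for v_i in ([1.0, -0.5], [-1.0, 0.5]):
        print(v_i, [round(p, 3) for p in concomitant_priors(v_i, gamma0, gamma)])
```

Because the maximum score is subtracted first, large linear predictors do not overflow `math.exp`, and the probabilities are unchanged.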
Furthermore, it is interesting to investigate whether a generalization of the prior distribution of the latent instrument can identify patterns of endogeneity. For instance, Dube and Chintagunta (2003) observe, for the results obtained by Yang, Chen and Allenby (2003), that the pattern of endogeneity is most pronounced at the lower price levels. Similar observations may be possible in other applications, and the pattern of endogeneity may depend on certain covariates.
In summary, we believe that the LIV method is a powerful approach to addressing endogeneity issues: it is simple to implement, and it opens avenues for further research and future applications that can shed light on the issues raised in this discussion.
Bibliography
Anderson, T. W. and Rubin, H. (1949). Estimators on the parameters of a single equation in a complete set of stochastic equations. Annals of Mathematical Statistics, 21:570–582.
Andrews, D. W. K. (1999). Estimation when a parameter is on the boundary. Econometrica, 67:1341–1383.
Andrews, R. L. and Currim, A. S. (2003). Retention of latent segments in regression-based marketing models. International Journal of Research in Marketing, 20:315–321.
Angrist, J. D. (1990). Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records. The American Economic Review, 80:313–336.
Angrist, J. D., Imbens, G. W., and Krueger, A. B. (1999). Jackknife instrumental variables estimation. Journal of Applied Econometrics, 14:57–67.
Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91:444–455.
Angrist, J. D. and Krueger, A. B. (1991). Does compulsory school attendance affect schooling and earnings? The Quarterly Journal of Economics, 106:979–1014.
Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2:1152–1174.
Apostol, T. M. (1969). Calculus, 2nd Ed., Vol. 1: One-Variable Calculus, with an Introduction to Linear Algebra. Blaisdell, Waltham (MA).
Aptech (2000). GAUSS Language Reference. Aptech Systems, Inc., Maple Valley.
Arellano, M. (2002). Sargan’s instrumental variables estimation and the generalized method of moments. Journal of Business & Economic Statistics, 20:450–459.
Arellano, M. and Bover, O. (1995). Another look at the instrumental variables estimation of error-components models. Journal of Econometrics, 68:29–51.
Asher, H. B. (1983). Causal modelling (2nd edition). In Quantitative Applications in the Social Sciences, No. 07-003. Sage Publications, Newbury Park (CA).
Bagozzi, R. P., Yi, Y., and Nassen, K. D. (1999). Representation of measurement error in marketing variables: Review of approaches and extension to three-facet designs. Journal of Econometrics, 89:393–421.
Baltagi, B. H. (2001). Econometric Analysis of Panel Data. John Wiley & Sons, Ltd, Chichester.
Bekker, P. A. (1994). Alternative approximations to the distributions of instrumental variable estimators. Econometrica, 62:657–681.
Bekker, P. A. and Kleibergen, F. (2003). Finite-sample instrumental variables inference using an asymptotic pivotal statistic. Econometric Theory, 19:744–753.
Belsley, D. A., Kuh, E., and Welsch, R. E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, Inc., New York.
Berry, S. (2003). Comment: Bayesian analysis of simultaneous demand and supply. Quantitative Marketing and Economics, 1:251–275.
Berry, S., Levinsohn, J., and Pakes, A. (1995). Automobile prices in market equilibrium. Econometrica, 63:841–890.
Berry, S. T. (1994). Estimating discrete-choice models of product differentiation. The RAND Journal of Economics, 25:242–262.
Besanko, D., Dube, J.-P., and Gupta, S. (2000). Heterogeneity and target marketing using aggregate retail data: A structural approach. Cornell University.
Besanko, D., Gupta, S., and Jain, D. (1998). Logit demand estimation under competitive pricing behavior: An equilibrium framework. Management Science, 44:1533–1547.
Biernacki, C., Celeux, G., and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:719–725.
Blackburn, M. L. and Neumark, D. (1993). Omitted-ability bias and the increase in the return to schooling. Journal of Labor Economics, 11:521–544.
Blundell, R. and Powell, J. L. (2001a). Endogeneity in nonparametric and semiparametric regression models. Working paper, University College London.
Blundell, R. and Powell, J. L. (2001b). Endogeneity in semiparametric binary response models. CEMMAP working paper, CWP05/01.
Bonjour, D., Cherkas, L. F., Haskel, J. E., Hawkes, D. D., and Spector, T. D. (2003). Returns to education: Evidence from U.K. twins. The American Economic Review, 93:1799–1812.
Bound, J. and Jaeger, D. A. (1996). On the validity of season of birth as an instrument in wage equations: A comment on Angrist and Krueger’s “Does compulsory school attendance affect schooling and earnings?”. Technical Report 5835, NBER.
Bound, J., Jaeger, D. A., and Baker, R. M. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90:443–450.
Bowden, R. J. and Turkington, D. A. (1984). Instrumental Variables. Cambridge University Press, New York.
Bronnenberg, B. J. and Mahajan, V. (2001). Unobserved retailer behavior in multimarket data: Joint spatial dependence in market shares and promotion variables. Marketing Science, 20:284–299.
Brooks, S. P. and Roberts, G. O. (1998). Convergence assessment techniques for Markov Chain Monte Carlo. Statistics and Computing, 8:319–335.
Bryk, A. S. and Raudenbush, S. W. (1992). Hierarchical Linear Models, Applications and Data Analysis Methods. Sage Publications, Newbury Park, CA.
Buse, A. (1992). The bias of instrumental variables estimators. Econometrica, 60:173–180.
Card, D. (1995). Using geographical variation in college proximity to estimate the return to schooling. In Christofides, L. N., Grant, E., and Swidinsky, R., editors, Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, pages 201–222. University of Toronto Press, Toronto.
Card, D. (1999). The causal effect of education on earnings. In Ashenfelter, O. C. and Card, D., editors, Handbook of Labor Economics, volume 3A, pages 1801–1863. Elsevier Science B.V., North-Holland.
Card, D. (2001). Estimating the return to schooling: Progress on some persistent econometric problems. Econometrica, 69:1127–1160.
Carroll, R. J., Roeder, K., and Wasserman, L. (1999). Flexible parametric measurement error models. Biometrics, 55:44–54.
Carroll, R. J., Ruppert, D., and Stefanski, L. A. (1995). Measurement Error in Nonlinear Models. Chapman & Hall, London.
Chamberlain, G. (1980). Analysis of covariance with qualitative data. The Review of Economic Studies, 47:225–238.
Chamberlain, G. (1982). Multivariate regression models for panel data. Journal of Econometrics, 18:5–46.
Chamberlain, G. (1984). Panel data. In Griliches, Z. and Intriligator, M. D., editors, Handbook of Econometrics, Volume II, pages 1247–1318. Elsevier, Amsterdam: North Holland.
Cheng, R. C. H. and Liu, W. B. (2001). The consistency of estimators in finite mixture models. The Scandinavian Journal of Statistics, 28:603–616.
Chintagunta, P. K. (2001). Endogeneity and heterogeneity in a probit demand model: Estimation using aggregate data. Marketing Science, 20:442–456.
Chintagunta, P. K., Kadiyali, V., and Vilcassim, N. J. (2003). Endogeneity and simultaneity in competitive pricing and advertising: A logit demand analysis. Working paper, University of Chicago.
Chintagunta, P. K., Dube, J.-P., and Goh, K. Y. (2004). Beyond the endogeneity bias: The effect of unmeasured brand characteristics on household-level brand choice models. Working paper, University of Chicago.
Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, New York.
Cowles, M. K. and Carlin, B. P. (1996). Markov Chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association, 91:883–904.
Cramer, M. (2004). Omitted variable bias in discrete models. Working paper, Tinbergen Institute.
Davidson, R. and MacKinnon, J. G. (1993). Estimation and Inference in Econometrics. Oxford University Press, New York.
Dey, D., Muller, P., and Sinha, D. (1998). Practical Nonparametric and Semiparametric Bayesian Statistics. Springer-Verlag, New York.
Dhrymes, P. J. (2003). Tests for endogeneity and instrument suitability. Working paper, Columbia University.
Dijk, van, A., Heerde, van, H. J., Leeflang, P. S. H., and Wittink, D. R. (2004). Similarity-based spatial methods for estimating shelf space elasticities from correlational data. Quantitative Marketing and Economics, 2:257–277.
Donald, S. G. and Newey, W. K. (2001). Choosing the number of instruments. Econometrica, 69:1161–1191.
Draganska, M. and Jain, D. (2004). A likelihood approach to estimating market equilibrium models. Management Science, 50:605–616.
Dube, J.-P. and Chintagunta, P. K. (2003). Comment: Bayesian analysis of simultaneous demand and supply. Quantitative Marketing and Economics, 1:293–298.
Ebbes, P., Bockenholt, U., and Wedel, M. (2004). Regressor and random-effects dependencies in multilevel models. Statistica Neerlandica, 58:161–178.
Erickson, T. (2001). Constructing instruments for regressions with measurement error when no additional data are available: Comment. Econometrica, 69:221–222.
Erickson, T. and Whited, T. M. (2002). Two-step GMM estimation of the errors-in-variables model using high-order moments. Econometric Theory, 18:776–799.
Escobar, M. D. (1994). Estimating normal means with a Dirichlet process prior. Journal of the American Statistical Association, 89:268–277.
Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90:577–588.
Escobar, M. D. and West, M. (1998). Computing Bayesian nonparametric hierarchical models. In Dey, D., Muller, P., and Sinha, D., editors, Practical Nonparametric and Semiparametric Bayesian Statistics, pages 1–22. Springer-Verlag, New York.
Fahrmeir, L. and Tutz, G. (1994). Multivariate Statistical Modelling Based on Generalized Linear Models. Springer-Verlag, New York.
Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1:209–230.
Ferguson, T. S. (1996). A Course in Large Sample Theory. Chapman & Hall, New York.
Foster, E. M. (1997). Instrumental variables for logistic regression: An illustration. Social Science Research, 26:487–504.
Fox, J. (1991). Regression Diagnostics. Sage Publications, Inc., London.
Fuller, W. (1977). Some properties of a modification of the limited information estimator. Econometrica, 45:939–953.
Garen, J. (1984). The returns to schooling: A selectivity bias approach with a continuous choice variable. Econometrica, 52:1199–1218.
Gasmi, F., Laffont, J. J., and Vuong, Q. (1992). Econometric analysis of collusive behavior in a soft-drink market. Journal of Economics and Management Strategy, 1:277–311.
Goldstein, H. (1995). Multilevel Statistical Models. John Wiley & Sons Ltd., New York.
Gonul, F. F., Kim, B.-D., and Shi, M. (2000). Mailing smarter to catalog customers. Journal of Interactive Marketing, 14:2–16.
Greene, W. H. (2000). Econometric Analysis. Prentice-Hall, Inc., Upper Saddle River, New Jersey.
Griliches, Z. (1977). Estimating the returns to schooling: Some econometric problems. Econometrica, 45:1–22.
Hahn, J. (2002). Optimal inference with many instruments. Econometric Theory, 18:140–168.
Hahn, J. and Hausman, J. (2002). A new specification test for the validity of instrumental variables. Econometrica, 70:163–189.
Hahn, J. and Hausman, J. (2003). Weak instruments: Diagnosis and cures in empirical econometrics. Recent Advances in Econometric Methodology, 93:118–125.
Hamilton, B. H. and Nickerson, J. A. (2003). Correcting for endogeneity in strategic management research. Strategic Organization, 1:51–78.
Harmon, C. and Walker, I. (1995). Estimates of the economic return to schooling for the United Kingdom. American Economic Review, 85:1278–1286.
Hartog, J. (1988). An ordered response model for allocation and earnings. Kyklos, 41:113–141.
Hausman, J. A. (1978). Specification tests in econometrics. Econometrica, 46:1251–1271.
Hausman, J. A. and Taylor, W. E. (1981). Panel data and unobservable individual effects. Econometrica, 49:1377–1398.
Hennig, C. (2000). Identifiability of models for clusterwise linear regression. Journal of Classification, 17:273–296.
Hertz, T. (2003). Upward bias in the estimated returns to education: Evidence from South Africa. The American Economic Review, 93:1354–1368.
Honore, B. O. and Hu, L. (2004). On the performance of some robust instrumental variables estimators. Journal of Business & Economic Statistics, 22:30–39.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge University Press, New York.
Ibrahim, J. G. and Kleinman, K. P. (1998). Semiparametric Bayesian methods for random effects models. In Dey, D., Muller, P., and Sinha, D., editors, Practical Nonparametric and Semiparametric Bayesian Statistics, pages 89–114. Springer-Verlag, New York.
Im, K. S., Ahn, S. C., Schmidt, P., and Wooldridge, J. M. (1999). Efficient estimation of panel data models with strictly exogenous explanatory variables. Journal of Econometrics, 93:177–201.
Isacsson, G. (2004). Estimating the economic return to educational levels using data on twins. Journal of Applied Econometrics, 19:99–119.
Judge, G. G., Griffiths, W. E., Hill, R. C., Lutkepohl, H., and Lee, T.-C. (1985). The Theory and Practice of Econometrics. John Wiley & Sons Inc., New York.
Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Annals of Mathematical Statistics, 27:887–906.
Kim, J. G., Menzefricke, U., and Feinberg, F. M. (2004). Assessing heterogeneity in discrete choice models using a Dirichlet process prior. Review of Marketing Science, 2:1–39.
Kleibergen, F. (2002). Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica, 70:1781–1803.
Kleibergen, F. and Zivot, E. (2003). Bayesian and classical approaches to instrumental variables regression. Journal of Econometrics, 114:29–72.
Leeflang, P. S. H. (1994). Probleemgebied Marketing: De Marktinstrumenten. Stenfert Kroese, Houten.
Lenk, P. J. (2001). Bayesian inference and Markov Chain Monte Carlo. Notes, University of Michigan.
Lenk, P. J., DeSarbo, W. S., Green, P. E., and Young, M. R. (1996). Hierarchical Bayes conjoint analysis: Recovery of partworth heterogeneity from reduced experimental designs. Marketing Science, 15:173–191.
Lewbel, A. (1997). Constructing instruments for regressions with measurement error when no additional data are available, with an application to patents and R&D. Econometrica, 65:1201–1213.
Longford, N. T. (1993). Random Coefficient Models. Oxford University Press, New York.
MacEachern, S. N. (1998). Computational methods for mixture of Dirichlet process models. In Dey, D., Muller, P., and Sinha, D., editors, Practical Nonparametric and Semiparametric Bayesian Statistics, pages 23–43. Springer-Verlag, New York.
Madansky, A. (1959). The fitting of straight lines when both variables are subject to error. Journal of the American Statistical Association, 54:173–205.
Manchanda, P., Rossi, P. E., and Chintagunta, P. K. (2004). Response modeling with non-random marketing mix variables. Journal of Marketing Research, forthcoming.
Meng, X.-L. (1994). Posterior predictive P-values. The Annals of Statistics, 22:1142–1160.
Mroz, T. A. (1987). The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. Econometrica, 55:765–799.
Mullahy, J. (1997). Instrumental-variable estimation of count data models: Applications to models of cigarette smoking behavior. The Review of Economics and Statistics, pages 586–593.
Mundlak, Y. (1978). On the pooling of time-series and cross section data. Econometrica, 46:69–85.
Naik, P. A., Shi, P., and Tsai, C.-L. (2003). Extending Akaike information criterion to mixture regression models. Working paper.
Nelson, C. R. and Startz, R. (1990). Some further results on the exact small sample properties of the instrumental variable estimator. Econometrica, 58:967–976.
Nevo, A. (2000). A practitioner’s guide to estimation of random-coefficients logit models of demand. Journal of Economics & Management Strategy, 9:513–548.
Nevo, A. (2001). Measuring market power in the ready-to-eat cereal industry. Econometrica, 69:307–342.
Neyman, J. and Scott, E. L. (1951). On certain methods of estimating the linear structural relation. The Annals of Mathematical Statistics, 22:352–361.
Pagan, A. (1984). Econometric issues in the analysis of regressions with generated regressors. International Economic Review, 25:221–247.
Petrin, A. and Train, K. (2000). Omitted product attributes in discrete choice models. Working paper, University of Berkeley.
Plat, F. W. (1988). Modelling for Markets: Applications of Advanced Models and Methods for Data Analysis. PhD thesis, Rijksuniversiteit Groningen.
Ploeg, van der, J. (1997). Instrumental Variable Estimation and Group-Asymptotics. PhD thesis, SOM, University of Groningen.
Pudney, S. E. (1978). The estimation and testing of some error components models. Technical report, London School of Economics.
Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26:195–239.
Reiersøl, O. (1950). Identifiability of a linear relation between variables which are subject to error. Econometrica, 18:375–389.
Ruud, P. A. (2000). An Introduction to Classical Econometric Theory. Oxford University Press, New York.
Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica, 26:393–415.
Sargan, J. D. (1959). The estimation of relationships with autocorrelated residuals by the use of instrumental variables. Journal of the Royal Statistical Society, Series B, 21:91–105.
Sellke, T., Bayarri, M. J., and Berger, J. O. (2001). Calibration of P-values for testing precise null hypotheses. The American Statistician, 55:62–71.
Shugan, S. M. (2004). Endogeneity in marketing decision models. Marketing Science, 23:1–3.
Snijders, T. A. B. and Bosker, R. J. (1999). Multilevel Analysis. SAGE Publications, London.
Spencer, N. H. and Fielding, A. (1998a). A comparison of modelling strategies for value-added analyses of educational data. Working paper, University of Hertfordshire.
Spencer, N. H. and Fielding, A. (1998b). An instrumental variable consistent estimation procedure to overcome the problem of endogenous variables in multilevel models. Working paper, University of Hertfordshire.
Staiger, D. and Stock, J. H. (1997). Instrumental variables regression with weak instruments. Econometrica, 65:557–586.
Stern, S. (2004). Do scientists pay to be scientists? Management Science, 50:835–853.
Stock, J. H., Wright, J. H., and Yogo, M. (2002). A survey of weak instruments and weak identification in generalized method of moments. Journal of Business & Economic Statistics, 20:518–529.
Sudhir, K. (2001). Competitive pricing behavior in the auto market: A structural analysis. Marketing Science, 20:42–60.
Teicher, H. (1963). Identifiability of finite mixtures. The Annals of Mathematical Statistics, 34:1265–1269.
Titterington, D. M., Smith, A. F. M., and Makov, U. E. (1985). Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons Ltd., Chichester.
Uusitalo, R. (1999). Essays in Economics of Education. PhD thesis, University of Helsinki.
Vella, F. (1998). Estimating models with sample selection bias: A survey. The Journal of Human Resources, 33:127–169.
Vella, F. and Verbeek, M. (1998). Whose wages do unions raise? A dynamic model of unionism and wage rate determination for young men. Journal of Applied Econometrics, 13:163–183.
Verbeek, M. (2000). A Guide to Modern Econometrics. John Wiley & Sons Ltd., Chichester.
Vilcassim, N. J. and Chintagunta, P. K. (1995). Investigating retailer product category pricing from household scanner panel data. Journal of Retailing, 71:103–128.
Villas-Boas, J. M. and Winer, R. S. (1999). Endogeneity in brand choice models. Management Science, 45:1324–1338.
Wald, A. (1940). The fitting of straight lines if both variables are subject to error. The Annals of Mathematical Statistics, 11:284–300.
Wang, P., Puterman, M. L., Cockburn, I., and Le, N. (1996). Mixed Poisson regression models with covariate dependent rates. Biometrics, 52:381–400.
Wansbeek, T. and Meijer, E. (2000). Measurement Error and Latent Variables in Econometrics. Elsevier, Amsterdam.
Wedel, M. and Kamakura, W. A. (2000). Market Segmentation. Kluwer Academic Publishers, Boston.
Weisstein, E. W. (2004a). Multinomial distribution. From MathWorld – A Wolfram Web Resource.
Weisstein, E. W. (2004b). Leptokurtic. From MathWorld – A Wolfram Web Resource.
Weisstein, E. W. (2004c). Power. From MathWorld – A Wolfram Web Resource.
West, M. (1992). Hyperparameter estimation in Dirichlet process mixture models. ISDS Discussion Paper, no. 92-A02, Duke University.
West, M., Muller, P., and Escobar, M. D. (1994). Hierarchical priors and mixture models, with applications in regression and density estimation. In Freeman, P. R. and Smith, A. F. M., editors, Aspects of Uncertainty, a Tribute to D. V. Lindley, pages 363–386. John Wiley & Sons Ltd., Chichester.
White, H. (1980). A heteroscedasticity-consistent covariance matrix estimator and a direct test for heteroscedasticity. Econometrica, 48:817–838.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50:1–25.
White, H. (2001). Asymptotic Theory for Econometricians. Academic Press, New York.
Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Massachusetts Institute of Technology, Cambridge.
Yakowitz, S. J. and Spragins, J. D. (1968). On the identifiability of finite mixtures. The Annals of Mathematical Statistics, 39:209–214.
Yang, S., Chen, Y., and Allenby, G. M. (2003). Bayesian analysis of simultaneous demand and supply. Quantitative Marketing and Economics, 1:251–275.
Zhu, H.-T. and Zhang, H. (2004). Hypothesis testing in mixture regression models. Journal of the Royal Statistical Society, Series B, 66:3–16.
Author index
Ahn, S. C., 158, 168
Allenby, G. M., 18, 29, 30, 33, 236
Anderson, T. W., 25
Andrews, D. W. K., 226
Andrews, R. L., 95
Angrist, J. D., 12, 17, 18, 21, 26, 114, 119, 146
Antoniak, C. E., 175, 178, 206, 207
Apostol, T. M., 71
Aptech, 39, 49
Arellano, M., 2, 18, 19, 154
Asher, H. B., 14
Bagozzi, R. P., 11, 146
Baker, R. M., 17, 19–21, 35, 38, 85, 89, 94, 96, 112, 119, 135, 159, 162, 163
Baltagi, B. H., 149, 152, 167
Bayarri, M. J., 195
Bekker, P. A., 22, 25, 36
Belsley, D. A., 78, 96, 101
Berger, J. O., 195
Berry, S. T., 13, 29, 221, 232
Besanko, D., 13, 29, 221
Biernacki, C., 95, 96
Blackburn, M. L., 121
Blundell, R., 164, 233
Böckenholt, U., 145, 172
Bonjour, D., 231
Bosker, R. J., 7, 146, 152
Bound, J., 17, 19–21, 35, 38, 85, 89, 94, 96, 112, 119, 135, 159, 162, 163
Bover, O., 154
Bowden, R. J., 2, 13, 17, 22–24, 27, 32, 35, 38, 60, 94, 112, 147, 148, 158, 163, 164, 166, 168, 233
Bronnenberg, B. J., 18, 30
Brooks, S. P., 183, 203
Bryk, A. S., 7, 146
Buse, A., 20, 38, 94
Card, D., 5, 8, 10, 13, 18, 27, 77, 111–113, 116–122, 135–137, 139, 146, 221, 223
Carlin, B. P., 183, 203
Carroll, R. J., 11, 27, 146, 163, 205, 227
Celeux, G., 95, 96
Chamberlain, G., 153, 165
Chen, Y., 18, 29, 30, 33, 236
Cheng, R. C. H., 226, 228
Cherkas, L. F., 231
Chintagunta, P. K., 9, 27, 29, 30, 153, 231, 232, 236
Cockburn, I., 78, 97
Cook, R. D., 78, 96, 101
Cowles, M. K., 183, 203
Cramer, M., 232
Currim, A. S., 95
Davidson, R., 14, 15, 17, 23, 45, 163
DeSarbo, W. S., 184, 198
Dey, D., 6, 173, 175, 179, 186
Dhrymes, P. J., 103
Dijk, van, A., 30
Donald, S. G., 23, 26, 162
Draganska, M., 29
Dubé, J.-P., 27, 29, 30, 232, 236
Ebbes, P., 145, 172
Erickson, T., 28, 32, 36, 233
Escobar, M. D., 172, 175–180, 182, 183, 186, 207, 211, 216
Fahrmeir, L., 101
Feinberg, F. F., 33, 173, 228
Ferguson, T. S., 2, 16, 114, 117, 175, 206, 225
Fielding, A., 146
Foster, E. M., 233
Fox, J., 78, 96, 101
Fuller, W., 26
Garen, J., 118
Gasmi, F., 13
Goh, K. Y., 27, 30
Goldstein, H., 149, 167
Gönül, F. F., 15
Govaert, G., 95, 96
Green, P. E., 184, 198
Greene, W. H., 2, 7, 15, 28, 31, 35, 45, 47, 64, 78, 86, 98, 100, 102, 103, 105, 111, 112, 146, 164, 166, 187, 209, 235
Griffiths, W. E., 2, 7, 9, 14, 115, 146
Griliches, Z., 10, 111, 114, 116–118, 121
Gupta, S., 13, 29, 221
Hahn, J., 2, 19–22, 24, 26, 31, 36, 160, 162, 221, 231
Hamilton, B. H., 12, 146
Harmon, C., 114, 118, 121
Hartog, J., 123
Haskel, J. E., 231
Hausman, J. A., 2, 19–22, 24, 26, 31, 36, 37, 147, 148, 151–153, 155, 160, 162, 166, 203, 221, 231
Hawkes, D. D., 231
Heerde, van, H. J., 30
Hennig, C., 36, 41, 67
Hertz, T., 121, 231
Hill, R. C., 2, 7, 9, 14, 115, 146
Honoré, B. O., 98
Hsiao, C., 167
Hu, L., 98
Ibrahim, J. G., 172
Im, K. S., 158, 168
Imbens, G. W., 12, 26, 146
Isacsson, G., 231
Jaeger, D. A., 17, 19–21, 35, 38, 85, 89, 94, 96, 112, 119, 135, 159, 162, 163
Jain, D., 13, 29, 221
Judge, G. G., 2, 7, 9, 14, 115, 146
Kadiyali, V., 231
Kamakura, W. A., 235
Kiefer, J., 227
Kim, B.-D., 15
Kim, J. G., 33, 173, 228
Kleibergen, F., 17, 19, 20, 25, 159
Kleinman, K. P., 172
Krueger, A. B., 18, 21, 26, 114, 119
Kuh, E., 78, 96, 101
Laffont, J. J., 13
Le, N., 78, 97
Lee, T.-C., 2, 7, 9, 14, 115, 146
Leeflang, P. S. H., 18, 30
Lenk, P. J., 184, 188, 198
Levinsohn, J., 13, 29, 221, 232
Lewbel, A., 28, 32, 36, 160, 163, 168, 204, 233
Liu, W. B., 226, 228
Longford, N. T., 149, 167
Lütkepohl, H., 2, 7, 9, 14, 115, 146
MacEachern, S. N., 178
MacKinnon, J. G., 14, 15, 17, 23, 45, 163
Madansky, A., 28, 32, 36, 60, 63, 64, 163, 226, 227
Mahajan, V., 18, 30
Makov, U. E., 33, 41, 45, 225, 229
Manchanda, P., 153, 231
Meijer, E., 11, 27, 36, 146, 163
Meng, X.-L., 195
Menzefricke, U., 33, 173, 228
Mroz, T. A., 123, 139
Mullahy, J., 233
Müller, P., 6, 173, 175, 179, 180, 182, 186
Mundlak, Y., 148, 152
Naik, P. A., 95
Nassen, K. D., 11, 146
Nelson, C. R., 21, 159
Neumark, D., 121
Nevo, A., 8–10, 13, 18, 29
Newey, W. K., 23, 26, 162
Neyman, J., 226
Nickerson, J. A., 12, 146
Pagan, A., 98
Pakes, A., 13, 29, 221, 232
Petrin, A., 9, 27
Plat, F. W., 11, 146
Ploeg, van der, J., 14, 17, 32, 36, 37
Powell, J. L., 164, 233
Pudney, S. E., 164
Puterman, M. L., 78, 97
Raudenbush, S. W., 7, 146
Redner, R. A., 45, 58, 225, 229
Reiersøl, O., 227
Roberts, G. O., 183, 203
Roeder, K., 205, 227
Rossi, P. E., 153, 231
Rubin, D. B., 12, 146
Rubin, H., 25
Ruppert, D., 11, 27, 146, 163
Ruud, P. A., 8, 147
Sargan, J. D., 19, 23
Schmidt, S. P., 158, 168
Scott, E. L., 226
Sellke, T., 195
Shi, M., 15
Shi, P., 95
Shugan, S. M., 9, 93
Sinha, D., 6, 173, 175, 179, 186
Smith, A. F. M., 33, 41, 45, 225, 229
Snijders, T. A. B., 7, 146, 152
Spector, T. D., 231
Spencer, N. H., 146
Spragins, J. D., 41, 42, 44, 67
Staiger, D., 19, 22, 24, 25, 35, 36, 112, 159, 163
Startz, R., 21, 159
Stefanski, L. A., 11, 27, 146, 163
Stern, S., 27
Stock, J. H., 2, 18–20, 22, 24–26, 31, 35, 36, 112, 159, 163, 221
Sudhir, K., 10, 11, 13, 29
Taylor, W. E., 147, 151–153, 155, 203
Teicher, H., 41, 42, 67
Titterington, D. M., 33, 41, 45, 225, 229
Train, K., 9, 27
Tsai, C.-L., 95
Turkington, D. A., 2, 13, 17, 22–24, 27, 32, 35, 38, 60, 94, 112, 147, 148, 158, 163, 164, 166, 168, 233
Tutz, G., 101
Uusitalo, R., 8, 111, 118, 120, 146
Vella, F., 12, 13, 146
Verbeek, M., 12, 15, 22, 27, 32, 112, 114, 118, 119, 122, 139, 146, 147, 149, 155, 167, 231
Vilcassim, N. J., 9, 231
Villas-Boas, J. M., 9
Vuong, Q., 13
Wald, A., 27, 32, 163, 226
Walker, H. F., 45, 58, 225, 229
Walker, I., 114, 118, 121
Wang, P., 78, 97
Wansbeek, T., 9, 11, 27, 36, 146, 163
Wasserman, L., 205, 227
Wedel, M., 9, 145, 172, 235
Weisberg, S., 78, 96, 101
Weisstein, E. W., 48, 80, 128
Welsch, R. E., 78, 96, 101
West, M., 172, 175–177, 179, 180, 182, 183, 211, 216
White, H., 2, 14, 16, 17, 27, 48, 101, 146, 147, 163, 164, 166
Whited, T. M., 28, 32, 233
Winer, R. S., 9
Wittink, D. R., 30
Wolfowitz, J., 227
Wooldridge, J. M., 7, 8, 13, 16, 17, 20, 27, 84, 121, 123, 146–148, 158, 164, 168
Wright, J. H., 2, 18–20, 22, 24–26, 31, 221
Yakowitz, S. J., 41, 42, 44, 67
Yang, S., 18, 29, 30, 33, 236
Yi, Y., 11, 146
Yogo, M., 2, 18–20, 22, 24–26, 31, 221
Young, M. R., 184, 198
Zhang, H., 226, 228
Zhu, H.-T., 226, 228
Zivot, E., 17, 19, 20, 159
Subject index
Xα-dependency, 146
  Tests, 151
Xη-dependency, 146
  External instruments, 158
  Tests, 158
2SLS, 16–26, 35, 85–94, 130–134, 158, 166, 168
  see also Instrumental variables approach
3SLS, 158, 168
Ability bias, 4, 9, 113–114
AIC, 94–96
AIC3, 94–96
Asymptotic distribution LIV estimator, 225–227
Between estimator, definition, 166–168
BIC, 94–96
CAIC, 94–96
Chamberlain’s approach, 153
Choice models, 232–233
Competition, see Simultaneous equations
Concomitant mixture models, 235
Conjoint Analysis, 184
Consistency LIV estimator, 225–227
Control function approach, 27
Cook’s distance, 101–102
Covariance estimator, see Fixed-effects estimator
COVRATIO, 101–102
Degeneracy, 57, 125
Demand-cost models, see Simultaneous equations
Dirichlet process, 173, 175–179, 184–186, 206–207
Endogeneity, 2, 8
Exogeneity, 2, 8, 35
  Test, see Testing
Fixed-effects (FE) estimator, definition, 166–168
Generalized least squares (GLS) estimator, definition, 166–168
Generalized linear models (GLM), 232–233
Goodness-of-fit diagnostics, see LIV model
Gradient, 46, 68–69, 85, 108
  Numerical optimization, 70
Halo effect, 11
Hausman test, 22, 24, 35, 47–48, 53–54, 102–105, 147, 151
  Negative values, 103
Hausman-Taylor approach, 151–155
Hessian, 46, 71–76, 85, 108
Heterogeneity bias, 116–118, 164
Heterogeneous LIV, 233–235
Heteroscedasticity, 97, 101, 128–130
ICL, 94–96
Identification, 39–44, 67, 80–85, 227–228
  Definition, 40
  Theorem LIV, 43, 82
Indicator IV method, 27
Influential observations, 96–102, 128–130
Information matrix, 44–47, 229
Instrumental variables (IV) approach, 2, 16–31, 166
  Multilevel, 158–161, 168
  Relation with LIV, 229–231
  see also 2SLS
Instruments
  ‘Optimal’ LIV, see Predicted LIV instruments
  Considerations, 17–24
  Discrete, 32, 36, 37, 60, 122
  Endogenous, 20, 86–94
  External, 36, 154, 158–160, 174
  Internal, 154, 160–161
  Latent, see LIV model
  Natural/laboratory, 60–65
  Number of, 23
  Pitfalls, 17–21
  Validity, 17–26, 85–94
  Weak, 2, 19–21, 85, 96, 159, 162
Internet bias, see Self-selection
Jackknife LIV, 101–102
Lagged dependent variables, 14–15, 27
Lagrange multiplier test, 105, 235
Least squares dummy variable (LSDV) estimator, definition, 166–168
Level-1 dependency, see Xη-dependency
Level-2 dependency, see Xα-dependency
Lewbel’s approach, 28, 32, 36, 160–161, 163, 168–170, 233–235
Likelihood distance, 101–102
LIML, 16–26, 35, 85, 162
LIV model, 37–39, 79
  Model diagnostics, 94–102, 228
  Nonparametric Bayes, 173–191
  Residuals, 97–98
  Simple, 38, 40, 41, 48
LIV-IV estimator, 78, 106, 229–231
LIV-probit model, 232
Mailing catalogues, 15
Market response model, 9, 153
MCMC algorithm, 179–183, 186–191
Measurement error, 10–11, 27, 60–65, 114–116, 184
Method errors, 11, 63
Model diagnostics, see LIV model
Mundlak’s approach, definition, 152–153
Nonparametric Bayesian LIV model, 171–205
Normality assumption of the errors, see Robustness issues
OLS estimator
  Bias, 15–16
  Consistency, 15–16
Omitted variables bias, 8–10, 27, 113–114, 184
Online versus offline stores, see Self-selection
Optimizing behavior, 118
Outer product of gradient (OPG) matrix, 45
Outliers, 96–102, 128–130
Pólya urn, 178
Posterior P-values, 195
Power, definition, 48
Predicted LIV instruments, 60–65, 96, 98, 131–133, 229–231
Price
  Endogeneity, 9, 28–30
  Measurement error bias, 11
  Omitted variables bias, 9
  Simultaneity bias, 13
Promotional effects, see Lagged dependent variables
Proxy-variable OLS method, 27, 121
Random coefficient model, 164, 184
Random intercept model, 149, 174
Random-effects (RE) estimator, definition, 166–168
Regressors
  Endogenous, definition, 2, 8
  Exogenous, definition, 2, 8
Residual analysis, 96–102, 128–130
Return to education, see Schooling
Robustness issues
  Misspecification of the error distribution, 98–101, 228
  Mixture models, 33, 228
  see also Model diagnostics
Schooling
  Ability bias, 9, 113–114
  Heterogeneity bias, 116–118
  IV estimation, 118–122
  LIV results, 124–136
  Measurement error bias, 10, 114–116
  Optimizing behavior bias, 118
  Proxies for ability, 121
  Twin studies, 121, 135, 231
Selection of m, 94–96
Self-selection, 12–13, 233
Simultaneous equations, 13–14, 27–30
Spatial Econometrics, 30–31
Supply-Demand-Competition models, 28–30
Testing
  For existence of a latent instrument, 228–229
  For exogenous instruments, 23, 86, 133–134
  For instrument validity, 22, 85–94
  For regressor-error correlation, 24, 47–48, 102–105
  For weak instruments, 23, 85–86, 131–133
  see also Hausman test
Wald test, 85, 86, 102–105
Wald’s method, 27, 32, 60–65, 226
Weak instruments
  ‘Classical’ solutions to, 24–26
  see Instruments
  see Testing
White’s consistent covariance matrix, 48, 101
Within-groups estimator, see Fixed-effects estimator
Samenvatting (summary in Dutch)
In this thesis we introduce and develop a new method, the latent instrumental variables (LIV) method, which can be used, for example, to solve the following problem. An ice-cream seller is interested in the price sensitivity of her sales; that is, she wants to know the relationship between the price of her ice creams and her sales. To this end, she changes the ice-cream price regularly over a long period and records the realized sales. These two variables are available for analysis. However, she does not tell the researcher that she charges a relatively higher price on warm days: a higher temperature increases the number of beachgoers who, because of the heat, are very willing to buy an ice cream. To keep the example simple, we assume that the relationship between the ice-cream price and sales, given the data, can be described by the widely used linear regression model, that is, $S_t = \alpha + \beta P_t + \varepsilon_t$, where $S_t$ is the number of ice creams sold (“sales”) on day $t$ and $P_t$ is the price charged on day $t$. The unknown parameter $\beta$ measures the price sensitivity and is expected to be negative. Because this model is a simplified representation of reality, there is always a (small) error $\varepsilon_t$; the temperature effect, for example, is ‘part’ of this error term. Once the sales $S$ and the prices $P$ are available, the obvious step is to estimate $\beta$ with the standard linear regression technique (“the ordinary least squares (OLS) method”). This technique rests on a number of assumptions, including the assumption that the price $P$ and the error term $\varepsilon$ are independent of each other. If that is not the case, the OLS estimate of the price sensitivity $\beta$ is systematically wrong. In this example, the independence assumption does not hold.
Because the seller uses information about the temperature to set her price, the price depends on the temperature, so $P$ and $\varepsilon$ are dependent. The standard OLS method therefore produces a wrong price sensitivity, because part of the temperature effect is wrongly attributed to the price effect. Similar problems occur regularly in more realistic applications. In this thesis we propose a new method that can give the correct answer in such situations.
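The mechanism in the ice-cream example can be reproduced with a small simulation. This is a sketch with made-up numbers (the temperature distribution, the pricing rule, a true $\beta = -2$, and the helper names `simulate_ice_cream` and `ols` are all hypothetical); it is not data or code from the thesis. Because temperature raises both the price and sales, regressing sales on price alone attributes part of the temperature effect to price.

```python
import numpy as np

def simulate_ice_cream(n=10_000, beta=-2.0, seed=0):
    """Hypothetical data: the seller charges more on warm days, and warm days
    also raise sales directly, so temperature ends up in the error term."""
    rng = np.random.default_rng(seed)
    temp = rng.normal(25.0, 4.0, n)                      # daily temperature
    price = 1.0 + 0.05 * temp + rng.normal(0.0, 0.5, n)  # price rises with temperature
    sales = 100.0 + beta * price + 0.5 * temp + rng.normal(0.0, 2.0, n)
    return price, sales, temp

def ols(y, *regressors):
    """OLS coefficients of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

price, sales, temp = simulate_ice_cream()
b_naive = ols(sales, price)[1]          # temperature omitted: it sits in the error term
b_oracle = ols(sales, price, temp)[1]   # temperature included (impossible for the researcher)
print(f"naive OLS: {b_naive:.2f}, with temperature: {b_oracle:.2f}, true: -2.00")
```

With these settings the naive slope should settle near $-2 + 0.4/0.29 \approx -0.62$ in large samples, so the price sensitivity is understated. The second regression is only available because the simulation knows the temperature, which the researcher in the example does not.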
The model just used, the linear regression model, and the technique mentioned for estimating its unknown parameters, the OLS method, are, in more general form, widely used in practice and in science to determine the relationship between a dependent variable, for example sales, and independent variables, for example price. In this thesis we consider situations in which an independent variable is correlated with the error term. In that case the variable is not exogenous but endogenous, and there is an endogeneity problem. Such dependencies can arise in various situations, for example when explanatory variables are wrongly omitted from the model (such as the temperature variable in the example above), when the dependent variable also influences the independent variables (simultaneity), or when the independent variables contain measurement error. The standard inference procedures then lead to wrong conclusions. For example, the OLS estimators of the regression parameters are biased and inconsistent, and the true effects of the regressors on the dependent variable are systematically under- or overestimated. A manager or policy maker who uses such results to make a decision may make a serious mistake.
In the literature there is a technique that can be used instead of OLS when the regressors are correlated with the error term: the classical instrumental variables (IV) method. This method has a long history in the econometric literature and works roughly as follows. One collects additional variables, the so-called instruments, which must satisfy two requirements: (1) the instruments explain part of the variance in the endogenous regressors, and (2) the instruments are independent of the error term. For the ice-cream problem above, one therefore needs one or more variables that explain the ice-cream price but are independent of the temperature. When such instruments are available, the regression parameters can be estimated consistently with, for example, two-stage regression techniques. However, such additional variables are often unavailable or do not satisfy the two requirements. In the first case the IV estimator cannot be computed, and in the second case the method can produce very unreliable results, sometimes worse than simply ignoring the endogeneity problem and using OLS while knowing that OLS gives a wrong answer. Despite these criticisms, instruments are often chosen in practice on an ad hoc basis, or simply on the basis of availability. The problems with the classical IV method and the possible inconsistency of the popular standard OLS estimator form the starting point of this research. We introduce a new method, the latent instrumental variables method (LIV method), which can solve the endogeneity problem without using additional data.
Our method estimates the ‘perfect’ instruments from the available data, so that the regression parameters can be estimated consistently regardless of possible dependencies between the regressors and the error term.
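The two-stage idea behind the classical IV method described above can be sketched in a few lines. Assumptions, all ours: one endogenous regressor, one instrument, an intercept, and a hypothetical instrument `cost` (think of an input cost that shifts the ice-cream price but is unrelated to temperature). `two_sls` is an illustrative helper, not a library function.

```python
import numpy as np

def two_sls(y, x, z):
    """Just-identified 2SLS with an intercept: first regress x on [1, z],
    then regress y on [1, fitted x]. Returns [intercept, slope]."""
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]       # first stage
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]        # second stage

rng = np.random.default_rng(1)
n = 10_000
temp = rng.normal(25.0, 4.0, n)      # unobserved by the researcher
cost = rng.normal(0.0, 1.0, n)       # hypothetical instrument, independent of temperature
price = 1.0 + 0.05 * temp + 0.5 * cost + rng.normal(0.0, 0.5, n)
sales = 100.0 - 2.0 * price + 0.5 * temp + rng.normal(0.0, 2.0, n)

b_iv = two_sls(sales, price, cost)[1]
print(f"2SLS slope: {b_iv:.2f} (true value -2)")
```

With a single instrument the 2SLS slope equals the ratio of sample covariances $\widehat{\mathrm{cov}}(z, y)/\widehat{\mathrm{cov}}(z, x)$. Only the point estimate is meaningful here: standard errors from the naive second-stage regression are not valid, which is why dedicated IV routines compute them differently.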
The classical instrumental variables (IV) method mentioned above and its potential problems are discussed in detail in chapter 2. We present a literature review that covers a large part of the studies on the classical IV method, in particular the situation in which only weak instruments are available, as well as a large number of empirical studies with endogeneity problems. We also point out a number of alternative approaches that can be used to solve an endogeneity problem. This review shows that endogeneity problems occur in many situations and cannot always be solved easily with the available standard methods. In the conclusion of this chapter we motivate the development of the LIV method in detail.
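A common first check for the weak-instrument situation reviewed in chapter 2 is the first-stage F statistic. A frequently quoted rule of thumb, usually attributed to Staiger and Stock (1997), treats values below about 10 as a warning sign. The sketch below computes the statistic for one instrument on simulated data. The coefficients 0.5 and 0.01 are arbitrary assumptions, and `first_stage_f` is a hypothetical helper.

```python
import numpy as np

def first_stage_f(x, z):
    """F statistic for H0: c = 0 in the first-stage regression x = a + c*z + v.
    With a single instrument this equals the squared t statistic of c."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    sigma2 = resid @ resid / (n - 2)
    var_c = sigma2 * np.linalg.inv(Z.T @ Z)[1, 1]
    return coef[1] ** 2 / var_c

rng = np.random.default_rng(2)
n = 1_000
z = rng.normal(size=n)
f_strong = first_stage_f(0.5 * z + rng.normal(size=n), z)   # instrument explains x well
f_weak = first_stage_f(0.01 * z + rng.normal(size=n), z)    # instrument barely explains x
print(f"first-stage F: strong {f_strong:.1f}, weak {f_weak:.1f}")
```

The F statistic only addresses requirement (1), relevance. Requirement (2), independence from the error term, cannot be checked this way.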
In chapter 3 we introduce the latent instrumental variables (LIV) method. This method assumes that there is an instrument that is discrete and latent (unobserved). As in the classical IV model, the endogenous regressor is assumed to consist of an exogenous part and an endogenous part. The assumption that the instrument is discrete means that the sample can be divided into groups, and the assumption that the instrument is latent means that this grouping is not observed. We propose to use mixture models to estimate this grouping. Our approach also makes it possible to test for the absence of endogeneity without requiring observed instruments. The simulation studies show that the LIV method estimates the regression parameters consistently without using additional data. The results also show that our method is superior to the standard OLS method when the regressors are not exogenous. We show that the proposed test has reasonably high power to detect endogeneity. Furthermore, we prove that the model parameters are identified. We apply the LIV method to a measurement error problem for which a discrete instrument from an experimental setting is available. The estimated LIV instrument turns out to be identical to the observed instrument. This is an important empirical result, because most studies in economics and marketing do not have experimental data. These results show that the LIV method can be applied successfully when an independent variable is correlated with the error term of the model. Unlike the results of the classical IV method, they do not depend on the availability and quality of observed instruments.
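For the simple LIV model with a latent instrument that has two categories, the estimation problem can be sketched as follows. Assumptions, ours for illustration: $y = \alpha + \beta x + \varepsilon$, $x = \pi_z + v$ with a latent category $z \in \{0, 1\}$, and $(\varepsilon, v)$ jointly normal. Within category $j$, $(x, y)$ is then bivariate normal with mean $(\pi_j, \alpha + \beta\pi_j)$ and the same covariance in both categories. A two-component normal mixture with a common (tied) covariance therefore describes the data, and $\beta$ is the slope through the two component means. As far as we can see, with two categories this is just a reparametrization of the LIV likelihood. The EM algorithm used here is our choice for brevity and is not necessarily the estimation routine of the thesis. `liv_two_groups` is a hypothetical name.

```python
import numpy as np

def liv_two_groups(x, y, n_iter=200):
    """Sketch of a two-category LIV fit via EM on a tied-covariance
    bivariate normal mixture for (x, y); returns the slope beta."""
    data = np.column_stack([x, y])
    n = len(data)
    # start from a median split of x as a rough guess of the latent groups
    resp = np.zeros((n, 2))
    upper = x > np.median(x)
    resp[upper, 1] = 1.0
    resp[~upper, 0] = 1.0
    for _ in range(n_iter):
        # M-step: category shares, component means, common covariance
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ data) / nk[:, None]
        cov = np.zeros((2, 2))
        for k in range(2):
            d = data - means[k]
            cov += (resp[:, k, None] * d).T @ d
        cov /= n
        # E-step: posterior probability of each latent category; the normal
        # constant is shared by both components (tied covariance) and cancels
        inv = np.linalg.inv(cov)
        logp = np.empty((n, 2))
        for k in range(2):
            d = data - means[k]
            logp[:, k] = np.log(weights[k]) - 0.5 * np.einsum("ij,jk,ik->i", d, inv, d)
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
    # component means are (pi_j, alpha + beta * pi_j): beta is the slope between them
    return (means[1, 1] - means[0, 1]) / (means[1, 0] - means[0, 0])

rng = np.random.default_rng(3)
n = 20_000
z = rng.integers(0, 2, n)                    # latent instrument, never passed to the estimator
v, eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], n).T
x = 3.0 * z + v                              # x is endogenous: corr(v, eps) = 0.5
y = 1.0 + 2.0 * x + eps

b_ols = np.polyfit(x, y, 1)[0]
b_liv = liv_two_groups(x, y)
print(f"OLS: {b_ols:.3f}, LIV sketch: {b_liv:.3f}, true: 2.000")
```

In this simulated design the OLS slope converges to $2 + 0.5/3.25 \approx 2.15$, while the mixture-based estimate should be close to 2 without any observed instrument. Standard errors, the choice of the number of categories, and the Hausman-type comparison of OLS and LIV are not covered by this sketch.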
In chapter 4 we extend the LIV model further. We include several exogenous regressors in the model. In addition, we consider a situation in which observed instruments are available and show how these variables can be incorporated into the LIV model. We use techniques similar to those in chapter 3 to show that all model parameters are identified. The identification results suggest a new method to study the validity of the observed instruments: we can check whether the available instruments satisfy the two requirements mentioned above. To the best of our knowledge, such methods have not been developed before. They are important for empirical researchers, because the results of the classical IV method depend strongly on the quality of the instruments used. We illustrate the proposed method on simulated data and show that it succeeds in identifying instruments of poor quality. In addition, we propose a number of diagnostics to check whether the LIV model describes the data well. Finally, we show that the LIV estimates are reasonably robust to model misspecification (distributional assumptions) and to outliers in the data.
In chapter 5 we investigate the relationship between ‘education’ and ‘income’, using the techniques developed in the previous chapters. The variable ‘education’ is possibly endogenous because a good measure of a person’s capacity to achieve, ‘ability’, is usually missing from the available data. Someone who is naturally successful will probably stay in education longer and will also be able to earn more income, regardless of their education. The standard OLS estimate of the education effect can therefore not be trusted. We present a review of the education literature, which shows that many studies have examined the effect of ‘education’ on ‘income’. However, it also shows that no satisfactory answer has been found yet, because suitable instruments are usually lacking. Moreover, many results obtained with the classical IV method contradict each other regarding the size and the sign of the bias in the OLS estimator of the education effect. These discrepancies are attributed to the use of poor instruments. The results of our LIV method do not depend on such observed, and possibly poor, instruments. For three datasets, we find that the bias in the OLS estimator is approximately 7%. That is, according to the standard OLS method, education appears to be worth more than it actually is. Our results agree closely with recent conclusions of studies that use twins. We find that the LIV model describes the data sufficiently well. In addition, we find for two of the three datasets that the available instruments do not satisfy the assumptions mentioned above and are therefore of poor quality. We conclude that our results, based on the LIV model, should be preferred over those of the standard OLS method and the classical IV method.
Rijksuniversiteit Groningen
Latent Instrumental Variables
A New Approach to Solve for Endogeneity
Thesis
to obtain the doctorate in Economic Sciences
at the Rijksuniversiteit Groningen, on the authority of the
Rector Magnificus, Dr. F. Zwarts, to be defended in public on
Thursday 23 December 2004 at 11:00
by
Peter Ebbes
born on 28 May 1976 in Smilde
In chapter 6 we consider endogeneity in multilevel models. Such models are used when the data have a hierarchical structure. As in the simple regression model, these models assume that the regressors are independent of the stochastic components (“random effects”) of the model. If, for example, panel data or data on twins or siblings are available, multilevel models can in certain cases be used to solve the endogeneity problem. Several widely used estimation techniques exist for such situations. In this chapter we show that these techniques must be applied with care. Our simulation studies show in which cases these methods give the correct answer and in which cases they fail. We conclude that the current methods, although widely applied, are of limited use, and we suggest steps for further research. The dependence between regressors and random effects has been studied in several econometric studies, but this important problem has received hardly any attention in the social and behavioral science literature.
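The simplest case of the dependency discussed in chapter 6 is a regressor that is correlated with the random intercept (a level-2 dependency). A small simulation shows why the choice of estimator matters. In this sketch, where all numbers are assumptions, pooled OLS treats the random intercept as part of the error and is biased, while the within (fixed-effects) estimator removes group means and is not. `demean_by_group` is a hypothetical helper.

```python
import numpy as np

def demean_by_group(v, g):
    """Within transformation: subtract each group's mean from its members."""
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]

rng = np.random.default_rng(4)
n_groups, per_group = 500, 10
g = np.repeat(np.arange(n_groups), per_group)          # level-2 unit of each observation
alpha = rng.normal(0.0, 1.0, n_groups)                 # random intercepts
x = 0.8 * alpha[g] + rng.normal(0.0, 1.0, g.size)      # x depends on the random intercept
y = 1.0 + 2.0 * x + alpha[g] + rng.normal(0.0, 1.0, g.size)

b_pooled = np.polyfit(x, y, 1)[0]                      # treats alpha as part of the error
xw, yw = demean_by_group(x, g), demean_by_group(y, g)
b_within = (xw @ yw) / (xw @ xw)                       # fixed-effects (within) estimator
print(f"pooled OLS: {b_pooled:.3f}, within: {b_within:.3f}, true: 2.000")
```

Here the pooled slope converges to $2 + 0.8/1.64 \approx 2.49$. The within estimator only removes dependence on the group-level component; if $x$ were also correlated with the level-1 error, demeaning would not help, which is the kind of limitation the chapter examines.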
The results of chapter 6 are extended in chapter 7. We introduce nonparametric Bayesian methods to solve regressor-random effect correlations at different levels of the model. This approach can answer several of the questions left open in chapter 6. It also generalizes the LIV model of chapters 3 and 4, because a general distribution is now assumed for the unobserved instrument instead of a (discrete) multinomial distribution. An additional advantage is that the number of categories of the unobserved instrument no longer has to be specified in advance. Because the model is formulated entirely within a Bayesian framework, it can be extended fairly easily to more general situations. We also gain better insight into the small-sample properties of the estimators. The research in this chapter is not yet complete, but the first results are promising. In the discussion we suggest next steps.
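The property that the number of categories need not be fixed in advance comes from the Dirichlet process prior. Its clustering behaviour can be illustrated with the Pólya urn scheme. The sketch below only draws category labels from the prior with an assumed concentration parameter of 1. It is not the MCMC sampler of the thesis, which also updates the labels given the data, and `polya_urn_labels` is a hypothetical name.

```python
import numpy as np

def polya_urn_labels(n, concentration, rng):
    """Draw category labels for n observations from the Polya urn
    (Blackwell-MacQueen) representation of a Dirichlet process prior.
    Observation i joins an existing category with probability proportional
    to its current size, or opens a new one with probability proportional
    to `concentration`."""
    labels = np.empty(n, dtype=int)
    counts = []
    for i in range(n):
        probs = np.array(counts + [concentration], dtype=float) / (i + concentration)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)       # a new category is opened
        else:
            counts[k] += 1
        labels[i] = k
    return labels, counts

rng = np.random.default_rng(5)
labels, counts = polya_urn_labels(500, concentration=1.0, rng=rng)
print(f"{len(counts)} categories; largest sizes: {sorted(counts, reverse=True)[:5]}")
```

The number of categories is random rather than fixed. With concentration 1 and 500 draws, its expected value is the harmonic number $H_{500} \approx 6.8$, and a larger concentration parameter favours more categories.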
We conclude that the new LIV method introduced in this thesis is an interesting and valuable technique for solving endogeneity problems. Moreover, the LIV method is relatively simple to use. In the final chapter of this thesis we summarize the main conclusions and discuss the results. We point out the main limitations and suggest how future research can address the open questions.
Propositions
accompanying the thesis
Latent Instrumental Variables – A New Approach to Solve for Endogeneity –
by
Peter Ebbes
23 December 2004
1. As with the LIV approach, the instruments defined by Lewbel (1997) do not necessarily correspond to a ‘physical’ measure or an economic theory, and can be considered ‘nuisance’ (chapter 2).
2. A drawback of most of the methods proposed to solve for price endogeneity in market response models is that they are based on rather strong assumptions about the nature of the endogeneity and are therefore expected to be of limited use (chapter 2).
3. All the parameters in the simple LIV model are identifiable as long as the endogenous regressor has a non-normal distribution (chapter 3).
4. The LIV estimates of the regression parameters are consistent, regardless of the choice of m (chapters 3, 4, and 8).
5. The LIV approach can be used to investigate whether available observed instrumental variables are valid (chapter 4).
6. The LIV estimate of the return to education is preferable to the OLS and classical IV estimates (chapter 5).
7. Family background instrumental variables in schooling applications are most likely endogenous (chapter 5).
8. The Hausman-Taylor approach, which is used to solve for endogeneity problems at level two, should be used with caution because its results are seriously biased in the presence of endogeneity at level one (chapter 6).
9. The nonparametric Bayesian approach to modeling the distribution of the latent instrument illustrates that the exact choice of m is not very important, but that estimating its value may yield efficiency advantages in certain situations (chapter 7).
10. Omitted product attributes in a conjoint analysis study lead to biased estimates of the part-worth utilities, but correcting for them with instrumental variables methods is not straightforward.
11. It is incorrect to label the Latent Instrumental Variables (LIV) approach as ‘latent’.
12. Increasing the number of items in a choice set does not necessarily add to the happiness of a consumer. A nice example is ordering your x-th beer in an American bar, where x > 2.
13. Writing down a formulated theorem (‘stelling’) right away is a clever strategy.
14. It is possible in the USA to make a living out of suing others.
15. A game of ‘Settlers of Catan’ is usually too short for the law of large numbers to apply. But labeling the winner a ‘lucky devil’ (‘geluksvogel’) is typically not accepted.