Health Policy Analysis via Nonlinear Regression Methods:
Estimation and Inference from a Potential Outcomes Perspective
by
Joseph V. Terza* Department of Economics
University of North Carolina at Greensboro Greensboro, NC 27402-6165
Phone: (336) 334-4892 E-mail: [email protected]
(April, 2011)

*Professor, Department of Economics, Bryan School of Business and Economics, University of North Carolina at Greensboro, Rm. 444, Bryan Building, Greensboro, NC, 27402-6165, E-mail: [email protected] Phone: (336) 334-4892, Fax: (336) 334-5580. This work was supported by the National Institute on Drug Abuse (R01 DA013968-02), the Robert Wood Johnson Foundation Substance Abuse Policy Research Program (RWJF Grant #49981), and the National Institute on Alcohol Abuse and Alcoholism (3R01AA017890-03S1). The author is grateful for the helpful comments of Libby Dismuke and David Bradford, and for the excellent research assistance provided by F. Michael Kunz and Mujde Erten. Thanks to Jeff Desimone and John Mullahy for providing the data for the examples.
Abstract
Most empirical research in health economics (HE) and health services research (HSR) is conducted with the goal of providing scientific evidence that will serve to inform current and future health policy. Such policy analytic studies typically use nonexperimental (survey) data and focus on a particular variable (the policy variable) that is at present, or will in the future be, under the control of a policy decision-making entity. Broadly stated, the typical policy analytic goal is the estimation of the effect that the policy variable has on a targeted outcome of interest [henceforth the policy effect (PE)]. Because both the outcome and policy variables are random, and given the possible endogeneity of the latter, considerable rigor and specificity in defining the PE are required to make the concept operational (estimable and interpretable). For contexts in which the policy variable is binary, Rubin (1974, 1977) developed the potential outcomes framework (POF) which facilitates clear definition and interpretation of various policy relevant effects. The POF is clearly useful for causal analysis when the policy variable is binary. Many researchers in HE and HSR, however, have applied nonlinear regression (NR) methods with the apparent goal of estimating policy relevant causal effects in cases wherein the policy variable $x_p$ is not binary. Unfortunately, there is very little guidance in the literature on how to implement and interpret nonlinear regression results for non-binary causal inference from a potential outcomes perspective. In the present paper, we detail an extended version of the POF that encompasses all three possible types of policy variable [binary, discrete (e.g. counts) and continuous] and accommodates possible endogeneity. As part of the discussion, we show how NR models and methods can be appropriately couched in this extended POF (EPOF). We also explore the relative practicality of the implementation of NR methods in our EPOF (vs. linear methods). We will derive the correct formulation of the asymptotic standard error for the generic NR-based policy effect estimator in the context of our EPOF and show that its computer implementation is fairly straightforward (e.g. requiring only a few lines of Mata code in a Stata 11® “do” file). This framework is comprehensive in that it: 1) fully reconciles the PO approach with causal modeling in a generic NR context; 2) affords extension of the PO approach, typically invoked in the treatment effects context, to cases in which $x_p$ is non-binary; 3) admits of policy effect estimators that are consistent even in the presence of endogeneity; and 4) is amenable to the derivation of correct formulations for the accompanying asymptotic inferential statistics that do not impose a heavy computational burden. It is hoped that the EPOF will be used as a template by empirical HE and HSR researchers for defining policy relevant NR-based estimation objectives and for interpreting the corresponding estimation results.
JEL Classification: C31, I11
Keywords: Incremental Effects, Marginal Effects, Treatment Effects, Endogeneity, Asymptotic Inference
1. Introduction
Most empirical research in health economics is conducted with the goal of providing
scientific evidence that will serve to inform current and future health policy. Such policy
analytic studies typically focus on a particular variable (the policy variable, $x_p$) that is at
present, or will in the future be, under the control of a policy-making entity. Broadly stated, the
key policy analytic objective is estimation of the effect that a change in $x_p$ would have on a
targeted outcome of interest ($y$) [henceforth the policy effect (PE)]. Because $x_p$ and $y$ are
random variables, and given that there are many ways to define a “change in $x_p$,” considerable
rigor and specificity in defining the PE are required to make this concept operational (estimable
and interpretable). For contexts in which $x_p$ is binary, Rubin (1974, 1977) developed the
potential outcomes framework (POF) which facilitates clear definition and interpretation of
various policy relevant effects. The key concept in this framework is the potential outcome
($y_{x_p^*}$), the random variable representing the outcome as it would have been manifested if the
policy variable were counterfactually fixed at $x_p^*$ (either $x_p^* = 0$ or $x_p^* = 1$). In the POF,
interesting versions of the PE can be defined for various subsets of the population of interest
based on a counterfactual thought experiment in which the value of $x_p^*$ is first set at 0, and then
switched to 1. For example, suppose we are interested in evaluating the potential impact on
earnings of a fully effective substance abuse (SA) policy aimed at eradication. Such a PE
measure would be an upper bound on the potential effectiveness of any policy for treating
current abusers. In this context $y$ represents earnings, $x_p$ is a binary indicator ($x_p = 1$ if SA; 0,
otherwise), and $y_{x_p^*}$ denotes potential earnings as it would be if everyone in the relevant
population had $x_p^*$ as his mandated SA status ($x_p^* = 1$ if SA; 0, otherwise). In this case the PE
of interest (the policy relevant estimation objective) would be the following average treatment
effect for the treated (ATET)1

$$ATET = E_{SA=1}[y_0] - E_{SA=1}[y_1] \tag{1}$$

where $E_{SA=1}[y_{x_p^*}]$ denotes expectation over the subpopulation of substance abusers. A similar
measure can be defined for SA policies targeting prevention. In this case, the policy analyst
would seek to estimate the potential earnings gain from preventing SA among the non-abusers.
In the POF, this estimation objective would be formally expressed as the following average
treatment effect on the untreated (ATEU)
$$ATEU = E_{SA=0}[y_0] - E_{SA=0}[y_1] \tag{2}$$

where $E_{SA=0}[y_{x_p^*}]$ denotes expectation over the subpopulation of non-abusers. Finally, the class
of SA policies of interest may target both eradication and prevention. In this case, the entire
population is relevant and, in the POF, the estimation objective would be the following average
treatment effect (ATE)
$$ATE = E[y_1] - E[y_0]. \tag{3}$$

Expression (3) represents the expected impact (in terms of earnings) from a fully effective SA
policy that has both prevention and eradication components.2

1 Note that (1) is cast as the potential earnings gain from a fully effective SA policy. We could have written it more conventionally as $E_{SA=1}[y_1] - E_{SA=1}[y_0]$, the impact of SA on earnings among substance abusers.
The POF is clearly useful for causal analysis when $x_p$ is binary. Many researchers in
health economics (HE) and health services research (HSR), however, have applied nonlinear
regression (NR) methods with the apparent goal of estimating policy relevant causal effects in
cases wherein $x_p$ is not binary (discrete, e.g. a count; or continuous). Unfortunately, there is
very little guidance in the literature on how to implement and interpret nonlinear regression
results for non-binary causal inference from a potential outcomes perspective.3 In the present
paper, we detail an extended version of the POF that encompasses all three possible types of
policy variable [binary, discrete (e.g. counts) and continuous] and accommodates the possible
endogeneity of $x_p$. As part of the discussion, we show how NR models and methods can be
appropriately couched in this extended POF (EPOF). This framework is comprehensive in that
it: 1) fully reconciles the PO approach with causal modeling in a generic NR context; 2) affords
extension of the PO approach, typically invoked in the treatment effects context, to cases in
which $x_p$ is non-binary, e.g. real-valued and continuous (marginal effects) or real-valued and
discrete (but non-binary) [incremental effects]; 3) admits of policy effect estimators that are
consistent even in the presence of endogeneity; and 4) is amenable to the derivation of correct
formulations for the accompanying asymptotic inferential statistics that do not impose a heavy
computational burden. It is hoped that the EPOF will be used as a template by empirical HE and
HSR researchers for defining policy relevant NR-based estimation objectives and for interpreting
the corresponding estimation results.

2 There are other measures in the POF, which are relevant when treatment effects are heterogeneous at the individual level, viz. the local average treatment effect and the marginal treatment effect. See Basu et al. (2007) for a comprehensive discussion of the various treatment effect concepts.
3 A notable exception is section 4.5.3 of Angrist and Pischke (2009).

We also explore the relative practicality of the implementation of NR methods in our
EPOF (vs. linear methods). It goes without saying that, given the breadth of available packaged
software options, the practicality of parameter estimation via NR methods is virtually on par with
that of the linear estimators (OLS and linear IV) – I point, in particular, to the Stata 11® “nl”,
“ml” and “gmm” procedures. The linear methods, however, enjoy an advantage when it comes
to computation of the correct standard error of the estimated policy effect of interest. As will be
made clear later in the context of our EPOF, no special programming is required for the correct
standard error in the linear case because the regression coefficient parameter itself is the policy
relevant estimation objective. We will derive the correct formulation of the asymptotic standard
error for the generic NR-based policy effect estimator in the context of our EPOF and show that
its computer implementation is fairly straightforward (e.g. requiring only a few lines of Mata
code in a Stata 11® “do” file).
The remainder of the paper is organized as follows. In the next section we discuss the
basic definitions and concepts and, within our EPOF, formulate the three most commonly
targeted counterfactual policy effect measures – treatment effects, incremental effects, and
marginal effects. We review the conventional use of NR in “effect” estimation in Section 3,
paying particular attention to the two most common approaches in the literature: conditional
mean estimation at fixed values of $x_p$ and the other regressors; and the method of recycled
predictions. In Section 4, we develop estimators that are consistent even in the presence of
endogeneity for all three versions of the counterfactual policy effect (treatment effects,
incremental effects, and marginal effects). Full development of the asymptotic properties and
accompanying correct asymptotic inferential statistics for our suggested estimation methods is
also given therein. In Section 5, we focus on the case in which $x_p$ is endogenous and
demonstrate how extant NR methods can be used to obtain the requisite consistent parameter
estimates for the policy effect estimators detailed in Section 4. Examples (programmed in
Stata/Mata 11®) are discussed in Section 6. The paper is summarized and conclusions are drawn
in the final section.
2. The Extended Potential Outcomes Framework
Here we extend the POF for the case in which the policy variable of interest ($x_p$) is
binary to encompass contexts in which $x_p$ is a random variable of a generic type: Binary, in
which case the relevant support is {0, 1}; Discrete, with support being a subset of the integers,
e.g., {0, 1, 2, ...} (count data); or Continuous, having support equal to a subset of the real line,
e.g., [0, ∞) (with the entire real line as a specific case).4 Moreover, in our extended POF
(EPOF), the counterfactually mandated version of $x_p$ ($x_p^*$) is not restricted to a single value
(either 0 or 1). Instead it can be a random variable. This leads to a somewhat broader definition
of the potential outcome ($y_{x_p^*}$). Here $y_{x_p^*}$ is representative of the distribution of potential
outcomes that would obtain if the distribution of the policy variable were mandated and fixed at
$x_p^*$. Note that, as in the POF discussed in the introduction, $x_p^*$ and $y_{x_p^*}$ are counterfactual. For
the ith individual in the population, the potential outcome ($y_{x_p^*(i)}$) cannot be observed unless the
observed value of the policy variable ($x_{pi}$) is equal to the mandated value for that individual
($x_{p(i)}^*$).

4 The support of a random variable is the largest closed set over which its probability density function is non-zero.

To make our definition of the PE operational, we define the relevant policy as a change
from a policy scenario characterized by $x_p^*$ (possibly counterfactual) to a counterfactual one in
which the mandated version of the policy variable is $x_p^* + \Delta(x_p^*)$, where $\Delta(x_p^*)$ denotes the
proposed policy increment which is possibly a function of $x_p^*$.5 This hypothetical experiment is
counterfactual (or at least partially counterfactual) in the sense that the values of the policy
variable ($x_p$) that would be observed for some individuals in the population (say, in sampling)
will not be equal to the values assigned to them via $x_p^*$; for others the value of $x_p$ will not
coincide with $x_p^* + \Delta(x_p^*)$; and for still others the observable value of $x_p$ would equal neither
their corresponding value of $x_p^*$ nor $x_p^* + \Delta(x_p^*)$. Stated in terms of potential outcomes, the
policy analyst is interested in the difference between $E[y_{x_p^*}]$ and $E[y_{x_p^* + \Delta(x_p^*)}]$ that would be
exclusively attributable to the mandated (policy) change from $x_p^*$ to $x_p^* + \Delta(x_p^*)$.6

5 Recall that in the generic case, $x_p^*$ and $x_p^* + \Delta(x_p^*)$ are random variables.
6 Without loss of generality, the definition of the policy effect (PE) may be based on any of a number of possible policy relevant features of the distribution of $y_{x_p^*}$. Instead of the expected value, the analyst may be interested in: the variance; a particular quantile; the hazard at time $t_o$ (for duration models); etc.

Henceforth, without loss of generality, we will focus on policies in which $x_p^*$ is taken to
be the currently observable version of the random variable $x_p$ (at the time of the relevant survey)
and assume that $x_p^*$ and $y_{x_p^*}$ are the only variates that change as part of the policy. We refer to
this as the default policy scenario. We now proceed to give rigorous definition to the PE as it
pertains to the various policy variable types (discrete, continuous, and binary). In the default
policy scenario, for the case in which $x_p$ is discrete, the generic form of the estimation objective
would be the following incremental policy effect

$$PE_{INC}(\Delta(x_p^*)) = E[y_{x_p^* + \Delta(x_p^*)}] - E[y_{x_p^*}]. \tag{4}$$

For example, suppose the outcome of interest is the number of yearly office visits to the
physician, $x_p$ is the per visit copay (measured in discrete \$1 increments), and the proposed
policy entails a fixed and universal increase in the copay. In this case $\Delta(x_p^*) \equiv \Delta$, where $\Delta$ is a
constant.
When $x_p$ is continuous, it is often the case that the policy analyst will not have a specific
increment to the policy variable in mind. In such cases, the following marginal policy effect
becomes the relevant estimation objective

$$PE_{MARG} = \lim_{\Delta \to 0} \frac{PE_{INC}(\Delta)}{\Delta} \tag{5}$$

where $PE_{INC}(\Delta)$ is defined as in (4) with $\Delta(x_p^*) \equiv \Delta$. From (5) we have7

$$\begin{aligned}
\lim_{\Delta \to 0} \frac{E[y_{x_p + \Delta}] - E[y_{x_p}]}{\Delta}
&= \lim_{\Delta \to 0} \frac{E_{x_p}\left[ E[y_{x_p + \Delta} | x_p + \Delta] - E[y_{x_p} | x_p] \right]}{\Delta} \\
&= E_{x_p}\left[ \lim_{\Delta \to 0} \frac{E[y_{x_p + \Delta} | x_p + \Delta] - E[y_{x_p} | x_p]}{\Delta} \right] \\
&= E_{x_p}\left[ \frac{\partial E[y_{x_p} | x_p]}{\partial x_p} \right].
\end{aligned}$$

7 The first equality in this expression is an application of the law of iterated expectations (Wooldridge, 2010, pp. 18-22). The second equality holds under fairly general conditions (see Bierens, 2004, pp. 45-46).

This derivation warrants some clarification. To conserve on notation, we use $E[a_b | b]$ to denote
the conditional expectation of the counterfactual random variable $a_b$ given that the random
variable $b$ is fixed at a particular value in the support of $b$ (say $B$). A more explicit version of
this notation might be $E[a_b | b = B]$. So, for instance we could replace $E[y_{x_p} | x_p]$ with
$E[y_{x_p} | x_p = X_p]$, where $X_p$ is a fixed value in the support of $x_p$. Given this notational
convention, it should be clear, for instance, that $E[y_{x_p + \Delta} | x_p + \Delta]$ is a nonstochastic scalar
function of the nonstochastic quantity $X_p + \Delta$, so that

$$\lim_{\Delta \to 0} \frac{E[y_{x_p + \Delta} | x_p + \Delta] - E[y_{x_p} | x_p]}{\Delta}$$

is legitimate and equal to $\partial E[y_{x_p} | x_p] / \partial x_p$.
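The limit in (5) can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a hypothetical exponential conditional mean with made-up coefficients (`B_P`, `B_O`) and a made-up four-observation sample, and compares the average incremental effect divided by $\Delta$ with the average analytic derivative as $\Delta$ shrinks.

```python
import math

# Hypothetical exponential mean: mu(x_p, x_o) = exp(B_P*x_p + B_O*x_o).
# The coefficient values are assumptions chosen only for illustration.
B_P, B_O = -0.3, 0.5


def mu(x_p, x_o):
    """Assumed conditional mean function (stands in for E[y_{x_p} | x_p, x_o])."""
    return math.exp(B_P * x_p + B_O * x_o)


def avg_incremental(data, delta):
    """Sample average of mu(x_p + delta, x_o) - mu(x_p, x_o), as in (4)."""
    return sum(mu(xp + delta, xo) - mu(xp, xo) for xp, xo in data) / len(data)


def avg_marginal(data):
    """Sample average of the analytic derivative of mu with respect to x_p."""
    return sum(B_P * mu(xp, xo) for xp, xo in data) / len(data)


# Made-up (x_p, x_o) pairs.
data = [(0.5, 1.0), (1.0, 0.0), (2.0, 1.5), (3.5, 0.2)]

marginal = avg_marginal(data)
# Ratio PE_INC(delta) / delta for progressively smaller increments.
ratios = {d: avg_incremental(data, d) / d for d in (1.0, 0.1, 0.001)}
```

Under these assumptions the ratio for the smallest increment is much closer to `marginal` than the ratio for a unit increment, which is the sense in which (5) is the limiting case of (4).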
It is interesting to note that when $x_p$ is binary, (1), (2) and (3) can be cast as special cases
in our EPOF. In the default policy scenario, the generic form of the targeted policy effect (PE)
would be

$$PE_{BINARY} = (-1)^{x_p^*} \left[ E(y_{x_p^* + \Delta(x_p^*)}) - E(y_{x_p^*}) \right] \tag{6}$$

from which (1), (2) and (3) follow as special cases. It is easy to verify that: (1) obtains for
$\Delta(x_p^*) \equiv -x_p^*$; we get (2) when $\Delta(x_p^*) \equiv 1 - x_p^*$; and (3) follows from $\Delta(x_p^*) \equiv 1 - 2x_p^*$.
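As a small illustration of the three increment rules just listed, the sketch below tabulates the post-policy status $x_p^* + \Delta(x_p^*)$ for each possible binary starting value. The dictionary keys and variable names are hypothetical labels introduced here, not notation from the paper.

```python
# Increment rules Delta(x) for a binary mandated status x in {0, 1}.
increments = {
    "(1)": lambda x: -x,         # Delta(x) = -x
    "(2)": lambda x: 1 - x,      # Delta(x) = 1 - x
    "(3)": lambda x: 1 - 2 * x,  # Delta(x) = 1 - 2x
}

# Post-policy status x + Delta(x), listed for x = 0 and then x = 1.
post_policy = {
    label: [x + delta(x) for x in (0, 1)]
    for label, delta in increments.items()
}
```

The first rule moves everyone to status 0, the second moves everyone to status 1, and the third switches each individual's status.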
As the foregoing discussion makes clear, (4), (5) and (6) are logical PO-based targets for
health policy analysis. Which of them is apropos in a particular policy context will depend on
the support of the policy variable in question and whether or not the policy increment ($\Delta(x_p^*)$) is
known [(4) if $x_p$ is discrete or continuous and the policy increment is known; (5) if $x_p$ is
continuous and the policy increment is unknown; and (6) if $x_p$ is binary]. In the next section, we
review the conventional approaches to “effect” estimation implemented by empirical researchers
in HE and HSR and note that in applications of these approaches and extant renditions of the
underlying theory, connections with PO-based policy relevant estimation objectives [in
particular, (4), (5) and (6)] are not typically discussed. The main objective of the present paper is
to remedy this by providing applied researchers with a generic and comprehensive potential
outcomes framework for defining, estimating and interpreting causal effects in a NR context.
3. Conventional Approaches to NR-Based Effect Estimation
The focus of the present paper is on the use of NR results for policy effect estimation.
The prototypical NR model, in its minimal form, is defined by a conditional mean regression
function like

$$E[y | x_p, x_o] = M(x_p, x_o; \tau) \tag{7}$$

where $x_o$ denotes a vector of observable controls (observable confounders), $M(\cdot)$ is a known
function, and $\tau$ is a vector of unknown regression parameters to be estimated.8 From elementary
econometrics we know that under the usual identification conditions, $\tau$ can be estimated by
applying the nonlinear least squares (NLS) method to (7) using a sample of data on the
observable variables $y$, $x_p$, and $x_o$. If, in addition, the forms of higher-order moments of the
distribution of $(y | x_p, x_o)$ are known (assumed) [e.g. conditional heteroskedasticity, conditional
skewness, conditional kurtosis, etc.], statistical efficiency gains can be made by incorporating
such additional non-sample information in the estimation routine [e.g. generalized NLS]. In the
extreme, if the form of the probability density function of $(y | x_p, x_o)$ is known (assumed), a
fully efficient estimate of $\tau$ can be obtained via the maximum likelihood (ML) method.
Whatever the underlying assumptions of the model, we will henceforth assume that (7) holds,
and that we have a corresponding consistent estimator of $\tau$ ($\hat{\tau}$) in hand. It should be noted here
that all of the above discussion holds regardless of whether or not (7) is representative of the
“true causal model.” This is an important point, which we will revisit below.

8 The term “minimal” is used here to describe the level of the parametricity of the model. The minimal requisite assumption for a parametric NR model is knowledge of the form of the conditional mean regression function up to a vector of unknown parameters. A maximally parametric model would be one in which the form of the conditional density function is known up to the unknown parameter vector.

The typical empirical study in HE and HSR implementing NR methods makes no explicit
mention of the PO approach as it relates to policy objectives like (4), (5) and (6). Instead, the
focus is on the following “effect” measures

$$M(1, x_o^{\#}; \tau) - M(0, x_o^{\#}; \tau) \tag{8}$$

and/or

$$\left. \frac{\partial M(x_p, x_o; \tau)}{\partial x_p} \right|_{x_p = x_p^{\#},\; x_o = x_o^{\#}} \tag{9}$$

where $x_p^{\#}$ and $x_o^{\#}$ are fixed and known values of the policy variable ($x_p$) and vector of
observable confounders ($x_o$), respectively; and $M(\cdot)$ is defined in (7). Sample analog estimators
of (8) and (9) are

$$M(1, x_o^{\#}; \hat{\tau}) - M(0, x_o^{\#}; \hat{\tau}) \tag{10}$$

and

$$\left. \frac{\partial M(x_p, x_o; \hat{\tau})}{\partial x_p} \right|_{x_p = x_p^{\#},\; x_o = x_o^{\#}} \tag{11}$$

where $\hat{\tau}$ is the vector of parameter estimates obtained by applying a NR method based on (7) to
the observable data. Results from these estimators are typically included in the output from NR
software packages (e.g. see the “mfx” option and the “margins” procedure in Stata 11®,
StataCorp, 2009). Applied researchers in HE and HSR routinely opt to report estimates from
(10) and (11) but typically neglect to offer PO-based policy relevant interpretation of these
results.
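To make (10) and (11) concrete, here is a minimal sketch that evaluates both at a single fixed point. It assumes a hypothetical exponential form for $M$ with one control, made-up estimates `tau_hat`, and a made-up evaluation point; none of these come from the paper, and the derivative is approximated by a central finite difference rather than by any packaged routine.

```python
import math


def M(x_p, x_o, tau):
    """Assumed conditional mean: exp(t0 + tp*x_p + to*x_o)."""
    t0, tp, to = tau
    return math.exp(t0 + tp * x_p + to * x_o)


def effect_at_fixed(x_o_fixed, tau):
    """Expression (10): M(1, x_o#; tau) - M(0, x_o#; tau)."""
    return M(1, x_o_fixed, tau) - M(0, x_o_fixed, tau)


def derivative_at_fixed(x_p_fixed, x_o_fixed, tau, h=1e-6):
    """Expression (11), approximated by a central finite difference in x_p."""
    upper = M(x_p_fixed + h, x_o_fixed, tau)
    lower = M(x_p_fixed - h, x_o_fixed, tau)
    return (upper - lower) / (2 * h)


tau_hat = (0.2, -0.4, 0.1)        # hypothetical parameter estimates
x_p_fixed, x_o_fixed = 1.5, 2.0   # hypothetical fixed evaluation point

d10 = effect_at_fixed(x_o_fixed, tau_hat)
d11 = derivative_at_fixed(x_p_fixed, x_o_fixed, tau_hat)
```

Both quantities depend on the chosen evaluation point, which is one reason they do not map directly onto population-level objectives such as (3) or (5).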
Other empirical researchers use the following recycled prediction statistics as treatment
effect and marginal effect estimators, respectively
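$$\frac{1}{n} \sum_{i=1}^{n} \left\{ M(1, x_{oi}; \hat{\tau}) - M(0, x_{oi}; \hat{\tau}) \right\} \tag{12}$$

$$\frac{1}{n} \sum_{i=1}^{n} \frac{\partial M(x_{pi}, x_{oi}; \hat{\tau})}{\partial x_p} \tag{13}$$
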
where the “i” subscript denotes the individual (i = 1, …, n), and n is the sample size (see Basu
and Rathouz, 2005; StataCorp, 2009). Here again, I could find no discussions in the literature
that relate (12) and (13) in any way to PO-based policy-relevant estimation objectives like (4),
(5) and (6). Therefore, even when recycled predictions estimates are reported, they are seldom
(if ever) interpreted from a PO-based policy analytic perspective.
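The recycled prediction statistics (12) and (13) average the effect over the sample rather than evaluating it at one point. The sketch below, which again assumes a hypothetical exponential mean and made-up data and estimates, computes both and contrasts (12) with the effect evaluated at the sample mean of the control.

```python
import math


def M(x_p, x_o, tau):
    """Assumed conditional mean: exp(t0 + tp*x_p + to*x_o)."""
    t0, tp, to = tau
    return math.exp(t0 + tp * x_p + to * x_o)


def recycled_treatment_effect(x_o_sample, tau):
    """Statistic (12): average over i of M(1, x_oi) - M(0, x_oi)."""
    diffs = [M(1, xo, tau) - M(0, xo, tau) for xo in x_o_sample]
    return sum(diffs) / len(diffs)


def recycled_marginal_effect(x_p_sample, x_o_sample, tau):
    """Statistic (13): average over i of dM/dx_p at (x_pi, x_oi).

    For the assumed exponential mean the derivative is tp * M.
    """
    tp = tau[1]
    derivs = [tp * M(xp, xo, tau) for xp, xo in zip(x_p_sample, x_o_sample)]
    return sum(derivs) / len(derivs)


tau_hat = (0.2, -0.4, 0.1)     # hypothetical parameter estimates
x_p = [0.0, 1.0, 2.0, 4.0]     # made-up policy variable values
x_o = [1.0, 3.0, 0.5, 2.0]     # made-up control values

te_recycled = recycled_treatment_effect(x_o, tau_hat)
me_recycled = recycled_marginal_effect(x_p, x_o, tau_hat)

# Effect evaluated at the sample mean of x_o, in the spirit of (10).
x_o_bar = sum(x_o) / len(x_o)
te_at_mean = M(1, x_o_bar, tau_hat) - M(0, x_o_bar, tau_hat)
```

Because the assumed mean is nonlinear, `te_recycled` and `te_at_mean` generally differ, so the two conventional approaches are not interchangeable.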
4. Causally Interpretable NR-Based Treatment, Incremental and Marginal Effect
Estimators
We seek to offer the empirical health policy analyst means for clear and concise PO-
based causal interpretation of results obtained via NR estimation. We begin by rewriting the PO-
based expressions (3), (4) and (5) in a way that makes them amenable to estimation and
inference via NR methods. This discussion also suggests appropriately extended versions of (12)
and (13) that are consistent for the policy relevant objectives (3) and (5). An analogous
estimator for (4) is also detailed. Finally, we show that these estimators can be cast as
conventional two-stage optimization estimators. This affords a means for derivation of their
asymptotic properties.
We begin by noting that for any vector of confounders (v), using the law of iterated
expectations, we can write9
$$E[y_{x_p^*}] = E\left[ E[y_{x_p^*} | v] \right]. \tag{14}$$

9 Confounders are variables that are correlated (in sampling) with both the outcome and the policy variable ($x_p$).

We define the comprehensive vector of confounders and the true causal model to be the vector
$x^0$ and function $\mu^0(x_p, x^0; \tau^0)$, respectively, that satisfy both of the following conditions

$$E[y | x_p, x^0] = \mu^0(x_p, x^0; \tau^0) \tag{15}$$

$$E[y_{x_p^*} | x^0] = \mu^0(x_p^*, x^0; \tau^0) \tag{16}$$

for all $x_p^*$, where $\tau^0$ is defined as the true value of the vector of unknown parameters. This
definition says that the conditional mean regression function $E[y | x_p, x^0]$ can be causally
interpreted in a potential outcomes (PO) sense, if and only if the set of confounders used as
conditioning variates includes all possible variables that are correlated with both $y$ and $x_p$. For
any other set of conditioning confounders, say $v$, $E[y | x_p, v]$ and $E[y_{x_p^*} | v]$ diverge. This is
important because the latter is representative, via (14), of the counterfactual entity of key interest
(viz. $E[y_{x_p^*}]$) and the former is estimable via observable data. If the vector of confounders is not
comprehensive, then the equality between the observable and counterfactual entities represented
in (15) and (16), respectively, will not hold and the data cannot be sufficiently informative to
allow accurate estimation of, and inference about, relevant targeted policy effects [like (3), (4)
and (5)].10

10 An illustrative example is detailed in an appendix that will be supplied upon request.

It is important to note the distinction between (15) and (16). The former applies to the
case in which $x_p$ is observable and jointly distributed with (and not necessarily independent of)
$x^0$. In (16), $x_p^*$ is a counterfactually (exogenously) mandated version of $x_p$. The random
variable $x_p^*$ need not be degenerate and is, by design, distributed independently of $x^0$ and $x_p$.
Such independence holds even if we set $x_p^* = x_p$ for policy analytic purposes, as we did in
defining the policy analytic objectives (3), (4) and (5) [and (1) and (2)]. We defined this as the
default policy scenario.
If the set of confounders is comprehensive, using (15) and (16), we can rewrite (3), (4)
and (5) as
Treatment Effect
$$PE_{ATE} = E[\mu^0(1, x^0; \tau^0) - \mu^0(0, x^0; \tau^0)] \tag{17}$$

Incremental Effect

$$PE_{INC}(\Delta) = E[\mu^0(x_p^* + \Delta, x^0; \tau^0) - \mu^0(x_p^*, x^0; \tau^0)] \tag{18}$$

Marginal Policy Effect11

$$PE_{MARG} = E_{x_p^*}\left[ \frac{\partial E[y_{x_p^*} | x_p^*]}{\partial x_p^*} \right] = E\left[ \frac{\partial \mu^0(x_p^*, x^0; \tau^0)}{\partial x_p^*} \right]. \tag{19}$$

11 See the discussion following equation (5) for the definition of $\partial E[y_{x_p^*} | x_p^*] / \partial x_p^*$.

The sample analog estimators corresponding to (17) through (19) are

$$\widehat{PE}_{ATE} = \frac{1}{n} \sum_{i=1}^{n} \left\{ \mu^0(1, x_i^0; \hat{\tau}^0) - \mu^0(0, x_i^0; \hat{\tau}^0) \right\} \tag{20}$$

$$\widehat{PE}_{INC}(\Delta) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \mu^0(x_{pi} + \Delta, x_i^0; \hat{\tau}^0) - \mu^0(x_{pi}, x_i^0; \hat{\tau}^0) \right\} \tag{21}$$

$$\widehat{PE}_{MARG} = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \mu^0(x_{pi}, x_i^0; \hat{\tau}^0)}{\partial x_p}. \tag{22}$$

These estimators are consistent for (3), (4) and (5), respectively, if and only if $\hat{\tau}^0$ is consistent
and the vector of confounders is comprehensive. This discussion directly links the recycled
prediction estimators (12) and (13) with (3) and (5), respectively, and shows that the former will
only be consistent for the latter if the set of confounders (controls) is comprehensive.
After dropping the “0” and “*” superscripts for notational convenience, we may
characterize the true causal model for any policy analytic context by rewriting (15) and (16) as
$$E[y | x_p, x_o, x_u] = \mu(x_p, x_o, x_u; \tau) \tag{23}$$

$$E[y_{x_p} | x_o, x_u] = \mu(x_p, x_o, x_u; \tau) \tag{24}$$

where $x_o$ and $x_u$ denote the observable and unobservable confounders (the latter may be a vector
or a scalar). Expressions (23) and (24) are validated by the implicit comprehensiveness of the
vector of confounders $x = [x_o \;\; x_u]$; inclusion of the unobservable confounder $x_u$ ensures the
comprehensiveness of $x$. Using this notation, (20) through (22) become

$$\widehat{PE}_{ATE} = \frac{1}{n} \sum_{i=1}^{n} \left\{ \mu(1, x_{oi}, x_{ui}; \hat{\tau}) - \mu(0, x_{oi}, x_{ui}; \hat{\tau}) \right\} \tag{25}$$

$$\widehat{PE}_{INC}(\Delta) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \mu(x_{pi} + \Delta, x_{oi}, x_{ui}; \hat{\tau}) - \mu(x_{pi}, x_{oi}, x_{ui}; \hat{\tau}) \right\} \tag{26}$$

$$\widehat{PE}_{MARG} = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \mu(x_{pi}, x_{oi}, x_{ui}; \hat{\tau})}{\partial x_p}. \tag{27}$$

Writing the PO-based policy relevant estimators in this way exposes a potential difficulty in their
implementation. As they stand, (25) through (27) are not feasible because $x_{ui}$ is, by definition,
unobservable. We will discuss feasible versions of these estimators later.
We now turn to the asymptotic properties of these estimators. To simplify the discussion,
we rewrite (25), (26) and (27) in generic form as
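$$\widehat{PE} = \frac{1}{n} \sum_{i=1}^{n} pe_i \tag{28}$$

where

$$pe_i = pe(x_{pi}, x_i, \hat{\tau})$$

$$x_i = [x_{oi} \;\; x_{ui}]$$

and

$$pe(x_p, x, \tau) = \begin{cases}
\mu(1, x; \tau) - \mu(0, x; \tau) & \text{for (25)} \quad \text{(28-1)} \\
\mu(x_p + \Delta, x; \tau) - \mu(x_p, x; \tau) & \text{for (26)} \quad \text{(28-2)} \\
\dfrac{\partial \mu(x_p, x; \tau)}{\partial x_p} & \text{for (27).} \quad \text{(28-3)}
\end{cases}$$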
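The generic form (28) lends itself to a single routine that switches among (28-1), (28-2) and (28-3). The sketch below is one possible arrangement under stated assumptions: the exponential form of `mu`, the parameter values, the sample, and the function names are all hypothetical, and the unobservable confounder is set to zero for every observation purely so that the example can run (the exogenous case in which $x_u$ plays no role).

```python
import math


def mu(x_p, x_o, x_u, tau):
    """Assumed causal mean: exp(t0 + tp*x_p + to*x_o + tu*x_u)."""
    t0, tp, to, tu = tau
    return math.exp(t0 + tp * x_p + to * x_o + tu * x_u)


def pe(kind, x_p, x_o, x_u, tau, delta=1.0):
    """Per-observation contribution pe(x_p, x, tau) for (28-1) to (28-3)."""
    if kind == "ate":     # (28-1): treatment effect contribution
        return mu(1, x_o, x_u, tau) - mu(0, x_o, x_u, tau)
    if kind == "inc":     # (28-2): incremental effect contribution
        return mu(x_p + delta, x_o, x_u, tau) - mu(x_p, x_o, x_u, tau)
    if kind == "marg":    # (28-3): analytic derivative of the assumed mean
        return tau[1] * mu(x_p, x_o, x_u, tau)
    raise ValueError(f"unknown effect type: {kind}")


def policy_effect(kind, sample, tau, delta=1.0):
    """Generic estimator (28): the sample mean of the pe_i."""
    values = [pe(kind, xp, xo, xu, tau, delta) for xp, xo, xu in sample]
    return sum(values) / len(values)


tau_hat = (0.1, -0.2, 0.3, 0.5)   # hypothetical parameter estimates
# Made-up (x_p, x_o, x_u) triples; x_u fixed at 0 for illustration only.
sample = [(0, 1.0, 0.0), (1, 0.5, 0.0), (3, 2.0, 0.0)]

pe_inc = policy_effect("inc", sample, tau_hat, delta=1.0)
pe_marg = policy_effect("marg", sample, tau_hat)
```

Keeping the per-observation values `pe_i` available, rather than only their mean, is convenient because the same values reappear in the variance calculation.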
To obtain the asymptotic properties of (28) we note that it can be cast as a two-stage
optimization (2SOPT) estimator.12 2SOPT estimators are characterized by two objective
functions: $Q(\cdot)$, a full information objective function whose optimizer is a consistent estimator
of all parameters of the model;13 and $Q_1(\cdot)$, a first stage objective function whose optimizer is a
consistent estimator of a subvector of the full set of parameters of interest. In the 2SOPT
protocol, the full vector of parameters is (say) $\theta = [\delta \;\; \omega]$. In the first stage, $Q_1$ is optimized to
obtain a consistent estimate of $\delta$. In the second stage, $Q$ is optimized with respect to $\omega$ with $\delta$
fixed at its first-stage estimated value ($\hat{\delta}$). Let $\tau$ play the role of $\delta$, and let $PE$ ($PE = PE_{ATE}$,
$PE_{INC}(\Delta)$, or $PE_{MARG}$) play the role of $\omega$. The estimator of $PE$ in (28) can be represented as a
2SOPT estimator by specifying the first-stage objective function as that which corresponds to the
in-hand consistent estimator of $\tau$ ($\hat{\tau}$), say

$$Q_1(\tau) = \sum_{i=1}^{n} q_1(\tau, z_i) \tag{29}$$

and the second-stage objective function as

$$Q(PE) = \sum_{i=1}^{n} q(PE, z_i) \tag{30}$$

where

$$q(PE, z_i) = -(pe_i - PE)^2 \tag{31}$$

and $z_i = [y_i \;\; x_{pi} \;\; x_i]$. It is easy to show that the optimizer of (30), with $q$ defined as in (31), is (28).
Therefore, our 2SOPT characterization of (28) is valid.

12 For details on 2SOPT estimators see White (1994, Chapter 6); Newey and McFadden (1994); and Wooldridge (2010, Chapter 12). These authors extend the results of Murphy and Topel (1985) for two-stage maximum likelihood estimators to the more general class of 2SOPT estimators.
13 Here we use the term "full information" to indicate that $Q(\cdot)$ takes account of all of the available nonsample information. This does not imply that full information maximum likelihood estimation is necessarily feasible.
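The claim that the second-stage optimizer coincides with the sample mean in (28) can be checked from the first-order condition. The short derivation below is a sketch that uses only the definitions in (30) and (31) and treats the $pe_i$ as fixed given the first-stage estimate.

```latex
\begin{align*}
Q(PE) &= -\sum_{i=1}^{n} (pe_i - PE)^2 \\
\frac{\partial Q(PE)}{\partial PE} &= 2 \sum_{i=1}^{n} (pe_i - PE) = 0
  \;\Longrightarrow\; \widehat{PE} = \frac{1}{n} \sum_{i=1}^{n} pe_i \\
\frac{\partial^2 Q(PE)}{\partial PE^2} &= -2n < 0
\end{align*}
```

The negative second derivative confirms that the stationary point is a maximum of $Q$.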
Using existing results on the asymptotics of 2SOPT estimators, we have14

$$\begin{aligned}
a\,var(\widehat{PE}) = E[\nabla_{PE\,PE}\, q]^{-1} \Big[ & E[\nabla_{PE\,\tau}\, q]\, AVAR(\hat{\tau})\, E[\nabla_{PE\,\tau}\, q]' \\
& - E[\nabla_{PE}\, q'\, \nabla_{\tau}\, q_1]\, E[\nabla_{\tau\tau}\, q_1]^{-1} E[\nabla_{PE\,\tau}\, q]' \\
& - E[\nabla_{PE\,\tau}\, q]\, E[\nabla_{\tau\tau}\, q_1]^{-1} E[\nabla_{\tau}\, q_1'\, \nabla_{PE}\, q] \\
& + E[(\nabla_{PE}\, q)^2] \Big] E[\nabla_{PE\,PE}\, q]^{-1}.
\end{aligned} \tag{32}$$

Moreover, $\nabla_{PE}\, q = 2(pe - PE)$, $pe \equiv pe(x_p, x, \tau)$, $\nabla_{PE\,PE}\, q = -2$, $E[\nabla_{PE\,\tau}\, q] = 2E[\nabla_{\tau}\, pe]$ and
$E[(\nabla_{PE}\, q)^2] = 4E[(pe - PE)^2]$. Therefore, (32) can be written

$$\begin{aligned}
a\,var(\widehat{PE}) = {} & E[\nabla_{\tau}\, pe]\, AVAR(\hat{\tau})\, E[\nabla_{\tau}\, pe]' \\
& - E[(pe - PE)\, \nabla_{\tau}\, q_1]\, E[\nabla_{\tau\tau}\, q_1]^{-1} E[\nabla_{\tau}\, pe]' \\
& - E[\nabla_{\tau}\, pe]\, E[\nabla_{\tau\tau}\, q_1]^{-1} E[(pe - PE)\, \nabla_{\tau}\, q_1]' \\
& + E[(pe - PE)^2].
\end{aligned} \tag{33}$$

It is typically the case that15

$$E[\nabla_{\tau}\, q_1 | x_p, x] = 0 \tag{34}$$

which implies that

$$E[(pe - PE)\, \nabla_{\tau}\, q_1] = E\left[ E[(pe - PE)\, \nabla_{\tau}\, q_1 | x_p, x] \right] = E\left[ (pe - PE)\, E[\nabla_{\tau}\, q_1 | x_p, x] \right] = 0 \tag{35}$$

so (33) becomes

$$a\,var(\widehat{PE}) = E[\nabla_{\tau}\, pe]\, AVAR(\hat{\tau})\, E[\nabla_{\tau}\, pe]' + E[(pe - PE)^2]. \tag{36}$$

14 For a scalar function “s” of two vector arguments j and t (i.e. s = s(j, t) where s is a scalar and j and t are vectors) we define: $\nabla_j s = \partial s / \partial j$ = the gradient vector of s with respect to the elements of j; and $\nabla_{jt}\, s = \partial^2 s / \partial j\, \partial t$ = the matrix of cross-partial derivatives of s with respect to the elements of j and t. We also assume that the former is a row vector, and the latter is a matrix with row dimension equal to that of the first subscript on $\nabla$ and column dimension equal to that of the second subscript.
15 In an appendix that will be supplied upon request, we show that if the first-stage estimator of $\tau$ is maximum likelihood or nonlinear least squares, then condition (34) holds.

The asymptotic variance given in (36) can be consistently estimated using
( )
( ) n n n 2
τ τi i ii 1 i 1 i 1
pe pe (pe PE)ˆa var PE n AVAR(τ)
n n n= = =å å å
¢æ ö æ ö æ ö÷ ÷ ÷ç ç ç -÷ ÷ ÷ç ç ç÷ ÷ ÷ç ç ç÷ ÷ ÷ç ç ç= +÷ ÷ ÷ç ç ç÷ ÷ ÷ç ç ç÷ ÷ ÷ç ç ç÷ ÷ ÷÷ ÷ ÷ç ç çè ø è ø è ø
(37)
where τ ipe denotes τpe evaluated at xpi, xi and τ̂ ; and ˆAVAR(τ) is the estimated
asymptotic covariance matrix of τ̂ . In summary, PE is consistent and
( ) dn
(PE PE) n(0,1)a var PE
- ¾¾ . (38)
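As an illustration of how (37) and (38) might be computed once the first-stage output is in hand, the following minimal Python sketch may help. It is not the author's Stata/Mata code. The function names `avar_pe` and `t_statistic`, the toy inputs, and the convention that `acov_tau` is the covariance matrix of the first-stage estimate as reported by estimation software are all assumptions made for this example.

```python
import math

def avar_pe(pe, grad_pe, acov_tau):
    """Sample analog of (37).

    pe       : list of n values pe_i evaluated at the first-stage estimate
    grad_pe  : list of n gradient rows (length K), the gradient of pe_i w.r.t. tau
    acov_tau : K x K estimated covariance matrix of the first-stage estimate
               (hypothetical input, as reported by estimation software)
    Returns (PE-hat, estimated asymptotic variance of sqrt(n) * (PE-hat - PE)).
    """
    n = len(pe)
    k = len(acov_tau)
    pe_hat = sum(pe) / n
    # Average gradient across observations (row vector of length K).
    g_bar = [sum(row[j] for row in grad_pe) / n for j in range(k)]
    # First term of (37): g_bar * [n * ACOV] * g_bar'.
    first_stage_part = n * sum(
        g_bar[a] * acov_tau[a][b] * g_bar[b] for a in range(k) for b in range(k)
    )
    # Second term of (37): average squared deviation of pe_i from PE-hat.
    sampling_part = sum((p - pe_hat) ** 2 for p in pe) / n
    return pe_hat, first_stage_part + sampling_part

def t_statistic(pe_hat, avar, n, null_value=0.0):
    """Asymptotic t-statistic implied by (38)."""
    return math.sqrt(n) * (pe_hat - null_value) / math.sqrt(avar)

# Toy inputs (made up purely to exercise the functions).
pe = [1.0, 2.0, 3.0]
grad = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
acov = [[0.01, 0.0], [0.0, 0.04]]
pe_hat, avar = avar_pe(pe, grad, acov)
t = t_statistic(pe_hat, avar, len(pe))
```

The two pieces of the returned variance mirror the two terms of (36): the first carries the sampling variation of the first-stage estimate, and the second carries the variation of the sample average itself.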
5. Estimating τ when xp is Endogenous: Accounting for the Unobservable Confounder xu
To complete the discussion, we must develop feasible versions of (25) through (27). This
requires a consistent estimator of τ and a means of dealing with the nonobservability of xu. As
we will see, estimation of τ will be based on (23) and whatever additional nonsample
information on (y | xp, xo, xu) is available. When xu is null, τ can be estimated using a
conventional NR method (e.g., NLS, ML) based on (23), and the policy effect estimator in (28)
coincides with the recycled predictions estimators (12) and (13) for treatment effects and
marginal effects, respectively. Therefore, as a corollary to the above discussion, we have that the
recycled predictions estimators are consistent for (3) and (5) and their asymptotic standard errors
can be obtained using the appropriate versions of (37).16 When xu is not null (i.e., when xp is
endogenous) it must be taken into account in both the estimation of τ and the policy effect [(3),
(4) and (5)]. In the following we separately discuss appropriate methods for dealing with
endogeneity for the cases in which xp is binary and non-binary (discrete or continuous).
5.1 Binary Endogenous Policy Variable
In this case we focus on (3) [i.e., (17)] as the estimation objective. It is clear that the
feasibility of estimating (17) [e.g. with a statistic similar to (25)] hinges on our ability to: 1)
appropriately account for xu in the formulation of the estimator; and 2) consistently estimate τ
given the endogeneity of xp [i.e., given that xu is present in (17)]. Following Terza (1994, 2009),
we can formalize the correlation between xp and xu by assuming that
$$ x_p = I\big(t(w,\alpha) + x_u > 0\big) \tag{39} $$
where I(A) denotes the indicator function which takes the value 1 if condition A is true and 0
otherwise, t(w,α) is a known function, w = [xo w+], α is a conformable vector of unknown
parameters, w+ is a vector of identifying instrumental variables – variables that are correlated
with xp but do not otherwise influence the value of y -- and (xu | w) has a known distribution
function. Under these assumptions we can extend (23/24) as
16 Similarly when xu is null, the recycled predictions estimator of the incremental effect is (26) sans xui.
$$ E[y_{x_p} \mid x_o, x_u] = E[y \mid x_p, x_o, x_u] = E[y \mid x_p, w, x_u] = \mu(x_p, x_o, x_u; \tau) \tag{40} $$
so that (17) becomes
$$
\begin{aligned}
PE^{ATE} &= E\Big[E_{(x_u \mid w)}\big[\mu(1, x_o, x_u; \tau) - \mu(0, x_o, x_u; \tau)\big]\Big] \\
&= E\left[\int_{-\infty}^{\infty}\big[\mu(1, x_o, x_u; \tau) - \mu(0, x_o, x_u; \tau)\big]\,g(x_u)\,dx_u\right]
\end{aligned} \tag{41}
$$
where g( ) denotes the known probability density function (pdf) of (xu | w). If we have a
consistent estimator of τ (say τ̂ ), (41) can be consistently estimated using the following feasible
replacement for (25)17
$$
\widehat{PE}^{ATE} = \frac{1}{n}\sum_{i=1}^{n}\left\{\int_{-\infty}^{\infty}\big[\mu(1, x_{oi}, x_u; \hat{\tau}) - \mu(0, x_{oi}, x_u; \hat{\tau})\big]\,g(x_u)\,dx_u\right\}. \tag{42}
$$
The asymptotic standard error of this estimator [and the relevant asymptotic t-statistic] can be
obtained from (37) [and (38)] by writing (42) as the special case of (28) in which (28-1) is
replaced by
$$ pe(x_p, x, \tau) = \int_{-\infty}^{\infty}\big[\mu(1, x_o, x_u; \tau) - \mu(0, x_o, x_u; \tau)\big]\,g(x_u)\,dx_u . $$
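The integral in (42) generally has no closed form, so it has to be approximated numerically. The sketch below is one hedged way to do this in Python with composite Simpson quadrature. It assumes, purely for illustration, that g( ) is the standard normal pdf, that xo is a scalar, and that μ has a probit form; the names `pe_i`, `pe_ate` and `mu_probit` are hypothetical. The GAUSS and Halton-sequence routines mentioned in footnote 17 are not used here.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def pe_i(mu, x_o, tau, g=norm_pdf, lo=-8.0, hi=8.0, steps=2000):
    """Composite Simpson approximation to the integral in (42) for one observation.

    mu(x_p, x_o, x_u, tau) is the regression function; g is the pdf of (x_u | w).
    The integration range [lo, hi] and the number of steps are assumptions.
    """
    h = (hi - lo) / steps  # steps must be even for Simpson's rule

    def f(u):
        return (mu(1, x_o, u, tau) - mu(0, x_o, u, tau)) * g(u)

    total = f(lo) + f(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0

def pe_ate(mu, x_o_sample, tau):
    """Sample average of the per-observation integrals, as in (42)."""
    return sum(pe_i(mu, x_o, tau) for x_o in x_o_sample) / len(x_o_sample)

def mu_probit(x_p, x_o, x_u, tau):
    """Illustrative probit-type regression function; tau = (b_p, b_o, b_u)."""
    b_p, b_o, b_u = tau
    return norm_cdf(x_p * b_p + x_o * b_o + x_u * b_u)

x_o_sample = [-1.0, 0.0, 0.5]   # made-up covariate values
tau_hat = (-0.8, 0.3, 0.5)      # made-up parameter values
ate_quadrature = pe_ate(mu_probit, x_o_sample, tau_hat)
```

For this particular probit-with-normal-confounder case the integral can also be written in closed form, because the expectation of Φ(a + b·u) over a standard normal u equals Φ(a / sqrt(1 + b²)). That identity gives a convenient check on the quadrature.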
In developing a consistent estimator for τ in this context we follow Terza (1994, 2009)
and note that under the assumptions of the model
17 The requisite integral for (41) can be evaluated using quadrature or simulation approximation. The GAUSS INTQUAD1 procedure is accurate and efficient for the former, and Halton sequence based methods work well for the latter.
$$
E[y \mid x_p, w] = x_p\,\frac{\int_{-t(w,\alpha)}^{\infty}\mu(x_p, x_o, x_u; \tau)\,g(x_u)\,dx_u}{1 - G(-t(w,\alpha))} + (1 - x_p)\,\frac{\int_{-\infty}^{-t(w,\alpha)}\mu(x_p, x_o, x_u; \tau)\,g(x_u)\,dx_u}{G(-t(w,\alpha))} \tag{43}
$$
where G( ) denotes the known distribution function of (xu | w). Based on (43), a simple two-
stage estimator can be implemented. First, estimate α by applying the ML method to the
appropriately specified binary response model. In the second stage, apply NLS to the regression
model defined by (43) with the first-stage estimate (say, α̂ ) substituted for α. This estimator is
consistent and asymptotically normal under general conditions, and standard asymptotic results
can be applied in adjusting the NLS standard errors for the fact that the estimator is two-stage
(see White, 1994, Chapter 6).18
18 For a more detailed discussion of the correct formulation of the asymptotic covariance matrix of this two-stage estimator, see Terza (1999).
5.2 Non-Binary Endogenous Policy Variable
We now turn to the estimation of (4) and (5) when xp is endogenous (i.e., when xu is not
null). Here we replace (39) with
$$ x_p = r(w,\alpha) + x_u \tag{44} $$
where r(w,α) is a known function, w is defined as in (39), and E[xu | w] = 0. Under these
assumptions (4) and (5) can be written
$$
\begin{aligned}
PE^{INC}(\Delta) &= E_{w,\,x_u,\,x_p^*}\big[\mu(x_p^* + \Delta, x_o, x_u; \tau) - \mu(x_p^*, x_o, x_u; \tau)\big] \\
&= E_{w,\,x_p^*}\big[\mu(x_p^* + \Delta, x_o, [x_p - r(w,\alpha)]; \tau) - \mu(x_p^*, x_o, [x_p - r(w,\alpha)]; \tau)\big]
\end{aligned} \tag{45}
$$
and
$$
\begin{aligned}
PE^{MARG} &= E_{x_p^*,\,w,\,x_u}\left[\frac{\partial E[y \mid x_p^*, w, x_u]}{\partial x_p^*}\right] \\
&= E_{w,\,x_p^*}\left[\frac{\partial \mu(x_p^*, x_o, [x_p - r(w,\alpha)]; \tau)}{\partial x_p^*}\right]
\end{aligned} \tag{46}
$$
and the two-stage residual inclusion (2SRI) method suggested by Terza et al. (2008) can be
implemented for the estimation of τ.19 In the first stage of the estimator, obtain a consistent
estimate of the vector α ( α̂ ) by applying NLS to (44). Next, compute the residual as
$$ \hat{x}_u = x_p - r(w,\hat{\alpha}). \tag{47} $$
In the second stage, estimate τ by applying NLS to the following NR obtained from (23)
$$ y = \mu(x_p, x_o, \hat{x}_u; \tau) + e \tag{48} $$
where $\hat{x}_u$ is defined as in (47), and e is the regression error term. This estimator is consistent
and asymptotically normal.20 Using the 2SRI results, the following feasible versions of (26) and
(27) can be implemented
$$ \widehat{PE}^{INC}(\Delta) = \frac{1}{n}\sum_{i=1}^{n}\big\{\mu(x_{pi} + \Delta, x_{oi}, \hat{x}_{ui}; \hat{\tau}) - \mu(x_{pi}, x_{oi}, \hat{x}_{ui}; \hat{\tau})\big\} \tag{49} $$
$$ \widehat{PE}^{MARG} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \mu(x_{pi}, x_{oi}, \hat{x}_{ui}; \hat{\tau})}{\partial x_p} \tag{50} $$
where $\hat{\tau}$ is the 2SRI estimate and $\hat{x}_{ui}$ is the residual for the ith sample member [as defined in
(47)].
19 Validation of (45) and (46) is detailed in an appendix that will be supplied upon request.
20 The asymptotic properties of this estimator can be derived as a special case of the generic 2SOPT estimator. Details are given in an appendix that will be supplied upon request.
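The post-estimation arithmetic in (47), (49) and (50) can be sketched as follows. This is a minimal Python illustration that takes the first- and second-stage estimates as given; it does not perform the NLS estimation itself. The linear first-stage function `r_linear`, the exponential form `mu_exp`, the scalar xo, and all numbers are assumptions chosen only to make the example runnable, and the derivative in (50) is approximated by a central difference rather than taken analytically.

```python
import math

def first_stage_residuals(x_p, w, alpha_hat, r):
    """Residuals as in (47): x_p minus the fitted first-stage function."""
    return [xp - r(wi, alpha_hat) for xp, wi in zip(x_p, w)]

def pe_inc_hat(mu, x_p, x_o, x_u_hat, tau_hat, delta):
    """Feasible incremental effect, as in (49)."""
    terms = [
        mu(xp + delta, xo, xu, tau_hat) - mu(xp, xo, xu, tau_hat)
        for xp, xo, xu in zip(x_p, x_o, x_u_hat)
    ]
    return sum(terms) / len(terms)

def pe_marg_hat(mu, x_p, x_o, x_u_hat, tau_hat, h=1e-5):
    """Feasible marginal effect, as in (50), via a central difference in x_p.

    The residual x_u_hat is held fixed while x_p is perturbed.
    """
    terms = [
        (mu(xp + h, xo, xu, tau_hat) - mu(xp - h, xo, xu, tau_hat)) / (2.0 * h)
        for xp, xo, xu in zip(x_p, x_o, x_u_hat)
    ]
    return sum(terms) / len(terms)

def r_linear(w_i, alpha):
    """Hypothetical first-stage function r(w, alpha) = w * alpha."""
    return sum(a * v for a, v in zip(alpha, w_i))

def mu_exp(x_p, x_o, x_u, tau):
    """Hypothetical exponential regression function; tau = (t_p, t_o, t_u)."""
    t_p, t_o, t_u = tau
    return math.exp(x_p * t_p + x_o * t_o + x_u * t_u)

# Made-up data: w_i = (constant, x_o, instrument).
x_p = [0.5, 1.2, 2.0, 0.0]
x_o = [1.0, 0.0, 1.0, 0.0]
w = [(1.0, 1.0, 0.3), (1.0, 0.0, 0.9), (1.0, 1.0, 1.1), (1.0, 0.0, 0.1)]
alpha_hat = (0.2, 0.4, 1.0)   # treated as the first-stage estimate
tau_hat = (-0.3, 0.1, 0.2)    # treated as the second-stage estimate

x_u_hat = first_stage_residuals(x_p, w, alpha_hat, r_linear)
inc = pe_inc_hat(mu_exp, x_p, x_o, x_u_hat, tau_hat, delta=1.0)
marg = pe_marg_hat(mu_exp, x_p, x_o, x_u_hat, tau_hat)
```

Holding the residual fixed while perturbing the policy variable is the design choice that matters here: it corresponds to differentiating with respect to the counterfactual value of the policy variable only, as in (46).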
For the special case in which μ(xp, xo, xu; τ) = exp(xp τp + xo τo + xu τu), the generalized
method of moments (GMM) estimator developed by Mullahy (1997) can be used to obtain the
requisite consistent estimator of τ. Mullahy (1997) assumes that E[exp(xu τu) | w] = 1. Terza
(2006) shows how this assumption can be used to develop feasible versions of (26) and (27). To
see this note that
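$$
\begin{aligned}
PE^{INC}(\Delta) &= E_{x_p^*,\,w,\,x_u}\big[\exp((x_p^* + \Delta)\beta_p + x_o\beta_o + x_u\beta_u) - \exp(x_p^*\beta_p + x_o\beta_o + x_u\beta_u)\big] \\
&= E_{x_p^*,\,w}\Big[E_{(x_u \mid w)}[\exp((x_p^* + \Delta)\beta_p + x_o\beta_o + x_u\beta_u)] - E_{(x_u \mid w)}[\exp(x_p^*\beta_p + x_o\beta_o + x_u\beta_u)]\Big] \\
&= E_{x_p^*,\,w}\Big[\exp((x_p^* + \Delta)\beta_p + x_o\beta_o)\,E[\exp(x_u\beta_u) \mid w] - \exp(x_p^*\beta_p + x_o\beta_o)\,E[\exp(x_u\beta_u) \mid w]\Big] \\
&= E_{x_p^*,\,w}\big[\exp((x_p^* + \Delta)\beta_p + x_o\beta_o) - \exp(x_p^*\beta_p + x_o\beta_o)\big]
\end{aligned} \tag{51}
$$
and
$$
\begin{aligned}
PE^{MARG} &= E_{x_p^*,\,w,\,x_u}\left[\frac{\partial E[y \mid x_p^*, w, x_u]}{\partial x_p^*}\right] \\
&= E_{x_p^*,\,w}\left[\frac{\partial \exp(x_p^*\beta_p + x_o\beta_o)}{\partial x_p^*}\right] \\
&= E_{x_p^*,\,w}\big[\exp(x_p^*\beta_p + x_o\beta_o)\,\beta_p\big].
\end{aligned} \tag{52}
$$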
The corresponding feasible versions of (26) and (27) are
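$$ \widehat{PE}^{INC}(\Delta) = \frac{1}{n}\sum_{i=1}^{n}\big\{\exp((x_{pi} + \Delta)\hat{\beta}_p + x_{oi}\hat{\beta}_o) - \exp(x_{pi}\hat{\beta}_p + x_{oi}\hat{\beta}_o)\big\} \tag{53} $$
$$ \widehat{PE}^{MARG} = \frac{1}{n}\sum_{i=1}^{n}\exp(x_{pi}\hat{\beta}_p + x_{oi}\hat{\beta}_o)\,\hat{\beta}_p . \tag{54} $$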
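Because (53) and (54) are closed-form sample averages, they are straightforward to compute once the GMM estimates are available. The Python sketch below is an illustration under assumptions: the function names, the data and the coefficient values are invented for the example, and the coefficient estimates are taken as given rather than estimated.

```python
import math

def _index(x_p_i, x_o_i, beta_p, beta_o):
    """Linear index x_p * beta_p + x_o * beta_o for one observation."""
    return x_p_i * beta_p + sum(b * v for b, v in zip(beta_o, x_o_i))

def pe_inc_exp(x_p, x_o, beta_p, beta_o, delta):
    """Incremental effect estimator, as in (53)."""
    terms = [
        math.exp(_index(xp + delta, xo, beta_p, beta_o))
        - math.exp(_index(xp, xo, beta_p, beta_o))
        for xp, xo in zip(x_p, x_o)
    ]
    return sum(terms) / len(terms)

def pe_marg_exp(x_p, x_o, beta_p, beta_o):
    """Marginal effect estimator, as in (54)."""
    terms = [math.exp(_index(xp, xo, beta_p, beta_o)) * beta_p for xp, xo in zip(x_p, x_o)]
    return sum(terms) / len(terms)

# Made-up data: each x_o row is (constant, one binary control).
x_p = [0.0, 5.0, 10.0, 20.0]
x_o = [(1.0, 0.0), (1.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
beta_p_hat, beta_o_hat = -0.01, (2.0, 0.05)   # treated as the GMM estimates

inc_one_unit = pe_inc_exp(x_p, x_o, beta_p_hat, beta_o_hat, delta=1.0)
marg = pe_marg_exp(x_p, x_o, beta_p_hat, beta_o_hat)
```

In the exponential case the two estimators are tied together exactly: the incremental effect for a shift Δ equals the marginal effect multiplied by (exp(Δβp) − 1)/βp, so the two agree closely only when Δβp is small.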
6. Examples
The following three examples illustrate estimation of the average treatment effect, the
incremental effect, and the marginal effect measures given in (3), (4) and (5).
6.1 The Effect of Illegal Drug Use on Employment Status
For illustrative purposes we revisit the study by DeSimone (2002) in which the author
explores how illegal drug use (abuse) affects individual labor market outcomes. The author’s
apparent objective is the estimation of the average treatment effect of substance abuse (xp) on
employment status. We revisit this analysis with a well-defined PO-based estimation objective
[viz. (3)] and a clearly articulated method for achieving that objective [viz., the appropriate
version of (42)]. As discussed in the introduction to the present paper, the policy significance of
the average treatment effect as defined in (3) in this context is that it can be taken as a measure of
the potential benefit from a fully effective substance abuse policy aimed at prevention and
eradication.
We focus on one of the four year/drug combinations examined by DeSimone – viz.
1984/cocaine. In this case the outcome of interest and the policy variable are defined in the
following way
$$
y = \begin{cases} 1 & \text{if worked at all in past year} \\ 0 & \text{otherwise} \end{cases}
\qquad
x_p = \begin{cases} 1 & \text{if used cocaine at all in past year} \\ 0 & \text{otherwise.} \end{cases}
$$
We assume that $y_{x_p^*}$ (the potential outcome at the counterfactually mandated value of $x_p^*$ [1 or 0])
follows a probit-type parametric process of the form
$$ y_{x_p^*} = I(x_p^*\beta_p + x_o\beta_o + x_u\beta_u + \varepsilon > 0) \tag{55} $$
where (ε | xo, xu) is standard normal distributed. The model includes xu to allow for the
potential endogeneity of substance abuse (SA). For instance, the unobservable confounders (xu) may include
comorbidities – sicker individuals may have a lower than average propensity to use illegal drugs,
and a simultaneously lower than average likelihood of being employed. We formalize the
relationship between xp and xu by assuming that the following version of (39) holds
$$ x_p = I(w\alpha + x_u > 0) \tag{56} $$
where w = [xo w+], (xu | w) is standard normal distributed, w+ is the vector of identifying
instrumental variables (discussed later), and α is a vector of unknown parameters to be estimated.
Given that the vector of confounders (controls), x = [xo xu], is inherently comprehensive, (55)
yields the following version of (40)
$$ E[y_{x_p} \mid x_o, x_u] = E[y \mid x_p, x_o, x_u] = E[y \mid x_p, w, x_u] = \Phi(x_p\beta_p + x_o\beta_o + x_u\beta_u) \tag{57} $$
where Φ( ) denotes the standard normal cumulative distribution function (cdf). Using (57),
which summarizes (15) and (16) [(23) and (24)] in the present context, we can write (3) as the
following version of (41)
$$ PE^{ATE} = E\left[\int_{-\infty}^{\infty}\big[\Phi(\beta_p + x_o\beta_o + x_u\beta_u) - \Phi(x_o\beta_o + x_u\beta_u)\big]\,\varphi(x_u)\,dx_u\right] \tag{58} $$
where φ( ) denotes the standard normal pdf. Equation (58) can be rewritten as21
$$ PE^{ATE} = E\big[\Phi(\gamma_p + x_o\gamma_o) - \Phi(x_o\gamma_o)\big] \tag{59} $$
where
$$ \gamma_p = \sqrt{1 - \rho^2}\;\beta_p, \qquad \gamma_o = \sqrt{1 - \rho^2}\;\beta_o, \qquad \rho = \mathrm{sgn}(\beta_u)\sqrt{\frac{\beta_u^2}{1 + \beta_u^2}} \tag{60} $$
and sgn(q) = 1 (-1) if q is positive (negative). It can also be shown that γ = [γp γo] (along with
ρ) can be consistently estimated via conventional bivariate probit analysis using a “packaged”
estimation procedure like “biprobit” in Stata 11®.22 The appropriate version of the average
treatment effect estimator in (42) then is
$$ \widehat{PE}^{ATE} = \frac{1}{n}\sum_{i=1}^{n}\big[\Phi(\hat{\gamma}_p + x_{oi}\hat{\gamma}_o) - \Phi(x_{oi}\hat{\gamma}_o)\big] \tag{61} $$
where $\hat{\gamma} = [\hat{\gamma}_p \;\; \hat{\gamma}_o]$ denotes the bivariate probit estimate of γ. Asymptotic inference for (61) can
be drawn from the following version of (38)
21 Details are given in an appendix that will be supplied upon request.
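$$ \frac{\sqrt{n}\,(\widehat{PE}^{ATE} - PE^{ATE})}{\sqrt{\widehat{a\,var}(\widehat{PE}^{ATE})}} \xrightarrow{d} N(0,1) \tag{62} $$
where
$$
\widehat{a\,var}(\widehat{PE}^{ATE}) = \left(\frac{\sum_{i=1}^{n}\nabla_{\gamma}pe_{ATEi}}{n}\right) n\,\widehat{ACOV}(\hat{\gamma}) \left(\frac{\sum_{i=1}^{n}\nabla_{\gamma}pe_{ATEi}}{n}\right)' + \frac{\sum_{i=1}^{n}(pe_{ATEi} - \widehat{PE}^{ATE})^2}{n}
$$
$$ pe_{ATEi} = \Phi(\hat{\gamma}_p + x_{oi}\hat{\gamma}_o) - \Phi(x_{oi}\hat{\gamma}_o) $$
$\widehat{ACOV}(\hat{\gamma})$ denotes the estimated asymptotic covariance matrix of $\hat{\gamma}$
$$ \nabla_{\gamma}pe_{ATEi} = [\nabla_{\gamma_p}pe_{ATEi} \;\;\; \nabla_{\gamma_o}pe_{ATEi}] $$
and the details of $\nabla_{\gamma_p}pe_{ATEi}$ and $\nabla_{\gamma_o}pe_{ATEi}$ are given in an appendix that will be supplied upon
request.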
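To make the inference step concrete, the following Python sketch computes (61) together with a standard error based on (62). The gradient expressions used here are obtained by applying the chain rule to the definition of the per-observation effect and are this example's own working, not a reproduction of the unpublished appendix. The function name `ate_with_se`, the ordering of the covariance matrix (policy coefficient first, then the xo coefficients), and all numbers are assumptions.

```python
import math

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def ate_with_se(x_o, gamma_p, gamma_o, acov_gamma):
    """ATE estimate as in (61) and its standard error implied by (62).

    x_o        : list of covariate rows (include 1.0 for an intercept)
    gamma_p    : coefficient on the binary policy variable
    gamma_o    : coefficients on x_o
    acov_gamma : covariance matrix of [gamma_p, gamma_o...] as reported by
                 software (hypothetical input and ordering)
    """
    n = len(x_o)
    pe, grads = [], []
    for row in x_o:
        idx0 = sum(g * v for g, v in zip(gamma_o, row))
        idx1 = gamma_p + idx0
        pe.append(Phi(idx1) - Phi(idx0))
        # Chain rule: d pe / d gamma_p = phi(idx1);
        #             d pe / d gamma_o = [phi(idx1) - phi(idx0)] * x_o.
        grads.append([phi(idx1)] + [(phi(idx1) - phi(idx0)) * v for v in row])
    pe_hat = sum(pe) / n
    k = len(gamma_o) + 1
    g_bar = [sum(g[j] for g in grads) / n for j in range(k)]
    avar = n * sum(
        g_bar[a] * acov_gamma[a][b] * g_bar[b] for a in range(k) for b in range(k)
    )
    avar += sum((p - pe_hat) ** 2 for p in pe) / n
    # (62) normalizes by sqrt(n), so the standard error of pe_hat is sqrt(avar / n).
    return pe_hat, math.sqrt(avar / n)

# Made-up data and estimates, purely to exercise the function.
x_o = [[1.0, 0.2], [1.0, -1.0], [1.0, 0.7], [1.0, 1.5]]
acov = [[0.04, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
ate, se = ate_with_se(x_o, -0.8, [0.3, 0.5], acov)
t_stat = ate / se
```

The ratio `ate / se` is the same quantity as the statistic in (62) evaluated at a null value of zero.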
We used the same analysis sample as DeSimone (2002), which was taken from the NLSY
(Center for Human Resource Research 1995). A cohort of individuals aged 14–22 was
interviewed in 1979, and reinterviewed annually until 1994. DeSimone uses the data from a
22 Details are given in an appendix that will be supplied upon request.
supplemental drug use questionnaire that was administered in 1984 and 1988. There is also
background information on these respondents from the 1979 and 1980 panels. The sample
includes only males. The author did not exclude the respondents with missing household
income; rather he included a dummy variable, an indicator for missing nonwage income. The
definitions of all variables included in the model can be found in Table 1. The bivariate probit
results for γ and α are shown in columns 1 and 4 of Table 2, respectively. Combining these
results with (61) we estimated the average treatment effect of SA to be -.15. This estimate is not
significantly different from zero (p-value = .216) which is in contrast to the result obtained by
DeSimone (2002). His marginal effect estimate (undefined) is -.226 and it is significant at the
5% level (p-value = .015). The null hypothesis that xp is exogenous can be tested based on the
bivariate probit estimate of the correlation coefficient ρ. The policy variable will be exogenous
if and only if βu = 0 and, as can be seen in equation (60), this will be true if and only if ρ = 0. As
shown in the last line of Table 2, the estimated value of ρ is .43 with a p-value of .084, reflecting
only marginal significance. We also applied the conventional linear IV method which yielded an
average treatment effect estimate of -.268 (p-value <.01). Note the difference between this result
and that obtained via bivariate probit.
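The mapping in (60) can be inverted to see what the reported bivariate probit estimates imply for the structural parameters: since 1 − ρ² = 1/(1 + βu²), it follows that βu = ρ/sqrt(1 − ρ²) and βp = γp/sqrt(1 − ρ²). The short Python sketch below applies this to the Table 2 values for the cocaine coefficient (−0.795) and ρ (0.431). The function names are hypothetical, and the resulting figures are simple arithmetic on the reported estimates rather than results reported in the paper.

```python
import math

def structural_from_biprobit(gamma_p, rho):
    """Invert (60): recover beta_p and beta_u from gamma_p and rho."""
    scale = math.sqrt(1.0 - rho ** 2)   # equals 1 / sqrt(1 + beta_u^2)
    return gamma_p / scale, rho / scale

def rho_from_beta_u(beta_u):
    """Forward mapping in (60): rho = sgn(beta_u) * sqrt(beta_u^2 / (1 + beta_u^2))."""
    return math.copysign(math.sqrt(beta_u ** 2 / (1.0 + beta_u ** 2)), beta_u)

beta_p, beta_u = structural_from_biprobit(-0.795, 0.431)
```

With these inputs the implied values are roughly βp ≈ −0.88 and βu ≈ 0.48. The inversion also makes the exogeneity test transparent: ρ = 0 and βu = 0 imply each other, and in that case γp and βp coincide.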
For the purpose of comparison, we estimated the employment equation with simple
probit analysis ignoring the potential endogeneity of the drug use variable. In this case, we
estimated the model as defined in (55) and (57) after setting βu = 0. Here (58) becomes
$$ PE^{ATE} = E\big[\Phi(\beta_p + x_o\beta_o) - \Phi(x_o\beta_o)\big] \tag{63} $$
which can be consistently estimated using the following version of (61)
$$ \widehat{PE}^{ATE} = \frac{1}{n}\sum_{i=1}^{n}\big[\Phi(\hat{\beta}_p + x_{oi}\hat{\beta}_o) - \Phi(x_{oi}\hat{\beta}_o)\big] \tag{64} $$
where $\hat{\beta} = [\hat{\beta}_p \;\; \hat{\beta}_o]$ denotes the simple (conventional) probit estimate of β = [βp βo]. The
correct asymptotic t-stat for (64) is obtained using the appropriate versions of (37) and (38).
The simple probit estimates, along with their attendant t-stats and p-values, are shown in columns
7 through 9 of Table 2. Plugging these results into (64) we estimated the average treatment
effect of illegal drug use to be .005. This result is both counterintuitive and insignificant (p-
value = .353). All of the estimation was conducted using Stata/Mata® 11 code which will be
supplied upon request.
This example demonstrates the importance of clear and rigorous upfront definition of the
estimation objective. We were very clear and careful on this point in our re-analysis of the
DeSimone study. We cast the analysis in a PO framework and rigorously defined the
counterfactual effect of interest to be the average treatment effect as specified in (3). We then
applied the appropriate version of the corresponding consistent estimation protocol as developed
and discussed above. The author neither specifies his estimation objective nor offers any detail
regarding the marginal effect estimator he implements (results given in his Table 4).23 Perhaps
most importantly, there is no discussion of the relevant underlying asymptotic theory. In
particular, the formulation of the asymptotic standard errors or asymptotic t-stats of his marginal
effect estimators is not mentioned. Our estimated average treatment effect is 33% smaller than
his marginal effect estimate and in contrast to the marginal effect estimate reported by the author,
23 The author reports “… changes in probability of employment induced by a change in the variable from zero to one while holding all other variables constant…” but neither discusses exactly how these “marginal effects” are computed nor rigorously defines the population-level effects that they are intended to estimate.
it is statistically insignificant. It is difficult to comment on possible reasons for the divergence of
our result from the author’s because there is very little detail reported in the paper about his
estimation objective and the marginal effect estimator he implemented.
6.2 The Effect of Smoking During Pregnancy on Birthweight
Here we revisit the very clever work of Mullahy (1997) in which he develops an easy to
apply GMM method for the exponential regression case (mentioned above in section 5.2).
Because his paper is mainly methodological in thrust, no attention is paid to the specification and
estimation of counterfactual effects like those defined in (3), (4) and (5). As one of his
illustrative applications of the method, he regresses infant birthweight (y) on smoking during
pregnancy (xp), along with binary control variables indicating race and sex. His GMM method
is used to account for the potential endogeneity of the smoking variable. In the present example,
we implement Mullahy’s birthweight-smoking GMM regression results in estimating the
potential effect of a counterfactual experiment in which all who smoke during pregnancy have
their cigarette consumption ($x_p^*$) exogenously reduced to zero. The corresponding estimation
objective is the following variant of (51)
$$ PE^{INC} = E\big[\exp(x_o\beta_o) - \exp(x_p\beta_p + x_o\beta_o)\big] \tag{65} $$
in which $x_p^*$ is fixed at xp (the default policy scenario), and the expectation is taken over the
subpopulation who smoked during pregnancy (xp > 0). This is the potential expected increase
in birthweight that would have prevailed if the women who smoked had quit before
becoming pregnant. The following version of (49) can be used to consistently estimate (65)
$$ \widehat{PE}^{INC} = \frac{1}{n_s}\sum_{i=1}^{n_s}\big\{\exp(x_{oi}\hat{\beta}_o) - \exp(x_{pi}\hat{\beta}_p + x_{oi}\hat{\beta}_o)\big\} \tag{66} $$
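where ns denotes the number of mothers who smoked during pregnancy and $\hat{\beta} = [\hat{\beta}_p \;\; \hat{\beta}_o]'$ is the
GMM estimate of β = [βp βo]. Asymptotic inference for (66) can be drawn from the
following version of (38)
$$ \frac{\sqrt{n_s}\,(\widehat{PE}^{INC} - PE^{INC})}{\sqrt{\widehat{a\,var}(\widehat{PE}^{INC})}} \xrightarrow{d} N(0,1) \tag{67} $$
where
$$
\widehat{a\,var}(\widehat{PE}^{INC}) = \left(\frac{\sum_{i=1}^{n}\nabla_{\beta}pe_{INCi}}{n}\right) n\,\widehat{ACOV}(\hat{\beta}) \left(\frac{\sum_{i=1}^{n}\nabla_{\beta}pe_{INCi}}{n}\right)' + \frac{\sum_{i=1}^{n}(pe_{INCi} - \widehat{PE}^{INC})^2}{n}
$$
$$ pe_{INCi} = \exp(x_{oi}\hat{\beta}_o) - \exp(x_{pi}\hat{\beta}_p + x_{oi}\hat{\beta}_o) $$
$\widehat{ACOV}(\hat{\beta})$ denotes the estimated asymptotic covariance matrix of $\hat{\beta}$
$$ \nabla_{\beta}pe_{INCi} = [\nabla_{\beta_p}pe_{INCi} \;\;\; \nabla_{\beta_o}pe_{INCi}] $$
and the details of $\nabla_{\beta_p}pe_{INCi}$ and $\nabla_{\beta_o}pe_{INCi}$ are given in an appendix that will be supplied upon
request.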
We used the same data as Mullahy (1997) which was taken from the Child Health
Supplement to the 1988 National Health Interview Survey. His analysis sample has 1,388
observations. The definitions of the variables included in the regressions are given in Table 3.
Mullahy’s GMM results are replicated in columns 1 through 3 of Table 4. Combining these
results with (66), we estimated the average incremental effect of quitting before pregnancy [the
counterfactual specified in (65)] to be .935. This estimate, which is significantly different from
zero (p-value = .001), says that if women who smoked had quit before becoming pregnant it
would have resulted in an average increase in birthweight of nearly 15 oz. for their babies. For
the purpose of comparison we also estimated the model, ignoring the possible endogeneity of
smoking, by applying simple NLS exponential regression. The NLS results are displayed in
columns 4 through 6 of Table 4. In principle, the estimation objective is still (65) and the
corresponding incremental effect estimator is
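$$ \widetilde{PE}^{INC} = \frac{1}{n_s}\sum_{i=1}^{n_s}\big\{\exp(x_{oi}\tilde{\beta}_o) - \exp(x_{pi}\tilde{\beta}_p + x_{oi}\tilde{\beta}_o)\big\} \tag{68} $$
where $\tilde{\beta} = [\tilde{\beta}_p \;\; \tilde{\beta}_o]'$ is the NLS estimate of β. The correct asymptotic t-stat for (68) is obtained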
using the appropriate versions of (37) and (38). Plugging the NLS estimate into (68) we
estimated the incremental effect of smoking during pregnancy to be .465. This value is also
statistically significant (p-value < .001) but is less than half the size of the estimated effect when
endogeneity is taken into account. All of the estimation was conducted using Stata/Mata® 11
code which will be supplied upon request.
6.3 Impact of Prescription Drug Use on Hospital Costs
Erten and Terza (2011) examine the effect of appropriate prescription drug (Rx) use on
subsequent hospital costs using the data analyzed by Stuart et al. (2009). Following Stuart et al.
(2009), they posit the following two-part model of hospital expenditure which accommodates the
fact that this outcome is typically observed with high frequency at zero. In this case, as in
similar empirical contexts, the two-part specification allows the process governing observation at
zero (e.g. whether or not the individual is hospitalized) to systematically differ from that which
determines non-zero observations (e.g. in-patient hospital expenditure for those who are
hospitalized). The former can be described as the hurdle component of the model, and the latter
is often called the levels part of the model. The hurdle ($h_{x_p^*}$) and levels ($y_{x_p^*}$) components of
potential hospital expenditure at a counterfactually mandated level of Rx usage ($x_p^*$) are,
respectively, specified as
$$ \text{hurdle:}\quad h_{x_p^*} = I(x_p^*\beta_{p1} + x_o\beta_{o1} + x_u\beta_{u1} + \varepsilon_1 > 0) \tag{69} $$
$$ \text{levels:}\quad y_{x_p^*} = \exp(x_p^*\beta_{p2} + x_o\beta_{o2} + x_u\beta_{u2} + \varepsilon_2) \tag{70} $$
where the random variable $y_{x_p^*}$ is defined only if $h_{x_p^*} = 1$; the βs are parameters to be estimated; ε1
and ε2 are the random error terms; (ε1 | x) has a known distribution and
$E[\exp(\varepsilon_2) \mid x_p^*, x_o, x_u] = 1$. The model includes latent confounders (xu) to allow for the
potential endogeneity of Rx utilization. For instance xu may include unobserved (by the
researcher) health status. If less healthy individuals tend to be admitted to the hospital and
remain there longer, and if Rx use is negatively correlated with good health then the (expected)
negative effect of Rx use on hospital expenditure will be underestimated (or even positive)
according to conventional methods that do not take account of endogeneity. The relationship
between xp and xu is formalized by assuming that the following version of (44) holds
$$ x_p = \exp(w\alpha) + x_u \tag{71} $$
where w is defined as in (39), and E[xu | w] = 0. Given that the vector of confounders
(controls), x = [xo xu], is inherently comprehensive, (69) and (70) yield the following version
of (40)
$$
\begin{aligned}
E[y_{x_p} \mid x_o, x_u] &= E[y \mid x_p, x_o, x_u] = E[y \mid x_p, w, x_u] \\
&= \Phi(x_p\beta_{p1} + x_o\beta_{o1} + x_u\beta_{u1})\exp(x_p\beta_{p2} + x_o\beta_{o2} + x_u\beta_{u2}).
\end{aligned} \tag{72}
$$
Using (72), which summarizes (15) and (16) [(23) and (24)] in the present context, we can write
(5) as the following version of (46)
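$$
PE^{MARG} = E_{w,\,x_p^*}\left[\frac{\partial\,\Phi(x_p^*\beta_{p1} + x_o\beta_{o1} + [x_p - \exp(w\alpha)]\beta_{u1})\exp(x_p^*\beta_{p2} + x_o\beta_{o2} + [x_p - \exp(w\alpha)]\beta_{u2})}{\partial x_p^*}\right]. \tag{73}
$$
The appropriate version of the marginal effect estimator in (50) then is
$$
\begin{aligned}
\widehat{PE}^{MARG} &= \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{\partial\,\Phi(x_{pi}\hat{\beta}_{p1} + x_{oi}\hat{\beta}_{o1} + \hat{x}_{ui}\hat{\beta}_{u1})\exp(x_{pi}\hat{\beta}_{p2} + x_{oi}\hat{\beta}_{o2} + \hat{x}_{ui}\hat{\beta}_{u2})}{\partial x_p}\right\} \\
&= \frac{1}{n}\sum_{i=1}^{n}\Big\{\varphi(x_{pi}\hat{\beta}_{p1} + x_{oi}\hat{\beta}_{o1} + \hat{x}_{ui}\hat{\beta}_{u1})\exp(x_{pi}\hat{\beta}_{p2} + x_{oi}\hat{\beta}_{o2} + \hat{x}_{ui}\hat{\beta}_{u2})\,\hat{\beta}_{p1} \\
&\qquad\qquad + \Phi(x_{pi}\hat{\beta}_{p1} + x_{oi}\hat{\beta}_{o1} + \hat{x}_{ui}\hat{\beta}_{u1})\exp(x_{pi}\hat{\beta}_{p2} + x_{oi}\hat{\beta}_{o2} + \hat{x}_{ui}\hat{\beta}_{u2})\,\hat{\beta}_{p2}\Big\}
\end{aligned} \tag{74}
$$
where $\hat{x}_{ui} = x_{pi} - \exp(w_i\hat{\alpha})$ is the first-stage residual implied by (71); and $\hat{\beta}_1 = [\hat{\beta}_{p1} \;\; \hat{\beta}_{o1}' \;\; \hat{\beta}_{u1}]'$, $\hat{\beta}_2 = [\hat{\beta}_{p2} \;\; \hat{\beta}_{o2}' \;\; \hat{\beta}_{u2}]'$ and $\hat{\alpha}$ are the 2SRI
estimates of $\beta_1 = [\beta_{p1} \;\; \beta_{o1}' \;\; \beta_{u1}]'$, $\beta_2 = [\beta_{p2} \;\; \beta_{o2}' \;\; \beta_{u2}]'$ and α obtained in the following three
stages: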
First Stage
Obtain $\hat{\alpha}$ by applying NLS to (71) and compute $\hat{x}_{ui}$.
Second Stage
Apply conventional probit analysis to the hurdle component of the model with $\hat{x}_{ui}$ substituted
for $x_u$ to get $\hat{\beta}_1$.
Third Stage
Apply NLS to the levels component of the model with $\hat{x}_{ui}$ substituted for $x_u$ to get $\hat{\beta}_2$.
Asymptotic inference for (74) can be drawn from the following version of (38)
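$$ \frac{\sqrt{n}\,(\widehat{PE}^{MARG} - PE^{MARG})}{\sqrt{\widehat{a\,var}(\widehat{PE}^{MARG})}} \xrightarrow{d} N(0,1) \tag{75} $$
where
$$
\widehat{a\,var}(\widehat{PE}^{MARG}) = \left(\frac{\sum_{i=1}^{n}\nabla_{\eta}pe_{MARGi}}{n}\right) n\,\widehat{ACOV}(\hat{\eta}) \left(\frac{\sum_{i=1}^{n}\nabla_{\eta}pe_{MARGi}}{n}\right)' + \frac{\sum_{i=1}^{n}(pe_{MARGi} - \widehat{PE}^{MARG})^2}{n}
$$
$$
\begin{aligned}
pe_{MARGi} = {} & \varphi(x_{pi}\hat{\beta}_{p1} + x_{oi}\hat{\beta}_{o1} + \hat{x}_{ui}\hat{\beta}_{u1})\exp(x_{pi}\hat{\beta}_{p2} + x_{oi}\hat{\beta}_{o2} + \hat{x}_{ui}\hat{\beta}_{u2})\,\hat{\beta}_{p1} \\
& + \Phi(x_{pi}\hat{\beta}_{p1} + x_{oi}\hat{\beta}_{o1} + \hat{x}_{ui}\hat{\beta}_{u1})\exp(x_{pi}\hat{\beta}_{p2} + x_{oi}\hat{\beta}_{o2} + \hat{x}_{ui}\hat{\beta}_{u2})\,\hat{\beta}_{p2}
\end{aligned}
$$
$\widehat{ACOV}(\hat{\eta})$ denotes the estimated asymptotic covariance matrix of $\hat{\eta} = [\hat{\beta}_1 \;\; \hat{\beta}_2 \;\; \hat{\alpha}]$
$$ \nabla_{\eta}pe_{MARGi} = [\nabla_{\beta_1}pe_{MARGi} \;\;\; \nabla_{\beta_2}pe_{MARGi} \;\;\; \nabla_{\alpha}pe_{MARGi}] $$
and the details of $\nabla_{\beta_1}pe_{MARGi}$, $\nabla_{\beta_2}pe_{MARGi}$ and $\nabla_{\alpha}pe_{MARGi}$ are given in an appendix that will be
supplied upon request.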
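The second line of (74) is the product rule applied to the two-part mean function, and it can be evaluated directly once the three-stage estimates and residuals are available. The Python sketch below is a hedged illustration that treats those estimates as given; the function names, the scalar xo, and all numbers are assumptions for the example. A finite-difference version is included only as a check on the analytic expression.

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def two_part_mean(x_p, x_o, x_u, b1, b2):
    """Two-part mean function: Phi(hurdle index) * exp(levels index)."""
    i1 = x_p * b1[0] + x_o * b1[1] + x_u * b1[2]
    i2 = x_p * b2[0] + x_o * b2[1] + x_u * b2[2]
    return Phi(i1) * math.exp(i2)

def marg_two_part(x_p, x_o, x_u_hat, b1, b2):
    """Analytic marginal effect estimator, second line of (74).

    b1 = (b_p1, b_o1, b_u1) and b2 = (b_p2, b_o2, b_u2) are treated as the
    probit and NLS estimates from the second and third stages.
    """
    total = 0.0
    for xp, xo, xu in zip(x_p, x_o, x_u_hat):
        i1 = xp * b1[0] + xo * b1[1] + xu * b1[2]
        i2 = xp * b2[0] + xo * b2[1] + xu * b2[2]
        total += phi(i1) * math.exp(i2) * b1[0] + Phi(i1) * math.exp(i2) * b2[0]
    return total / len(x_p)

def marg_two_part_numeric(x_p, x_o, x_u_hat, b1, b2, h=1e-5):
    """Central-difference check, holding the residual fixed."""
    terms = [
        (two_part_mean(xp + h, xo, xu, b1, b2) - two_part_mean(xp - h, xo, xu, b1, b2)) / (2.0 * h)
        for xp, xo, xu in zip(x_p, x_o, x_u_hat)
    ]
    return sum(terms) / len(terms)

# Made-up data and estimates.
x_p = [1.0, 3.0, 0.5, 2.0]
x_o = [0.0, 1.0, 1.0, 0.0]
x_u_hat = [0.2, -0.4, 0.1, 0.3]
b1_hat = (-0.10, 0.30, 0.20)
b2_hat = (-0.05, 0.40, 0.10)

marg_analytic = marg_two_part(x_p, x_o, x_u_hat, b1_hat, b2_hat)
marg_numeric = marg_two_part_numeric(x_p, x_o, x_u_hat, b1_hat, b2_hat)
```

The first summand captures how the policy variable shifts the probability of clearing the hurdle, and the second captures how it shifts the expected level conditional on clearing it.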
Erten and Terza (2011) use the same data as Stuart et al. (2009) which were taken from
the 1999 and 2000 Medicare Current Beneficiary Survey (MCBS). It includes information on
health status, health care use and expenditures, health insurance coverage, and socioeconomic
and demographic characteristics of a nationally representative sample of Medicare beneficiaries.
Stuart et al. (2009) created a subsample of Medicare beneficiaries who were enrolled for 24
months with continuous Medicare Part A and Part B coverage. Moreover, to be included in the
analysis sample an individual was required to have had continuous drug coverage during the
study period of 24 months or no coverage at all. The sample size is 3,101 with 20 percent of the
observations having positive hospital costs. Stuart et al. (2009) used insurance coverage for Rx
as the identifying IV. To allay possible concerns about the validity of this IV, in identifying the
effect of Rx usage, Erten and Terza (2011) instead used determinants of Rx coverage that only
affect hospital expenditure through their effect on Rx usage (via their effect on Rx coverage).
The four IVs that they used are: (1) The percent of the work force in the respondent’s state that is
unionized -- a unionized work force has a higher probability of having coverage and indirectly
more prescription drug usage; (2) The average premium for Medigap Plan H, I, and J in the state
-- coverage for prescription drugs can be higher in those states with lower average premiums;
(3) A variable indicating if the state has a pharmaceutical assistance plan for low-income
elders and/or disabled Medicare beneficiaries; and (4) The state per capita income -- wealthy states
can be associated with more generous Medicare supplemental policies, including drug coverage.
The variables included in the analysis are defined in Table 5. Results for the 2SRI estimates are
shown in Tables 6 and 7. Plugging the 2SRI estimates into (74) we estimated the marginal effect
of Rx usage to be -$140.48. This value is statistically significant (p-value < .001). By
comparison, estimating the model using the conventional two-part approach (no correction for
endogeneity) produces a marginal effect estimate of $15.69 (p-value <.001). Note that the linear
IV method yields -$82.27 as the marginal effect estimate (p-value = .527). All of the estimation
was conducted using Stata/Mata® 11 code which will be supplied upon request.
7. Discussion
This paper offers a generic and unified framework for empirical policy analysis via NR
estimation. Although the use of NR results by empirical health policy researchers abounds, how
such results are related to PO-based policy relevant objectives is seldom discussed. Equally rare
in empirical studies is policy relevant interpretation of NR results. In this paper we draw the
currently absent, but much needed, connection between the PO approach and NR analysis. From
this discussion follow: 1) well-defined PO-based policy relevant estimation objectives; 2)
consistent estimators for these objectives designed to accommodate potential endogeneity; 3)
correct accompanying asymptotic inference; and 4) ease of policy relevant interpretation of
estimation results. To complete the discussion we offer real-data examples of the concepts and
methods, along with supporting Stata/Mata® 11 software.
References
Angrist, J.D. and Pischke, J.-S. (2009): Mostly Harmless Econometrics, Princeton, NJ: Princeton University Press.
Basu, A. and Rathouz, P.J. (2005): “Estimating Marginal and Incremental Effects on Health Outcomes Using Flexible Link and Variance Function Models,” Biostatistics, 6, 93-109.
Basu, A., Heckman, J.J., Navarro-Lozano, S. and Urzua, S. (2007): “The Use of Instrumental Variables in the Presence of Heterogeneity and Self-Selection: An Application to Treatments of Breast Cancer,” Health Economics, 16, 1133-1157.
Bierens, H.J. (2004): Introduction to the Mathematical and Statistical Foundations of Econometrics, New York: Cambridge University Press.
Center for Human Resource Research (1995): NLS Users’ Guide, Columbus: Ohio State University.
DeSimone, J. (2002): “Illegal Drug Use and Employment,” Journal of Labor Economics, 20, 952-977.
Erten, M.Z. and Terza, J.V. (2011): “Skewed Outcomes and Endogenous Regressors: Prescription Drug Utilization and Hospital Cost Offsets,” Unpublished Manuscript, Peter Lamy Center on Drug Therapy and Aging, Department of Pharmaceutical Health Services Research, University of Maryland.
Mullahy, J. (1997): “Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior,” Review of Economics and Statistics, 79, 586-593.
Murphy, K.M. and Topel, R.H. (1985): “Estimation and Inference in Two-Step Econometric Models,” Journal of Business and Economic Statistics, 3, 370-379.
Newey, W.K. and McFadden, D. (1994): “Large Sample Estimation and Hypothesis Testing,” Handbook of Econometrics, Engle, R.F. and McFadden, D.L. (eds.), Amsterdam: Elsevier Science B.V., Chapter 36, 2111-2245.
Rubin, D.B. (1974): “Estimating Causal Effects of Treatments in Randomized and Non-randomized Studies,” Journal of Educational Psychology, 66, 688-701.
Rubin, D.B. (1977): “Assignment to a Treatment Group on the Basis of a Covariate,” Journal of Educational Statistics, 2, 1-26.
StataCorp (2009): Stata: Release 11 Statistical Software, College Station, TX: StataCorp LP.
Stuart, B.C., Doshi, J. and Terza, J.V. (2009): “Assessing the Impact of Drug Use on Hospital Costs,” Health Services Research, 44, 128-144.
Terza, J. (1994): “An Estimator for Nonlinear Regression Models with Endogenous Switching and Sample Selection,” Working paper, Department of Economics, Penn State University.
______ (1999): “Estimating Endogenous Treatment Effects in Retrospective Data Analysis,” Value in Health, 2, 429-434.
______ (2006): “Estimation of Policy Effects Using Parametric Nonlinear Models: A Contextual Critique of the Generalized Method of Moments,” Health Services and Outcomes Research Methodology, 6, 177-198.
______ (2009): “Parametric Nonlinear Regression with Endogenous Switching,” Econometric Reviews, 28, 555-580.
______, Basu, A. and Rathouz, P. (2008): “Two-Stage Residual Inclusion Estimation: Addressing Endogeneity in Health Econometric Modeling,” Journal of Health Economics, 27, 531-543.
White, H. (1994): Estimation, Inference and Specification Analysis, New York: Cambridge University Press.
Wooldridge, J.M. (2010): Econometric Analysis of Cross Section and Panel Data, 2nd Ed., Cambridge, MA: MIT Press.
Table 1: Illegal Drug Use and Employment: Variable Definitions

Variable     Type         Definition

Outcome Variable (y)
work84       binary       (dependent variable) = 1 if hours worked past year 1984 is positive, 0 otherwise

Potentially Endogenous Policy Variable (xp)
cyr84        binary       (treatment variable) = 1 if used cocaine past year 1984, 0 otherwise

Observable Confounders (xo)
educ84       count        educational attainment May 1984 (years of education)
black        binary       = 1 if black, 0 otherwise
hisp         binary       = 1 if Hispanic, 0 otherwise
urate84      continuous   local unemployment rate 1984
metro84      binary       = 1 if Standard Metropolitan Statistical Area residence 1984, 0 otherwise
city84       binary       = 1 if central city residence 1984, 0 otherwise
neincr84 (a) continuous   fmincr84-eincr84 (faminc84 in 1982-84 $ - einc84 in 1982-84 $)
noinc84      binary       = 1 if “neincr84” is unknown or negative, 0 otherwise
age84        count        age at interview 1984
afqt2        continuous   revised afqt percentile 1980 (afqt: the Armed Forces Qualification Test (AFQT) percentile score from the 1980 survey)
mothed       count        mother education (years of education)
mothwork     binary       = 1 if mother worked in 1978, 0 otherwise
fathwork     binary       = 1 if father worked in 1978, 0 otherwise
intpar84     binary       = 1 if parent present at the interview 84, 0 otherwise
intfr84      binary       = 1 if friend present at the interview 84, 0 otherwise
inthh84      binary       = 1 if other hh member present at interview 84, 0 otherwise

Instrumental Variables (w+)
both14       binary       = 1 if father-mother present age 14, 0 otherwise
alcpar       binary       = 1 if has an alcoholic parent, 0 otherwise
decrim84     binary       = 1 if marijuana decriminalized 1984, 0 otherwise
pcyrr84 (a)  continuous   pcyr84 in 1982-84 $ (pcyr84: cocaine price past year 1984)

(a) Cocaine price and nonwage income are adjusted to 1982–84 real values using the Consumer Price Index for all urban consumers.
Table 2: Simple and Bivariate Probit Results

(A) Bivariate Probit Estimates of Employment Equation (Corrected for Endogeneity)
(B) Bivariate Probit Estimates of Illegal Drug Use Equation
(C) Simple Probit Estimates of Employment Equation (Not Corrected for Endogeneity)

            (A)                        (B)                        (C)
Variable    Estimate  z-stat   p-val   Estimate  z-stat   p-val   Estimate  z-stat   p-val
cyr84         -0.795   -1.67   0.095         --      --      --      0.042    0.35   0.729
educ84         0.069    2.84   0.004     -0.073   -3.57   <.001      0.080    3.35   0.001
black         -0.379   -4.06   <.001     -0.153   -1.69   0.091     -0.362   -3.81   <.001
hisp           0.104    0.83   0.407     -0.102   -1.02   0.307      0.097    0.76   0.447
urate84       -0.061   -5.39   <.001     -0.033   -3.23   0.001     -0.060   -5.16   <.001
metro84       -0.032   -0.36   0.717      0.248    3.00   0.003     -0.067   -0.75   0.454
city84        -0.089   -0.86   0.389      0.233    2.74   0.006     -0.123   -1.18   0.239
neincr84a     -0.093   -2.98   0.003      0.016    0.67   0.501     -0.093   -2.94   0.003
noinc84       -0.258   -2.72   0.007      0.065    0.75   0.451     -0.268   -2.78   0.005
age84          0.005    0.29   0.774     -0.007   -0.48   0.635      0.007    0.38   0.703
afqt2          0.097    4.73   <.001      0.083    5.30   <.001      0.089    4.26   <.001
mothed         0.002    0.12   0.905      0.038    2.98   0.003     -0.004   -0.27   0.790
mothwork       0.180    2.34   0.019      0.157    2.38   0.017      0.167    2.14   0.032
fathwork       0.259    3.01   0.003     -0.127   -1.56   0.119      0.283    3.26   0.001
intpar84      -0.038   -0.24   0.811     -0.150   -0.95   0.343     -0.025   -0.16   0.875
intfr84       -0.266   -1.20   0.229      0.333    1.81   0.070     -0.331   -1.47   0.142
inthh84        0.085    0.68   0.497     -0.207   -1.99   0.047      0.116    0.93   0.354
intercept      0.819    1.72   0.085      0.828    1.63   0.103      0.666    1.38   0.166
both14            --      --      --     -0.078   -1.05   0.292
alcpar            --      --      --      0.334    4.58   <.001
decrim84          --      --      --      0.167    2.54   0.011
pcyrr84a          --      --      --     -0.786   -5.95   <.001
ρ              0.431    1.73   0.084
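
The p-val columns in Table 2 can be related to the z-stat columns with a short calculation. The sketch below assumes the reported p-values are two-sided and based on the standard normal distribution, which the table does not state explicitly; the helper name `two_sided_p` is hypothetical. Because the published z-statistics are rounded to two decimals, recomputed p-values may differ from the printed ones in the third decimal place.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-statistic, assuming Z ~ N(0, 1)."""
    # P(|Z| > |z|) equals erfc(|z| / sqrt(2)) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


# z-statistics copied from Table 2 (bivariate probit employment equation).
p_cyr84 = two_sided_p(-1.67)  # cocaine-use coefficient
p_rho = two_sided_p(1.73)     # correlation parameter rho

print(f"cyr84: p = {p_cyr84:.3f}")
print(f"rho:   p = {p_rho:.3f}")
```

Under these assumptions the recomputed values should land close to the 0.095 and 0.084 printed in the table.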
Table 3: Birthweight Model: Variable Definitions

Variable  Type                             Definition

Outcome Variable (y)
BIRTHWT   continuous (dependent variable)  = Birthweight (lbs.)

Potentially Endogenous Policy Variable (xp)
CIGSPREG  count (treatment variable)       = cigarettes smoked per day during pregnancy

Observable Confounders (xo)
PARITY    count                            = birth order
WHITE     binary                           = 1 if white, 0 otherwise
MALE      binary                           = 1 if male, 0 otherwise

Instrumental Variables (w+)
EDFATHER  continuous                       = years of education of father
EDMOTHER  continuous                       = years of education of mother
FAMINCOM  continuous                       = family income (/1000)
CIGTAX88  continuous                       = per pack state excise tax on cigarettes
Table 4: GMM and NLS Results

(A) GMM Estimates of Birthweight Equation (Corrected for Endogeneity)
(B) NLS Estimates of Birthweight Equation (Not Corrected for Endogeneity)

            (A)                        (B)
Variable    Estimate  z-stat   p-val   Estimate  z-stat   p-val
CIGSPREG      -0.005   -5.66   <.001     -0.010   -3.46   0.001
PARITY         0.014    2.94   0.003      0.018    3.33   0.001
WHITE          0.056    4.97   <.001      0.054    4.44   <.001
MALE           0.026    2.89   0.004      0.027    2.95   0.003
intercept      1.932  134.68   <.001      1.939  121.71   <.001
Table 5: Hospital Cost-Offsets Model: Variable Definitions

Variable       Type                          Definition

Outcome Variables (hurdle: h; levels: y)
anyuse         binary (hurdle variable)      = 1 if hospitalized (positive hospital expenditures), 0 otherwise
clm_inp_yr2    continuous (levels variable)  Hospital inpatient expenditures

Potentially Endogenous Policy Variable (xp)
num_rx_yr2     count                         Number of prescription fills

Observable Confounders (xo)
white_yr2      binary                        = 1 if race is white, 0 otherwise
disabled_yr2   binary                        Medicare entitlement status: = 1 if SSDI disabled (<65), 0 otherwise
disaged_yr2    binary                        Medicare entitlement status: = 1 if Aged/previously disabled (>65), 0 otherwise
age74_yr2      binary                        = 1 if 70 < age < 74, 0 otherwise
age79_yr2      binary                        = 1 if 75 < age < 79, 0 otherwise
age80plus_yr2  binary                        = 1 if age > 80, 0 otherwise
marry_yr2      binary                        = 1 if married, 0 otherwise
female_yr2     binary                        = 1 if female, 0 otherwise
rural_yr2      binary                        = 1 if residence is rural, 0 otherwise
hsgrad_yr2     binary                        = 1 if high school graduate, 0 otherwise
mid_yr2        binary                        = 1 if resides in Midwest census region, 0 otherwise
south_yr2      binary                        = 1 if resides in southern census region, 0 otherwise
west_yr2       binary                        = 1 if resides in western census region, 0 otherwise
pay3_yr2       binary                        = 1 if annual income $10,001 - $20,000, 0 otherwise
pay4_yr2       binary                        = 1 if annual income $20,001 - $30,000, 0 otherwise
pay5_7_yr2     binary                        = 1 if annual income > $30,000, 0 otherwise
risk_adjuster  continuous                    DCG/HCC risk adjuster

Instrumental Variables (w+)
UNION          continuous                    % State workforce unionized
gap_Prem       continuous                    Mean annual state-level Medigap premium (H,I,J plans)
spap           binary                        = 1 if State has pharmaceutical assistance plans, 0 otherwise
PC_INCOME      continuous                    State per capita income
Table 6: 2SRI Two-Part Model Results – Corrected for Endogeneity

(A) Probit-2SRI Estimates of Hurdle Component (Hospitalized or Not)
(B) NLS-2SRI Estimates of Levels Component (Expenditure if Hospitalized)

               (A)                        (B)
Variable       Estimate  z-stat   p-val   Estimate  z-stat   p-val
num_rx_yr2       -0.024  -2.578   0.010     -0.040  -2.003   0.045
white_yr2         0.212   1.806   0.071      0.368   1.837   0.066
disabled_yr2      0.024   0.147   0.883      0.753   2.003   0.045
disaged_yr2       0.560   3.544   <.001      0.787   2.342   0.019
age74_yr2        -0.145  -1.177   0.239      0.078   0.399   0.690
age79_yr2        -0.057  -0.448   0.654      0.154   0.694   0.487
age80plus_yr2    -0.010  -0.077   0.939     -0.041  -0.195   0.846
marry_yr2        -0.017  -0.209   0.834      0.199   1.614   0.107
female_yr2        0.085   0.839   0.401      0.083   0.505   0.613
rural_yr2        -0.047  -0.618   0.537     -0.011  -0.069   0.945
hsgrad_yr2        0.054   0.643   0.520     -0.471  -3.255   0.001
mid_yr2           0.145   1.331   0.183     -0.463  -2.508   0.012
south_yr2         0.034   0.361   0.718     -0.331  -2.085   0.037
west_yr2         -0.069  -0.601   0.548     -0.052  -0.257   0.797
pay3_yr2         -0.218  -2.108   0.035     -0.261  -1.594   0.111
pay4_yr2         -0.257  -2.101   0.036     -0.490  -2.492   0.013
pay5_7_yr2       -0.320  -2.518   0.012      0.095   0.544   0.587
risk_adjuster     0.541   7.013   <.001      0.517   2.978   0.003
x̂u                0.030   3.270   0.001      0.040   2.028   0.043
intercept        -0.797  -3.478   0.001      9.717  26.257   <.001
Table 7: 2SRI First-Stage NLS Estimates for Rx-Use Equation

Variable         Estimate   z-stat   p-val
white_yr2          0.1604    2.924   0.003
white_yr2          0.2296    2.872   0.004
disabled_yr2       0.2843    4.142   0.000
disaged_yr2       -0.0551   -0.818   0.413
age74_yr2         -0.0799   -1.186   0.236
age79_yr2         -0.0717   -1.039   0.299
age80plus_yr2      0.2309    1.061   0.288
marry_yr2          0.0406    6.078   <.001
female_yr2        -0.0843    0.003   0.997
rural_yr2          0.0002   -2.156   0.031
hsgrad_yr2         0.1031    1.486   0.137
mid_yr2            0.0558    0.656   0.512
south_yr2         -0.1083   -1.219   0.223
west_yr2          -0.0713   -1.346   0.178
pay3_yr2          -0.0923   -1.467   0.142
pay4_yr2          -0.2041   -3.457   0.001
pay5_7_yr2         0.1950   12.635   <.001
risk_adjuster      0.0093    1.974   0.048
gap_Prem          -0.0001   -1.492   0.136
spap              -0.1144   -1.443   0.149
PC_INCOME       4.143e-06    0.621   0.535
intercept         -1.6822   -7.019   <.001
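
Tables 6 and 7 together report the two stages of a 2SRI fit: a first-stage regression for the policy variable, and second-stage equations that include the first-stage residual as an extra regressor. The residual-inclusion mechanic can be sketched on simulated data as follows. This is a deliberately simplified linear analogue, not the Probit and NLS specifications behind the tables. The data-generating values, the helper `ols`, and names such as `TRUE_BETA` are hypothetical and chosen only for illustration. In this linear case, residual inclusion yields the same policy-variable coefficient as two-stage least squares.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


random.seed(0)
n = 20000
TRUE_BETA = -0.5  # hypothetical causal effect of x on y

w = [random.gauss(0, 1) for _ in range(n)]  # instrument
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [1.0 + 0.8 * wi + ui + random.gauss(0, 1) for wi, ui in zip(w, u)]
y = [2.0 + TRUE_BETA * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: regress the policy variable on the instrument, keep residuals.
a0, a1 = ols([[1.0, wi] for wi in w], x)
resid = [xi - (a0 + a1 * wi) for xi, wi in zip(x, w)]

# Stage 2: regress the outcome on the policy variable AND the residual.
b_2sri = ols([[1.0, xi, ri] for xi, ri in zip(x, resid)], y)

# Naive fit that ignores endogeneity, for comparison.
b_naive = ols([[1.0, xi] for xi in x], y)

print(f"naive slope: {b_naive[1]:.3f}")
print(f"2SRI slope:  {b_2sri[1]:.3f}")
print(f"residual coefficient: {b_2sri[2]:.3f}")
```

In this simulation the naive slope is pulled away from `TRUE_BETA` by the confounder, while the residual-inclusion slope should sit near it. A clearly nonzero residual coefficient signals endogeneity, which is the role of the first-stage residual row in Table 6.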