Health Policy Analysis via Nonlinear Regression Methods:

Estimation and Inference from a Potential Outcomes Perspective

by

Joseph V. Terza* Department of Economics

University of North Carolina at Greensboro Greensboro, NC 27402-6165

Phone: (336) 334-4892 E-mail: [email protected]

(April, 2011)

*Professor, Department of Economics, Bryan School of Business and Economics, University of North Carolina at Greensboro, Rm. 444, Bryan Building, Greensboro, NC, 27402-6165, E-mail: [email protected] Phone: (336) 334-4892, Fax: (336) 334-5580. This work was supported by the National Institute on Drug Abuse (R01 DA013968-02), the Robert Wood Johnson Foundation Substance Abuse Policy Research Program (RWJF Grant #49981), and the National Institute on Alcohol Abuse and Alcoholism (3R01AA017890-03S1). The author is grateful for the helpful comments of Libby Dismuke and David Bradford, and for the excellent research assistance provided by F. Michael Kunz and Mujde Erten. Thanks to Jeff Desimone and John Mullahy for providing the data for the examples.

Abstract

Most empirical research in health economics (HE) and health services research (HSR) is conducted with the goal of providing scientific evidence that will serve to inform current and future health policy. Such policy analytic studies typically use nonexperimental (survey) data and focus on a particular variable (the policy variable) that is at present, or will in the future be, under the control of a policy decision-making entity. Broadly stated, the typical policy analytic goal is the estimation of the effect that the policy variable has on a targeted outcome of interest [henceforth the policy effect (PE)]. Because both the outcome and policy variables are random, and given the possible endogeneity of the latter, considerable rigor and specificity in defining the PE are required to make the concept operational (estimable and interpretable). For contexts in which the policy variable is binary, Rubin (1974, 1977) developed the potential outcomes framework (POF), which facilitates clear definition and interpretation of various policy relevant effects. The POF is clearly useful for causal analysis when the policy variable is binary. Many researchers in HE and HSR, however, have applied nonlinear regression (NR) methods with the apparent goal of estimating policy relevant causal effects in cases wherein the policy variable $x_p$ is not binary. Unfortunately, there is very little guidance in the literature on how to implement and interpret nonlinear regression results for non-binary causal inference from a potential outcomes perspective. In the present paper, we detail an extended version of the POF that encompasses all three possible types of policy variable [binary, discrete (e.g. counts) and continuous] and accommodates possible endogeneity. As part of the discussion, we show how NR models and methods can be appropriately couched in this extended POF (EPOF). We also explore the relative practicality of the implementation of NR methods in our EPOF (vs. linear methods). We will derive the correct formulation of the asymptotic standard error for the generic NR-based policy effect estimator in the context of our EPOF and show that its computer implementation is fairly straightforward (e.g. requiring only a few lines of Mata code in a Stata 11® “do” file). This framework is comprehensive in that it: 1) fully reconciles the PO approach with causal modeling in a generic NR context; 2) affords extension of the PO approach, typically invoked in the treatment effects context, to cases in which $x_p$ is non-binary; 3) admits of policy effect estimators that are consistent even in the presence of endogeneity; and 4) is amenable to the derivation of correct formulations for the accompanying asymptotic inferential statistics that do not impose a heavy computational burden. It is hoped that the EPOF will be used as a template by empirical HE and HSR researchers for defining policy relevant NR-based estimation objectives and for interpreting the corresponding estimation results.

JEL Classification: C31, I11

Keywords: Incremental Effects, Marginal Effects, Treatment Effects, Endogeneity, Asymptotic Inference

1. Introduction

Most empirical research in health economics is conducted with the goal of providing scientific evidence that will serve to inform current and future health policy. Such policy analytic studies typically focus on a particular variable (the policy variable, $x_p$) that is at present, or will in the future be, under the control of a policy-making entity. Broadly stated, the key policy analytic objective is estimation of the effect that a change in $x_p$ would have on a targeted outcome of interest ($y$) [henceforth the policy effect (PE)]. Because $x_p$ and $y$ are random variables, and given that there are many ways to define a “change in $x_p$,” considerable rigor and specificity in defining the PE are required to make this concept operational (estimable and interpretable). For contexts in which $x_p$ is binary, Rubin (1974, 1977) developed the potential outcomes framework (POF), which facilitates clear definition and interpretation of various policy relevant effects. The key concept in this framework is the potential outcome ($y_{x_p^*}$): the random variable representing the outcome as it would have been manifested if the policy variable were counterfactually fixed at $x_p^*$ (either $x_p^* = 0$ or $x_p^* = 1$). In the POF, interesting versions of the PE can be defined for various subsets of the population of interest based on a counterfactual thought experiment in which the value of $x_p^*$ is first set at 0, and then switched to 1. For example, suppose we are interested in evaluating the potential impact on earnings of a fully effective substance abuse (SA) policy aimed at eradication. Such a PE measure would be an upper bound on the potential effectiveness of any policy for treating current abusers. In this context $y$ represents earnings, $x_p$ is a binary indicator ($x_p = 1$ if SA; 0, otherwise), and $y_{x_p^*}$ denotes potential earnings as it would be if everyone in the relevant population had $x_p^*$ as his mandated SA status ($x_p^* = 1$ if SA; 0, otherwise). In this case the PE of interest (the policy relevant estimation objective) would be the following average treatment effect for the treated (ATET)¹

$$ATET = E_{SA=1}[y_0] - E_{SA=1}[y_1] \qquad (1)$$

where $E_{SA=1}[y_{x_p^*}]$ denotes expectation over the subpopulation of substance abusers. A similar measure can be defined for SA policies targeting prevention. In this case, the policy analyst would seek to estimate the potential earnings gain from preventing SA among the non-abusers. In the POF, this estimation objective would be formally expressed as the following average treatment effect on the untreated (ATEU)

$$ATEU = E_{SA=0}[y_0] - E_{SA=0}[y_1] \qquad (2)$$

where $E_{SA=0}[y_{x_p^*}]$ denotes expectation over the subpopulation of non-abusers. Finally, the class of SA policies of interest may target both eradication and prevention. In this case, the entire population is relevant and, in the POF, the estimation objective would be the following average treatment effect (ATE)

$$ATE = E[y_1] - E[y_0]. \qquad (3)$$

1 Note that (1) is cast as the potential earnings gain from a fully effective SA policy. We could have written it more conventionally as $E_{SA=1}[y_1] - E_{SA=1}[y_0]$, the impact of SA on earnings among substance abusers.

Expression (3) represents the expected impact (in terms of earnings) from a fully effective SA policy that has both prevention and eradication components.²

The POF is clearly useful for causal analysis when $x_p$ is binary. Many researchers in health economics (HE) and health services research (HSR), however, have applied nonlinear regression (NR) methods with the apparent goal of estimating policy relevant causal effects in cases wherein $x_p$ is not binary (discrete, e.g. a count; or continuous). Unfortunately, there is very little guidance in the literature on how to implement and interpret nonlinear regression results for non-binary causal inference from a potential outcomes perspective.³ In the present paper, we detail an extended version of the POF that encompasses all three possible types of policy variable [binary, discrete (e.g. counts) and continuous] and accommodates the possible endogeneity of $x_p$. As part of the discussion, we show how NR models and methods can be appropriately couched in this extended POF (EPOF). This framework is comprehensive in that it: 1) fully reconciles the PO approach with causal modeling in a generic NR context; 2) affords extension of the PO approach, typically invoked in the treatment effects context, to cases in which $x_p$ is non-binary, e.g. real-valued and continuous (marginal effects) or real-valued and discrete (but non-binary) [incremental effects]; 3) admits of policy effect estimators that are consistent even in the presence of endogeneity; and 4) is amenable to the derivation of correct formulations for the accompanying asymptotic inferential statistics that do not impose a heavy computational burden. It is hoped that the EPOF will be used as a template by empirical HE and HSR researchers for defining policy relevant NR-based estimation objectives and for interpreting the corresponding estimation results.

2 There are other measures in the POF that are relevant when treatment effects are heterogeneous at the individual level, viz. the local average treatment effect and the marginal treatment effect. See Basu et al. (2007) for a comprehensive discussion of the various treatment effect concepts.

3 A notable exception is section 4.5.3 of Angrist and Pischke (2009).

We also explore the relative practicality of the implementation of NR methods in our

EPOF (vs. linear methods). It goes without saying that, given the breadth of available packaged

software options, the practicality of parameter estimation via NR methods is virtually on par with

that of the linear estimators (OLS and linear IV) – I point, in particular, to the Stata 11® “nl”,

“ml” and “gmm” procedures. The linear methods, however, enjoy an advantage when it comes

to computation of the correct standard error of the estimated policy effect of interest. As will be

made clear later in the context of our EPOF, no special programming is required for the correct

standard error in the linear case because the regression coefficient parameter itself is the policy

relevant estimation objective. We will derive the correct formulation of the asymptotic standard

error for the generic NR-based policy effect estimator in the context of our EPOF and show that

its computer implementation is fairly straightforward (e.g. requiring only a few lines of Mata

code in a Stata 11® “do” file).

The remainder of the paper is organized as follows. In the next section we discuss the

basic definitions and concepts and, within our EPOF, formulate the three most commonly

targeted counterfactual policy effect measures – treatment effects, incremental effects, and

marginal effects. We review the conventional use of NR in “effect” estimation in Section 3,

paying particular attention to the two most common approaches in the literature: conditional

mean estimation at fixed values of $x_p$ and the other regressors; and the method of recycled

predictions. In Section 4, we develop estimators that are consistent even in the presence of

endogeneity for all three versions of the counterfactual policy effect (treatment effects,

incremental effects, and marginal effects). Full development of the asymptotic properties and

accompanying correct asymptotic inferential statistics for our suggested estimation methods are

also given therein. In Section 5, we focus on the case in which $x_p$ is endogenous and

demonstrate how extant NR methods can be used to obtain the requisite consistent parameter

estimates for the policy effect estimators detailed in Section 4. Examples (programmed in

Stata/Mata 11®) are discussed in Section 6. The paper is summarized and conclusions are drawn

in the final section.

2. The Extended Potential Outcomes Framework

Here we extend the POF for the case in which the policy variable of interest ($x_p$) is binary to encompass contexts in which $x_p$ is a random variable of a generic type: Binary, in which case the relevant support is {0, 1}; Discrete, with support being a subset of the integers, e.g., {0, 1, 2, ...} (count data); or Continuous, having support equal to a subset of the real line, e.g., [0, ∞) (with the entire real line as a specific case).⁴ Moreover, in our extended POF (EPOF), the counterfactually mandated version of $x_p$ ($x_p^*$) is not restricted to a single value (either 0 or 1). Instead it can be a random variable. This leads to a somewhat broader definition of the potential outcome ($y_{x_p^*}$). Here $y_{x_p^*}$ is representative of the distribution of potential outcomes that would obtain if the distribution of the policy variable were mandated and fixed at $x_p^*$. Note that, as in the POF discussed in the introduction, $x_p^*$ and $y_{x_p^*}$ are counterfactual. For the ith individual in the population, the potential outcome ($y_{x_{p(i)}^*}$) cannot be observed unless the observed value of the policy variable ($x_{pi}$) is equal to the mandated value for that individual ($x_{p(i)}^*$).

4 The support of a random variable is the largest closed set over which its probability density function is non-zero.

To make our definition of the PE operational, we define the relevant policy as a change from a policy scenario characterized by $x_p^*$ (possibly counterfactual) to a counterfactual one in which the mandated version of the policy variable is $x_p^* + \Delta(x_p^*)$, where $\Delta(x_p^*)$ denotes the proposed policy increment, which is possibly a function of $x_p^*$.⁵ This hypothetical experiment is counterfactual (or at least partially counterfactual) in the sense that the values of the policy variable ($x_p$) that would be observed for some individuals in the population (say, in sampling) will not be equal to the values assigned to them via $x_p^*$; for others the value of $x_p$ will not coincide with $x_p^* + \Delta(x_p^*)$; and for still others the observable value of $x_p$ would equal neither their corresponding value of $x_p^*$ nor $x_p^* + \Delta(x_p^*)$. Stated in terms of potential outcomes, the policy analyst is interested in the difference between $E[y_{x_p^*}]$ and $E[y_{x_p^* + \Delta(x_p^*)}]$ that would be exclusively attributable to the mandated (policy) change from $x_p^*$ to $x_p^* + \Delta(x_p^*)$.⁶

Henceforth, without loss of generality, we will focus on policies in which $x_p^*$ is taken to be the currently observable version of the random variable $x_p$ (at the time of the relevant survey) and assume that $x_p^*$ and $y_{x_p^*}$ are the only variates that change as part of the policy. We refer to this as the default policy scenario. We now proceed to give rigorous definition to the PE as it pertains to the various policy variable types (discrete, continuous, and binary). In the default policy scenario, for the case in which $x_p$ is discrete, the generic form of the estimation objective would be the following incremental policy effect

$$PE_{INC}(\Delta(x_p^*)) = E[y_{x_p^* + \Delta(x_p^*)}] - E[y_{x_p^*}]. \qquad (4)$$

5 Recall that in the generic case, $x_p^*$ and $x_p^* + \Delta(x_p^*)$ are random variables.

6 Without loss of generality, the definition of the policy effect (PE) may be based on any of a number of possible policy relevant features of the distribution of $y_{x_p^*}$. Instead of the expected value, the analyst may be interested in: the variance; a particular quantile; the hazard at time $t_0$ (for duration models); etc.

For example, suppose the outcome of interest is the number of yearly office visits to the physician, $x_p$ is the per visit copay (measured in discrete one-dollar increments), and the proposed policy entails a fixed and universal increase in the copay. In this case $\Delta(x_p^*) \equiv \Delta$, where $\Delta$ is a constant.

When $x_p$ is continuous, it is often the case that the policy analyst will not have a specific increment to the policy variable in mind. In such cases, the following marginal policy effect becomes the relevant estimation objective

$$PE_{MARG} = \lim_{\Delta \to 0} \frac{PE_{INC}(\Delta)}{\Delta} \qquad (5)$$

where $PE_{INC}(\Delta)$ is defined as in (4) with $\Delta(x_p^*) \equiv \Delta$. From (5) we have⁷

$$\lim_{\Delta \to 0} \frac{E[y_{x_p+\Delta}] - E[y_{x_p}]}{\Delta} = \lim_{\Delta \to 0} \frac{E_{x_p}\left[ E[y_{x_p+\Delta} \mid x_p + \Delta] - E[y_{x_p} \mid x_p] \right]}{\Delta}$$

$$= E_{x_p}\left[ \lim_{\Delta \to 0} \frac{E[y_{x_p+\Delta} \mid x_p + \Delta] - E[y_{x_p} \mid x_p]}{\Delta} \right]$$

$$= E_{x_p}\left[ \frac{\partial E[y_{x_p} \mid x_p]}{\partial x_p} \right].$$

7 The first equality in this expression is an application of the law of iterated expectations (Wooldridge, 2010, pp. 18-22). The second equality holds under fairly general conditions (see Bierens, 2004, pp. 45-46).

This derivation warrants some clarification. To conserve on notation, we use $E[a_b \mid b]$ to denote the conditional expectation of the counterfactual random variable $a_b$ given that the random variable $b$ is fixed at a particular value in the support of $b$ (say $B$). A more explicit version of this notation might be $E[a_b \mid b = B]$. So, for instance, we could replace $E[y_{x_p} \mid x_p]$ with $E[y_{x_p} \mid x_p = X_p]$, where $X_p$ is a fixed value in the support of $x_p$. Given this notational convention, it should be clear, for instance, that $E[y_{x_p+\Delta} \mid x_p + \Delta]$ is a nonstochastic scalar function of the nonstochastic quantity $X_p + \Delta$, so that

$$\lim_{\Delta \to 0} \frac{E[y_{x_p+\Delta} \mid x_p + \Delta] - E[y_{x_p} \mid x_p]}{\Delta}$$

is legitimate and equal to $\partial E[y_{x_p} \mid x_p] / \partial x_p$.
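The limit in (5) can be illustrated numerically. The sketch below is only an illustration under assumed ingredients: the exponential mean function `mu`, its coefficients, and the small list of policy-variable values are hypothetical and not taken from the paper. It shows the scaled incremental effect approaching the average derivative as the increment shrinks.

```python
import math

def mu(xp, a=0.5, b=0.3):
    """Hypothetical mean potential outcome E[y_xp | xp] = exp(a + b*xp)."""
    return math.exp(a + b * xp)

def pe_inc(xs, delta):
    """Sample analog of the incremental effect (4) with a constant increment."""
    return sum(mu(x + delta) - mu(x) for x in xs) / len(xs)

def pe_marg(xs, b=0.3):
    """Sample average of the analytic derivative d mu / d xp = b * mu(xp)."""
    return sum(b * mu(x) for x in xs) / len(xs)

# Illustrative (made-up) values of the policy variable.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]

# PE_INC(delta) / delta for shrinking delta, next to the average derivative.
for delta in (1.0, 0.1, 0.001):
    print(delta, pe_inc(xs, delta) / delta)
print("marginal:", pe_marg(xs))
```

Because the assumed mean function is convex, the scaled incremental effect lies above the marginal effect for any positive increment, which is one reason the two objectives should not be treated as interchangeable when a specific increment is known.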

It is interesting to note that when $x_p$ is binary, (1), (2) and (3) can be cast as special cases in our EPOF. In the default policy scenario, the generic form of the targeted policy effect (PE) would be

$$PE_{BINARY} = (-1)^{x_p^*} \left[ E(y_{x_p^* + \Delta(x_p^*)}) - E(y_{x_p^*}) \right] \qquad (6)$$

from which (1), (2) and (3) follow as special cases. It is easy to verify that: (1) obtains for $\Delta(x_p^*) \equiv -x_p^*$; we get (2) when $\Delta(x_p^*) \equiv 1 - x_p^*$; and (3) follows from $\Delta(x_p^*) \equiv 1 - 2x_p^*$.

As the foregoing discussion makes clear, (4), (5) and (6) are logical PO-based targets for health policy analysis. Which of them is apropos in a particular policy context will depend on the support of the policy variable in question and whether or not the policy increment ($\Delta(x_p^*)$) is known [(4) if $x_p$ is discrete or continuous and the policy increment is known; (5) if $x_p$ is continuous and the policy increment is unknown; and (6) if $x_p$ is binary]. In the next section, we review the conventional approaches to “effect” estimation implemented by empirical researchers in HE and HSR and note that in applications of these approaches and extant renditions of the underlying theory, connections with PO-based policy relevant estimation objectives [in particular, (4), (5) and (6)] are not typically discussed. The main objective of the present paper is to remedy this by providing applied researchers with a generic and comprehensive potential outcomes framework for defining, estimating and interpreting causal effects in a NR context.

3. Conventional Approaches to NR-Based Effect Estimation

The focus of the present paper is on the use of NR results for policy effect estimation. The prototypical NR model, in its minimal form, is defined by a conditional mean regression function like

$$E[y \mid x_p, x_o] = M(x_p, x_o; \tau) \qquad (7)$$

where $x_o$ denotes a vector of observable controls (observable confounders), $M(\cdot)$ is a known function, and $\tau$ is a vector of unknown regression parameters to be estimated.⁸ From elementary econometrics we know that under the usual identification conditions, $\tau$ can be estimated by applying the nonlinear least squares (NLS) method to (7) using a sample of data on the observable variables $y$, $x_p$, and $x_o$. If, in addition, the forms of higher-order moments of the distribution of $(y \mid x_p, x_o)$ are known (assumed) [e.g. conditional heteroskedasticity, conditional skewness, conditional kurtosis, etc.], statistical efficiency gains can be made by incorporating such additional non-sample information in the estimation routine [e.g. generalized NLS]. In the extreme, if the form of the probability density function of $(y \mid x_p, x_o)$ is known (assumed), a fully efficient estimate of $\tau$ can be obtained via the maximum likelihood (ML) method. Whatever the underlying assumptions of the model, we will henceforth assume that (7) holds, and that we have a corresponding consistent estimator of $\tau$ ($\hat\tau$) in hand. It should be noted here that all of the above discussion holds regardless of whether or not (7) is representative of the “true causal model.” This is an important point, which we will revisit below.

8 The term “minimal” is used here to describe the level of the parametricity of the model. The minimal requisite assumption for a parametric NR model is knowledge of the form of the conditional mean regression function up to a vector of unknown parameters. A maximally parametric model would be one in which the form of the conditional density function is known up to the unknown parameter vector.

The typical empirical study in HE and HSR implementing NR methods makes no explicit mention of the PO approach as it relates to policy objectives like (4), (5) and (6). Instead, the focus is on the following “effect” measures

$$M(1, x_o^{\#}; \tau) - M(0, x_o^{\#}; \tau) \qquad (8)$$

and/or

$$\left. \frac{\partial M(x_p, x_o; \tau)}{\partial x_p} \right|_{x_p = x_p^{\#},\; x_o = x_o^{\#}} \qquad (9)$$

where $x_p^{\#}$ and $x_o^{\#}$ are fixed and known values of the policy variable ($x_p$) and vector of observable confounders ($x_o$), respectively; and $M(\cdot)$ is defined in (7). Sample analog estimators of (8) and (9) are

$$M(1, x_o^{\#}; \hat\tau) - M(0, x_o^{\#}; \hat\tau) \qquad (10)$$

and

$$\left. \frac{\partial M(x_p, x_o; \hat\tau)}{\partial x_p} \right|_{x_p = x_p^{\#},\; x_o = x_o^{\#}} \qquad (11)$$

where $\hat\tau$ is the vector of parameter estimates obtained by applying a NR method based on (7) to the observable data. Results from these estimators are typically included in the output from NR software packages (e.g. see the “mfx” option and the “margins” procedure in Stata 11®, StataCorp, 2009). Applied researchers in HE and HSR routinely opt to report estimates from (10) and (11) but typically neglect to offer PO-based policy relevant interpretation of these results.

Other empirical researchers use the following recycled prediction statistics as treatment effect and marginal effect estimators, respectively

$$\frac{1}{n} \sum_{i=1}^{n} \left\{ M(1, x_{oi}; \hat\tau) - M(0, x_{oi}; \hat\tau) \right\} \qquad (12)$$

$$\frac{1}{n} \sum_{i=1}^{n} \frac{\partial M(x_{pi}, x_{oi}; \hat\tau)}{\partial x_p} \qquad (13)$$

where the “i” subscript denotes the individual (i = 1, …, n), and n is the sample size (see Basu and Rathouz, 2005; StataCorp, 2009). Here again, I could find no discussions in the literature that relate (12) and (13) in any way to PO-based policy-relevant estimation objectives like (4), (5) and (6). Therefore, even when recycled predictions estimates are reported, they are seldom (if ever) interpreted from a PO-based policy analytic perspective.
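To make the contrast between the fixed-value statistic (10) and the recycled prediction statistics (12) and (13) concrete, here is a minimal sketch. It assumes a logistic form for $M$, a scalar observable control, and made-up parameter estimates and data; none of these come from the paper, and the function names are illustrative only.

```python
import math

def M(xp, xo, tau):
    """Hypothetical logistic conditional mean M(xp, xo; tau)."""
    b0, bp, bo = tau
    return 1.0 / (1.0 + math.exp(-(b0 + bp * xp + bo * xo)))

def dM_dxp(xp, xo, tau):
    """Analytic derivative of the logistic mean with respect to xp."""
    m = M(xp, xo, tau)
    return tau[1] * m * (1.0 - m)

def effect_at_fixed_values(xo_fixed, tau_hat):
    """Statistic (10): contrast evaluated at one fixed value of the control."""
    return M(1, xo_fixed, tau_hat) - M(0, xo_fixed, tau_hat)

def recycled_treatment_effect(xo_sample, tau_hat):
    """Statistic (12): average over i of M(1, x_oi) - M(0, x_oi)."""
    n = len(xo_sample)
    return sum(M(1, xo, tau_hat) - M(0, xo, tau_hat) for xo in xo_sample) / n

def recycled_marginal_effect(xp_sample, xo_sample, tau_hat):
    """Statistic (13): average over i of dM/dxp at (x_pi, x_oi)."""
    n = len(xp_sample)
    return sum(dM_dxp(xp, xo, tau_hat)
               for xp, xo in zip(xp_sample, xo_sample)) / n

# Made-up estimates and data, for illustration only.
tau_hat = (-1.0, 0.8, 0.5)
xo_sample = [-2.0, -1.0, 0.0, 1.0, 4.0]
xp_sample = [0.2, 1.0, 0.4, 1.5, 0.7]

xo_bar = sum(xo_sample) / len(xo_sample)
print("at mean control:", effect_at_fixed_values(xo_bar, tau_hat))
print("recycled TE:    ", recycled_treatment_effect(xo_sample, tau_hat))
print("recycled ME:    ", recycled_marginal_effect(xp_sample, xo_sample, tau_hat))
```

With a nonlinear $M$, evaluating the contrast at the sample mean of the control generally gives a different number from averaging the contrast over the sample, so the two statistics answer different questions.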

4. Causally Interpretable NR-Based Treatment, Incremental and Marginal Effect Estimators

We seek to offer the empirical health policy analyst means for clear and concise PO-based causal interpretation of results obtained via NR estimation. We begin by rewriting the PO-based expressions (3), (4) and (5) in a way that makes them amenable to estimation and inference via NR methods. This discussion also suggests appropriately extended versions of (12) and (13) that are consistent for the policy relevant objectives (3) and (5). An analogous estimator for (4) is also detailed. Finally, we show that these estimators can be cast as conventional two-stage optimization estimators. This affords a means for derivation of their asymptotic properties.

We begin by noting that for any vector of confounders ($v$), using the law of iterated expectations, we can write⁹

$$E[y_{x_p^*}] = E\left[ E[y_{x_p^*} \mid v] \right]. \qquad (14)$$

We define the comprehensive vector of confounders and the true causal model to be the vector $x^0$ and function $\mu^0(x_p, x^0; \tau^0)$, respectively, that satisfy both of the following conditions

$$E[y \mid x_p, x^0] = \mu^0(x_p, x^0; \tau^0) \qquad (15)$$

$$E[y_{x_p^*} \mid x^0] = \mu^0(x_p^*, x^0; \tau^0) \qquad (16)$$

for all $x_p^*$, where $\tau^0$ is defined as the true value of the vector of unknown parameters. This definition says that the conditional mean regression function $E[y \mid x_p, x^0]$ can be causally interpreted in a potential outcomes (PO) sense, if and only if the set of confounders used as conditioning variates includes all possible variables that are correlated with both $y$ and $x_p$. For any other set of conditioning confounders, say $v$, $E[y \mid x_p, v]$ and $E[y_{x_p^*} \mid v]$ diverge. This is important because the latter is representative, via (14), of the counterfactual entity of key interest (viz. $E[y_{x_p^*}]$) and the former is estimable via observable data. If the vector of confounders is not comprehensive, then the equality between the observable and counterfactual entities represented in (15) and (16), respectively, will not hold and the data cannot be sufficiently informative to allow accurate estimation of, and inference about, relevant targeted policy effects [like (3), (4) and (5)].¹⁰

9 Confounders are variables that are correlated (in sampling) with both the outcome and the policy variable ($x_p$).

10 An illustrative example is detailed in an appendix that will be supplied upon request.

It is important to note the distinction between (15) and (16). The former applies to the case in which $x_p$ is observable and jointly distributed with (and not necessarily independent of) $x^0$. In (16), $x_p^*$ is a counterfactually (exogenously) mandated version of $x_p$. The random variable $x_p^*$ need not be degenerate and is, by design, distributed independently of $x^0$ and $x_p$. Such independence holds even if we set $x_p^* = x_p$ for policy analytic purposes, as we did in defining the policy analytic objectives (3), (4) and (5) [and (1) and (2)]. We defined this as the default policy scenario.
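A small exact calculation may help fix ideas about why comprehensiveness matters. The sketch below assumes a hypothetical population that is not from the paper: a single binary confounder `xu`, a binary policy variable whose probability depends on `xu`, and an exponential causal mean `mu`. It compares the true average treatment effect with the contrast obtained from a conditional mean that omits `xu`.

```python
import math

# Hypothetical population (all numbers are illustrative assumptions).
P_XU = {0: 0.5, 1: 0.5}              # marginal distribution of the confounder
P_XP1_GIVEN_XU = {0: 0.2, 1: 0.8}    # P(xp = 1 | xu): xp depends on xu

def mu(xp, xu):
    """Assumed true causal mean: E[y | xp, xu] = E[y_xp | xu]."""
    return math.exp(0.1 + 0.5 * xp + 1.0 * xu)

def true_ate():
    """E[mu(1, xu) - mu(0, xu)], averaging over the marginal of xu."""
    return sum(P_XU[u] * (mu(1, u) - mu(0, u)) for u in P_XU)

def p_xu_given_xp(u, xp):
    """Bayes' rule: distribution of the confounder given the observed xp."""
    def p_xp_given(k):
        return P_XP1_GIVEN_XU[k] if xp == 1 else 1.0 - P_XP1_GIVEN_XU[k]
    denom = sum(P_XU[k] * p_xp_given(k) for k in P_XU)
    return P_XU[u] * p_xp_given(u) / denom

def naive_contrast():
    """E[y | xp = 1] - E[y | xp = 0] when xu is left out of the conditioning set."""
    def e_y_given(xp):
        return sum(p_xu_given_xp(u, xp) * mu(xp, u) for u in P_XU)
    return e_y_given(1) - e_y_given(0)

print("true ATE:      ", true_ate())
print("naive contrast:", naive_contrast())
```

In this assumed setup the naive contrast overstates the effect, because individuals with `xu = 1` are both more likely to have `xp = 1` and have higher outcomes regardless of the policy variable.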

If the set of confounders is comprehensive, using (15) and (16), we can rewrite (3), (4) and (5) as

Treatment Effect

$$PE_{ATE} = E[\mu^0(1, x^0; \tau^0) - \mu^0(0, x^0; \tau^0)] \qquad (17)$$

Incremental Effect

$$PE_{INC}(\Delta) = E[\mu^0(x_p^* + \Delta, x^0; \tau^0) - \mu^0(x_p^*, x^0; \tau^0)] \qquad (18)$$

Marginal Policy Effect¹¹

$$PE_{MARG} = E_{x_p^*}\left[ \frac{\partial E[y_{x_p^*} \mid x_p^*]}{\partial x_p^*} \right] = E\left[ \frac{\partial \mu^0(x_p^*, x^0; \tau^0)}{\partial x_p^*} \right]. \qquad (19)$$

11 See the discussion following equation (5) for the definition of $\partial E[y_{x_p^*} \mid x_p^*] / \partial x_p^*$.

The sample analog estimators corresponding to (17) through (19) are

$$\widehat{PE}_{ATE} = \frac{1}{n} \sum_{i=1}^{n} \left\{ \mu^0(1, x_i^0; \hat\tau^0) - \mu^0(0, x_i^0; \hat\tau^0) \right\} \qquad (20)$$

$$\widehat{PE}_{INC}(\Delta) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \mu^0(x_{pi} + \Delta, x_i^0; \hat\tau^0) - \mu^0(x_{pi}, x_i^0; \hat\tau^0) \right\} \qquad (21)$$

$$\widehat{PE}_{MARG} = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \mu^0(x_{pi}, x_i^0; \hat\tau^0)}{\partial x_p}. \qquad (22)$$

These estimators are consistent for (3), (4) and (5), respectively, if and only if $\hat\tau^0$ is consistent and the vector of confounders is comprehensive. This discussion directly links the recycled prediction estimators (12) and (13) with (3) and (5), respectively, and shows that the former will only be consistent for the latter if the set of confounders (controls) is comprehensive.
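The incremental estimator (21) can be sketched in the spirit of the copay example from Section 2. Everything specific here is an assumption made for illustration: the exponential mean for a count outcome, the single control (age), the parameter values, and the data are all made up, and the sketch takes the set of controls to be comprehensive.

```python
import math

def mu(xp, xo, tau):
    """Hypothetical exponential conditional mean for a count outcome."""
    t0, tp, to = tau
    return math.exp(t0 + tp * xp + to * xo)

def pe_inc_hat(xp_sample, xo_sample, tau_hat, delta):
    """Estimator (21): average change in the predicted mean when every
    individual's policy variable is raised by the same increment delta."""
    n = len(xp_sample)
    return sum(mu(xp + delta, xo, tau_hat) - mu(xp, xo, tau_hat)
               for xp, xo in zip(xp_sample, xo_sample)) / n

# Made-up estimates and data, not from any real sample.
tau_hat = (1.2, -0.05, 0.01)      # intercept, copay coefficient, age coefficient
copay = [0, 5, 10, 10, 20, 25]    # per-visit copay in dollars
age = [30, 45, 52, 61, 38, 70]

# Estimated change in expected yearly visits from a universal 5-dollar increase.
effect = pe_inc_hat(copay, age, tau_hat, delta=5)
print(effect)
```

Under the assumed exponential form the estimate is simply a fixed proportion of the average predicted mean, which is a property of that functional form rather than of the estimator in general.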

After dropping the “0” and “*” superscripts for notational convenience, we may characterize the true causal model for any policy analytic context by rewriting (15) and (16) as

$$E[y \mid x_p, x_o, x_u] = \mu(x_p, x_o, x_u; \tau) \qquad (23)$$

$$E[y_{x_p} \mid x_o, x_u] = \mu(x_p, x_o, x_u; \tau) \qquad (24)$$

where $x_o$ and $x_u$ denote the observable and unobservable confounders (the latter may be a vector or a scalar). Expressions (23) and (24) are validated by the implicit comprehensiveness of the vector of confounders $x = [x_o \;\; x_u]$: inclusion of the unobservable confounder $x_u$ ensures the comprehensiveness of $x$. Using this notation, (20) through (22) become

$$\widehat{PE}_{ATE} = \frac{1}{n} \sum_{i=1}^{n} \left\{ \mu(1, x_{oi}, x_{ui}; \hat\tau) - \mu(0, x_{oi}, x_{ui}; \hat\tau) \right\} \qquad (25)$$

$$\widehat{PE}_{INC}(\Delta) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \mu(x_{pi} + \Delta, x_{oi}, x_{ui}; \hat\tau) - \mu(x_{pi}, x_{oi}, x_{ui}; \hat\tau) \right\} \qquad (26)$$

$$\widehat{PE}_{MARG} = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \mu(x_{pi}, x_{oi}, x_{ui}; \hat\tau)}{\partial x_p}. \qquad (27)$$

Writing the PO-based policy relevant estimators in this way exposes a potential difficulty in their implementation. As they stand, (25) through (27) are not feasible because $x_{ui}$ is, by definition, unobservable. We will discuss feasible versions of these estimators later.

We now turn to the asymptotic properties of these estimators. To simplify the discussion, we rewrite (25), (26) and (27) in generic form as

$$\widehat{PE} = \frac{1}{n} \sum_{i=1}^{n} \widehat{pe}_i \qquad (28)$$

where

$$\widehat{pe}_i = pe(x_{pi}, x_i, \hat\tau), \qquad x_i = [x_{oi} \;\; x_{ui}]$$

and

$$pe(x_p, x, \tau) = \begin{cases} \mu(1, x; \tau) - \mu(0, x; \tau) & \text{for (25)} \quad (28\text{-}1) \\ \mu(x_p + \Delta, x; \tau) - \mu(x_p, x; \tau) & \text{for (26)} \quad (28\text{-}2) \\ \dfrac{\partial \mu(x_p, x; \tau)}{\partial x_p} & \text{for (27).} \quad (28\text{-}3) \end{cases}$$

To obtain the asymptotic properties of (28) we note that it can be cast as a two-stage optimization (2SOPT) estimator.¹² 2SOPT estimators are characterized by two objective functions: $Q(\cdot)$, a full information objective function whose optimizer is a consistent estimator of all parameters of the model;¹³ and $Q_1(\cdot)$, a first stage objective function whose optimizer is a consistent estimator of a subvector of the full set of parameters of interest. In the 2SOPT protocol, the full vector of parameters is (say) $\theta = [\delta' \;\; \omega']'$. In the first stage, $Q_1$ is optimized to obtain a consistent estimate of $\delta$. In the second stage, $Q$ is optimized with respect to $\omega$ with $\delta$ fixed at its first-stage estimated value ($\hat\delta$). Let $\tau$ play the role of $\delta$, and let PE ($PE = PE_{ATE}$, $PE_{INC}(\Delta)$, or $PE_{MARG}$) play the role of $\omega$.

12 For details on 2SOPT estimators see White (1994, Chapter 6); Newey and McFadden (1994); and Wooldridge (2010, Chapter 12). These authors extend the results of Murphy and Topel (1985) for two-stage maximum likelihood estimators to the more general class of 2SOPT estimators.

13 Here we use the term "full information" to indicate that $Q(\cdot)$ takes account of all of the available nonsample information. This does not imply that full information maximum likelihood estimation is necessarily feasible.

The estimator of PE in (28) can be represented as a 2SOPT estimator by specifying the first-stage objective function as that which corresponds to the in-hand consistent estimator of $\tau$ ($\hat\tau$), say

$$Q_1(\tau) = \sum_{i=1}^{n} q_1(\tau, z_i) \qquad (29)$$

and the second-stage objective function as

$$Q(PE) = \sum_{i=1}^{n} q(PE, z_i) \qquad (30)$$

where

$$q(PE, z_i) = -\left( \widehat{pe}_i - PE \right)^2 \qquad (31)$$

and $z_i = [y_i \;\; x_{pi} \;\; x_i]$. It is easy to show that the optimizer of (30), with $q$ as defined in (31), is (28). Therefore, our 2SOPT characterization of (28) is valid.
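The step from the second-stage objective function to (28) can be spelled out with a short first-order condition. This is only a sketch of the standard argument, treating each $\widehat{pe}_i$ as fixed in the second stage:

$$\frac{\partial Q(PE)}{\partial PE} = \sum_{i=1}^{n} 2\left( \widehat{pe}_i - PE \right) = 0 \;\;\Longrightarrow\;\; PE = \frac{1}{n} \sum_{i=1}^{n} \widehat{pe}_i, \qquad \frac{\partial^2 Q(PE)}{\partial PE^2} = -2n < 0.$$

The negative second derivative indicates that the stationary point is the maximizer of $Q$, and it coincides with the sample average in (28).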

Using existing results on the asymptotics of 2SOPT estimators, we have14

$$
\begin{aligned}
a\,var(\widehat{PE}) = E[q_{PE\,PE}]^{-1}\Big\{ & E[q_{PE\,\tau}]\,AVAR(\hat\tau)\,E[q_{PE\,\tau}]' \\
& - E[q_{PE}\,q_{1\tau}]\,E[q_{1\tau\tau}]^{-1}E[q_{PE\,\tau}]' - E[q_{PE\,\tau}]\,E[q_{1\tau\tau}]^{-1}E[q_{1\tau}'\,q_{PE}] \\
& + E[q_{PE}^2]\Big\}\,E[q_{PE\,PE}]^{-1}.
\end{aligned} \tag{32}
$$

Moreover, $q_{PE} = 2(pe - PE)$, $pe \equiv pe(x_p, x, \tau)$, $q_{PE\,PE} = 2$, $E[q_{PE\,\tau}] = -2E[pe_\tau]$ and $E[q_{PE}^2] = 4E[(pe - PE)^2]$. Therefore, (32) can be written

$$
\begin{aligned}
a\,var(\widehat{PE}) = {} & E[pe_\tau]\,AVAR(\hat\tau)\,E[pe_\tau]' \\
& + E[(pe - PE)\,q_{1\tau}]\,E[q_{1\tau\tau}]^{-1}E[pe_\tau]' \\
& + E[pe_\tau]\,E[q_{1\tau\tau}]^{-1}E[(pe - PE)\,q_{1\tau}]' \\
& + E[(pe - PE)^2].
\end{aligned} \tag{33}
$$

It is typically the case that15

$$E[q_{1\tau} \mid x_p, x] = 0 \tag{34}$$

which implies that

$$E[(pe - PE)\,q_{1\tau}] = E\big[E[(pe - PE)\,q_{1\tau} \mid x_p, x]\big] = E\big[(pe - PE)\,E[q_{1\tau} \mid x_p, x]\big] = 0 \tag{35}$$

14 For a scalar function “s” of two vector arguments j and t (i.e. s = s(j, t) where s is a scalar and j and t are vectors) we define: $s_j = \partial s / \partial j$ = the gradient vector of s with respect to the elements of j, and $s_{jt} = \partial^2 s / \partial j\,\partial t$ = the matrix of cross-partial derivatives of s with respect to the elements of j and t. We also assume that the former is a row vector, and the latter is a matrix with row dimension equal to that of the first subscript on $s_{jt}$ and column dimension equal to that of the second subscript.

15 In an appendix that will be supplied upon request, we show that if the first-stage estimator of τ is maximum likelihood or nonlinear least squares, then condition (34) holds.

so (33) becomes

$$a\,var(\widehat{PE}) = E[pe_\tau]\,AVAR(\hat\tau)\,E[pe_\tau]' + E[(pe - PE)^2]. \tag{36}$$

The asymptotic variance given in (36) can be consistently estimated using

$$\widehat{a\,var}(\widehat{PE}) = \left(\frac{\sum_{i=1}^{n}\widehat{pe}_{\tau i}}{n}\right) n\,\widehat{AVAR}(\hat\tau) \left(\frac{\sum_{i=1}^{n}\widehat{pe}_{\tau i}}{n}\right)' + \frac{\sum_{i=1}^{n}(\widehat{pe}_i - \widehat{PE})^2}{n} \tag{37}$$

where $\widehat{pe}_{\tau i}$ denotes $pe_\tau$ evaluated at xpi, xi and τ̂; and $\widehat{AVAR}(\hat\tau)$ is the estimated asymptotic covariance matrix of τ̂. In summary, $\widehat{PE}$ is consistent and

$$\sqrt{\frac{n}{\widehat{a\,var}(\widehat{PE})}}\;(\widehat{PE} - PE) \xrightarrow{\;d\;} n(0,1). \tag{38}$$
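As a rough illustration of how (37) and (38) might be computed, the following Python sketch takes as given the per-observation values $\widehat{pe}_i$, their gradients with respect to τ (one row per observation), and an estimated covariance matrix of τ̂. The function name `policy_effect_inference` and its argument layout are assumptions introduced here for illustration; they are not taken from the Stata/Mata code mentioned in the paper.

```python
import math

def policy_effect_inference(pe, pe_grad, cov_tau):
    """Sketch of (37)-(38).

    pe       : list of n per-observation policy effects pe_i (evaluated at tau_hat)
    pe_grad  : list of n gradient rows d pe_i / d tau (each of length k)
    cov_tau  : k x k estimated covariance matrix of tau_hat (not scaled by n)
    Returns (PE_hat, avar_hat, standard error, asymptotic t-statistic for PE = 0).
    """
    n = len(pe)
    k = len(cov_tau)
    pe_hat = sum(pe) / n

    # Sample mean of the gradient rows: the feasible analogue of E[pe_tau].
    g = [sum(row[j] for row in pe_grad) / n for j in range(k)]

    # First term of (37): g * (n * cov_tau) * g'.
    quad = n * sum(g[a] * cov_tau[a][b] * g[b] for a in range(k) for b in range(k))

    # Second term of (37): sample analogue of E[(pe - PE)^2].
    spread = sum((v - pe_hat) ** 2 for v in pe) / n

    avar = quad + spread
    se = math.sqrt(avar / n)   # from (38): sqrt(n / avar) * (PE_hat - PE) -> n(0,1)
    return pe_hat, avar, se, pe_hat / se
```

The first term carries the sampling variation of τ̂ through to the policy effect; the second reflects the variation of $pe_i$ across observations, which remains even if τ were known.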

5. Estimating τ when xp is Endogenous: Accounting for the Unobservable Confounder xu

To complete the discussion, we must develop feasible versions of (25) through (27). This

requires a consistent estimator of τ and a means of dealing with the nonobservability of xu. As

we will see, estimation of τ will be based on (23) and whatever additional nonsample

information on $(y \mid x_p, x_o, x_u)$ is available. When xu is null, τ can be estimated using a

conventional NR method (e.g., NLS, ML) based on (23), and the policy effect estimator in (28)

coincides with the recycled predictions estimators (12) and (13) for treatment effects and

marginal effects, respectively. Therefore, as a corollary to the above discussion, we have that the

recycled predictions estimators are consistent for (3) and (5) and their asymptotic standard errors

can be obtained using the appropriate versions of (37).16 When xu is not null (i.e., when xp is

endogenous) it must be taken into account in both the estimation of τ and the policy effect [(3),

(4) and (5)]. In the following we separately discuss appropriate methods for dealing with

endogeneity for the cases in which xp is binary and non-binary (discrete or continuous).

5.1 Binary Endogenous Policy Variable

In this case we focus on (3) [i.e., (17)] as the estimation objective. It is clear that the

feasibility of estimating (17) [e.g. with a statistic similar to (25)] hinges on our ability to: 1)

appropriately account for xu in the formulation of the estimator; and 2) consistently estimate τ

given the endogeneity of xp [i.e., given that xu is present in (17)]. Following Terza (1994, 2009),

we can formalize the correlation between xp and xu by assuming that

$$x_p = I\big(t(w,\alpha) + x_u > 0\big) \tag{39}$$

where I(A) denotes the indicator function which takes the value 1 if condition A is true and 0

otherwise, t(w,α) is a known function, w = [xo w+], α is a conformable vector of unknown

parameters, w+ is a vector of identifying instrumental variables (variables that are correlated with xp but do not otherwise influence the value of y), and $(x_u \mid w)$ has a known distribution function. Under these assumptions we can extend (23/24) as

$$E[y_{x_p} \mid x_o, x_u] = E[y \mid x_p, x_o, x_u] = E[y \mid x_p, w, x_u] = \mu(x_p, x_o, x_u; \tau) \tag{40}$$

16 Similarly when xu is null, the recycled predictions estimator of the incremental effect is (26) sans xui.

so that (17) becomes

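$$
\begin{aligned}
PE^{ATE} &= E\Big[E_{(x_u \mid w)}\big[\mu(1, x_o, x_u; \tau) - \mu(0, x_o, x_u; \tau)\big]\Big] \\
&= E\left[\int_{-\infty}^{\infty}\big[\mu(1, x_o, x_u; \tau) - \mu(0, x_o, x_u; \tau)\big]\,g(x_u)\,dx_u\right]
\end{aligned} \tag{41}
$$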

where g( ) denotes the known probability density function (pdf) of $(x_u \mid w)$. If we have a

consistent estimator of τ (say τ̂ ), (41) can be consistently estimated using the following feasible

replacement for (25)17

$$\widehat{PE}^{ATE} = \frac{1}{n}\sum_{i=1}^{n}\left\{\int_{-\infty}^{\infty}\big[\mu(1, x_{oi}, x_u; \hat\tau) - \mu(0, x_{oi}, x_u; \hat\tau)\big]\,g(x_u)\,dx_u\right\}. \tag{42}$$

The asymptotic standard error of this estimator [and the relevant asymptotic t-statistic] can be

obtained from (37) [and (38)] by writing (42) as the special case of (28) in which (28-1) is

replaced by

$$pe(x_p, x, \tau) = \int_{-\infty}^{\infty}\big[\mu(1, x_o, x_u; \tau) - \mu(0, x_o, x_u; \tau)\big]\,g(x_u)\,dx_u.$$
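The integral in (42) generally has no closed form and must be approximated numerically. The following Python sketch is one way this might be done under two illustrative assumptions that are not imposed by the paper at this point: $(x_u \mid w)$ is standard normal, and $\mu$ is exponential in a single observable confounder. It uses a composite Simpson rule rather than the GAUSS quadrature routine mentioned in the footnote; all function names are hypothetical.

```python
import math

def std_normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def integrate_simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0

def mu_exp(x_p, x_o, x_u, tau):
    """Illustrative (assumed) regression function: exp(tau_p*x_p + tau_o*x_o + tau_u*x_u)."""
    t_p, t_o, t_u = tau
    return math.exp(t_p * x_p + t_o * x_o + t_u * x_u)

def pe_ate_i(mu, x_o, tau, g=std_normal_pdf, lo=-10.0, hi=10.0):
    """Integrand of (42) for one observation: integrate [mu(1,.) - mu(0,.)] g(x_u) over x_u."""
    return integrate_simpson(
        lambda u: (mu(1, x_o, u, tau) - mu(0, x_o, u, tau)) * g(u), lo, hi)

def pe_ate(mu, x_o_sample, tau):
    """Sample average in (42)."""
    return sum(pe_ate_i(mu, x_o, tau) for x_o in x_o_sample) / len(x_o_sample)
```

For this particular exponential specification the integral is available analytically, because $E[\exp(\tau_u x_u)] = \exp(\tau_u^2/2)$ for a standard normal $x_u$, which gives a convenient check on the quadrature.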

In developing a consistent estimator for τ in this context we follow Terza (1994, 2009)

and note that under the assumptions of the model

$$
E[y \mid x_p, w] = x_p\,\frac{\displaystyle\int_{-t(w,\alpha)}^{\infty}\mu(x_p, x_o, x_u; \tau)\,g(x_u)\,dx_u}{1 - G(-t(w,\alpha))} + (1 - x_p)\,\frac{\displaystyle\int_{-\infty}^{-t(w,\alpha)}\mu(x_p, x_o, x_u; \tau)\,g(x_u)\,dx_u}{G(-t(w,\alpha))} \tag{43}
$$

17 The requisite integral for (41) can be evaluated using quadrature or simulation approximation. The GAUSS INTQUAD1 procedure is accurate and efficient for the former, and Halton sequence based methods work well for the latter.

where G( ) denotes the known distribution function of $(x_u \mid w)$. Based on (43), a simple two-stage estimator can be implemented. First, estimate α by applying the ML method to the

appropriately specified binary response model. In the second stage, apply NLS to the regression

model defined by (43) with the first-stage estimate (say, α̂ ) substituted for α. This estimator is

consistent and asymptotically normal under general conditions, and standard asymptotic results

can be applied in adjusting the NLS standard errors for the fact that the estimator is two-stage

(see White, 1994, Chapter 6).18

5.2 Non-Binary Endogenous Policy Variable

We now turn to the estimation of (4) and (5) when xp is endogenous (i.e., when xu is not

null). Here we replace (39) with

$$x_p = r(w,\alpha) + x_u \tag{44}$$

where r(w,α) is a known function, w is defined as in (39), and $E[x_u \mid w] = 0$. Under these assumptions (4) and (5) can be written

$$
\begin{aligned}
PE^{INC}(\Delta) &= E_{w,\,x_u,\,x_p^*}\big[\mu(x_p^* + \Delta, x_o, x_u; \tau) - \mu(x_p^*, x_o, x_u; \tau)\big] \\
&= E_{w,\,x_p^*}\big[\mu(x_p^* + \Delta, x_o, [x_p - r(w,\alpha)]; \tau) - \mu(x_p^*, x_o, [x_p - r(w,\alpha)]; \tau)\big]
\end{aligned} \tag{45}
$$

18 For a more detailed discussion of the correct formulation of the asymptotic covariance matrix of this two-stage estimator, see Terza (1999).

and

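$$
PE^{MARG} = E_{x_p^*,\,w,\,x_u}\left[\frac{\partial E[y \mid x_p^*, w, x_u]}{\partial x_p^*}\right] = E_{w,\,x_p^*}\left[\frac{\partial \mu(x_p^*, x_o, [x_p - r(w,\alpha)]; \tau)}{\partial x_p^*}\right] \tag{46}
$$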

and the two-stage residual inclusion (2SRI) method suggested by Terza et al. (2008) can be

implemented for the estimation of τ.19 In the first stage of the estimator, obtain a consistent

estimate of the vector α ( α̂ ) by applying NLS to (44). Next, compute the residual as

$$\hat{x}_u = x_p - r(w, \hat\alpha). \tag{47}$$

In the second stage, estimate τ by applying NLS to the following NR obtained from (23)

$$y = \mu(x_p, x_o, \hat{x}_u; \tau) + e \tag{48}$$

where $\hat{x}_u$ is defined as in (47), and e is the regression error term. This estimator is consistent

and asymptotically normal.20 Using the 2SRI results, the following feasible versions of (26) and

(27) can be implemented

$$\widehat{PE}^{INC}(\Delta) = \frac{1}{n}\sum_{i=1}^{n}\big\{\mu(x_{pi} + \Delta, x_{oi}, \hat{x}_{ui}; \hat\tau) - \mu(x_{pi}, x_{oi}, \hat{x}_{ui}; \hat\tau)\big\} \tag{49}$$

$$\widehat{PE}^{MARG} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \mu(x_{pi}, x_{oi}, \hat{x}_{ui}; \hat\tau)}{\partial x_p} \tag{50}$$

where τ̂ is the 2SRI estimate and $\hat{x}_{ui}$ is the residual for the ith sample member [as defined in (47)].

19 Validation of (45) and (46) is detailed in an appendix that will be supplied upon request.

20 The asymptotic properties of this estimator can be derived as a special case of the generic 2SOPT estimator. Details are given in an appendix that will be supplied upon request.
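To make the mechanics of (47), (49) and (50) concrete, the sketch below assumes that the two 2SRI stages have already been run and that α̂ and τ̂ are in hand. It only computes the first-stage residuals and the two feasible effect estimators. The function names, the scalar treatment of $x_o$ and $w$, and the use of a central finite difference for the derivative in (50) are illustrative assumptions, not the paper's implementation.

```python
import math

def first_stage_residuals(x_p, w, r, alpha_hat):
    """Residuals as in (47): x_u_hat_i = x_pi - r(w_i, alpha_hat)."""
    return [p - r(w_i, alpha_hat) for p, w_i in zip(x_p, w)]

def incremental_effect(mu, x_p, x_o, x_u_hat, tau_hat, delta):
    """Feasible incremental effect as in (49)."""
    n = len(x_p)
    return sum(mu(p + delta, o, u, tau_hat) - mu(p, o, u, tau_hat)
               for p, o, u in zip(x_p, x_o, x_u_hat)) / n

def marginal_effect(mu, x_p, x_o, x_u_hat, tau_hat, h=1e-5):
    """Feasible marginal effect as in (50).

    The derivative is taken with respect to the first argument of mu only,
    holding the residual fixed, and is approximated by a central difference.
    """
    n = len(x_p)
    return sum((mu(p + h, o, u, tau_hat) - mu(p - h, o, u, tau_hat)) / (2.0 * h)
               for p, o, u in zip(x_p, x_o, x_u_hat)) / n

def mu_exp(x_p, x_o, x_u, tau):
    """Illustrative (assumed) regression function used in the usage example."""
    t_p, t_o, t_u = tau
    return math.exp(t_p * x_p + t_o * x_o + t_u * x_u)
```

With an analytic derivative available, as in the exponential case, the finite difference can of course be replaced by the exact expression.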

For the special case in which $\mu(x_p, x_o, x_u; \tau) = \exp(x_p\tau_p + x_o\tau_o + x_u\tau_u)$, the generalized method of moments (GMM) estimator developed by Mullahy (1997) can be used to obtain the requisite consistent estimator of τ. Mullahy (1997) assumes that $E[\exp(x_u\tau_u) \mid w] = 1$. Terza (2006) shows how this assumption can be used to develop feasible versions of (26) and (27). To see this, writing the elements of τ as β, note that

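$$
\begin{aligned}
PE^{INC}(\Delta) &= E_{x_p^*,\,w,\,x_u}\big[\exp((x_p^* + \Delta)\beta_p + x_o\beta_o + x_u\beta_u) - \exp(x_p^*\beta_p + x_o\beta_o + x_u\beta_u)\big] \\
&= E_{x_p^*,\,w}\Big[E_{(x_u \mid w)}\big[\exp((x_p^* + \Delta)\beta_p + x_o\beta_o + x_u\beta_u)\big] - E_{(x_u \mid w)}\big[\exp(x_p^*\beta_p + x_o\beta_o + x_u\beta_u)\big]\Big] \\
&= E_{x_p^*,\,w}\Big[\exp((x_p^* + \Delta)\beta_p + x_o\beta_o)\,E[\exp(x_u\beta_u) \mid w] - \exp(x_p^*\beta_p + x_o\beta_o)\,E[\exp(x_u\beta_u) \mid w]\Big] \\
&= E_{x_p^*,\,w}\big[\exp((x_p^* + \Delta)\beta_p + x_o\beta_o) - \exp(x_p^*\beta_p + x_o\beta_o)\big]
\end{aligned} \tag{51}
$$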

and

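$$
\begin{aligned}
PE^{MARG} &= E_{x_p^*,\,w,\,x_u}\left[\frac{\partial E[y \mid x_p^*, w, x_u]}{\partial x_p^*}\right] \\
&= E_{x_p^*,\,w}\left[\frac{\partial \exp(x_p^*\beta_p + x_o\beta_o)}{\partial x_p^*}\right] \\
&= E_{x_p^*,\,w}\big[\exp(x_p^*\beta_p + x_o\beta_o)\,\beta_p\big].
\end{aligned} \tag{52}
$$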

The corresponding feasible versions of (26) and (27) are

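$$\widehat{PE}^{INC}(\Delta) = \frac{1}{n}\sum_{i=1}^{n}\big\{\exp((x_{pi} + \Delta)\hat\beta_p + x_{oi}\hat\beta_o) - \exp(x_{pi}\hat\beta_p + x_{oi}\hat\beta_o)\big\} \tag{53}$$

$$\widehat{PE}^{MARG} = \frac{1}{n}\sum_{i=1}^{n}\exp(x_{pi}\hat\beta_p + x_{oi}\hat\beta_o)\,\hat\beta_p. \tag{54}$$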
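Because (53) and (54) involve only the estimated coefficients on the observables, they are simple to compute once β̂ is available. The sketch below is a minimal Python illustration; the function names and the representation of each $x_{oi}$ as a list of control values are assumptions made for the example, and the coefficient values used to exercise it are made up rather than taken from any estimation.

```python
import math

def exp_incremental_effect(x_p, x_o, beta_p, beta_o, delta):
    """Feasible incremental effect for the exponential model, as in (53)."""
    total = 0.0
    for p, o in zip(x_p, x_o):
        xb_o = sum(b * v for b, v in zip(beta_o, o))   # x_oi * beta_o
        total += math.exp((p + delta) * beta_p + xb_o) - math.exp(p * beta_p + xb_o)
    return total / len(x_p)

def exp_marginal_effect(x_p, x_o, beta_p, beta_o):
    """Feasible marginal effect for the exponential model, as in (54)."""
    total = 0.0
    for p, o in zip(x_p, x_o):
        xb_o = sum(b * v for b, v in zip(beta_o, o))
        total += math.exp(p * beta_p + xb_o) * beta_p
    return total / len(x_p)
```

As Δ shrinks, the incremental effect divided by Δ approaches the marginal effect, which is one quick way to sanity-check an implementation.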

6. Examples

The following three examples illustrate estimation of the average treatment effect, the

incremental effect, and the marginal effect measures given in (3), (4) and (5).

6.1 The Effect of Illegal Drug Use on Employment Status

For illustrative purposes we revisit the study by DeSimone (2002) in which the author

explores how illegal drug use (abuse) affects individual labor market outcomes. The author’s

apparent objective is the estimation of the average treatment effect of substance abuse (SA; $x_p$) on

employment status. We revisit this analysis with a well-defined PO-based estimation objective

[viz. (3)] and a clearly articulated method for achieving that objective [viz., the appropriate

version of (42)]. As discussed in the introduction to the present paper, the policy significance of

the average treatment effect as defined in (3) in this context is that it can be taken as a measure of

the potential benefit from a fully effective substance abuse policy aimed at prevention and

eradication.

We focus on one of the four year/drug combinations examined by DeSimone – viz.

1984/cocaine. In this case the outcome of interest and the policy variable are defined in the

following way

$$
y = \begin{cases} 1 & \text{if worked at all in past year} \\ 0 & \text{otherwise} \end{cases}
\qquad
x_p = \begin{cases} 1 & \text{if used cocaine at all in past year} \\ 0 & \text{otherwise.} \end{cases}
$$

We assume that $y_{x_p^*}$ (the potential outcome at the counterfactually mandated value of $x_p^*$ [1 or 0]) follows a probit-type parametric process of the form

$$y_{x_p^*} = I(x_p^*\beta_p + x_o\beta_o + x_u\beta_u + \varepsilon > 0) \tag{55}$$

where $(\varepsilon \mid x_o, x_u)$ is standard normal distributed. The model includes xu to allow for the

potential endogeneity of SA. For instance, the unobservable confounders (xu) may include

comorbidities – sicker individuals may have a lower than average propensity to use illegal drugs,

and a simultaneously lower than average likelihood of being employed. We formalize the

relationship between xp and xu by assuming that the following version of (39) holds

$$x_p = I(w\alpha + x_u > 0) \tag{56}$$

where w = [xo w+], (xu | w) is standard normal distributed, w+ is the vector of identifying

instrumental variables (discussed later), and α is a vector of unknown parameters to be estimated.

Given that the vector of confounders (controls), $x = [x_o \;\; x_u]$, is inherently comprehensive, (55) yields the following version of (40)

$$E[y_{x_p} \mid x_o, x_u] = E[y \mid x_p, x_o, x_u] = E[y \mid x_p, w, x_u] = \Phi(x_p\beta_p + x_o\beta_o + x_u\beta_u) \tag{57}$$

where Φ( ) denotes the standard normal cumulative distribution function (cdf). Using (57),

which summarizes (15) and (16) [(23) and (24)] in the present context, we can write (3) as the

following version of (41)

$$PE^{ATE} = E\left[\int_{-\infty}^{\infty}\big[\Phi(\beta_p + x_o\beta_o + x_u\beta_u) - \Phi(x_o\beta_o + x_u\beta_u)\big]\,\varphi(x_u)\,dx_u\right] \tag{58}$$

where φ( ) denotes the standard normal pdf. Equation (58) can be rewritten as21

$$PE^{ATE} = E\big[\Phi(\gamma_p + x_o\gamma_o) - \Phi(x_o\gamma_o)\big]. \tag{59}$$

where

$$
\gamma_p = \sqrt{1 - \rho^2}\;\beta_p, \qquad
\gamma_o = \sqrt{1 - \rho^2}\;\beta_o, \qquad
\rho = \mathrm{sgn}(\beta_u)\sqrt{\frac{\beta_u^2}{1 + \beta_u^2}} \tag{60}
$$

and sgn(q) = 1 (-1) if q is positive (negative). It can also be shown that $\gamma = [\gamma_p \;\; \gamma_o]$ (along with ρ) can be consistently estimated via conventional bivariate probit analysis using a “packaged” estimation procedure like “biprobit” in Stata 11®.22 The appropriate version of the average treatment effect estimator in (42) then is

21 Details are given in an appendix that will be supplied upon request.

treatment effect estimator in (42) then is

$$\widehat{PE}^{ATE} = \frac{1}{n}\sum_{i=1}^{n}\big[\Phi(\hat\gamma_p + x_{oi}\hat\gamma_o) - \Phi(x_{oi}\hat\gamma_o)\big] \tag{61}$$

where $\hat\gamma = [\hat\gamma_p \;\; \hat\gamma_o]$ denotes the bivariate probit estimate of γ. Asymptotic inference for (61) can be drawn from the following version of (38)

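$$\sqrt{\frac{n}{\widehat{a\,var}(\widehat{PE}^{ATE})}}\;(\widehat{PE}^{ATE} - PE^{ATE}) \xrightarrow{\;d\;} n(0,1) \tag{62}$$

where

$$\widehat{a\,var}(\widehat{PE}^{ATE}) = \left(\frac{\sum_{i=1}^{n}\nabla_\gamma pe_{ATEi}}{n}\right) n\,\widehat{ACOV}(\hat\gamma) \left(\frac{\sum_{i=1}^{n}\nabla_\gamma pe_{ATEi}}{n}\right)' + \frac{\sum_{i=1}^{n}(pe_{ATEi} - \widehat{PE}^{ATE})^2}{n}$$

$$pe_{ATEi} = \Phi(\hat\gamma_p + x_{oi}\hat\gamma_o) - \Phi(x_{oi}\hat\gamma_o)$$

$\widehat{ACOV}(\hat\gamma)$ denotes the estimated asymptotic covariance matrix of $\hat\gamma$

$$\nabla_\gamma pe_{ATEi} = [\nabla_{\gamma_p} pe_{ATEi} \;\;\; \nabla_{\gamma_o} pe_{ATEi}]$$

and the details of $\nabla_{\gamma_p} pe_{ATEi}$ and $\nabla_{\gamma_o} pe_{ATEi}$ are given in an appendix that will be supplied upon request.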
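For readers who want to see the pieces of (61) and the gradient terms side by side, the following Python sketch computes $pe_{ATEi}$ and its gradient for each observation. The gradient expressions used here follow from applying the chain rule to $\Phi(\gamma_p + x_{oi}\gamma_o) - \Phi(x_{oi}\gamma_o)$; they are offered as an assumption about what the unpublished appendix contains, not as a transcription of it. The function name and the list-based data layout are likewise illustrative.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_ate(x_o, gamma_p, gamma_o):
    """Sketch of (61) with per-observation gradients.

    x_o     : list of n rows of observable controls (include a 1 for the intercept)
    gamma_p : coefficient on the binary policy variable
    gamma_o : list of coefficients on the controls
    Returns (ATE estimate, list of pe_ATEi, list of gradient rows [d/dgamma_p, d/dgamma_o...]).
    """
    pe, grads = [], []
    for row in x_o:
        idx0 = sum(g * v for g, v in zip(gamma_o, row))   # index with x_p = 0
        idx1 = gamma_p + idx0                              # index with x_p = 1
        pe.append(std_normal_cdf(idx1) - std_normal_cdf(idx0))
        d_gamma_p = std_normal_pdf(idx1)
        d_gamma_o = [(std_normal_pdf(idx1) - std_normal_pdf(idx0)) * v for v in row]
        grads.append([d_gamma_p] + d_gamma_o)
    return sum(pe) / len(pe), pe, grads
```

The returned `pe` and `grads` lists are exactly the inputs needed for the two terms of the estimated asymptotic variance given above, together with the estimated covariance matrix of $\hat\gamma$ from the bivariate probit fit.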

22 Details are given in an appendix that will be supplied upon request.

We used the same analysis sample as DeSimone (2002), which was taken from the NLSY (Center for Human Resource Research 1995). A cohort of individuals aged 14–22 was interviewed in 1979, and reinterviewed annually until 1994. DeSimone uses the data from a supplemental drug use questionnaire that was administered in 1984 and 1988. There is also

background information on these respondents from the 1979 and 1980 panels. The sample

includes only males. The author did not exclude the respondents with missing household

income; rather he included a dummy variable, an indicator for missing nonwage income. The

definitions of all variables included in the model can be found in Table 1. The bivariate probit

results for γ and α are shown in columns 1 and 4 of Table 2, respectively. Combining these

results with (61) we estimated the average treatment effect of SA to be -.15. This estimate is not

significantly different from zero (p-value = .216) which is in contrast to the result obtained by

DeSimone (2002). His marginal effect estimate (undefined) is -.226 and it is significant at the

5% level (p-value = .015). The null hypothesis that xp is exogenous can be tested based on the

bivariate probit estimate of the correlation coefficient ρ. The policy variable will be exogenous

if and only if βu = 0 and, as can be seen in equation (60), this will be true if and only if ρ = 0. As

shown in the last line of Table 2, the estimated value of ρ is .43 with a p-value of .084, reflecting

only marginal significance. We also applied the conventional linear IV method which yielded an

average treatment effect estimate of -.268 (p-value <.01). Note the difference between this result

and that obtained via bivariate probit.
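The step from (58) to (59), and the link between ρ and βu in (60), can be checked numerically. The Python sketch below does this for made-up parameter values, using the fact that for standard normal $x_u$ and ε the integral of $\Phi(a + x_u\beta_u)\varphi(x_u)$ equals $\Phi(a/\sqrt{1+\beta_u^2})$. The function names and the Simpson-rule integration are illustrative choices, not part of the paper.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rho_from_beta_u(beta_u):
    """rho as in (60): sgn(beta_u) * sqrt(beta_u^2 / (1 + beta_u^2))."""
    sign = 1.0 if beta_u > 0 else (-1.0 if beta_u < 0 else 0.0)
    return sign * math.sqrt(beta_u ** 2 / (1.0 + beta_u ** 2))

def gamma_from_beta(beta, beta_u):
    """gamma = sqrt(1 - rho^2) * beta, as in (60)."""
    rho = rho_from_beta_u(beta_u)
    return math.sqrt(1.0 - rho ** 2) * beta

def integrated_probit(a, beta_u, lo=-10.0, hi=10.0, steps=4000):
    """Simpson approximation to the integral of Phi(a + beta_u*u) * phi(u) du, as in (58)."""
    h = (hi - lo) / steps
    f = lambda u: std_normal_cdf(a + beta_u * u) * std_normal_pdf(u)
    total = f(lo) + f(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0
```

For any index value `a`, `integrated_probit(a, beta_u)` should agree with `std_normal_cdf(gamma_from_beta(a, beta_u))` up to quadrature error, and `rho_from_beta_u` returns zero exactly when βu is zero, which is the equivalence the exogeneity test relies on.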

For the purpose of comparison, we estimated the employment equation with simple

probit analysis ignoring the potential endogeneity of the drug use variable. In this case, we

estimated the model as defined in (55) and (57) after setting βu = 0. Here (58) becomes

$$PE^{ATE} = E\big[\Phi(\beta_p + x_o\beta_o) - \Phi(x_o\beta_o)\big] \tag{63}$$

which can be consistently estimated using the following version of (61)

$$\widehat{PE}^{ATE} = \frac{1}{n}\sum_{i=1}^{n}\big[\Phi(\hat\beta_p + x_{oi}\hat\beta_o) - \Phi(x_{oi}\hat\beta_o)\big] \tag{64}$$

where $\hat\beta = [\hat\beta_p \;\; \hat\beta_o]$ denotes the simple (conventional) probit estimate of $\beta = [\beta_p \;\; \beta_o]$. The correct asymptotic t-stat for (64) is obtained using the appropriate versions of (37) and (38). The simple probit estimates, along with their attendant t-stats and p-values are shown in columns 7 through 9 of Table 2. Plugging these results into (64) we estimated the average treatment

effect of illegal drug use to be .005. This result is both counterintuitive and insignificant (p-

value = .353). All of the estimation was conducted using Stata/Mata® 11 code which will be

supplied upon request.

This example demonstrates the importance of clear and rigorous upfront definition of the

estimation objective. We were very clear and careful on this point in our re-analysis of the

DeSimone study. We cast the analysis in a PO framework and rigorously defined the

counterfactual effect of interest to be the average treatment effect as specified in (3). We then

applied the appropriate version of the corresponding consistent estimation protocol as developed

and discussed above. The author neither specifies his estimation objective nor offers any detail

regarding the marginal effect estimator he implements (results given in his Table 4).23 Perhaps

most importantly, there is no discussion of the relevant underlying asymptotic theory. In

particular, the formulation of the asymptotic standard errors or asymptotic t-stats of his marginal

effect estimators is not mentioned. Our estimated average treatment effect is 33% smaller than

his marginal effect estimate and in contrast to the marginal effect estimate reported by the author, it is statistically insignificant. It is difficult to comment on possible reasons for the divergence of our result from the author’s because there is very little detail reported in the paper about his estimation objective and the marginal effect estimator he implemented.

23 The author reports “… changes in probability of employment induced by a change in the variable from zero to one while holding all other variables constant…” but neither discusses exactly how these “marginal effects” are computed nor rigorously defines the population-level effects that they are intended to estimate.

6.2 The Effect of Smoking During Pregnancy on Birthweight

Here we revisit the very clever work of Mullahy (1997) in which he develops an easy to

apply GMM method for the exponential regression case (mentioned above in section 5.2).

Because his paper is mainly methodological in thrust, no attention is paid to the specification and

estimation of counterfactual effects like those defined in (3), (4) and (5). As one of his

illustrative applications of the method, he regresses infant birthweight (y) on smoking during

pregnancy ($x_p$), along with binary control variables indicating race and sex. His GMM method

is used to account for the potential endogeneity of the smoking variable. In the present example,

we use Mullahy’s birthweight-smoking GMM regression results to estimate the potential effect of a counterfactual experiment in which all who smoke during pregnancy have their cigarette consumption ($x_p^*$) exogenously reduced to zero. The corresponding estimation objective is the following variant of (51)

$$PE^{INC} = E\big[\exp(x_o\beta_o) - \exp(x_p^*\beta_p + x_o\beta_o)\big] \tag{65}$$

in which $x_p^*$ is fixed at $x_p$ (the default policy scenario), and the expectation is taken over the subpopulation who smoked during pregnancy ($x_p > 0$). This is the potential expected change in birthweight that would have prevailed had the women who smoked quit before becoming pregnant. The following version of (49) can be used to consistently estimate (65)

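$$\widehat{PE}^{INC} = \frac{1}{n_s}\sum_{i=1}^{n_s}\big\{\exp(x_{oi}\hat\beta_o) - \exp(x_{pi}\hat\beta_p + x_{oi}\hat\beta_o)\big\} \tag{66}$$

where ns denotes the number of mothers who smoked during pregnancy and $\hat\beta = [\hat\beta_p \;\; \hat\beta_o]'$ is the GMM estimate of $\beta = [\beta_p \;\; \beta_o]'$. Asymptotic inference for (66) can be drawn from the following version of (38)

$$\sqrt{\frac{n_s}{\widehat{a\,var}(\widehat{PE}^{INC})}}\;(\widehat{PE}^{INC} - PE^{INC}) \xrightarrow{\;d\;} n(0,1) \tag{67}$$

where

$$\widehat{a\,var}(\widehat{PE}^{INC}) = \left(\frac{\sum_{i=1}^{n}\nabla_\beta pe_{INCi}}{n}\right) n\,\widehat{ACOV}(\hat\beta) \left(\frac{\sum_{i=1}^{n}\nabla_\beta pe_{INCi}}{n}\right)' + \frac{\sum_{i=1}^{n}(pe_{INCi} - \widehat{PE}^{INC})^2}{n}$$

$$pe_{INCi} = \exp(x_{oi}\hat\beta_o) - \exp(x_{pi}\hat\beta_p + x_{oi}\hat\beta_o)$$

$\widehat{ACOV}(\hat\beta)$ denotes the estimated asymptotic covariance matrix of $\hat\beta$

$$\nabla_\beta pe_{INCi} = [\nabla_{\beta_p} pe_{INCi} \;\;\; \nabla_{\beta_o} pe_{INCi}]$$

and the details of $\nabla_{\beta_p} pe_{INCi}$ and $\nabla_{\beta_o} pe_{INCi}$ are given in an appendix that will be supplied upon request.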

We used the same data as Mullahy (1997) which was taken from the Child Health

Supplement to the 1988 National Health Interview Survey. His analysis sample has 1,388

observations. The definitions of the variables included in the regressions are given in Table 3.

Mullahy’s GMM results are replicated in columns 1 through 3 of Table 4. Combining these

results with (66), we estimated the average incremental effect of quitting before pregnancy [the

counterfactual specified in (65)] to be .935. This estimate, which is significantly different from

zero (p-value = .001), says that if women who smoked had quit before becoming pregnant it

would have resulted in an average increase in birthweight of nearly 15 oz. for their babies. For

the purpose of comparison we also estimated the model, ignoring the possible endogeneity of

smoking, by applying simple NLS exponential regression. The NLS results are displayed in

columns 4 through 6 of Table 4. In principle, the estimation objective is still (65) and the

corresponding incremental effect estimator is

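$$\widetilde{PE}^{INC} = \frac{1}{n_s}\sum_{i=1}^{n_s}\big\{\exp(x_{oi}\tilde\beta_o) - \exp(x_{pi}\tilde\beta_p + x_{oi}\tilde\beta_o)\big\} \tag{68}$$

where $\tilde\beta = [\tilde\beta_p \;\; \tilde\beta_o]'$ is the NLS estimate of β. The correct asymptotic t-stat for (68) is obtained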

using the appropriate versions of (37) and (38). Plugging the NLS estimate into (68) we

estimated the incremental effect of smoking during pregnancy to be .465. This value is also

statistically significant (p-value < .001) but is less than half the size of the estimated effect when

endogeneity is taken into account. All of the estimation was conducted using Stata/Mata® 11

code which will be supplied upon request.

6.3 Impact of Prescription Drug Use on Hospital Costs

Erten and Terza (2011) examine the effect of appropriate prescription drug (Rx) use on

subsequent hospital costs using the data analyzed by Stuart et al. (2009). Following Stuart et al.

(2009), they posit the following two-part model of hospital expenditure which accommodates the

fact that this outcome is typically observed with high frequency at zero. In this case, as in

similar empirical contexts, the two-part specification allows the process governing observation at

zero (e.g. whether or not the individual is hospitalized) to systematically differ from that which

determines non-zero observations (e.g. in-patient hospital expenditure for those who are

hospitalized). The former can be described as the hurdle component of the model, and the latter

is often called the levels part of the model. The hurdle ($h_{x_p^*}$) and levels ($y_{x_p^*}$) components of potential hospital expenditure at a counterfactually mandated level of Rx usage ($x_p^*$) are, respectively, specified as

hurdle: $$h_{x_p^*} = I(x_p^*\beta_{p1} + x_o\beta_{o1} + x_u\beta_{u1} + \varepsilon_1 > 0) \tag{69}$$

levels: $$y_{x_p^*} = \exp(x_p^*\beta_{p2} + x_o\beta_{o2} + x_u\beta_{u2} + \varepsilon_2) \tag{70}$$

where the random variable $y_{x_p^*}$ is defined only if h = 1; the βs are parameters to be estimated; ε1 and ε2 are the random error terms; (ε1 | x) has a known distribution and $E[\exp(\varepsilon_2) \mid x_p^*, x_o, x_u] = 1$. The model includes latent confounders (xu) to allow for the

potential endogeneity of Rx utilization. For instance xu may include unobserved (by the

researcher) health status. If less healthy individuals tend to be admitted to the hospital and

remain there longer, and if Rx use is negatively correlated with good health then the (expected)

negative effect of Rx use on hospital expenditure will be underestimated (or even positive)

according to conventional methods that do not take account of endogeneity. The relationship

between xp and xu is formalized by assuming that the following version of (44) holds

$$x_p = \exp(w\alpha) + x_u \tag{71}$$

where w is defined as in (39), and $E[x_u \mid w] = 0$. Given that the vector of confounders (controls), $x = [x_o \;\; x_u]$, is inherently comprehensive, (69) and (70) yield the following version of (40)

$$
\begin{aligned}
E[y_{x_p} \mid x_o, x_u] &= E[y \mid x_p, x_o, x_u] = E[y \mid x_p, w, x_u] \\
&= \Phi(x_p\beta_{p1} + x_o\beta_{o1} + x_u\beta_{u1})\,\exp(x_p\beta_{p2} + x_o\beta_{o2} + x_u\beta_{u2}).
\end{aligned} \tag{72}
$$

Using (72), which summarizes (15) and (16) [(23) and (24)] in the present context, we can write

(5) as the following version of (46)

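$$
PE^{MARG} = E_{w,\,x_p}\left[\frac{\partial\,\Phi(x_p^*\beta_{p1} + x_o\beta_{o1} + [x_p - \exp(w\alpha)]\beta_{u1})\,\exp(x_p^*\beta_{p2} + x_o\beta_{o2} + [x_p - \exp(w\alpha)]\beta_{u2})}{\partial x_p^*}\right]. \tag{73}
$$

The appropriate version of the marginal effect estimator in (50) then is

$$
\begin{aligned}
\widehat{PE}^{MARG} &= \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{\partial\,\Phi(x_{pi}\hat\beta_{p1} + x_{oi}\hat\beta_{o1} + \hat{x}_{ui}\hat\beta_{u1})\,\exp(x_{pi}\hat\beta_{p2} + x_{oi}\hat\beta_{o2} + \hat{x}_{ui}\hat\beta_{u2})}{\partial x_p}\right\} \\
&= \frac{1}{n}\sum_{i=1}^{n}\Big\{\varphi(x_{pi}\hat\beta_{p1} + x_{oi}\hat\beta_{o1} + \hat{x}_{ui}\hat\beta_{u1})\,\exp(x_{pi}\hat\beta_{p2} + x_{oi}\hat\beta_{o2} + \hat{x}_{ui}\hat\beta_{u2})\,\hat\beta_{p1} \\
&\qquad\qquad + \Phi(x_{pi}\hat\beta_{p1} + x_{oi}\hat\beta_{o1} + \hat{x}_{ui}\hat\beta_{u1})\,\exp(x_{pi}\hat\beta_{p2} + x_{oi}\hat\beta_{o2} + \hat{x}_{ui}\hat\beta_{u2})\,\hat\beta_{p2}\Big\}
\end{aligned} \tag{74}
$$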

where $\hat{x}_{ui} = x_{pi} - \exp(w_i\hat\alpha)$; and $\hat\beta_1 = [\hat\beta_{p1} \;\; \hat\beta_{o1}' \;\; \hat\beta_{u1}]'$, $\hat\beta_2 = [\hat\beta_{p2} \;\; \hat\beta_{o2}' \;\; \hat\beta_{u2}]'$ and $\hat\alpha$ are the 2SRI estimates of $\beta_1 = [\beta_{p1} \;\; \beta_{o1}' \;\; \beta_{u1}]'$, $\beta_2 = [\beta_{p2} \;\; \beta_{o2}' \;\; \beta_{u2}]'$ and α obtained in the following three stages:

First Stage

Obtain $\hat\alpha$ by applying NLS to (71) and compute $\hat{x}_{ui}$.

Second Stage

Apply conventional probit analysis to the hurdle component of the model with $\hat{x}_{ui}$ substituted for $x_u$ to get $\hat\beta_1$.

Third Stage

Apply NLS to the levels component of the model with $\hat{x}_{ui}$ substituted for $x_u$ to get $\hat\beta_2$.
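Once the three stages have produced α̂, β̂1 and β̂2, the marginal effect in (74) is a plain sample average. The Python sketch below illustrates that final step only; it does not perform the NLS or probit estimation. The function names, the dictionary layout for the coefficient vectors, and the numerical values used to exercise the code are all assumptions for illustration and are unrelated to the estimates reported in the paper.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def index(x_p, x_o, x_u, b):
    """Linear index x_p*b_p + x_o*b_o + x_u*b_u for one part of the model."""
    return x_p * b["p"] + sum(c * v for c, v in zip(b["o"], x_o)) + x_u * b["u"]

def two_part_mean(x_p, x_o, x_u, b1, b2):
    """Conditional mean in (72): Phi(hurdle index) * exp(levels index)."""
    return std_normal_cdf(index(x_p, x_o, x_u, b1)) * math.exp(index(x_p, x_o, x_u, b2))

def residuals_exp_first_stage(x_p, w, alpha_hat):
    """First-stage residuals for (71): x_u_hat_i = x_pi - exp(w_i * alpha_hat)."""
    return [p - math.exp(sum(a * v for a, v in zip(alpha_hat, w_i)))
            for p, w_i in zip(x_p, w)]

def two_part_marginal_effect(x_p, x_o, x_u_hat, b1, b2):
    """Marginal effect as in (74), holding the first-stage residual fixed."""
    total = 0.0
    for p, o, u in zip(x_p, x_o, x_u_hat):
        i1, i2 = index(p, o, u, b1), index(p, o, u, b2)
        total += (std_normal_pdf(i1) * math.exp(i2) * b1["p"]
                  + std_normal_cdf(i1) * math.exp(i2) * b2["p"])
    return total / len(x_p)
```

The two terms in the sum correspond to the two channels in a two-part model: the effect of the policy variable on the probability of clearing the hurdle, and its effect on the level of the outcome given that the hurdle is cleared.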

Asymptotic inference for (74) can be drawn from the following version of (38)

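$$\sqrt{\frac{n}{\widehat{a\,var}(\widehat{PE}^{MARG})}}\;(\widehat{PE}^{MARG} - PE^{MARG}) \xrightarrow{\;d\;} n(0,1) \tag{75}$$

where

$$\widehat{a\,var}(\widehat{PE}^{MARG}) = \left(\frac{\sum_{i=1}^{n}\nabla_\eta pe_{MARGi}}{n}\right) n\,\widehat{ACOV}(\hat\eta) \left(\frac{\sum_{i=1}^{n}\nabla_\eta pe_{MARGi}}{n}\right)' + \frac{\sum_{i=1}^{n}(pe_{MARGi} - \widehat{PE}^{MARG})^2}{n}$$

$$
\begin{aligned}
pe_{MARGi} = {} & \varphi(x_{pi}\hat\beta_{p1} + x_{oi}\hat\beta_{o1} + \hat{x}_{ui}\hat\beta_{u1})\,\exp(x_{pi}\hat\beta_{p2} + x_{oi}\hat\beta_{o2} + \hat{x}_{ui}\hat\beta_{u2})\,\hat\beta_{p1} \\
& + \Phi(x_{pi}\hat\beta_{p1} + x_{oi}\hat\beta_{o1} + \hat{x}_{ui}\hat\beta_{u1})\,\exp(x_{pi}\hat\beta_{p2} + x_{oi}\hat\beta_{o2} + \hat{x}_{ui}\hat\beta_{u2})\,\hat\beta_{p2}
\end{aligned}
$$

$\widehat{ACOV}(\hat\eta)$ denotes the estimated asymptotic covariance matrix of $\hat\eta = [\hat\beta_1 \;\; \hat\beta_2 \;\; \hat\alpha]$

$$\nabla_\eta pe_{MARGi} = [\nabla_{\beta_1} pe_{MARGi} \;\;\; \nabla_{\beta_2} pe_{MARGi} \;\;\; \nabla_\alpha pe_{MARGi}]$$

and the details of $\nabla_{\beta_1} pe_{MARGi}$, $\nabla_{\beta_2} pe_{MARGi}$ and $\nabla_\alpha pe_{MARGi}$ are given in an appendix that will be supplied upon request.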

Erten and Terza (2011) use the same data as Stuart et al. (2009) which were taken from

the 1999 and 2000 Medicare Current Beneficiary Survey (MCBS). It includes information on

health status, health care use and expenditures, health insurance coverage, and socioeconomic

and demographic characteristics of a nationally representative sample of Medicare beneficiaries.

Stuart et al. (2009) created a subsample of Medicare beneficiaries who were enrolled for 24

months with continuous Medicare Part A and Part B coverage. Moreover, to be included in the

analysis sample an individual was required to have had continuous drug coverage during the

study period of 24 months or no coverage at all. The sample size is 3,101 with 20 percent of the

observations having positive hospital costs. Stuart et al. (2009) used insurance coverage for Rx

as the identifying IV. To allay possible concerns about the validity of this IV, in identifying the

effect of Rx usage, Erten and Terza (2011) instead used determinants of Rx coverage that only

affect hospital expenditure through their effect on Rx usage (via their effect on Rx coverage).

The four IVs that they used are: (1) The percent of the work force in the respondent’s state that is

unionized -- a unionized work force has a higher probability of having coverage and indirectly

more prescription drug usage; (2) The average premium for Medigap Plan H, I, and J in the state

-- coverage for prescription drugs can be higher in those states with lower average premiums;

(3) A variable indicating if the state has a pharmaceutical assistance plan for low income elders and/or disabled Medicare beneficiaries; and (4) The state per capita income -- wealthy states

can be associated with more generous Medicare supplemental policies, including drug coverage.

The variables included in the analysis are defined in Table 5. Results for the 2SRI estimates are

shown in Tables 6 and 7. Plugging the 2SRI estimates into (74) we estimated the marginal effect

of Rx usage to be -$140.48. This value is statistically significant (p-value < .001). By

comparison, estimating the model using the conventional two-part approach (no correction for

endogeneity) produces a marginal effect estimate of $15.69 (p-value <.001). Note that the linear

IV method yields -$82.27 as the marginal effect estimate (p-value = .527). All of the estimation

was conducted using Stata/Mata® 11 code which will be supplied upon request.

7. Discussion

This paper offers a generic and unified framework for empirical policy analysis via NR

estimation. Although the use of NR results by empirical health policy researchers abounds, how

such results are related to PO-based policy relevant objectives is seldom discussed. Equally rare

in empirical studies is policy relevant interpretation of NR results. In this paper we draw the

currently absent, but much needed, connection between the PO approach and NR analysis. From

this discussion follow: 1) well-defined PO-based policy relevant estimation objectives; 2)

consistent estimators for these objectives designed to accommodate potential endogeneity; 3)

correct accompanying asymptotic inference; and 4) ease of policy relevant interpretation of

estimation results. To complete the discussion we offer real-data examples of the concepts and

methods, along with supporting Stata/Mata® 11 software.

References

Angrist and Pischke (2009): Mostly Harmless Econometrics, Princeton, NJ: Princeton University Press.

Basu, A. and Rathouz, P.J. (2005): “Estimating Marginal and Incremental Effects on Health Outcomes Using Flexible Link and Variance Function Models,” Biostatistics, 6, 93-109.

Basu, A., Heckman, J.J., Navarro-Lozano and Urzua, S. (2007): “The Use of Instrumental Variables in the Presence of Heterogeneity and Self-Selection: An Application to Treatments of Breast Cancer,” Health Economics, 16, 1133-1157.

Bierens, H. J. (2004): Introduction to the Mathematical and Statistical Foundations of Econometrics, New York: Cambridge University Press.

Center for Human Resource Research (1995): NLS Users’ Guide. Columbus: Ohio State University.

DeSimone, J. (2002): “Illegal Drug Use and Employment,” Journal of Labor Economics, 20, 952–977.

Erten, M.Z. and Terza, J.V. (2011): “Skewed Outcomes and Endogenous Regressors: Prescription Drug Utilization and Hospital Cost Offsets,” Unpublished Manuscript, Peter Lamy Center on Drug Therapy and Aging, Department of Pharmaceutical Health Services Research, University of Maryland.

Mullahy, J. (1997): “Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior,” Review of Economics and Statistics, 79, 586-593.

Murphy, K.M., and Topel, R.H. (1985): “Estimation and Inference in Two-Step Econometric Models,” Journal of Business and Economic Statistics, 3, 370-379.

Newey, W.K. and McFadden, D. (1994): Large Sample Estimation and Hypothesis Testing, Handbook of Econometrics, Engle, R.F., and McFadden, D.L., Amsterdam: Elsevier Science B.V., 2111-2245, Chapter 36.

Rubin (1974): “Estimating Causal Effects of Treatments in Randomized and Non-randomized Studies,” Journal of Educational Psychology, 66, 688-701.

Rubin (1977): “Assignment to a Treatment Group on the Basis of a Covariate,” Journal of Educational Statistics, 2, 1-26.

StataCorp (2009): Stata: Release 11 Statistical Software, College Station, TX: StataCorp LP.

Stuart, B.C, Doshi, J., and Terza, J.V. (2009): “Assessing the Impact of Drug Use on Hospital Costs,” Health Services Research, 44, 128-144.

Terza, J. (1994): “An Estimator for Nonlinear Regression Models with Endogenous Switching and Sample Selection,” Working paper, Department of Economics, Penn State University.

______ (1999): “Estimating Endogenous Treatment Effects in Retrospective Data Analysis,” Value in Health, 2, 429-434.

______ (2006): “Estimation of Policy Effects Using Parametric Nonlinear Models: A Contextual Critique of the Generalized Method of Moments,” Health Services and Outcomes Research Methodology, 6, 177-198.

______ (2009): “Parametric Nonlinear Regression with Endogenous Switching,” Econometric Reviews, 28, 555-580.

______, Basu, A. and Rathouz, P. (2008): “Two-Stage Residual Inclusion Estimation: Addressing Endogeneity in Health Econometric Modeling,” Journal of Health Economics, 27, 531-543.

White, H. (1994): Estimation, Inference and Specification Analysis, New York: Cambridge University Press.

Wooldridge, J.M. (2010): Econometric Analysis of Cross Section and Panel Data, 2nd Ed. Cambridge, MA: MIT Press.

Table 1: Illegal Drug Use and Employment: Variable Definitions

| Variable | Type | Definition |
|---|---|---|
| Outcome Variable (y) | | |
| work84 | binary (dependent variable) | = 1 if hours worked past year 1984 is positive, 0 otherwise |
| Potentially Endogenous Policy Variable (xp) | | |
| cyr84 | binary (treatment variable) | = 1 if used cocaine past year 1984, 0 otherwise |
| Observable Confounders (xo) | | |
| educ84 | count | educational attainment May 1984 (years of education) |
| black | binary | = 1 if black, 0 otherwise |
| hisp | binary | = 1 if Hispanic, 0 otherwise |
| urate84 | continuous | local unemployment rate 1984 |
| metro84 | binary | = 1 if Standard Metropolitan Statistical Area residence 1984, 0 otherwise |
| city84 | binary | = 1 if central city residence 1984, 0 otherwise |
| neincr84^a | continuous | fmincr84 - eincr84 (faminc84 in 1982-84 $ - einc84 in 1982-84 $) |
| noinc84 | binary | = 1 if “neincr84” is unknown or negative, 0 otherwise |
| age84 | count | age at interview 1984 |
| afqt2 | continuous | revised afqt percentile 1980 (afqt: the Armed Forces Qualification Test (AFQT) percentile score from the 1980 survey) |
| mothed | count | mother education (years of education) |
| mothwork | binary | = 1 if mother worked in 1978, 0 otherwise |
| fathwork | binary | = 1 if father worked in 1978, 0 otherwise |
| intpar84 | binary | = 1 if parent present at the interview 84, 0 otherwise |
| intfr84 | binary | = 1 if friend present at the interview 84, 0 otherwise |
| inthh84 | binary | = 1 if other hh member present at interview 84, 0 otherwise |
| Instrumental Variables (w+) | | |
| both14 | binary | = 1 if father-mother present age 14, 0 otherwise |
| alcpar | binary | = 1 if has an alcoholic parent, 0 otherwise |
| decrim84 | binary | = 1 if marijuana decriminalized 1984, 0 otherwise |
| pcyrr84^a | continuous | pcyr84 in 1982-84 $ (pcyr84: cocaine price past year 1984) |

^a Cocaine price and nonwage income are adjusted to 1982–84 real values using the Consumer Price Index for all urban consumers.
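The adjustment described in footnote a can be illustrated with a minimal sketch. It assumes the usual convention that the price index equals 100 in the base period (here 1982–84), so that a nominal amount is converted to base-period dollars by dividing by the index and multiplying by 100. The function name and the numbers are hypothetical and are not taken from the data used in the paper.

```python
def to_real_dollars(nominal: float, cpi: float, base_index: float = 100.0) -> float:
    """Convert a nominal dollar amount to base-period (real) dollars.

    Assumes `cpi` is a price index that equals `base_index` in the base period.
    """
    if cpi <= 0:
        raise ValueError("price index must be positive")
    return nominal * base_index / cpi


# Hypothetical illustration: a nominal amount of 125 when the index stands at 125
# is worth 100 in base-period dollars.
real_value = to_real_dollars(125.0, 125.0)
```

The same function would apply to both the cocaine price and the nonwage income variables, given the index value for the relevant year.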

Table 2: Simple and Bivariate Probit Results

Column groups: (1) Bivariate Probit Estimates of Employment Equation (Corrected for Endogeneity); (2) Bivariate Probit Estimates of Illegal Drug Use Equation; (3) Simple Probit Estimates of Employment Equation (Not Corrected for Endogeneity).

| Variable | (1) Estimate | (1) z-stat | (1) p-val | (2) Estimate | (2) z-stat | (2) p-val | (3) Estimate | (3) z-stat | (3) p-val |
|---|---|---|---|---|---|---|---|---|---|
| cyr84 | -0.795 | -1.67 | 0.095 | -- | -- | -- | 0.042 | 0.35 | 0.729 |
| educ84 | 0.069 | 2.84 | 0.004 | -0.073 | -3.57 | <.001 | 0.080 | 3.35 | 0.001 |
| black | -0.379 | -4.06 | <.001 | -0.153 | -1.69 | 0.091 | -0.362 | -3.81 | <.001 |
| hisp | 0.104 | 0.83 | 0.407 | -0.102 | -1.02 | 0.307 | 0.097 | 0.76 | 0.447 |
| urate84 | -0.061 | -5.39 | <.001 | -0.033 | -3.23 | 0.001 | -0.060 | -5.16 | <.001 |
| metro84 | -0.032 | -0.36 | 0.717 | 0.248 | 3.00 | 0.003 | -0.067 | -0.75 | 0.454 |
| city84 | -0.089 | -0.86 | 0.389 | 0.233 | 2.74 | 0.006 | -0.123 | -1.18 | 0.239 |
| neincr84^a | -0.093 | -2.98 | 0.003 | 0.016 | 0.67 | 0.501 | -0.093 | -2.94 | 0.003 |
| noinc84 | -0.258 | -2.72 | 0.007 | 0.065 | 0.75 | 0.451 | -0.268 | -2.78 | 0.005 |
| age84 | 0.005 | 0.29 | 0.774 | -0.007 | -0.48 | 0.635 | 0.007 | 0.38 | 0.703 |
| afqt2 | 0.097 | 4.73 | <.001 | 0.083 | 5.30 | <.001 | 0.089 | 4.26 | <.001 |
| mothed | 0.002 | 0.12 | 0.905 | 0.038 | 2.98 | 0.003 | -0.004 | -0.27 | 0.790 |
| mothwork | 0.180 | 2.34 | 0.019 | 0.157 | 2.38 | 0.017 | 0.167 | 2.14 | 0.032 |
| fathwork | 0.259 | 3.01 | 0.003 | -0.127 | -1.56 | 0.119 | 0.283 | 3.26 | 0.001 |
| intpar84 | -0.038 | -0.24 | 0.811 | -0.150 | -0.95 | 0.343 | -0.025 | -0.16 | 0.875 |
| intfr84 | -0.266 | -1.20 | 0.229 | 0.333 | 1.81 | 0.070 | -0.331 | -1.47 | 0.142 |
| inthh84 | 0.085 | 0.68 | 0.497 | -0.207 | -1.99 | 0.047 | 0.116 | 0.93 | 0.354 |
| intercept | 0.819 | 1.72 | 0.085 | 0.828 | 1.63 | 0.103 | 0.666 | 1.38 | 0.166 |
| both14 | -- | -- | -- | -0.078 | -1.05 | 0.292 | | | |
| alcpar | -- | -- | -- | 0.334 | 4.58 | <.001 | | | |
| decrim84 | -- | -- | -- | 0.167 | 2.54 | 0.011 | | | |
| pcyrr84^a | -- | -- | -- | -0.786 | -5.95 | <.001 | | | |
| ρ | 0.431 | 1.73 | 0.084 | | | | | | |
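As a reading aid for the z-stat and p-val columns, the following minimal sketch recomputes a p-value from a reported z-statistic. It assumes the reported p-values come from a two-sided test against the standard normal distribution; that assumption is consistent with the entries checked here but is not stated in the table itself. The function name is hypothetical.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z-statistic under the standard normal distribution."""
    # P(|Z| > |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


# cyr84 in the bivariate probit employment equation: z = -1.67, reported p-val 0.095
p_cyr84 = two_sided_p_value(-1.67)

# rho in the bivariate probit model: z = 1.73, reported p-val 0.084
p_rho = two_sided_p_value(1.73)
```

Because the table reports z-statistics rounded to two decimals, a recomputed p-value can differ from the reported one in the third decimal place.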

Table 3: Birthweight Model: Variable Definitions

| Variable | Type | Definition |
|---|---|---|
| Outcome Variable (y) | | |
| BIRTHWT | continuous (dependent variable) | = Birthweight (lbs.) |
| Potentially Endogenous Policy Variable (xp) | | |
| CIGSPREG | count (treatment variable) | = cigarettes smoked per day during pregnancy |
| Observable Confounders (xo) | | |
| PARITY | count | = birth order |
| WHITE | binary | = 1 if white, 0 otherwise |
| MALE | binary | = 1 if male, 0 otherwise |
| Instrumental Variables (w+) | | |
| EDFATHER | continuous | = years of education of father |
| EDMOTHER | continuous | = years of education of mother |
| FAMINCOM | continuous | = family income (/1000) |
| CIGTAX88 | continuous | = per pack state excise tax on cigarettes |

Table 4: GMM and NLS Results

Column groups: (1) GMM Estimates of Birthweight Equation (Corrected for Endogeneity); (2) NLS Estimates of Birthweight Equation (Not Corrected for Endogeneity).

| Variable | (1) Estimate | (1) z-stat | (1) p-val | (2) Estimate | (2) z-stat | (2) p-val |
|---|---|---|---|---|---|---|
| CIGSPREG | -0.005 | -5.66 | <.001 | -0.010 | -3.46 | 0.001 |
| PARITY | 0.014 | 2.94 | 0.003 | 0.018 | 3.33 | 0.001 |
| WHITE | 0.056 | 4.97 | <.001 | 0.054 | 4.44 | <.001 |
| MALE | 0.026 | 2.89 | 0.004 | 0.027 | 2.95 | 0.003 |
| intercept | 1.932 | 134.68 | <.001 | 1.939 | 121.71 | <.001 |

Table 5: Hospital Cost-Offsets Model: Variable Definitions

| Variable | Type | Definition |
|---|---|---|
| Outcome Variables (hurdle – h; levels – y) | | |
| anyuse | binary (hurdle variable) | = 1 if hospitalized (positive hospital expenditures), 0 otherwise |
| clm_inp_yr2 | continuous (levels variable) | Hospital inpatient expenditures |
| Potentially Endogenous Policy Variable (xp) | | |
| num_rx_yr2 | count | Number of prescription fills |
| Observable Confounders (xo) | | |
| white_yr2 | binary | = 1 if race is white, 0 otherwise |
| disabled_yr2 | binary | Medicare entitlement status: = 1 if SSDI disabled (<65), 0 otherwise |
| disaged_yr2 | binary | Medicare entitlement status: = 1 if Aged/previously disabled (>65), 0 otherwise |
| age74_yr2 | binary | = 1 if 70 < age < 74, 0 otherwise |
| age79_yr2 | binary | = 1 if 75 < age < 79, 0 otherwise |
| age80plus_yr2 | binary | = 1 if age > 80, 0 otherwise |
| marry_yr2 | binary | = 1 if married, 0 otherwise |
| female_yr2 | binary | = 1 if female, 0 otherwise |
| rural_yr2 | binary | = 1 if residence is rural, 0 otherwise |
| hsgrad_yr2 | binary | = 1 if high school graduate, 0 otherwise |
| mid_yr2 | binary | = 1 if resides in Midwest census region, 0 otherwise |
| south_yr2 | binary | = 1 if resides in southern census region, 0 otherwise |
| west_yr2 | binary | = 1 if resides in western census region, 0 otherwise |
| pay3_yr2 | binary | = 1 if annual income $10,001 - $20,000, 0 otherwise |
| pay4_yr2 | binary | = 1 if annual income $20,001 - $30,000, 0 otherwise |
| pay5_7_yr2 | binary | = 1 if annual income > $30,000, 0 otherwise |
| risk_adjuster | continuous | DCG/HCC risk adjuster |
| Instrumental Variables (w+) | | |
| UNION | continuous | % State workforce unionized |
| gap_Prem | continuous | Mean annual state-level Medigap premium (H,I,J plans) |
| spap | binary | = 1 if State has pharmaceutical assistance plans, 0 otherwise |
| PC_INCOME | continuous | State per capita income |

Table 6: 2SRI Two-Part Model Results – Corrected for Endogeneity

Column groups: (1) Probit-2SRI Estimates of Hurdle Component (Hospitalized or Not); (2) NLS-2SRI Estimates of Levels Component (Expenditure if Hospitalized).

| Variable | (1) Estimate | (1) z-stat | (1) p-val | (2) Estimate | (2) z-stat | (2) p-val |
|---|---|---|---|---|---|---|
| num_rx_yr2 | -0.024 | -2.578 | 0.010 | -0.040 | -2.003 | 0.045 |
| white_yr2 | 0.212 | 1.806 | 0.071 | 0.368 | 1.837 | 0.066 |
| disabled_yr2 | 0.024 | 0.147 | 0.883 | 0.753 | 2.003 | 0.045 |
| disaged_yr2 | 0.560 | 3.544 | <.001 | 0.787 | 2.342 | 0.019 |
| age74_yr2 | -0.145 | -1.177 | 0.239 | 0.078 | 0.399 | 0.690 |
| age79_yr2 | -0.057 | -0.448 | 0.654 | 0.154 | 0.694 | 0.487 |
| age80plus_yr2 | -0.010 | -0.077 | 0.939 | -0.041 | -0.195 | 0.846 |
| marry_yr2 | -0.017 | -0.209 | 0.834 | 0.199 | 1.614 | 0.107 |
| female_yr2 | 0.085 | 0.839 | 0.401 | 0.083 | 0.505 | 0.613 |
| rural_yr2 | -0.047 | -0.618 | 0.537 | -0.011 | -0.069 | 0.945 |
| hsgrad_yr2 | 0.054 | 0.643 | 0.520 | -0.471 | -3.255 | 0.001 |
| mid_yr2 | 0.145 | 1.331 | 0.183 | -0.463 | -2.508 | 0.012 |
| south_yr2 | 0.034 | 0.361 | 0.718 | -0.331 | -2.085 | 0.037 |
| west_yr2 | -0.069 | -0.601 | 0.548 | -0.052 | -0.257 | 0.797 |
| pay3_yr2 | -0.218 | -2.108 | 0.035 | -0.261 | -1.594 | 0.111 |
| pay4_yr2 | -0.257 | -2.101 | 0.036 | -0.490 | -2.492 | 0.013 |
| pay5_7_yr2 | -0.320 | -2.518 | 0.012 | 0.095 | 0.544 | 0.587 |
| risk_adjuster | 0.541 | 7.013 | <.001 | 0.517 | 2.978 | 0.003 |
| x̂_u | 0.030 | 3.270 | 0.001 | 0.040 | 2.028 | 0.043 |
| intercept | -0.797 | -3.478 | 0.001 | 9.717 | 26.257 | <.001 |

Table 7: 2SRI First-Stage NLS Estimates for Rx-Use Equation

| Variable | Estimate | z-stat | p-val |
|---|---|---|---|
| white_yr2 | 0.1604 | 2.924 | 0.003 |
| white_yr2 | 0.2296 | 2.872 | 0.004 |
| disabled_yr2 | 0.2843 | 4.142 | 0.000 |
| disaged_yr2 | -0.0551 | -0.818 | 0.413 |
| age74_yr2 | -0.0799 | -1.186 | 0.236 |
| age79_yr2 | -0.0717 | -1.039 | 0.299 |
| age80plus_yr2 | 0.2309 | 1.061 | 0.288 |
| marry_yr2 | 0.0406 | 6.078 | <.001 |
| female_yr2 | -0.0843 | 0.003 | 0.997 |
| rural_yr2 | 0.0002 | -2.156 | 0.031 |
| hsgrad_yr2 | 0.1031 | 1.486 | 0.137 |
| mid_yr2 | 0.0558 | 0.656 | 0.512 |
| south_yr2 | -0.1083 | -1.219 | 0.223 |
| west_yr2 | -0.0713 | -1.346 | 0.178 |
| pay3_yr2 | -0.0923 | -1.467 | 0.142 |
| pay4_yr2 | -0.2041 | -3.457 | 0.001 |
| pay5_7_yr2 | 0.1950 | 12.635 | <.001 |
| risk_adjuster | 0.0093 | 1.974 | 0.048 |
| gap_Prem | -0.0001 | -1.492 | 0.136 |
| spap | -0.1144 | -1.443 | 0.149 |
| PC_INCOME | 4.143e-06 | 0.621 | 0.535 |
| intercept | -1.6822 | -7.019 | <.001 |
