
The Stata Journal
Volume 15 Number 1 2015

A Stata Press publication
StataCorp LP
College Station, Texas

The Stata Journal

Editors

    H. Joseph Newton

    Department of Statistics

    Texas A&M University

    College Station, Texas

    [email protected]

    Nicholas J. Cox

    Department of Geography

    Durham University

    Durham, UK

    [email protected]

    Associate Editors

    Christopher F. Baum, Boston College

    Nathaniel Beck, New York University

Rino Bellocco, Karolinska Institutet, Sweden, and University of Milano-Bicocca, Italy

    Maarten L. Buis, University of Konstanz, Germany

    A. Colin Cameron, University of California–Davis

Mario A. Cleves, University of Arkansas for Medical Sciences

    William D. Dupont, Vanderbilt University

    Philip Ender, University of California–Los Angeles

    David Epstein, Columbia University

    Allan Gregory, Queen’s University

    James Hardin, University of South Carolina

    Ben Jann, University of Bern, Switzerland

Stephen Jenkins, London School of Economics and Political Science

    Ulrich Kohler, University of Potsdam, Germany

    Frauke Kreuter, Univ. of Maryland–College Park

    Peter A. Lachenbruch, Oregon State University

    Jens Lauritsen, Odense University Hospital

    Stanley Lemeshow, Ohio State University

    J. Scott Long, Indiana University

    Roger Newson, Imperial College, London

    Austin Nichols, Urban Institute, Washington DC

    Marcello Pagano, Harvard School of Public Health

    Sophia Rabe-Hesketh, Univ. of California–Berkeley

J. Patrick Royston, MRC Clinical Trials Unit, London

    Philip Ryan, University of Adelaide

    Mark E. Schaffer, Heriot-Watt Univ., Edinburgh

    Jeroen Weesie, Utrecht University

    Ian White, MRC Biostatistics Unit, Cambridge

    Nicholas J. G. Winter, University of Virginia

    Jeffrey Wooldridge, Michigan State University

    Stata Press Editorial Manager

    Lisa Gilmore

    Stata Press Copy Editors

    David Culwell, Shelbi Seiner, and Deirdre Skaggs

The Stata Journal publishes reviewed papers together with shorter notes or comments, regular columns, book reviews, and other material of interest to Stata users. Examples of the types of papers include 1) expository papers that link the use of Stata commands or programs to associated principles, such as those that will serve as tutorials for users first encountering a new field of statistics or a major new technique; 2) papers that go “beyond the Stata manual” in explaining key features or uses of Stata that are of interest to intermediate or advanced users of Stata; 3) papers that discuss new commands or Stata programs of interest either to a wide spectrum of users (e.g., in data management or graphics) or to some large segment of Stata users (e.g., in survey statistics, survival analysis, panel analysis, or limited dependent variable modeling); 4) papers analyzing the statistical properties of new or existing estimators and tests in Stata; 5) papers that could be of interest or usefulness to researchers, especially in fields that are of practical importance but are not often included in texts or other journals, such as the use of Stata in managing datasets, especially large datasets, with advice from hard-won experience; and 6) papers of interest to those who teach, including Stata with topics such as extended examples of techniques and interpretation of results, simulations of statistical concepts, and overviews of subject areas.

The Stata Journal is indexed and abstracted by CompuMath Citation Index, Current Contents/Social and Behavioral Sciences, RePEc: Research Papers in Economics, Science Citation Index Expanded (also known as SciSearch), Scopus, and Social Sciences Citation Index.

For more information on the Stata Journal, including information for authors, see the webpage

http://www.stata-journal.com

Subscriptions are available from StataCorp, 4905 Lakeway Drive, College Station, Texas 77845, telephone 979-696-4600 or 800-STATA-PC, fax 979-696-4601, or online at

http://www.stata.com/bookstore/sj.html

    Subscription rates listed below include both a printed and an electronic copy unless otherwise mentioned.

                                          U.S. and Canada     Elsewhere

    Printed & electronic
    1-year subscription                        $115              $145
    2-year subscription                        $210              $270
    3-year subscription                        $285              $375
    1-year student subscription                $ 85              $115
    1-year institutional subscription          $345              $375
    2-year institutional subscription          $625              $685
    3-year institutional subscription          $875              $965

    Electronic only
    1-year subscription                        $ 85              $ 85
    2-year subscription                        $155              $155
    3-year subscription                        $215              $215
    1-year student subscription                $ 55              $ 55

    Back issues of the Stata Journal may be ordered online at

    http://www.stata.com/bookstore/sjj.html

Individual articles three or more years old may be accessed online without charge. More recent articles may be ordered online.

    http://www.stata-journal.com/archives.html

    The Stata Journal is published quarterly by the Stata Press, College Station, Texas, USA.

Address changes should be sent to the Stata Journal, StataCorp, 4905 Lakeway Drive, College Station, TX 77845, USA, or emailed to [email protected].


Copyright © 2015 by StataCorp LP

Copyright Statement: The Stata Journal and the contents of the supporting files (programs, datasets, and help files) are copyright © by StataCorp LP. The contents of the supporting files (programs, datasets, and help files) may be copied or reproduced by any means whatsoever, in whole or in part, as long as any copy or reproduction includes attribution to both (1) the author and (2) the Stata Journal.

The articles appearing in the Stata Journal may be copied or reproduced as printed copies, in whole or in part, as long as any copy or reproduction includes attribution to both (1) the author and (2) the Stata Journal.

Written permission must be obtained from StataCorp if you wish to make electronic copies of the insertions. This precludes placing electronic copies of the Stata Journal, in whole or in part, on publicly accessible websites, fileservers, or other locations where the copy may be accessed by anyone other than the subscriber.

Users of any of the software, ideas, data, or other materials published in the Stata Journal or the supporting files understand that such use is made without warranty of any kind, by either the Stata Journal, the author, or StataCorp. In particular, there is no warranty of fitness of purpose or merchantability, nor for special, incidental, or consequential damages such as loss of profits. The purpose of the Stata Journal is to promote free communication among Stata users.

The Stata Journal, electronic version (ISSN 1536-8734), is a publication of Stata Press. Stata, Stata Press, Mata, and NetCourse are registered trademarks of StataCorp LP.

Volume 15 Number 1 2015

The Stata Journal

Articles and Columns                                                              1

Announcement of the Stata Journal Editors’ Prize 2015 . . . . . . . . . . . . . . 1
twopm: Two-part models . . . . F. Belotti, P. Deb, W. G. Manning, and E. C. Norton 3
Implementing intersection bounds in Stata
        . . . . . . . . . . . . . . V. Chernozhukov, W. Kim, S. Lee, and A. Rosen 21
More power through symbolic computation: Extending Stata by using the Maxima
        computer algebra system . . . . . . . . . . . . . . . . . G. L. Lo Magno 45
Time-efficient algorithms for robust estimators of location, scale, symmetry, and
        tail heaviness . . . . . . . . . W. Gelade, V. Verardi, and C. Vermandele 77
Generating univariate and multivariate nonnormal data . . . . . . . . . . S. Lee 95
Bayesian optimal interval design for phase I oncology clinical trials
        . . . . . . . . . . . . . . . . . . . . . . . B. M. Fellman and Y. Yuan 110
Fixed-effect panel threshold model using Stata . . . . . . . . . . . . . Q. Wang 121
Frailty models and frailty-mixture models for recurrent event times
        . . . . . . . . . . . . . . . . . . . . . . . . . Y. Xu and Y. B. Cheung 135
newspell: Easy management of complex spell data . . . . . . . . . . . . H. Kröger 155
Estimating net survival using a life-table approach
        . . . . . . . . . . E. Coviello, P. W. Dickman, K. Seppä, and A. Pokhrel 173
Estimating and modeling relative survival . . . . P. W. Dickman and E. Coviello 186
A robust test for weak instruments in Stata . . . . . C. E. Pflueger and S. Wang 216
Regression models for count data from truncated distributions
        . . . . . . . . . . . . . . . . . . . . . . J. W. Hardin and J. M. Hilbe 226
dynemp: A routine for distributed microdata analysis of business dynamics
        . . . . . . . . . . . . . . . . . C. Criscuolo, P. N. Gal, and C. Menon 247
Tools for checking calibration of a Cox model in external validation: Prediction
        of population-averaged survival curves based on risk groups . P. Royston 275
Nonparametric pairwise multiple comparisons in independent groups using Dunn’s
        test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. Dinno 292
Generating nonnegatively correlated binary random variates . . . . . . . M. Chen 301
Review of Alan Acock’s Discovering Structural Equation Modeling Using Stata,
        Revised Edition . . . . . . . . . . . . . . . . . . . . . . R. Williams 309

Notes and Comments                                                              316

Stata tip 122: Variable bar widths in two-way graphs . . . . . . . . . . B. Jann 316
Stata tip 123: Spell boundaries . . . . . . . . . . . . . . . . . . . . N. J. Cox 319

Software Updates                                                                324

The Stata Journal (2015) 15, Number 1, pp. 1–2

Announcement of the Stata Journal Editors’ Prize 2015

The editors of the Stata Journal are pleased to invite nominations for their 2015 prize in accordance with the following rules. Nominations should be sent as private email to [email protected] by July 31, 2015.

1. The Stata Journal Editors’ Prize is awarded annually to one or more authors of a specified paper or papers published in the Stata Journal in the previous three years.

2. The prize will consist of a framed certificate and an honorarium of U.S. $500, courtesy of the publisher of the Stata Journal. The prize may be awarded in person at a Stata Conference or Stata Users Group meeting of the recipient’s or recipients’ choice or as otherwise arranged.

3. Nominations for the prize in a given year will be requested in the Stata Journal in the first issue of each year and simultaneously through announcements on the Stata Journal website and on Statalist. Nominations should be sent to the editors by private email to [email protected] by July 31 in that year. The recipient(s) will be announced in the Stata Journal in the last issue of each year and simultaneously through announcements on the Stata Journal website and on Statalist.

4. Nominations should name the author(s) and one or more papers published in the Stata Journal in the previous three years and explain why the work concerned is worthy of the prize. The precise time limits will be the annual volumes of the Stata Journal, so that, for example, the prize for 2015 will be for work published in the annual volumes for 2012, 2013, or 2014. The rationale might include originality, depth, elegance, or unifying power of work; usefulness in cracking key problems or allowing important new methodologies to be widely implemented; and clarity or expository excellence of the work. Comments on the excellence of the software will also be appropriate when software was published with the paper(s). Nominations might include evidence of citations or downloads or of impact either within or outside the community of Stata users. These suggestions are indicative rather than exclusive, and any special or unusual merits of the work concerned may naturally be mentioned. Nominations may also mention, when relevant, any body of linked work published in other journals or previously in the Stata Journal or Stata Technical Bulletin. Work on any or all of statistical analysis, data management, statistical graphics, and Stata or Mata programming may be nominated.

© 2015 StataCorp LP gn0063


5. Nominations will be considered confidential both before and after award of the prize. Neither anonymous nor public nominations will be accepted. Authors may not nominate themselves, and so doing will exclude those authors from consideration. The editors of the Stata Journal may not be nominated. Employees of StataCorp may not be nominated. Such exclusions apply to any person with such status at any time between January 1 of the year in question and the announcement of the prize. The associate editors of the Stata Journal may be nominated.

6. The recipient(s) of the award will be selected by the editors of the Stata Journal, who reserve the right to take advice in confidence from appropriate persons, subject to such persons not having been nominated themselves. The editors’ decision is final and not open to discussion.

Previous awards of the Prize were to David Roodman (2012); Erik Thorlund Parner and Per Kragh Andersen (2013); and Roger Newson (2014). For full details, please see Stata Journal 12: 571–574 (2012); Stata Journal 13: 669–671 (2013); and Stata Journal 14: 703–707 (2014).

H. Joseph Newton and Nicholas J. Cox
Editors, Stata Journal

The Stata Journal (2015) 15, Number 1, pp. 3–20

twopm: Two-part models

Federico Belotti
Centre for Economic and International Studies
University of Rome Tor Vergata
Rome, Italy
[email protected]

Partha Deb
Hunter College and Graduate Center, CUNY
New York, NY
and National Bureau of Economic Research
Cambridge, MA
[email protected]

Willard G. Manning1
University of Chicago
Chicago, IL

Edward C. Norton
University of Michigan
Ann Arbor, MI
and National Bureau of Economic Research
Cambridge, MA
[email protected]

Abstract. In this article, we describe twopm, a command for fitting two-part models for mixed discrete-continuous outcomes. In the two-part model, a binary choice model is fit for the probability of observing a positive-versus-zero outcome. Then, conditional on a positive outcome, an appropriate regression model is fit for the positive outcome. The twopm command allows the user to leverage the capabilities of predict and margins to calculate predictions and marginal effects and their standard errors from the combined first- and second-part models.

Keywords: st0368, twopm, two-part models, cross-sectional data, predictions, marginal effects

1 Introduction

Many outcomes (yi) in empirical analyses are mixed discrete-continuous random variables. They have two basic statistical features: 1) yi ≥ 0, and 2) yi = 0 is observed often enough that there are compelling substantive and statistical reasons for special treatment. In other words, because of the mass point at zero, a single-index model for such data may not be desirable. The two-part model provides one approach to account for the mass of zeros. In the two-part model, a binary choice model is fit for the probability of observing a positive-versus-zero outcome. Then, conditional on a positive outcome, an appropriate regression model is fit for the positive outcome. In this article, we describe the command twopm, which can be used to conveniently fit two-part models and calculate predictions and marginal effects.

1. Willard G. Manning passed away in November 2014.

© 2015 StataCorp LP st0368

The two-part model has a long history. Since the 1970s, meteorologists have used versions of a two-part model for rainfall (Cole and Sherriff 1972; Todorovic and Woolhiser 1975; Katz 1977). Economists also used two-part models in the 1970s. Cragg (1971) developed the two-part model as an extension of the tobit model. The two-part model became widely used in health economics and health services research after a team at RAND Corporation used it to model health care expenditures in the context of the Health Insurance Experiment (Duan et al. 1984) (see Mihaylova et al. [2011] for more on the widespread use of the two-part model for health care cost data). Two-part models are also appropriate for other mixed discrete-continuous outcomes such as household-level consumption of food items and other consumables.

The two-part model has a commonly used counterpart for count data called the “hurdle” model (see Cameron and Trivedi [2013]; Jones [1989]; and Hilbe [2005]). We use the term “two-part” model to distinguish models for continuous outcomes from models for count data. Hilbe (2005) provides a command for hurdle models for count data.

The Heckman selection model (Heckman 1979), also referred to as the adjusted or generalized tobit (Amemiya 1985; Maddala 1983), is a multiple-index model that can also be fit as an alternative to the two-part model for mixed discrete-continuous outcomes. However, there are conceptual and statistical differences between the two models, and these have been debated extensively in the literature (see Poirier and Ruud [1981]; Duan et al. [1984]; Hay and Olsen [1984]; Manning, Duan, and Rogers [1987]; Hay, Leu, and Rohrer [1987]; Leung and Yu [1996]; and Dow and Norton [2003]).

A few points are important to reiterate here. First, despite their superficial similarity, the two-part model should not be viewed as being nested within the Heckman selection model and equivalent when there is no selection on unobservables. The two-part model does not make any assumption about the correlation between the errors of the binary and continuous equations. Second, from a conceptual standpoint, the zeros in the Heckman selection model denote censored values of the positive outcome, while zeros in the two-part model are true zeros. Third, Monte Carlo evidence shows that when the data are generated from the generalized tobit model without exclusion restrictions to identify the “zeros” equation, the two-part model generally produces better estimates of the conditional mean and of marginal effects than the correctly specified generalized tobit model; the reason is that the correlation parameter is very poorly identified. When data are generated from a generalized tobit with an exclusion restriction, the two-part model estimates of the conditional mean and marginal effects are not much worse than those obtained from the generalized tobit model. Because there are usually few situations in which exclusion restrictions distinguish the “zeros” equation from the “positives” equation, assuming that the analyst is interested in estimates of E(y|x) and of ∂E(y|x)/∂x, the two-part model is almost always an adequate (if not superior on precision grounds) way to model mixed discrete-continuous outcomes if there are no exclusion restrictions.

The twopm package has several advantages compared with estimating the parameters of each part separately. First, it incorporates svy:, so it can adjust for complex survey design in the parameter estimates and the standard errors of those estimates. Complex survey design is common in large surveys; ignoring the survey structure can lead to biased estimates of population parameters. Second, it is easy to conduct joint statistical tests of parameters from both parts of the two-part model. Sometimes, it is appropriate to conduct a test of the joint significance of a variable that appears in both parts of the model. Third, it is easy to recover overall predicted values of the dependent variable and marginal effects for the combined model using the postestimation commands predict and margins. Note that these predicted values will be for the entire sample, as opposed to predictions based on the second (conditional) part of the model, which would typically be for the conditional sample of those with positive values. Fourth, our program produces estimates of predictions on the y scale (the raw scale), incorporating appropriate retransformation from the estimation scale when ln(y) is regressed using ordinary least squares (OLS) in the second part. Fifth, it automatically computes standard errors of predicted values and marginal effects and accounts for both parts of the model, any complex survey design, and robust standard errors based on the delta method. In terms of the amount of effort saved by the user, this is perhaps the most important feature of the twopm command. However, standard errors for margins and marginal effects in models that require retransformation must be obtained via bootstrap methods.

2 Two-part models

A two-part model is a flexible statistical model specifically designed to deal with limited dependent variables. The distinguishing feature of these variables is that the range of values they may assume has a lower bound that occurs in a fair number of observations. The basic framework is as follows. Suppose that there is an event that may or may not occur. When it does occur, one observes a positive random variable. When it does not, the observed outcome takes a zero value, thus becoming a zero-censored variable. For instance, in explaining individual annual health expenditure, the event is represented by a specific disease. If the illness occurs, then some not-for-free treatment will be needed, and a positive expense will be observed. In these situations, a two-part model allows the censoring mechanism and the outcome to be modeled using separate processes. In other words, it permits the zeros and nonzeros to be generated by different densities as a special type of mixture model. The zeros are typically handled using a model for the probability of a positive outcome,

    φ(y > 0) = Pr(y > 0|x) = F(xδ)

where x is a vector of explanatory variables, δ is the corresponding vector of parameters to be estimated, and F is the cumulative distribution function of an independently and identically distributed error term, typically chosen to be from the extreme value (logit) or normal (probit) distributions. For the positives, the model is usually represented as

    φ(y|y > 0, x) = g(xγ)

where x is a vector of explanatory variables, γ is the corresponding vector of parameters to be estimated, and g is an appropriate density function for y|y > 0. The likelihood contribution for an observation can be written as

    φ(y) = {1 − F(xδ)}^i(y=0) × {F(xδ)g(xγ)}^i(y>0)

where i(·) denotes the indicator function. Then, the log-likelihood contribution is

    ln{φ(y)} = i(y = 0) ln{1 − F(xδ)} + i(y > 0)[ln{F(xδ)} + ln{g(xγ)}]

Because the δ and γ parameters are additively separable in the log-likelihood contribution for each observation, the models for the zeros and the positives can be estimated separately.
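Because the log likelihood separates in this way, fitting the two parts one after the other with standard commands yields the same estimates as joint estimation. A minimal sketch, assuming hypothetical variables y (with y ≥ 0), x1, and x2 in memory:

. * Sketch: the two parts can be fit separately because of separability
. generate byte d = (y > 0)                        // indicator for a positive outcome
. logit d x1 x2                                    // first part: Pr(y > 0 | x)
. glm y x1 x2 if y > 0, family(gamma) link(log)    // second part: E(y | y > 0, x)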

Note that the overall mean can be written as the product of expectations from the first and second parts of the model, as follows:

    E(y|x) = Pr(y > 0|x) × E(y|y > 0, x)

This follows from first principles of statistics: the decomposition of a joint distribution into marginal and conditional distributions. It is always true, with or without separability or specific F and g(·).

Estimating the parameters of the two-part model is straightforward. The threshold, Pr(y > 0|x), is modeled using a regression model for binary outcomes such as the probit or logit. The positives, E(y|y > 0, x) or g(y|y > 0, x), where g(·) denotes a density function, are modeled using a regression framework for a continuous outcome; for example, they can be modeled using OLS regression or a generalized linear model (GLM). The second part is commonly modeled by OLS regression, with or without a transformation applied to y|y > 0. It is straightforward to use OLS regression specified as y = xγ + ε to estimate the second part. But in many applications, and ubiquitously in the health economics and health services literature, the second part is specified as an OLS regression of ln(y|y > 0, x), written as ln(y) = xγ + ε. In that case, if ε is independently and identically normally distributed, then

    E(y|y > 0, x) = e^(xγ) × e^(0.5σ²)     (1)

where σ² is the variance of the distribution of ε; that is, it is the variance of the error on the log scale. If ε is not normally distributed but is homoskedastic, then Duan (1983) showed that

    E(y|y > 0, x) = e^(xγ) × E(e^ε)     (2)

More recently, researchers have used the GLM framework (McCullagh and Nelder 1989) to model (y|y > 0, x) using a nonlinear transformation of a linear index function directly. Then

    E(y|y > 0, x) = g^(−1)(xγ)

where g is the link function in the GLM. Other approaches, such as regressions with Box–Cox transformations and quantile regressions, may also be used (these are not available in twopm).
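To make the retransformations concrete, here is a sketch of (1) and (2) computed by hand after a log OLS second part, again using hypothetical variables y, x1, and x2 (e(rmse)^2 is used as the estimate of σ²):

. * Sketch: normal-theory and Duan retransformations after OLS on ln(y)
. generate double lny = ln(y) if y > 0
. regress lny x1 x2
. predict double xb, xb                            // linear index on the log scale
. generate double Ey_normal = exp(xb)*exp(0.5*e(rmse)^2)   // eq. (1)
. predict double res if e(sample), residuals
. generate double expres = exp(res)
. summarize expres, meanonly                       // smearing factor = mean of exp(residual)
. generate double Ey_duan = exp(xb)*r(mean)        // eq. (2)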

The error terms in the two equations do not need to be independent to obtain consistent estimates of the parameters δ and γ. There is a misconception, especially in the early literature, that the two-part model assumes independence between the binary outcome and the positive, continuous outcome. Also note that in the description above, the vector of covariates x is the same in both parts of the model. Although this is likely in most applications, sometimes there may be legitimate theoretical (conceptual) or statistical reasons for using different independent variables in the two equations. For completeness, twopm has a syntax that allows for different covariates in each equation, but we do not generally recommend its use without appropriate justification.

Predictions of yi, (ŷi|xi), can be constructed by multiplying predictions from each part of the model, observation by observation; that is,

    ŷi|xi = (p̂i|xi) × (ŷi|yi > 0, xi)     (3)

where p̂i|xi is the predicted probability that yi > 0. Predictions for each part, confidence intervals for those predictions, and marginal effects of covariates on the outcomes in each part can be computed with existing commands. While one can construct overall predictions and marginal effects with a few lines of code, twopm makes it very easy to calculate them with the standard postestimation commands predict and margins. Unless retransformation is required, predict and margins produce standard errors of these predictions or marginal effects by using the delta method. When postestimation retransformation is required, bootstrap can be used with predict and margins to obtain standard errors.

Note that margins calls the prediction programs associated with the estimation command; that is, using margins following twopm calls predict, which in turn calls our program to calculate predictions of y based on (3).
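A sketch of how the combined prediction in (3) can be verified by hand (hypothetical variables y, x1, x2, and the indicator d = (y > 0) generated earlier):

. * Sketch: twopm's combined prediction equals the product of the two parts
. twopm y x1 x2, firstpart(probit) secondpart(glm, family(gamma) link(log))
. predict double yhat                              // combined E(y|x) from twopm
. probit d x1 x2
. predict double phat, pr                          // predicted Pr(y > 0 | x)
. glm y x1 x2 if y > 0, family(gamma) link(log)
. predict double mupos, mu                         // predicted E(y | y > 0, x)
. generate double yhat_byhand = phat*mupos
. assert reldif(yhat, yhat_byhand) < 1e-6          // should agree observation by observation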

3 The twopm command

twopm fits two-part models with logit and probit specifications for the first part and OLS [on y and on ln(y)] and GLM regressions for the second part. twopm can be specified using one of two syntaxes. The first syntax automatically specifies the same regressors (and functional forms in the index) in the first and second parts and is generally recommended. The second syntax allows the user to specify different regressors in the first and second parts. Although not generally recommended, there may be theoretically or statistically motivated situations where such a model may be applicable.

3.1 Syntax

The syntax for using twopm with specification of the same regressors in the first and second parts is

    twopm depvar [indepvars] [if] [in] [weight], firstpart(f_options)
        secondpart(s_options) [vce(vcetype) robust cluster(clustvar) suest
        level(#) nocnsreport display_options]

The syntax for using twopm with specification of different regressors in the first and second parts is

    twopm equation1 equation2 [if] [in] [weight], firstpart(f_options)
        secondpart(s_options) [vce(vcetype) robust cluster(clustvar) suest
        level(#) nocnsreport display_options]

where equation1 and equation2 are specified as

    (depvar [=] [indepvars])

Note that indepvars may contain factor variables, and depvar and indepvars may contain time-series operators. iweights, aweights, and pweights are allowed. twopm may be used with the svy: and bootstrap prefixes.
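For concreteness, here is one hypothetical call per syntax, using the expenditure variables from the examples in section 4 (the covariate lists are placeholders):

. twopm exp_tot c.age i.female, firstpart(probit) secondpart(glm, family(gamma) link(log))
. twopm (exp_tot = c.age i.female) (exp_tot = c.age), firstpart(logit) secondpart(regress, log)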

3.2 Options

firstpart(f_options) specifies the first part of the model for a binary outcome. It should be logit or probit. Each can be specified with its options except vce(), which should be specified as a twopm option. See the manual entries for [R] logit and [R] probit. firstpart() is required.

secondpart(s_options) specifies the second part of the model for a positive outcome. It should be regress or glm. Each can be specified with its options except vce(), which should be specified as a twopm option. See the manual entries for [R] regress and [R] glm. secondpart() is required.

vce(vcetype) specifies the type of standard error reported, including types that are derived from asymptotic theory, that are robust to some kinds of misspecification, that allow for intragroup correlation, and that use bootstrap or jackknife methods; see [R] vce_option. vce(conventional), the default, uses the conventionally derived variance estimators for the first and second parts of the model. Note that options related to the variance estimators for both parts must be specified using vce(vcetype) in the twopm syntax. Specifying vce(robust) is equivalent to specifying vce(cluster clustvar).

robust is a synonym for vce(robust).

cluster(clustvar) is a synonym for vce(cluster clustvar).

suest combines the estimation results of the first and second parts of the model to derive a simultaneous (co)variance matrix of the sandwich or robust type. Typical applications of suest are tests for cross-part hypotheses using test or testnl.

level(#); see [R] estimation options.

nocnsreport; see [R] estimation options.

display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels; see [R] estimation options.

3.3 Postestimation

    predict [type] newvar [if] [in] [, {normal | duan} scores nooffset]

and

    predict [type] {stub* | newvar1 ... newvarq} [if] [in], scores

calculate predicted values or estimates of E(y|x) and equation-level scores, respectively. The first syntax is available both in and out of sample; type predict ... if e(sample) if predictions are wanted only for the estimation sample. The second syntax, for equation-level scores, is restricted to the estimation sample. For predicted values estimated after the second-part regression of ln(y|y > 0), the following options are available:

normal uses normal-theory retransformation to obtain fitted values. Either normal or duan must be specified when a linear regression of the log of the second-part outcome is estimated.

duan uses Duan’s (1983) smearing retransformation to obtain fitted values. Either normal or duan must be specified when a linear regression of the log of the second-part outcome is estimated.

scores creates a score variable for each part in the model. Because the score for the second part of the model makes sense only for the estimation subsample (where y > 0), the calculation is automatically restricted to the estimation subsample.

nooffset specifies that the calculation should be made ignoring any offset or exposure variable specified when fitting the model. This may be used with most statistics. If neither the offset(varname) option nor the exposure(varname) option is specified when fitting the model, specifying nooffset does nothing.
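As an illustration of these options, a hypothetical postestimation sequence after a log OLS second part might look as follows (variable names are placeholders):

. twopm exp_tot c.age i.female, firstpart(logit) secondpart(regress, log)
. predict double yhat_n, normal      // E(y|x) with the normal-theory factor
. predict double yhat_d, duan        // E(y|x) with Duan's smearing factor
. predict double sc*, scores         // one score variable per part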

4 Examples

We show two examples of two-part models for total annual health care expenditures using the Medical Expenditure Panel Survey 2004 data. We use two common versions of the two-part model to estimate predicted values of total expenditures and to calculate marginal or incremental effects of age and gender. In the first example, we fit a probit model in the first part and a GLM with the log link and gamma distribution for the second part. In the second example, we fit a logit model in the first part and an OLS regression with a logged dependent variable for the second part. We limit the covariates to just age and gender. The twopm command is compatible with complex survey commands, so after reading in the data, we set up the data for survey commands using svyset.

. * Use MEPS data on health care expenditures

. use http://www.econometrics.it/stata/data/meps_ashe_subset5
(MEPS04 data with edits)

. svyset [pweight=wtdper], strata(varstr) psu(varpsu)

      pweight: wtdper
          VCE: linearized
  Single unit: missing
     Strata 1: varstr
         SU 1: varpsu
        FPC 1:

After adjusting for the complex survey design, we see that the mean of health care expenditures is $3,839, with nearly 18% of observations having a value of 0. The mean age is about 46 (range 18 to 85), and just over half of the participants are women.

. * Summarize data

. svy: mean exp_tot age female
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     203          Number of obs    =     19386
Number of PSUs   =     448          Population size  = 187973715
                                    Design df        =       245

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     exp_tot |   3838.939    99.94525     3642.078    4035.801
         age |   45.79115    .2293769     45.33935    46.24295
      female |   .5201957    .0031165     .5140571    .5263343
--------------------------------------------------------------

4.1 Probit with GLM with log link and gamma distribution

Here we provide the command to estimate the parameters of the two-part model with a probit in the first part and a GLM with the log link and gamma distribution in the second part, taking into account the complex survey design.

. * Two-part model, with probit first part and GLM second part

. svy: twopm exp_tot c.age i.female, firstpart(probit)
>     secondpart(glm, family(gamma) link(log))
(running twopm on estimation sample)

Survey data analysis

Number of strata =     203          Number of obs    =     19386
Number of PSUs   =     448          Population size  = 187973715
                                    Design df        =       245
                                    F(   2,    244)  =    671.26
                                    Prob > F         =    0.0000

------------------------------------------------------------------------------
             |             Linearized
     exp_tot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
probit       |
         age |   .0250999    .000793    31.65   0.000     .0235379    .0266618
    1.female |    .564196   .0271783    20.76   0.000     .5106631    .6177289
       _cons |  -.2386055   .0389997    -6.12   0.000    -.3154229   -.1617881
-------------+----------------------------------------------------------------
glm          |
         age |   .0287867   .0012973    22.19   0.000     .0262314    .0313421
    1.female |   .1995253   .0538871     3.70   0.000     .0933842    .3056665
       _cons |    6.80357    .086506    78.65   0.000      6.63318     6.97396
------------------------------------------------------------------------------

The estimated coefficients for age and female are positive in both parts and statistically significant at the 1% level. Both the probability of spending and the amount of spending conditional on any spending increase with age. Women are more likely than men to spend at least $1, and, conditional on spending any amount, they are more likely to spend more than men. In this simple example, we have not controlled for or tested for heteroskedasticity.

We can use the margins command as a postestimation command to predict total spending. The predicted total spending is about $3,870 per person per year, which is relatively close to the actual average of $3,839.

. * Overall conditional mean

. margins

Predictive margins                     Number of obs   =      19386
Model VCE    : Linearized

Expression   : twopm combined expected values, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   3870.714   94.98674    40.75   0.000     3684.544    4056.885
------------------------------------------------------------------------------

Next, we show the marginal (or incremental) effects for the combined probit and GLM version of the two-part model. The marginal effect of age averages $128 per year of age, and women spend more than men by about $1,140. Note that if a covariate had opposite signs in each part of the model, then it would be possible for the joint test of significance of the coefficients to be statistically significant while the overall marginal effect is insignificant (although that is not the case here).

. * Marginal effects, averaged over the sample

. margins, dydx(*)

Average marginal effects               Number of obs   =      19386
Model VCE    : Linearized

Expression   : twopm combined expected values, predict()
dy/dx w.r.t. : age 1.female

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   127.8325   6.372966    20.06   0.000     115.3417    140.3232
    1.female |   1139.541   186.7794     6.10   0.000     773.4597    1505.621
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Because the marginal effects vary over the life course, we computed marginal effects conditional on four ages (20, 40, 60, and 80). When we calculate the marginal effects over the life course, we see that the marginal effects of both age and gender increase with age. For example, although women spend more than men at all ages, this difference is much greater for elderly women than for young women. This is due to the assumed log link in the GLM, even with a simple linear specification of age.

. * Marginal effects at different ages

. margins, dydx(*) at(age=(20(20)80))

Conditional marginal effects           Number of obs   =      19386
Model VCE    : Linearized

Expression   : twopm combined expected values, predict()
dy/dx w.r.t. : age 1.female

1._at        : age             =          20
2._at        : age             =          40
3._at        : age             =          60
4._at        : age             =          80

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
age          |
         _at |
          1  |   51.35857   1.357531    37.83   0.000     48.69786    54.01929
          2  |   95.64313   3.140771    30.45   0.000     89.48733    101.7989
          3  |    169.311    9.60291    17.63   0.000     150.4896    188.1324
          4  |   295.7016   24.66708    11.99   0.000      247.355    344.0482
-------------+----------------------------------------------------------------
1.female     |
         _at |
          1  |   589.6436   60.45588     9.75   0.000     471.1522    708.1349
          2  |   942.5697    127.344     7.40   0.000     692.9801    1192.159
          3  |   1431.437   260.9784     5.48   0.000     919.9285    1942.945
          4  |   2228.771   505.6082     4.41   0.000     1237.797    3219.745
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

It is often of interest to know whether a covariate is jointly significant in both parts of the two-part model. In this example, age and gender are statistically significant in each part, so it is no surprise that they are each jointly significant in both parts.

. * Test whether coefficients on each covariate are jointly zero

. test age

Adjusted Wald test

 ( 1)  [probit]age = 0
 ( 2)  [glm]age = 0

       F(  2,   244) =  803.99
            Prob > F =  0.0000

. test 1.female

Adjusted Wald test

 ( 1)  [probit]1.female = 0
 ( 2)  [glm]1.female = 0

       F(  2,   244) =  226.39
            Prob > F =  0.0000

When twopm is used together with the svy: prefix (and the default option for the (co)variance matrix, vce(linearized)), a simultaneous “linearized” (co)variance matrix of the sandwich or robust type is automatically estimated. This ensures that hypotheses involving parameters across both parts can be correctly tested with test or testnl. When estimation is performed without the svy: prefix and cross-part hypotheses are of interest, we suggest using the suest option within twopm. This option produces a simultaneous (co)variance matrix of the sandwich or robust type; thus test (or testnl) will use the correct formula to perform the Wald test (see [R] suest).
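A sketch of this workflow without the svy: prefix, reusing the variables from the example above (the cross-part equality test shown is hypothetical):

. twopm exp_tot c.age i.female, firstpart(probit) secondpart(glm, family(gamma) link(log)) suest
. test [probit]age = [glm]age        // Wald test that the two age coefficients are equal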

4.2 Logit with OLS with logged dependent variable

Next, we provide an example using another common model, the two-part model with a logit in the first part and OLS with log-transformed y in the second part. For the retransformation to the raw scale, we do not impose the restrictive assumption that the log-scale errors have a normal distribution. This assumption is often wrong and can lead to badly biased estimates of the conditional mean and marginal effects. Instead, we use Duan’s (1983) smearing estimator. The twopm command automatically calculates the smearing estimate for use in postestimation commands.

In this example, we do not control for complex survey design, because when one uses bootstrapping (which is necessary in this model with retransformation), the simple way of bootstrapping would be incorrect. Here we focus on the importance of bootstrapping to account for the uncertainty in the estimated retransformation parameter.

. * Two-part model, with logit first part and OLS second part

. twopm exp_tot c.age i.female, firstpart(logit) secondpart(regress, log)

Fitting logit regression for first part:

Iteration 0:   log likelihood = -9062.9759
Iteration 1:   log likelihood = -8139.4972
Iteration 2:   log likelihood = -8062.7898
Iteration 3:   log likelihood = -8062.5899
Iteration 4:   log likelihood = -8062.5899

Fitting OLS regression for second part:

Two-part model

Log pseudolikelihood = -37216.38         Number of obs    =       19386

Part 1: logit
                                         Number of obs    =       19386
                                         LR chi2(2)       =     2000.77
                                         Prob > chi2      =      0.0000
Log likelihood = -8062.5899              Pseudo R2        =      0.1104

Part 2: regress_log
                                         Number of obs    =       15946
                                         F(  2, 15943)    =     1490.33
                                         Prob > F         =      0.0000
                                         R-squared        =      0.1575
                                         Adj R-squared    =      0.1574
Log likelihood = -29153.79               Root MSE         =      1.5060

------------------------------------------------------------------------------
     exp_tot |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
logit        |
         age |    .047287   .0013987    33.81   0.000     .0445456    .0500284
    1.female |   .9684718   .0404988    23.91   0.000     .8890957    1.047848
       _cons |  -.8706272   .0597288   -14.58   0.000    -.9876934   -.7535609
-------------+----------------------------------------------------------------
regress_log  |
         age |   .0358123    .000678    52.82   0.000     .0344835    .0371412
    1.female |   .3511679   .0242542    14.48   0.000     .3036305    .3987054
       _cons |   5.329011    .037319   142.80   0.000     5.255867    5.402155
------------------------------------------------------------------------------

As before, the estimated coefficients for age and female are positive in both parts and statistically significant at the 1% level. The z statistics are similar in the logit and probit models, as expected. Again, both the probability of spending and the amount of spending conditional on any spending increase with age. Women are more likely than men to spend at least $1, and (conditional on spending any amount) they spend more. Again, we have not controlled for or tested for heteroskedasticity.

The predicted total expenditures from this model are considerably higher than in the model with probit and GLM. The predicted total expenditures are about $4,090 per person per year, which is far higher than the actual average. This calculation uses Duan (1983) smearing as part of the retransformation of the second part.

. * Overall conditional mean

. margins, predict(duan) post
Warning: cannot perform check for estimable functions.

Predictive margins                     Number of obs   =      19386

Expression   : twopm combined expected values, predict(duan)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   4090.519    59.4288    68.83   0.000     3974.041    4206.998
------------------------------------------------------------------------------

Alternatively, we could have created a variable holding the conditional mean for each observation using predict yhat_duan, duan.

Note that margins does not produce the correct standard errors for estimates when using retransformation. More specifically, while margins takes the uncertainty of parameter estimates into account in the index function for each part of the model, it does not account for estimation of σ² in (1) or E(e^ε) in (2). Although the margins command automatically computes the unconditional marginal effects after running twopm, the default delta-method standard errors are incorrect and will generally be too small. Therefore, after fitting a log OLS model in the second part, one must calculate standard errors and confidence intervals for margins using a nonparametric bootstrap.

The following is a simple program to bootstrap the standard errors for margins:

. * Overall conditional mean

. capture program drop Ey_boot

. program define Ey_boot, eclass
  1.     twopm exp_tot c.age i.female, firstpart(logit) secondpart(regress, log)
  2.     margins, predict(duan) nose post
  3. end

. bootstrap _b, seed(14) reps(1000): Ey_boot
(running Ey_boot on estimation sample)

Bootstrap replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5

    .................................................. 50

    .................................................. 100

    .................................................. 150

    .................................................. 200

    .................................................. 250

    .................................................. 300

    .................................................. 350

    .................................................. 400

    .................................................. 450

    .................................................. 500

    .................................................. 550

    .................................................. 600

    .................................................. 650

    .................................................. 700

    .................................................. 750

    .................................................. 800

    .................................................. 850

    .................................................. 900

    .................................................. 950

    .................................................. 1000

Predictive margins                     Number of obs   =      19386
                                       Replications    =       1000

------------------------------------------------------------------------------
             |   Observed   Bootstrap                     Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   4090.519   97.54505    41.93   0.000     3899.335    4281.704
------------------------------------------------------------------------------

The bootstrapped standard errors are roughly twice as large as the delta-method standard errors. In our experience, ignoring the uncertainty in the retransformation factor will bias the standard errors downward by a large amount, as in this example.

For the marginal effects, we again need to bootstrap the standard errors when using margins. In the two-part model with the logit and OLS with ln(y), age has a marginal effect of about $165 per year, while female has an incremental effect of almost $1,800.

. * Marginal effects, averaged over the sample

. capture program drop dydx_boot

. program define dydx_boot, eclass
  1.     twopm exp_tot c.age i.female, firstpart(logit) secondpart(regress, log)
  2.     margins, dydx(*) predict(duan) nose post
  3. end

. bootstrap _b, seed(14) reps(1000): dydx_boot
(running dydx_boot on estimation sample)

Bootstrap replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5

    .................................................. 50

    .................................................. 100

    .................................................. 150

    .................................................. 200

    .................................................. 250

    .................................................. 300

    .................................................. 350

    .................................................. 400

    .................................................. 450

    .................................................. 500

    .................................................. 550

    .................................................. 600

    .................................................. 650

    .................................................. 700

    .................................................. 750

    .................................................. 800

    .................................................. 850

    .................................................. 900

    .................................................. 950

    .................................................. 1000

Average marginal effects               Number of obs   =      19386
                                       Replications    =       1000

------------------------------------------------------------------------------
             |   Observed   Bootstrap                     Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   165.6376   5.193646    31.89   0.000     155.4583     175.817
    1.female |   1784.333   96.14553    18.56   0.000     1595.892    1972.775
------------------------------------------------------------------------------

The different results demonstrate that the model used does matter. However, without further testing, it is unclear which model performs better in a statistical sense. We believe that using the two-part model can make a substantial difference, as can the retransformation approach for ln(y) models, as Duan (1983) showed. Both are likely sources of the differences between estimates in our examples.

5 Discussion

This version of twopm considers only a subset of two-part models, where the positive outcomes are continuous. It does not deal with discrete or count outcomes. twopm allows for modeling of the second part using OLS (regress) or GLM (glm) but not numerous other plausible models for continuous outcomes, such as regressions with Box–Cox transformations (boxcox), quantile regressions (qreg), and other approaches available in user-written packages.

The two-part model is typically specified using the same set of covariates in both parts, and this is how we have specified our examples. However, this restriction is generally not required for all two-part model applications. The issue is not just about the same variables appearing in each part; model selection (with suitable safeguards against overfitting) may suggest different functional forms for variables in the index functions. For example, income may enter as income, as income and income², or as ln(income). Alternatively, in our example, we used age and female, but a more adequate specification may involve interactions and polynomials, which could vary by model part. One can still obtain marginal effects of age and female without restricting the functional form to be the same.
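For example, a hypothetical specification with a richer index in the first part than in the second could be fit with the second syntax:

. twopm (exp_tot = c.age##i.female) (exp_tot = c.age i.female), firstpart(logit) secondpart(glm, family(gamma) link(log))
. margins, dydx(age female)          // marginal effects are still available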

When the second part of the two-part model is modeled using OLS regression of ln(y), a retransformation is required to go from predicted ln(y) to predicted y. twopm provides retransformations based on homoskedastic, normally distributed errors and a nonparametric approach by Duan (1983) that also assumes homoskedastic errors. But heteroskedasticity is common in this context, and the retransformations based on homoskedastic errors are not consistent. Because of the complexity of dealing with heteroskedastic retransformations, we have not allowed for this possibility. We suggest that users consider the gamma GLM with log link as an alternative for consistent estimation of coefficients, predictions, and marginal effects.

As with all estimation approaches, we suggest checking the specification of the two-part model to see whether it is appropriate for the given data. The fit of each of the two equations, for the probability of any use or expenditure and for the level of use or expenditure, can be assessed with conventional tests and approaches in the literature, as well as with link tests (Pregibon 1980) and regression-equation specification error tests (Ramsey 1969). But the overall fit of the two parts combined has a more limited set of checks available. The twopm postestimation commands provide predictions that can be used to calculate various tests, including the modified Hosmer–Lemeshow test (Hosmer and Lemeshow 1980) and the Pearson correlation test as implemented in Manning, Basu, and Mullahy (2005).
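As a sketch of how such a check might be coded by hand after twopm (this follows the usual form of the modified Hosmer–Lemeshow test, grouping observations into deciles of the combined prediction and testing whether the mean raw-scale residual is zero in every group; it is not code from the twopm package):

. predict double yhat                      // combined E(y|x) after twopm
. generate double rawres = exp_tot - yhat
. xtile grp = yhat, nq(10)                 // deciles of the prediction
. regress rawres ibn.grp, noconstant       // mean residual within each decile
. testparm ibn.grp                         // F test: all ten group means are zero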

6 Acknowledgments

We thank Vincenzo Atella for stimulating discussions, and we thank Tom Weichle, Andrea Piano Mortari, Joanna Kopinska, and Valentina Conti for giving twopm a thorough workout and for offering insightful suggestions. We also thank an anonymous referee for comments that improved the article.

7 References

Amemiya, T. 1985. Advanced Econometrics. Cambridge: Harvard University Press.

Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. Cambridge: Cambridge University Press.

Cole, J. A., and J. D. F. Sherriff. 1972. Some single- and multi-site models of rainfall within discrete time increments. Journal of Hydrology 17: 97–113.

Cragg, J. G. 1971. Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica 39: 829–844.

Dow, W. H., and E. C. Norton. 2003. Choosing between and interpreting the Heckit and two-part models for corner solutions. Health Services and Outcomes Research Methodology 4: 5–18.

Duan, N. 1983. Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78: 605–610.

Duan, N., W. G. Manning, Jr., C. N. Morris, and J. P. Newhouse. 1984. Choosing between the sample-selection model and the multi-part model. Journal of Business and Economic Statistics 2: 283–289.

Hay, J. W., R. Leu, and P. Rohrer. 1987. Ordinary least squares and sample-selection models of health-care demand: Monte Carlo comparison. Journal of Business and Economic Statistics 5: 499–506.

Hay, J. W., and R. J. Olsen. 1984. Let them eat cake: A note on comparing alternative models of the demand for medical care. Journal of Business and Economic Statistics 2: 279–282.

Heckman, J. J. 1979. Sample selection bias as a specification error. Econometrica 47: 153–161.

Hilbe, J. 2005. hplogit: Stata module to estimate Poisson-logit hurdle regression. Statistical Software Components S456405, Department of Economics, Boston College. http://ideas.repec.org/c/boc/bocode/s456405.html.

Hosmer, D. W., Jr., and S. Lemeshow. 1980. Goodness of fit tests for the multiple logistic regression model. Communications in Statistics—Theory and Methods 9: 1043–1069.

Jones, A. M. 1989. A double-hurdle model of cigarette consumption. Journal of Applied Econometrics 4: 23–39.

Katz, R. W. 1977. Precipitation as a chain-dependent process. Journal of Applied Meteorology 16: 671–676.

Leung, S. F., and S. Yu. 1996. On the choice between sample selection and two-part models. Journal of Econometrics 72: 197–229.

Maddala, G. S. 1983. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press.

Manning, W. G., A. Basu, and J. Mullahy. 2005. Generalized modeling approaches to risk adjustment of skewed outcomes data. Journal of Health Economics 24: 465–488.

Manning, W. G., N. Duan, and W. H. Rogers. 1987. Monte Carlo evidence on the choice between sample selection and two-part models. Journal of Econometrics 35: 59–82.

McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. London: Chapman & Hall/CRC.

Mihaylova, B., A. Briggs, A. O’Hagan, and S. G. Thompson. 2011. Review of statistical methods for analysing healthcare resources and costs. Health Economics 20: 897–916.

Poirier, D. J., and P. A. Ruud. 1981. On the appropriateness of endogenous switching. Journal of Econometrics 16: 249–256.

Pregibon, D. 1980. Goodness of link tests for generalized linear models. Journal of the Royal Statistical Society, Series C 29: 15–23.

Ramsey, J. B. 1969. Tests for specification errors in classical linear least-squares regression analysis. Journal of the Royal Statistical Society, Series B 31: 350–371.

Todorovic, P., and D. A. Woolhiser. 1975. A stochastic model of n-day precipitation. Journal of Applied Meteorology 14: 17–24.

About the authors

Federico Belotti is a researcher at the Centre for Economics and International Studies of the University of Rome Tor Vergata.

Partha Deb is a professor of economics at Hunter College and the Graduate Center, CUNY, and a research associate at the National Bureau of Economic Research.

Will Manning was an influential health economist, starting with his pioneering work on the RAND Health Insurance Experiment in the 1970s. With others at RAND, he advocated moving away from tobit and sample-selection models to deal with distributions of dependent variables that had a large mass at zero. The two-part model, in all its forms, is now the dominant model for health care costs and expenditures. He continued to push the field of health econometrics by helping to develop new methods and advocating the work of others who have found better ways of modeling health care and its costs. His passing is a great loss to the profession and to so many of us personally.

Edward C. Norton is a professor in the Department of Health Management and Policy and in the Department of Economics at the University of Michigan, and he is a research associate at the National Bureau of Economic Research.

The Stata Journal (2015) 15, Number 1, pp. 21–44

    Implementing intersection bounds in Stata

Victor Chernozhukov
Massachusetts Institute of Technology
Cambridge, MA
[email protected]

Wooyoung Kim
University of Wisconsin–Madison
Madison, WI
[email protected]

Sokbae Lee
Institute for Fiscal Studies and Seoul National University
Seoul, Korea
[email protected]

Adam M. Rosen
University College London and Centre for Microdata Methods and Practice
London, UK
[email protected]

Abstract. We present the clrbound, clr2bound, clr3bound, and clrtest commands for estimation and inference on intersection bounds as developed by Chernozhukov, Lee, and Rosen (2013, Econometrica 81: 667–737). The intersection bounds framework encompasses situations where a population parameter of interest is partially identified by a collection of consistently estimable upper and lower bounds. The identified set for the parameter is the intersection of regions defined by this collection of bounds. More generally, the methodology can be applied to settings where an estimable function of a vector-valued parameter is bounded from above and below, as is the case when the identified set is characterized by conditional moment inequalities.

The commands clrbound, clr2bound, and clr3bound provide bound estimates that can be used directly for estimation or to construct asymptotically valid confidence sets. clrtest performs an intersection bound test of the hypothesis that a collection of lower intersection bounds is no greater than zero. The command clrbound provides bound estimates for one-sided lower or upper intersection bounds on a parameter, while clr2bound and clr3bound provide two-sided bound estimates using both lower and upper intersection bounds. clr2bound uses Bonferroni's inequality to construct two-sided bounds that can be used to perform asymptotically valid inference on the identified set or the parameter of interest, whereas clr3bound provides a generally tighter confidence interval for the parameter by inverting the hypothesis test performed by clrtest. More broadly, inversion of this test can also be used to construct confidence sets based on conditional moment inequalities as described in Chernozhukov, Lee, and Rosen (2013). The commands include parametric, series, and local linear estimation procedures.

Keywords: st0369, clrbound, clr2bound, clr3bound, clrtest, intersection bounds, bound analysis, conditional moments, partial identification, infinite dimensional constraints, adaptive moment selection

    1 Introduction

In this article, we present the clrbound, clr2bound, clr3bound, and clrtest commands for estimation and inference on intersection bounds as developed by Chernozhukov, Lee, and Rosen (2013). These commands, summarized in table 1, enable one to perform hypothesis tests and construct set estimates and asymptotically valid confidence sets for parameters restricted by intersection bounds. The procedures use parametric, series, and local linear estimators, and they can be used to conduct inference on parameters restricted by conditional moment inequalities. The inference method developed by Chernozhukov, Lee, and Rosen (2013) uses sup-norm test statistics. There are many related articles in the literature that develop alternative methods for inference with conditional moment inequalities, such as Andrews and Shi (2013, 2014), Armstrong (2015, 2014), Armstrong and Chan (2013), Chetverikov (2011), and Lee, Song, and Whang (2013a,b).

Table 1. Intersection bound commands. Bound estimates can be used to construct asymptotically valid confidence intervals for parameters and identified sets restricted by intersection bounds.

Command      Description
clrtest      Test the hypothesis that the maximum of lower intersection bounds is nonpositive.
clrbound     Compute a one-sided bound estimate.
clr2bound    Compute two-sided bound estimates using Bonferroni's inequality.
clr3bound    Compute two-sided bound estimates by inverting clrtest.

Our software adds to a small but growing set of publicly available software for bound estimation and inference, including Beresteanu and Manski (2000a,b) and Beresteanu, Molinari, and Steeg Morris (2010). Beresteanu and Manski (2000a,b) implement bound estimation by using kernel regression for bounds derived in the analysis of treatment response, as considered by Manski (1990), Manski (1997), Manski and Pepper (2000), and others. Our software applies to a broader set of intersection bound problems, and it complements existing software by additionally providing parametric and series estimators as well as methods for bias correction and asymptotically valid inference. The software by Beresteanu, Molinari, and Steeg Morris (2010) can be used to replicate the results in the work of Beresteanu and Molinari (2008) and to compute consistent set estimates for best linear prediction coefficients with interval-censored outcomes. It can also perform inference on any pair of elements of the best linear prediction coefficient vector.

In section 2, we recall the underlying framework of the intersection bounds set up by Chernozhukov, Lee, and Rosen (2013). In section 3, we describe the details of how our Stata program conducts hypothesis tests and constructs bound estimates. In section 4, we explain how to install our commands. In sections 5, 6, 7, and 8, we describe the clr2bound, clrbound, clrtest, and clr3bound commands, respectively. We explain how each command is used, what each command does, the available command options for each, and the stored results. In section 9, we illustrate the use of all four of these commands using data from the National Longitudinal Survey of Youth of 1979 (NLSY79), as in Carneiro and Lee (2009). Specifically, we use these commands to estimate and perform inference on returns to education using monotone treatment response (MTR) and monotone instrumental variable (MIV) bounds developed by Manski and Pepper (2000).

    2 Framework

We begin by considering a parameter of interest $\theta^*$, which is bounded above and below by intersection bounds of the form
\[
\max_{j \in \mathcal{J}^l} \sup_{x_j^l \in \mathcal{X}_j^l} \theta_j^l(x_j^l) \;\le\; \theta^* \;\le\; \min_{j \in \mathcal{J}^u} \inf_{x_j^u \in \mathcal{X}_j^u} \theta_j^u(x_j^u) \tag{1}
\]
where $\{\theta_j^l(\cdot) : j \in \mathcal{J}^l\}$ and $\{\theta_j^u(\cdot) : j \in \mathcal{J}^u\}$ are consistently estimable lower- and upper-bounding functions. $\mathcal{X}_j^l$ and $\mathcal{X}_j^u$ are known sets of values for the arguments of these functions, and $\mathcal{J}^l$ and $\mathcal{J}^u$ are index sets with a finite number of positive integers. The interval of all values that lie within the bounds in (1) is the identified set, denoted
\[
\Theta_I \equiv [\theta_0^l, \theta_0^u] \tag{2}
\]
where
\[
\theta_0^l \equiv \max_{j \in \mathcal{J}^l} \sup_{x_j^l \in \mathcal{X}_j^l} \theta_j^l(x_j^l), \qquad
\theta_0^u \equiv \min_{j \in \mathcal{J}^u} \inf_{x_j^u \in \mathcal{X}_j^u} \theta_j^u(x_j^u)
\]
We focus on the common case where the bounding functions $\theta_j^l(\cdot)$ and $\theta_j^u(\cdot)$ are conditional expectation functions, such that
\[
\theta_j^k(\cdot) \equiv E(Y_j^k \mid X_j^k = \cdot), \qquad k = l, u
\]
where $Y_j^k$ and $X_j^k$ are the dependent variable and explanatory variables of a conditional mean regression for each $j$ and $k$, respectively. We allow for the possibility that the explanatory variables $X_j^k$ are different or the same across $j$ and $k$.

Many articles in the recent literature on partial identification feature bounds of the form given in (1) and (2) on a parameter of interest or on a function of a parameter of interest. Characterizing the asymptotic distribution of plug-in estimators for these bounds is complicated because they are the infimum and supremum of an estimated function. Moreover, using sample analogs for bound estimates is known to produce substantial finite-sample bias. The inferential methods of Chernozhukov, Lee, and Rosen (2013) overcome these problems to produce asymptotically valid confidence sets for $\theta^*$ and for $\Theta_I$ and bias-corrected estimates for the upper and lower bounds of $\Theta_I$. Our approach is to first form precision-corrected estimators for the bounding functions $\theta_j^k(\cdot)$ for each $j$ and $k$ and then apply the max, sup, min, and inf operators to these precision-corrected estimators. The degree of the precision correction is chosen to obtain bias-corrected bound estimates or bound estimates that achieve asymptotically valid inference at a desired level. Chernozhukov, Lee, and Rosen (2013) provide asymptotic theory for formal justification and algorithms for implementing these methods. The commands described in this article implement these algorithms in Stata.¹

1. All of our commands require the package moremata (Jann 2005).

Chernozhukov, Lee, and Rosen (2013) provide examples of bound characterizations to which these methods apply. A leading example is given by the nonparametric bounds of Manski (1989, 1990) on mean treatment response and average treatment effects with instrumental variable restrictions. So-called worst-case bounds on mean treatment response $\theta^* = \theta^*(x) \equiv E\{Y(t) \mid X = x\}$ from treatment $t \in \{0, 1\}$ conditional on vector $X = x$ are given by
\[
\theta^l(x) \le \theta^*(x) \le \theta^u(x) \tag{3}
\]
where
\[
\theta^l(x) \equiv E\{Y \times 1(Z = t) \mid X = x\}, \qquad \theta^u(x) \equiv E\{Y \times 1(Z = t) + 1(Z \ne t) \mid X = x\}
\]
Here $Z \in \{0, 1\}$ denotes the observed treatment, and $Y(\cdot)$ maps potential treatments to outcomes, which are normalized to lie on the unit interval, $Y(\cdot) : \{0, 1\} \to [0, 1]$. We observe outcome $Y = Y(Z)$ but do not observe the potential outcome from the counterfactual treatment $Y(1 - Z)$. This causes the lack of point identification of $E\{Y(t) \mid X = x\}$. The width of the bounds is $P(Z \ne t)$, which is the probability that the observed treatment $Z$ differs from $t$.

Researchers are often willing to invoke instrumental variable restrictions, or level-set restrictions as in Manski (1990), that limit the degree to which the conditional expectation $E\{Y(t) \mid X = x\}$ varies with $x$. For instance, $x$ may comprise two components $x = (w, v)$ with component $v$ excluded from affecting the conditional mean function, so that
\[
\forall\, v \in \mathcal{V}, \quad E\{Y(t) \mid X = (w, v)\} = E\{Y(t) \mid W = w\}
\]
where $\mathcal{V}$ denotes the support of $V$. Then, with $\theta^*(w) := E\{Y(t) \mid W = w\}$ and (3) holding for $x = (w, v)$ for any fixed $w$ and all $v \in \mathcal{V}$, it follows that
\[
\sup_{v \in \mathcal{V}} \theta^l\{(w, v)\} \;\le\; \theta^*(w) \;\le\; \inf_{v \in \mathcal{V}} \theta^u\{(w, v)\} \tag{4}
\]
which is precisely the form of (1) with singleton (and thus omitted) index sets $\mathcal{J}^l$ and $\mathcal{J}^u$, $\mathcal{X}^l = \mathcal{X}^u = \mathcal{V}$, and $\theta^* = \theta^*(w)$. One can apply this reasoning to obtain upper and lower bounds on $\theta^*(w)$ for all values of $w$. In section 9, we demonstrate our Stata commands with bounds on a conditional expectation similar to those in (4) applied to data from the NLSY79; however, we use a MIV restriction first considered by Manski and Pepper (2000) instead of the instrumental variable restriction used above.

The estimation problem of Chernozhukov, Lee, and Rosen (2013) is to obtain estimators $\hat\theta_{n0}^l(p)$ and $\hat\theta_{n0}^u(p)$, which provide bias-corrected estimates or the endpoints of confidence intervals, depending on the chosen value of $p$; for example, $p = 1/2$ for half-median-unbiased bound estimates, or $p = 1 - \alpha$ for confidence intervals. By construction, these estimators satisfy
\[
P_n\left\{\theta_0^l \ge \hat\theta_{n0}^l(p)\right\} \ge p - o(1) \quad\text{and}\quad P_n\left\{\theta_0^u \le \hat\theta_{n0}^u(p)\right\} \ge p - o(1) \tag{5}
\]
Chernozhukov, Lee, and Rosen (2013), who focus on the upper bound for $\theta^*$, provide further detail on implementation. They explain how the estimation procedure can be easily adapted for the lower bound for $\theta^*$. The command clrbound presented below gives estimators for these one-sided intersection bounds.

If one wishes to perform inference on the identified set, then one can use the intersection of upper and lower one-sided intervals, each based on $\tilde p = (1 + p)/2$, as an asymptotic level-$p$ confidence set $[\hat\theta_{n0}^l(\tilde p), \hat\theta_{n0}^u(\tilde p)]$ for $\Theta_I$, which satisfies
\[
\liminf_{n \to \infty} P_n\left[\Theta_I \subseteq \left[\hat\theta_{n0}^l(\tilde p),\, \hat\theta_{n0}^u(\tilde p)\right]\right] \ge p \tag{6}
\]
by (5) and Bonferroni's inequality. For example, to obtain a 95% confidence set for $\Theta_I$, one can use upper and lower one-sided intervals, each with 97.5% nominal coverage probability. The command clr2bound, described in section 5, provides this type of confidence interval.

Because $\theta^* \in \Theta_I$, such confidence intervals are asymptotically valid but generally conservative for $\theta^*$.² Alternatively, one may consider inference on $\theta^*$ by first transforming the collection of lower and upper bounds in (1) into a collection of only one-sided bounds on a function of $\theta^*$. Specifically, the inequalities in (1) are equivalent to
\[
T_0(\theta^*) \equiv \max_{k \in \{l, u\}} \max_{j \in \mathcal{J}^k} \sup_{x_j^k \in \mathcal{X}_j^k} T_j^k(x_j^k, \theta^*) \le 0 \tag{7}
\]
where
\[
T_j^u(x_j^k, \theta^*) \equiv \theta^* - \theta_j^u(x_j^k), \qquad T_j^l(x_j^k, \theta^*) \equiv \theta_j^l(x_j^k) - \theta^* \tag{8}
\]
For any conjectured value of $\theta^*$, say, $\theta_{\text{null}}$, one can apply estimation methods from Chernozhukov, Lee, and Rosen (2013) to perform the hypothesis test
\[
H_0 : T_0(\theta_{\text{null}}) \le 0 \quad \text{vs.} \quad H_1 : T_0(\theta_{\text{null}}) > 0 \tag{9}
\]
This is carried out by placing $T_0(\theta_{\text{null}})$ in the role of the bounding function $\theta_0^l$ in (1) to produce an estimator $\hat T_{n0}(\theta_{\text{null}}, p)$, such that
\[
P_n\left\{T_0(\theta_{\text{null}}) \ge \hat T_{n0}(\theta_{\text{null}}, p)\right\} \ge p - o(1) \tag{10}
\]

which is analogous to the construction of $\hat\theta_{n0}^l(p)$ in (5). The null hypothesis $H_0$ is then rejected in favor of $H_1$ at the $1 - p$ significance level if $\hat T_{n0}(\theta_{\text{null}}, p) > 0$. The command clrtest, which we describe in section 7, performs such a test. When we invert this test, the set of $\theta_{\text{null}}$ such that $\hat T_{n0}(\theta_{\text{null}}, p) \le 0$ is an asymptotically valid level-$p$ confidence set for $\theta^*$ because
\[
\liminf_{n \to \infty} P_n\left[\theta^* \in \left\{\theta_{\text{null}} : \hat T_{n0}(\theta_{\text{null}}, p) \le 0\right\}\right] \ge p \tag{11}
\]
by construction. The command clr3bound, which we describe in section 8, produces precisely this confidence set.

2. Differences between confidence regions for an identified set $\Theta_I$ and a single point $\theta^*$ within that set have been well studied in the prior literature. See, for instance, Imbens and Manski (2004), Chernozhukov, Hong, and Tamer (2007), Stoye (2009), and Romano and Shaikh (2010).

    3 Implementation

In this section, we describe our implementation for estimating one-sided bounds. We focus on the lower intersection bounds and drop the $l$ superscript to simplify notation.

We let $J$ denote the number of inequalities concerned. Suppose that we have observations $\{(Y_{ji}, X_{ji}) : i = 1, \ldots, n,\ j = 1, \ldots, J\}$, where $n$ is the sample size. For each $j = 1, \ldots, J$, we let $\mathbf{y}_j$ denote the $n \times 1$ vector whose $i$th element is $Y_{ji}$, and we let $\mathbf{X}_j$ denote the $n \times d_j$ matrix whose $i$th row is $X_{ji}'$, where $d_j$ is the dimension of $X_{ji}$. We allow multidimensional $X_j$ for only parametric estimation. We set $d_j = 1$ for series and local linear estimation.

To evaluate the supremum in (1) numerically, we set a dense set of grid points for each $j = 1, \ldots, J$, say, $(\mathbf{x}_1, \ldots, \mathbf{x}_J)$, where $\mathbf{x}_j = (x_{j1}', \ldots, x_{jM_j}')'$ for some sufficiently large number $M_j$, and each $x_{jm}$ is a $d_j \times 1$ vector. We also let $\Psi_j$ denote the $M_j \times d_j$ matrix whose $m$th row is $x_{jm}'$, where $m = 1, \ldots, M_j$ and $j = 1, \ldots, J$. The number of grid points can be different for different inequalities.

    3.1 Parametric estimation

Define
\[
\mathbf{X} \equiv \begin{pmatrix} \mathbf{X}_1 & \cdots & \mathbf{0} \\ \vdots & \ddots & \vdots \\ \mathbf{0} & \cdots & \mathbf{X}_J \end{pmatrix}, \qquad
\mathbf{y} \equiv \begin{pmatrix} \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_J \end{pmatrix}, \qquad
\boldsymbol{\Psi} \equiv \begin{pmatrix} \Psi_1 & \cdots & \mathbf{0} \\ \vdots & \ddots & \vdots \\ \mathbf{0} & \cdots & \Psi_J \end{pmatrix}
\]
and let $\boldsymbol\theta_j(\mathbf{x}_j) \equiv \{\theta_j(x_{j1}), \ldots, \theta_j(x_{jM_j})\}'$ and $\boldsymbol\theta \equiv \{\boldsymbol\theta_1(\mathbf{x}_1)', \ldots, \boldsymbol\theta_J(\mathbf{x}_J)'\}'$. Then the estimator of $\boldsymbol\theta$ is $\hat{\boldsymbol\theta} \equiv \boldsymbol\Psi\hat{\boldsymbol\beta}$, where $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$. Also, the heteroskedasticity-robust standard error of $\hat{\boldsymbol\theta}$, say, $\hat{\mathbf{s}}$, can be computed as
\[
\hat{\mathbf{s}} \equiv \sqrt{\operatorname{diagvec}(\mathbf{V})}
\]
where
\[
\boldsymbol\Omega = \left\{\operatorname{diag}\left(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}\right)\right\}^2, \qquad
\mathbf{V} = \boldsymbol\Psi(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol\Omega\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\boldsymbol\Psi'
\]

Here $\operatorname{diag}(\mathbf{a})$ is the diagonal matrix whose diagonal terms are the elements of the vector $\mathbf{a}$, and $\operatorname{diagvec}(\mathbf{A})$ is the vector whose elements are the diagonal elements of the matrix $\mathbf{A}$.
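To make the algebra concrete, here is a minimal Mata sketch of these quantities under our own naming conventions (clr_ols is not part of the package; X, y, and Psi are assumed to be the stacked matrices defined above):

    mata:
    // Hedged sketch of theta-hat = Psi*beta-hat, the sandwich variance V,
    // and s-hat = sqrt(diagvec(V)). Arguments are modified in place.
    void clr_ols(real matrix X, real colvector y, real matrix Psi,
                 real colvector theta, real colvector s, real matrix V)
    {
        real matrix XtXi
        real colvector beta, e
        XtXi  = invsym(cross(X, X))                    // (X'X)^{-1}
        beta  = XtXi*cross(X, y)                       // beta-hat
        e     = y - X*beta                             // residuals
        V     = Psi*XtXi*cross(X, e:^2, X)*XtXi*Psi'   // V with Omega = diag(e)^2
        theta = Psi*beta                               // theta-hat = Psi*beta-hat
        s     = sqrt(diagonal(V))                      // s-hat
    }
    end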

To obtain a precision-corrected estimate, we maximize the precision-corrected curve, which is obtained by subtracting the critical value times the standard error from the function estimate. To compute the critical value, say, $k(p)$, define
\[
\hat{\boldsymbol\Sigma} \equiv \{\operatorname{diag}(\hat{\mathbf{s}})\}^{-1}\, \mathbf{V}\, \{\operatorname{diag}(\hat{\mathbf{s}})\}^{-1}
\]
Let $\operatorname{chol}(\mathbf{A})$ denote the Cholesky decomposition of the matrix $\mathbf{A}$, such that
\[
\mathbf{A} = \operatorname{chol}(\mathbf{A})\operatorname{chol}(\mathbf{A})'
\]
We simulate pseudorandom numbers from the $N(0, 1)$ distribution and construct a $\dim(\hat{\boldsymbol\Sigma}) \times R$ matrix, say, $\mathbf{Z}_R$. Then the critical value is selected as
\[
k(p) = \text{the $p$th quantile of } \operatorname{maxcol}\left\{\operatorname{chol}\left(\hat{\boldsymbol\Sigma}\right)\mathbf{Z}_R\right\} \tag{12}
\]
where $\operatorname{maxcol}(\mathbf{B})$ is the vector of maximum values in each column of the matrix $\mathbf{B}$. Then our bias-corrected estimator $\hat\theta_{n0}(p)$ for $\max_{j \in \mathcal{J}^l} \sup_{x_j^l \in \mathcal{X}_j^l} \theta_j^l(x_j^l)$ is
\[
\hat\theta_{n0}(p) = \operatorname{maxcol}\left\{\boldsymbol\Psi\hat{\boldsymbol\beta} - k(p)\hat{\mathbf{s}}\right\} \tag{13}
\]
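Continuing the sketch above, the simulation in (12) and the estimator in (13) might be coded as follows; this assumes $\hat{\boldsymbol\Sigma}$ is positive definite so that the Cholesky factor exists, and R corresponds to the number of simulation draws (cf. the rnd() option in section 5.3):

    mata:
    // Hedged sketch of the simulated critical value k(p) in (12) and the
    // precision-corrected estimate in (13); theta is Psi*beta-hat.
    real scalar clr_bound(real colvector theta, real colvector s,
                          real matrix V, real scalar p, real scalar R)
    {
        real matrix Sigma, Z
        real colvector m
        real scalar kp
        Sigma = diag(1:/s)*V*diag(1:/s)       // Sigma-hat
        Z     = rnormal(rows(Sigma), R, 0, 1) // N(0,1) pseudorandom draws
        m     = colmax(cholesky(Sigma)*Z)'    // max over grid points, per draw
        _sort(m, 1)
        kp    = m[ceil(p*R)]                  // pth empirical quantile = k(p)
        return(max(theta - kp*s))             // maxcol{Psi*beta - k(p)*s}
    }
    end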

The critical value in (13) is obtained under the least favorable case. To improve the estimator, we carry out the following adaptive inequality selection (AIS) procedure (a minimal sketch of step 1 follows the list):

1. Set $\tilde\gamma_n \equiv 1 - 0.1/\log n$. Let $\psi_k'$ denote the $k$th row of $\boldsymbol\Psi$, where $k = 1, \ldots, \sum_{j=1}^J M_j$. Keep each row $\psi_k'$ of $\boldsymbol\Psi$ if and only if
\[
\psi_k'\hat{\boldsymbol\beta} \ge \hat\theta_{n0}(\tilde\gamma_n) - 2k(\tilde\gamma_n)\hat s_k
\]
where $\hat s_k$ is the $k$th element of $\hat{\mathbf{s}}$.

2. Replace $\boldsymbol\Psi$ with the rows of $\boldsymbol\Psi$ kept in step 1. Then, recompute $\mathbf{V}$ and $\hat{\boldsymbol\Sigma}$ to update the critical value in (12), and obtain the final estimate $\hat\theta_{n0}(p)$ in (13) with the updated critical value.
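A minimal Mata sketch of the selection rule in step 1, with theta $= \boldsymbol\Psi\hat{\boldsymbol\beta}$, theta0 $= \hat\theta_{n0}(\tilde\gamma_n)$, and kg $= k(\tilde\gamma_n)$ assumed given (ais_keep is our illustrative name, not the package's):

    mata:
    // Keep flag for each grid row: 1 = kept, 0 = dropped (cf. the stored
    // AIS results described in section 5.4).
    real colvector ais_keep(real colvector theta, real colvector s,
                            real scalar theta0, real scalar kg)
    {
        return(theta :>= theta0 :- 2*kg*s)
    }
    // Rows of Psi surviving AIS: select(Psi, ais_keep(theta, s, theta0, kg))
    end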

    3.2 Series estimation

The implementation of series estimation is similar to that of parametric estimation. For each $j = 1, \ldots, J$, we let $p^{nj}(x) \equiv \{p_{n,1}(x), \ldots, p_{n,\kappa_j}(x)\}'$ denote the $\kappa_j$-dimensional vector of approximating functions, which are cubic B-splines. Here the number of series terms $\kappa_j$ can be different for each inequality. We let $\tilde{\mathbf{X}}_j$ denote the $n \times \kappa_j$ matrix whose $i$th row is $p^{nj}(X_{ji})'$ and $\tilde\Psi_j$ denote the $M_j \times \kappa_j$ matrix whose $m$th row is $p^{nj}(x_{jm})'$. We can then complete the same procedure described in section 3.1, substituting $\tilde{\mathbf{X}}_j$ and $\tilde\Psi_j$ for $\mathbf{X}_j$ and $\Psi_j$, respectively.

In this implementation, the dimension $d_j$ of $X_{ji}$ is 1, and the approximating functions are cubic B-splines. However, it is possible to implement high-dimensional $X_{ji}$ and other possible basis functions by programming a suitable design matrix manually and running our commands with an option of parametric estimation. This is basically equivalent to supplying $\tilde{\mathbf{X}}_j$ and $\tilde\Psi_j$ directly in series estimation. See section 4.2 of Chernozhukov, Lee, and Rosen (2013) for details.

    3.3 Local linear estimation

For any vector $\mathbf{v}$, we let $\hat{\boldsymbol\rho}_j(\mathbf{v})$ denote the vector whose $k$th element is the local linear regression estimate of $\mathbf{y}_j$ on $\mathbf{X}_j$ at the $k$th element of $\mathbf{v}$. In detail, the $k$th element of $\hat{\boldsymbol\rho}_j(\mathbf{v})$, say, $\hat\rho_j(v_k)$, is defined as follows,
\[
\hat\rho_j(v_k) \equiv e_1'\left(\mathbf{X}_{v_k}'\mathbf{W}_j\mathbf{X}_{v_k}\right)^{-1}\mathbf{X}_{v_k}'\mathbf{W}_j\mathbf{y}_j
\]
where $e_1 \equiv (1, 0)'$,
\[
\mathbf{X}_{v_k} \equiv \begin{pmatrix} 1 & (X_{j1} - v_k) \\ \vdots & \vdots \\ 1 & (X_{jn} - v_k) \end{pmatrix}, \qquad
\mathbf{W}_j \equiv \operatorname{diag}\left\{K\!\left(\frac{X_{j1} - v_k}{h_j}\right), \ldots, K\!\left(\frac{X_{jn} - v_k}{h_j}\right)\right\}
\]
$K(\cdot)$ is a kernel function, and $h_j$ is the bandwidth for inequality $j$. Recall that the dimension $d_j$ of $X_{ji}$ is one in local linear estimation. In our implementation, we used the following (biweight) kernel function:
\[
K(s) = \frac{15}{16}\left(1 - s^2\right)^2 1\left(|s| \le 1\right)
\]
Then the estimator of $\boldsymbol\theta \equiv \{\boldsymbol\theta_1(\mathbf{x}_1)', \ldots, \boldsymbol\theta_J(\mathbf{x}_J)'\}'$ is $\hat{\boldsymbol\theta} \equiv \{\hat{\boldsymbol\rho}_1(\boldsymbol\psi_1)', \ldots, \hat{\boldsymbol\rho}_J(\boldsymbol\psi_J)'\}'$, where $\boldsymbol\psi_j$ denotes the $M_j \times 1$ vector whose $m$th element is $x_{jm}$.
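As an illustration (a sketch under our own naming, not the command's internal code), the following Mata function evaluates $\hat\rho_j$ at a single point v with the biweight kernel:

    mata:
    // Hedged sketch: local linear estimate at a single point v with
    // bandwidth h, using the biweight kernel from the text.
    real scalar loclin(real colvector y, real colvector X,
                       real scalar v, real scalar h)
    {
        real colvector u, w, b
        real matrix Xv
        u  = (X :- v)/h
        w  = (15/16)*(1 :- u:^2):^2 :* (abs(u) :<= 1)  // K{(X_i - v)/h}
        Xv = J(rows(X), 1, 1), (X :- v)                // columns [1, (X_i - v)]
        b  = invsym(cross(Xv, w, Xv))*cross(Xv, w, y)  // weighted least squares
        return(b[1])                                   // e_1'b = local intercept
    }
    end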

Now we let $\hat{\mathbf{s}}_j$ denote the $M_j \times 1$ vector whose $m$th element is $\sqrt{g_{jm}^2(\mathbf{y}_j, \mathbf{X}_j)/(n h_j)}$, where
\[
g_{jm}^2(\mathbf{y}_j, \mathbf{X}_j) = n^{-1}\sum_{i=1}^n \hat g_{ji}(Y_{ji}, X_{ji}, x_{jm})^2
\]
\[
\hat g_{ji}(Y_{ji}, X_{ji}, x_{jm}) = \frac{Y_{ji} - \hat\rho_j(X_{ji})}{\sqrt{h_j}\,\hat f_j(x_{jm})}\, K\!\left(\frac{x_{jm} - X_{ji}}{h_j}\right)
\]
and $\hat f_j(x_{jm})$ is the kernel estimate of the density of the covariate for the $j$th inequality, evaluated at $x_{jm}$. Then we can compute $\hat{\mathbf{s}}$ as $\hat{\mathbf{s}} = (\hat{\mathbf{s}}_1', \ldots, \hat{\mathbf{s}}_J')'$.

To compute the critical value $k(p)$, we let $\boldsymbol\Phi_j$ denote the $M_j \times n$ matrix whose $m$th row is $\{\hat g_{j1}(Y_{j1}, X_{j1}, x_{jm}), \ldots, \hat g_{jn}(Y_{jn}, X_{jn}, x_{jm})\}/\sqrt{n h_j g_{jm}^2(\mathbf{y}_j, \mathbf{X}_j)}$. We define
\[
\boldsymbol\Phi \equiv \begin{pmatrix} \boldsymbol\Phi_1 \\ \vdots \\ \boldsymbol\Phi_J \end{pmatrix}
\]
We simulate pseudorandom numbers from the $N(0, 1)$ distribution and construct an $n \times R$ matrix, $\mathbf{Z}_R$. We then select the critical value as
\[
k(p) = \text{the $p$th quantile of } \operatorname{maxcol}(\boldsymbol\Phi\mathbf{Z}_R) \tag{14}
\]

The calculation of the bias-corrected estimator $\hat\theta_{n0}(p)$ is almost the same as that of parametric estimation. That is,
\[
\hat\theta_{n0}(p) = \operatorname{maxcol}\left\{\hat{\boldsymbol\theta} - k(p)\hat{\mathbf{s}}\right\}
\]
However, the AIS procedure is slightly different because we do not use $\boldsymbol\Psi$ in local linear estimation:

1. Set $\tilde\gamma_n \equiv 1 - 0.1/\log n$. Keep the $m$th row of each $\boldsymbol\Phi_j$, $j = 1, \ldots, J$, if and only if
\[
\hat\rho_j(x_{jm}) \ge \hat\theta_{n0}(\tilde\gamma_n) - 2k(\tilde\gamma_n)\hat s_{jm}
\]
where $\hat s_{jm}$ is the $m$th element of $\hat{\mathbf{s}}_j$.

2. For $j = 1, \ldots, J$, replace $\boldsymbol\Phi_j$ with the rows of $\boldsymbol\Phi_j$ kept in step 1. Then, recompute the critical value in (14), and obtain the final estimate $\hat\theta_{n0}(p)$ with the updated critical value.

    4 Installation of the clrbound package

All of our commands require the package moremata (Jann 2005), which can be installed by typing ssc install moremata, replace in the Stata Command window.
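For example, in an Internet-aware Stata session, one might type the following; the which check assumes the clrbound package itself has already been installed:

    . ssc install moremata, replace
    . which clr2bound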

    5 The clr2bound command

    5.1 Syntax

    The syntax of clr2bound is as follows:

clr2bound ((lowerdepvar1 indepvars1 range1)
    [(lowerdepvar2 indepvars2 range2) ... (lowerdepvarN indepvarsN rangeN)])
    ((upperdepvarN+1 indepvarsN+1 rangeN+1)
    [(upperdepvarN+2 indepvarsN+2 rangeN+2) ...
    (upperdepvarN+M indepvarsN+M rangeN+M)]) [if] [in]
    [, method(series | local) notest null(real) level(numlist) noais
      minsmooth(#) maxsmooth(#) noundersmooth bandwidth(#) rnd(#)
      norseed seed(#)]


    5.2 Description

The command clr2bound estimates a two-sided confidence interval $[\hat\theta_{n0}^l(\tilde p), \hat\theta_{n0}^u(\tilde p)]$, where $\tilde p$ = (level() + 1)/2. By (5) and Bonferroni's inequality, this interval contains the identified set $\Theta_I$ with probability at least level() asymptotically; that is, (6) holds with $p$ = level(). The variables lowerdepvar1, ..., lowerdepvarN are the dependent variables ($Y_j^l$'s) for the lower-bounding functions, and the variables upperdepvarN+1, ..., upperdepvarN+M are the dependent variables ($Y_j^u$'s) for the upper-bounding functions. The variables indepvars1, ..., indepvarsN+M are explanatory variables for the corresponding dependent variables. clr2bound allows for multidimensional indepvars for parametric estimation but only a one-dimensional independent variable for series and local linear estimation.

The variables range1, ..., rangeN+M are sets of grid points over which the bounding functions are estimated, corresponding to the sets $\mathcal{X}_j^l$ and $\mathcal{X}_j^u$ in (1). The number of observations for the range is not necessarily the same as the number of observations for the depvar and indepvars. The latter is the sample size, whereas the former is the number of grid points at which the maximum or minimum values of the bounding functions are evaluated.

Note that the parentheses must be used properly. Variable sets for lower bounds and for upper bounds must each be put in additional parentheses separately. For example, if there are two variable sets, say, (ldepvar1 indepvars1 range1) and (ldepvar2 indepvars2 range2), for the lower-bounds estimation and one variable set, say, (udepvar1 indepvars3 range3), for the upper-bounds estimation, the right syntax for two-sided intersection bounds estimation is ((ldepvar1 indepvars1 range1) (ldepvar2 indepvars2 range2)) ((udepvar1 indepvars3 range3)), as in the illustrative call below.

In addition, clr2bound provides a test result for the null hypothesis that the specified value is in the intersection bounds at each confidence level. If the value is unspecified, the null hypothesis is that the parameter of interest is 0. This test uses (10), which is a more stringent requirement than simply checking whether the value lies within the confidence set reported by clr2bound, which is based on Bonferroni's inequality. Therefore, this test may reject some values in the reported confidence set at the same confidence level.
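For instance, a hypothetical call with two lower-bounding variable sets and one upper-bounding variable set (all names illustrative, echoing the example above) is

    . clr2bound ((ldepvar1 indepvars1 range1) (ldepvar2 indepvars2 range2)) ///
            ((udepvar1 indepvars3 range3)), method(series) level(0.5 0.95)

Here level(0.5 0.95) requests both the half-median-unbiased estimates and a 95% Bonferroni confidence set.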

    5.3 Options

method(series | local) specifies the method of estimation. By default, clr2bound will conduct parametric estimation. If method(series) is specified, clr2bound will conduct series estimation with cubic B-splines. If method(local) is specified, clr2bound will conduct local linear estimation.

notest determines whether clr2bound conducts a test. clr2bound provides a test of the null hypothesis that the specified value is in the intersection bounds at the confidence levels specified in the level() option below. By default, clr2bound conducts the test. Specifying this option causes clr2bound to output only Bonferroni bounds.

null(real) specifies the value for $\theta^*$ under the null hypothesis of the test described above. The default is null(0).

level(numlist) specifies confidence levels. numlist must contain only real numbers between 0 and 1. If this option is specified as level(0.5), the result is the half-median-unbiased estimator of the parameter of interest. The default is level(0.5 0.9 0.95 0.99).

noais determines whether AIS should be applied. AIS helps to get sharper bounds by using a problem-dependent cutoff to drop irrelevant grid points of the range. The default is to use AIS.

minsmooth(#) and maxsmooth(#) specify the minimum and maximum possible numbers of approximating functions considered in the cross-validation procedure for B-splines. Specifically, the number of approximating functions $\hat K_{cv}$ is set to the minimizer of the leave-one-out least-squares cross-validation score within this range. For example, if a user inputs minsmooth(5) and maxsmooth(9), $\hat K_{cv}$ is chosen from the set {5, 6, 7, 8, 9}. The procedure calculates this number separately for each inequality. The default is minsmooth(5) and maxsmooth(20). If undersmoothing is performed, the number of approximating functions $K$ ultimately used will be given by the largest integer smaller than $\hat K_{cv}$ multiplied by the undersmoothing factor $n^{-1/5} \times n^{2/7}$; see the noundersmooth option below. This option is available for only series estimation.

noundersmooth determines whether undersmoothing is carried out, with the default being to undersmooth. In series estimation, undersmoothing is implemented by first computing $\hat K_{cv}$ as the minimizer of the leave-one-out least-squares cross-validation score. Without this option, the number of approximating functions is then set to $K$, which is given by the largest integer that is less than or equal to $\hat K := \hat K_{cv} \times n^{-1/5} \times n^{2/7}$. The noundersmooth option uses $\hat K_{cv}$ instead. For local linear estimation, undersmoothing is done by setting the bandwidth to $h = \hat h_{ROT} \times \hat s_v \times n^{1/5} \times n^{-2/7}$, where $\hat h_{ROT}$ is the "rule-of-thumb" bandwidth that is used in Chernozhukov, Lee, and Rosen (2013). The noundersmooth option instead uses $\hat h_{ROT} \times \hat s_v$. This option is available for only series and local linear estimation.

bandwidth(#) specifies the value of the bandwidth used in local linear estimation. By default, clr2bound calculates a bandwidth for each inequality. With undersmoothing, we use the bandwidth $h = \hat h_{ROT} \times \hat s_v \times n^{1/5} \times n^{-2/7}$, where $\hat s_v$ is the square root of the sample variance of $V$, and $\hat h_{ROT}$ is the "rule-of-thumb" bandwidth for estimation of $\theta(v)$ with Studentized $V$. See Chernozhukov, Lee, and Rosen (2013) for the exact form of $\hat h_{ROT}$. When bandwidth(#) is specified, clr2bound uses the given bandwidth as the global bandwidth for every inequality. This option is available for only local linear estimation.

rnd(#) specifies the number of columns of the random matrix generated from the standard normal distribution. This matrix is used to compute critical values. For example, if the number is 10,000 and the level is 0.95, we choose the 0.95 quantile from 10,000 randomly generated elements. The default is rnd(10000).

norseed determines whether to reset the seed number for the simulation used in the calculation. For example, if a user wants to use this command within a Monte Carlo study, this option can be used to prevent resetting the seed number in each Monte Carlo iteration. The default is to reset the seed number.

seed(#) specifies the seed number for the random number generation described above. To prevent the estimation results from changing randomly from one run to another, clr2bound always initially executes set seed #. The default is seed(0).

    5.4 Stored results

In the following, "l.b.e." stands for lower-bound estimation, "u.b.e." stands for upper-bound estimation, and "ineq" stands for inequality. (i) denotes the ith inequality. (lev) denotes the confidence level's decimal part; for example, when the confidence level is 97.5% or 0.975, (lev) is 975. The number of elements in (lev) equals the number of confidence levels specified by the level() option. Some results are available for only series or local linear estimation.

clr2bound stores the following in e(). Note that for this command and all other commands, 1 is used in the stored AIS results to denote values that were kept in the index set, and 0 is used to denote values that were dropped.

Scalars
  e(N)             number of observations
  e(null)          the null hypothesis
  e(l ineq)        # of ineq's in l.b.e.
  e(u ineq)        # of ineq's in u.b.e.
  e(l grid(i))     # of grid points in (i) of l.b.e.
  e(u grid(i))     # of grid points in (i) of u.b.e.
  e(l bdwh(i))     bandwidth for (i) of l.b.e.
  e(u bdwh(i))     bandwidth for (i) of u.b.e.
  e(lbd(lev))      est. results of l.b.e.
  e(ubd(lev))      est. results of u.b.e.
  e(lcl(lev))      critical value of l.b.e.