advances in social science research using r · hrishikesh d. vinod, ph. d. department of economics...

22
Advances in Social Science Research Using R For other titles in this series, go to www.springer.com/series/694

Upload: others

Post on 07-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

Advances in Social Science Research Using R

For other titles in this series go towwwspringercomseries694

HD VinodEditor

Advances in Social ScienceResearch Using R

123

EditorHrishikesh D Vinod Ph DDepartment of EconomicsFordham University441 E Fordham RoadBronx NY 10458USA

ISBN 978-1-4419-1763-8 e-ISBN 978-1-4419-1764-5DOI 101007978-1-4419-1764-5Springer New York Dordrecht Heidelberg London

Library of Congress Control Number 2009942719

ccopy Springer Science+Business Media LLC 2010All rights reserved This work may not be translated or copied in whole or in part without the writtenpermission of the publisher (Springer Science+Business Media LLC 233 Spring Street New YorkNY 10013 USA) except for brief excerpts in connection with reviews or scholarly analysis Use inconnection with any form of information storage and retrieval electronic adaptation computersoftware or by similar or dissimilar methodology now known or hereafter developed is forbiddenThe use in this publication of trade names trademarks service marks and similar terms even ifthey are not identified as such is not to be taken as an expression of opinion as to whether or notthey are subject to proprietary rights

Printed on acid-free paper

Springer is part of Springer Science+Business Media (wwwspringercom)

To My Daughter RitaSon-in-Law Dale Knauseand My Grandchildren

Devin Vinod Knause and Troy Vinod Knause

Preface

Quantitative social science research has been expanding due to the avail-ability of computers and data over the past few decades Yet the textbooksand supplements for researchers do not adequately highlight the revolutioncreated by the R software [2] and graphics system R is fast becoming the lin-gua franca of quantitative research with some 2000 free specialized packageswhere the latest versions can be downloaded in seconds Many packages suchas ldquocarrdquo [1] developed by social scientists are popular among all scientists

An early 2009 article [3] in the New York Times notes that statisticiansengineers and scientists without computer programming skills find Rldquoeasy touserdquo A common language R can readily promote deeper mutual respect andunderstanding of unique problems facing quantitative work in various socialsciences Often the solutions developed in one field can be extended and usedin many fields This book promotes just such exchange of ideas across manysocial sciences Since Springer has played a leadership role in promoting Rwe are fortunate to have Springer publish this book

A Conference on Quantitative Social Science Research Using R was heldin New York City at the Lincoln Center campus of Fordham University June18ndash19 2009 This book contains selected papers presented at the conferencerepresenting the ldquoProceedingsrdquo of the conference

The conference offered an opportunity for enthusiastic users of R to discusstheir research ideas and some policy applications in social sciences to learnmeet and mingle Speakers included authors of books andor packages forR published researchers and editors of important journals in statistics andsocial sciences The keynote speaker was Roger Koenker Professor of Eco-nomics University of Illinois and there was a distinguished panel of invitedspeakers Prof Andrew Gelman Director of the Applied Statistics Center atColumbia University Prof Kosuke Imai Department of Politics at Prince-ton University Prof Keith A Markus Psychology Department at John JayCollege of Criminal Justice Prof B D McCullough Department of DecisionSciences Drexel University and Prof Achim Zeileis Department of Statisticsand Mathematics WU Wirtschaftsuniversitat Wien (Vienna)

vii

viii Preface

We had a truly interdisciplinary conference organizing committee of Ford-ham professors Co-chair Frank Hsu (Computer and Information Sciences)and H D Vinod (Economics) Jose Aleman treasurer (Political Science) Re-maining Members (alphabetical order) were Elena Filatova (Computer andInformation Sciences) Se-Kang Kim (Psychology) Erick W Rengifo (Eco-nomics) Emily Rosenbaum (Sociology) Matthew Weinshenker (Sociology)and Tiffany Yip (Psychology)

Fordham University was the main sponsor of the conference providing allthe facilities guidance and staff help It would not have been possible withoutthe active support from the deans of the Graduate School of Arts and Sciences(GSAS Dean Dr Nancy Busch) Fordham College at Rose Hill (FCRH DeanDr Brennan OrsquoDonnell) and College of Business Administration (CBA DeanDr Donna Rapaccioli) Another active supporter Dr Robert HimmelbergDean of Arts amp Sciences Faculty inaugurated the conference Cosponsors werethe Society of Indian Academics in America (SIAA) and Global Associationof Risk Professionals (GARP) The modest registration fee paid for the coffeebreaks and a cocktail reception with snacks on the evening of Thursday June18 2009

Although seating was limited and a registration fee was charged seventy-five people registered and attended the conference A limited number of grad-uate and undergraduate students from various disciplines attended the con-ference Fordham University faculty and deans encouraged their students totake advantage of this opportunity to learn R Students could learn someinnovative and practical research tools providing intellectual skills havingpotentially lifelong benefits The conference provided graduate students andyoung faculty in all disciplines a great chance to see talk to and seek ad-vice from researchers using R The conference program and details remainavailable at the conference website httpwwwcisfordhameduQR2009

Most attendees thought that the conference was a great success The re-search work presented was of high quality covering a wide range of importanttopics along with important examples and R implementations This book ofldquoProceedingsrdquo extends the benefits of attending the conference to the muchwider audience of academics and professionals in various sciences around theworld For example one of our cosponsors (GARP mentioned above) encour-aged risk professionals who model risk and uncertainty in financial marketsand who work in the metropolitan New York area to attend the conference

A Brief Review of Each Chapter

Since all chapters come with their own abstracts one may wonder why Iam including this section Note that our conference was interdisciplinary andindividual authors have generally focused their abstracts to be appealing toreaders from their own specific disciplines I am including this section here

Preface ix

because I believe that chosen papers are of interest to a much wider range ofreaders from other fields and also to those interested in the R software perse Consistent with an aim of the conference the following summaries areintended to promote greater cross fertilization of ideas among all scientists

1] The chapter by McCullough shows the importance of numerically accu-rate computing indicating the relevance of forward and backward errors andcondition numbers in all sciences He shows that R software is accurate usingsome classic examples A general reader might be interested in McCulloughrsquosreasons for questioning the reliability of Donohue and Levittrsquos controversialpaper These authors speculated that a disproportionate number of aborted(unwanted) children would have grown up to become criminals and claimedto show that abortion reduces crime after a gap of 20 years

2] Koenkerrsquos chapter should be of interest to anyone seeking state-of-the-art flexible functional forms for regression fitting The ldquoadditiverdquo models seeka clever compromise between parametric and nonparametric componentsSome commonly made data ldquoadjustmentsrdquo are shown to be misleading andavoidable The R package ldquoquantregrdquo implementation for modeling childhoodmalnutrition in India also should be of interest to policymakers For examplemothers who are employed and wealthier are found to have taller children

3] Gelmanrsquos chapter is of practical interest to all of us who use R forgraphics and his example of US voting participation rates is of interest toall who care about democracy and responsible citizenry

4] My chapter suggests new solutions to a rather old and common problemof efficient estimation despite autocorrelation and heteroscedasticity amongregression errors Since regression is a ubiquitous tool in all sciences it shouldbe of much wider interest

5] Markus and Gursquos chapter is of great interest in exploratory data analysisin any scientific field Given any data set one often needs to develop initialunderstanding of the nature of relationships between three-way continuousvariables It is enhanced by comparing their new ldquobubble plotsrdquo convenientlymade available as an R script called ldquobp3way()rdquo at the conference websitehttpwwwcisfordhameduQR2009

6] Vinod Hsu and Tianrsquos chapter deals with the portfolio selection prob-lem of interest to anyone with an investment portfolio It is also of interestto computer scientists and data-mining experts since combinatorial fusioncomes from those fields The idea of mapping various related variables into acomparable set ready for combining them has applications in many social sci-ences Social scientists doing exploratory data analysis where lots of variablesare potentially relevant should find these R tools useful

7] Foster and Kecojevicrsquos chapter is of wide interest because they extendanalysis of covariance (ANCOVA) a very general statistical tool After allmany scientists want to know whether certain factors have an effect on acontinuous outcome variable while removing the variance associated withsome covariates Another common problem addressed here occurs when ob-servations with the big residuals need to be down weighted The wonderful

x Preface

(highly sophisticated) R graphics and growth charts for Saudi children arealso of interest in their own right

8] Imai Keele Tingley and Yamamotorsquos chapter addresses an age-old sci-entific problem of assessing the direction and strength of causation amongvariables It is remarkably powerful because of flexibility and variety in thetypes of causendasheffect relations admitted The authors explain the use of theirR package called ldquomediationrdquo Their ldquojob searchrdquo application has wider re-search interest providing room for extensions since the unemployment prob-lem can be viewed from the distinct viewpoint of several social sciences

9] Haupt Schnurbus and Tschernigrsquos chapter focuses on the problem ofthe choice of functional form for an unknown potentially nonlinear relation-ship This discussion goes beyond [4] where I discuss the related problemof choosing production functions This chapter describes how to use flexiblenonparametric kernel regression models for model validation of parametricspecifications including misspecification testing and prediction simulationsThe ldquorelaxrdquo package is shown to be useful for model visualization and valida-tion

10] Rindskopfrsquos chapter shows how to use R to fit a multinomial model thatis parallel to the usual multivariate analysis of variance (MANOVA) A widevariety of categorical data models can be fit using this program includingthe usual test of independence logit and loglinear models logistic regressionquasi-independence models models with equality restrictions on parametersand a variety of nonstandard loglinear models An advantage of this programover some loglinear models is the clear separation of independent (predictor)and dependent (outcome) variables Examples are presented from psychologysociology political science and medicine

11] Neathrsquos chapter dealing with location of hazardous waste sites alsodeals with a topic of wide general interest in policy circles More importantBayesian analysis of posterior distributions used to be less graphical and lessintuitive before the recent availability of R packages Neath shows how touse ldquoWinBUGSrdquo and ldquoR2WinBUGSrdquo interfaces in the context of an exampleproviding a template for applications in other fields

12] Numatsi and Rengiforsquos chapter dealing with financial market volatilityis of interest to anyone with an investment portfolio subject to financial crisesIt studies time series models using FTSE 100 daily returns Many fields havetime series with persistent discrete jumps misspecification and other issueswhich are solved by the authorsrsquo R software

My short descriptions of twelve chapters explain why I am enthusiasticabout including them in this book They are my possibly inadequate salespitches encouraging all readers to study the original chapters in detail Theauthors of papers presented at the conference were given a short deadlineof a couple of months for submission of their papers as a Latex documentsuitable for Springer publications We are including only the ones who metthe deadline and whose referees recommended their publication

Preface xi

The original developers of the S computer language at Bell Labs (on whichR is based) had insisted that the assignment operator must have two strokeldquolt-rdquo When I worked at Bell Labs I could not convince them to permitthe use of one stroke ldquo=rdquo used by Fortran and other languages as a moreconvenient option for lazy typists I am happy to see that unlike S the Rteam does indeed permit the ldquo=rdquo symbol for assignment Similarly R givesme the option to start my R session with the command

options(prompt = continue = )

This has the advantage that lazy typists like me can directly copy and pasteselected lines of the output from R into my R input command stream With-out the lsquooptionsrsquo command the current prompt ldquogtrdquo is confused by R withldquogreater thanrdquo and the current continuation symbol ldquo+rdquo is confused withaddition

We enthusiastic users of R are grateful to the R Foundation for Statisti-cal Computing Vienna Austria [2] for continually maintaining and improv-ing R The R Foundation has nurtured a wonderfully flexible research toolmaking it into a great intellectual resource for the benefit of humanity anddeserves our thanks and financial support Please see the separate section onlsquoAcknowledgmentsrsquo listing names of those whose help was crucial in writingthis book including the referees

Tenafly New Jersey October 2009 Hrishikesh Vinod

References

1 Fox J car Companion to Applied Regression R package version 12-14 (2009)URL httpCRANR-projectorgpackage=car

2 R Development Core Team R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria (2009) URLhttpwwwR-projectorg ISBN 3-900051-07-0

3 Vance A Data Analysts Captivated by Rrsquos Power The New York TimesJanuary 6 (2009) URL httpwwwnytimescom20090107technology

business-computing07programhtml

4 Vinod HD Hands-on Intermediate Econometrics Using R Templates for Ex-tending Dozens of Practical Examples World Scientific Publishers HackensackNJ (2008) URL httpwwwworldscibookscomeconomics6895html

xii Preface

Fig 1 Standing left to right Joachim Schnurbus Rebecca Sela Achim ZeileisTatjana Kecojevic David Rindskopf Ronald Neath Adjoa Numatsi and RossenTrendafilov Sitting left to right B D McCullough D Frank Hsu Roger KoenkerH D Vinod Kosuke Imai and Keith Markus

Acknowledgments

I am grateful to the Co-chair Dr Frank Hsu of the organizing commit-tee along with all the members whose names are mentioned in the Prefacefor their help with the conference The following deans provided importantencouragement Nancy Busch Brennan OrsquoDonnell Donna Rapaccioli andRobert Himmelberg

My current students Rossen Trendafilov and Brandon Vick gave anldquoIntro-duction to Rrdquotalk which attracted the students My former student ThethachChuaprapaisilp helped greatly with typesetting and Latex issues His helpwas crucial in incorporating the numerous suggestions by Springerrsquos copy ed-itor Professor Elena Filatova helped tirelessly with the conference websiteand its updates Our department secretary Giuseppina Cannariato was a bighelp throughout Mr John Kimmel Executive Editor Statistics at Springerprovided useful advice and encouragement over several months

The quality of the papers depends critically on our distinguished andhard-working authors and referees who provided timely critical commentsand suggestions so the authors could improve the manuscripts Some refereeswere also presenters of papers and some will remain anonymous However Ido want to thank William Anderson (Cornell University) Bruce L Brown(Brigham Young University) Andreas Heinen (Universidad Carlos of Spain)Matthew S Johnson (Columbia University) D M Lyons (Fordham Univer-sity) and Philipp Sibbertsen (Leibniz Universitat Hannover Germany) fortheir refereeing work

xiii

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 2: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

HD VinodEditor

Advances in Social ScienceResearch Using R

123

EditorHrishikesh D Vinod Ph DDepartment of EconomicsFordham University441 E Fordham RoadBronx NY 10458USA

ISBN 978-1-4419-1763-8 e-ISBN 978-1-4419-1764-5DOI 101007978-1-4419-1764-5Springer New York Dordrecht Heidelberg London

Library of Congress Control Number 2009942719

ccopy Springer Science+Business Media LLC 2010All rights reserved This work may not be translated or copied in whole or in part without the writtenpermission of the publisher (Springer Science+Business Media LLC 233 Spring Street New YorkNY 10013 USA) except for brief excerpts in connection with reviews or scholarly analysis Use inconnection with any form of information storage and retrieval electronic adaptation computersoftware or by similar or dissimilar methodology now known or hereafter developed is forbiddenThe use in this publication of trade names trademarks service marks and similar terms even ifthey are not identified as such is not to be taken as an expression of opinion as to whether or notthey are subject to proprietary rights

Printed on acid-free paper

Springer is part of Springer Science+Business Media (wwwspringercom)

To My Daughter RitaSon-in-Law Dale Knauseand My Grandchildren

Devin Vinod Knause and Troy Vinod Knause

Preface

Quantitative social science research has been expanding due to the avail-ability of computers and data over the past few decades Yet the textbooksand supplements for researchers do not adequately highlight the revolutioncreated by the R software [2] and graphics system R is fast becoming the lin-gua franca of quantitative research with some 2000 free specialized packageswhere the latest versions can be downloaded in seconds Many packages suchas ldquocarrdquo [1] developed by social scientists are popular among all scientists

An early 2009 article [3] in the New York Times notes that statisticiansengineers and scientists without computer programming skills find Rldquoeasy touserdquo A common language R can readily promote deeper mutual respect andunderstanding of unique problems facing quantitative work in various socialsciences Often the solutions developed in one field can be extended and usedin many fields This book promotes just such exchange of ideas across manysocial sciences Since Springer has played a leadership role in promoting Rwe are fortunate to have Springer publish this book

A Conference on Quantitative Social Science Research Using R was heldin New York City at the Lincoln Center campus of Fordham University June18ndash19 2009 This book contains selected papers presented at the conferencerepresenting the ldquoProceedingsrdquo of the conference

The conference offered an opportunity for enthusiastic users of R to discusstheir research ideas and some policy applications in social sciences to learnmeet and mingle Speakers included authors of books andor packages forR published researchers and editors of important journals in statistics andsocial sciences The keynote speaker was Roger Koenker Professor of Eco-nomics University of Illinois and there was a distinguished panel of invitedspeakers Prof Andrew Gelman Director of the Applied Statistics Center atColumbia University Prof Kosuke Imai Department of Politics at Prince-ton University Prof Keith A Markus Psychology Department at John JayCollege of Criminal Justice Prof B D McCullough Department of DecisionSciences Drexel University and Prof Achim Zeileis Department of Statisticsand Mathematics WU Wirtschaftsuniversitat Wien (Vienna)

vii

viii Preface

We had a truly interdisciplinary conference organizing committee of Ford-ham professors Co-chair Frank Hsu (Computer and Information Sciences)and H D Vinod (Economics) Jose Aleman treasurer (Political Science) Re-maining Members (alphabetical order) were Elena Filatova (Computer andInformation Sciences) Se-Kang Kim (Psychology) Erick W Rengifo (Eco-nomics) Emily Rosenbaum (Sociology) Matthew Weinshenker (Sociology)and Tiffany Yip (Psychology)

Fordham University was the main sponsor of the conference providing allthe facilities guidance and staff help It would not have been possible withoutthe active support from the deans of the Graduate School of Arts and Sciences(GSAS Dean Dr Nancy Busch) Fordham College at Rose Hill (FCRH DeanDr Brennan OrsquoDonnell) and College of Business Administration (CBA DeanDr Donna Rapaccioli) Another active supporter Dr Robert HimmelbergDean of Arts amp Sciences Faculty inaugurated the conference Cosponsors werethe Society of Indian Academics in America (SIAA) and Global Associationof Risk Professionals (GARP) The modest registration fee paid for the coffeebreaks and a cocktail reception with snacks on the evening of Thursday June18 2009

Although seating was limited and a registration fee was charged seventy-five people registered and attended the conference A limited number of grad-uate and undergraduate students from various disciplines attended the con-ference Fordham University faculty and deans encouraged their students totake advantage of this opportunity to learn R Students could learn someinnovative and practical research tools providing intellectual skills havingpotentially lifelong benefits The conference provided graduate students andyoung faculty in all disciplines a great chance to see talk to and seek ad-vice from researchers using R The conference program and details remainavailable at the conference website httpwwwcisfordhameduQR2009

Most attendees thought that the conference was a great success The re-search work presented was of high quality covering a wide range of importanttopics along with important examples and R implementations This book ofldquoProceedingsrdquo extends the benefits of attending the conference to the muchwider audience of academics and professionals in various sciences around theworld For example one of our cosponsors (GARP mentioned above) encour-aged risk professionals who model risk and uncertainty in financial marketsand who work in the metropolitan New York area to attend the conference

A Brief Review of Each Chapter

Since all chapters come with their own abstracts one may wonder why Iam including this section Note that our conference was interdisciplinary andindividual authors have generally focused their abstracts to be appealing toreaders from their own specific disciplines I am including this section here

Preface ix

because I believe that chosen papers are of interest to a much wider range ofreaders from other fields and also to those interested in the R software perse Consistent with an aim of the conference the following summaries areintended to promote greater cross fertilization of ideas among all scientists

1] The chapter by McCullough shows the importance of numerically accu-rate computing indicating the relevance of forward and backward errors andcondition numbers in all sciences He shows that R software is accurate usingsome classic examples A general reader might be interested in McCulloughrsquosreasons for questioning the reliability of Donohue and Levittrsquos controversialpaper These authors speculated that a disproportionate number of aborted(unwanted) children would have grown up to become criminals and claimedto show that abortion reduces crime after a gap of 20 years

2] Koenkerrsquos chapter should be of interest to anyone seeking state-of-the-art flexible functional forms for regression fitting The ldquoadditiverdquo models seeka clever compromise between parametric and nonparametric componentsSome commonly made data ldquoadjustmentsrdquo are shown to be misleading andavoidable The R package ldquoquantregrdquo implementation for modeling childhoodmalnutrition in India also should be of interest to policymakers For examplemothers who are employed and wealthier are found to have taller children

3] Gelmanrsquos chapter is of practical interest to all of us who use R forgraphics and his example of US voting participation rates is of interest toall who care about democracy and responsible citizenry

4] My chapter suggests new solutions to a rather old and common problemof efficient estimation despite autocorrelation and heteroscedasticity amongregression errors Since regression is a ubiquitous tool in all sciences it shouldbe of much wider interest

5] Markus and Gursquos chapter is of great interest in exploratory data analysisin any scientific field Given any data set one often needs to develop initialunderstanding of the nature of relationships between three-way continuousvariables It is enhanced by comparing their new ldquobubble plotsrdquo convenientlymade available as an R script called ldquobp3way()rdquo at the conference websitehttpwwwcisfordhameduQR2009

6] Vinod Hsu and Tianrsquos chapter deals with the portfolio selection prob-lem of interest to anyone with an investment portfolio It is also of interestto computer scientists and data-mining experts since combinatorial fusioncomes from those fields The idea of mapping various related variables into acomparable set ready for combining them has applications in many social sci-ences Social scientists doing exploratory data analysis where lots of variablesare potentially relevant should find these R tools useful

7] Foster and Kecojevicrsquos chapter is of wide interest because they extendanalysis of covariance (ANCOVA) a very general statistical tool After allmany scientists want to know whether certain factors have an effect on acontinuous outcome variable while removing the variance associated withsome covariates Another common problem addressed here occurs when ob-servations with the big residuals need to be down weighted The wonderful

x Preface

(highly sophisticated) R graphics and growth charts for Saudi children arealso of interest in their own right

8] Imai Keele Tingley and Yamamotorsquos chapter addresses an age-old sci-entific problem of assessing the direction and strength of causation amongvariables It is remarkably powerful because of flexibility and variety in thetypes of causendasheffect relations admitted The authors explain the use of theirR package called ldquomediationrdquo Their ldquojob searchrdquo application has wider re-search interest providing room for extensions since the unemployment prob-lem can be viewed from the distinct viewpoint of several social sciences

9] Haupt Schnurbus and Tschernigrsquos chapter focuses on the problem ofthe choice of functional form for an unknown potentially nonlinear relation-ship This discussion goes beyond [4] where I discuss the related problemof choosing production functions This chapter describes how to use flexiblenonparametric kernel regression models for model validation of parametricspecifications including misspecification testing and prediction simulationsThe ldquorelaxrdquo package is shown to be useful for model visualization and valida-tion

10] Rindskopfrsquos chapter shows how to use R to fit a multinomial model thatis parallel to the usual multivariate analysis of variance (MANOVA) A widevariety of categorical data models can be fit using this program includingthe usual test of independence logit and loglinear models logistic regressionquasi-independence models models with equality restrictions on parametersand a variety of nonstandard loglinear models An advantage of this programover some loglinear models is the clear separation of independent (predictor)and dependent (outcome) variables Examples are presented from psychologysociology political science and medicine

11] Neathrsquos chapter dealing with location of hazardous waste sites alsodeals with a topic of wide general interest in policy circles More importantBayesian analysis of posterior distributions used to be less graphical and lessintuitive before the recent availability of R packages Neath shows how touse ldquoWinBUGSrdquo and ldquoR2WinBUGSrdquo interfaces in the context of an exampleproviding a template for applications in other fields

12] Numatsi and Rengiforsquos chapter dealing with financial market volatilityis of interest to anyone with an investment portfolio subject to financial crisesIt studies time series models using FTSE 100 daily returns Many fields havetime series with persistent discrete jumps misspecification and other issueswhich are solved by the authorsrsquo R software

My short descriptions of twelve chapters explain why I am enthusiasticabout including them in this book They are my possibly inadequate salespitches encouraging all readers to study the original chapters in detail Theauthors of papers presented at the conference were given a short deadlineof a couple of months for submission of their papers as a Latex documentsuitable for Springer publications We are including only the ones who metthe deadline and whose referees recommended their publication

Preface xi

The original developers of the S computer language at Bell Labs (on whichR is based) had insisted that the assignment operator must have two strokeldquolt-rdquo When I worked at Bell Labs I could not convince them to permitthe use of one stroke ldquo=rdquo used by Fortran and other languages as a moreconvenient option for lazy typists I am happy to see that unlike S the Rteam does indeed permit the ldquo=rdquo symbol for assignment Similarly R givesme the option to start my R session with the command

options(prompt = continue = )

This has the advantage that lazy typists like me can directly copy and pasteselected lines of the output from R into my R input command stream With-out the lsquooptionsrsquo command the current prompt ldquogtrdquo is confused by R withldquogreater thanrdquo and the current continuation symbol ldquo+rdquo is confused withaddition

We enthusiastic users of R are grateful to the R Foundation for Statisti-cal Computing Vienna Austria [2] for continually maintaining and improv-ing R The R Foundation has nurtured a wonderfully flexible research toolmaking it into a great intellectual resource for the benefit of humanity anddeserves our thanks and financial support Please see the separate section onlsquoAcknowledgmentsrsquo listing names of those whose help was crucial in writingthis book including the referees

Tenafly New Jersey October 2009 Hrishikesh Vinod

References

1 Fox J car Companion to Applied Regression R package version 12-14 (2009)URL httpCRANR-projectorgpackage=car

2 R Development Core Team R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria (2009) URLhttpwwwR-projectorg ISBN 3-900051-07-0

3 Vance A Data Analysts Captivated by Rrsquos Power The New York TimesJanuary 6 (2009) URL httpwwwnytimescom20090107technology

business-computing07programhtml

4 Vinod HD Hands-on Intermediate Econometrics Using R Templates for Ex-tending Dozens of Practical Examples World Scientific Publishers HackensackNJ (2008) URL httpwwwworldscibookscomeconomics6895html

xii Preface

Fig 1 Standing left to right Joachim Schnurbus Rebecca Sela Achim ZeileisTatjana Kecojevic David Rindskopf Ronald Neath Adjoa Numatsi and RossenTrendafilov Sitting left to right B D McCullough D Frank Hsu Roger KoenkerH D Vinod Kosuke Imai and Keith Markus

Acknowledgments

I am grateful to the Co-chair Dr Frank Hsu of the organizing commit-tee along with all the members whose names are mentioned in the Prefacefor their help with the conference The following deans provided importantencouragement Nancy Busch Brennan OrsquoDonnell Donna Rapaccioli andRobert Himmelberg

My current students Rossen Trendafilov and Brandon Vick gave anldquoIntro-duction to Rrdquotalk which attracted the students My former student ThethachChuaprapaisilp helped greatly with typesetting and Latex issues His helpwas crucial in incorporating the numerous suggestions by Springerrsquos copy ed-itor Professor Elena Filatova helped tirelessly with the conference websiteand its updates Our department secretary Giuseppina Cannariato was a bighelp throughout Mr John Kimmel Executive Editor Statistics at Springerprovided useful advice and encouragement over several months

The quality of the papers depends critically on our distinguished andhard-working authors and referees who provided timely critical commentsand suggestions so the authors could improve the manuscripts Some refereeswere also presenters of papers and some will remain anonymous However Ido want to thank William Anderson (Cornell University) Bruce L Brown(Brigham Young University) Andreas Heinen (Universidad Carlos of Spain)Matthew S Johnson (Columbia University) D M Lyons (Fordham Univer-sity) and Philipp Sibbertsen (Leibniz Universitat Hannover Germany) fortheir refereeing work

xiii

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 3: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

EditorHrishikesh D Vinod Ph DDepartment of EconomicsFordham University441 E Fordham RoadBronx NY 10458USA

ISBN 978-1-4419-1763-8 e-ISBN 978-1-4419-1764-5DOI 101007978-1-4419-1764-5Springer New York Dordrecht Heidelberg London

Library of Congress Control Number 2009942719

ccopy Springer Science+Business Media LLC 2010All rights reserved This work may not be translated or copied in whole or in part without the writtenpermission of the publisher (Springer Science+Business Media LLC 233 Spring Street New YorkNY 10013 USA) except for brief excerpts in connection with reviews or scholarly analysis Use inconnection with any form of information storage and retrieval electronic adaptation computersoftware or by similar or dissimilar methodology now known or hereafter developed is forbiddenThe use in this publication of trade names trademarks service marks and similar terms even ifthey are not identified as such is not to be taken as an expression of opinion as to whether or notthey are subject to proprietary rights

Printed on acid-free paper

Springer is part of Springer Science+Business Media (wwwspringercom)

To My Daughter RitaSon-in-Law Dale Knauseand My Grandchildren

Devin Vinod Knause and Troy Vinod Knause

Preface

Quantitative social science research has been expanding due to the avail-ability of computers and data over the past few decades Yet the textbooksand supplements for researchers do not adequately highlight the revolutioncreated by the R software [2] and graphics system R is fast becoming the lin-gua franca of quantitative research with some 2000 free specialized packageswhere the latest versions can be downloaded in seconds Many packages suchas ldquocarrdquo [1] developed by social scientists are popular among all scientists

An early 2009 article [3] in the New York Times notes that statisticiansengineers and scientists without computer programming skills find Rldquoeasy touserdquo A common language R can readily promote deeper mutual respect andunderstanding of unique problems facing quantitative work in various socialsciences Often the solutions developed in one field can be extended and usedin many fields This book promotes just such exchange of ideas across manysocial sciences Since Springer has played a leadership role in promoting Rwe are fortunate to have Springer publish this book

A Conference on Quantitative Social Science Research Using R was heldin New York City at the Lincoln Center campus of Fordham University June18ndash19 2009 This book contains selected papers presented at the conferencerepresenting the ldquoProceedingsrdquo of the conference

The conference offered an opportunity for enthusiastic users of R to discusstheir research ideas and some policy applications in social sciences to learnmeet and mingle Speakers included authors of books andor packages forR published researchers and editors of important journals in statistics andsocial sciences The keynote speaker was Roger Koenker Professor of Eco-nomics University of Illinois and there was a distinguished panel of invitedspeakers Prof Andrew Gelman Director of the Applied Statistics Center atColumbia University Prof Kosuke Imai Department of Politics at Prince-ton University Prof Keith A Markus Psychology Department at John JayCollege of Criminal Justice Prof B D McCullough Department of DecisionSciences Drexel University and Prof Achim Zeileis Department of Statisticsand Mathematics WU Wirtschaftsuniversitat Wien (Vienna)

vii

viii Preface

We had a truly interdisciplinary conference organizing committee of Ford-ham professors Co-chair Frank Hsu (Computer and Information Sciences)and H D Vinod (Economics) Jose Aleman treasurer (Political Science) Re-maining Members (alphabetical order) were Elena Filatova (Computer andInformation Sciences) Se-Kang Kim (Psychology) Erick W Rengifo (Eco-nomics) Emily Rosenbaum (Sociology) Matthew Weinshenker (Sociology)and Tiffany Yip (Psychology)

Fordham University was the main sponsor of the conference providing allthe facilities guidance and staff help It would not have been possible withoutthe active support from the deans of the Graduate School of Arts and Sciences(GSAS Dean Dr Nancy Busch) Fordham College at Rose Hill (FCRH DeanDr Brennan OrsquoDonnell) and College of Business Administration (CBA DeanDr Donna Rapaccioli) Another active supporter Dr Robert HimmelbergDean of Arts amp Sciences Faculty inaugurated the conference Cosponsors werethe Society of Indian Academics in America (SIAA) and Global Associationof Risk Professionals (GARP) The modest registration fee paid for the coffeebreaks and a cocktail reception with snacks on the evening of Thursday June18 2009

Although seating was limited and a registration fee was charged seventy-five people registered and attended the conference A limited number of grad-uate and undergraduate students from various disciplines attended the con-ference Fordham University faculty and deans encouraged their students totake advantage of this opportunity to learn R Students could learn someinnovative and practical research tools providing intellectual skills havingpotentially lifelong benefits The conference provided graduate students andyoung faculty in all disciplines a great chance to see talk to and seek ad-vice from researchers using R The conference program and details remainavailable at the conference website httpwwwcisfordhameduQR2009

Most attendees thought that the conference was a great success The re-search work presented was of high quality covering a wide range of importanttopics along with important examples and R implementations This book ofldquoProceedingsrdquo extends the benefits of attending the conference to the muchwider audience of academics and professionals in various sciences around theworld For example one of our cosponsors (GARP mentioned above) encour-aged risk professionals who model risk and uncertainty in financial marketsand who work in the metropolitan New York area to attend the conference

A Brief Review of Each Chapter

Since all chapters come with their own abstracts one may wonder why Iam including this section Note that our conference was interdisciplinary andindividual authors have generally focused their abstracts to be appealing toreaders from their own specific disciplines I am including this section here

Preface ix

because I believe that chosen papers are of interest to a much wider range ofreaders from other fields and also to those interested in the R software perse Consistent with an aim of the conference the following summaries areintended to promote greater cross fertilization of ideas among all scientists

1] The chapter by McCullough shows the importance of numerically accu-rate computing indicating the relevance of forward and backward errors andcondition numbers in all sciences He shows that R software is accurate usingsome classic examples A general reader might be interested in McCulloughrsquosreasons for questioning the reliability of Donohue and Levittrsquos controversialpaper These authors speculated that a disproportionate number of aborted(unwanted) children would have grown up to become criminals and claimedto show that abortion reduces crime after a gap of 20 years

2] Koenkerrsquos chapter should be of interest to anyone seeking state-of-the-art flexible functional forms for regression fitting The ldquoadditiverdquo models seeka clever compromise between parametric and nonparametric componentsSome commonly made data ldquoadjustmentsrdquo are shown to be misleading andavoidable The R package ldquoquantregrdquo implementation for modeling childhoodmalnutrition in India also should be of interest to policymakers For examplemothers who are employed and wealthier are found to have taller children

3] Gelmanrsquos chapter is of practical interest to all of us who use R forgraphics and his example of US voting participation rates is of interest toall who care about democracy and responsible citizenry

4] My chapter suggests new solutions to a rather old and common problemof efficient estimation despite autocorrelation and heteroscedasticity amongregression errors Since regression is a ubiquitous tool in all sciences it shouldbe of much wider interest

5] Markus and Gursquos chapter is of great interest in exploratory data analysisin any scientific field Given any data set one often needs to develop initialunderstanding of the nature of relationships between three-way continuousvariables It is enhanced by comparing their new ldquobubble plotsrdquo convenientlymade available as an R script called ldquobp3way()rdquo at the conference websitehttpwwwcisfordhameduQR2009

6] Vinod Hsu and Tianrsquos chapter deals with the portfolio selection prob-lem of interest to anyone with an investment portfolio It is also of interestto computer scientists and data-mining experts since combinatorial fusioncomes from those fields The idea of mapping various related variables into acomparable set ready for combining them has applications in many social sci-ences Social scientists doing exploratory data analysis where lots of variablesare potentially relevant should find these R tools useful

7] Foster and Kecojevicrsquos chapter is of wide interest because they extendanalysis of covariance (ANCOVA) a very general statistical tool After allmany scientists want to know whether certain factors have an effect on acontinuous outcome variable while removing the variance associated withsome covariates Another common problem addressed here occurs when ob-servations with the big residuals need to be down weighted The wonderful

x Preface

(highly sophisticated) R graphics and growth charts for Saudi children arealso of interest in their own right

8] Imai Keele Tingley and Yamamotorsquos chapter addresses an age-old sci-entific problem of assessing the direction and strength of causation amongvariables It is remarkably powerful because of flexibility and variety in thetypes of causendasheffect relations admitted The authors explain the use of theirR package called ldquomediationrdquo Their ldquojob searchrdquo application has wider re-search interest providing room for extensions since the unemployment prob-lem can be viewed from the distinct viewpoint of several social sciences

9] Haupt Schnurbus and Tschernigrsquos chapter focuses on the problem ofthe choice of functional form for an unknown potentially nonlinear relation-ship This discussion goes beyond [4] where I discuss the related problemof choosing production functions This chapter describes how to use flexiblenonparametric kernel regression models for model validation of parametricspecifications including misspecification testing and prediction simulationsThe ldquorelaxrdquo package is shown to be useful for model visualization and valida-tion

10] Rindskopfrsquos chapter shows how to use R to fit a multinomial model thatis parallel to the usual multivariate analysis of variance (MANOVA) A widevariety of categorical data models can be fit using this program includingthe usual test of independence logit and loglinear models logistic regressionquasi-independence models models with equality restrictions on parametersand a variety of nonstandard loglinear models An advantage of this programover some loglinear models is the clear separation of independent (predictor)and dependent (outcome) variables Examples are presented from psychologysociology political science and medicine

11] Neathrsquos chapter dealing with location of hazardous waste sites alsodeals with a topic of wide general interest in policy circles More importantBayesian analysis of posterior distributions used to be less graphical and lessintuitive before the recent availability of R packages Neath shows how touse ldquoWinBUGSrdquo and ldquoR2WinBUGSrdquo interfaces in the context of an exampleproviding a template for applications in other fields

12] Numatsi and Rengiforsquos chapter dealing with financial market volatilityis of interest to anyone with an investment portfolio subject to financial crisesIt studies time series models using FTSE 100 daily returns Many fields havetime series with persistent discrete jumps misspecification and other issueswhich are solved by the authorsrsquo R software

My short descriptions of twelve chapters explain why I am enthusiasticabout including them in this book They are my possibly inadequate salespitches encouraging all readers to study the original chapters in detail Theauthors of papers presented at the conference were given a short deadlineof a couple of months for submission of their papers as a Latex documentsuitable for Springer publications We are including only the ones who metthe deadline and whose referees recommended their publication

Preface xi

The original developers of the S computer language at Bell Labs (on whichR is based) had insisted that the assignment operator must have two strokeldquolt-rdquo When I worked at Bell Labs I could not convince them to permitthe use of one stroke ldquo=rdquo used by Fortran and other languages as a moreconvenient option for lazy typists I am happy to see that unlike S the Rteam does indeed permit the ldquo=rdquo symbol for assignment Similarly R givesme the option to start my R session with the command

options(prompt = continue = )

This has the advantage that lazy typists like me can directly copy and pasteselected lines of the output from R into my R input command stream With-out the lsquooptionsrsquo command the current prompt ldquogtrdquo is confused by R withldquogreater thanrdquo and the current continuation symbol ldquo+rdquo is confused withaddition

We enthusiastic users of R are grateful to the R Foundation for Statisti-cal Computing Vienna Austria [2] for continually maintaining and improv-ing R The R Foundation has nurtured a wonderfully flexible research toolmaking it into a great intellectual resource for the benefit of humanity anddeserves our thanks and financial support Please see the separate section onlsquoAcknowledgmentsrsquo listing names of those whose help was crucial in writingthis book including the referees

Tenafly New Jersey October 2009 Hrishikesh Vinod

References

1 Fox J car Companion to Applied Regression R package version 12-14 (2009)URL httpCRANR-projectorgpackage=car

2 R Development Core Team R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria (2009) URLhttpwwwR-projectorg ISBN 3-900051-07-0

3 Vance A Data Analysts Captivated by Rrsquos Power The New York TimesJanuary 6 (2009) URL httpwwwnytimescom20090107technology

business-computing07programhtml

4 Vinod HD Hands-on Intermediate Econometrics Using R Templates for Ex-tending Dozens of Practical Examples World Scientific Publishers HackensackNJ (2008) URL httpwwwworldscibookscomeconomics6895html

xii Preface

Fig 1 Standing left to right Joachim Schnurbus Rebecca Sela Achim ZeileisTatjana Kecojevic David Rindskopf Ronald Neath Adjoa Numatsi and RossenTrendafilov Sitting left to right B D McCullough D Frank Hsu Roger KoenkerH D Vinod Kosuke Imai and Keith Markus

Acknowledgments

I am grateful to the Co-chair Dr Frank Hsu of the organizing commit-tee along with all the members whose names are mentioned in the Prefacefor their help with the conference The following deans provided importantencouragement Nancy Busch Brennan OrsquoDonnell Donna Rapaccioli andRobert Himmelberg

My current students Rossen Trendafilov and Brandon Vick gave anldquoIntro-duction to Rrdquotalk which attracted the students My former student ThethachChuaprapaisilp helped greatly with typesetting and Latex issues His helpwas crucial in incorporating the numerous suggestions by Springerrsquos copy ed-itor Professor Elena Filatova helped tirelessly with the conference websiteand its updates Our department secretary Giuseppina Cannariato was a bighelp throughout Mr John Kimmel Executive Editor Statistics at Springerprovided useful advice and encouragement over several months

The quality of the papers depends critically on our distinguished andhard-working authors and referees who provided timely critical commentsand suggestions so the authors could improve the manuscripts Some refereeswere also presenters of papers and some will remain anonymous However Ido want to thank William Anderson (Cornell University) Bruce L Brown(Brigham Young University) Andreas Heinen (Universidad Carlos of Spain)Matthew S Johnson (Columbia University) D M Lyons (Fordham Univer-sity) and Philipp Sibbertsen (Leibniz Universitat Hannover Germany) fortheir refereeing work

xiii

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 4: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

To My Daughter RitaSon-in-Law Dale Knauseand My Grandchildren

Devin Vinod Knause and Troy Vinod Knause

Preface

Quantitative social science research has been expanding due to the avail-ability of computers and data over the past few decades Yet the textbooksand supplements for researchers do not adequately highlight the revolutioncreated by the R software [2] and graphics system R is fast becoming the lin-gua franca of quantitative research with some 2000 free specialized packageswhere the latest versions can be downloaded in seconds Many packages suchas ldquocarrdquo [1] developed by social scientists are popular among all scientists

An early 2009 article [3] in the New York Times notes that statisticiansengineers and scientists without computer programming skills find Rldquoeasy touserdquo A common language R can readily promote deeper mutual respect andunderstanding of unique problems facing quantitative work in various socialsciences Often the solutions developed in one field can be extended and usedin many fields This book promotes just such exchange of ideas across manysocial sciences Since Springer has played a leadership role in promoting Rwe are fortunate to have Springer publish this book

A Conference on Quantitative Social Science Research Using R was heldin New York City at the Lincoln Center campus of Fordham University June18ndash19 2009 This book contains selected papers presented at the conferencerepresenting the ldquoProceedingsrdquo of the conference

The conference offered an opportunity for enthusiastic users of R to discusstheir research ideas and some policy applications in social sciences to learnmeet and mingle Speakers included authors of books andor packages forR published researchers and editors of important journals in statistics andsocial sciences The keynote speaker was Roger Koenker Professor of Eco-nomics University of Illinois and there was a distinguished panel of invitedspeakers Prof Andrew Gelman Director of the Applied Statistics Center atColumbia University Prof Kosuke Imai Department of Politics at Prince-ton University Prof Keith A Markus Psychology Department at John JayCollege of Criminal Justice Prof B D McCullough Department of DecisionSciences Drexel University and Prof Achim Zeileis Department of Statisticsand Mathematics WU Wirtschaftsuniversitat Wien (Vienna)

vii

viii Preface

We had a truly interdisciplinary conference organizing committee of Ford-ham professors Co-chair Frank Hsu (Computer and Information Sciences)and H D Vinod (Economics) Jose Aleman treasurer (Political Science) Re-maining Members (alphabetical order) were Elena Filatova (Computer andInformation Sciences) Se-Kang Kim (Psychology) Erick W Rengifo (Eco-nomics) Emily Rosenbaum (Sociology) Matthew Weinshenker (Sociology)and Tiffany Yip (Psychology)

Fordham University was the main sponsor of the conference providing allthe facilities guidance and staff help It would not have been possible withoutthe active support from the deans of the Graduate School of Arts and Sciences(GSAS Dean Dr Nancy Busch) Fordham College at Rose Hill (FCRH DeanDr Brennan OrsquoDonnell) and College of Business Administration (CBA DeanDr Donna Rapaccioli) Another active supporter Dr Robert HimmelbergDean of Arts amp Sciences Faculty inaugurated the conference Cosponsors werethe Society of Indian Academics in America (SIAA) and Global Associationof Risk Professionals (GARP) The modest registration fee paid for the coffeebreaks and a cocktail reception with snacks on the evening of Thursday June18 2009

Although seating was limited and a registration fee was charged seventy-five people registered and attended the conference A limited number of grad-uate and undergraduate students from various disciplines attended the con-ference Fordham University faculty and deans encouraged their students totake advantage of this opportunity to learn R Students could learn someinnovative and practical research tools providing intellectual skills havingpotentially lifelong benefits The conference provided graduate students andyoung faculty in all disciplines a great chance to see talk to and seek ad-vice from researchers using R The conference program and details remainavailable at the conference website httpwwwcisfordhameduQR2009

Most attendees thought that the conference was a great success The re-search work presented was of high quality covering a wide range of importanttopics along with important examples and R implementations This book ofldquoProceedingsrdquo extends the benefits of attending the conference to the muchwider audience of academics and professionals in various sciences around theworld For example one of our cosponsors (GARP mentioned above) encour-aged risk professionals who model risk and uncertainty in financial marketsand who work in the metropolitan New York area to attend the conference

A Brief Review of Each Chapter

Since all chapters come with their own abstracts one may wonder why Iam including this section Note that our conference was interdisciplinary andindividual authors have generally focused their abstracts to be appealing toreaders from their own specific disciplines I am including this section here

Preface ix

because I believe that chosen papers are of interest to a much wider range ofreaders from other fields and also to those interested in the R software perse Consistent with an aim of the conference the following summaries areintended to promote greater cross fertilization of ideas among all scientists

1] The chapter by McCullough shows the importance of numerically accu-rate computing indicating the relevance of forward and backward errors andcondition numbers in all sciences He shows that R software is accurate usingsome classic examples A general reader might be interested in McCulloughrsquosreasons for questioning the reliability of Donohue and Levittrsquos controversialpaper These authors speculated that a disproportionate number of aborted(unwanted) children would have grown up to become criminals and claimedto show that abortion reduces crime after a gap of 20 years

2] Koenkerrsquos chapter should be of interest to anyone seeking state-of-the-art flexible functional forms for regression fitting The ldquoadditiverdquo models seeka clever compromise between parametric and nonparametric componentsSome commonly made data ldquoadjustmentsrdquo are shown to be misleading andavoidable The R package ldquoquantregrdquo implementation for modeling childhoodmalnutrition in India also should be of interest to policymakers For examplemothers who are employed and wealthier are found to have taller children

3] Gelmanrsquos chapter is of practical interest to all of us who use R forgraphics and his example of US voting participation rates is of interest toall who care about democracy and responsible citizenry

4] My chapter suggests new solutions to a rather old and common problemof efficient estimation despite autocorrelation and heteroscedasticity amongregression errors Since regression is a ubiquitous tool in all sciences it shouldbe of much wider interest

5] Markus and Gursquos chapter is of great interest in exploratory data analysisin any scientific field Given any data set one often needs to develop initialunderstanding of the nature of relationships between three-way continuousvariables It is enhanced by comparing their new ldquobubble plotsrdquo convenientlymade available as an R script called ldquobp3way()rdquo at the conference websitehttpwwwcisfordhameduQR2009

6] Vinod Hsu and Tianrsquos chapter deals with the portfolio selection prob-lem of interest to anyone with an investment portfolio It is also of interestto computer scientists and data-mining experts since combinatorial fusioncomes from those fields The idea of mapping various related variables into acomparable set ready for combining them has applications in many social sci-ences Social scientists doing exploratory data analysis where lots of variablesare potentially relevant should find these R tools useful

7] Foster and Kecojevicrsquos chapter is of wide interest because they extendanalysis of covariance (ANCOVA) a very general statistical tool After allmany scientists want to know whether certain factors have an effect on acontinuous outcome variable while removing the variance associated withsome covariates Another common problem addressed here occurs when ob-servations with the big residuals need to be down weighted The wonderful

x Preface

(highly sophisticated) R graphics and growth charts for Saudi children arealso of interest in their own right

8] Imai Keele Tingley and Yamamotorsquos chapter addresses an age-old sci-entific problem of assessing the direction and strength of causation amongvariables It is remarkably powerful because of flexibility and variety in thetypes of causendasheffect relations admitted The authors explain the use of theirR package called ldquomediationrdquo Their ldquojob searchrdquo application has wider re-search interest providing room for extensions since the unemployment prob-lem can be viewed from the distinct viewpoint of several social sciences

9] Haupt Schnurbus and Tschernigrsquos chapter focuses on the problem ofthe choice of functional form for an unknown potentially nonlinear relation-ship This discussion goes beyond [4] where I discuss the related problemof choosing production functions This chapter describes how to use flexiblenonparametric kernel regression models for model validation of parametricspecifications including misspecification testing and prediction simulationsThe ldquorelaxrdquo package is shown to be useful for model visualization and valida-tion

10] Rindskopfrsquos chapter shows how to use R to fit a multinomial model thatis parallel to the usual multivariate analysis of variance (MANOVA) A widevariety of categorical data models can be fit using this program includingthe usual test of independence logit and loglinear models logistic regressionquasi-independence models models with equality restrictions on parametersand a variety of nonstandard loglinear models An advantage of this programover some loglinear models is the clear separation of independent (predictor)and dependent (outcome) variables Examples are presented from psychologysociology political science and medicine

11] Neathrsquos chapter dealing with location of hazardous waste sites alsodeals with a topic of wide general interest in policy circles More importantBayesian analysis of posterior distributions used to be less graphical and lessintuitive before the recent availability of R packages Neath shows how touse ldquoWinBUGSrdquo and ldquoR2WinBUGSrdquo interfaces in the context of an exampleproviding a template for applications in other fields

12] Numatsi and Rengiforsquos chapter dealing with financial market volatilityis of interest to anyone with an investment portfolio subject to financial crisesIt studies time series models using FTSE 100 daily returns Many fields havetime series with persistent discrete jumps misspecification and other issueswhich are solved by the authorsrsquo R software

My short descriptions of twelve chapters explain why I am enthusiasticabout including them in this book They are my possibly inadequate salespitches encouraging all readers to study the original chapters in detail Theauthors of papers presented at the conference were given a short deadlineof a couple of months for submission of their papers as a Latex documentsuitable for Springer publications We are including only the ones who metthe deadline and whose referees recommended their publication

Preface xi

The original developers of the S computer language at Bell Labs (on whichR is based) had insisted that the assignment operator must have two strokeldquolt-rdquo When I worked at Bell Labs I could not convince them to permitthe use of one stroke ldquo=rdquo used by Fortran and other languages as a moreconvenient option for lazy typists I am happy to see that unlike S the Rteam does indeed permit the ldquo=rdquo symbol for assignment Similarly R givesme the option to start my R session with the command

options(prompt = continue = )

This has the advantage that lazy typists like me can directly copy and pasteselected lines of the output from R into my R input command stream With-out the lsquooptionsrsquo command the current prompt ldquogtrdquo is confused by R withldquogreater thanrdquo and the current continuation symbol ldquo+rdquo is confused withaddition

We enthusiastic users of R are grateful to the R Foundation for Statisti-cal Computing Vienna Austria [2] for continually maintaining and improv-ing R The R Foundation has nurtured a wonderfully flexible research toolmaking it into a great intellectual resource for the benefit of humanity anddeserves our thanks and financial support Please see the separate section onlsquoAcknowledgmentsrsquo listing names of those whose help was crucial in writingthis book including the referees

Tenafly New Jersey October 2009 Hrishikesh Vinod

References

1 Fox J car Companion to Applied Regression R package version 12-14 (2009)URL httpCRANR-projectorgpackage=car

2 R Development Core Team R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria (2009) URLhttpwwwR-projectorg ISBN 3-900051-07-0

3 Vance A Data Analysts Captivated by Rrsquos Power The New York TimesJanuary 6 (2009) URL httpwwwnytimescom20090107technology

business-computing07programhtml

4 Vinod HD Hands-on Intermediate Econometrics Using R Templates for Ex-tending Dozens of Practical Examples World Scientific Publishers HackensackNJ (2008) URL httpwwwworldscibookscomeconomics6895html

xii Preface

Fig 1 Standing left to right Joachim Schnurbus Rebecca Sela Achim ZeileisTatjana Kecojevic David Rindskopf Ronald Neath Adjoa Numatsi and RossenTrendafilov Sitting left to right B D McCullough D Frank Hsu Roger KoenkerH D Vinod Kosuke Imai and Keith Markus

Acknowledgments

I am grateful to the Co-chair Dr Frank Hsu of the organizing commit-tee along with all the members whose names are mentioned in the Prefacefor their help with the conference The following deans provided importantencouragement Nancy Busch Brennan OrsquoDonnell Donna Rapaccioli andRobert Himmelberg

My current students Rossen Trendafilov and Brandon Vick gave anldquoIntro-duction to Rrdquotalk which attracted the students My former student ThethachChuaprapaisilp helped greatly with typesetting and Latex issues His helpwas crucial in incorporating the numerous suggestions by Springerrsquos copy ed-itor Professor Elena Filatova helped tirelessly with the conference websiteand its updates Our department secretary Giuseppina Cannariato was a bighelp throughout Mr John Kimmel Executive Editor Statistics at Springerprovided useful advice and encouragement over several months

The quality of the papers depends critically on our distinguished andhard-working authors and referees who provided timely critical commentsand suggestions so the authors could improve the manuscripts Some refereeswere also presenters of papers and some will remain anonymous However Ido want to thank William Anderson (Cornell University) Bruce L Brown(Brigham Young University) Andreas Heinen (Universidad Carlos of Spain)Matthew S Johnson (Columbia University) D M Lyons (Fordham Univer-sity) and Philipp Sibbertsen (Leibniz Universitat Hannover Germany) fortheir refereeing work

xiii

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 5: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

Preface

Quantitative social science research has been expanding due to the avail-ability of computers and data over the past few decades Yet the textbooksand supplements for researchers do not adequately highlight the revolutioncreated by the R software [2] and graphics system R is fast becoming the lin-gua franca of quantitative research with some 2000 free specialized packageswhere the latest versions can be downloaded in seconds Many packages suchas ldquocarrdquo [1] developed by social scientists are popular among all scientists

An early 2009 article [3] in the New York Times notes that statisticiansengineers and scientists without computer programming skills find Rldquoeasy touserdquo A common language R can readily promote deeper mutual respect andunderstanding of unique problems facing quantitative work in various socialsciences Often the solutions developed in one field can be extended and usedin many fields This book promotes just such exchange of ideas across manysocial sciences Since Springer has played a leadership role in promoting Rwe are fortunate to have Springer publish this book

A Conference on Quantitative Social Science Research Using R was heldin New York City at the Lincoln Center campus of Fordham University June18ndash19 2009 This book contains selected papers presented at the conferencerepresenting the ldquoProceedingsrdquo of the conference

The conference offered an opportunity for enthusiastic users of R to discusstheir research ideas and some policy applications in social sciences to learnmeet and mingle Speakers included authors of books andor packages forR published researchers and editors of important journals in statistics andsocial sciences The keynote speaker was Roger Koenker Professor of Eco-nomics University of Illinois and there was a distinguished panel of invitedspeakers Prof Andrew Gelman Director of the Applied Statistics Center atColumbia University Prof Kosuke Imai Department of Politics at Prince-ton University Prof Keith A Markus Psychology Department at John JayCollege of Criminal Justice Prof B D McCullough Department of DecisionSciences Drexel University and Prof Achim Zeileis Department of Statisticsand Mathematics WU Wirtschaftsuniversitat Wien (Vienna)

vii

viii Preface

We had a truly interdisciplinary conference organizing committee of Ford-ham professors Co-chair Frank Hsu (Computer and Information Sciences)and H D Vinod (Economics) Jose Aleman treasurer (Political Science) Re-maining Members (alphabetical order) were Elena Filatova (Computer andInformation Sciences) Se-Kang Kim (Psychology) Erick W Rengifo (Eco-nomics) Emily Rosenbaum (Sociology) Matthew Weinshenker (Sociology)and Tiffany Yip (Psychology)

Fordham University was the main sponsor of the conference providing allthe facilities guidance and staff help It would not have been possible withoutthe active support from the deans of the Graduate School of Arts and Sciences(GSAS Dean Dr Nancy Busch) Fordham College at Rose Hill (FCRH DeanDr Brennan OrsquoDonnell) and College of Business Administration (CBA DeanDr Donna Rapaccioli) Another active supporter Dr Robert HimmelbergDean of Arts amp Sciences Faculty inaugurated the conference Cosponsors werethe Society of Indian Academics in America (SIAA) and Global Associationof Risk Professionals (GARP) The modest registration fee paid for the coffeebreaks and a cocktail reception with snacks on the evening of Thursday June18 2009

Although seating was limited and a registration fee was charged seventy-five people registered and attended the conference A limited number of grad-uate and undergraduate students from various disciplines attended the con-ference Fordham University faculty and deans encouraged their students totake advantage of this opportunity to learn R Students could learn someinnovative and practical research tools providing intellectual skills havingpotentially lifelong benefits The conference provided graduate students andyoung faculty in all disciplines a great chance to see talk to and seek ad-vice from researchers using R The conference program and details remainavailable at the conference website httpwwwcisfordhameduQR2009

Most attendees thought that the conference was a great success The re-search work presented was of high quality covering a wide range of importanttopics along with important examples and R implementations This book ofldquoProceedingsrdquo extends the benefits of attending the conference to the muchwider audience of academics and professionals in various sciences around theworld For example one of our cosponsors (GARP mentioned above) encour-aged risk professionals who model risk and uncertainty in financial marketsand who work in the metropolitan New York area to attend the conference

A Brief Review of Each Chapter

Since all chapters come with their own abstracts one may wonder why Iam including this section Note that our conference was interdisciplinary andindividual authors have generally focused their abstracts to be appealing toreaders from their own specific disciplines I am including this section here

Preface ix

because I believe that chosen papers are of interest to a much wider range ofreaders from other fields and also to those interested in the R software perse Consistent with an aim of the conference the following summaries areintended to promote greater cross fertilization of ideas among all scientists

1] The chapter by McCullough shows the importance of numerically accu-rate computing indicating the relevance of forward and backward errors andcondition numbers in all sciences He shows that R software is accurate usingsome classic examples A general reader might be interested in McCulloughrsquosreasons for questioning the reliability of Donohue and Levittrsquos controversialpaper These authors speculated that a disproportionate number of aborted(unwanted) children would have grown up to become criminals and claimedto show that abortion reduces crime after a gap of 20 years

2] Koenkerrsquos chapter should be of interest to anyone seeking state-of-the-art flexible functional forms for regression fitting The ldquoadditiverdquo models seeka clever compromise between parametric and nonparametric componentsSome commonly made data ldquoadjustmentsrdquo are shown to be misleading andavoidable The R package ldquoquantregrdquo implementation for modeling childhoodmalnutrition in India also should be of interest to policymakers For examplemothers who are employed and wealthier are found to have taller children

3] Gelmanrsquos chapter is of practical interest to all of us who use R forgraphics and his example of US voting participation rates is of interest toall who care about democracy and responsible citizenry

4] My chapter suggests new solutions to a rather old and common problemof efficient estimation despite autocorrelation and heteroscedasticity amongregression errors Since regression is a ubiquitous tool in all sciences it shouldbe of much wider interest

5] Markus and Gursquos chapter is of great interest in exploratory data analysisin any scientific field Given any data set one often needs to develop initialunderstanding of the nature of relationships between three-way continuousvariables It is enhanced by comparing their new ldquobubble plotsrdquo convenientlymade available as an R script called ldquobp3way()rdquo at the conference websitehttpwwwcisfordhameduQR2009

6] Vinod Hsu and Tianrsquos chapter deals with the portfolio selection prob-lem of interest to anyone with an investment portfolio It is also of interestto computer scientists and data-mining experts since combinatorial fusioncomes from those fields The idea of mapping various related variables into acomparable set ready for combining them has applications in many social sci-ences Social scientists doing exploratory data analysis where lots of variablesare potentially relevant should find these R tools useful

7] Foster and Kecojevicrsquos chapter is of wide interest because they extendanalysis of covariance (ANCOVA) a very general statistical tool After allmany scientists want to know whether certain factors have an effect on acontinuous outcome variable while removing the variance associated withsome covariates Another common problem addressed here occurs when ob-servations with the big residuals need to be down weighted The wonderful

x Preface

(highly sophisticated) R graphics and growth charts for Saudi children arealso of interest in their own right

8] Imai Keele Tingley and Yamamotorsquos chapter addresses an age-old sci-entific problem of assessing the direction and strength of causation amongvariables It is remarkably powerful because of flexibility and variety in thetypes of causendasheffect relations admitted The authors explain the use of theirR package called ldquomediationrdquo Their ldquojob searchrdquo application has wider re-search interest providing room for extensions since the unemployment prob-lem can be viewed from the distinct viewpoint of several social sciences

9] Haupt Schnurbus and Tschernigrsquos chapter focuses on the problem ofthe choice of functional form for an unknown potentially nonlinear relation-ship This discussion goes beyond [4] where I discuss the related problemof choosing production functions This chapter describes how to use flexiblenonparametric kernel regression models for model validation of parametricspecifications including misspecification testing and prediction simulationsThe ldquorelaxrdquo package is shown to be useful for model visualization and valida-tion

10] Rindskopfrsquos chapter shows how to use R to fit a multinomial model thatis parallel to the usual multivariate analysis of variance (MANOVA) A widevariety of categorical data models can be fit using this program includingthe usual test of independence logit and loglinear models logistic regressionquasi-independence models models with equality restrictions on parametersand a variety of nonstandard loglinear models An advantage of this programover some loglinear models is the clear separation of independent (predictor)and dependent (outcome) variables Examples are presented from psychologysociology political science and medicine

11] Neathrsquos chapter dealing with location of hazardous waste sites alsodeals with a topic of wide general interest in policy circles More importantBayesian analysis of posterior distributions used to be less graphical and lessintuitive before the recent availability of R packages Neath shows how touse ldquoWinBUGSrdquo and ldquoR2WinBUGSrdquo interfaces in the context of an exampleproviding a template for applications in other fields

12] Numatsi and Rengiforsquos chapter dealing with financial market volatilityis of interest to anyone with an investment portfolio subject to financial crisesIt studies time series models using FTSE 100 daily returns Many fields havetime series with persistent discrete jumps misspecification and other issueswhich are solved by the authorsrsquo R software

My short descriptions of twelve chapters explain why I am enthusiasticabout including them in this book They are my possibly inadequate salespitches encouraging all readers to study the original chapters in detail Theauthors of papers presented at the conference were given a short deadlineof a couple of months for submission of their papers as a Latex documentsuitable for Springer publications We are including only the ones who metthe deadline and whose referees recommended their publication

Preface xi

The original developers of the S computer language at Bell Labs (on whichR is based) had insisted that the assignment operator must have two strokeldquolt-rdquo When I worked at Bell Labs I could not convince them to permitthe use of one stroke ldquo=rdquo used by Fortran and other languages as a moreconvenient option for lazy typists I am happy to see that unlike S the Rteam does indeed permit the ldquo=rdquo symbol for assignment Similarly R givesme the option to start my R session with the command

options(prompt = continue = )

This has the advantage that lazy typists like me can directly copy and pasteselected lines of the output from R into my R input command stream With-out the lsquooptionsrsquo command the current prompt ldquogtrdquo is confused by R withldquogreater thanrdquo and the current continuation symbol ldquo+rdquo is confused withaddition

We enthusiastic users of R are grateful to the R Foundation for Statisti-cal Computing Vienna Austria [2] for continually maintaining and improv-ing R The R Foundation has nurtured a wonderfully flexible research toolmaking it into a great intellectual resource for the benefit of humanity anddeserves our thanks and financial support Please see the separate section onlsquoAcknowledgmentsrsquo listing names of those whose help was crucial in writingthis book including the referees

Tenafly New Jersey October 2009 Hrishikesh Vinod

References

1 Fox J car Companion to Applied Regression R package version 12-14 (2009)URL httpCRANR-projectorgpackage=car

2 R Development Core Team R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria (2009) URLhttpwwwR-projectorg ISBN 3-900051-07-0

3 Vance A Data Analysts Captivated by Rrsquos Power The New York TimesJanuary 6 (2009) URL httpwwwnytimescom20090107technology

business-computing07programhtml

4 Vinod HD Hands-on Intermediate Econometrics Using R Templates for Ex-tending Dozens of Practical Examples World Scientific Publishers HackensackNJ (2008) URL httpwwwworldscibookscomeconomics6895html

xii Preface

Fig 1 Standing left to right Joachim Schnurbus Rebecca Sela Achim ZeileisTatjana Kecojevic David Rindskopf Ronald Neath Adjoa Numatsi and RossenTrendafilov Sitting left to right B D McCullough D Frank Hsu Roger KoenkerH D Vinod Kosuke Imai and Keith Markus

Acknowledgments

I am grateful to the Co-chair Dr Frank Hsu of the organizing commit-tee along with all the members whose names are mentioned in the Prefacefor their help with the conference The following deans provided importantencouragement Nancy Busch Brennan OrsquoDonnell Donna Rapaccioli andRobert Himmelberg

My current students Rossen Trendafilov and Brandon Vick gave anldquoIntro-duction to Rrdquotalk which attracted the students My former student ThethachChuaprapaisilp helped greatly with typesetting and Latex issues His helpwas crucial in incorporating the numerous suggestions by Springerrsquos copy ed-itor Professor Elena Filatova helped tirelessly with the conference websiteand its updates Our department secretary Giuseppina Cannariato was a bighelp throughout Mr John Kimmel Executive Editor Statistics at Springerprovided useful advice and encouragement over several months

The quality of the papers depends critically on our distinguished andhard-working authors and referees who provided timely critical commentsand suggestions so the authors could improve the manuscripts Some refereeswere also presenters of papers and some will remain anonymous However Ido want to thank William Anderson (Cornell University) Bruce L Brown(Brigham Young University) Andreas Heinen (Universidad Carlos of Spain)Matthew S Johnson (Columbia University) D M Lyons (Fordham Univer-sity) and Philipp Sibbertsen (Leibniz Universitat Hannover Germany) fortheir refereeing work

xiii

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 6: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

viii Preface

We had a truly interdisciplinary conference organizing committee of Ford-ham professors Co-chair Frank Hsu (Computer and Information Sciences)and H D Vinod (Economics) Jose Aleman treasurer (Political Science) Re-maining Members (alphabetical order) were Elena Filatova (Computer andInformation Sciences) Se-Kang Kim (Psychology) Erick W Rengifo (Eco-nomics) Emily Rosenbaum (Sociology) Matthew Weinshenker (Sociology)and Tiffany Yip (Psychology)

Fordham University was the main sponsor of the conference providing allthe facilities guidance and staff help It would not have been possible withoutthe active support from the deans of the Graduate School of Arts and Sciences(GSAS Dean Dr Nancy Busch) Fordham College at Rose Hill (FCRH DeanDr Brennan OrsquoDonnell) and College of Business Administration (CBA DeanDr Donna Rapaccioli) Another active supporter Dr Robert HimmelbergDean of Arts amp Sciences Faculty inaugurated the conference Cosponsors werethe Society of Indian Academics in America (SIAA) and Global Associationof Risk Professionals (GARP) The modest registration fee paid for the coffeebreaks and a cocktail reception with snacks on the evening of Thursday June18 2009

Although seating was limited and a registration fee was charged seventy-five people registered and attended the conference A limited number of grad-uate and undergraduate students from various disciplines attended the con-ference Fordham University faculty and deans encouraged their students totake advantage of this opportunity to learn R Students could learn someinnovative and practical research tools providing intellectual skills havingpotentially lifelong benefits The conference provided graduate students andyoung faculty in all disciplines a great chance to see talk to and seek ad-vice from researchers using R The conference program and details remainavailable at the conference website httpwwwcisfordhameduQR2009

Most attendees thought that the conference was a great success The re-search work presented was of high quality covering a wide range of importanttopics along with important examples and R implementations This book ofldquoProceedingsrdquo extends the benefits of attending the conference to the muchwider audience of academics and professionals in various sciences around theworld For example one of our cosponsors (GARP mentioned above) encour-aged risk professionals who model risk and uncertainty in financial marketsand who work in the metropolitan New York area to attend the conference

A Brief Review of Each Chapter

Since all chapters come with their own abstracts one may wonder why Iam including this section Note that our conference was interdisciplinary andindividual authors have generally focused their abstracts to be appealing toreaders from their own specific disciplines I am including this section here

Preface ix

because I believe that chosen papers are of interest to a much wider range ofreaders from other fields and also to those interested in the R software perse Consistent with an aim of the conference the following summaries areintended to promote greater cross fertilization of ideas among all scientists

1] The chapter by McCullough shows the importance of numerically accu-rate computing indicating the relevance of forward and backward errors andcondition numbers in all sciences He shows that R software is accurate usingsome classic examples A general reader might be interested in McCulloughrsquosreasons for questioning the reliability of Donohue and Levittrsquos controversialpaper These authors speculated that a disproportionate number of aborted(unwanted) children would have grown up to become criminals and claimedto show that abortion reduces crime after a gap of 20 years

2] Koenkerrsquos chapter should be of interest to anyone seeking state-of-the-art flexible functional forms for regression fitting The ldquoadditiverdquo models seeka clever compromise between parametric and nonparametric componentsSome commonly made data ldquoadjustmentsrdquo are shown to be misleading andavoidable The R package ldquoquantregrdquo implementation for modeling childhoodmalnutrition in India also should be of interest to policymakers For examplemothers who are employed and wealthier are found to have taller children

3] Gelmanrsquos chapter is of practical interest to all of us who use R forgraphics and his example of US voting participation rates is of interest toall who care about democracy and responsible citizenry

4] My chapter suggests new solutions to a rather old and common problemof efficient estimation despite autocorrelation and heteroscedasticity amongregression errors Since regression is a ubiquitous tool in all sciences it shouldbe of much wider interest

5] Markus and Gursquos chapter is of great interest in exploratory data analysisin any scientific field Given any data set one often needs to develop initialunderstanding of the nature of relationships between three-way continuousvariables It is enhanced by comparing their new ldquobubble plotsrdquo convenientlymade available as an R script called ldquobp3way()rdquo at the conference websitehttpwwwcisfordhameduQR2009

6] Vinod Hsu and Tianrsquos chapter deals with the portfolio selection prob-lem of interest to anyone with an investment portfolio It is also of interestto computer scientists and data-mining experts since combinatorial fusioncomes from those fields The idea of mapping various related variables into acomparable set ready for combining them has applications in many social sci-ences Social scientists doing exploratory data analysis where lots of variablesare potentially relevant should find these R tools useful

7] Foster and Kecojevicrsquos chapter is of wide interest because they extendanalysis of covariance (ANCOVA) a very general statistical tool After allmany scientists want to know whether certain factors have an effect on acontinuous outcome variable while removing the variance associated withsome covariates Another common problem addressed here occurs when ob-servations with the big residuals need to be down weighted The wonderful

x Preface

(highly sophisticated) R graphics and growth charts for Saudi children arealso of interest in their own right

8] Imai Keele Tingley and Yamamotorsquos chapter addresses an age-old sci-entific problem of assessing the direction and strength of causation amongvariables It is remarkably powerful because of flexibility and variety in thetypes of causendasheffect relations admitted The authors explain the use of theirR package called ldquomediationrdquo Their ldquojob searchrdquo application has wider re-search interest providing room for extensions since the unemployment prob-lem can be viewed from the distinct viewpoint of several social sciences

9] Haupt Schnurbus and Tschernigrsquos chapter focuses on the problem ofthe choice of functional form for an unknown potentially nonlinear relation-ship This discussion goes beyond [4] where I discuss the related problemof choosing production functions This chapter describes how to use flexiblenonparametric kernel regression models for model validation of parametricspecifications including misspecification testing and prediction simulationsThe ldquorelaxrdquo package is shown to be useful for model visualization and valida-tion

10] Rindskopfrsquos chapter shows how to use R to fit a multinomial model thatis parallel to the usual multivariate analysis of variance (MANOVA) A widevariety of categorical data models can be fit using this program includingthe usual test of independence logit and loglinear models logistic regressionquasi-independence models models with equality restrictions on parametersand a variety of nonstandard loglinear models An advantage of this programover some loglinear models is the clear separation of independent (predictor)and dependent (outcome) variables Examples are presented from psychologysociology political science and medicine

11] Neathrsquos chapter dealing with location of hazardous waste sites alsodeals with a topic of wide general interest in policy circles More importantBayesian analysis of posterior distributions used to be less graphical and lessintuitive before the recent availability of R packages Neath shows how touse ldquoWinBUGSrdquo and ldquoR2WinBUGSrdquo interfaces in the context of an exampleproviding a template for applications in other fields

12] Numatsi and Rengiforsquos chapter dealing with financial market volatilityis of interest to anyone with an investment portfolio subject to financial crisesIt studies time series models using FTSE 100 daily returns Many fields havetime series with persistent discrete jumps misspecification and other issueswhich are solved by the authorsrsquo R software

My short descriptions of twelve chapters explain why I am enthusiasticabout including them in this book They are my possibly inadequate salespitches encouraging all readers to study the original chapters in detail Theauthors of papers presented at the conference were given a short deadlineof a couple of months for submission of their papers as a Latex documentsuitable for Springer publications We are including only the ones who metthe deadline and whose referees recommended their publication

Preface xi

The original developers of the S computer language at Bell Labs (on whichR is based) had insisted that the assignment operator must have two strokeldquolt-rdquo When I worked at Bell Labs I could not convince them to permitthe use of one stroke ldquo=rdquo used by Fortran and other languages as a moreconvenient option for lazy typists I am happy to see that unlike S the Rteam does indeed permit the ldquo=rdquo symbol for assignment Similarly R givesme the option to start my R session with the command

options(prompt = continue = )

This has the advantage that lazy typists like me can directly copy and pasteselected lines of the output from R into my R input command stream With-out the lsquooptionsrsquo command the current prompt ldquogtrdquo is confused by R withldquogreater thanrdquo and the current continuation symbol ldquo+rdquo is confused withaddition

We enthusiastic users of R are grateful to the R Foundation for Statisti-cal Computing Vienna Austria [2] for continually maintaining and improv-ing R The R Foundation has nurtured a wonderfully flexible research toolmaking it into a great intellectual resource for the benefit of humanity anddeserves our thanks and financial support Please see the separate section onlsquoAcknowledgmentsrsquo listing names of those whose help was crucial in writingthis book including the referees

Tenafly New Jersey October 2009 Hrishikesh Vinod

References

1 Fox J car Companion to Applied Regression R package version 12-14 (2009)URL httpCRANR-projectorgpackage=car

2 R Development Core Team R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria (2009) URLhttpwwwR-projectorg ISBN 3-900051-07-0

3 Vance A Data Analysts Captivated by Rrsquos Power The New York TimesJanuary 6 (2009) URL httpwwwnytimescom20090107technology

business-computing07programhtml

4 Vinod HD Hands-on Intermediate Econometrics Using R Templates for Ex-tending Dozens of Practical Examples World Scientific Publishers HackensackNJ (2008) URL httpwwwworldscibookscomeconomics6895html

xii Preface

Fig 1 Standing left to right Joachim Schnurbus Rebecca Sela Achim ZeileisTatjana Kecojevic David Rindskopf Ronald Neath Adjoa Numatsi and RossenTrendafilov Sitting left to right B D McCullough D Frank Hsu Roger KoenkerH D Vinod Kosuke Imai and Keith Markus

Acknowledgments

I am grateful to the Co-chair Dr Frank Hsu of the organizing commit-tee along with all the members whose names are mentioned in the Prefacefor their help with the conference The following deans provided importantencouragement Nancy Busch Brennan OrsquoDonnell Donna Rapaccioli andRobert Himmelberg

My current students Rossen Trendafilov and Brandon Vick gave anldquoIntro-duction to Rrdquotalk which attracted the students My former student ThethachChuaprapaisilp helped greatly with typesetting and Latex issues His helpwas crucial in incorporating the numerous suggestions by Springerrsquos copy ed-itor Professor Elena Filatova helped tirelessly with the conference websiteand its updates Our department secretary Giuseppina Cannariato was a bighelp throughout Mr John Kimmel Executive Editor Statistics at Springerprovided useful advice and encouragement over several months

The quality of the papers depends critically on our distinguished andhard-working authors and referees who provided timely critical commentsand suggestions so the authors could improve the manuscripts Some refereeswere also presenters of papers and some will remain anonymous However Ido want to thank William Anderson (Cornell University) Bruce L Brown(Brigham Young University) Andreas Heinen (Universidad Carlos of Spain)Matthew S Johnson (Columbia University) D M Lyons (Fordham Univer-sity) and Philipp Sibbertsen (Leibniz Universitat Hannover Germany) fortheir refereeing work

xiii

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 7: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

Preface ix

because I believe that chosen papers are of interest to a much wider range ofreaders from other fields and also to those interested in the R software perse Consistent with an aim of the conference the following summaries areintended to promote greater cross fertilization of ideas among all scientists

1] The chapter by McCullough shows the importance of numerically accu-rate computing indicating the relevance of forward and backward errors andcondition numbers in all sciences He shows that R software is accurate usingsome classic examples A general reader might be interested in McCulloughrsquosreasons for questioning the reliability of Donohue and Levittrsquos controversialpaper These authors speculated that a disproportionate number of aborted(unwanted) children would have grown up to become criminals and claimedto show that abortion reduces crime after a gap of 20 years

2] Koenkerrsquos chapter should be of interest to anyone seeking state-of-the-art flexible functional forms for regression fitting The ldquoadditiverdquo models seeka clever compromise between parametric and nonparametric componentsSome commonly made data ldquoadjustmentsrdquo are shown to be misleading andavoidable The R package ldquoquantregrdquo implementation for modeling childhoodmalnutrition in India also should be of interest to policymakers For examplemothers who are employed and wealthier are found to have taller children

3] Gelmanrsquos chapter is of practical interest to all of us who use R forgraphics and his example of US voting participation rates is of interest toall who care about democracy and responsible citizenry

4] My chapter suggests new solutions to a rather old and common problemof efficient estimation despite autocorrelation and heteroscedasticity amongregression errors Since regression is a ubiquitous tool in all sciences it shouldbe of much wider interest

5] Markus and Gursquos chapter is of great interest in exploratory data analysisin any scientific field Given any data set one often needs to develop initialunderstanding of the nature of relationships between three-way continuousvariables It is enhanced by comparing their new ldquobubble plotsrdquo convenientlymade available as an R script called ldquobp3way()rdquo at the conference websitehttpwwwcisfordhameduQR2009

6] Vinod Hsu and Tianrsquos chapter deals with the portfolio selection prob-lem of interest to anyone with an investment portfolio It is also of interestto computer scientists and data-mining experts since combinatorial fusioncomes from those fields The idea of mapping various related variables into acomparable set ready for combining them has applications in many social sci-ences Social scientists doing exploratory data analysis where lots of variablesare potentially relevant should find these R tools useful

7] Foster and Kecojevicrsquos chapter is of wide interest because they extendanalysis of covariance (ANCOVA) a very general statistical tool After allmany scientists want to know whether certain factors have an effect on acontinuous outcome variable while removing the variance associated withsome covariates Another common problem addressed here occurs when ob-servations with the big residuals need to be down weighted The wonderful

x Preface

(highly sophisticated) R graphics and growth charts for Saudi children arealso of interest in their own right

8] Imai Keele Tingley and Yamamotorsquos chapter addresses an age-old sci-entific problem of assessing the direction and strength of causation amongvariables It is remarkably powerful because of flexibility and variety in thetypes of causendasheffect relations admitted The authors explain the use of theirR package called ldquomediationrdquo Their ldquojob searchrdquo application has wider re-search interest providing room for extensions since the unemployment prob-lem can be viewed from the distinct viewpoint of several social sciences

9] Haupt Schnurbus and Tschernigrsquos chapter focuses on the problem ofthe choice of functional form for an unknown potentially nonlinear relation-ship This discussion goes beyond [4] where I discuss the related problemof choosing production functions This chapter describes how to use flexiblenonparametric kernel regression models for model validation of parametricspecifications including misspecification testing and prediction simulationsThe ldquorelaxrdquo package is shown to be useful for model visualization and valida-tion

10] Rindskopfrsquos chapter shows how to use R to fit a multinomial model thatis parallel to the usual multivariate analysis of variance (MANOVA) A widevariety of categorical data models can be fit using this program includingthe usual test of independence logit and loglinear models logistic regressionquasi-independence models models with equality restrictions on parametersand a variety of nonstandard loglinear models An advantage of this programover some loglinear models is the clear separation of independent (predictor)and dependent (outcome) variables Examples are presented from psychologysociology political science and medicine

11] Neathrsquos chapter dealing with location of hazardous waste sites alsodeals with a topic of wide general interest in policy circles More importantBayesian analysis of posterior distributions used to be less graphical and lessintuitive before the recent availability of R packages Neath shows how touse ldquoWinBUGSrdquo and ldquoR2WinBUGSrdquo interfaces in the context of an exampleproviding a template for applications in other fields

12] Numatsi and Rengiforsquos chapter dealing with financial market volatilityis of interest to anyone with an investment portfolio subject to financial crisesIt studies time series models using FTSE 100 daily returns Many fields havetime series with persistent discrete jumps misspecification and other issueswhich are solved by the authorsrsquo R software

My short descriptions of twelve chapters explain why I am enthusiasticabout including them in this book They are my possibly inadequate salespitches encouraging all readers to study the original chapters in detail Theauthors of papers presented at the conference were given a short deadlineof a couple of months for submission of their papers as a Latex documentsuitable for Springer publications We are including only the ones who metthe deadline and whose referees recommended their publication

Preface xi

The original developers of the S computer language at Bell Labs (on whichR is based) had insisted that the assignment operator must have two strokeldquolt-rdquo When I worked at Bell Labs I could not convince them to permitthe use of one stroke ldquo=rdquo used by Fortran and other languages as a moreconvenient option for lazy typists I am happy to see that unlike S the Rteam does indeed permit the ldquo=rdquo symbol for assignment Similarly R givesme the option to start my R session with the command

options(prompt = continue = )

This has the advantage that lazy typists like me can directly copy and pasteselected lines of the output from R into my R input command stream With-out the lsquooptionsrsquo command the current prompt ldquogtrdquo is confused by R withldquogreater thanrdquo and the current continuation symbol ldquo+rdquo is confused withaddition

We enthusiastic users of R are grateful to the R Foundation for Statisti-cal Computing Vienna Austria [2] for continually maintaining and improv-ing R The R Foundation has nurtured a wonderfully flexible research toolmaking it into a great intellectual resource for the benefit of humanity anddeserves our thanks and financial support Please see the separate section onlsquoAcknowledgmentsrsquo listing names of those whose help was crucial in writingthis book including the referees

Tenafly New Jersey October 2009 Hrishikesh Vinod

References

1 Fox J car Companion to Applied Regression R package version 12-14 (2009)URL httpCRANR-projectorgpackage=car

2 R Development Core Team R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria (2009) URLhttpwwwR-projectorg ISBN 3-900051-07-0

3 Vance A Data Analysts Captivated by Rrsquos Power The New York TimesJanuary 6 (2009) URL httpwwwnytimescom20090107technology

business-computing07programhtml

4 Vinod HD Hands-on Intermediate Econometrics Using R Templates for Ex-tending Dozens of Practical Examples World Scientific Publishers HackensackNJ (2008) URL httpwwwworldscibookscomeconomics6895html

xii Preface

Fig 1 Standing left to right Joachim Schnurbus Rebecca Sela Achim ZeileisTatjana Kecojevic David Rindskopf Ronald Neath Adjoa Numatsi and RossenTrendafilov Sitting left to right B D McCullough D Frank Hsu Roger KoenkerH D Vinod Kosuke Imai and Keith Markus

Acknowledgments

I am grateful to the Co-chair Dr Frank Hsu of the organizing commit-tee along with all the members whose names are mentioned in the Prefacefor their help with the conference The following deans provided importantencouragement Nancy Busch Brennan OrsquoDonnell Donna Rapaccioli andRobert Himmelberg

My current students Rossen Trendafilov and Brandon Vick gave anldquoIntro-duction to Rrdquotalk which attracted the students My former student ThethachChuaprapaisilp helped greatly with typesetting and Latex issues His helpwas crucial in incorporating the numerous suggestions by Springerrsquos copy ed-itor Professor Elena Filatova helped tirelessly with the conference websiteand its updates Our department secretary Giuseppina Cannariato was a bighelp throughout Mr John Kimmel Executive Editor Statistics at Springerprovided useful advice and encouragement over several months

The quality of the papers depends critically on our distinguished andhard-working authors and referees who provided timely critical commentsand suggestions so the authors could improve the manuscripts Some refereeswere also presenters of papers and some will remain anonymous However Ido want to thank William Anderson (Cornell University) Bruce L Brown(Brigham Young University) Andreas Heinen (Universidad Carlos of Spain)Matthew S Johnson (Columbia University) D M Lyons (Fordham Univer-sity) and Philipp Sibbertsen (Leibniz Universitat Hannover Germany) fortheir refereeing work

xiii

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 8: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

x Preface

(highly sophisticated) R graphics and growth charts for Saudi children arealso of interest in their own right

8] Imai Keele Tingley and Yamamotorsquos chapter addresses an age-old sci-entific problem of assessing the direction and strength of causation amongvariables It is remarkably powerful because of flexibility and variety in thetypes of causendasheffect relations admitted The authors explain the use of theirR package called ldquomediationrdquo Their ldquojob searchrdquo application has wider re-search interest providing room for extensions since the unemployment prob-lem can be viewed from the distinct viewpoint of several social sciences

9] Haupt Schnurbus and Tschernigrsquos chapter focuses on the problem ofthe choice of functional form for an unknown potentially nonlinear relation-ship This discussion goes beyond [4] where I discuss the related problemof choosing production functions This chapter describes how to use flexiblenonparametric kernel regression models for model validation of parametricspecifications including misspecification testing and prediction simulationsThe ldquorelaxrdquo package is shown to be useful for model visualization and valida-tion

10] Rindskopfrsquos chapter shows how to use R to fit a multinomial model thatis parallel to the usual multivariate analysis of variance (MANOVA) A widevariety of categorical data models can be fit using this program includingthe usual test of independence logit and loglinear models logistic regressionquasi-independence models models with equality restrictions on parametersand a variety of nonstandard loglinear models An advantage of this programover some loglinear models is the clear separation of independent (predictor)and dependent (outcome) variables Examples are presented from psychologysociology political science and medicine

11] Neathrsquos chapter dealing with location of hazardous waste sites alsodeals with a topic of wide general interest in policy circles More importantBayesian analysis of posterior distributions used to be less graphical and lessintuitive before the recent availability of R packages Neath shows how touse ldquoWinBUGSrdquo and ldquoR2WinBUGSrdquo interfaces in the context of an exampleproviding a template for applications in other fields

12] Numatsi and Rengiforsquos chapter dealing with financial market volatilityis of interest to anyone with an investment portfolio subject to financial crisesIt studies time series models using FTSE 100 daily returns Many fields havetime series with persistent discrete jumps misspecification and other issueswhich are solved by the authorsrsquo R software

My short descriptions of twelve chapters explain why I am enthusiasticabout including them in this book They are my possibly inadequate salespitches encouraging all readers to study the original chapters in detail Theauthors of papers presented at the conference were given a short deadlineof a couple of months for submission of their papers as a Latex documentsuitable for Springer publications We are including only the ones who metthe deadline and whose referees recommended their publication

Preface xi

The original developers of the S computer language at Bell Labs (on whichR is based) had insisted that the assignment operator must have two strokeldquolt-rdquo When I worked at Bell Labs I could not convince them to permitthe use of one stroke ldquo=rdquo used by Fortran and other languages as a moreconvenient option for lazy typists I am happy to see that unlike S the Rteam does indeed permit the ldquo=rdquo symbol for assignment Similarly R givesme the option to start my R session with the command

options(prompt = continue = )

This has the advantage that lazy typists like me can directly copy and pasteselected lines of the output from R into my R input command stream With-out the lsquooptionsrsquo command the current prompt ldquogtrdquo is confused by R withldquogreater thanrdquo and the current continuation symbol ldquo+rdquo is confused withaddition

We enthusiastic users of R are grateful to the R Foundation for Statisti-cal Computing Vienna Austria [2] for continually maintaining and improv-ing R The R Foundation has nurtured a wonderfully flexible research toolmaking it into a great intellectual resource for the benefit of humanity anddeserves our thanks and financial support Please see the separate section onlsquoAcknowledgmentsrsquo listing names of those whose help was crucial in writingthis book including the referees

Tenafly New Jersey October 2009 Hrishikesh Vinod

References

1 Fox J car Companion to Applied Regression R package version 12-14 (2009)URL httpCRANR-projectorgpackage=car

2 R Development Core Team R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria (2009) URLhttpwwwR-projectorg ISBN 3-900051-07-0

3 Vance A Data Analysts Captivated by Rrsquos Power The New York TimesJanuary 6 (2009) URL httpwwwnytimescom20090107technology

business-computing07programhtml

4 Vinod HD Hands-on Intermediate Econometrics Using R Templates for Ex-tending Dozens of Practical Examples World Scientific Publishers HackensackNJ (2008) URL httpwwwworldscibookscomeconomics6895html

xii Preface

Fig 1 Standing left to right Joachim Schnurbus Rebecca Sela Achim ZeileisTatjana Kecojevic David Rindskopf Ronald Neath Adjoa Numatsi and RossenTrendafilov Sitting left to right B D McCullough D Frank Hsu Roger KoenkerH D Vinod Kosuke Imai and Keith Markus

Acknowledgments

I am grateful to the Co-chair Dr Frank Hsu of the organizing commit-tee along with all the members whose names are mentioned in the Prefacefor their help with the conference The following deans provided importantencouragement Nancy Busch Brennan OrsquoDonnell Donna Rapaccioli andRobert Himmelberg

My current students Rossen Trendafilov and Brandon Vick gave anldquoIntro-duction to Rrdquotalk which attracted the students My former student ThethachChuaprapaisilp helped greatly with typesetting and Latex issues His helpwas crucial in incorporating the numerous suggestions by Springerrsquos copy ed-itor Professor Elena Filatova helped tirelessly with the conference websiteand its updates Our department secretary Giuseppina Cannariato was a bighelp throughout Mr John Kimmel Executive Editor Statistics at Springerprovided useful advice and encouragement over several months

The quality of the papers depends critically on our distinguished andhard-working authors and referees who provided timely critical commentsand suggestions so the authors could improve the manuscripts Some refereeswere also presenters of papers and some will remain anonymous However Ido want to thank William Anderson (Cornell University) Bruce L Brown(Brigham Young University) Andreas Heinen (Universidad Carlos of Spain)Matthew S Johnson (Columbia University) D M Lyons (Fordham Univer-sity) and Philipp Sibbertsen (Leibniz Universitat Hannover Germany) fortheir refereeing work

xiii

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 9: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

Preface xi

The original developers of the S computer language at Bell Labs (on whichR is based) had insisted that the assignment operator must have two strokeldquolt-rdquo When I worked at Bell Labs I could not convince them to permitthe use of one stroke ldquo=rdquo used by Fortran and other languages as a moreconvenient option for lazy typists I am happy to see that unlike S the Rteam does indeed permit the ldquo=rdquo symbol for assignment Similarly R givesme the option to start my R session with the command

options(prompt = continue = )

This has the advantage that lazy typists like me can directly copy and pasteselected lines of the output from R into my R input command stream With-out the lsquooptionsrsquo command the current prompt ldquogtrdquo is confused by R withldquogreater thanrdquo and the current continuation symbol ldquo+rdquo is confused withaddition

We enthusiastic users of R are grateful to the R Foundation for Statisti-cal Computing Vienna Austria [2] for continually maintaining and improv-ing R The R Foundation has nurtured a wonderfully flexible research toolmaking it into a great intellectual resource for the benefit of humanity anddeserves our thanks and financial support Please see the separate section onlsquoAcknowledgmentsrsquo listing names of those whose help was crucial in writingthis book including the referees

Tenafly New Jersey October 2009 Hrishikesh Vinod

References

1 Fox J car Companion to Applied Regression R package version 12-14 (2009)URL httpCRANR-projectorgpackage=car

2 R Development Core Team R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria (2009) URLhttpwwwR-projectorg ISBN 3-900051-07-0

3 Vance A Data Analysts Captivated by Rrsquos Power The New York TimesJanuary 6 (2009) URL httpwwwnytimescom20090107technology

business-computing07programhtml

4 Vinod HD Hands-on Intermediate Econometrics Using R Templates for Ex-tending Dozens of Practical Examples World Scientific Publishers HackensackNJ (2008) URL httpwwwworldscibookscomeconomics6895html

xii Preface

Fig 1 Standing left to right Joachim Schnurbus Rebecca Sela Achim ZeileisTatjana Kecojevic David Rindskopf Ronald Neath Adjoa Numatsi and RossenTrendafilov Sitting left to right B D McCullough D Frank Hsu Roger KoenkerH D Vinod Kosuke Imai and Keith Markus

Acknowledgments

I am grateful to the Co-chair Dr Frank Hsu of the organizing commit-tee along with all the members whose names are mentioned in the Prefacefor their help with the conference The following deans provided importantencouragement Nancy Busch Brennan OrsquoDonnell Donna Rapaccioli andRobert Himmelberg

My current students Rossen Trendafilov and Brandon Vick gave anldquoIntro-duction to Rrdquotalk which attracted the students My former student ThethachChuaprapaisilp helped greatly with typesetting and Latex issues His helpwas crucial in incorporating the numerous suggestions by Springerrsquos copy ed-itor Professor Elena Filatova helped tirelessly with the conference websiteand its updates Our department secretary Giuseppina Cannariato was a bighelp throughout Mr John Kimmel Executive Editor Statistics at Springerprovided useful advice and encouragement over several months

The quality of the papers depends critically on our distinguished andhard-working authors and referees who provided timely critical commentsand suggestions so the authors could improve the manuscripts Some refereeswere also presenters of papers and some will remain anonymous However Ido want to thank William Anderson (Cornell University) Bruce L Brown(Brigham Young University) Andreas Heinen (Universidad Carlos of Spain)Matthew S Johnson (Columbia University) D M Lyons (Fordham Univer-sity) and Philipp Sibbertsen (Leibniz Universitat Hannover Germany) fortheir refereeing work

xiii

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 10: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

xii Preface

Fig 1 Standing left to right Joachim Schnurbus Rebecca Sela Achim ZeileisTatjana Kecojevic David Rindskopf Ronald Neath Adjoa Numatsi and RossenTrendafilov Sitting left to right B D McCullough D Frank Hsu Roger KoenkerH D Vinod Kosuke Imai and Keith Markus

Acknowledgments

I am grateful to the Co-chair Dr Frank Hsu of the organizing commit-tee along with all the members whose names are mentioned in the Prefacefor their help with the conference The following deans provided importantencouragement Nancy Busch Brennan OrsquoDonnell Donna Rapaccioli andRobert Himmelberg

My current students Rossen Trendafilov and Brandon Vick gave anldquoIntro-duction to Rrdquotalk which attracted the students My former student ThethachChuaprapaisilp helped greatly with typesetting and Latex issues His helpwas crucial in incorporating the numerous suggestions by Springerrsquos copy ed-itor Professor Elena Filatova helped tirelessly with the conference websiteand its updates Our department secretary Giuseppina Cannariato was a bighelp throughout Mr John Kimmel Executive Editor Statistics at Springerprovided useful advice and encouragement over several months

The quality of the papers depends critically on our distinguished andhard-working authors and referees who provided timely critical commentsand suggestions so the authors could improve the manuscripts Some refereeswere also presenters of papers and some will remain anonymous However Ido want to thank William Anderson (Cornell University) Bruce L Brown(Brigham Young University) Andreas Heinen (Universidad Carlos of Spain)Matthew S Johnson (Columbia University) D M Lyons (Fordham Univer-sity) and Philipp Sibbertsen (Leibniz Universitat Hannover Germany) fortheir refereeing work

xiii

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 11: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

Acknowledgments

I am grateful to the Co-chair Dr Frank Hsu of the organizing commit-tee along with all the members whose names are mentioned in the Prefacefor their help with the conference The following deans provided importantencouragement Nancy Busch Brennan OrsquoDonnell Donna Rapaccioli andRobert Himmelberg

My current students Rossen Trendafilov and Brandon Vick gave anldquoIntro-duction to Rrdquotalk which attracted the students My former student ThethachChuaprapaisilp helped greatly with typesetting and Latex issues His helpwas crucial in incorporating the numerous suggestions by Springerrsquos copy ed-itor Professor Elena Filatova helped tirelessly with the conference websiteand its updates Our department secretary Giuseppina Cannariato was a bighelp throughout Mr John Kimmel Executive Editor Statistics at Springerprovided useful advice and encouragement over several months

The quality of the papers depends critically on our distinguished andhard-working authors and referees who provided timely critical commentsand suggestions so the authors could improve the manuscripts Some refereeswere also presenters of papers and some will remain anonymous However Ido want to thank William Anderson (Cornell University) Bruce L Brown(Brigham Young University) Andreas Heinen (Universidad Carlos of Spain)Matthew S Johnson (Columbia University) D M Lyons (Fordham Univer-sity) and Philipp Sibbertsen (Leibniz Universitat Hannover Germany) fortheir refereeing work

xiii

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 12: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

Contents

Preface viiA Brief Review of Each Chapter viiiReferences xi

Acknowledgments xiii

1 Econometric Computing with ldquoRrdquo 1B D McCullough11 Introduction 112 The Economics Profession Needs Econometric Computing 3

121 Most Users Do Not Know Econometric Computing 3122 Some Developers Do Not Know Econometric

Computing 4123 Some Textbook Authors Do Not Know

Econometric Computing 413 Econometric Computing Is Important 614 ldquoRrdquo Is the Best Language for Teaching Econometric

Computing 815 The Longley Data and Econometric Computing 1016 Beaton Rubin and Barone Revisit Longley 1217 An Example DonohueLevittrsquos Abortion Paper 1418 Conclusions 19References 19

2 Additive Models for Quantile Regression An Analysis ofRisk Factors for Malnutrition in India 23Roger Koenker21 Additive Models for Quantile Regression 2422 A Model of Childhood Malnutrition in India 25

221 λ -Selection 26222 Confidence Bands and Post-Selection Inference 28

References 32

xv

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 13: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

xvi Contents

3 Toward Better R Defaults for Graphics Example ofVoter Turnouts in US Elections 35Andrew GelmanReferences 38

4 Superior Estimation and Inference AvoidingHeteroscedasticity and Flawed Pivots R-exampleof Inflation Unemployment Trade-Off 39H D Vinod41 Introduction 4042 Heteroscedasticity Efficient (HE) Estimation 4243 A Limited Monte Carlo Simulation

of Efficiency of HE 4944 An Example of Heteroscedasticity Correction 5145 Superior Inference of Deep Parameters

Beyond Efficient Estimation 5746 Summary and Final Remarks 58Appendix 59References 62

5 Bubble Plots as a Model-Free Graphical Tool forContinuous Variables 65Keith A Markus and Wen Gu51 Introduction 6552 General Principles Bearing on Three-Way Graphs 6653 Graphical Options Ruled Out a Priori 6854 Plausible Graphical Alternatives 7155 The bp3way() Function 74

551 Use and Options of bp3way() Function 75552 Six Key Parameters for Controlling the Graph 75553 Additional Parameters Controlling the Data Plotted 76554 Parameters Controlling the Plotted Bubbles 76555 Parameters Controlling the Grid 77556 The tacit Parameter 77557 The bpdata() Function 77

56 An Empirical Study of Three Graphical Methods 78561 Method 78562 Results 80

57 Discussion 89Appendixes 91References 93

6 Combinatorial Fusion for Improving PortfolioPerformance 95H D Vinod D F Hsu and Y Tian61 Introduction 96

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 14: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

Contents xvii

62 Combinatorial Fusion Analysisfor Portfolios 97

63 An Illustrative Example as an Experiment 100631 Description of the Data Set 100632 Description of the Steps in Our R Algorithm 102

References 104

7 Reference Growth Charts for Saudi Arabian Childrenand Adolescents 107P J Foster and T Kecojevic71 Introduction 10872 Outliers 10873 LMS 11374 Smoothing and Evaluation 11775 Averaging 11876 Comparisons Using ANCOVA 122

761 Comparing Geographical Regions 122762 Comparing Males and Females 125

77 Discussion 126References 128

8 Causal Mediation Analysis Using R 129K Imai L Keele D Tingley and T Yamamoto81 Introduction 130

811 Installation and Updating 13082 The Software 131

821 Overview 131822 Estimation of the Causal Mediation Effects 132823 Sensitivity Analysis 134824 Current Limitations 136

83 Examples 138831 Estimation of Causal Mediation Effects 138832 Sensitivity Analysis 147

84 Concluding Remarks 15385 Notes and Acknowledgment 153References 153

9 Statistical Validation of Functional Form in MultipleRegression Using R 155Harry Haupt Joachim Schnurbus and Rolf Tschernig91 Model Validation 15592 Nonparametric Methods for Model Validation 15793 Model Visualization and Validation Using relax 15994 Beauty and the Labor Market Revisited 161References 166

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 15: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

xviii Contents

10 Fitting Multinomial Models in R A Program Based onBockrsquos Multinomial Response Relation Model 167David Rindskopf101 Model 167102 Program Code 169103 How to Use the mqual Function 169104 Example 1 Test of Independence 170

1041 Input 1701042 Output 170

105 Example 2 Effect of Aspirin on Myocardial Infarction (MI) 1711051 Input 1711052 Output from Saturated Model 171

106 Example 3 Race times Gender times Party Affiliation 1721061 Input 1721062 Output 173

107 Nonstandard Loglinear Models 174108 Technical Details of Estimation Procedure 174109 Troubleshooting and Usage Suggestions 176References 177

11 A Bayesian Analysis of Leukemia Incidence Surroundingan Inactive Hazardous Waste Site 179Ronald C Neath111 Introduction 179112 Data Summaries 180113 The Model 180114 Prior Distributions 183115 Analysis 184

1151 Estimated Posteriors 1851152 The Location-Risk Function 1871153 A Simplified Model 188

116 Discussion 189References 190

12 Stochastic Volatility Model with Jumps in Returns andVolatility An R-Package Implementation 191Adjoa Numatsi and Erick W Rengifo121 Introduction 191122 The Stochastic Volatility Model with Jumps in Returns

and Volatility 193123 Empirical Implementation 194

1231 The Data 1941232 The Estimation Method 1941233 The R Program 1971234 The Results 197

124 Conclusion and Future Venues of Research 200

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 16: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

Contents xix

References 200

Index 203

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 17: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

List of Contributors

Peter FosterSchool of Mathematics University of Manchester Manchester UKe-mail peterfostermanchesteracuk

Andrew GelmanDepartment of Statistics and Department of Political Science ColumbiaUniversity New York NY 10027 USA e-mail gelmanstatcolumbiaedu

Wen GuPsychology Department John Jay College of Criminal Justice NY 10019USA e-mail wgugccunyedu

Harry HauptCentre for StatisticsDepartment of Economics and Business AdministrationBielefeld University 33501 Bielefeld Germanye-mail hhauptwiwiuni-bielefeldde

D Frank HsuDepartment of Computer amp Information Science Fordham University NewYork NY 10023 USA e-mail hsucisfordhamedu

Kosuke ImaiDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail kimaiprincetonedu

Tatjana KecojevicLancashire Business School University of Central Lancashire Preston UKe-mail TKecojevicuclanacuk

Luke KeeleDepartment of Political Science Ohio State University Columbus OH43210 USA e-mail keele4polisciosuedu

xxi

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 18: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

xxii List of Contributors

Roger KoenkerDepartment of Economics University of Illinois Champaign IL 61820 USAe-mail rkoenkeruiucedu

Keith A MarkusPsychology Department John Jay College of Criminal Justice New YorkNY 10019 USA e-mail kmarkusaolcom

B D McCulloughDepartment of Economics Drexel University Department of DecisionSciences LeBow College of Business Drexel University Philadelphia PA19104 USA e-mail bdmcculloughdrexeledu

Ronald C NeathDepartment of Statistics and CIS Baruch College City University of NewYork New York NY 10010 USA e-mail ronaldneathbaruchcunyedu

Adjoa NumatsiDepartment of Economics Fordham University Bronx NY 10458 USAe-mail numatsifordhamedu

Erick W RengifoDepartment of Economics Fordham University Bronx NY 10458 USAe-mail rengifominafordhamedu

David RindskopfEducational Psychology Program CUNY Graduate Center New York NY10016 USA e-mail drindskopfgccunyedu

Joachim SchnurbusInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germanye-mail joachimschnurbuswiwiuni-regensburgde

Ye TianDepartment of Applied Mathematics and Computational Science Universityof Pennsylvania Philadelphia PA 19104 USAe-mail ytian001hotmailcom

Dustin TingleyDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail dtingleyprincetonedu

Rolf TschernigInstitute of Economics and Econometrics University of Regensburg 93053Regensburg Germany e-mail rolftschernigwiwiuni-regensburgde

Hrishikesh VinodFordham University Bronx NY 10458 USA e-mail vinodfordhamedu

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 19: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

List of Contributors xxiii

Teppei YamamotoDepartment of Politics Princeton University Princeton NJ 08544 USAe-mail tyamamotprincetonedu

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 20: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

Chapter 1

Econometric Computing with ldquoRrdquo

B D McCullough

Abstract We show that the econometrics profession is in dire need of practi-tioners of econometric computing and thatldquoRrdquo is the best choice for teachingeconometricstatistical computing to researchers who are not numerical an-alysts We give examples of econometric computing in R and use ldquoRrdquo torevisit the classic papers by Longley and by Beaton Rubin and Barone Weapply the methods of econometric computing to show that the empirical re-sults of Donohue and Levittrsquos abortion paper are numerically unsound Thisdiscussion should be of interest in other social sciences as well

11 Introduction

Econometric computing is not simply statistical computing for econometricsthere is much overlap between the two disciplines yet there are separateareas too Econometric computing is to a large extent the act of makingeconometric theory operational This stands in sharp contrast to statisticalcomputing which is not at all defined by its users (eg biologist physicistor any of scores of disciplines) Statistics has not much to say about whethersome particular empirical measurement might exhibit a unit root while theeconometric literature has a cottage industry dedicated to unit roots Thesethemes are explored in the recent book by Renfro [26]

Practitioners of statistical computing will not in general concern them-selves with econometric issues such as cointegration or three-stage leastsquares yet economists have need of these procedures and the accuracyand stability of these procedures should be addressed This is the purviewof econometric computing There is a tendency by some to use statistical

B D McCulloughDepartment of Economics Drexel University Philadelphia PA 19104 USAe-mail bdmcculloughdrexeledu

HD Vinod (ed) Advances in Social Science Research Using R 1Lecture Notes in Statistics 196 DOI 101007978-1-4419-1764-5 1copy Springer Science+Business Media LLC 2010

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 21: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

2 B D McCullough

(econometric) computing and computational statistics (econometrics) inter-changeably and this leads to confusion To clarify we quote from Zeileis [33p 2988]

[T]here is a broad spectrum of various possibilities of combining econometricsand computing two terms which are sometimes used to denote different endsof this spectrum are (1) computational econometrics which is mainly aboutmethods that require substantial computations (eg bootstrap or Monte Carlomethods) and (2) econometric computing which is about translating econo-metric ideas into software Of course both approaches are closely related andcannot be clearly separated[]

Implicit in the above definition of econometric computing is that the softwarecreated be correct More conventionally statistical computing is numericalanalysis for statistics while computational statistics is the use of compu-tationally intensive methods (eg simulation bootstrap MCMC etc) toconduct inference We make the same distinction for econometric computingand computational econometrics

Econometric computing has been even less prominent in econometrics thanhas statistical computing in statistics to the detriment of the practice ofeconomics In general economists know nothing of econometric computingEvidence suggests that some developers of econometric software know little ofthe subject too mdash witness how frequently two or more econometric softwarepackages give different answers to the same problem The solution to theproblem is the development of a course in econometric computing to be taughtto graduate students in economics It is the contention of this paper that ldquoRrdquois the only program capable of serving as a vehicle for teaching econometriccomputing

Of course if ldquoRrdquo is to be the solution it must first be demonstrated thata problem exists Hence Section 12 contends that most users and somedevelopers of econometric software are generally unaware of the principles ofeconometric computing Continuing in this vein Section 13 presents someexamples to show that accurate computing matters Section 14 discussesthe reasons that ldquoRrdquo is the only viable language for teaching econometriccomputing Section 15 reexamines Longleyrsquos [13] classic paper Section 16presents Beaton Rubin and Baronersquos [2] reanalysis of Longley Section 17applies the lessons of Beaton Rubin and Barone to the abortion paper byDonohue and Levitt [6] that was recently analyzed by Anderson and Wells [1]from a numerical perspective Our results support those of Anderson andWells the Donohue and Levitt results are numerically unreliable

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])

Page 22: Advances in Social Science Research Using R · Hrishikesh D. Vinod, Ph. D. Department of Economics Fordham University 441 E. Fordham Road Bronx NY 10458 USA ISBN 978-1-4419-1763-8

1 Econometric Computing with ldquoRrdquo 3

12 The Economics Profession Needs EconometricComputing

There is substantial overlap between statistics and econometrics yet there areeconometric methods of which most economists are aware and of which moststatisticians will never hear Similarly for statistical computing and econo-metric computing Econometrics cannot free-ride on statistical computingbecause there are many econometric issues that practitioners of statisticalcomputing will never address

Consider for example Generalized Autoregressive Conditional Heteroscedas-ticity (GARCH) for which Rob Engle won the Nobel Memorial Prize in eco-nomics Hundreds if not thousands of GARCH articles had been written andpublished during a period when no two software packages gave the same an-swer to the same GARCH problem Fiorentini Calzolari and Panattoni [9]produced benchmark-quality code for solving the GARCH problem McCul-lough and Renfro [21] demonstrated that different packages gave differentanswers to the same GARCH problem and used the Fiorentini et al codeto produce a GARCH benchmark Software developers converged on thisbenchmark and subsequently many packages gave the same answer to thesame GARCH problem as demonstrated by Brooks et al [4] Another sucheconometric computing problem is cointegration for which Clive Granger wonthe Nobel Memorial Prize in economics the same year as Engle and whichis discussed in Sect 13

Experts in statistical computing will never address these econometric is-sues only economists will But before economists can address these economet-ric computing issues they need to know the elements of statistical computingIn general they lack this knowledge

121 Most Users Do Not Know EconometricComputing

It stands to reason that researchers who write their own code are morelikely to be acquainted with the fundamentals of computing than researcherswho use canned procedures Yet when examining the code written by theseresearchers it is easy to find examples of computing illiteracy There arearchives of code written by social science researchers in many languagesMATLAB GAUSS and even ldquoRrdquo Examining the code in these archives willreveal numerous instances of computing illiteracy here we mention only three

1 solving for the least squares regression estimator as b = inv(XrsquoX)Xrsquoy (seeMcCullough and Vinod [22 sect25])

2 writing log(norm(x)) instead of lognorm(x) (see McCullough and Vinod [23p 880])