
  • Introduction to Bayesian Additive Regression Trees for Causal Inference
    NICOLE BOHME CARNEGIE, MONTANA STATE UNIVERSITY, [email protected]

  • Roadmap
    ◦ What are additive regression trees?
    ◦ Why Bayesian?
    ◦ BART for causal inference

  • Regression trees
    The building block of BART is the regression tree:
    ◦ An algorithmic partition of the data into non-overlapping subsets
    ◦ The goal is to minimize the variance of the response variable within subsets
    ◦ The resulting regression fit is the mean of the response within each subset
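
    As a concrete illustration (not from the original slides), a single regression tree can be fit in R with the rpart package; the fitted values are simply the terminal-node means. The data below are simulated to mirror the splits in the tree-fitting example on the next slide.

    library(rpart)

    ## Simulated data, purely for illustration
    set.seed(1)
    x <- runif(200, 0, 100)
    y <- ifelse(x < 80, 2, ifelse(x < 90, 5, 8)) + rnorm(200)
    dat <- data.frame(x = x, y = y)

    ## Splits are chosen to reduce within-node variance of the response
    fit <- rpart(y ~ x, data = dat)

    ## Predictions are the mean of y in each terminal node
    predict(fit, newdata = data.frame(x = c(50, 85, 95)))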

  • Tree fitting example
    An example of the construction of a single tree. The data are partitioned first at X = 80, and then at X = 90 among those observations with X > 80. The fit for the tree is the mean of the observations that fall in each terminal node (shaded in grey), and is shown as horizontal line segments on the scatterplot. (from Carnegie & Wu 2020)


  • Why *additive* regression trees?
    ◦ A single regression tree will over-emphasize interactions between variables and have difficulty finding linear relationships.
    ◦ Alternative: fit many small trees using a back-fitting algorithm (a minimal sketch follows this list):
      ◦ Fit a small tree
      ◦ Get the fitted values from that tree
      ◦ Subtract the fitted values from the observed values of the response
      ◦ Fit another small tree to the residuals
      ◦ Repeat until some number of small trees have been fit.
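
    A minimal back-fitting sketch in R using rpart (the data, variable names, and the shrinkage constant are my own illustrative choices; the shrinkage step anticipates the boosting discussion on the next slide):

    library(rpart)

    set.seed(1)
    n <- 200
    x <- runif(n, 0, 100)
    y <- sin(x / 10) + rnorm(n, sd = 0.3)
    dat <- data.frame(x = x)

    n_trees   <- 50    # number of small trees to fit
    shrinkage <- 0.1   # small constant multiplying each tree's fit
    res       <- y     # start with the raw response
    fit_total <- rep(0, n)

    for (i in seq_len(n_trees)) {
      ## Fit a small (depth-limited) tree to the current residuals
      tree <- rpart(res ~ x, data = dat,
                    control = rpart.control(maxdepth = 2, cp = 0))
      pred <- shrinkage * predict(tree, newdata = dat)
      ## Subtract the fitted values, then fit the next tree to what remains
      fit_total <- fit_total + pred
      res       <- res - pred
    }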

  • Caveat: Overfitting
    Without some mechanism to control it, any tree-based model can easily overfit the data.
    ◦ Boosted regression trees limit tree depth and use shrinkage, multiplying the fit of each tree by some small constant chosen by cross-validation
    ◦ A single-tree model might use cross-validation to “prune” branches in the decision tree that are not robust to removal of a few observations from the data
    ◦ BART sets intelligent priors for the depth of each tree and the shrinkage factor

  • Advantage: Complex response surfaces
    From Hill (2011): the left panel is a single binary tree fit to the data; the right panel shows the true response curve, the single-tree fit, and the BART fit.

  • The BART model
    BART sum-of-trees model:
    Y = g(z, x; T1, M1) + … + g(z, x; Tm, Mm) + ε = f(z, x) + ε
    ◦ Tree model (T, M)
      ◦ T is a binary tree
      ◦ M = {µ1, µ2, …, µb} is the vector of means in the b terminal nodes of the tree
    ◦ g(z, x; T, M): the value obtained by following observation (z, x) down the tree and returning the mean for the terminal node in which it lands
    ◦ ε ~ N(0, σ²)

    BART regularization prior:
    ◦ A prior preference for trees to be small (few terminal nodes)
    ◦ A prior shrinking the means µj toward 0
    ◦ A prior on σ² suggesting it is somewhat smaller than under an OLS regression
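
    These priors correspond to tuning arguments in the R implementations. A hypothetical dbarts call is sketched below; the argument names (ntree, base, power, k, sigdf, sigquant, ndpost) are from that package, and the data are simulated for illustration:

    library(dbarts)

    ## Simulated covariates and response, purely for illustration
    set.seed(1)
    x <- matrix(rnorm(200 * 5), ncol = 5)
    y <- x[, 1] + 0.5 * x[, 2]^2 + rnorm(200)

    fit <- bart(x.train = x, y.train = y,
                ntree = 200,                 # number of trees m (tuning parameter)
                base = 0.95, power = 2,      # tree-depth prior: favors small trees
                k = 2,                       # shrinks terminal-node means toward 0
                sigdf = 3, sigquant = 0.90,  # prior on sigma^2, below the OLS estimate
                ndpost = 1000)               # posterior draws to keep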

  • Why a Bayesian framework?
    ◦ The data can overcome assumptions about the depth of trees and the shrinkage needed
    ◦ The number of trees remains as a tuning parameter
    ◦ Computational benefits from avoiding cross-validation
    ◦ Embeds what is normally an algorithmic approach in a likelihood framework to produce coherent uncertainty intervals, which is unusual for machine learning approaches

  • BART and causal inference: why?
    Precise modeling of the response surface
    ◦ More thorough control for confounding than with traditional parametric models
    Straightforward estimation of causal effects from posterior distributions
    ◦ Average treatment effects
    ◦ Heterogeneous causal effects

  • Figure from Dorie et al. (2019): results from a causal inference competition using automated methods.
    ◦ All methods from the left (excluding Oracle) through SuperLearner use a nonparametric method to fit the response surface


  • Obtaining posterior distributions
    1. Set up a “test set” of covariates
    ◦ To estimate a SATE: the covariates of all observations, with treatment set to the opposite of observed
    2. Fit BART on the observed covariates and response
    3. Compute the causal effect of interest from draws of the estimated response for the test set
    ◦ To estimate a SATE: compute the difference between the estimate for the observed treatment and that for the counterfactual treatment (changing the sign so that the difference is treated minus untreated). Average across observations for each draw from the posterior
    ◦ You can then plot the posterior and compute the mean and a credible interval for the ATE
    Or use the bartCause package in R, which has a wrapper that completes these steps for you! (A manual sketch of the steps follows.)
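
    A minimal manual sketch of steps 1-3 using dbarts (the simulated data and variable names are my own; bart's x.test / yhat.test interface is as documented in that package):

    library(dbarts)

    ## Hypothetical data: z = binary treatment, x = confounders, y = response
    set.seed(1)
    n <- 500
    x <- matrix(rnorm(n * 3), ncol = 3)
    z <- rbinom(n, 1, plogis(x[, 1]))
    y <- 2 * z + x[, 1] + rnorm(n)

    ## Step 1: test set = observed covariates with treatment flipped
    x_train <- cbind(z, x)
    x_test  <- x_train
    x_test[, 1] <- 1 - z

    ## Step 2: fit BART on the observed covariates and response
    fit <- bart(x.train = x_train, y.train = y, x.test = x_test, ndpost = 1000)

    ## Step 3: per posterior draw, difference the observed and counterfactual
    ## predictions, flipping signs so it is always treated minus untreated
    sign_flip  <- ifelse(z == 1, 1, -1)
    ite_draws  <- sweep(fit$yhat.train - fit$yhat.test, 2, sign_flip, `*`)
    sate_draws <- rowMeans(ite_draws)   # one SATE per posterior draw

    ## Posterior mean and 95% credible interval for the SATE
    mean(sate_draws)
    quantile(sate_draws, c(0.025, 0.975))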

  • Ex: Heterogeneous treatment effects
    From Carnegie et al. (2019): Left: means and 95% credible intervals of the posterior distributions for the ATE of a mindset intervention on student achievement, for each level of an ordered categorical variable (student expectations for success). Right: posterior distribution for the difference in mean effects between the two top levels of future success and the rest. (Simulated data challenge)
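
    Building on the sketch above, subgroup posteriors of this kind can be computed by averaging the per-observation effect draws within each subgroup. Here g is a hypothetical grouping variable (in a real analysis it would be included among the covariates in the BART fit):

    ## Hypothetical ordered categorical subgroup variable with 4 levels
    g <- sample(1:4, n, replace = TRUE)

    ## Subgroup ATE posteriors: average effect draws within each subgroup
    sub_draws <- sapply(sort(unique(g)), function(lev)
      rowMeans(ite_draws[, g == lev, drop = FALSE]))

    ## Posterior for the difference in mean effects between two subgroups
    diff_draws <- sub_draws[, 4] - sub_draws[, 1]
    mean(diff_draws)
    quantile(diff_draws, c(0.025, 0.975))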

  • Tools for BART modeling
    There are a number of R packages that fit BART models (a usage sketch follows this list):
    ◦ BayesTree: the basic BART model
    ◦ dbarts: expands to include random-effects models and automatic cross-validation
    ◦ bartCause: wrapper functions using the dbarts implementation that specifically target causal inference
    ◦ treatSens: includes sensitivity analysis methods for BART models
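
    With bartCause, the manual steps above collapse to a single call. A sketch, reusing the hypothetical y, z, and x from the earlier example (bartc's response / treatment / confounders interface is as documented in the package):

    library(bartCause)

    fit_bc <- bartc(response = y, treatment = z, confounders = x,
                    estimand = "ate")

    summary(fit_bc)               # posterior mean and interval for the ATE
    ate_draws <- extract(fit_bc)  # posterior draws of the estimand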

  • Sensitivity analysis with BART
    Figure from Carnegie et al. (2019).

  • In conclusion
    ◦ The advantages of machine learning combined with the advantages of formal statistical inference
    ◦ Computationally (relatively) efficient
    ◦ Robust implementations for ease of use
    ◦ Demonstrated success in causal inference challenges
    ◦ Tools available to assess sensitivity to unmeasured confounding not captured by flexible response-surface modeling

  • References
    ◦ Carnegie, NB, Dorie, V, and Hill, JL. (2019) Examining treatment effect heterogeneity using BART. Observational Studies; 5:52-70.
    ◦ Carnegie, NB, and Wu, J. (2020) Variable selection and parameter tuning for BART modeling in the Fragile Families Challenge. Socius; in press.
    ◦ Chipman, H., George, E. and McCulloch, R. (2007). Bayesian ensemble learning. In Advances in Neural Information Processing Systems 19 (B. Scholkopf, J. Platt and T. Hoffman, eds.). MIT Press, Cambridge, MA.
    ◦ Chipman, H.A., George, E.I. and McCulloch, R.E. (2010). BART: Bayesian additive regression trees. Annals of Applied Statistics; 4:266-298.
    ◦ Chipman, H, and McCulloch, R. (2010) BayesTree: Bayesian Methods for Tree Based Models. Available from: http://CRAN.R-project.org/package=BayesTree
    ◦ Dorie, V, Chipman, H, and McCulloch, R. dbarts: Discrete Bayesian Additive Regression Trees Sampler. Available from: http://CRAN.R-project.org/package=dbarts
    ◦ Dorie, V, and Hill, JL. bartCause: Causal inference using Bayesian Additive Regression Trees. Available from: http://cran.r-project.org/package=bartcause
    ◦ Dorie, V, Hill, JL, Shalit, U, Scott, M, and Cervone, D. (2019) Automated versus do-it-yourself methods for causal inference: lessons learned from a data analysis competition. Statistical Science; 34(1):43-68.
    ◦ Green, D.P. and Kern, H.L. (2012). Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public Opinion Quarterly; 76:491-511.
    ◦ Hill, J. (2011) Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics; 20(1):217-240.
    ◦ Hill, J, and Su, YS. (2013) Assessing lack of common support in causal inference using Bayesian nonparametrics: Implications for evaluating the effect of breastfeeding on children's cognitive outcomes. Annals of Applied Statistics; 7(3):1386-1420.