almost-matching-exactly for treatment e ect estimation

Almost-Matching-Exactly for Treatment E↵ect Estimation underNetwork Interference

M. Usaid Awan Marco Morucci Vittorio OrlandiSudeepa Roy Cynthia Rudin Alexander Volfovsky

Duke University

Abstract

We propose a matching method that recov-ers direct treatment e↵ects from randomizedexperiments where units are connected in anobserved network, and units that share edgescan potentially influence each others’ out-comes. Traditional treatment e↵ect estima-tors for randomized experiments are biasedand error prone in this setting. Our methodmatches units almost exactly on counts ofunique subgraphs within their neighborhoodgraphs. The matches that we construct areinterpretable and high-quality. Our methodcan be extended easily to accommodate ad-ditional unit-level covariate information. Weshow empirically that our method performsbetter than other existing methodologies forthis problem, while producing meaningful,interpretable results.

1 INTRODUCTION

Randomized experiments are considered to be the goldstandard for estimating causal e↵ects of a treatmenton an outcome. Typically, in these experiments, theoutcome of a unit is assumed to be only a↵ected by theunit’s own treatment status, and not by the treatmentassignment of other units (Cox, 1958; Rubin, 1980).However, in many applications – such as measuring ef-fectiveness of an advertisement campaign or a teachertraining program – units interact, and ignoring theseinteractions results in poor causal estimates (Hallo-ran and Struchiner, 1995; Sobel, 2006). We propose amethod that leverages the observed network structureof interactions between units to account for treatmentinterference among them.

Proceedings of the 23rdInternational Conference on Artifi-cial Intelligence and Statistics (AISTATS) 2020, Palermo,Italy. PMLR: Volume 108. Copyright 2020 by the au-thor(s).

We study a setting in which a treatment has been uni-formly randomized over a set of units connected in anetwork, and where treatments of connected units caninfluence each others’ outcomes. The development ofmethods for this setting is a relatively new field incausal inference methodology, and only few approachesfor it have been proposed (e.g., van der Laan, 2014;Aronow et al., 2017; Sussman and Airoldi, 2017).

In this paper, we propose a method that leveragesmatching (Rosenbaum and Rubin, 1983) to recover di-rect treatment e↵ects from experiments with interfer-ence. Our method makes several key contributionsto the study of this setting: First, our method ex-plicitly leverages information about the network struc-ture of the experimental sample to adjust for possibleinterference while estimating direct treatment e↵ects.Second, unlike other methods, matching allows us tononparametrically estimate treatment e↵ects, withoutthe need to specify parametric models for interferenceor outcomes. Third, matching produces highly inter-pretable results, informing analysts as to which fea-tures of the input data were used to produce estimates.More specifically, we match on features of graphs thatare easy to interpret and visualize.

In our setting, units experience interference accordingto their neighborhood graphs – the graphs defined bythe units they are directly connected to. Units withsimilar neighborhood graphs will experience similar in-terference. For example, the educational outcome of astudent randomly assigned to an extra class dependson whether or not her friends are also assigned to thatclass, and not just on how many: the specific structureof the student’s friendship circle will influence whetheror not study groups are formed, how information isshared, how much attention the treated student willdevote to the class, and so on. All of this will impactthe overall educational outcomes of interest.

Because of this, matching units with similar neigh-borhood graphs together will enable us to recover di-rect treatment e↵ects even under interference. Wematch units’ neighborhood graphs on counts of sub-

Almost-Matching-Exactly for Treatment E↵ect Estimation under Network Interference

graphs within them, as graphs with similar counts ofthe same unique subgraphs are naturally likely to besimilar. From there, we construct matches on individ-uals with similar sets of important subgraphs; here,the set of important subgraphs is learned from a train-ing set. We generalize the Almost-Matching-Exactly(AME) framework (Dieng et al., 2019; Wang et al.,2019) to match units on subgraphs in experimentalsettings. We do this by constructing graph-based fea-tures that can explain both the interference pattern inthe experiment and predict the underlying social net-work. We demonstrate that our method performs bet-ter than other methods for the same problem in manysettings, while generating interpretable matches.

The paper will proceed as follows: In Section 2,we make explicit the assumptions underpinning ourframework, and outline our matching approach to es-timating direct treatment e↵ects. In Sections 3 and 4,we evaluate the e↵ectiveness of our method on simu-lated and real-world data. Theoretical evaluation ofour approach is available in the appendix.

1.1 Related Work

Work on estimating causal e↵ects under interferencebetween units has three broad themes. First, therehas been a growing body of work on the design of novelrandomization schemes to perform causal inference un-der interference (Liu and Hudgens, 2014; Sinclair et al.,2012; Duflo and Saez, 2003; Basse and Airoldi, 2018).Some of this work makes explicit use of observed net-work structure to randomly assign treatment so as toreduce interference (Ugander et al., 2013; Toulis andKao, 2013; Eckles et al., 2016, 2017; Jagadeesan et al.,2019). These methodologies are inapplicable to oursetting as they require non-uniform treatment assign-ment, whereas in our setting we wish to correct for in-terference after randomization. Second, there is workon estimating direct treatment e↵ects in experimentsunder interference, and after conventional treatmentrandomization, similar to our setting. Some existingwork aims to characterize the behavior of existing esti-mators under interference (Manski, 2013; Savje et al.,2017). Other approaches lay out methods based onrandomization inference to test a variety of hypothe-ses under interference and treatment randomization(Rosenbaum, 2007; Aronow, 2012; Athey et al., 2018).Some of these approaches mix randomization infer-ence and outcome models (Bowers et al., 2013). Forthe explicit problem of recovery of treatment e↵ectsunder interference, Aronow et al. (2017) provide ageneral framework to translate di↵erent assumptionsabout interference into inverse-probability estimators,and Sussman and Airoldi (2017) give linearly unbi-ased, minimum integrated-variance estimators under a

series of assumptions about interference. These meth-ods either ignore explicit network structure, or requireprobabilities under multiple complex sampling designsto be estimated explicitly. Finally, there have beenstudies of observational inference under network inter-ference (van der Laan, 2014; Liu et al., 2016; Ogburnet al., 2017; Forastiere et al., 2016). However, recov-ering causal estimates using observational data whenunits are expected to influence each other requires astructural model of both the nature of interference andcontagion among units.

2 METHODOLOGY

We discuss our problem and approach in this section.

2.1 Problem Statement

We have a set of n experimental units indexed byi. These units are connected in a known graph G =(V,E), where V (G) = {1, . . . , n} is the set of verticesof G, and E(G) is the set of edges of G. We disallowself-loops in our graph. We say thatH is a subgraph ofG if V (H) ✓ V (G) and E(H) ✓ E(G). Let ti 2 {0, 1}represent the treatment indicator for unit i, t repre-sent the vector of treatment indicators for the entiresample, and t�i represent the treatment indicators forall units except i. Given a treatment vector t on theentire sample (i.e., for all vertices in G), we use G

t todenote the labeled graph, where each vertex i 2 V (G)has been labeled with its treatment indicator ti. In ad-dition, we use GP to denote a graph induced by the setof vertices P ✓ V (G) on G, such that V (GP ) = P andE(GP ) = {(e1, e2) 2 E(G) : e1 2 P, e2 2 P}. We usethe notation Ni = {j : (i, j) 2 E(G)} to represent theneighborhood of vertex i. The labeled neighborhoodgraph of a unit i, Gt

Ni, is defined as the graph induced

by the neighbors of i, and labeled according to t. Wealso define tNi to be the vector of treatment indica-tors corresponding to unit i’s neighborhood graph. Aunit’s response to the treatment is represented by itsrandom potential outcomes Yi(t) = Yi(ti, t�i). Unlikeother commonly studied causal inference settings, uniti’s potential outcomes are now a function of both thetreatment assigned to i, and of all other units’ treat-ments. Observed treatments for unit i and the wholesample are represented by the random variables Ti andT respectively. We assume that the number of treatedunits is always n(1), i.e.,

Pni=1 Ti = n

(1).

A0: Ignorability of Treatment Assignment.We make the canonical assumption that treatmentsare administered independently of potential outcomes,that is: Yi(ti, t�i) ?? T, and 0 < Pr(Ti = 1) < 1 forall units. In practice, we assume that treatment is as-signed uniformly at random to units, which is possible

Awan, Morucci, Orlandi, Roy, Rudin, Volfovsky

only in experimental settings. As stated before, we donot make the canonical Stable Unit Treatment ValueAssumption (SUTVA) (Rubin, 1980), which, amongother requirements, states that units are exclusivelya↵ected by the treatment assigned to them. We donot make this assumption because our units are con-nected in a network: it could be possible for treat-ments to spread along the edges of the network andto a↵ect connected units’ outcomes. We do maintainthe assumption of comparable treatments across units,which is commonly included in SUTVA.

Our causal quantity of interest will be the AverageDirect E↵ect (ADE), which is defined as follows:

ADE =1

n

nX

i=1

E[Yi(1,0)� Yi(0,0)], (1)

where t�i = 0 represents the treatment assignment inwhich no unit other than i is treated. The summandrepresents the treatment e↵ect on unit i when no otherunit is treated, and, therefore, no interference occurs(Halloran and Struchiner, 1995).

2.2 Framework

We outline the requirements of our framework fordirect e↵ect estimation under interference. We de-note interference e↵ects on a unit i with the functionfi(t) : {0, 1}n 7! R, a function that maps each possibletreatment allocation for the n units to the amount ofinterference on unit i. We will use several assumptionsto restrict the domain of f to a much smaller set (andoverload the notation fi accordingly). To characterizef , we rely on the typology of interference assumptionsintroduced by Sussman and Airoldi (2017). The firstthree assumptions (A0-A2) needed in our frameworkare common in the interference literature (e.g., Man-ski, 2013; Toulis and Kao, 2013; Eckles et al., 2016;Athey et al., 2018):

A1: Additivity of Main E↵ects. First, we as-sume that main treatment e↵ects are additive, i.e.,that there is no interaction between units’ treatmentindicators. This allows us to write:

Yi(t, t�i) = t⌧i + fi(t�i) + ✏i (2)

where ⌧i is the direct treatment e↵ect on unit i, and✏i is some baseline e↵ect.A2: Neighborhood Interference. We focus on aspecific form of the interference function fi by assum-ing that the interference experienced by unit i dependsonly on treatment of its neighbors. That is, if for twotreatment allocations t, t0 we have tNi = t0Ni

thenfi(t) = fi(t0). To make explicit this dependence on theneighborhood subgraph, we will write fi(tNi) ⌘ fi(t).

A3: Isomorphic Graph Interference We assumethat, if two units i and j have isomorphic labeled neigh-borhood graphs, then they receive the same amountof interference, denoting isomorphism by ', G

tNi

'G

tNj

=) fi(tNi) = fj(tNj ) ⌘ f(GtNi

) = f(GtNj

).While Assumptions A1 and A2 are standard, A3 isnew. This assumption allows us to study interferencein a setting where units with similar neighborhood sub-graphs experience similar amounts of interference.

All our assumptions together induce a specific form forthe potential outcomes, namely that they depend onneighborhood structure G

tNi

, but not exactly who theneighbors are (information contained in Ni) nor treat-ment assignments for those outside the neighborhood(information contained in tNi). Namely:

Proposition 1. Under assumptions A0-A3, potentialoutcomes in (2) for all units i can be written as:

Yi(t, t�i) = t⌧i + f(GtNi

) + ✏i, (3)

where ⌧i is the direct treatment e↵ect on unit i, and ✏i

is some baseline response.

In addition, suppose that baseline responses for allunits are equal to each other in expectation, i.e., forall i, E[✏i] = ↵. Then under assumptions A0-A3, forneighborhood graph structures gi of unit i and treat-ment vectors t, the ADE is identified as:

ADE =1

n(1)

nX

i=1

E⇥Ti⇥

�E[Yi|GT

Ni' g

ti , Ti = 1]

� E[Yi|GTN ' g

ti , Ti = 0]

�⇤,

where GTNi

is the neighborhood graph of i labelled ac-cording to the treatment assignment T.

The proposition (whose proof is in the appendix)states that the interference received by a unit is a func-tion of each unit’s neighborhood graph. Further, theoutcomes can be decomposed additively into this func-tion and the direct treatment e↵ect on i. The propo-sition implies that the ADE is identified by matchingeach treated unit to one or more control units with anisomorphic neighborhood graph, and computing thedirect e↵ect on the treated using these matches. Thise↵ect is, in expectation over individual treatment as-signments, equal to the ADE.

2.3 Subgraph Selection viaAlmost-Matching-Exactly

Given Proposition 1 and the framework established inthe previous section, we would ideally like to matchtreated and control units that have isomorphic neigh-borhood graphs. This would allow us to better esti-mate the ADE without su↵ering interference bias: for


a treated unit i, if a control unit j can be found suchthat Gt

Ni' G

tNj

, then j’s outcome will be identical inexpectation to i’s counterfactual outcome and can beused as a proxy. Unfortunately, the number of non-isomorphic (canonically unique) graphs with a givennumber of nodes and edges grows incredibly quickly(Harary, 1994) and finding such matches is infeasiblefor large graphs. We therefore resort to counting allsubgraphs that appear in a unit’s neighborhood graphand matching units based on the counts of those sub-graphs. However, instead of exactly matching on thecounts of those subgraphs, we match treated and con-trol units if they have similar counts, since match-ing exactly on all subgraph counts implies isomorphicneighborhoods and is also infeasible. Further, abso-lutely exact matches may not exist in real networks.

Constructing inexact matches, in turn, requires a mea-sure of relative graph importance. In Figure 1, forexample, there are two control units that the treatedunit may be matched to; if triangles contribute more tothe interference function, it should be matched to theright; otherwise, if degree and/or two-stars are moreimportant, it should be matched to the left. Of course,these relative importance measures might depend onthe problem and we would like to learn them.

Figure 1: Inexact matching presupposes an ordering offeature importance; should the the treated ego (black)be matched to a control whose neighborhood graphhas the same number of units (left), or same numberof triangles (right)?

It might be tempting to match directly on f , as thatwould lead to unbiased inference. However, we abstainfrom doing so for two reasons. Firstly, in practice,the true interference is unknown and we could onlymatch on estimated values of f ; this su↵ers from all theproblems that a✏ict matching on estimated propensityscores without appropriate adjustments (Abadie andImbens, 2016) or parametric approximations (Rubinand Thomas, 1996). Such corrections or approxima-tions do not currently exist for estimated interferencefunctions and their development is an active area ofresearch. Secondly, interpretability is a key compo-nent of our framework that would be lost matching onf -values; these values are scalar summaries of interfer-

ence that depends on entire graphs. Estimating f wellwould also likely require complex and uninterpretablenonparametric methods. In Section K of the appendix,we empirically compare matching units on f -values toour subgraph matching method via simulation. Theloss of interpretability associated with matching on f

does not yield substantial gains in performance, evenwhen using true values of f for matching, which isimpossible in practice.

Almost-Matching-Exactly (AME) (Wang et al., 2019;Dieng et al., 2019; Awan et al., 2019) provides a frame-work for the above problem that is explicitly geared to-wards building interpretable, high-quality matches ondiscrete covariates, which in our setting are the countsof the treated subgraphs in the neighborhood. AMEperforms inexact matching while learning importanceweights for each covariate from a training set, priori-tizing matches on more important covariates. In thisway, it neatly addresses the challenge of inexact match-ing by learning a metric specific to discrete covariates(namely, a weighted Hamming distance). Formally,AME matches units so as to optimize a flexible mea-sure of match quality. For each treated unit i, solvingthe AME problem is equivalent to finding:

✓i⇤ 2 argmax✓2{0,1}p

✓Tw (4)

such that 9j : tj = 0 and xj � ✓ = xi � ✓

where � denotes the Hadamard product, w is a vectorof weights and xi,xj are vectors of binary covariatesfor units i and j that we might like to match on. Inour network interference setting, these are vectors ofsubgraph counts. The vector w denotes the impor-tance of each subgraph in causing interference. Wewill leverage both information on outcomes and net-works to construct an estimate for it.

We start by enumerating (up to isomorphism) all the psubgraphs g1, . . . , gp that appear in any of the Gt

Ni, i 2

1, . . . , n. The covariates for unit i are then given byS(Gt

Ni) = (S1(Gt

Ni), . . . , Sp(Gt

Ni)) where Sk(Gt

Ni) de-

notes the number of times subgraph gk appears in thesubgraphs of Gt

Ni. These counts are then converted

into binary indicators that are one if the count of sub-graph gk in each unit’s neighborhood is exactly x, forall x observed in the data. Thus, units will be matchedexactly if they have identical subgraph counts. Wethen approximately solve the problem in Equation (4)to find the optimally important set of subgraphs uponwhich to exactly match each treated unit, such thatthere is at least one control unit that matches ex-actly with the treated unit on the chosen subgraphcounts. The key idea behind this approach is that wewant to match units exactly on subgraph counts thatcontribute significantly to the interference function,


trading o↵ exactly-matching on these important sub-graphs with potential mismatches on subgraphs thatcontribute less to interference.

In practice, our implementation enumerates all sub-graphs in each unit’s neighborhood and stores thecount of each pattern – this is computationally chal-lenging. There is a growing body of work on e�cientcounting algorithms for pre-specified small patterns(up to 4-5 nodes) but there is little research on fastmethods to both enumerate and count all motifs in agraph (e.g., Pinar et al., 2017; Marcus and Shavitt,2010; Hu et al., 2013). Empirically, we see that thisenumeration takes less than 30 seconds for 50 units.

The FLAME Algorithm for AME. The FastLarge Almost Matching Exactly (FLAME) algorithm(Wang et al., 2019) approximates the solution tothe AME problem. The procedure starts by exactlymatching all possible units on all covariates. It thendrops one covariate at a time, choosing the drop maxi-mizing the match quality MQ at that iteration, defined:

MQ = C · BF�cPEY . (5)

The match quality is the sum of a balancing factorBF and a predictive error cPEY , with relative weightsdetermined by the hyper-parameter C. The balanc-ing factor is defined as the proportion of treated unitsplus the proportion of control units matched at thatiteration. Introducing the balancing factor into theobjective has the advantage of encouraging more unitsto be matched, thereby minimizing variance of estima-tors (see Wang et al., 2019). In our setting, the secondcomponent of the match quality, predictive error, takesthe form:

cPEY = argminh2F1

nX

i

(Yi � h(S(GtNi

) � ✓, Ti))2 (6)

for some class of functions F1. It is computed usinga holdout training set and discourages dropping co-variates that are useful for predicting the outcome. Inthis way, FLAME strikes a balance between matchingmany units and ensuring these matches are of high-quality. By using a holdout set to determine how use-ful a set of variables is for out-of-sample prediction,FLAME learns a measure of covariate importance viaa weighted Hamming distance. Specifically, it learns avector of importance weights w for the di↵erent sub-graph counts that minimizes wT I[S(Gt

Ni) 6= S(Gt

Nj)],

where I[S(GtNi

) 6= S(GtNj

)] is a vector whose kth entryis 0 if the labeled neighborhood graphs of i and j havethe same count of subgraph k, and 1 otherwise.

To this match quality term, we add a network fit termto give subgraphs more weight that are highly predic-

tive of overall network structure. We fit a logistic re-gression model in which the edges (i, j) between unitsi, j are independent given Ni,Nj , and dependent onthe subgraph counts of units i and j:

(i, j)iid⇠ Bern(logit(�T

1 S(GtNi

) + �T2 S(G

tNj

)))

To the match quality in the original formulation, wethen add cPEG, defined to be the AIC (Akaike, 1974)of this fitted model, weighted by a hyperparameter D.Therefore, at each iteration:

MQ = C · BF�cPEY +D ·cPEG.

Thus, we penalize not only subgraph drops that im-pede predictive performance or making matches, butalso those that make the observed network unlikely.cPEG represents the empirical prediction error of thechosen set of statistics for the observed graph: if cPEGis low, then the chosen subgraphs do a good job ofpredicting the observed graph. This error is also eval-uated at a minimum over another class of predictionfunctions, F2. This term in the AME objective is justi-fied by Assumption A3: units that have isomorphic la-beled neighborhood graphs should experience the sameamount of interference, and subgraph counts should bepredictive of neighborhood graph structure.

Our approach to estimating the ADE is therefore asfollows. (1) For each unit i, count and label all types ofsubgraphs inG

tNi

. (2) Run FLAME, encouraging largenumbers of matches on subgraph counts, while usingthe covariates that are most important for predictingthe outcome and the network. (3) Estimate ADE as\ADE, by computing the di↵erence in means for eachmatched group and then averaging across matchedgroups, weighted by their size. Since our approachis based on FLAME, we call it FLAME-Networks.

Extensions. FLAME-Networks immediately ex-tends to handling unit-level covariate information forbaseline adjustments; we simply concatenate sub-graph information and covariate information in thesame dataset and then make almost-exact matches.FLAME-Networks will automatically learn the weightsof both subgraphs and baseline covariates to makematches that take both into account. Anotherstraightforward extension considers interference notjust in the immediate neighborhood of each unit, butup to an arbitrary number of hops away. To extendFLAME-Networks in this way, it is su�cient to enu-merate subgraphs in the induced neighborhood graphof each unit i where the vertices considered are thosewith a path to i that is at most k steps long. Giventhese counts, our method proceeds exactly as before.


di: The treated degree of unit i�i: The number of triangles in G

tNi[{i} with at

least one treated unit.Fk

i : The number of k-stars in GtNi[{i} with at least

one treated unit.†ki : The number of units in G

tNi[{i} with degree

� k and at least one treated unit among theirneighbors.

Bi: The vertex betweenness of unit i.Ci: The closeness centrality of unit i.

Table 1: Interference components used in experiments;see the appendix for more details.

3 EXPERIMENTS

We begin by evaluating the performance of our estima-tor in a variety of simulated settings, in which we varythe form of the interference function. We find that ourapproach performs well in many distinct settings. InSection M of the appendix, we also assess the qualityof the matches constructed.

We simulate graphs from an Erdos-Renyi model: G ⇠Erdos-Renyi(n, q), by which every possible edge be-tween the n units is created independently, with prob-ability q. In Sections H and I of the appendix, wealso perform experiments on cluster-randomized andreal-world networks. Treatments for the whole sam-ple are generated with Pr(T = t) =

� nn(1)

��1, where

n(1) is the number of treated units. Outcomes are

generated according to Yi(t, t�i) = t⌧i + f(GtNi

) + ✏i,where ✏ ⇠ N(0, In) represents a baseline outcome;⌧i ⇠ N(5, In) represents a direct treatment e↵ect, andf is the interference function. In Section J of the ap-pendix, we consider a setting in which the errors areheteroscedastic. For the interference function, we useadditive combinations of the subgraph components inTable 1 and define mip, p = 1, . . . , 7 to be the counts offeature p in G

tNi[{i}. Lastly, the counts of each com-

ponent are normalized to have mean 0 and standarddeviation 1. We compare our approach with di↵erentmethods to estimate the ADE under interference:

Naıve. The simple di↵erence in means between treat-ment and control groups assuming no interference.All Eigenvectors. Eigenvectors for the entire adja-cency matrix are computed with every treated unitmatched to the control unit minimizing the Maha-lanobis distance between the eigenvectors, weighingthe k’th eigenvector by 1/k. The idea behind this esti-mator is that the eigendecomposition of the adjacencymatrix encodes important information about the net-work and how interference might spread within it.First Eigenvector. Same as All Eigenvectors ex-cept units are matched only on their values of thelargest-eigenvalue eigenvector.

Feature di �i F2i F4

i †3 Bi Ci

Weight �1 �2 �3 �4 �5 �6 �7

Setting 1 0 10 0 0 0 0 0Setting 2 10 10 0 0 0 0 0Setting 3 0 10 1 1 1 1 -1Setting 4 5 1 10 1 1 1 -1

Table 2: Settings for Experiment 1.

Stratified Naıve. The stratified naıve estimator asdiscussed by Sussman and Airoldi (2017). A weighteddi↵erence-in-means estimator where units are dividedinto strata defined by their treated degree (number oftreated vertices they are connected to), and assignedweight equal to the number of units within the stra-tum in the final di↵erence of weighted averages be-tween treated and control groups.SANIA MIVLUE. The minimum integrated vari-ance, linear unbiased estimator under assumptions ofsymmetrically received interference and additivity ofmain e↵ects, when the priors on the baseline outcomeand direct treatment e↵ect have no correlation be-tween units; proposed by Sussman and Airoldi (2017).FLAME-Networks. Our proposed method. In allsimulations, the two components of the PE functionare weighted equally, and a ridge regression is used tocompute outcome prediction error.

3.1 Experiment 1: Additive Interference

First we study a setting in which interference is anadditive function of the components in Table 1. Out-comes in this experiment have the form: Yi = �1di +�2�i + �3F2

i + �4F4i + �5 †3i +�6Bi + �7Ci + ✏i, with

✏i ⇠ N(0, 1). We simulate 50 datasets for each set-ting, in which the units are in an ER(50, 0.05) graph.Table 2 shows values for the �i in each of our exper-imental settings. Results for Experiment 1 are re-ported in Figure 2. FLAME-Networks outperformsall other methods both in terms of average error, andstandard deviation over the simulations. This is likelybecause FLAME-Networks learns weights for the sub-graphs that are proportionate to those we use at eachsetting, and matches units on subgraphs with largerweights. When the interference function is multiplica-tive instead of additive, FLAME-Networks performssimilarly; results are in the appendix.

3.2 Experiment 2: Covariate Adjustment

A strength of FLAME-Networks is its ability to na-tively account for covariate information. We analyzea setting in which baseline e↵ects are dependent onan additional discrete-valued covariate, x, that is ob-served alongside the network. Outcomes take the formYi = t⌧i + f(Gt

Ni) + �xi + ✏i, where xi is chosen uni-


Figure 2: Results from Experiment 1. Each violin plotrepresents the distribution over simulations of absoluteestimation error. The panels are numbered accordingto the parameter settings of the simulations. Violinplots are blue if the method had mean error lower thanor equal to FLAME-Networks’ and red otherwise. Theblack line inside each violin is the median error. Thedashed line is FLAME-Networks’ mean error.

formly at random from {1, 2, 3} for each unit, and� is fixed at 15. This means that our sample is di-vided into 3 strata defined by the covariate values.We ran FLAME-Networks with a dataset consistingof subgraph counts, plus observed covariates for eachunit. For comparison with the other methods, we firstregress Y on x, and then use the residuals of that re-gression as outcomes for the other methods. This way,the initial regression will account for the baseline ef-fects of x, and the residuals contain only potential in-terference. The interference function takes the formf(Gt

Ni) = di+�i+Bi, which is what we are trying to

learn with the other methods. We simulate the samplenetwork from ER(50, 0.05).

Results are displayed in Table 3. FLAME-Networksperforms, on average, better than all the other meth-ods. Results in Section L of the supplement show thatwhen � is increased, none of the methods su↵er in per-formance. While regression adjustment prior to esti-mation seems to have a positive impact on the perfor-mance of other methods in the presence of additionalcovariates, FLAME-Networks performs best. This isbecause FLAME-Networks is built to easily handle theinclusion of covariates in its estimation procedure.

3.3 Experiment 3: Misspecified Interference

We now study the robustness of our estimator in asetting in which one of our key assumptions – A3– is violated. Specifically, we now allow for treated

Method Median 25th q 75th qFLAME-Networks 0.39 0.21 0.59First Eigenvector 0.47 0.40 0.83All Eigenvectors 0.55 0.29 0.79Naive 0.53 0.36 0.92SANIA 1.93 1.75 2.25Stratified 4.49 4.45 4.53

Table 3: Results from Experiment 2 with � = 5. Me-dian and 25th and 75th percentile of absolute errorover 40 simulated datasets.

and control units to receive di↵erent amounts of in-terference, even if they have the same labelled neigh-borhood graphs. We do this by temporarily eliminat-ing all control-control edges in the network and thencounting the features in Table 1 used to assign interfer-ence. That is, consider a unit i with a single, untreatedneighbor j. In our new setting, if degree is a feature ofthe interference function, then i being treated impliesi receives interference from j. But if i is untreated,then i would receive no interference from j, becauseits neighbor is also untreated. This crucially impliesthat FLAME-Networks will be matching–and estimat-ing the ADE from–individuals that do not necessarilyreceive similar amounts of interference.

In this setting, we generate interference according to:fi = (5 � �)di + ��i for � 2 [0, 5] and assess theperformance of FLAME-Networks against that of theSANIA and stratified estimators. Results are shownin Figure 3. We see that, when degree is the onlycomponent with weight in the interference function,FLAME-Networks performs better than the stratifiedestimator, but worse than the SANIA estimator, whichleverages aspects of the graph related to degree. How-ever, our performance improves as � increases and thetrue interference depends more on triangle counts sincethe triangle counts available to FLAME-Networks rep-resent the actual interference pattern more frequentlythan the degree counts did. Thus, we see that althoughviolation of our method’s assumptions harms its per-formance, it still manages at times to outperform es-timators that rely too heavily on degree.

4 APPLICATION

In this section, we demonstrate the practical utilityof our method. We use data collected by Banerjeeet al. (2013) on social networks for 75 villages in Kar-nataka, India. They are a median distance of 46 kmfrom one another other, motivating the assumptionthat network interference is experienced solely betweenindividuals from the same village. For each village,we study the e↵ect of 1. lack of education on electionparticipation; 2. lack of education on Self-Help-Group


Figure 3: Results from Experiment 3. These simula-tions were run on an ER(75, 0.07) graph. The bandsaround the lines represent 25th and 75th quantiles ofthe 50 simulations, for each value of �.

(SHG) participation; and 3. being male on SHG par-ticipation. We proxy election participation by owner-ship of an election card. We compare our estimates –which account for network interference – to naive esti-mates – which assume no network interference. Datapre-processing is summarized in the appendix.

For ADE estimates, we assume the treatment is ran-domly assigned. We find that lack of education is asso-ciated with higher SHG participation, and that malesare less likely to participate in SHGs than females (seeFigure 4). These results make sense in the contextof developing countries where SHGs are mainly uti-lized by females in low-income families, which gen-erally have lower education levels. We observe thateducation does not impact election participation; indeveloping countries, an individual’s decision to par-ticipate in an election may be driven by factors suchcaste, religion, influence of local leaders and closenessof race (Shachar and Nalebu↵, 1999; Gleason, 2001).FLAME-Networks matches units in each village bysubgraph counts and covariate values to estimate theADE. Looking at the matched groups, we discover thatsubgraphs such as 2-stars and triangles were importantfor matching, implying that second-order connectionscould be a↵ecting interference in this setting. Furtherdetails of the matched groups are in Section F.

Figure 4 plots naive and FLAME-Networks ADE esti-mates. We find a significant di↵erence between ourestimates and the naive estimates when estimatingthe e↵ect of being male on participation in SHGs.The naive estimator overestimates the treatment ef-fect, which is possible when ignoring network interfer-ence. It is plausible, in this setting, that interferencewould heighten these e↵ects for all outcomes. This isbecause individuals from similar social backgrounds or

Figure 4: Naive and FLAME-Networks ADE estimatesand their di↵erence. Red, blue, and green respectivelycorrespond to (treatment, outcome) pairs: (no educa-tion, election participation), (no education, SHG par-ticipation), and (gender, SHG participation).

gender tend to interact more together, and, therefore,are more likely to influence each other’s participationdecision, both in elections and in SHGs.

5 DISCUSSION

Conventional estimators for treatment e↵ects in ran-domized experiments will be biased when there isinterference between units. We have introducedFLAME-Networks – a method to recover direct treat-ment e↵ects in such settings. Our method is basedon matching units with similar neighborhood graphsin an almost-exact way, thus producing interpretable,high-quality results. We have shown that FLAME-Networks performs better than existing methods bothon simulated data, and we have used real-world datato show how it can be applied in a real setting. Ourmethod extends easily to settings with additional co-variate information for units and to taking into ac-count larger neighborhoods for interference. In futurework, our method can be extended to learning a vari-ety of types of distance metrics between graphs.

Acknowledgements

This work was supported in part by NIH award1R01EB025021-01, NSF awards IIS-1552538 andIIS1703431, a DARPA award under the L2M program,and a Duke University Energy Initiative ERSF grant.


References

Alberto Abadie and Guido Imbens. Matching on theestimated propensity score. Econometrica, pages781–807, 2016.

H. Akaike. A new look at the statistical modelidentification. IEEE Transactions on AutomaticControl, 19(6):716–723, December 1974. doi:10.1109/TAC.1974.1100705.

Peter M Aronow. A general method for detecting inter-ference between units in randomized experiments.Sociological Methods & Research, 41(1):3–16, 2012.

Peter M Aronow, Cyrus Samii, et al. Estimating av-erage causal e↵ects under general interference, withapplication to a social network experiment. The An-nals of Applied Statistics, 11(4):1912–1947, 2017.

Susan Athey, Dean Eckles, and Guido W Imbens. Ex-act p-values for network interference. Journal of theAmerican Statistical Association, 113(521):230–240,2018.

M Awan, Yameng Liu, Marco Morucci, Sudeepa Roy,Cynthia Rudin, and Alexander Volfovsky. Inter-pretable almost matching exactly with instrumentalvariables. In Conference on Uncertainty in ArtificialIntelligence (UAI), 2019.

Abhijit Banerjee, Arun G Chandrasekhar, Esther Du-flo, and Matthew O Jackson. The di↵usion of mi-crofinance. Science, 341(6144):1236498, 2013.

Guillaume W Basse and Edoardo M Airoldi. Model-assisted design of experiments in the presence ofnetwork-correlated outcomes. Biometrika, 105(4):849–858, 2018.

Jake Bowers, Mark M Fredrickson, and CostasPanagopoulos. Reasoning about interference be-tween units: A general framework. Political Analy-sis, 21(1):97–124, 2013.

David Roxbee Cox. Planning of Experiments. Wiley,1958.

Awa Dieng, Yameng Liu, Sudeepa Roy, CynthiaRudin, and Alexander Volfovsky. Interpretablealmost-exact matching for causal inference. InProceedings of Artificial Intelligence and Statistics(AISTATS), pages 2445–2453, 2019.

Esther Duflo and Emmanuel Saez. The role of in-formation and social interactions in retirement plandecisions: Evidence from a randomized experiment.The Quarterly Journal of Economics, 118(3):815–842, 2003.

Dean Eckles, Rene F Kizilcec, and Eytan Bakshy. Esti-mating peer e↵ects in networks with peer encourage-ment designs. Proceedings of the National Academyof Sciences, 113(27):7316–7322, 2016.

Dean Eckles, Brian Karrer, and Johan Ugander. De-sign and analysis of experiments in networks: Re-ducing bias from interference. Journal of CausalInference, 5(1), 2017.

Laura Forastiere, Edoardo M Airoldi, and FabriziaMealli. Identification and estimation of treatmentand interference e↵ects in observational studies onnetworks. arXiv preprint arXiv:1609.06245, 2016.

Suzanne Gleason. Female political participation andhealth in india. The ANNALS of the AmericanAcademy of Political and Social Science, 573(1):105–126, 2001.

M Elizabeth Halloran and Claudio J Struchiner.Causal inference in infectious diseases. Epidemiology(Cambridge, Mass.), 6(2):142–151, 1995.

Frank Harary. Graph Theory. Addison-Wesley, 1994.

Kathleen Harris. The add health study: design and ac-complishments. Technical report, Carolina Popula-tion Center: University of North Carolina at ChapelHill, 2013.

Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung. Mas-sive graph triangulation. In Proceedings of the 2013ACM SIGMOD international conference on Man-agement of data, pages 325–336, 2013.

Ravi Jagadeesan, Natesh Pillai, and Alexander Vol-fovsky. Designs for estimating the treatment e↵ectin networks with interference. Annals of Statistics,2019.

Lan Liu and Michael G Hudgens. Large sample ran-domization inference of causal e↵ects in the presenceof interference. Journal of the American StatisticalAssociation, 109(505):288–301, 2014.

Lan Liu, Michael G Hudgens, and Sylvia Becker-Dreps. On inverse probability-weighted estimatorsin the presence of interference. Biometrika, 103(4):829–842, 2016.

Charles F Manski. Identification of treatment responsewith social interactions. The Econometrics Journal,16(1):S1–S23, 2013.

Dror Marcus and Yuval Shavitt. E�cient countingof network motifs. In 2010 IEEE 30th Interna-tional Conference on Distributed Computing Sys-tems Workshops, pages 92–98. IEEE, 2010.

Elizabeth L Ogburn, Tyler J VanderWeele, et al. Vac-cines, contagion, and social networks. The Annalsof Applied Statistics, 11(2):919–948, 2017.

Ali Pinar, C Seshadhri, and Vaidyanathan Vishal. Es-cape: E�ciently counting all 5-vertex subgraphs. InProceedings of the 26th International Conference onWorld Wide Web, pages 1431–1440, 2017.


Paul R Rosenbaum. Interference between units in ran-domized experiments. Journal of the American Sta-tistical Association, 102(477):191–200, 2007.

Paul R Rosenbaum and Donald B Rubin. The centralrole of the propensity score in observational studiesfor causal e↵ects. Biometrika, 70(1):41–55, 1983.

Donald Rubin and Neal Thomas. Matching using esti-mated propensity scores: relating theory to practice.Biometrics, 52(1):249–264, 1996.

Donald B Rubin. Randomization analysis of exper-imental data: The fisher randomization test com-ment. Journal of the American Statistical Associa-tion, 75(371):591–593, 1980.

Fredrik Savje, Peter M Aronow, and Michael GHudgens. Average treatment e↵ects in the pres-ence of unknown interference. arXiv preprintarXiv:1711.06399, 2017.

Ron Shachar and Barry Nalebu↵. Follow the leader:Theory and evidence on political participation.American Economic Review, 89(3):525–547, 1999.

Betsy Sinclair, Margaret McConnell, and Donald PGreen. Detecting spillover e↵ects: Design and anal-ysis of multilevel experiments. American Journal ofPolitical Science, 56(4):1055–1069, 2012.

Michael E Sobel. What do randomized studies of hous-ing mobility demonstrate? causal inference in theface of interference. Journal of the American Statis-tical Association, 101(476):1398–1407, 2006.

Daniel L Sussman and Edoardo M Airoldi. Ele-ments of estimation theory for causal e↵ects in thepresence of network interference. arXiv preprintarXiv:1702.03578, 2017.

Panos Toulis and Edward Kao. Estimation of causalpeer influence e↵ects. In International Conferenceon Machine Learning (ICML), pages 1489–1497,2013.

Johan Ugander, Brian Karrer, Lars Backstrom, andJon Kleinberg. Graph cluster randomization: Net-work exposure to multiple universes. In Proceedingsof the 19th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining, pages329–337. ACM, 2013.

Mark J van der Laan. Causal inference for a popula-tion of causally connected units. Journal of CausalInference, 2(1):13–74, 2014.

Tianyu Wang, Marco Morucci, M Awan, Yameng Liu,Sudeepa Roy, Cynthia Rudin, and Alexander Vol-fovsky. FLAME: A fast large-scale almost matchingexactly approach to causal inference. arXiv preprintarXiv:1707.06315, 2019.

almost-matching-exactly for treatment e ect estimation

Documents