medflex: flexible mediation analysis using natural effect ... · r> expfit

1
Mediation analysis is routinely adopted in a wide range of applied disciplines as a statistical tool to disentangle the causal pathways by which an exposure X affects an outcome Y. Within the counterfactual framework, the mediation formula, can be considered the predominant vehicle for effect decomposition. Despite widespread application, the mediation formula often produces complex expressions for natural direct and indirect effects. For instance, even if no modification by X and/or covariate C levels are allowed for in the working models (for the outcome Y and mediator M), the resulting expressions may still depend on X and/or C in a complicated way. This makes results difficult to report and hypotheses infeasible (or even impossible) to test and may hence pose an impediment to routine application of the mediation formula. Medflex: flexible mediation analysis using natural effect models in R Johan Steen 1 , Tom Loeys 1 , Beatrijs Moerkerke 1 , Theis Lange 2 , Stijn Vansteelandt 1 1 Ghent University, 2 University of Copenhagen expData <- neWeight(M~X+C, family=gaussian, data=data) fit a model for the mediator distribution and calculate regression weights in a single R command: expData <- neImpute(Y~X*M+C, family=binomial, data=data) or fit a model for the outcome mean and impute unobserved Y i (x 0 ,M i (x 1 )) with in a single R command: WEIGHTING-BASED APPROACH 2 IMPUTATION-BASED APPROACH 3,4 i X i x 0 x 1 Y i (x 0 ,M i (x 1 )) w i 1 1 1 1 Y 1 1 1 1 1 0 Y 1 p 1 (0)/p 1 (1) 2 0 0 0 Y 2 1 2 0 0 1 Y 2 p 2 (1)/p 2 (0) i X i x 0 x 1 Y i (x 0 ,M i (x 1 )) 1 1 1 1 Y 1 1 1 0 1 Ŷ 1 (0,M 1 ) 2 0 0 0 Y 2 2 0 1 0 Ŷ 2 (1,M 2 ) i X i x 0 x 1 Y i (x 0 ,M i (x 1 )) 1 1 1 1 Y 1 1 1 1 0 ? 1 1 0 1 ? 1 1 0 0 ? 2 0 0 0 Y 2 2 0 0 1 ? 2 0 1 0 ? 2 0 1 1 ? Create a hypothetical dataset by expanding the original data along unobserved (x 0 , x 1 ) combinations and … 1 fit <- neModel(Y~X0*X1+C, family=binomial, expData=expData) Fit a natural effect model to the expanded data: neEffdecomp(fit) Utility functions for effect decomposition, neLht(fit) or general linear hypotheses 2 3 Beyond the mediation formula: in search of flexibility and parsimony Fitting natural effect models and making statistical inferences using R package medflex 1 in three simple steps What’s in it for practitioners? medflex is freely available on CRAN: http:// cran.r-project.org /web/packages/ medflex / index.html correspondence: [email protected] handles a larger class of parametric working models than software applications that rely on closed-form expressions embedded within framework of existing model-fitting functions in R (mainly glm), allowing estimation on most natural (mostly multiplicative) effect scale (e.g. odds ratios) simplifies testing, especially when dealing with continuous exposures or covariates, as hypotheses of interest can be captured by (a linear combination of) targeted model parameters provides robust standard errors (for glm working models): less computer-intensive than bootstrap or Monte Carlo integration Alternatively, natural effect models focus on direct parameterization of the natural direct and indirect effects of interest (Lange, Vansteelandt & Bekaert, 2012; Vansteelandt, Bekaert & Lange, 2012). Fitting natural effect models entails the use of well- established missing data methods. 1 Steen, J., Loeys, T., Moerkerke, B., & Vansteelandt, S. (2015). Medflex: An R Package for Flexible Mediation Analysis Using Natural Effect Models. Submitted Manuscript. 2 Lange, T., Vansteelandt, S., & Bekaert, M. (2012). A Simple Unified Approach for Estimating Natural Direct and Indirect Effects. American Journal of Epidemiology, 176(3), 190–195. References 3 Vansteelandt, S., Bekaert, M., & Lange, T. (2012). Imputation Strategies for the Estimation of Natural Direct and Indirect Effects. Epidemiologic Methods, 1(1), Article 7. 4 Loeys, T., Moerkerke, B., De Smet, O., Buysse, A., Steen, J., & Vansteelandt, S. (2013). Flexible Mediation Analysis in the Presence of Nonlinear Relations: Beyond the Mediation Formula. Multivariate Behavioral Research, 48(6), 871–894. mediation formula working models logitP{ Y = 1|X , M , C } = q 0 + q 1 X + q 2 M + q 3 XM + q 4 C E (M |X , C )= g 0 + g 1 X + g 2 C Z E ( Y |X = x 0 , M = m, C ) dF (M = m|X = x 1 , C ) natural direct effect odds ratio natural indirect effect odds ratio natural effect model logitP{ Y (x 0 , M (x 1 )) = 1| C } = b 0 + b 1 x 0 + b 2 x 1 + b 3 x 0 x 1 + b 4 C odds{ Y (1, M (x 1 ))| C } odds{ Y (0, M (x 1 ))| C } = exp(b 1 + b 3 x 1 ) odds{ Y (x 0 , M (1))| C } odds{ Y (x 0 , M (0))| C } = exp(b 2 + b 3 x 0 ) 0.0 0.4 0.8 total effect total indirect effect pure indirect effect total direct effect pure direct effect Many thanks to Patrick Corrigan for granting permission to reproduce his cartoon Estimate pure direct effect 0.4790 total direct effect 0.5578 pure indirect effect 0.1824 total indirect effect 0.2613 total effect 0.7403 ˆ Y i (x 0 , M i )= ˆ E ( Y i |X i = x 0 , M i , C i ). w i = p i (x 1 )/ p i (x 0 )= ˆ P(M i |X i = x 1 , C i ) ˆ P(M i |X i = x 0 , C i ) . Natural effect models

Upload: buituyen

Post on 22-Apr-2018

216 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Medflex: flexible mediation analysis using natural effect ... · R> expFit

Mediation analysis is routinely adopted in a wide range of applied disciplines as a statistical tool to disentangle the causal pathways by which an exposure X affects an outcome Y. Within the counterfactual framework, the mediation formula, can be considered the predominant vehicle for effect decomposition.

§  Despite widespread application, the mediation formula often produces complex expressions for natural direct and indirect effects.

§  For instance, even if no modification by X and/or covariate C levels are allowed for in the working models (for the outcome Y and mediator M), the resulting expressions may still depend on X and/or C in a complicated way.

§  This makes results difficult to report and hypotheses infeasible (or even impossible) to test and may hence pose an impediment to routine application of the mediation formula.

Medflex: flexible mediation analysis using natural effect models in RJohan Steen1, Tom Loeys1, Beatrijs Moerkerke1, Theis Lange2, Stijn Vansteelandt1

1Ghent University, 2University of Copenhagen

�  �  

expData <- neWeight(M~X+C, family=gaussian, data=data)

fit a model for the mediator distribution and calculate regression weights

in a single R command:

expData <- neImpute(Y~X*M+C, family=binomial, data=data)

or fit a model for the outcome mean and impute unobserved Yi(x0,Mi(x1)) with

in a single R command:

WEIGHTING-BASED APPROACH2 IMPUTATION-BASED APPROACH3,4

i Xi x0 x1 Yi(x0,Mi(x1)) wi

1 1 1 1 Y1 1

1 1 1 0 Y1 p1(0)/p1(1)

2 0 0 0 Y2 1

2 0 0 1 Y2 p2(1)/p2(0)

⋮ ⋮ ⋮ ⋮ ⋮ ⋮

i Xi x0 x1 Yi(x0,Mi(x1))

1 1 1 1 Y1

1 1 0 1 Ŷ1(0,M1)

2 0 0 0 Y2

2 0 1 0 Ŷ2(1,M2)

⋮ ⋮ ⋮ ⋮ ⋮

i Xi x0 x1 Yi(x0,Mi(x1))

1 1 1 1 Y1

1 1 1 0 ?

1 1 0 1 ?

1 1 0 0 ?

2 0 0 0 Y2

2 0 0 1 ?

2 0 1 0 ?

2 0 1 1 ?

⋮ ⋮ ⋮ ⋮ ⋮

Create a hypothetical dataset by expanding the original data along unobserved (x0, x1) combinationsand …

1

fit <- neModel(Y~X0*X1+C, family=binomial, expData=expData)

Fit a natural effect model to the expanded data:

neEffdecomp(fit)

Utility functions for effect decomposition,

neLht(fit)or general linear hypotheses

2

3

X

M

Y

CBeyond the mediation formula: in search of flexibility and parsimony

Fitting natural effect models and making statistical inferences using R package medflex1 in three simple steps

What’s in it for practitioners?

medflex is freely available on CRAN: http://cran.r-project.org/web/packages/medflex/index.htmlcorrespondence: [email protected]

ü  handles a larger class of parametric working models than software applications that rely on closed-form expressions

ü  embedded within framework of existing model-fitting functions in R (mainly glm), allowing estimation on most natural (mostly multiplicative) effect scale (e.g. odds ratios)

ü  simplifies testing, especially when dealing with continuous exposures or covariates, as hypotheses of interest can be captured by (a linear combination of) targeted model parameters

ü  provides robust standard errors (for glm working models): less computer-intensive than bootstrap or Monte Carlo integration

Alternatively, natural effect models focus on direct parameterization of the natural direct and indirect effects of interest (Lange, Vansteelandt & Bekaert, 2012; Vansteelandt, Bekaert & Lange, 2012).Fitting natural effect models entails the use of well-established missing data methods.  

1Steen, J., Loeys, T., Moerkerke, B., & Vansteelandt, S. (2015). Medflex: An R Package for Flexible Mediation Analysis Using Natural Effect Models. Submitted Manuscript.

2Lange, T., Vansteelandt, S., & Bekaert, M. (2012). A Simple Unified Approach for Estimating Natural Direct and Indirect Effects. American Journal of Epidemiology, 176(3), 190–195.

References3Vansteelandt, S., Bekaert, M., & Lange, T. (2012). Imputation Strategies for the Estimation of Natural Direct and Indirect Effects. Epidemiologic Methods, 1(1), Article 7. 4Loeys, T., Moerkerke, B., De Smet, O., Buysse, A., Steen, J., & Vansteelandt, S. (2013). Flexible Mediation Analysis in the Presence of Nonlinear Relations: Beyond the Mediation Formula. Multivariate Behavioral Research, 48(6), 871–894.

mediation formulaworking models

logitP{Y = 1|X ,M,C}=q

0

+q1

X +q2

M+q3

XM+q4

C

E(M|X ,C) = g0 + g1X + g2C

ZE(Y |X = x0,M = m,C)⇥dF(M = m|X = x1,C)

natural direct effect odds ratio

natural indirect effect odds ratio

natural effect modellogitP{Y (x

0

,M(x1

)) = 1|C}=b

0

+b1

x

0

+b2

x

1

+b3

x

0

x

1

+b4

C

odds{Y (1,M(x1

))|C}odds{Y (0,M(x

1

))|C} = exp(b1

+b3

x

1

)

odds{Y (x0

,M(1))|C}odds{Y (x

0

,M(0))|C} = exp(b2

+b3

x

0

)

Johan Steen, Tom Loeys, Beatrijs Moerkerke, Stijn Vansteelandt 29

Applying the plot function to a natural e↵ect model object automatically retains the causale↵ect estimates of interest, generates a linear hypothesis object using neEffdecomp and thenplots its corresponding estimates and confidence intervals, as shown in Figure 4.

R> par(mfrow = c(1, 2))

R> plot(neMod2, xlab = "log odds ratio")

R> plot(neMod2, xlab = "odds ratio", transf = exp)

95% sandwich CIs

log odds ratio

0.0 0.4 0.8

total effect

total indirect effectpure indirect effect

total direct effectpure direct effect

95% sandwich CIs

odds ratio

1.0 1.5 2.0 2.5

total effect

total indirect effectpure indirect effect

total direct effectpure direct effect

Figure 4: E↵ect decomposition on the log odds ratio and odds ratio scales.

The default exposure reference and covariate levels for these plots are the same as for theneEffdecomp function, but can again be altered via the corresponding arguments xRef andcovLev.

7. Population-average natural e↵ects

In all previous sections, we defined natural e↵ects as conditional or stratum-specific e↵ects(i.e., conditional on baseline covariates). However, themedflex package also allows to estimatepopulation-average natural e↵ects. As demonstrated in Appendix A.3 and A.4, rewritingthe mediation formula reveals that estimation of these population-average e↵ects requiresweighting by the reciprocal of the conditional exposure distribution in order to adjust forconfounding (also see Albert 2012; Vansteelandt 2012).

As a consequence, a model for the exposure distribution needs to be fitted and specified asan additional working model, e.g.,

R> expFit <- glm(att ~ gender + educ + age, data = UPBdata)

Since specifying population-average natural e↵ect models using the neModel is equivalent forthe weighting- and imputation-based approaches, in the remainder of this section, we demon-strate how to proceed when adhering to the imputation-based approach. Moreover, whenestimating population-average natural e↵ects, incoherence between imputation and naturale↵ect models is less of a concern as the latter does not require modeling the relation be-tween outcome and covariates. The (first) working model can again be fitted using the samecommands as before:

R> impData <- neImpute(UPB ~ att + negaff + gender + educ + age,

+ family = binomial("logit"), data = UPBdata)

Many thanks to Patrick Corrigan for granting permission to reproduce his cartoon

Estimatepure direct effect 0.4790total direct effect 0.5578pure indirect effect 0.1824total indirect effect 0.2613total effect 0.7403

Y

i

(x0,Mi

) = E(Yi

|Xi

= x0,Mi

,Ci

).w

i

= p

i

(x1)/p

i

(x0) =P(M

i

|Xi

= x1,Ci

)

P(Mi

|Xi

= x0,Ci

).

Natural effect models