cost-effectivenessanalysis · methodsforestimatingcomplier-averagecausaleffectsfor...

40
Methods for estimating complier-average causal effects for cost-effectiveness analysis K. DiazOrdaz, A. J. Franchini, R. Grieve Department of Medical Statistics, London School of Hygiene and Tropical Medicine, [email protected] Abstract In Randomised Controlled Trials (RCT) with treatment non-compliance, instrumental variable approaches are used to estimate complier average causal effects. We extend these approaches to cost-effectiveness analyses, where methods need to recognise the correlation between cost and health outcomes. We propose a Bayesian full likelihood (BFL) approach, which jointly models the effects of random assignment on treatment received and the outcomes, and a three-stage least squares (3sls) method, which acknowledges the correlation between the endpoints, and the endogeneity of the treatment received. This investigation is motivated by the REFLUX study, which exemplifies the setting where compliance differs between the RCT and routine practice. A simulation is used to compare the methods performance. We find that failure to model the correlation between the outcomes and treatment received correctly can result in poor CI coverage and biased estimates. By contrast, BFL and 3sls methods provide unbiased estimates with good coverage. Keywords: non-compliance, instrumental variables, bivariate outcomes, cost-effectiveness 1 Introduction Non-compliance is a common problem in Randomised Controlled Trials (RCTs), as some participants depart from their randomised treatment, by for example switching from the experimental to the con- trol regimen. An unbiased estimate of the effectiveness of treatment assignment can be obtained by reporting the intention-to-treat (ITT) estimand. In the presence of non-compliance, a complimentary estimand of interest is the causal effect of treatment received. Instrumental variable (IV) methods can be used to obtain the complier average causal effect (CACE), as long as random assignment meets the IV criteria for identification (Angrist et al., 1996). An established approach to IV estimation is two-stage least squares (2sls), which provides consistent estimates of the CACE when the outcome measure is continuous, and non-compliance is binary (Baiocchi et al., 2014). Cost-effectiveness analyses (CEA) are an important source of evidence for informing clinical decision- making and health policy. CEA commonly report an ITT estimand, i.e. the relative cost-effectiveness of the intention to receive the intervention (NICE, 2013). However, policy-makers may require ad- ditional estimands, such as the relative cost-effectiveness for compliers. For example, CEAs of new therapies for end-stage cancer, are required to estimate the cost-effectiveness of treatment receipt, recognising that patients may switch from their randomised allocation following disease progression. Alternative estimates such as the CACE may also be useful when levels of compliance in the RCT differ 1 arXiv:1601.07127v2 [stat.ME] 1 Dec 2016

Upload: others

Post on 04-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Methods for estimating complier-average causal effects forcost-effectiveness analysis

K. DiazOrdaz, A. J. Franchini, R. GrieveDepartment of Medical Statistics,

London School of Hygiene and Tropical Medicine,[email protected]

Abstract

In Randomised Controlled Trials (RCT) with treatment non-compliance, instrumental variableapproaches are used to estimate complier average causal effects. We extend these approaches tocost-effectiveness analyses, where methods need to recognise the correlation between cost and healthoutcomes.

We propose a Bayesian full likelihood (BFL) approach, which jointly models the effects ofrandom assignment on treatment received and the outcomes, and a three-stage least squares (3sls)method, which acknowledges the correlation between the endpoints, and the endogeneity of thetreatment received.

This investigation is motivated by the REFLUX study, which exemplifies the setting wherecompliance differs between the RCT and routine practice. A simulation is used to compare themethods performance.

We find that failure to model the correlation between the outcomes and treatment receivedcorrectly can result in poor CI coverage and biased estimates. By contrast, BFL and 3sls methodsprovide unbiased estimates with good coverage.

Keywords: non-compliance, instrumental variables, bivariate outcomes, cost-effectiveness

1 Introduction

Non-compliance is a common problem in Randomised Controlled Trials (RCTs), as some participantsdepart from their randomised treatment, by for example switching from the experimental to the con-trol regimen. An unbiased estimate of the effectiveness of treatment assignment can be obtained byreporting the intention-to-treat (ITT) estimand. In the presence of non-compliance, a complimentaryestimand of interest is the causal effect of treatment received. Instrumental variable (IV) methods canbe used to obtain the complier average causal effect (CACE), as long as random assignment meetsthe IV criteria for identification (Angrist et al., 1996). An established approach to IV estimation istwo-stage least squares (2sls), which provides consistent estimates of the CACE when the outcomemeasure is continuous, and non-compliance is binary (Baiocchi et al., 2014).

Cost-effectiveness analyses (CEA) are an important source of evidence for informing clinical decision-making and health policy. CEA commonly report an ITT estimand, i.e. the relative cost-effectivenessof the intention to receive the intervention (NICE, 2013). However, policy-makers may require ad-ditional estimands, such as the relative cost-effectiveness for compliers. For example, CEAs of newtherapies for end-stage cancer, are required to estimate the cost-effectiveness of treatment receipt,recognising that patients may switch from their randomised allocation following disease progression.Alternative estimates such as the CACE may also be useful when levels of compliance in the RCT differ

1

arX

iv:1

601.

0712

7v2

[st

at.M

E]

1 D

ec 2

016

to those in the target population, or where intervention receipt, rather than the intention to receivethe intervention, is the principal cost driver. Methods for obtaining the CACE for univariate survivaloutcomes have been exemplified before (Latimer et al., 2014), but approaches for obtaining estimatesthat adequately adjust for non-adherence in CEA more generally, have received little attention. Thishas been recently identified as a key area where methodological development is needed (Hughes et al.,2016).

The context of trial-based CEA highlights an important complexity that arises with multivariateoutcomes more widely, in that, to provide accurate measures of the uncertainty surrounding a compositemeasure of interest, for example the incremental net monetary benefit (INB), it is necessary to recognisethe correlation between the endpoints, in this case, cost and health outcomes (Willan et al., 2003;Willan, 2006). Indeed, when faced with non-compliance, and the requirement to estimate a causaleffect of treatment on cost-effectiveness endpoints, some CEA resort to per protocol (PP) analyses(Brilleman et al., 2015), which exclude participants who deviate from treatment. As non-complianceis likely to be associated with prognostic variables, only some of which are observed, PP analyses areliable to provide biased estimates of the causal effect of the treatment received.

This paper develops novel methods for estimating CACE in CEA that use data from RCTs withnon-compliance. First, we propose using the three stage least squares (3sls) method (Zellner and Theil,1962), which allows the estimation of a system of simultaneous equations with endogenous regressors.Next, we consider a bivariate version of the ‘unadjusted Bayesian’ models previously proposed forMendelian randomisation (Burgess and Thompson, 2012), which simultaneously estimate the expectedtreatment received as a function of random allocation, and the mean outcomes as a linear functionof the expected treatment received. Finally, we develop a Bayesian full likelihood approach (BFL),whereby the outcome variables and the treatment received are jointly modelled as dependent on therandom assignment. This is an extension to the multivariate case of what is known in the econometricsliterature as the IV unrestricted reduced form (Kleibergen and Zivot, 2003).

The aim of this paper is to present and compare these alternative approaches. The problemis illustrated in Section 2 with the REFLUX study, a multicentre RCT and CEA that contrastslaparoscopic surgery with medical management for patients with Gastro-Oesophageal Reflux Disease(GORD). Section 3 introduces the assumptions and methods for estimating CACEs. Section 4 presentsa simulation study used to assess the performance of the alternative approaches, which are then appliedto the case study in Section 5. We conclude with a Discussion (Section 6), where we consider thefindings from this study in the context of related research.

2 Motivating example: Cost-effectiveness analysis of the REFLUXstudy

The REFLUX study was a UK multicentre RCT with a parallel design, in which patients with moder-ately severe GORD, were randomly assigned to medical management or laparoscopic surgery (Grantet al., 2008, 2013).

The RCT randomised 357 participants (178 surgical, 179 medical) from 21 UK centres. An obser-vational preference based study was conducted alongside it, which involved 453 preference participants(261 surgical, 192 medical).

For the cost-effectiveness analysis within the trial, individual resource use (costs in £ sterling) andhealth-related quality of life (HRQoL), measured using EQ5D (3 levels), were recorded annually forup to 5 years. The HRQoL data were used to adjust life years and present quality-adjusted life years

2

(QALYs) over the follow-up period (Grant et al., 2013).1 As is typical, the costs were right-skewed.Table 1 reports the main characteristics of the data set.

The original CEA estimated the linear additive treatment effect on mean costs and health outcomes(QALYs). The primary analysis used a system of seemingly unrelated regression equations (SURs)(Zellner, 1962; Willan et al., 2004), adjusting for baseline HRQoL EQ5D summary score (denoted byEQ5D0). The SURs can be written for cost Y1i and QALYs Y2i, as follows

Y1i = β0,1 + β1,1treati + β1,2EQ5D0i + ε1i

Y2i = β0,2 + β1,2treati + β2,2EQ5D0i + ε2i(1)

where β1,1 and β1,2 represent the incremental costs and QALYs respectively. The error terms arerequired to satisfy E[ε1i] = E[ε2i] = 0, E[εkiεk′i] = σkk′ , E[εkiεk′j ] = 0, for k, k′ ∈ {1, 2}, and for i 6=j. Rather than assuming that the errors are drawn from a bivariate normal distribution, estimationis usually done by the feasible generalized least squares (FGLS) method2. This is a two-step methodwhere, in the first step, we run ordinary least squares estimation for equation (1). In the second step,residuals from the first step are used as estimates of the elements σkk′ of the covariance matrix, andthis estimated covariance structure is then used to re-estimate the coefficients in equation (1) (Zellner,1962; Zellner and Huang, 1962).

In addition to reporting incremental costs and QALYs, CEA often report the incremental cost-effectiveness ratio (ICER), which is defined as the ratio of the incremental costs per incrementalQALY, and the incremental net benefit (INB), defined as INB(λ) = λβ1,2 − β1,1, where λ representsthe decision-makers’ willingness to pay for a one unit gain in health outcome. Thus the new treatmentis cost-effective if INB > 0. For a given λ, the standard error of INB can be estimated from theestimated increments β̂1,1 and β̂1,2, together with their standard errors and their correlation followingthe usual rules for the variance of a linear combination of two random variables. The willingness to payλ generally lies in a range, so it is common to compute the estimated value of INB for various valuesof λ. In REFLUX, the reported INB was calculated using λ = £30000, which is within the range ofcost-effectiveness thresholds used by the UK National Institute for Health and Care Excellence (NICE,2013).

The original ITT analysis concluded that, compared to medical management, the arm assigned tosurgery had a large gain in average QALYs, at a small additional cost and was relatively cost-effectivewith a positive mean INB, albeit with 95% confidence intervals (CI) that included zero. However,these ITT results cannot be interpreted as a causal effect of the treatment, since within one year ofrandomisation, 47 of those randomised to surgery switched and received medical management, whilein the medical treatment arm, 10 received surgery. The reported reasons for not having the allocatedsurgery were that in the opinion of the surgeon or the patient, the symptons were not “sufficiently severe”or the patient was judged unfit for surgery (e.g. overweight). The preference-based observational studyconducted alongside the RCT reported that in routine clinical practice, the corresponding proportionwho switched from an intention to have surgery and received medical management, was relativelylow (4%), with a further 2% switching from control to intervention (Grant et al., 2013). Since thepercentage of patients who switched in the RCT was higher than in the target population and thecosts of the receipt of surgery are relatively large, there was interest in reporting a causal estimate ofthe intervention. Thus, the original study also reported a PP analysis on complete-cases, adjusted forbaseline EQ5D0, which resulted in an ICER of £7263 per additional QALY (Grant et al., 2013). Thisis not an unbiased estimate of the causal treatment effect, so in Section 5, we re-analyse the REFLUX

1There was no administrative censoring.2If we are prepared to assume the errors are bivariate normal, estimation can proceed by maximum likelihood.

3

dataset to obtain a CACE of the cost-effectiveness outcomes, recognising the joint distribution of costsand QALYs, using the methods described in the next section.

3 Complier Average Causal effects with bivariate outcomes

We begin by defining more formally our estimands and assumptions. Let Y1i and Y2i be the continuousbivariate outcomes, and Zi and Di the binary random treatment allocation and treatment receivedrespectively, corresponding to the i-th individual. The bivariate endpoints Y1i and Y2i belong to thesame individual i, and thus are correlated. We assume that there is an unobserved confounder U ,which is associated with the treatment received and either or both of the outcomes. From now on,we will assume that the (i) Stable Unit Treatment Value Assumption (SUTVA) holds: thepotential outcomes of the i-th individual are unrelated to the treatment status of all other individuals(known as no interference), and that for those who actually received treatment level z, their observedoutcome is the potential outcome corresponding to that level of treatment.

Under SUTVA, we can write the potential treatment received by the i-th subject under the randomassignment at level zi ∈ {0, 1} asDi (zi). Similarly, Y`i (zi, di) with ` ∈ {1, 2} denotes the correspondingpotential outcome for endpoint `, if the i-th subject were allocated to level zi of the treatment andreceived level di. There are four potential outcomes. Since each subject is randomised to one level oftreatment, only one of the potential outcomes per endpoint `, is observed, i.e. Y`i = Y`i(zi, Di(zi)) =

Yi(zi).The CACE for outcome ` can now be defined as

θ` = E[{Y`i(1)− Y`i(0)}

∣∣{Di(1)−Di(0) = 1}]. (2)

In addition to (i) SUTVA, the following assumptions are sufficient for identification of the CACE,(Angrist et al., 1996):

(ii) Ignorability of the treatment assignment: Zi is independent of unmeasured confounders(conditional on measured covariates) and the potential outcomes Zi ⊥⊥ Ui, Di(0), Di(1), Yi(0), Yi(1).

(iii) The random assignment predicts treatment received: Pr{Di(1) = 1} 6= Pr{Di(0) = 1}.(iv) Exclusion restriction: The effect of Z on Y` must be via an effect of Z on D; Z cannot affectY` directly.(v) Monotonicity: Di(1) ≥ Di(0).

The CACE can now be identified from equation (2) without any further assumptions about theunobserved confounder; in fact, U can be an effect modifier of the relationship of D and Y (Didilezet al., 2010).

In the REFLUX study, the assumptions concerning the random assignment, (ii) and (iii), arejustified by design. The exclusion restriction assumption seems plausible for the costs, since the costsof surgery are only incurred if the patient actually has the procedure. We argue that it is also plausiblethat it holds for QALYs, as the participants did not seem to have a preference for either treatment,thus making the psychological effects of knowing to which treatment one has been allocated minimal.The monotonicity assumption rules out the presence of defiers. It seems fair to assume that thereare no participants who would refuse the REFLUX surgery when randomised to it, but who wouldreceive surgery when randomised to receive medical management. Equation (2) implicitly assumesthat receiving the intervention has the same average effect in the linear scale, regardless of the level ofZ and U . This average is however across different ‘versions’ of the intervention, as the trial protocoldid not prescribe a single surgical procedure, but allowed for the surgeon to choose their preferredlaparoscopy method, as would be the case in routine clinical practice.

4

Since random allocation, Z, satisfies assumptions (ii)-(iv), we say it is an instrument (or instru-mental variable) for D. For binary instrument, the simplest method of estimation of equation (2) inthe IV framework is the Wald estimator (Angrist et al., 1996):

θ̂`,IV =E(Y |Z = 1)− E(Y |Z = 0)

E(D|Z = 1)− E(D|Z = 0)

Typically, estimation of these conditional expectations proceeds via an approach known as two-stageleast squares (2sls). The first stage fits a linear regression to treatment received on treatment assigned.Then, in a second stage, a regression model for the outcome on the predicted treatment received isfitted:

Di = α0 + α1Zi + ω1i

Y`i = β0 + βIV D̂i + ω2i (3)

where β̂IV is an estimator for θ`. Covariates can be used, by including them in both stages of themodel. To obtain the correct standard errors for the 2sls estimator, it is necessary to take into accountthe uncertainty about the first stage estimates. The asymptotic standard error for the 2sls CACE isgiven in Imbens and Angrist (1994), and implemented in commonly used software packages.

OLS estimation produces first-stage residuals ω1i that are uncorrelated with the instrument, andthis is sufficient to guarantee that the 2sls estimator is consistent for the CACE (Angrist et al., 2008).Therefore, we restrict our attention here to models where the first-stage equation is linear, even thoughthe treatment received is binary. 3

A key issue for settings such as CEA where there is interest in estimating the CACE for bivari-ate outcomes, is that 2sls as implemented in most software packages can only be readily applied tounivariate outcomes. Ignoring the correlation between the two endpoints is a concern for obtainingstandard errors of composite measures of the outcomes, e.g. INB, as this requires accurate estimatesof the covariance between the outcomes of interest (e.g. costs and QALYs).

A simple way to address this problem would be to apply 2sls directly to the composite measure,i.e. a net-benefit two-stage regression approach (Hoch et al., 2006). However, it is known that netbenefit regression is very sensitive to outliers, and to distributional assumptions (Willan et al., 2004),and has been recently shown to perform poorly when these assumptions are thought to be violated(Mantopoulos et al., 2016). Moreover, such net benefit regression is restrictive, in that it does notallow separate covariate adjustment for each of the component outcomes, (e.g. baseline HRQoL, forthe QALYs as opposed to the costs). In addition, this simple approach would not be valid for estimatingthe ICER, which is a non-linear function of the incremental costs and QALYs. For these reasons, wedo not consider this approach further. Rather, we present here three flexible strategies for estimatinga CACE of the QALYs and the costs, jointly. The first approach combines SURs (equation 1) and2sls (equation 3) to obtain CACEs for both outcomes accounting for their correlation. This simpleapproach is known in the econometrics literature as three-stage least squares (3sls).

3.1 Three-stage least squares

Three-stage least squares (3sls) was developed for SUR systems with endogenous regressors, i.e. anyexplanatory variables which are correlated with the error term in equations (1) (Zellner and Theil,1962). All the parameters appearing in the system are estimated jointly, in three stages. The first two

3Non-linear versions of the 2sls exist. See for example Clarke and Windmeijer (2012) for an excellent review ofmethods for binary outcomes.

5

stages are as for 2sls, but with the second stage applied to each of the outcomes.

(1st stage): Di = α0 + α1Zi + e0i

(2nd stage): Y1i = β01 + βIV,1D̂i + e1i (4)

Y2i = β02 + βIV,2D̂i + e2i (5)

As with 2sls, the models can be extended to include baseline covariates. The third stage is the samestep used on a SUR with exogenous regressors (equation (1)) for estimating the covariance matrix ofthe error terms from the two equations (4) and (5). Thus, because we are assuming that Z satisfiesthe identification assumptions (i)-(v), Z is independent of the residuals at first and second stage, i.e.Z ⊥⊥ e0i, Z ⊥⊥ e1i, and Z ⊥⊥ e2i. Then, the 3sls procedure allows us to obtain the covariance matrixbetween the residuals e1i and e2i. As with SURs, the 3sls approach does not require to make anydistributional assumptions, as estimation can be done by FGLS, and it is robust to heteroscedasticityof the errors in the linear models for the outcomes (Greene, 2002). We note that the 3sls estimatorbased on FGLS is consistent only if the error terms in each equation of the system and the instrumentare independent, which is likely to hold here, as we are dealing with a randomised instrument. Insettings where this condition is not satisfied, other estimation approaches such as generalised methodsof moments (GMM) warrant consideration (Schmidt, 1990). In the just-identified case, i.e. when thereare as many endogenous regressors as there are instruments, classical theory about 3sls estimatorsshows that the GMM and the FGLS estimators coincide (Greene, 2002). As the 3sls method uses anestimated variance-covariance matrix, it is only asymptotically efficient (Greene, 2002).

3.2 Naive Bayesian estimators

Bayesian models have a natural appeal for cost-effectiveness analyses, as they afford us the flexibilityto estimate bivariate models on the expectations of the two outcomes using different distributions, asproposed by (Nixon and Thompson, 2005). These models are often specified by writing a marginalmodel for one of the outcomes, e.g. the costs Y1, and then, a model for Y2, conditional on Y1.

For simplicity of exposition, we begin by assuming normality for both outcomes and no adjustmentfor covariates. We have a marginal model for Y1 and a model for Y2 conditional on Y1 (Nixon andThompson, 2005)

Y1i ∼ N(µ1i, σ21) µ1i = β0,1 + β1,1treati (6)

Y2i | Y1i ∼ N(µ2i, σ22(1− ρ2)) µ2i = β0,2 + β1,2treati + β2,2(y1i − µ1i), (7)

where ρ is the correlation between the outcomes. The linear relationship between the two outcomes isrepresented by β2,2 = ρσ2σ1 .

Because of the non-compliance, in order to obtain a causal estimate of treatment, we need to add alinear model for the treatment received, dependent on randomisation Zi, similar to the first equationof a 2sls. Formally, this model (denoted uBN, for unadjusted Bayesian Normal) can be written withthree equations as follows:

Di ∼ N(µ0i, σ20)

Y1i ∼ N(µ1i, σ21)

Y2i∣∣Y1i ∼ N(µ2i, σ

22(1− ρ2))

µ0i = β0,0 + β1,0Zi

µ1i = β0,1 + β1,1µ0i

µ2i = β0,2 + β1,2µ0i + β2,2(y1i − µ1i)(8)

This model is a bivariate version of the ‘unadjusted Bayesian’ method previously proposed for Mendelianrandomisation (Burgess and Thompson, 2012). It is called unadjusted, because the variance structureof the outcomes is assumed to be independent of the treatment received. The causal treatment effect

6

for outcome Y`, with ` ∈ {1, 2}, is represented by β1,` in equations (8). We use the Fisher’s z-transformof ρ, i.e. z = 1

2 log(1+ρ1−ρ

), for which we assume a vague normal prior, i.e. z ∼ N(0, 102). We also use

vague multivariate normal priors for the regression coefficient (with a precision of 0.01). For standarddeviations, we use σj ∼ Unif(0, 10), for j ∈ {0, 1, 2}. This is similar to the priors used in (Lancaster,2004), and are independent of the regression coefficient of treatment received on treatment allocationβ1,0.

Cost data are notoriously right-skewed, and Gamma-distributions are often used to model them.Thus, we can relax the normality assumption of equation (8), and model Y1 (i.e. cost) with a Gammadistribution, and treatment received (binary), with a logistic regression. The health outcomes, Y2, arestill modelled with a normal distribution, as is customary. Because we are using a non-linear model forthe treatment received, we use the predicted raw residuals from this model as extra regressors in theoutcome models, similar to the 2-stage residual inclusion estimator (Terza et al., 2008). We model Y1by its marginal distribution (Gamma) and Y2 by a conditional Normal distribution, given Y1 (Nixonand Thompson, 2005). We call this model ‘unadjusted Bayesian Gamma-Normal’ (uBGN) and writeit as follows

Di ∼ Bern(πi)

Y1i ∼ Gamma(ν1i, κ1)

Y2i | Y1i ∼ N(µ2i, σ22(1− ρ2))

logit(πi) = α0 + α1Zi

ri = Di − πiµ1i = β0,1 + β1,1Di + β1,rri

µ2i = β0,2 + β1,2Di + β2,rri + β2,2(y1i − µ1i)

(9)

where µ1 = ν1κ1, is the mean of the Gamma distributed costs, with shape ν1 and rate κ1. Again,

we express β2,2 = ρσ2σ1 , and assume a vague Normal prior on the Fisher’s z-transform of ρ, z ∼N(0, 102).The prior distribution for ν1 is Gamma(0.01, 0.01). We also assume a Gamma prior for theintercept term of the cost equation, β0,1 ∼ Gamma(0.01, 0.01). All the other regression parametershave the same priors as those used in the uBN model.

The models introduced in this section, uBN and uBGN, are estimated in one stage, allowing feed-back between the regression equations and the propagation of uncertainty. However, these ’unadjusted’methods ignore the correlation between the outcomes and the treatment received. This misspecifica-tion of the covariance structure may result in biases in the causal effect, which are likely to be moreimportant at higher levels of non-compliance.

3.3 Bayesian simultaneous equations (BFL)

We now introduce an approach that models the covariance between treatment received and outcomesappropriately, using a system of simultaneous equations. This can be done via full or limited informa-tion maximum likelihood, or by using MCMC to estimate the parameters in the model simultaneouslyallowing for proper Bayesian feedback and propagation of uncertainty. Here, we propose a Bayesianapproach which is an extension of the methods presented in Burgess and Thompson (2012); Kleibergenand Zivot (2003); Lancaster (2004).

This method treats the endogenous variable D and the cost-effectiveness outcomes as covariant andestimates the effect of treatment allocation, as follows. Let (Di, Y1i, Y2i)

> be the transpose of vectorof outcomes, which now includes treatment received, as well as the bivariate endpoints of interest. Wetreat all three variables as multivariate normally distributed, so thatDi

Y1i

Y2i

∼ N µ0i

µ1i

µ2i

,Σ =

σ20 s01 s02

s01 σ21 s12

s02 s12 σ22

;

µ0i = β0,0 + β1,0Zi

µ1i = β0,1 + β1,1β1,0Zi

µ2i = β0,2 + β1,2β1,0Zi

(10)

7

where sij = cov(Yi, Yj), and the causal treatment effect estimates are β1,1 and β1,2 respectively. Forthe implementation, we use vague normal priors for the regression coefficients, i.e. βm,j ∼ N(0, 102),for j ∈ {0, 1, 2},m ∈ {0, 1}, and a Wishart prior for the inverse of Σ (Gelman and Hill, 2006).

4 Simulation study

We now use a factorial simulation study to assess the finite sample performance of the alternativemethods. The first factor is the proportion of participants who do not comply with the experimentalregime, when assigned to it, expressed as a percentage of the total (one-sided non-compliance). Bias isexpected to increase with increasing levels of non-compliance. A systematic review (Dodd et al., 2012)found that the percentage of non-compliance was less than 30% in two-thirds of published RCTs, butgreater than 50% in one-tenth of studies. Here, two levels of non-compliance are chosen, 30% and 70%.As costs are typically skewed, three different distributions (Normal, Gamma or Inverse Gaussian – IG)are used to simulate cost data. As the 2sls approach fails to accommodate the correlation between theendpoints, we examined the impact of different levels of correlation on the methods’ performance; ρtakes one of four values ±0.4, ±0.8. The final factor is the sample size of the RCT, taking two settingsn = 100, and 1000. In total, there are 2× 3× 4× 2 = 48 simulated scenarios.

To generate the data, we begin by simulating U ∼ N(0.50, 0.252), independently from treatmentallocation. U represents a pre-randomisation variable that is a common cause of both the outcomesand the probability of non-compliance, i.e. it is a confounding variable, which we assume is unobserved.

Now, let Si ∼ Bern(πs) be the random variable denoting whether the ith individual switches fromallocated active treatment to control. The probability πs of one-way non-compliance with allocatedtreatment depends on U , in the following way,

πs =

p+ 0.1, if u > 0.5,

p− 0.1, otherwise(11)

where p denotes the corresponding average non-compliance percentage expressed as a probability, i.ehere p ∈ {0.3, 0.7}. We now generate Di, the random variable of treatment received as

Di =

Zi, if either si = 0 or Zi = 0,

1− Zi, if si = 1 and Zi = 1(12)

where Zi denotes the random allocation for subject i.Then, the means for both outcomes are assumed to depend linearly on treatment received and the

unobserved confounder U as follows:

µ1 = E[Y1] = 1.2 + 0.4Di + 0.16(ui − 0.5) (13)

µ2 = E[Y2] = 0.5 + 0.2Di + 0.04(ui − 0.5) (14)

Finally, the bivariate outcomes are generated using Gaussian copulas, initially with normal marginals.In subsequent scenarios, we consider Gamma or Inverse Gaussian marginals for Y1 and normal for Y2.The conditional correlation between the outcomes, ρ, is set according to the corresponding scenario.

For the scenarios where the endpoints are assumed to follow a bivariate normal distribution, thevariances of the outcomes are set to σ21 = 0.22, σ22 = 0.12 respectively, while for scenarios with Gammaand IG distributed Y1, the shape parameter is η = 4. For the Gamma case, this gives a variance forY1 equal to σ21 = 0.36 in control and σ21 = 0.64 in the intervention group. When Y1 ∼ IG(µ1, η), theexpected variance in the control group is σ21 = 0.432, and σ21 = 1.024 in those receiving the intervention.

8

The simulated endpoints represent cost-effectiveness variables that have been rescaled for compu-tational purposes, with costs divided by 1000, and QALYs by 0.1, such that the true values are £400

(incremental costs), and 0.02 (incremental QALYs) and so with a threshold value of λ = £30000 perQALY, the true causal INB is £200.

For each simulated scenario, we obtained M = 2500 sets. For the Bayesian analyses, we use themedian of the posterior distribution as the ‘estimate’ of the parameter of interest, and the standarddeviation of the posterior distribution as the standard error. Equi-tailed 95% posterior credible intervalsare also obtained. We use the term confidence interval for the Bayesian credible intervals henceforth,to have a unified terminology for both Bayesian and frequentist intervals.

Once the corresponding causal estimate has been obtained in each of the 2500 replicated sets undereach scenario in turn, we compute the median bias of the estimates, coverage of 95% confidence intervals(CI), median CI width and root mean square error (RMSE). We report median bias as opposed to meanbias, because the BFL leads to a posterior distribution of the causal parameters which is Cauchy-like(Kleibergen and Zivot, 2003). A method is ‘adequate’, if it results in low levels of bias (median bias≤ 5%) with coverage rates within 2.5% of the nominal value.

Implementation:The 3sls was fitted using systemfit package in R using FGLS, and the Bayesian methods were runusing JAGS from R (r2jags). Two chains, each one with 5000 initial iterations and 1000 burn-inwere used. The multiple chains allowed for a check of convergence by the degree of their mixing andthe initial iterations enabled to estimate iteration autocorrelation. A variable number of further 1000-iteration runs were performed until convergence was reached as estimated by the absolute value of theGeweke statistics for the first 10% and last 50% of iterations in a run being below 2.5. A final additionalrun of 5000 iterations was performed for each chain to achieve a total sample of 10000 iterations, anda MC error of about 1% of the parameter SE on which to base the posterior estimates. For the uBGN,an offset of 0.01 is added to the shape parameter ν1 for the Gamma distribution of the cost, to preventthe sampled shape parameter to become too close to zero, which may result in infinite densities. Seethe Appendix for the JAGS model code for BFL.

4.1 Simulation Study Results

BiasFigure 1 shows the median bias corresponding to scenarios with 30% non-compliance, by cost distri-butions (left to right) and levels of correlation between the two outcomes, for sample sizes of n = 100

(upper panel) and n = 1000 (lower panel). With the larger sample size, for all methods, bias is neg-ligible with normally distributed costs, and remain less than 5% when costs are Gamma-distributed.However, when costs follow an Inverse Gaussian distribution, and the absolute levels of correlationbetween the endpoints are high (±0.8), the uBGN approach results in biased estimates, around 10%bias for the estimated incremental cost, and between 20 and 40% for the estimated INB. With thesmall sample size and when costs follow a Gamma or Inverse Gaussian distribution, both unadjustedBayesian methods provide estimates with moderate levels of bias. With 70% non-compliance (Fig-ure A3 in the Appendix), the unadjusted methods result in important biases which persist even withlarge sample sizes, especially for scenarios with non-normal outcomes. For small sample settings, uBNreports positive bias (10 to 20%) in the estimation of incremental QALYs, and the resulting INB,irrespective of the cost distribution. The uBGN method reports relatively unbiased estimates of theQALYs, but large positive bias (up to 60%) in the estimation of costs, and hence, there is substan-tial bias in the estimated INB (up to 200%). Unadjusted Bayesian methods estimates are biased inthe direction of the true causal effect, because these methods ignore the positive correlation between

9

treatment received and each outcome introduced by the confounding variable. By contrast, the BFLand the 3sls provide estimates with low levels of bias across most settings.CI coverage and widthTable 2 presents the results for CI coverage and width, for scenarios with a sample size of n = 100,absolute levels of correlation between the endpoints of 0.4, and 30% non-compliance. All other resultsare shown in the Appendix. The 2sls INB ignores the correlation between costs and QALYs, and thus,depending on the direction of this correlation, 2sls reports CI coverage that is above (positive corre-lation) or below (negative correlation) nominal levels. This divergence from nominal levels increaseswith higher absolute levels of correlation (see Appendix, Table A6).

The uBN approach results in over-coverage across many settings, with wide CIs. For example,for both levels of non-compliance and either sample size, when the costs are Normal, the CI coveragerates for both incremental costs and QALYs exceed 0.98. The interpretation offered by Burgess andThompson (2012) is also relevant here: the uBN assumes that the treatment received and the outcomesvariance structures are uncorrelated, and so when the true correlation is positive, the model overstatesthe variance and leads to wide CIs. By contrast, the uBGN method results in low CI coverage ratesfor the estimation of incremental costs, when costs follow an inverse Gaussian distribution. This isbecause the model incorrectly assumes a Gamma distribution, thereby underestimating the variance.The extent of the under-coverage appears to increase with higher absolute values of the correlationbetween the endpoints, with coverage as low as 0.68 (incremental costs) and 0.72 (INB) in scenarioswhere the absolute value of correlation between costs and QALYs is 0.8. (see Appendix, Table A7).

The BFL approach reports estimates with CI coverage close to the nominal when the sample sizeis large, but with excess coverage (greater than 0.975), and relatively wide CI, when the sample size isn = 100 (see Table 2 for 30% noncompliance, and Table A4 in the Appendix for the result correspond-ing to 70% non-compliance). By contrast, the 3sls reports CI coverage within 2.5% of nominal levelsfor each sample size, level of non-compliance, cost distribution and level of correlation between costsand QALYs.RMSETable 2 reports RMSE corresponding to 30% non-compliance, and n = 100. The least squares ap-proaches result in lower RMSE than the other methods for the summary statistic of interest, the INB.This pattern is repeated across other settings, see the Appendix, Tables A10–A16.

5 Results for the motivating example

We now compare the methods in practice by applying them to the REFLUX dataset. Only 48% ofthe individuals have completely observed cost-effectiveness outcomes: there were 185 individuals withmissing QALYs, 166 with missing costs, and a further 13 with missing EQ5D0 at baseline, with abouta third of those with missing outcomes having switched from their allocated treatment. These missingdata not only bring more uncertainty to our analysis, but more importantly, unless the missing dataare handled appropriately can lead to biased causal estimates (Daniel et al., 2012). A complete caseanalysis would be unbiased, albeit inefficient, if the missingness is conditionally independent of theoutcomes given the covariates in the model (White et al., 2010), even when the covariates have missingdata, as is the case here.4 Alternatively, a more plausible assumption is to assume the missing dataare missing at random (MAR), i.e. the probability of missingness depends only on the observed data,and use multiple imputation (MI) or a full Bayesian analysis to obtain valid inferences.

Therefore, we perform MI prior to carrying out 2sls and 3sls analyses. We begin by investigating4This mechanism is a special case of missing not at random.

10

all the possible associations between the covariates available in the data set and the missingness,univariately for costs, QALYs and baseline EQ5D0. Covariates which are predictive of both, themissing values and the probability of missing, are to be included in the imputation model as auxiliaryvariables, as conditioning on more variables helps make the MAR assumption more plausible. Noneof the fully observed variables at our disposal (age, sex, and baseline BMI) satisfies these criteriaand therefore, we do not include any auxiliary variables in our imputation models. Thus, we imputetotal cost, total QALYs and baseline EQ5D0, 50 times by chained equations, using predictive meanmatching (PMM), taking the 5 nearest neighbours as donors (White et al., 2011), including treatmentreceived in the imputation model and stratifying by treatment allocation. We perform 2sls on costsand QALYs independently and calculate (within MI) SE for the INB assuming independence betweencosts and QALYs. For the 3sls approach, the model is fitted to both outcomes simultaneously, andthe post-estimation facilities are used to extract the variance-covariance estimate, and compute theestimated INB and its corresponding SE. We also use the CACE estimates of incremental cost andQALYs to obtain the ICER. After applying each method to the 50 MI sets, we combine the resultsusing Rubin’s rules (Rubin, 1987).5

For the Bayesian approaches, the missing values become extra parameters to model. Since baselineEQ5D0 has missing observations, a model for its distribution is added EQ5D0 ∼ N(µq0, σ

2q0), with a

vaguely informative prior for µq0 ∼ Unif(−0.5, 1), and an uninformative prior for |σq0| ∼ N(0, 0.01).We add two extra lines of code to the models to obtain posterior distributions for INB and ICERs.We center the outcomes around the empirical mean (except for costs, when modelled as Gamma) andre-scale the costs (dividing by 1000) to improve mixing and convergence. We use two chains, initiallyrunning 15,000 iterations with 5,000 as burn-in. After checking visually for auto-correlation, an extra10,000 iterations are needed to ensure that the density plots of the parameters corresponding to thetwo chains are very similar, denoting convergence to the stationary distribution. Enough iterations foreach chain are kept to make the total effective sample (after accounting for auto-correlation) equal to10,000. 6

Table 1 shows the results for incremental costs, QALYs and INB for the motivating example adjustedfor baseline EQ5D0. Bayesian posterior distribution are summarised by their median value and 95%credible intervals. The CACEs are similar across the methods, except for uBGN, where the incrementalQALYs CACE is nearly halved, resulting in a smaller INB with a CI that includes 0. In line with thesimulation results, this would suggest that, where the uBGN is misspecified according to the assumedcost distribution, it can provide a biased estimate of the incremental QALYs.

Comparing the CACEs to the ITT, we see that the incremental cost estimates increases betweenan ITT and a CACE, as actual receipt of surgery carries with it higher costs that the mere offeringof surgery does not. Similarly, the incremental QALYs are larger, meaning that amongst compli-ers, those receiving surgery have a greater gain in quality of life, over the follow-up period. TheCACE for costs are relatively close to the per-protocol incremental costs reported in the original study,£2324 (1780, 2848). In contrast, the incremental QALYs according to PP on complete-cases originallyreported was 0.3200 (0.0837, 0.5562), considerably smaller than our CACE estimates (Grant et al.,2013). The ITT ICER obtained after MI was £4135, while using causal incremental costs and QALYs,the corresponding estimates of the CACE for ICER were £4140 (3sls), £5189 (uBN), £5960 (uBGN),

5Applying IV 2sls and 3sls with multiply imputed datasets, and combining the results using Rubin’s rules can bedone automatically in Stata using mi estimate, cmdok: ivregress 2sls and mi estimate, cmdok: reg3. In R,ivregress can be used with with.mids command within mice, but systemfit cannot presently be combined with thiscommand so, Rubin’s rules have to be coded manually. Sample code is available in the Appendix.

6Multivariate normal nodes cannot be partially observed in JAGS, thus, we run BFL models on all available datawithin WinBUGs. An observation with zero costs was set to missing when running the Bayesian Gamma model, whichrequires strictly positive costs.

11

and £3948 (BFL). The originally reported per-protocol ICER is £7932 per extra QALY was obtainedon complete cases only (Grant et al., 2013).

These results may be sensitive to the modelling of the missing data. As a sensitivity analysis to theMAR assumption, we present the complete case analysis in Table A1 in the Appendix. The conclusionsfrom complete-case analysis are similar to those obtained under MAR.

We also explore the sensitivity to choices of priors, by re-running the BFL analyses using differentpriors, first for the multivariate precision matrix, keeping the priors for the coefficients normal, andthen a second analysis, with uniform priors for the regression coefficient, and an inverse Wishart priorwith 6 degrees of freedom and a identity scale matrix, for the precision. The results are not materiallychanged (see Appendix, Table A2).

The results of the within-trial CEA suggest that amongst compliers, laparoscopy is more cost-effective than medical management for patients suffering from GORD. The results are robust to thechoice of priors, and to the assumptions about the missing data mechanism. The results for theuBGN differ somewhat from the other models, and as our simulations show, the concern is that suchunadjusted Bayesian models are prone to bias from model misspecification.

6 Discussion

This paper extends existing methods for CEA (Willan et al., 2003; Nixon and Thompson, 2005), by pro-viding IV approaches for obtaining causal cost-effectiveness estimates for RCTs with non-compliance.The methods developed here however are applicable to other settings with multivariate continuousoutcomes more generally, for example RCTs in education, with different measures of attainment beingcombined into an overall score. To help dissemination, we provide code in the Appendix.

We proposed exploiting existing 3sls methods and also considered IV Bayesian models, whichare extensions of previously proposed approaches for univariate continuous outcomes. Burgess andThompson (2012) found the BFL was median unbiased and gave CI coverage close to nominal levels,albeit with wider CIs than least-squares methods. Their ‘unadjusted Bayesian’ method, similar toour uBN approach, assumes that the error term for the model of treatment received on treatmentallocated is uncorrelated with the error from the outcome models. This results in bias in the directionof the true association, and affects the CI coverage. Our simulation study shows that, in a setting withmultivariate outcomes, the bias can be substantial. A potential solution to this could be using priors forthe error terms that reflect this dependency explicitly. For example, Rossi et al. (2012) propose using aprior for the errors that explicitly depends on the coefficient β1,0, the effect of treatment allocation ontreatment received, in equation (8). Kleibergen and Zivot (2003) propose priors that also reflect thisdependency explicitly, and replicate better the properties of the 2sls. This is known as the “Bayesiantwo-stage approach”.

The results of our simulations show that applying 2sls separately to the univariate outcomes leadsto inaccurate 95% CI around the INB, even with moderate levels of correlation between costs andoutcomes (±0.4). Across all the settings considered, the 3sls approach resulted in low levels of bias forthe INB and unlike 2sls, provided CI coverage close to nominal levels. BFL performed well with largesample sizes, but produced standard deviations which were too large when the sample size was small,as can be seen from the over-coverage, with wide CIs.

The REFLUX study illustrated a common concern in CEA, in that the levels of non-compliancein the RCT were different, in this case higher, to those in routine practice. The CACEs presentedprovide the policy-maker with an estimate of what the relative cost-effectiveness would be if all theRCT participants had complied with their assigned treatment, which is complementary to the ITTestimate. Since we judged the IV assumptions for identification likely to hold in this case-study, we

12

conclude that either 3sls or BFL provide valid inferences for the CACE of INB. The re-analysis of theREFLUX case study also provided the opportunity to investigate the sensitivity to the choice of priorsin practice. Here we found that our choice of weakly informative priors, which were relatively flat inthe region where the values of the parameters were anticipated to be, together with samples of at leastsize 100, had minimal influence on the posterior estimates. We repeated the analysis using differentvague priors for the parameters of interests and the corresponding results were not materially changed.

We considered here relatively simple frequentist IV methods, namely 2sls and 3sls. One alternativeapproach to the estimation of CACEs for multivariate responses, is to use linear structural equationmodelling, estimated by maximum-likelihood expectation-maximization (ML-EM) algorithm (Jo andMuthén, 2001). Further, we only considered those settings where a linear additive treatment effect is ofinterest, and the assumptions for identification are met. Where interest lies in systems of simultaneousnon-linear equations with endogeneous regressors, GMM or generalised structural equation models canbe used to estimate CACEs (Davidson and MacKinnon, 2004).

There are several options to study the sensitivity to departures from the identification assumptions.For example, if the exclusion restriction does not hold, a Bayesian parametric model can use priorson the non-zero direct effect of randomisation on the outcome for identification (Conley et al., 2012;Hirano et al., 2000). Since the models are only weakly identified, the results would depend strongly onthe parametric choices for the likelihood and the prior distributions. In the frequentist IV framework,such modelling is also possible, see Baiocchi et al. (2014) for an excellent tutorial on how to conductsensitivity analysis to violations of the ER and monotonicity assumptions. Alternatively, violationsof the ER can also be handled by using baseline covariates to model the probability of compliancedirectly, within structural equation modelling via ML-EM framework (Jo, 2002a,b).

Settings where the instrument is only weakly correlated with the endogenous variable have not beenconsidered here, because for binary non-compliance with binary allocation, the percentage of one-waynon-compliance would need to be in excess of 85%, for the F-statistic of the randomization instrumentto be less than 10, the traditional cutoff beneath which an instrument is regarded as ‘weak’. Suchlevels of non-compliance are not realistic in practice, with the reported median non-compliance equalto 12% (Zhang et al., 2014). Nevertheless, Bayesian IV methods have been shown to perform betterthan 2sls methods when the instrument is weak (Burgess and Thompson, 2012).

Also, for simplicity, we restricted our analysis of the case study to MAR and complete casesassumptions. Sensitivity to departures from these assumptions is beyond the scope of this paper, butresearchers should be aware of the need to think carefully about the possible causes of missingness,and conduct sensitivity analysis under MNAR, assuming plausible differences in the distributions ofthe observed and the missing data. When addressing the missing data through Bayesian methods,the posterior distribution can be sensitive to the choice of prior distribution, especially with a largeamount of missing data (Hirano et al., 2000).

Future research directions could include exploiting the additional flexibility of the Bayesian frame-work to incorporate informative priors, perhaps as part of a comprehensive decision modelling approach.The methods developed here could also be extended to handle time-varying non-compliance.

Acknowledgements

We thank M. Sculpher, R. Faria, D. Epstein, C. Ramsey and the REFLUX study team for access to the data.

K. DiazOrdaz was supported by UK Medical Research Council Career development award in Biostatistics MR/L011964/1. This

report is independent research supported by the National Institute for Health Research (Senior Research Fellowship, R. Grieve,

SRF-2013-06-016). The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the

National Institute for Health Research or the Department of Health.

13

Figure 1: Median Bias for scenarios with 30% non-compliance and sample sizes of (a) n = 100 (top) and (b) n = 1000.Results are stratified by cost distribution, and correlation between cost and QALYs. The dotted line represents zerobias. Results for 2sls (not plotted) are identical to those for 3sls; uBGN was not applied to Normal cost data.

Normal Gamma Inverse Gaussian

0.000

0.025

0.050

0.075

−0.004

0.000

0.004

0.008

−100

−75

−50

−25

0

Incremental cost

Incremental Q

ALY

INB

0.4 −0.4 0.8 −0.8 0.4 −0.4 0.8 −0.8 0.4 −0.4 0.8 −0.8Correlation

Med

ian

bias

method3sls

BFL

uBN

uGN

(a) n=100Normal Gamma Inverse Gaussian

0.00

0.02

0.04

0.06

−0.003

0.000

0.003

0.006

−80

−60

−40

−20

0

Incremental cost

Incremental Q

ALY

INB

0.4 −0.4 0.8 −0.8 0.4 −0.4 0.8 −0.8 0.4 −0.4 0.8 −0.8Correlation

Med

ian

bias

method3sls

BFL

uBN

uGN

(b) n=1000

14

Table 1: The REFLUX study: descriptive statistics and cost-effectiveness according to ITT and alternative methodsfor estimating the CACE. Follow-up period is five years, and treatment switches are defined within the first year postrandomisation. Costs and INB numbers rounded to the nearest integer.

Medical management Laparoscopic surgeryN Assigned 179 178

N (%) Switched 10 (8.3) 67 (28.4)

N (%) missing costs 83 (46) 83 (47)Mean (SD) observed cost in £ 1258 (1687) 2971 (1828)

N (%) missing QALYs 91 (51) 94 (53)Mean (SD) observed QALYs 3.52 (0.99) 3.74 (0.90)

Baseline variables

N (%) missing EQ5D0 6 (3) 7 (4)Mean (SD) observed EQ5D0 0.72 (0.25) 0.71 (0.26)

Correlation between costs and QALYs −0.42 −0.07

Correlation of costs and QALYs −0.36 −0.18

by treatment received

Incremental costs, QALYs and INB of surgery vs medicine

Outcome Method estimate ( 95% CI )Incremental cost

ITT 1103 (593, 1613)

2sls 1899 (1073, 2724)

3sls 1899 (1073, 2724)

uBN 2960 (2026, 3998)

uBGN 2176 (1356, 3031)

BFL 2030 (1170, 2878)

Incremental QALYsITT 0.295 (0.002, 0.589)

2sls 0.516 (0.103, 0.929)

3sls 0.516 (0.103, 0.929)

uBN 0.568 (0.181, 0.971)

uBGN 0.268 (−0.229, 0.759)

BFL 0.511 (0.121, 0.947)

INBITT 7763 (−1059, 16585)

2sls 13587 (1101, 26073)

3sls 13587 (1002, 26173)

uBN 14091 (2485, 26086)

uBGN 5869 (−9204, 20740)

BFL 13340 (1406, 26315)

15

Table 2: CI Coverage rates (CR) and median width for incremental cost, QALYs, and INB, across scenarios with30% non-compliance, sample size n = 100 and moderate correlation between outcomes (with odd rows corresponding topositive correlation ρ and even rows to negative). uBGN was not applied in settings with normal cost data.

2sls 3sls uBN uBGN BFLY1 ∼ N CR CIW CR CIW CR CIW CR CIW CR CIWCost .952 .228 .952 .228 .992 .312 .988 .299ρ < 0 .952 .229 .952 .229 .993 .325 .986 .297QALYs .946 .112 .946 .112 .988 .155 .950 .121ρ < 0 .950 .113 .950 .113 .992 .163 .950 .121INB .988 405 .953 319 .982 398 .966 376ρ < 0 .900 409 .948 475 .951 509 .962 525Y1 ∼ G

Cost .952 .756 .952 .756 .955 .815 .941 .818 .954 .823ρ < 0 .942 .759 .942 .759 .949 .828 .936 .822 .945 .811QALYs .959 .113 .959 .113 .993 .160 .960 .122 .960 .122ρ < 0 .959 .113 .949 .113 .995 .163 .954 .122 .954 .122INB .982 829 .948 696 .958 764 .942 748 .956 760ρ < 0 .914 833 .948 943 .930 921 .941 1019 .951 1014Y1 ∼ IG

Cost .951 .880 .951 .880 .958 .949 .904 .866 .956 .945ρ < 0 .950 .878 .950 .878 .958 .951 .905 .864 .954 .932QALYs .945 .112 .945 .112 .991 .161 .944 .120 .999 .206ρ < 0 .954 .112 .954 .112 .993 .161 .952 .120 .999 .204INB .980 944 .954 818 .959 889 .917 814 .984 1001ρ < 0 .917 942 .947 1049 .934 1034 .911 1041 .971 1203

16

Table 3: RMSE for incremental Cost, QALYs and INB across scenarios with 30% non-compliance, moderatecorrelation between outcomes and sample size n = 100. uBGN was not applied in settings with normal costdata. Numbers for INB have been rounded to the nearest integer.Cost distribution ρ 3sls (2sls) uBN uBGN BFLNormal Cost

0.4 0.058 0.060 0.059-0.4 0.060 0.062 0.061

QALYs0.4 0.029 0.030 0.030-0.4 0.029 0.030 0.030

INB0.4 83 84 87-0.4 125 127 125

Gamma Cost0.4 0.198 0.202 0.212 0.202-0.4 0.200 0.204 0.212 0.203

QALYs0.4 0.030 0.030 0.030 0.029-0.4 0.029 0.030 0.030 0.030

INB0.4 181 184 193 184-0.4 246 251 261 252

Inverse CostGaussian 0.4 0.230 0.232 0.252 0.232

-0.4 0.230 0.232 0.250 0.232QALYs

0.4 0.029 0.030 0.030 0.030-0.4 0.029 0.030 0.030 0.030

INB0.4 211 214 231 214-0.4 273 278 296 278

17

A Sensitivity analyses for the REFLUX RCT

Our motivating example trial had over 50% missing cost-effectiveness outcomes, with a few missingdata in the baseline covariate. As primary analysis, we performed MI and full Bayesian analyses, thatintrinsically obtain a predictive posterior distribution for these missing values, and reported the resultsof these analyses in the main text.

As a sensitivity to the MAR assumption, we present here the complete case analysis (N=172) of theREFLUX RCT, adjusting for baseline EQ5D. The results are valid if conditional on baseline EQ5D,the missing data mechanism is independent of the outcomes.

Table A1: Estimated cost-effectiveness for the REFLUX RCT, over a time horizon of five years according toITT and alternative methods for estimating the CACE based on complete cases. Treatment switches are definedwithin the first year post randomisation. Costs and INB numbers rounded to the nearest integer.

Incremental costs, QALYs and INB of surgery vs medicine

Outcome Method estimate ( 95% CI )Incremental cost

ITT 1626 (1098, 2153)

2sls 2102 (1484, 2720)

3sls 2102 (1484, 2720)

uBN 2112 (1401, 2879)

uBGN 2195 (1481, 3065)

BFL 2081 (1457, 2705)

Incremental QALYsITT 0.319 (−0.055, 0.694)

2sls 0.412 (0.137, 0.688)

3sls 0.412 (0.137, 0.688)

uBN 0.413 (0.136, 0.704)

uBGN 0.285 (−0.066, 0.629)

BFL 0.414 (0.135, 0.703)

INBITT 7948. (−3259, 19155)

2sls 10275 (1997, 18554)

3sls 10275 (1828, 18724)

uBN 10280, (1947, 18980)

uBGN 6362 − 4497, 16941)

BFL 10353 (1789, 19165)

As can be seen in Table A1, the conclusions each model reaches are not substantially changed fromthose under MAR (either using MI or full Bayesian), presented in the main text.

18

A.1 Sensitivity analysis of the BFL to choice of priors

We now study the sensitivity of the BFL model to the choice of priors for the parameters in the model.Recall that this model can be written as Di

Y1i

Y2i

∼ N µ0i

µ1i

µ2i

Σ =

σ20 s01 s02

s01 σ21 s12

s02 s12 σ22

(15)

where sij = cov(Yi, Yj), and

µ0i = β0,0 + β1,0Zi

µ1i = β0,1 + β1,1β1,0Zi

µ2i = β0,2 + β1,2β1,0Zi (16)

Originally, we chose normal priors for the regression coefficients, βm,j ∼ N(0, 102), for j ∈ {0, 1, 2},m ∈{0, 1}, and a Wishart prior for the inverse of Σ (Gelman and Hill, 2006).

Here, we first vary the prior distribution for Σ, and assume a structured covariance matrix (seeCongdon (2007) example 5.8, and Lunn et al. (2012), example 9.1.4). Thus, we write:

Σ =

s00s00 ρ1s00s11 ρ2s00s22

ρ1s00s11 s11s11 ρ3s11s22

ρ2s00s22 ρ3s11s22 s22s22

(17)

and assumed the following priors

sij ∼ N(0, 102), ρj ∼ Unif[−1, 1] (18)

In a secondary sensitivity analysis, we assume a Wishart prior for Σ as before, but used uniform priorsfor the regression coefficients, βm,j ∼ Unif[−10, 10], for j ∈ {0, 1, 2},m ∈ {0, 1}. We did these changeson both available cases and complete cases. The results corresponding to INB are reported in TableA2.

Table A2: Results from sensitivity analysis of the BFL to prior specifications for the INB of themotivating example: CEA within the REFLUX trial. INB reported to the nearest integer, in GreatBritish Pounds (GBP).

Data Priors INB estimate 95% CIAvailable cases Structured precision matrix1 10421 (1878, 19187)

normal priors for βm,jAvailable cases Wishart prior for the precision 10405 (1733, 19330)

Unif prior for βm,jComplete cases Structured precision matrix 10425 (1792, 19156)

normal priors for βm,jComplete cases Wishart prior for the precision 10428 (1894, 19279)

Unif prior for βm,j1 Following the specifications of equation (18) in this Appendix.

B Supplementary Results from the simulations

19

Figure 2: Median Bias for scenarios with 70% non-compliance and sample sizes of (a) n = 100 (top) and (b) n = 1000.Results are stratified by cost distribution, and correlation between cost and QALYs. The dotted line represents zerobias. Results for 2sls (not plotted) are identical to those for 3sls; uBGN was not applied to Normal cost data.

Normal Gamma Inverse Gaussian

0.00

0.05

0.10

0.15

0.20

0.25

−0.01

0.00

0.01

0.02

0.03

0.04

−100

0

Incremental cost

Incremental Q

ALY

INB

0.4 −0.4 0.8 −0.8 0.4 −0.4 0.8 −0.8 0.4 −0.4 0.8 −0.8Correlation

Med

ian

bias

method3sls

BFL

uBN

uGN

(a) n=100Normal Gamma Inverse Gaussian

0.00

0.02

0.04

0.06

−0.0050

−0.0025

0.0000

0.0025

0.0050

−80

−60

−40

−20

0

Incremental cost

Incremental Q

ALY

INB

0.4 −0.4 0.8 −0.8 0.4 −0.4 0.8 −0.8 0.4 −0.4 0.8 −0.8Correlation

Med

ian

bias

method3sls

BFL

uBN

uGN

(b) n=1000

20

Tab

leA3:

CICoverag

eratesan

dmedianwidth

forincrem

entalC

ost,

QALY

s,an

dIN

B,a

crossscen

arioswith30

%no

n-complianc

e,mod

eratecorrelation

betw

eenou

tcom

esan

dsamplesizen

=100

0.uB

GN

was

notap

pliedin

settings

withno

rmal

cost

data.

2sls

3sls

uB

NuB

GN

BFL

Cos

tdis

trib

uti

onρ

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thN

orm

alC

ost

0.4

0.94

90.07

20.94

90.07

20.98

90.09

40.95

60.07

5-0.4

0.95

70.07

20.95

70.07

20.99

20.09

70.96

30.07

5Q

ALY

s0.4

0.94

20.03

60.94

20.03

60.98

20.04

60.93

70.03

5-0.4

0.94

70.03

60.94

70.03

60.99

10.04

80.94

40.03

5IN

B0.4

0.98

6129

0.95

110

20.97

712

00.95

010

1-0.4

0.92

012

90.95

615

00.96

215

50.94

9149

Gam

ma

Cos

t0.4

0.94

90.24

00.94

90.24

00.95

50.24

60.94

30.23

50.95

10.24

2-0.4

0.95

30.24

00.95

30.24

00.95

80.24

80.94

50.23

50.95

40.24

3Q

ALY

s0.4

0.94

70.03

60.94

70.03

60.99

00.04

80.94

60.03

60.94

30.03

5-0.4

0.95

00.03

60.95

00.03

60.99

30.04

80.94

90.03

60.95

00.03

6IN

B0.4

0.98

5262

0.95

022

10.95

722

90.94

421

80.94

922

1-0.4

0.92

026

20.95

229

70.93

227

60.94

629

20.95

430

1In

vers

eC

ost

Gau

ssia

n0.4

0.94

90.28

30.94

90.28

30.95

80.29

00.90

20.24

80.94

70.28

1-0.4

0.95

20.28

30.95

20.28

30.95

70.29

10.90

10.24

90.94

60.27

7Q

ALY

s0.4

0.95

40.03

60.95

40.03

60.99

00.04

80.94

20.03

50.96

70.03

8-0.4

0.94

90.03

60.94

90.03

60.98

90.04

80.94

20.03

50.96

10.03

8IN

B0.4

0.97

6303

0.95

626

30.95

826

80.91

423

50.95

826

8-0.4

0.92

230

20.95

033

60.93

231

50.90

630

00.94

733

3

21

Tab

leA4:

CICoverag

eratesan

dmedianwidth

forincrem

entalC

ost,

QALY

s,an

dIN

B,a

crossscen

arioswith70

%no

n-complianc

e,mod

eratecorrelation

betw

eenou

tcom

esan

dsamplesizen

=100

.uB

GN

was

notap

pliedin

settings

withno

rmal

cost

data.

2sls

3sls

uB

NuB

GN

BFL

Cos

tdis

trib

uti

onρ

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thN

orm

alC

ost

0.4

0.96

80.55

00.96

80.55

00.98

81.33

0.99

11.13

-0.4

0.96

60.54

50.96

60.54

50.99

01.30

0.98

51.08

QA

LYs

0.4

0.96

80.27

10.96

80.27

10.98

80.66

50.96

40.44

5-0.4

0.96

40.26

90.96

40.26

90.98

60.66

50.95

40.43

5IN

B0.4

0.99

2981

0.96

677

10.98

51453

0.98

114

11-0.4

0.91

797

30.96

611

310.95

717

880.95

819

12G

amm

aC

ost

0.4

0.96

61.68

0.96

61.68

0.96

42.64

00.94

42.21

0.96

02.73

-0.4

0.96

91.69

0.96

91.69

0.96

52.69

00.95

02.21

0.96

42.71

QA

LYs

0.4

0.97

20.26

70.97

20.26

70.99

40.62

40.95

90.34

80.96

50.42

-0.4

0.96

50.26

90.96

50.26

90.98

70.62

50.95

40.35

10.95

40.42

INB

0.4

0.98

818

700.966

1557

0.96

424

560.94

720

410.97

025

03-0.4

0.93

918

760.97

121

380.94

729

840.95

827

740.96

233

44In

vers

eC

ost

Gau

ssia

n0.4

0.96

51.92

0.96

51.92

0.96

73.04

0.91

22.29

0.97

23.31

-0.4

0.95

81.90

0.95

81.90

0.96

63.05

0.91

02.30

0.96

83.37

QA

LYs

0.4

0.96

70.26

80.96

70.26

80.98

80.63

70.94

80.35

51

0.76

6-0.4

0.96

60.27

10.96

60.27

10.99

00.63

10.94

50.35

90.99

90.77

4IN

B0.4

0.99

020

820.961

1777

0.96

328

270.91

221

570.99

135

54-0.4

0.93

620

710.97

023

350.91

233

890.92

428

550.97

948

19

22

Tab

leA5:

CICoverag

eratesan

dmedianwidth

forincrem

entalC

ost,

QALY

s,an

dIN

B,a

crossscen

arioswith70

%no

n-complianc

e,mod

eratecorrelation

betw

eenou

tcom

esan

dsamplesizen

=100

0.uB

GN

was

notap

pliedin

settings

withno

rmal

cost

data.

2sls

3sls

uB

NuB

GN

BFL

Cos

tdis

trib

uti

onρ

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thN

orm

alC

ost

0.4

0.95

40.16

90.95

40.16

90.98

60.22

30.95

80.17

8-0.4

0.95

30.16

90.95

30.16

90.99

10.23

00.96

00.17

9Q

ALY

s0.4

0.94

60.08

30.94

60.08

30.98

60.11

10.94

20.08

4-0.4

0.95

30.08

30.95

30.08

30.99

20.11

50.94

70.08

3IN

B0.4

0.98

6301

0.94

823

80.97

628

70.94

424

0-0.4

0.90

930

10.94

434

90.95

336

80.94

2356

Gam

ma

Cos

t0.4

0.95

20.52

60.95

20.52

60.95

80.55

60.93

90.52

30.95

20.54

1-0.4

0.95

20.52

50.95

20.52

50.95

80.55

60.94

30.52

40.95

20.54

1Q

ALY

s0.4

0.95

10.08

30.95

10.08

320.99

50.11

50.94

70.08

50.94

90.08

4-0.4

0.95

50.08

30.95

50.08

320.99

50.11

50.95

40.08

40.95

50.08

5IN

B0.4

0.98

0582

0.95

448

60.93

262

60.94

648

70.95

149

6-0.4

0.91

658

10.95

366

10.93

262

60.94

466

20.95

367

8In

vers

eC

ost

Gau

ssia

n0.4

0.95

20.59

50.95

20.59

50.95

90.62

10.90

90.53

80.95

00.60

3-0.4

0.94

60.95

50.59

90.95

50.59

90.62

90.91

00.54

20.94

80.59

9Q

ALY

s0.4

0.94

30.08

30.94

30.08

30.99

10.11

40.93

80.08

40.95

90.09

18-0.4

0.08

30.94

60.08

30.96

0.98

80.11

50.94

20.08

40.95

70.09

05IN

B0.4

0.97

5645

0.95

255

20.95

457

70.91

851

00.95

357

1-0.4

0.92

064

90.95

272

80.93

692

0.91

466

80.95

073

6

23

Tab

leA6:

CIC

overag

eratesan

dmedianwidth

forincrem

entalC

ost,QALY

s,an

dIN

B,a

crossscen

arioswith30

%no

n-complianc

e,high

correlationbe

tween

outcom

esan

dsamplesizen

=100.

uBGN

was

notap

pliedin

settings

withno

rmal

cost

data.

2sls

3sls

uB

NuB

GN

BFL

Cos

tdis

trib

uti

onρ

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thN

orm

alC

ost

0.8

0.95

10.22

90.95

10.22

90.98

40.29

90.98

70.29

9-0.8

0.94

60.22

80.94

60.22

80.99

20.32

40.98

60.29

5Q

ALY

s0.8

0.95

40.11

30.95

40.11

30.98

50.14

80.95

80.12

2-0.8

0.94

90.11

20.94

90.11

20.99

20.16

20.95

20.12

1IN

B0.8

0.99

9408

0.95

320

70.99

837

30.99

027

9-0.8

0.86

740

70.94

753

30.93

514

0.95

758

5G

amm

aC

ost

0.8

0.95

50.75

70.95

50.75

70.94

70.76

40.92

20.68

10.96

10.82

2-0.8

0.94

60.75

70.94

60.75

70.95

40.82

50.94

70.82

00.95

10.81

7Q

ALY

s0.8

0.95

20.11

30.95

20.11

30.98

60.15

20.92

80.10

90.95

70.12

3-0.8

0.94

70.11

30.94

70.11

30.99

0.16

20.95

00.12

10.95

40.12

3IN

B0.8

0.99

9828

0.95

553

90.99

707

0.94

252

80.96

659

6-0.8

0.88

483

10.94

510

410.90

192

00.88

290

20.95

1118

Inve

rse

Cos

tG

auss

ian

0.8

0.95

40.88

30.95

40.88

30.94

80.89

40.83

60.74

50.96

10.95

6-0.8

0.95

30.88

40.95

30.88

40.96

10.95

60.82

60.75

00.95

50.93

8Q

ALY

s0.8

0.95

40.11

30.95

40.11

30.98

90.15

40.90

00.10

81

0.20

4-0.8

0.95

10.11

30.95

10.11

30.99

30.16

30.91

60.10

81

0.20

4IN

B0.8

0.99

4947

0.95

267

40.97

882

50.83

759

10.99

087

7-0.8

0.89

394

80.95

311

550.91

010

420.84

299

30.96

913

07

24

Tab

leA7:

CIC

overag

eratesan

dmedianwidth

forincrem

entalC

ost,QALY

s,an

dIN

B,a

crossscen

arioswith30

%no

n-complianc

e,high

correlationbe

tween

outcom

esan

dsamplesizen

=1000

.uBGN

was

notap

pliedin

settings

withno

rmal

cost

data.

2sls

3sls

uB

NuB

GN

BFL

Cos

tdis

trib

uti

onρ

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thN

orm

alC

ost

0.8

.949

0.07

20.94

90.07

20.98

50.08

90.95

10.07

4-0.8

0.95

40.07

20.95

40.07

20.99

60.09

70.95

50.07

4Q

ALY

s0.8

0.94

30.03

60.94

30.03

60

0.98

80.04

40.93

80.03

5-0.8

0.95

20.03

60.95

20.03

60.99

20.04

80.95

10.03

6IN

B0.8

1129

0.95

65.4

111

20.94

666

.3-0.8

0.86

412

90.95

216

8.4

0.92

815

40.94

617

0.2

Gam

ma

Cos

t0.8

0.94

60.24

00.94

60.240

0.93

80.23

00.89

70.19

90.94

40.23

9-0.8

0.94

90.24

00.94

90.24

00.95

50.24

90.89

60.20

20.95

00.24

1Q

ALY

s0.8

0.94

80.03

60.94

80.036

0.98

30.04

50.92

10.03

20.94

60.03

5-0.8

0.95

20.03

60.95

20.03

60.99

00.04

80.92

50.03

20.95

00.03

6IN

B0.8

0.99

6263

0.95

017

20.98

221

30.92

815

30.94

817

2-0.8

0.88

326

30.94

632

90.89

927

70.89

827

50.94

633

0In

vers

eC

ost

Gau

ssia

n0.8

0.94

60.28

30.94

60.283

0.93

60.27

10.68

40.21

70.94

20.27

9-0.8

0.95

20.28

30.95

20.28

30.95

50.29

00.70

10.22

10.95

00.28

0Q

ALY

s0.8

0.95

00.03

560.95

0.03

560.98

60.04

570.83

30.03

20.96

40.03

8-0.8

0.94

40.03

560.94

40.03

560.99

00.04

80.84

90.03

20.96

50.03

8IN

B0.8

0.99

4303

0.95

221

80.97

225

00.72

217

30.95

422

1-0.8

0.89

302

0.94

736

70.90

431

50.72

229

20.94

636

4

25

Tab

leA8:

CIC

overag

eratesan

dmedianwidth

forincrem

entalC

ost,QALY

s,an

dIN

B,a

crossscen

arioswith70

%no

n-complianc

e,high

correlationbe

tween

outcom

esan

dsamplesizen

=100.

uBGN

was

notap

pliedin

settings

withno

rmal

cost

data.

2sls

3sls

uB

NuB

GN

BFL

Cos

tdis

trib

uti

onρ

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thN

orm

alC

ost

0.8

0.97

20.54

60.97

20.54

60.98

21.29

0.98

81.09

-0.8

0.97

00.55

00.97

00.55

00.99

1.36

0.98

81.07

QA

LYs

0.8

0.97

0.26

90.97

0.26

90.98

20.64

0.96

60.44

3-0.8

0.96

60.27

20.96

60.27

20.98

90.66

20.96

00.44

6IN

B0.8

1975

0.97

349

51

1322

0.99

310

32-0.8

0.87

5981

0.96

612

840.93

618

580.95

720

96G

amm

aC

ost

0.8

0.96

71.68

0.96

71.68

0.95

2.51

0.92

11.82

0.96

72.77

-0.8

0.96

51.70

0.96

51.70

0.96

52.75

0.90

41.88

0.96

42.80

QA

LYs

0.8

0.96

90.26

80.96

90.26

80.98

40.60

90.93

50.31

50.96

70.43

0-0.8

0.96

60.27

20.96

60.27

20.98

90.62

60.93

00.32

40.95

80.44

0IN

B0.8

118

710.963

1175

0.99

222

330.94

713

970.97

519

64-0.8

0.89

518

850.96

523

830.90

830

960.90

326

310.954

3867

Inve

rse

Cos

tG

auss

ian

0.8

0.96

1.92

0.96

1.92

0.94

72.92

0.83

92.03

0.96

63.42

-0.8

0.96

41.91

0.96

41.91

0.96

63.05

0.85

52.03

0.96

83.37

QA

LYs

0.8

0.96

30.27

10.96

30.27

10.98

30.64

50.90

60.32

20.99

90.78

9-0.8

0.97

30.27

0.97

30.27

00.99

0.63

10.91

80.32

50.99

90.77

4IN

B0.8

0.99

820

910.958

1417

0.98

126

540.86

315

860.99

031

45-0.8

0.90

220

750.96

625

680.91

233

890.86

627

550.979

4819

26

Tab

leA9:

CIC

overag

eratesan

dmedianwidth

forincrem

entalC

ost,QALY

s,an

dIN

B,a

crossscen

arioswith70

%no

n-complianc

e,high

correlationbe

tween

outcom

esan

dsamplesizen

=1000

.uB

GN

was

notap

pliedin

settings

withno

rmal

cost

data.

2sls

3sls

uB

NuB

GN

BFL

Cos

tdis

trib

uti

onρ

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thco

vera

gew

idth

cove

rage

wid

thN

orm

alC

ost

0.8

0.95

90.16

90.95

90.16

90.98

30.21

10.95

90.17

5-0.8

0.95

20.16

90.95

20.16

90.99

10.23

00.95

40.17

7Q

ALY

s0.8

0.95

70.08

30.95

70.08

30.98

30.10

60.95

10.08

4-0.8

0.95

40.08

30.95

40.08

30.99

10.11

50.95

10.08

5IN

B0.8

1302

0.95

215

30.99

926

90.95

215

8-0.8

0.87

3302

0.95

439

50.93

136

90.95

3406

Gam

ma

Cos

t0.8

0.95

20.52

70.95

20.52

70.93

80.51

40.89

00.44

20.94

80.53

5-0.8

0.95

00.52

50.95

0.52

50.95

40.55

60.90

40.44

90.94

90.54

2Q

ALY

s0.8

0.95

80.08

330.95

80.0833

0.98

60.10

80.92

30.07

60.95

20.08

41-0.8

0.95

10.08

320.95

10.0832

0.99

20.11

50.93

00.07

70.94

70.08

43IN

B0.8

0.99

8583

0.95

336

80.98

647

90.92

333

60.95

037

4-0.8

0.88

8582

0.95

373

40.90

262

70.90

462

70.95

175

3In

vers

eC

ost

Gau

ssia

n0.8

0.95

20.59

90.95

20.59

90.93

80.58

30.78

50.47

40.94

60.60

2-0.8

0.95

90.60

00.95

90.60

00.96

20.63

00.78

60.48

20.95

50.60

4Q

ALY

s0.8

0.94

80.08

30.94

80.08

30.99

20.10

90.89

10.07

60.96

0.09

1-0.8

0.95

60.08

30.95

60.08

30.99

20.11

50.89

60.07

70.96

80.09

1IN

B0.8

0.99

6650

0.94

644

70.97

854

20.80

037

00.95

546

8-0.8

0.89

9650

0.96

801

0.91

569

40.80

665

40.958

807

27

Table A10: RMSE for incremental Cost, QALYs and INB across scenarios with 30% non-compliance,high correlation between outcomes and sample size n = 100. uBGN was not applied in settings withnormal cost data. Numbers for INB have been rounded to the nearest integer.

Cost distribution ρ 3sls (2sls) uBN uBGN BFLNormal Cost

0.8 0.059 0.060 0.060-0.8 0.059 0.061 0.060

QALYs0.8 0.029 0.030 0.030-0.8 0.029 0.030 0.030

INB0.8 54 55 56-0.8 137 140 139

Gamma Cost0.8 0.192 0.196 0.194 0.196-0.8 0.196 0.200 0.205 0.200

QALYs0.8 0.029 0.030 0.030 0.030-0.8 0.029 0.030 0.030 0.030

INB0.8 138 141 137 141-0.8 271 276 281 276

Inverse CostGaussian 0.8 0.226 0.229 0.272 0.229

-0.8 0.225 0.228 0.272 0.228QALYs

0.8 0.029 0.030 0.032 0.030-0.8 0.028 0.029 0.031 0.029

INB0.8 175 178 210 178-0.8 291 296 345 296

28

Table A11: RMSE for incremental Cost, QALYs and INB across scenarios with 30% non-compliance,moderate correlation between outcomes and sample size n = 1000. uBGN was not applied in settingswith normal cost data. Numbers for INB have been rounded to the nearest integer.

Cost distribution ρ 3sls (2sls) uBN uBGN BFLNormal Cost

0.4 0.018 0.018 0.018-0.4 0.018 0.018 0.018

QALYs0.4 0.009 0.009 0.009-0.4 0.009 0.009 0.009

INB0.4 26 26 26-0.4 38 38 39

Gamma Cost0.4 0.061 0.061 0.061 0.061-0.4 0.061 0.061 0.061 0.061

QALYs0.4 0.009 0.009 0.009 0.009-0.4 0.009 0.009 0.009 0.009

INB0.4 55 55 56 55-0.4 75 75 76 75

Inverse CostGaussian 0.4 0.072 0.072 0.075 0.072

-0.4 0.072 0.072 0.076 0.073QALYs

0.4 0.009 0.009 0.035 0.009-0.4 0.009 0.009 0.035 0.009

INB0.4 66 66 69 66-0.4 85 86 89 86

29

Table A12: RMSE for incremental Cost, QALYs and INB across scenarios with 30% non-compliance,high correlation between outcomes and sample size n = 1000. uBGN was not applied in settings withnormal cost data. Numbers for INB have been rounded to the nearest integer.

Cost distribution ρ 3sls (2sls) uBN uBGN BFLNormal Cost

0.8 0.019 0.019 0.019-0.8 0.018 0.018 0.018

QALYs0.8 0.009 0.009 0.009-0.8 0.009 0.009 0.009

INB0.8 17 17 17-0.8 43 43 44

Gamma Cost0.8 0.062 0.062 0.061 0.062-0.8 0.062 0.062 0.062 0.062

QALYs0.8 0.009 0.009 0.009 0.009-0.8 0.009 0.009 0.009 0.009

INB0.8 44 44 43 44-0.8 85 85 84 85

Inverse CostGaussian 0.8 0.073 0.073 0.105 0.073

-0.8 0.072 0.072 0.103 0.072QALYs

0.8 0.009 0.009 0.011 0.009-0.8 0.009 0.009 0.011 0.009

INB0.8 56 56 80 56-0.8 94 94 132 94

30

Table A13: RMSE for incremental Cost, QALYs and INB across scenarios with 70% non-compliance,moderate correlation between outcomes and sample size n = 100. uBGN was not applied in settingswith normal cost data. Numbers for INB have been rounded to the nearest integer.

Cost distribution ρ 3sls (2sls) uBN uBGN BFLNormal Cost

0.4 0.145 0.207 0.181-0.4 0.146 0.203 0.182

QALYs0.4 0.074 0.104 0.089-0.4 0.074 0.109 0.088

INB0.4 212 272 264-0.4 309 396 383

Gamma Cost0.4 0.467 0.565 0.682 0.559-0.4 0.456 0.553 0.636 0.549

QALYs0.4 0.072 0.097 0.082 0.085-0.4 0.073 0.099 0.083 0.086

INB0.4 426 520 429 500-0.4 570 697 758 687

Inverse CostGaussian 0.4 0.521 0.622 0.946 0.622

-0.4 0.516 0.635 1.096 0.635QALYs

0.4 0.073 0.094 0.092 0.094-0.4 0.074 0.092 0.110 0.092

INB0.4 484 583 632 583-0.4 630 865 967 865

31

Table A14: RMSE for incremental Cost, QALYs and INB across scenarios with 70% non-compliance,high correlation between outcomes and sample size n = 100. uBGN was not applied in settings withnormal cost data. Numbers for INB have been rounded to the nearest integer.

Cost distribution ρ 3sls (2sls) uBN uBGN BFLNormal Cost

0.8 0.147 0.201 0.180-0.8 0.147 0.209 0.182

QALYs0.8 0.072 0.100 0.086-0.8 0.075 0.109 0.090

INB0.8 131 169 169-0.8 351 447 425

Gamma Cost0.8 0.450 0.543 0.604 0.531-0.8 0.454 0.546 0.563 0.538

QALYs0.8 0.072 0.097 0.081 0.086-0.8 0.073 0.096 0.084 0.086

INB0.8 313 377 429 368-0.8 638 768 758 773

Inverse CostGaussian 0.8 0.532 0.637 0.902 0.637

-0.8 0.526 0.635 1.010 0.635QALYs

0.8 0.074 0.092 0.100 0.092-0.8 0.074 0.092 0.096 0.092

INB0.8 395 468 615 468-0.8 704 865 1022 865

32

Table A15: RMSE for incremental Cost, QALYs and INB across scenarios with 70% non-compliance,moderate correlation between outcomes and sample size n = 1000. uBGN was not applied in settingswith normal cost data. Numbers for INB have been rounded to the nearest integer.

Cost distribution ρ 3sls (2sls) uBN uBGN BFLNormal Cost

0.4 0.043 0.043 0.043-0.4 0.043 0.043 0.043

QALYs0.4 0.022 0.022 0.022-0.4 0.022 0.022 0.022

INB0.4 61 62 62-0.4 90 91 91

Gamma Cost0.4 0.135 0.135 0.138 0.136-0.4 0.134 0.135 0.138 0.135

QALYs0.4 0.021 0.021 0.021 0.021-0.4 0.021 0.021 0.021 0.021

INB0.4 124 169 126 125-0.4 168 169 172 170

Inverse CostGaussian 0.4 0.151 0.153 0.158 0.153

-0.4 0.152 0.153 0.158 0.153QALYs

0.4 0.021 0.022 0.022 0.022-0.4 0.021 0.022 0.022 0.022

INB0.4 141 142 147 142-0.4 185 187 193 187

33

Table A16: RMSE for incremental Cost, QALYs and INB across scenarios with 70% non-compliance,high correlation between outcomes and sample size n = 1000. uBGN was not applied in settings withnormal cost data. Numbers for INB have been rounded to the nearest integer.

Cost distribution ρ 3sls (2sls) uBN uBGN BFLNormal Cost

0.8 0.043 0.043-0.8 0.043 0.043 0.043

QALYs0.8 0.021 0.021 0.021-0.8 0.021 0.022 0.021

INB0.8 39 39 39-0.8 100 101 101

Gamma Cost0.8 0.135 0.137 0.135 0.137-0.8 0.133 0.135 0.133 0.135

QALYs0.8 0.021 0.022 0.022 0.021-0.8 0.021 0.021 0.021 0.021

INB0.8 95 96 94 96-0.8 187 188 186 188

Inverse CostGaussian 0.8 0.155 0.157 0.190 0.157

-0.8 0.151 0.153 0.192 0.153QALYs

0.8 0.021 0.021 0.023 0.021-0.8 0.021 0.021 0.023 0.021

INB0.8 118 119 144 119-0.8 201 203 250 203

34

B.1 Software code for performing MI and 3sls in Stata

The code belows performs MI of total costs, total QALYs and the baseline EQ5D summary scoreeq5d_b, by chained equations using predictive mean matching with the 5 nearest neighbours (specifiedby the option knn(5). We include treatment received txreceived as a variable in the imputationmodels, and perform the imputations separately by randomised treatment arm, rnd (done by specifyingthe option by (rnd) .

After obtaining 50 imputations, 3sls estimates are obtained, and pooled following Rubin’s rules.

*MIsort rndmi set mlongmi register imputed total_qalys total_costs eq5d_bmi impute chained (pmm, knn(5) ) total_qalys eq5d_b total_costs= txreceived,///by (rnd) add(50) rseed (110)

*3slsmi estimate, cmdok: reg3 (total_costs total_qalys== txreceived), exog(rnd) iregmi estimate(inb: _b[total_qalys:txreceived] *30000 - _b[total_costs:txreceived]), ///cmdok: reg3 (total_costs total_qalys==txreceived), exog(rnd)

mi estimate (icer: _b[total_costs:txreceived]/_b[total_qalys:txreceived] ), ///cmdok: reg3 (total_costs total_qalys== txreceived), exog(rnd)

B.2 Software code for performing MI and 3sls in R

The following code performs MI by chain equations using PMM. The default option for the numberof donors is 5, so we do not need to specify this. The code also carries out 3sls using systemfit inR in each imputed data set. Rubin’s rules for pooling the estimates across imputed sets are codedmanually.

rm(list=ls())library(MASS,"AER",systemfit,mice,foreign)data <- read.dta("../data.dta")

#the data must be ordered as follows total_qalys total_costs txreceived eq5d_b

#MI using MICE in RM<-50 #50 imputationsdata<-data[order(data$rnd),] # order data by arm

#we impute separate in each randomised armdata0<-subset(data,rnd==0)data1<-subset(data,rnd==1)#the variable rnd must be dropped from data1 and data0

#rnd=1imp1 <-mice(data1, m = M, method =c("pmm","pmm", "","pmm"),print=FALSE)#rnd=0imp0 <-mice(data0, m = M, method =c("pmm","pmm", "","pmm"),print=FALSE)

##merging datamj<-imput_no<- rep(1:M, each=length(data$rnd))data_mi <-data[rep(1:nrow(data), M), ]data_mi$mj<-imput_no

35

data_mi$baseq_imput<-data_mi$qaly_imput<-<-array(NA, dim=M*length(data$rnd))data_mi$cost_imput<-array(NA, dim=M*length(data$rnd))

for (j in 1:M){data_mi$baseq_imput[data_mi$mj==j]<-c(complete(imp0,j)$baseq , complete(imp1,j)$baseq)data_mi$qaly_imput[data_mi$mj==j]<-c(complete(imp0,j)$qaly , complete(imp1,j)$qaly )data_mi$cost_imput[data_mi$mj==j]<-c(complete(imp0,j)$cost , complete(imp1,j)$cost)

}########################################################## 3sls######################################################coef.c<-var.coef.c<-matrix(NA, nrow=M, ncol=1)coef.q<-var.coef.q<-matrix(NA, nrow=M, ncol=1)coef.inb<-var.coef.inb<-matrix(NA, nrow=M, ncol=1)coef.icer.3sls<-matrix(NA, nrow=M, ncol=1)

for (j in 1:M){eq.cost <- data_mi$cost_imput[data_mi$mj==j] ~

data_mi$exp[data_mi$mj==j]+ data_mi$baseq_imput[data_mi$mj==j]eq.qaly <- data_mi$qaly_imput[data_mi$mj==j] ~

data_mi$exp[data_mi$mj==j]+ data_mi$baseq_imput[data_mi$mj==j]eq.Exp.Rand <- data_mi$exp[data_mi$mj==j] ~

data_mi$rnd[data_mi$mj==j]+ data_mi$baseq_imput[data_mi$mj==j]

System3eq <- list( cost=eq.cost, qaly=eq.qaly, trt.received = eq.Exp.Rand )reg3 <- systemfit(System3eq, method = "3SLS",

inst = ~ data_mi$rnd[data_mi$mj==j] + data_mi$baseq_imput[data_mi$mj==j])

coef.q[j]<-reg3$coefficients["qaly_data_mi$exp[data_mi$mj == j]"]coef.c[j]<-reg3$coefficients["cost_data_mi$exp[data_mi$mj == j]"]var.coef.q[j]<-diag(reg3$coefCov)["qaly_data_mi$exp[data_mi$mj == j]"]var.coef.c[j]<-diag(reg3$coefCov)["cost_data_mi$exp[data_mi$mj == j]"]cov<-reg3$coefCov["qaly_data_mi$exp[data_mi$mj == j]","cost_data_mi$exp[data_mi$mj == j]"]coef.inb[j]<- coef.q[j]*30000-coef.c[j]var.coef.inb[j]<-30000^2*(var.coef.q[j])+ var.coef.c[j] -(2*30000*cov)coef.icer.3sls[j]<-(coef.c[j])/(coef.q[j])}

###### Apply Rubin’s rules ####mean.c <-mean(coef.c)var.c <-mean(var.coef.c)with.var.c<- var.c # within-imputation variancebet.var.c<-(1/(M-1))*sum((coef.c-mean.c)^2)se.c<- sqrt(with.var.c+(1+1/M)*bet.var.c)

##Qmean.q <-mean(coef.q)var.q <-mean(var.coef.q)with.var.q<- var.q # within-imputation variancebet.var.q<-(1/(M-1))*sum((coef.q-mean.q)^2)se.q<- sqrt(with.var.q+(1+1/M)*bet.var.q)

##INBmean.inb <-mean(coef.inb)var.inb <-mean(var.coef.inb)with.var.inb<- var.inb # within-imputation variance

36

bet.var.inb<-(1/(M-1))*sum((coef.inb-mean.inb)^2)se.inb<- sqrt(with.var.inb+(1+1/M)*bet.var.inb)

##ONLY Mean ICERmean.icer <-mean(coef.icer.3sls)

res.3sls<-c(mean.c, se.c, mean.q, se.q, mean.inb, se.inb, mean.icer)

B.3 Model code for BFL in JAGS

The following model can also be run in WinBUGs. This may be necessary if there is missing outcomedata in the analysis at hand, since multivariate normal nodes cannot be partially observed in thecurrent version of JAGS.

The vector y=(txreceived, total_qalys,total_costs). The other variables are as before.

model{for (i in 1:n) {

eq5d_b[i] ~ dnorm(qb, pr.qb)

y[i,1:3] ~ dmnorm(mu[i,], pr[,])mu[i,1] <- b[1,1] + b[1,2] * rnd[i]mu[i,2] <- b[2,1] + b[2,2]*b[1,2] * rnd[i] + c*eq5d_b[i]mu[i,3] <- b[3,1] + b[3,2]* b[1,2] *rnd[i] + g*eq5d_b[i]

}# Priors:pr[1:3,1:3]~dwish(R,3)

R<-diag(1,3)

for (k in 1:3) {b[k,1:2] ~ dmnorm(vm[],vp[,])

}for (k in 1:2) {

vm[k] <- 0vp[k,k] <- 0.1

}vp[1,2] <- 0vp[2,1] <- 0

pr.qb<- 1 / (qb.sd * qb.sd)qb.sd <- exp(log.qb.sd)log.qb.sd ~ dnorm(0, 0.01)qb ~ dunif(-0.5,1)c ~ dnorm(0, 0.01)g ~ dnorm(0, 0.01)

#functionals of the parametersinb<- 30000*b[2,2]-b[3,2]icer<- b[3,2]/(b[2,2]+0.000001) #to avoid dividing by a zero

}

37

References

Angrist, J. D., Imbens, G. W. and Rubin, D. B. (1996), ‘Identification of causal effects using instrumentalvariables’, Journal of the American Statistical Association 91(434), pp. 444–455.

Angrist, J. D., Pischke, J. S. (2008), Mostly Harmless Econometrics: An Empiricist’s Companion. PrincetonUniversity Press.

Baiocchi, M., Cheng, J. and Small, D. S. (2014), ‘Instrumental variable methods for causal inference’, Statisticsin Medicine 33(13), 2297–2340.

Brilleman, S., Metcalfe, C., Peters, T. and Hollingsworth, W. (2015), ‘The reporting of treatment non-adherenceand its associated impact on economic evaluations conducted alongside randomised trials: a systematicreview.’, Value in Health 19 (1), pp., 99 – 108.

Burgess, S. and Thompson, S. G. (2012), ‘Improving bias and coverage in instrumental variable analysis withweak instruments for continuous and binary outcomes’, Statistics in Medicine 31(15), 1582–1600.

Clarke, P. S. and Windmeijer, F. (2012), ‘Instrumental Variable Estimators for Binary Outcomes.’ Journal ofthe American Statistical Association 107(500):1638–1652.

Congdon, P. (2007), Bayesian Statistical Modelling, 2nd ed., Wiley Series in Probability and Statistics.

Conley, T. G., Hansen, C. B. and Rossi, P. E. (2012), ‘Plausibly exogenous’, Review of Economics and Statistics94(1), 260–272.

Daniel, R. M., Kenward, M. G., Cousens, S. N. and De Stavola, B. L. (2012), ‘Using causal diagrams to guideanalysis in missing data problems’, Statistical Methods in Medical Research: 21(3), 243–256.

Davidson, R. and MacKinnon, J. G. (2004). Economic theory and methods, New York: Oxford UniversityPress.

Didelez, V., Meng, S. and Sheehan, N. (2010) ‘ Assumptions of IV Methods for Observational Epidemiology’.Statistical Science,25(1), 22–40.

Dodd, S., White, I. and Williamson, P. (2012), ‘Nonadherence to treatment protocol in published randomisedcontrolled trials: a review’, Trials 13(1), 84.

Gelman, A. and Hill, J. (2006), Data Analysis Using Regression and Multilevel/Hierarchical Models, AnalyticalMethods for Social Research, Cambridge University Press.

Grant, A. M., Boachie, C., Cotton, S. C., Faria, R., Bojke, L. and Epstein, D. ( 2013) ‘Clinical and economicevaluation of laparoscopic surgery compared with medical management for gastro-oesophageal reflux disease:a 5-year follow-up of multicentre randomised trial (the REFLUX trial).’, Health Technology Assessment17(22).

Grant, A., Wileman, S., Ramsay, C., Boyke, L., Epstein, D. and Sculpher, M. (2008), ‘The effectiveness andcost-effectiveness of minimal access surgery amongst people with gastro-oesophageal reflux disease- a ukcollaborative study. the REFLUX trial.’ Health Technology Assessment 12 (31).

Greene, W. (2002). Econometric Analysis, Prentice-Hall international editions, Prentice Hall.

Hernán, M. A. and Robins, J. M. ‘Instruments for causal inference: an epidemiologist’s dream?’, Epidemiology17 (4), 360–372.

Hirano, K. , Imbens, G. W. , Rubin, D. B. and Zhou, X. H. ‘Assessing the effecr of an influenza vaccine in anencouragement design’, Biostatistics 1, 69 –88.

Hoch, J. S., Briggs, A. H. and Willan, A. R., (2002). ‘Something old, something new, something borrowed,something blue: A framework for the marriage of health econometrics and costâĂŘeffectiveness analysis’.Health economics, 11 (5) 415–430.

38

Hughes, D., Charles, J., Dawoud, D., Edwards, R. T., Holmes,E. , Jones, C. , Parham, P. , Plumpton, C. ,Ridyard, C., Lloyd-Williams, H., Wood, E., and Yeo,S. T. (2016). ‘Conducting economic evaluations alongsiderandomised trials: Current methodological issues and novel approaches.’ PharmacoEconomics, 1–15.

Imbens, G. W. and Angrist, J. D. (1994), ‘Identification and estimation of local average treatment effects’,Econometrica 62(2), 467–475.

Imbens, G. W. and Rubin, D. B. (1997), ‘Bayesian inference for causal effects in randomized experiments withnoncompliance’, The Annals of Statistics 25(1), 305–327.

Jo B. (2002a) ‘Estimating intervention effects with noncompliance: Alternative model specifications’. Journalof Educational and Behavioral Statistics. 27:385 –420.

Jo B. (2002b) ‘Model misspecification sensitivity analysis in estimating causal effects of interventions withnoncompliance.’ Statistics in Medicine. 21: 3161– 3181.

Jo B. and Muthén B. O. (2001) ‘Modeling of intervention effects with noncompliance: a latent variable ap-proach for randomised trials,’ In Marcoulides GA, Schumacker RE, eds. New developments and techniquesin structural equation modeling. Lawrence Erlbaum Associates, Mahwah, New Jersey: 57–87.

Kleibergen, F. and Zivot, E. (2003), ‘Bayesian and classical approaches to instrumental variable regression’,Journal of Econometrics 114(1), 29 – 72.

Lancaster, T. (2004), Introduction to Modern Bayesian Econometrics, Wiley.

Latimer, N. R., Abrams, K., Lambert, P., Crowther, M., Wailoo, A., Morden, J., Akehurst, R. and Campbell,M. (2014), ‘Adjusting for treatment switching in randomised controlled trials - a simulation study and asimplified two-stage method’, Statistical Methods in Medical Research: 0962280214557578.

Lunn, D. and Jackson, C. and Best, N. and Thomas, A. and Spiegelhalter, D., (2012) The BUGS Book: APractical Introduction to Bayesian Analysis, Chapman & Hall/CRC Texts in Statistical Science, Taylor &Francis.

Mantopoulos, T., Mitchell, P.M., Welton, N.J. McManus, R. and Andronis, L. (2016) ‘Choice of statisticalmodel for cost-effectiveness analysis and covariate adjustment: empirical application of prominent modelsand assessment of their results’, The European Journal of Health Economics 17:(8) 927–938.

NICE (2013) , Guide to the Methods of Technology Appraisal, National Institute for Health and Care Excellence,London, UK.

Nixon, R. M. and Thompson, S. G. (2005), ‘Methods for incorporating covariate adjustment, subgroup analysisand between-centre differences into cost-effectiveness evaluations.’ Health economics 14(12), 1217–29.

Rossi, P., Allenby, G. and McCulloch, R. (2012), Bayesian Statistics and Marketing, Wiley Series in Probabilityand Statistics, Wiley.

Rubin, D. (1987). Multiple Imputation for Nonresponse in Surveys. Chichester: Wiley.

Schmidt, P. (1990). ‘Three-stage least squares with different instruments for different equations’, Journal ofEconometrics 43(3), 389 – 394.

Terza, J. V., Basu, A. and Rathouz, P. J. (2008). ‘Two-stage residual inclusion estimation: Addressing endo-geneity in health econometric modeling’, Journal of Health Economics 27(3), 531 – 543.

White, I. R. and Carlin, J. B. (2010). ‘Bias and efficiency of multiple imputation compared with complete-caseanalysis for missing covariate values’. Statistics in Medicine, 28: 2920–2931.

White, I. R., Royston, P. and Wood, A. M. (2011). ‘Multiple imputation using chained equations: issues andguidance for practice’, Statistics in Medicine, 30 (4), 377–399.

39

Willan,A. R. (2006). ‘Statistical Analysis of cost-effectiveness data from randomised clinical trials’. ExpertRevision Pharmacoeconomics Outcomes Research 6, 337–346.

Willan, A. R., Briggs, A. and Hoch,J. (2004). ‘Regression methods for covariate adjustment and subgroupanalysis for non-censored cost-effectiveness data’. Health Econ. 13(5), 461–475.

Willan, A. R., Chen, E., Cook, R. and Lin, D. (2003). ‘Incremental net benefit in randomized clinical trialswith qualify-adjusted survival’. Statistics in Medicine 22, 353–362.

Zellner, A. (1962). ‘An efficient method of estimating seemingly unrelated regressions and tests for aggregationbias’. Journal of the American Statistical Association 57(298), pp. 348–368.

Zellner, A. and Huang D. S. (1962). ‘Further Properties of Efficient Estimators for Seemingly Unrelated Re-gression Equations’. International Economic Review 3(3), pp. 300–313.

Zellner, A. and Theil, H. (1962), ‘Three-stage least squares: Simultaneous estimation of simultaneous equations’.Econometrica 30(1), pp. 54–78.

Zhang, Z., Peluso, M. J., Gross, C. P., Viscoli, C. M. and Kernan, W. N. (2014). ‘Adherence reporting inrandomized controlled trials’. Clinical Trials 11(2), 195–204.

40