

IID Sampling from Intractable Distributions

Sourabh Bhattacharya∗

Abstract

In this article, we propose a novel methodology for drawing iid realizations from any target distribution on the d-dimensional Euclidean space, for any d ≥ 1. No assumption of compact support is necessary for the validity of our theory and method. The key idea is to construct an infinite sequence of concentric closed ellipsoids in keeping with the insight that the central one tends to support the modal region while the regions between successive concentric ellipsoids (ellipsoidal annuli) increasingly tend to support the tail regions of the target distribution as the sequence progresses.

Representing the target distribution as an infinite mixture of distributions on the central ellipsoid and the ellipsoidal annuli, we propose simulation from the mixture by selecting a mixture component with the relevant mixing probability, and then simulating from the mixture component via perfect simulation. We develop the perfect sampling scheme by first developing a minorization inequality of the general Metropolis-Hastings algorithm driven by uniform proposal distributions on the compact central ellipsoid and the annuli.

In contrast with most of the existing works on perfect sampling, ours is not only a theoretically valid method, it is practically applicable to all target distributions on any dimensional Euclidean space and very much amenable to parallel computation. We validate the practicality and usefulness of our methodology by generating 10000 iid realizations from standard distributions such as normal, Student's t with 5 degrees of freedom and Cauchy, for dimensions d = 1, 5, 10, 50, 100, as well as from a 50-dimensional mixture normal distribution. The implementation times in all the cases are very reasonable, and often less than a minute in our parallel implementation. The results turned out to be highly accurate.

We also apply our method to draw 10000 iid realizations from the posterior distributions associated with the well-known Challenger data, a Salmonella data and the 160-dimensional challenging spatial example of the radionuclide count data on Rongelap Island. Again, we are able to obtain quite encouraging results with very reasonable computing time.

Keywords: Ellipsoid; Minorization; Parallel computing; Perfect sampling; Residual distribution; Transformation based Markov Chain Monte Carlo.

∗Sourabh Bhattacharya is an Associate Professor in the Interdisciplinary Statistical Research Unit, Indian Statistical Institute, 203, B. T. Road, Kolkata 700108. Corresponding e-mail: [email protected].

arXiv:2107.05956v2 [stat.CO] 15 Dec 2021


1 Introduction

That Markov Chain Monte Carlo (MCMC) has revolutionized Bayesian computation is an understatement. But in spite of this, MCMC seems to be losing its charm, slowly but steadily, due to the challenges in ascertaining its convergence in practice. Even the most experienced MCMC theorists and practitioners are tending to give up on MCMC and to look for deterministic or even ad-hoc approximations to the posteriors, on account of the challenges posed by MCMC, which they feel are insurmountable.

The concept of perfect simulation, introduced by Propp and Wilson (1996), initially seemed promising to the MCMC community for solving the convergence problem. Yet, the apparent infeasibility of direct practical implementation of the concept in general Bayesian problems managed to act as a deterrent to the community, to the extent of disbelief. So much so, that when the article Mukhopadhyay and Bhattacharya (2012), on perfect sampling in the mixture context, was initially submitted to a so-called reputed journal, the reaction of the reviewers was rejection on account of essentially total disbelief in the success of the idea (one of them even remarked that the work was "too ambitious"). As such, we are not aware of any promising research on perfect sampling in recent years.

In this article, we take up the task of producing iid realizations from any target distribution on R^d, where R is the real line and d ≥ 1. Our key idea revolves around judicious construction of an infinite sequence of closed ellipsoids, and representing the target distribution as an infinite mixture on the central ellipsoid and the regions between successive concentric ellipsoids (ellipsoidal annuli). This construction is a reflection of our insight that the central ellipsoid essentially tends to support the modal region and the annuli tend to support the tail regions of the target distribution.

Our proposal for generating iid realizations hinges on drawing a mixture component with the relevant mixing probability, and the ability to simulate from the chosen component using a perfect sampling scheme. To develop the perfect sampling mechanism, we first consider the general Metropolis-Hastings algorithm on our compact ellipsoids and annuli, driven by uniform proposal distributions on these compact sets. Exploiting compactness, we then develop an appropriate minorization inequality for any given compact set in our sequence. Then, representing the Metropolis-Hastings chain as a mixture of two components on any given member of our set sequence, we construct a perfect sampler for the mixture component density on its relevant support set. Furthermore, adopting and rectifying an idea proposed in Murdoch and Green (1998), we avoid the computationally expensive coupling from the past (CFTP) algorithm proposed in Propp and Wilson (1996). Bypassing CFTP rendered our perfect sampling idea very much computationally feasible. Integrating all our ideas, we develop a generic parallel algorithm for simulating iid realizations from general target distributions.

To demonstrate the validity and feasibility of our developments in practice, we apply our parallel algorithm to some standard distributions: normal, Student's t with 5 degrees of freedom and Cauchy, for dimensions d = 1, 5, 10, 50, 100. We also apply our ideas to a 50-dimensional mixture normal distribution. In each case, we consider an appropriate location vector and scale matrix. In all the aforementioned setups, we simulate 10000 iid realizations with our parallel algorithm, and the results are highly accurate. Moreover, the implementation times turned out to be very reasonable, and several of the exercises took less than a minute for completion.

Further, we apply our iid method to posterior distributions based on real data. Specifically, we obtain 10000 iid realizations from the posteriors corresponding to the well-known Challenger data and the Salmonella data, consisting of two and three parameters, respectively. Finally, we generate 10000 iid realizations from the 160-dimensional posterior distribution corresponding to the highly challenging spatial example of the radionuclide count data on Rongelap Island. The spatial example is considered to be an extremely daunting MCMC convergence problem, yet we are able to obtain 10000 iid samples in just 30 minutes. As we demonstrate, all our results are highly reliable.

The rest of our article is organized as follows. We introduce our key idea in Section 2. The set-wise perfect sampling theory and method are developed in Section 3. In Section 4 we present our complete parallel algorithm for iid sampling from general target distributions. Our experiments with iid simulations from standard univariate and multivariate distributions are detailed in Section 5, and in Section 6 we provide details of our experiments on iid simulations from posterior distributions based on real data. Finally, we summarize our ideas and make concluding remarks in Section 7.

2 The iid sampling idea

Let π(θ) be the target distribution supported on R^d, where d ≥ 1. Here θ = (θ1, . . . , θd)^T. Note that the distribution can be represented as

π(θ) = ∑_{i=1}^∞ π(Ai) πi(θ),   (1)

where the Ai are disjoint compact subsets of R^d such that ∪_{i=1}^∞ Ai = R^d, and

πi(θ) = [π(θ)/π(Ai)] I_{Ai}(θ)   (2)

is the distribution of θ restricted to Ai; I_{Ai} being the indicator function of Ai. In (1), π(Ai) = ∫_{Ai} π(dθ) ≥ 0. Clearly, ∑_{i=1}^∞ π(Ai) = 1.

The key idea of generating iid realizations from π(θ) is to randomly select πi with probability π(Ai) and then to perfectly simulate from πi.

2.1 Choice of the sets Ai

For some appropriate d-dimensional vector µ and d × d positive definite scale matrix Σ, we shall set Ai = {θ : c_{i−1} ≤ (θ − µ)^T Σ^{−1}(θ − µ) ≤ ci} for i = 1, 2, . . ., where 0 = c0 < c1 < c2 < · · · . Note that A1 = {θ : (θ − µ)^T Σ^{−1}(θ − µ) ≤ c1}, and for i ≥ 2, Ai = {θ : (θ − µ)^T Σ^{−1}(θ − µ) ≤ ci} \ ∪_{j=1}^{i−1} Aj. Observe that ideally one should set Ai = {θ : c_{i−1} < (θ − µ)^T Σ^{−1}(θ − µ) ≤ ci}, but since θ has a continuous distribution in our setup, we shall continue with Ai = {θ : c_{i−1} ≤ (θ − µ)^T Σ^{−1}(θ − µ) ≤ ci}.

Thus, the first member of the sequence of sets Ai; i ≥ 1, is a closed ellipsoid, while the others are closed annuli, the regions between two successive concentric closed ellipsoids. The compact ellipsoid A1 tends to support the modal region of the target distribution π, and for increasing i ≥ 2, the compact annuli Ai tend to support the tail regions of the target distribution. Thus, in practice, it is useful to choose √c1 associated with A1 to be relatively large compared to the increments √ci − √c_{i−1} for i ≥ 2. As will be seen in Section 2.3.1, for i ≥ 1, the √ci are the radii of d-dimensional balls of the form {Y : Y^T Y ≤ ci}, which have a linear correspondence with d-dimensional ellipsoids of the form {θ : (θ − µ)^T Σ^{−1}(θ − µ) ≤ ci}. Hence, we shall refer to √ci; i ≥ 1, as radii.
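To make the partition concrete, locating the set Ai that contains a given point reduces to evaluating the quadratic form (θ − µ)^T Σ^{−1}(θ − µ) and comparing it against the grid c1 < c2 < · · · . The following minimal Python sketch is our illustration (the paper's implementation is in C with MPI, and the name annulus_index is ours):

```python
import numpy as np

def annulus_index(theta, mu, Sigma_inv, c):
    """Return the index i >= 1 of the set A_i containing theta, where
    A_1 is the central ellipsoid {q <= c[0]} and A_i (i >= 2) is the
    annulus {c[i-2] <= q <= c[i-1]}, with
    q = (theta - mu)^T Sigma^{-1} (theta - mu)."""
    diff = np.asarray(theta, dtype=float) - np.asarray(mu, dtype=float)
    q = float(diff @ Sigma_inv @ diff)
    # first c[j] >= q gives the 0-based annulus position; sets are 1-indexed
    return int(np.searchsorted(c, q)) + 1
```

Points on a shared boundary are assigned to the inner set, consistent with the closed-ellipsoid convention above.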


2.2 Choice of µ and Σ

Theoretically, any choice of µ ∈ R^d and any d × d positive definite scale matrix Σ is valid for our method. However, judicious choices of µ and Σ play important roles in the efficiency of our iid algorithm. In this regard, in practice, µ and Σ will be estimates of the mean (if it exists, or the co-ordinate-wise median otherwise) and the covariance structure of π (if it exists, or some appropriate scale matrix otherwise). These estimates will be assumed to be obtained from some reliably implemented efficient MCMC algorithm. In this regard, we recommend the transformation based Markov Chain Monte Carlo (TMCMC) methodology introduced by Dutta and Bhattacharya (2014), which simultaneously updates all the variables using suitable deterministic transformations of some low-dimensional (often one-dimensional) random variable defined on some relevant support. Since the method is driven by the low-dimensional random variable, this leads to drastic dimension reduction in effect, with great improvement in acceptance rates and computational efficiency, while ensuring adequate convergence properties. Our experience also reveals that the addition of a further deterministic step to TMCMC, in a similar vein as in Liu and Sabatti (2000), often dramatically enhances the convergence properties of TMCMC.

2.3 Dealing with the mixing probabilities π(Ai)

Recall that the key idea of iid sampling from π(θ) is to randomly select πi with probability π(Ai) and then to exactly simulate from πi. However, the mixing probabilities π(Ai) are not available prior to simulation. Since there are an infinite number of such probabilities, even estimation of all of them using only a finite TMCMC sample from π is out of the question. But estimation of π(Ai) using Monte Carlo samples drawn uniformly from Ai makes sense, and the strategy applies to all i ≥ 1. For the time being, assume that we have an infinite number of parallel processors, and that the i-th processor is used to estimate π(Ai), up to a constant, using Monte Carlo sampling. To elaborate, let π(θ) = C π̃(θ), where π̃ is the unnormalized form of the target density and C > 0 is the unknown normalizing constant. Then for any Borel set A in the Borel σ-field of R^d, letting L(A) denote the Lebesgue measure of A, observe that

π(Ai) = C L(Ai) ∫ π̃(θ) [1/L(Ai)] I_{Ai}(θ) dθ = C L(Ai) E[π̃(θ)],   (3)

the right hand side being C L(Ai) times the expectation of π̃(θ) with respect to the uniform distribution on Ai. This expectation can be estimated by generating realizations of θ from the uniform distribution on Ai, evaluating π̃(θ) for the realizations and taking their average. Let π̂(Ai) denote the Monte Carlo average times L(Ai), so that π̂(Ai) is an estimate of π(Ai) up to the unknown constant C. The required uniform sample generation from Ai and the computation of L(Ai) can be achieved in straightforward ways, which we elucidate below.

2.3.1 Uniform sampling from Ai

First consider uniform generation from the d-dimensional ball {Y : Y^T Y ≤ c1}, which consists of generating Xk ~ N(0, 1) independently for k = 1, . . . , d, and U ~ U(0, 1), the uniform distribution on [0, 1], and setting Y = √c1 U^{1/d} X/‖X‖, where X = (X1, . . . , Xd)^T and ‖ · ‖ is the Euclidean norm. Then, letting Σ = BB^T, where B is the lower triangular Cholesky factor of Σ, θ = µ + BY is a uniform realization from A1. To generate uniformly from Ai = {θ : c_{i−1} ≤ (θ − µ)^T Σ^{−1}(θ − µ) ≤ ci} = {θ : (θ − µ)^T Σ^{−1}(θ − µ) ≤ ci} \ {θ : (θ − µ)^T Σ^{−1}(θ − µ) ≤ c_{i−1}} for i ≥ 2, we shall continue uniform simulation of θ from {θ : (θ − µ)^T Σ^{−1}(θ − µ) ≤ ci} until θ satisfies (θ − µ)^T Σ^{−1}(θ − µ) > c_{i−1}.

Note that this basic rejection sampling scheme will be inefficient, with high rejection probabilities, for i ≥ 2 when the increments √ci − √c_{i−1} are very small. In such cases, it is important to replace this algorithm with methods that avoid the rejection mechanism. We indeed devise such a method for situations with small √ci − √c_{i−1} for i ≥ 2 that completely bypasses rejection sampling; see Section 6.
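The ball construction and the basic rejection step above can be sketched as follows; this is our own illustrative Python version (the paper's code is C/MPI), with function names of our choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_in_ellipsoid(mu, B, c, size=1):
    """Uniform draws from {theta : (theta - mu)^T Sigma^{-1} (theta - mu) <= c},
    with Sigma = B B^T and B the lower triangular Cholesky factor.
    Uses Y = sqrt(c) * U^{1/d} * X / ||X||, then theta = mu + B Y."""
    d = len(mu)
    X = rng.standard_normal((size, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)    # uniform on the unit sphere
    U = rng.random(size) ** (1.0 / d)                # radial component
    Y = np.sqrt(c) * U[:, None] * X                  # uniform in ball of radius sqrt(c)
    return np.asarray(mu) + Y @ B.T                  # linear map to the ellipsoid

def uniform_in_annulus(mu, B, c_lo, c_hi):
    """Basic rejection sampler for the annulus: draw uniformly in the outer
    ellipsoid until the point falls outside the inner one (inefficient when
    sqrt(c_hi) - sqrt(c_lo) is small, as noted above)."""
    Sigma_inv = np.linalg.inv(B @ B.T)
    while True:
        theta = uniform_in_ellipsoid(mu, B, c_hi, size=1)[0]
        diff = theta - np.asarray(mu)
        if diff @ Sigma_inv @ diff > c_lo:
            return theta
```

Since (θ − µ)^T Σ^{−1}(θ − µ) = Y^T Y under θ = µ + BY, the quadratic form of every draw is bounded by c by construction.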

2.3.2 Lebesgue measure of Ai

It is well-known (see, for example, Giraud (2015)) that the volume of a d-dimensional ball of radius c is given by [π^{d/2}/Γ(d/2 + 1)] c^d, where Γ(x) = ∫_0^∞ t^{x−1} exp(−t) dt, for x > 0, is the Gamma function. Hence, in our case, letting |B| denote the determinant of B,

L(A1) = |B| × L({θ : ‖θ‖ ≤ √c1}) = |B| [π^{d/2}/Γ(d/2 + 1)] c1^{d/2},

and for i ≥ 2, Ai has Lebesgue measure

L(Ai) = |B| × L({θ : ‖θ‖ ≤ √ci}) − |B| × L({θ : ‖θ‖ ≤ √c_{i−1}}) = |B| × [π^{d/2}/Γ(d/2 + 1)] (ci^{d/2} − c_{i−1}^{d/2}).
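The two formulas above translate directly into code; the following is our own small Python sketch (names ours), with the sanity check that d = 3, c = 1, |B| = 1 recovers the familiar 4π/3:

```python
import math

def ellipsoid_volume(det_B, d, c):
    """Volume of {theta : (theta - mu)^T Sigma^{-1} (theta - mu) <= c}
    with Sigma = B B^T:  |B| * pi^{d/2} / Gamma(d/2 + 1) * c^{d/2}."""
    return det_B * math.pi ** (d / 2) / math.gamma(d / 2 + 1) * c ** (d / 2)

def annulus_measure(det_B, d, c_lo, c_hi):
    """Lebesgue measure of A_i for i >= 2: the difference of the two
    concentric ellipsoid volumes."""
    return ellipsoid_volume(det_B, d, c_hi) - ellipsoid_volume(det_B, d, c_lo)
```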

The aforementioned ideas show that all the estimates π̂(Ai), for i ≥ 1, are obtainable simultaneously by parallel processing. Also assume that the Monte Carlo sample size Ni is large enough for each i such that, for any ε > 0,

1 − ε ≤ C π̂(Ai)/π(Ai) ≤ 1 + ε.   (4)

We further assume that, depending on ε, there exists M0 ≥ 1 such that for all M ≥ M0,

1 − ε ≤ C ∑_{i=1}^M π̂(Ai) ≤ 1 + ε.   (5)

Letting

π̂(θ) = C ∑_{i=1}^∞ π̂(Ai) πi(θ),   (6)

which is a well-defined density due to (5), we next provide a rejection sampling based argument to establish that it is sufficient to perfectly simulate from π̂(θ) in order to sample perfectly from π(θ).
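The estimation step (3) can be made concrete with a self-contained example. The following Python sketch is our illustration (not the paper's C/MPI code): with µ = 0, Σ = I in two dimensions and π̃ the standard bivariate normal kernel, each weight π̂(Ai) is the Monte Carlo average of π̃ over uniform draws from Ai times L(Ai), and the weights should sum to approximately 1/C = 2π:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def pi_tilde(theta):
    # illustrative unnormalized target: standard bivariate normal kernel
    return math.exp(-0.5 * float(theta @ theta))

def ball_volume(d, c):
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * c ** (d / 2)

def uniform_ball(d, c, n):
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return np.sqrt(c) * (rng.random(n) ** (1.0 / d))[:, None] * X

def estimate_weight(d, c_lo, c_hi, n=20000):
    """pi_hat(A_i) = L(A_i) * Monte Carlo average of pi_tilde over uniform
    draws from A_i = {c_lo <= ||theta||^2 <= c_hi} (here mu = 0, Sigma = I)."""
    pts = uniform_ball(d, c_hi, 4 * n)
    keep = np.sum(pts ** 2, axis=1) >= c_lo      # rejection step for the annulus
    L = ball_volume(d, c_hi) - ball_volume(d, c_lo)
    return L * float(np.mean([pi_tilde(t) for t in pts[keep]]))
```

Summing estimate_weight over the grid c = 0, 4, 9, 16, 25, 36 recovers 2π to within Monte Carlo error for this target.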

2.4 A rejection sampling argument for validation of π̂(θ)

Due to (4), for all θ ∈ R^d,

π̂(θ) = C ∑_{i=1}^∞ π̂(Ai) πi(θ) = ∑_{i=1}^∞ π(Ai) × [C π̂(Ai)/π(Ai)] × πi(θ) ≤ (1 + ε) ∑_{i=1}^∞ π(Ai) πi(θ) = (1 + ε) π(θ).

That is, for all θ ∈ R^d,

π̂(θ)/π(θ) ≤ 1 + ε,   (7)

showing that rejection sampling is applicable for generating samples from π(θ) by sampling θ from π̂(θ) and generating U ~ U(0, 1) until the following is satisfied:

U < π(θ)/[(1 + ε) π̂(θ)].   (8)

In (8), θ ∈ Ai for some i ≥ 1. Hence, due to (4),

π(θ)/[(1 + ε) π̂(θ)] = [1/(1 + ε)] × π(Ai)/[C π̂(Ai)] > (1 − ε)/(1 + ε).   (9)

The right most side of (9) can be made arbitrarily close to 1 for sufficiently small ε > 0. Now, since U ≤ 1 with probability 1, this entails that (8) is satisfied with probability almost equal to one. Thus, for sufficiently small ε > 0, we can safely assume that (8) holds for all practical applications, so that any simulation of θ from π̂(θ) would be accepted in the rejection sampling step. In other words, to generate samples from the target density π(θ) it is sufficient to simply generate from π̂(θ), for sufficiently small ε.

3 Perfect sampling from π̂(θ)

For perfect sampling from π̂(θ), we first need to select πi with probability C π̂(Ai) for some i ≥ 1, and then need to sample from πi in the perfect sense. Now, πi would be selected if

∑_{j=1}^{i−1} π̂(Aj) / ∑_{j=1}^∞ π̂(Aj) < U < ∑_{j=1}^{i} π̂(Aj) / ∑_{j=1}^∞ π̂(Aj),   (10)

where U ~ U(0, 1) and π̂(A0) = 0. But the denominator of the left and right hand sides of (10) involves an infinite sum, which can not be computed. However, note that due to (5),

1 − ε ≤ ∑_{j=1}^M π̂(Aj) / ∑_{j=1}^∞ π̂(Aj) ≤ 1 + ε.   (11)

The following hold due to (11):

(1 − ε) ∑_{j=1}^{i−1} π̂(Aj) / ∑_{j=1}^M π̂(Aj) ≤ ∑_{j=1}^{i−1} π̂(Aj) / ∑_{j=1}^∞ π̂(Aj) ≤ (1 + ε) ∑_{j=1}^{i−1} π̂(Aj) / ∑_{j=1}^M π̂(Aj);   (12)

(1 − ε) ∑_{j=1}^{i} π̂(Aj) / ∑_{j=1}^M π̂(Aj) ≤ ∑_{j=1}^{i} π̂(Aj) / ∑_{j=1}^∞ π̂(Aj) ≤ (1 + ε) ∑_{j=1}^{i} π̂(Aj) / ∑_{j=1}^M π̂(Aj).   (13)

Due to (12) and (13), for sufficiently small ε > 0, in all practical implementations (10) holds if and only if

∑_{j=1}^{i−1} π̂(Aj) / ∑_{j=1}^M π̂(Aj) < U < ∑_{j=1}^{i} π̂(Aj) / ∑_{j=1}^M π̂(Aj)   (14)

holds. If, in any case, i = M, then we shall compute π̂(Aj); j = M + 1, . . . , 2M, and re-check (14) for the same set of uniform random numbers U ~ U(0, 1) as before by replacing M with 2M, and continue the procedure until all the indices i are less than M.
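In code, selecting a component according to (14) is an inverse-CDF lookup over the normalized partial sums of the estimated weights. This is our own minimal Python sketch (the weights argument stands for the estimates π̂(A1), . . . , π̂(AM)):

```python
import numpy as np

def select_component(weights, u):
    """Return the 1-based index i satisfying (14) for u in (0, 1), where
    weights[j-1] plays the role of pi_hat(A_j), j = 1, ..., M.
    If the returned index equals M = len(weights), the caller should double M,
    estimate the additional weights and retry with the SAME u, as in the text."""
    cum = np.cumsum(weights) / np.sum(weights)
    return int(np.searchsorted(cum, u)) + 1
```

Reusing the same uniform number u after enlarging M is what makes the doubling step of the procedure consistent.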

Once πi is selected by the above strategy, we can proceed to sample perfectly from πi. We progress towards building an effective perfect sampling strategy by first establishing a minorization condition for the Markov transition kernel.

3.1 Minorization for πi

For πi, we consider the Metropolis-Hastings algorithm in which all the co-ordinates θk; k = 1, . . . , d, are updated in a single block using an independence uniform proposal density on Ai given by

qi(θ′) = [1/L(Ai)] I_{Ai}(θ′).   (15)

Although independence proposal distributions are usually not recommended, particularly in high dimensions, here this makes good sense. Indeed, since the Ai are so constructed that the densities πi are expected to be relatively flat on these sets, the uniform proposal is expected to provide a good approximation to the target πi. Furthermore, as we shall show, the uniform proposal will be responsible for obtaining a significantly large lower bound for our minorization, which, in turn, will play the defining role in our construction of an efficient perfect sampler for πi.

For θ ∈ Ai and any Borel set B in the Borel σ-field of R^d, let Pi(θ, B ∩ Ai) denote the corresponding Metropolis-Hastings transition probability for πi. Let si = inf_{θ∈Ai} π(θ) and Si = sup_{θ∈Ai} π(θ). Then, with (15) as the proposal density we have

Pi(θ, B ∩ Ai) ≥ ∫_{B∩Ai} min{1, π(θ′)/π(θ)} qi(θ′) dθ′ ≥ (si/Si) × L(B ∩ Ai)/L(Ai) = pi Qi(B ∩ Ai),   (16)

where pi = si/Si and

Qi(B ∩ Ai) = L(B ∩ Ai)/L(Ai)

is the uniform probability measure corresponding to (15). Since (16) holds for all θ ∈ Ai, the entire set Ai is a small set.

Here are some important remarks regarding justification of the choice of the uniform proposal density. First, since the uniform proposal is an independence sampler, independent of the previous state of the underlying Markov chain, the right hand side of (16) is rendered independent of θ, which is a requisite condition for minorization. For proposal distributions of the form qi(θ′|θ), the infimum of qi(θ′|θ) over θ, θ′ ∈ Ai would be necessary, which would lead to a much smaller lower bound for Pi(θ, B ∩ Ai) compared to the right hand side of (16). This is precisely the reason why we do not consider TMCMC for iid sampling, whose move types depend upon the previous state. Any independence proposal density other than the uniform will not get cancelled in the Metropolis-Hastings acceptance ratio, and the corresponding si/Si may be vanishingly small if π is not well-approximated by the independence proposal on Ai. This would make perfect sampling infeasible. Furthermore, as will be seen in Section 3.2, for perfect sampling it is required to simulate from the proposal density. But unlike the case of the uniform proposal, it may be difficult to simulate from a general proposal on Ai, which has a closed ellipsoidal or closed annulus structure.

Observe that si and Si associated with (16) will usually not be available in closed forms. In this regard, let ŝi and Ŝi denote the minimum and maximum of the unnormalized density π̃(·) over the Monte Carlo samples drawn uniformly from Ai in the course of estimating π(Ai) by π̂(Ai). Then si/Si ≤ ŝi/Ŝi. Hence, there exists ηi > 0 such that 1 ≥ ŝi/Ŝi ≥ si/Si ≥ ŝi/Ŝi − ηi > 0. Let p̂i = ŝi/Ŝi − ηi. Then it follows from (16) that

Pi(θ, B ∩ Ai) ≥ p̂i Qi(B ∩ Ai),   (17)

which we shall consider for our purpose from now on. In practice, ηi is expected to be very close to zero, since the Monte Carlo sample size would be sufficiently large. Thus, p̂i is expected to be very close to pi.

3.2 Split chain

Due to the minorization (17), Pi(θ, B ∩ Ai) admits the following decomposition for all θ ∈ Ai:

Pi(θ, B ∩ Ai) = p̂i Qi(B ∩ Ai) + (1 − p̂i) Ri(θ, B ∩ Ai),   (18)

where

Ri(θ, B ∩ Ai) = [Pi(θ, B ∩ Ai) − p̂i Qi(B ∩ Ai)] / (1 − p̂i)   (19)

is the residual distribution.

Therefore, to implement the Markov chain Pi(θ, B ∩ Ai), rather than proceeding directly with the uniform proposal based Metropolis-Hastings algorithm, we can use the split (18) to generate realizations from Pi(θ, B ∩ Ai). That is, given θ, we can simulate from Qi with probability p̂i, and with the remaining probability, can generate from Ri(θ, ·).

To simulate from the residual density Ri(θ, ·), we devise the following rejection sampling scheme. Let Ri(θ, θ′) and Pi(θ, θ′) denote the densities of θ′ with respect to Ri(θ, ·) and Pi(θ, ·), respectively. Then it follows from (18) and (19) that, for all θ ∈ Ai,

Ri(θ, θ′) = [Pi(θ, θ′) − p̂i qi(θ′)] / (1 − p̂i) ≤ Pi(θ, θ′) / (1 − p̂i) ⟺ Ri(θ, θ′)/Pi(θ, θ′) ≤ 1/(1 − p̂i), for all θ′ ∈ Ai.

Hence, given θ, we can continue to simulate θ′ ~ Pi(θ, ·) using the uniform proposal distribution (15) and generate U ~ U(0, 1) until

U < (1 − p̂i) Ri(θ, θ′) / Pi(θ, θ′)   (20)

is satisfied, at which point we accept θ′ as a realization from Ri(θ, ·).


Now

Pi(θ, θ′) = qi(θ′) min{1, π(θ′)/π(θ)} + ri(θ) Iθ(θ′) = [1/L(Ai)] min{1, π(θ′)/π(θ)} + ri(θ) Iθ(θ′),

where

ri(θ) = 1 − ∫_{Ai} min{1, π(θ′)/π(θ)} qi(θ′) dθ′ = 1 − ∫_{Ai} min{1, π(θ′)/π(θ)} [1/L(Ai)] dθ′.   (21)

Let r̂i(θ) denote the Monte Carlo estimate of ri(θ), obtained by simulating θ′ from the uniform distribution on Ai and taking the average of min{1, π(θ′)/π(θ)} in (21). Let

P̂i(θ, θ′) = [1/L(Ai)] min{1, π(θ′)/π(θ)} + r̂i(θ) Iθ(θ′), and

R̂i(θ, θ′) = [P̂i(θ, θ′) − p̂i qi(θ′)] / (1 − p̂i).

Given θ and any ε > 0, let the Monte Carlo sample size be large enough such that

1 − ε ≤ P̂i(θ, θ′)/Pi(θ, θ′) ≤ 1 + ε, and 1 − ε ≤ R̂i(θ, θ′)/Ri(θ, θ′) ≤ 1 + ε.

Then

[(1 − ε)/(1 + ε)] R̂i(θ, θ′)/P̂i(θ, θ′) ≤ Ri(θ, θ′)/Pi(θ, θ′) ≤ [(1 + ε)/(1 − ε)] R̂i(θ, θ′)/P̂i(θ, θ′).   (22)

Due to (22), in all practical implementations, for sufficiently small ε > 0, (20) holds if and only if

U < (1 − p̂i) R̂i(θ, θ′) / P̂i(θ, θ′)   (23)

holds. Hence, we shall carry out our implementations with (23).
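For the uniform proposal, the ratio in (23) simplifies considerably: for an accepted Metropolis-Hastings move θ′ ≠ θ, the point-mass terms vanish and (1 − p̂i)R̂i/P̂i = 1 − p̂i/min{1, π̃(θ′)/π̃(θ)}, while a rejected move (θ′ = θ) is accepted as a residual draw with probability one, so r̂i(θ) never needs to be computed explicitly. The following Python sketch of ours (the paper's code is C/MPI, and this algebraic simplification is our own reading of (23)) produces one draw from Ri(θ, ·):

```python
import numpy as np

rng = np.random.default_rng(2)

def residual_draw(theta, p_hat, pi_tilde, sample_uniform_Ai):
    """One draw from the residual kernel R_i(theta, .) of the split chain:
    repeatedly take a uniform-proposal Metropolis-Hastings step from theta
    and thin it with the acceptance probability derived from (23)."""
    while True:
        prop = sample_uniform_Ai()
        a = min(1.0, pi_tilde(prop) / pi_tilde(theta))
        if rng.random() >= a:
            return theta            # MH rejection: the kept point is a residual draw
        if rng.random() < 1.0 - p_hat / a:
            return prop             # accepted move passes the residual thinning
```

The thinning probability is well defined because a ≥ si/Si ≥ p̂i for θ, θ′ ∈ Ai, by the construction of p̂i.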

3.3 Perfect simulation from πi

For perfect simulation we exploit the following idea, first presented in Murdoch and Green (1998). From (18), note that at any given positive time Ti = t, θ′ will be drawn from Qi(·) with probability p̂i. Also observe that Ti can not take the value 0, since initialization only is done at the zero-th time point. Hence, Ti follows a geometric distribution given by

P(Ti = t) = p̂i (1 − p̂i)^{t−1}; t = 1, 2, . . . .   (24)

This is in contrast with the form of the geometric distribution provided in Murdoch and Green (1998), which gives positive probability to Ti = 0 (in our context, P(Ti = t) = p̂i(1 − p̂i)^t; t = 0, 1, 2, . . ., with respect to their specification). Indeed, all our implementations yielded correct iid simulations only for the form (24) and failed to provide correct answers for that of Murdoch and Green (1998).

Because of (24), it is possible to simulate Ti from the geometric distribution and then draw θ^{(−Ti)} ~ Qi(·). Then the chain only needs to be carried forward in time till t = 0, using θ^{(t+1)} = ψi(θ^{(t)}, U_i^{(t+1)}), where ψi(θ^{(t)}, U_i^{(t+1)}) is the deterministic function corresponding to the simulation of θ^{(t+1)} from Ri(θ^{(t)}, ·) using the set of appropriate random numbers U_i^{(t+1)}; the sequence {U_i^{(t)}; t = 0, −1, −2, . . .} being assumed to be available before beginning the perfect sampling implementation. The resulting draw θ^{(0)}, sampled at time t = 0, is a perfect sample from πi(·).

In practice, storing the uniform random numbers {U_i^{(t)}; t = 0, −1, −2, . . .} or explicitly considering the deterministic relationship θ^{(t+1)} = ψi(θ^{(t)}, U_i^{(t+1)}) is not required. These would be required only if we had taken the search approach, namely, iteratively starting the Markov chain at all initial values at negative times and carrying the sample paths to zero.
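Putting the pieces together, a single perfect draw from πi takes the following form in our illustrative Python sketch, which follows (24) and steps 6(a)-(c) of Algorithm 1 literally: draw Ti on {1, 2, . . .}, initialize from Qi at time −Ti, and apply Ti residual transitions. Here draw_Q and residual_step are user-supplied callables (our names, standing for Qi(·) and Ri(θ, ·)):

```python
import numpy as np

rng = np.random.default_rng(3)

def perfect_sample(p_hat, draw_Q, residual_step):
    """Perfect draw from pi_i: T_i ~ Geometric(p_hat) on {1, 2, ...} as in (24),
    theta^{(-T_i)} ~ Q_i, then T_i forward steps through the residual kernel,
    returning theta^{(0)}."""
    T = int(rng.geometric(p_hat))   # numpy's geometric is supported on {1, 2, ...}
    theta = draw_Q()                # theta^{(-T_i)} ~ Q_i
    for _ in range(T):              # t = -T_i, -T_i + 1, ..., -1
        theta = residual_step(theta)
    return theta
```

Since the coalescence time Ti has mean 1/p̂i, large values of p̂i translate directly into short backward windows and fast sampling, as observed in the experiments below.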

4 The complete algorithm for iid sample generation from π

We present the complete algorithm for generating iid realizations from the target distribution π as Algorithm 1.

Algorithm 1. IID sampling from the target distribution π

(1) Using TMCMC based estimates, obtain µ and Σ required for the sets Ai; i ≥ 1.

(2) Fix M to be sufficiently large.

(3) Choose the radii √ci; i = 1, . . . , M, appropriately. A general strategy for these selections will be discussed in the context of the applications.

(4) Compute the Monte Carlo estimates π̂(Ai); i = 1, . . . , M, in parallel processors. Thus, M parallel processors will obtain all the estimates simultaneously.

(5) Instruct each processor to send its respective estimate to all the other processors.

(6) Let K be the required iid sample size from the target distribution π. Split the job of obtaining K iid realizations into parallel processors, each processor scheduled to simulate a single realization at a time. That is, with K parallel processors, K realizations will be simulated simultaneously. In each processor, do the following:

    (i) Select πi with probability proportional to π̂(Ai).

    (ii) If i = M for any processor, then return to Step (2), increase M to 2M, and repeat the subsequent steps (in Step (4) only π̂(Ai); i = M + 1, . . . , 2M, need to be computed). Else

        (a) Draw Ti ~ Geometric(p̂i) with respect to (24).

        (b) Draw θ^{(−Ti)} ~ Qi(·).

        (c) Using θ^{(−Ti)} as the initial value, carry the chain θ^{(t+1)} ~ Ri(θ^{(t)}, ·) forward for t = −Ti, −Ti + 1, . . . , −1.

        (d) From the current processor, send θ^{(0)} to processor 0 as a perfect realization from π.

(7) Processor 0 stores the K iid realizations {θ1^{(0)}, . . . , θK^{(0)}} thus generated from the target distribution π.

5 Experiments with standard distributions

To illustrate our iid sampling methodology, we now apply it to generate iid samples from standard distributions, namely, normal, Student's t (with 5 degrees of freedom), Cauchy, and a mixture of normals. We consider both univariate and multivariate versions of the distributions, with dimensions d = 1, 5, 10, 50, 100. For the location vector, we set ν = (ν1, . . . , νd)^T, with νi = i, for i = 1, . . . , d. We denote the d × d scale matrix by S, whose (i, j)-th element is specified by Sij = 10 × exp{−(i − j)²/2}. We also consider a mixture of two normals for d = 50, with the means being ν1 = ν and ν2 = 2ν, and the covariance matrix S the same as above for both normal components. The mixing proportions of the normals associated with ν1 and ν2 are 2/3 and 1/3, respectively. For illustrative purposes, we assume that µ = ν and Σ = S associated with the sets Ai. In the normal mixture setup, there are two relevant sets of the form Ai for each i, associated with the two mixture components. We denote these sets by A1i and A2i, associated with µ = ν1 and µ = ν2, respectively, while Σ = S in both cases. In p̂i, we set ηi = 10⁻⁵ in all the simulation experiments.

Details of our iid simulation procedure and the results for the different standard distributions are presented below. All our codes are written in C using the Message Passing Interface (MPI) protocol for parallel processing. We implemented our codes on an 80-core VMWare machine provided by the Indian Statistical Institute. The machine has 2 TB memory and each core has about 2.8 GHz CPU speed.

5.1 Simulation from normals

For d = 1, 5, 10, 50, 100, we set√c1 = 4 and for i = 2, . . . ,M , set

√ci =

√c1 + (i− 1)/2.

With these choices of the radii, it turned out that M = 71 was sufficient for all thedimensions considered. The choice of such forms of the radii has been guided by theinsight that the target distribution is concentrated onA1, and the setsAi, for increasing i,encapsulate the tail regions. Thus, setting

√c1 somewhat large and keeping the difference√

ci −√ci−1 adequately small for i > 1, seems to be a sensible strategy for giving more

weight to the modal region and less weight to the tail regions. Of course, it has to beborne in mind that a larger value of

√c1 than what is adequate can make p1 extremely

small, since for larger radii s1 would be smaller and S1 larger (see (16)). The explicit

11

Page 12: IID Sampling from Intractable Distributions

choices in our cases are based on experimentation after fixing the forms of the radii for dimension d as follows: √c1 = rd and, for i = 2, . . . , M, √ci = √c1 + ad × (i − 1). The values of M, rd and ad that yielded sufficiently large values of pi; i = 1, . . . , M, for Monte Carlo size Ni = 10000, are considered for our applications. With a given choice of M, if AM is represented in the set of iid realizations, then we increase M to 2M and repeat the experiment, as suggested in Section 3 and made explicit in Algorithm 1. This basic method of selecting the radii and M remains the same in all our applications.
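The radius schedule and the doubling rule above are straightforward to code. The following is a minimal sketch (a hypothetical helper, not the paper's actual MPI code) of the schedule √ci = rd + ad × (i − 1):

```c
#include <assert.h>
#include <math.h>

/* Fill sqrt_c[0..M-1] with the radii sqrt(c_i) = rd + ad*(i-1), i = 1,...,M.
   rd is the first radius and ad the common increment for i >= 2. */
void compute_radii(double rd, double ad, int M, double *sqrt_c)
{
    for (int i = 1; i <= M; i++)
        sqrt_c[i - 1] = rd + ad * (double)(i - 1);
}
```

If the outermost set AM is represented in the generated sample, the same routine is simply re-run with M doubled, per the rule stated above.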

With the Monte Carlo size Ni = 10000 used for the construction of π(θ), we then simulated 10000 iid realizations from π(θ), which took 57 minutes, 1 hour 45 minutes, 25 minutes, 6 minutes, and 1 hour 39 minutes on our VMWare machine, for d = 1, 5, 10, 50, 100, respectively. In all the cases, the true marginal densities and the correlation structures are captured very accurately by the iid samples generated by our methodology. For the sake of brevity, we display in Figure 1 the results for only four co-ordinates in the context of d = 100. The exact and estimated correlation structures for the first 20 co-ordinates are displayed in Figure 2.

5.2 Simulation from Student’s t with 5 degrees of freedom

In the Student's t setup, M = 1000 turned out to be sufficient for all the dimensions that we considered; this significantly larger value of M compared to that for the normal setup is due to the much thicker tails of the t distribution with 5 degrees of freedom. Depending upon the dimension, we chose the radii differently for adequate performance of our methodology. The general form of our choice is the same as in the normal setup, that is, √c1 = rd and √ci = √c1 + ad × (i − 1), for i = 2, . . . , M. Following our experimentation as discussed in the normal simulation context, we set r1 = 5 and rd = 4, for d = 5, 10, 50, 100. As regards ad, we set a1 = 3.801, a5 = 2.1654, a10 = 2.5, a50 = 0.52 and a100 = 0.52. As before, we used Monte Carlo size Ni = 10000 for the construction of π(θ), and then simulated 10000 iid realizations from π(θ). The times taken for d = 1, 5, 10, 50, 100 are 4 minutes, 29 minutes, 1 hour 3 minutes, less than a minute, and less than a minute, respectively. The reason for the significantly shorter time here compared to the normal setup is again attributable to the thick tails of the t distribution; the flatter tails ensure that the infimum si and supremum Si are not much different, entailing that pi is reasonably large (see (16)), so that simulation from the geometric distribution (24) yielded significantly smaller values of the coalescence time Ti compared to the normal setup. Thus, in spite of the larger value of M, these small values of Ti led to much quicker overall computation.

Again, the iid simulations turned out to be highly reliable in all the cases. Figure 3 shows the results for four co-ordinates in the context of d = 100, and the exact and estimated correlation structures for the first 20 co-ordinates are displayed in Figure 4.

5.3 Simulation from Cauchy

In this case, we set √c1 = rd and √ci = √c1 + ad × (i − 1), for i = 2, . . . , Md. Thus, even the value of M now depends upon d. Following our experimentation we obtained the following: (r1 = 5, a1 = 3.801, M1 = 2000), (r5 = 0.5, a5 = 0.5, M5 = 3000), (r10 = 0.5, a10 = 0.5, M10 = 3000), (r50 = 4, a50 = 0.52, M50 = 2000), (r100 = 4, a100 = 0.52, M100 = 2576). Note that the values of Md are at least twice those for the Student's t distribution, which is clearly due to the much thicker tail of Cauchy compared to t.


Figure 1: Simulation from 100-dimensional normal distribution. The red and blue colours denote the iid sample based density and the true density, respectively. Panels (a)–(d) show the true and iid-based densities for θ1, θ25, θ75 and θ100.


Figure 2: Simulation from 100-dimensional normal distribution. True and iid-based correlation structures for the first 20 co-ordinates. Panel (a) shows the exact correlation structure; panel (b) shows the correlation structure based on iid samples.

As before, using Monte Carlo size Ni = 10000 for the construction of π(θ), we then simulated 10000 iid realizations from π(θ). In this case, the times taken for d = 1, 5, 10, 50, 100 are 2 minutes, 2 minutes, 2 minutes, less than a minute, and 2 minutes, respectively. Notice the significantly shorter time taken in this setup compared to both the normal and t setups. Given that Cauchy has the thickest tails, the reason for this is the same as that provided in Section 5.2 to explain the shorter time for t compared to normal.

As expected, the iid simulations accurately represented the underlying Cauchy distributions. Depictions of the true and iid sample based marginal densities of four co-ordinates are presented in Figure 5 in the context of d = 100. Since the covariance structure does not exist for Cauchy, we do not present the correlation structure comparison, unlike in the previous cases.

5.4 Simulation from normal mixture

In this case, for both A1i and A2i, we chose r50 = 4 and a50 = 0.5. As in the normal setup detailed in Section 5.1, M = 71 turned out to be sufficient. We set Ni = 10000 as the Monte Carlo size, as in the previous cases. For drawing each iid realization, assuming known mixing probabilities 2/3 and 1/3, we first selected one of the two mixing densities, indexed by j = 1, 2. Then, with the corresponding sequence of sets Aji, we proceeded with perfect sampling from the mixing density indexed by j. The time taken for the entire implementation was 7 minutes for generating 10000 iid realizations.
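The component-selection step just described is an ordinary categorical draw. A minimal sketch (hypothetical helper; it returns a 0-based index, whereas the text labels the components j = 1, 2):

```c
#include <assert.h>

/* Select a mixture-component index in {0, ..., k-1} with mixing
   probabilities w[0..k-1] (assumed to sum to 1), given u ~ Uniform(0,1),
   by comparing u against the cumulative sums of w. */
int select_component(const double *w, int k, double u)
{
    double cum = 0.0;
    for (int j = 0; j < k - 1; j++) {
        cum += w[j];
        if (u < cum) return j;
    }
    return k - 1;   /* remaining mass goes to the last component */
}
```

With weights {2/3, 1/3} as in the normal mixture above, the perfect sampler is then run with the set sequence Aji of the selected component.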

As before, the iid realizations represented the true mixture distribution very accurately, Figure 6 bearing some testimony. Figure 7 compares the true correlation structure and that estimated by our iid method. As expected, the structure estimated by our iid method is highly accurate. Here it is worth remarking that the true correlation structure is also an estimated structure, based on 10000 direct iid realizations


Figure 3: Simulation from 100-dimensional Student's t distribution with 5 degrees of freedom. The red and blue colours denote the iid sample based density and the true density, respectively. Panels (a)–(d) show the true and iid-based densities for θ1, θ25, θ75 and θ100.


Figure 4: Simulation from 100-dimensional Student's t distribution with 5 degrees of freedom. True and iid-based correlation structures for the first 20 co-ordinates. Panel (a) shows the exact correlation structure; panel (b) shows the correlation structure based on iid samples.

from the normal mixture.

Details of the method for generating perfect iid realizations from any general multimodal target distribution π are provided in Bhattacharya (2021).

6 Experiments with posterior distributions given real data

We now consider the application of our iid method to posterior distributions given real datasets. Our first application in this regard is the well-known Challenger space shuttle problem (see, for example, Dalal et al. (1989), Martz and Zimmer (1992), Robert and Casella (2004) and Dutta and Bhattacharya (2014)). The data can be found in Robert and Casella (2004) and in the supplement of Dutta and Bhattacharya (2014). Here we consider drawing iid realizations from the two-parameter posterior distribution corresponding to a logit likelihood, and compare the iid based and TMCMC based posteriors.

For our second application, we consider the Salmonella example from Chapter 6.5.2 of Lunn et al. (2012). The data, the three-parameter Poisson log-linear model, the priors and the BUGS/JAGS codes are also available at http://runjags.sourceforge.net/quickjags.html. In this example as well, we compare the iid based and TMCMC based posteriors.

Our third application concerns the really challenging spatial data and model setup related to radionuclide count data on Rongelap Island (Diggle et al. (1997), Diggle et al. (1998)). The model for the radionuclide counts is a Poisson log-linear model encapsulating a latent Gaussian process to account for spatial dependence. The model consists of 160


Figure 5: Simulation from 100-dimensional Cauchy distribution. The red and blue colours denote the iid sample based density and the true density, respectively. Panels (a)–(d) show the true and iid-based densities for θ1, θ25, θ75 and θ100.


Figure 6: Simulation from 50-dimensional mixture normal distribution. The red and blue colours denote the iid sample based density and the true density, respectively. Panels (a)–(d) show the true and iid-based densities for θ1, θ10, θ25 and θ50.


Figure 7: Simulation from 50-dimensional mixture normal distribution. True and iid-based correlation structures for the first 20 co-ordinates. Panel (a) shows the exact correlation structure; panel (b) shows the correlation structure based on iid samples.

parameters. The difficulty of MCMC convergence for the posterior analysis in this problem is very well known, and several research works have been devoted to it (see, for example, Rue (2001), Christensen (2004), Christensen (2006)). Very encouragingly, even in this challenging problem, we are able to obtain iid samples from the posterior very successfully, after applying a specific diffeomorphic transformation to the posterior to render it sufficiently thick-tailed to yield reasonably large values of pi. As in the previous examples, here also we compare the iid and TMCMC based posteriors.

6.1 Application to the Challenger dataset

In 1986, the space shuttle Challenger exploded during take-off, killing the seven astronauts aboard. The explosion was the result of an O-ring failure, a splitting of a ring of rubber that seals the parts of the ship together. The accident was believed to have been caused by the unusually cold weather (31◦F, just below 0◦C) at the time of launch, as there is reason to believe that the O-ring failure probabilities increase as temperature decreases.

In this regard, for i = 1, . . . , n, where n = 23, let yi denote the indicator variable denoting failure of the O-ring. Also let xi = ti/max ti, where ti is the temperature (degrees F) at flight time i. We assume that, for i = 1, . . . , n, yi ∼ Bernoulli(p(xi)) independently, where p(xi) = exp(α + βxi)/(1 + exp(α + βxi)). As regards the prior for (α, β), we set π(α, β) ≡ 1 for α, β ∈ R.

For TMCMC, we consider the additive transformation detailed in Section 4 of Dutta and Bhattacharya (2014), with the scaling constants in the additive transformation being 7.944 and 9.762 for α and β, respectively. After accepting the proposed additive TMCMC move or the previous state in accordance with the TMCMC acceptance probability, we conduct another mixing-enhancement step in a similar vein as in Liu and Sabatti (2000) which, given the current accepted state, proposes another move that gives either


the forward transformation to both the parameters or the backward transformation to both the parameters, with equal probability. Either this final move or the last accepted additive TMCMC move is accepted in accordance with the TMCMC acceptance probability. With this strategy, we run our chain for 2 × 10^5 iterations, discarding the first 10^5 as burn-in. We estimate µ and Σ, required for our sets Ai, using the stored 10^5 iterations after burn-in.

In order to select the radii, we set as before, with d = 2, √c1 = rd and √ci = √c1 + ad × (i − 1), for i = 2, . . . , M. Our experiments indicated that M = 85, rd = 2 and ad = 0.02 are appropriate choices. Now, since √c2 − √c1 = 2 × 0.02 and √ci − √ci−1 = 0.02

for i ≥ 3, rejection sampling in order to simulate uniformly from Ai for i ≥ 2 is computationally costly, since the probability that a point θ drawn uniformly from the ellipsoid {θ : (θ − µ)^T Σ^{−1}(θ − µ) ≤ ci} satisfies (θ − µ)^T Σ^{−1}(θ − µ) > ci−1 is small. To handle this, we adopt the following strategy. First observe that in this situation, Ai is well approximated by the region close to the surface of the above ellipsoid. For a sufficiently large value of d, Y = √ci U^{1/d} X/‖X‖, with X = (X1, . . . , Xd)^T where Xk iid∼ N(0, 1) for k = 1, . . . , d, and U ∼ U(0, 1), represents the uniform distribution essentially about the surface of the ball centred around 0 with radius √ci, so that θ = µ + BY essentially represents the desired uniform distribution on Ai. Thus, this method completely avoids rejection sampling. Setting d = 105 turned out to yield quite accurate results with this strategy. Indeed, comparison with actual rejection sampling showed that the final results are almost indistinguishable.

Instead of setting Ni = 10000 as the Monte Carlo size as in our experiments with the standard distributions, here we set Ni = 5000. One obvious reason for this is to reduce the computing time, since there are M = 105 sets Ai to which the Monte Carlo method needs to be applied. Another reason is that, due to the narrow regions covered individually by the current sets Ai, not many Monte Carlo realizations are necessary. Again, we compared the results with actual rejection sampling by setting Ni = 10000, and again the final results turned out to be almost the same.

There is another subtle issue that deserves mention. The above strategy for avoiding rejection sampling does not guarantee that none of the points sampled from Ai falls in Ai−1, for i ≥ 2. Indeed, a few points must percolate into Ai−1, but they are still counted in the computation of the infimum and supremum of π(θ) in Ai. These points thus have the effect of lowering the value of the ratio of the actual infimum and supremum. To counter this effect, we set ηi = 0 in the formula for pi. Note, however, that the Monte Carlo estimate of π(Ai) is unaffected by this percolation, since the effect of the few points, when divided by the Monte Carlo size, is washed away.

As before, we sampled 10000 iid realizations from this Challenger posterior. The implementation time of our iid method for this problem was 22 minutes.

Figure 8 compares the iid based and TMCMC based marginal posteriors of α and β. Here we thinned the 10^5 TMCMC realizations by using one in every 10 for comparison with our 10000 iid realizations. Observe that the iid based densities are in very close agreement with those based on TMCMC.

The TMCMC and iid based posterior correlations between α and β are −0.99771 and −0.99761, respectively, exhibiting close agreement.


Figure 8: Challenger posterior. The red and blue colours denote the iid sample based density and the TMCMC based density, respectively. Panels (a) and (b) show the TMCMC and iid-based densities for α and β.

6.2 Application to the Salmonella dataset

Here the data concerned is a mutagenicity assay dataset on Salmonella. In the relevant experiment, three plates have each been processed at various doses of quinoline, and subsequently the numbers of revertant colonies of TA98 Salmonella are measured; see Breslow (1984), Lunn et al. (2012).

For i = 1, . . . , 6 and j = 1, 2, 3, letting yij denote the number of colonies observed on plate j at dose xi, the model considered (see Lunn et al. (2012)) is yij ∼ Poisson(µ(xi)), where log(µ(xi)) = α + β log(xi + 10) + γxi. The priors for α, β, γ are independent zero-mean normal distributions with standard deviation 100.

We first consider an additive TMCMC application to this problem, with the scaling constants in the additive transformation for α, β, γ being 0.21832, 0.056563 and 0.00024192, respectively. The forward and backward transformations are considered with equal probability. The rest of the details remain the same as in the Challenger problem. Quite encouraging TMCMC convergence properties are indicated by our informal diagnostics. As before, µ and Σ are estimated from the stored 10^5 TMCMC realizations, after ignoring the first 10^5 realizations as burn-in.

For application of our iid method, we set, for d = 3, rd = 3, ad = 0.02 and M = 200, as suggested by our experiments. Again, we avoided rejection sampling in the same way as in the Challenger case, and considered Ni = 5000, drawing 10000 realizations from the three-parameter posterior. The implementation time here is only a minute.

Figure 9 compares TMCMC and iid sampling with respect to density estimation of the parameters, with the 10^5 TMCMC realizations thinned to 10000 realizations by considering one in every 10. Once again, the iid results are in close agreement with the TMCMC results.

The estimated correlation structures, too, are not much different between the competing methods. The TMCMC estimates of the posterior correlations between


(α, β), (α, γ) and (β, γ) are −0.9627896, 0.7810736 and −0.8901597, respectively, while the respective iid estimates are −0.9625148, 0.7774780 and −0.8887078.

6.3 Application to the Rongelap Island dataset

We now consider application of our iid sampling idea to the challenging spatial statistics problem involving radionuclide count data on Rongelap Island. Following Diggle et al. (1998), we model the count data, for i = 1, . . . , 157, as

Yi ∼ Poisson(µi),

where
µi = ti exp{β + S(xi)},

where ti is the duration of observation at location xi, β is an unknown parameter, and S(·) is a zero-mean Gaussian process with an isotropic covariance function of the form

Cov(S(x1), S(x2)) = σ^2 exp{−α‖x1 − x2‖}

for any two locations x1, x2. In the above, (σ^2, α) are unknown parameters. We assume uniform priors on the entire parameter space corresponding to (β, log(σ^2), log(α)). Since there are 157 latent Gaussian variables S(xi) and three other unknowns (α, β, σ^2), there are in all 160 unknowns in this problem.

To first consider a TMCMC implementation, we adopted the same additive TMCMC sampler of Dey and Bhattacharya (2017) specific to this spatial count data problem (see Section 8 of their article), but also enhanced its convergence properties by adding a step in the flavour of Liu and Sabatti (2000), in the same way as in the previous applications regarding Challenger and Salmonella. After discarding the first 10^6 realizations as burn-in, we stored one in every 100 of the next 10^6 iterations to obtain 10000 realizations, which we used for estimation of µ and Σ needed for the Ai of our iid sampler. The TMCMC exercise took about 1 hour 40 minutes.

Now, with d = 160, application of the iid method requires appropriate specification of rd and ad such that the pi are sufficiently large. However, in this challenging application, all our experiments failed to yield significant pi. To handle this challenge, we decided to flatten the posterior distribution in a way that its infimum and supremum are reasonably close (so that the pi are adequately large) on all the Ai. This requires an appropriate flattening bijective transformation. As it turns out, the MCMC literature already contains instances of such transformations with stronger properties, namely, that the transformation and its inverse are continuously differentiable. Such transformations are called diffeomorphisms. In fact, diffeomorphisms with stronger properties have been considered in Johnson and Geyer (2012); see also Dey and Bhattacharya (2016), who implement such diffeomorphisms in the TMCMC context. However, in the above works, the transformations are meant to reduce thick-tailed target distributions to thin-tailed ones. Here we need just the opposite: the Rongelap Island posterior needs to be converted to a thick-tailed distribution for our iid sampling purpose. As can be anticipated, the inverse transformations for thick-to-thin tail conversions will be appropriate here. Details follow.


Figure 9: Salmonella posterior. The red and blue colours denote the iid sample based density and the TMCMC based density, respectively. Panels (a)–(c) show the TMCMC and iid-based densities for α, β and γ.


6.3.1 A suitable diffeomorphic transformation and the related method

Observe that if πθ, the multivariate target density of some random vector θ, is of interest, then

πγ(γ) = πθ(h(γ)) |det ∇h(γ)| (25)

is the density of γ = h^{−1}(θ), where h is a diffeomorphism. In the above, ∇h(γ) denotes the gradient of h at γ, and det ∇h(γ) stands for the determinant of the gradient of h at γ.

Johnson and Geyer (2012) obtain conditions on h which make πγ super-exponentially light. Specifically, they define the following isotropic function h : R^d → R^d:

h(γ) = f(‖γ‖) γ/‖γ‖ for γ ≠ 0, and h(0) = 0, (26)

for some function f : (0, ∞) → (0, ∞). Johnson and Geyer (2012) confine attention to isotropic diffeomorphisms, that is, functions of the form h where both h and h^{−1} are continuously differentiable, with the further property that det ∇h and det ∇h^{−1} are also continuously differentiable. In particular, if πθ is only sub-exponentially light, then the following form of f : [0, ∞) → [0, ∞), given by

f(x) = e^{bx} − e/3 for x > 1/b, and f(x) = x^3 b^3 e/6 + x b e/2 for x ≤ 1/b, (27)

where b > 0, ensures that the transformed density πγ of the form (25) is super-exponentially light.

In our current context, the d = 160 dimensional target distribution πθ needs to be converted to some thick-tailed distribution πγ. Hence, we apply the transformation γ = h(θ), the inverse of the transformation considered in Johnson and Geyer (2012). Consequently, the density of γ here becomes

πγ(γ) = πθ(h^{−1}(γ)) |det ∇h(γ)|^{−1}, (28)

where h is the same as (26) and f is given by (27). We also apply the same transformation to the uniform proposal density (15), so that the new proposal density now becomes

qi(γ′) = (1/L(Ai)) I_{Ai}(h^{−1}(γ′)) |det ∇h(γ′)|^{−1}. (29)

With (29) as the proposal density for (28), the proposal does not cancel in the Metropolis-Hastings acceptance ratio. For any set A, let h(A) = {h(θ) : θ ∈ A}. Also, now let si = inf_{γ∈h(Ai)} πγ(γ)/qi(γ) and Si = sup_{γ∈h(Ai)} πγ(γ)/qi(γ), where πγ(γ) is

the same as (28) but without the normalizing constant. Then, with (29) as the proposal density, we have

Pi(γ, h(B ∩ Ai)) ≥ ∫_{h(B∩Ai)} min{1, [πγ(γ′)/qi(γ′)] / [πγ(γ)/qi(γ)]} qi(γ′) dγ′ ≥ pi Qi(h(B ∩ Ai)),

where pi = si/Si and Qi is the probability measure corresponding to (29). The rest of the details remain the same as before, with the necessary modifications pertaining to the new proposal density (29) and the new Metropolis-Hastings acceptance ratio with respect to (29) incorporated in the subsequent steps. Once γ is generated from (28), we transform it back to θ using θ = h^{−1}(γ).


6.3.2 Results with the diffeomorphic transformation

With the transformed posterior density corresponding to the form (25), our experiments suggested that M = 10000, b = 0.01, rd = 7.8 and ad = 0.0005 are appropriate, and these rendered the pi significantly large for the transformed target. With Ni = 5000 and the technique of avoiding rejection sampling remaining the same as in the previous two applications, generation of 10000 iid realizations took 30 minutes on our parallel processors.

Figures 10 and 11 show that the marginal density estimates based on the TMCMC and iid samples are in agreement, as in the previous examples. That the correlation structures obtained by the two methods are also in close agreement is borne out by Figure 12.

7 Summary and conclusion

In this article, we have attempted to provide a valid and practically feasible methodology for generating iid realizations from any target distribution on the Euclidean space of arbitrary dimensionality. The key insight leading to the development is that perfect simulation, which lies at the heart of our methodology, is expected to be practically feasible if implemented on appropriately constructed compact sets on which the target density is relatively flat. This motivated us to create an infinite sequence of concentric compact ellipsoids such that the target density restricted to the central ellipsoid and the ellipsoidal annuli is expected to be amenable to very feasible perfect sampling.

We formally attempted to achieve this by representing the target distribution as an infinite mixture of such densities. Consequently, to generate a realization from the target, a mixture component can be picked with its corresponding mixing probability, and a realization can be obtained exactly from the mixture component by perfect simulation. Using minorization, we achieve the goal of appropriate perfect sampler construction for the mixture densities restricted to the ellipsoids and the annuli. The computational burden of the CFTP based search method for perfect sampling is avoided in our approach by accommodating, with due rectification, the idea proposed by Murdoch and Green (1998). Amalgamating the above ideas coherently enabled us to construct an efficient parallel algorithm for iid sampling from general target distributions, with or without compact supports.

Applications of our iid sampling algorithm to various standard univariate and multivariate distributions, with dimension as high as 100, vindicated the accuracy and computational efficiency of our ideas. Further applications to real data scenarios again brought out the usefulness, general applicability and reliability of our method.

The particularly important and challenging real data application is that of the spatial radionuclide count data on Rongelap Island, consisting of 160 parameters and ill-famed for rendering most MCMC approaches ineffective. In this case, using our original strategy, we were initially unable to obtain any ellipsoid sequence amenable to feasible perfect sampling. However, application of an appropriate flattening diffeomorphic transformation to the posterior density proved to be an appropriate response to the challenge, and facilitated cheap implementation of iid sampling. It thus seems that the class of diffeomorphic transformations is destined to play an important role in further development of our research in this extremely important topic of iid sampling from general intractable distributions.


Figure 10: Rongelap Island posterior. The red and blue colours denote the iid sample based density and the TMCMC based density, respectively. Panels (a)–(d) show the TMCMC and iid-based densities for α, β, σ^2 and S1.


Figure 11: Rongelap Island posterior. The red and blue colours denote the iid sample based density and the TMCMC based density, respectively. Panels (a)–(d) show the TMCMC and iid-based densities for S10, S50, S100 and S157.


Figure 12: Rongelap Island posterior. TMCMC and iid-based correlation structures for the first 20 co-ordinates. Panel (a) shows the TMCMC based correlation structure; panel (b) shows the correlation structure based on iid samples.

Bibliography

Bhattacharya, S. (2021). IID Sampling from Intractable Multimodal and Variable-Dimensional Distributions. arXiv preprint.

Breslow, N. (1984). Extra-Poisson Variation in Log-Linear Models. Applied Statistics, 33, 38–44.

Christensen, O. F. (2004). Monte Carlo Maximum Likelihood in Model-Based Geostatistics. Journal of Computational and Graphical Statistics, 13, 702–718.

Christensen, O. F. (2006). Robust Markov Chain Monte Carlo Methods for Spatial Generalized Linear Mixed Models. Journal of Computational and Graphical Statistics, 15, 1–17.

Dalal, S. R., Fowlkes, E. B., and Hoadley, B. (1989). Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure. Journal of the American Statistical Association, 84, 945–957.

Dey, K. K. and Bhattacharya, S. (2016). On Geometric Ergodicity of Additive and Multiplicative Transformation Based Markov Chain Monte Carlo in High Dimensions. Brazilian Journal of Probability and Statistics, 30, 570–613. Also available at http://arxiv.org/pdf/1312.0915.pdf.

Dey, K. K. and Bhattacharya, S. (2017). A Brief Tutorial on Transformation Based Markov Chain Monte Carlo and Optimal Scaling of the Additive Transformation. Brazilian Journal of Probability and Statistics, 31, 569–617. Also available at http://arxiv.org/abs/1307.1446.

Diggle, P. J., Tawn, J. A., and Moyeed, R. A. (1997). Geostatistical Analysis of Residual Contamination from Nuclear Weapons Testing. In V. Barnet and K. F. Turkman, editors, Statistics for Environment 3: Pollution Assessment and Control, pages 89–107. Chichester: Wiley.

Diggle, P. J., Tawn, J. A., and Moyeed, R. A. (1998). Model-Based Geostatistics (with discussion). Applied Statistics, 47, 299–350.

Dutta, S. and Bhattacharya, S. (2014). Markov Chain Monte Carlo Based on Deterministic Transformations. Statistical Methodology, 16, 100–116. Also available at http://arxiv.org/abs/1106.5850. Supplement available at http://arxiv.org/abs/1306.6684.

Giraud, C. (2015). Introduction to High-Dimensional Statistics. CRC Press, Boca Raton, FL.

Johnson, L. T. and Geyer, C. J. (2012). Variable Transformation to Obtain Geometric Ergodicity in the Random-Walk Metropolis Algorithm. The Annals of Statistics, 40, 3050–3076.

Liu, J. S. and Sabatti, C. (2000). Generalized Gibbs Sampler and Multigrid Monte Carlo for Bayesian Computation. Biometrika, 87, 353–369.

Lunn, D., Jackson, C., Best, N., Thomas, A., and Spiegelhalter, D. (2012). The BUGS Book: A Practical Introduction to Bayesian Analysis. CRC Press, Boca Raton, Florida.

Martz, H. F. and Zimmer, W. J. (1992). The Risk of Catastrophic Failure of the Solid Rocket Boosters on the Space Shuttle. The American Statistician, 46, 42–47.

Mukhopadhyay, S. and Bhattacharya, S. (2012). Perfect Simulation for Mixtures with Known and Unknown Number of Components. Bayesian Analysis, 7, 675–714.

Murdoch, D. J. and Green, P. J. (1998). Exact Sampling from a Continuous State Space. Scandinavian Journal of Statistics, 25, 483–502.

Propp, J. G. and Wilson, D. B. (1996). Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics. Random Structures and Algorithms, 9, 223–252.

Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer-Verlag, New York.

Rue, H. (2001). Fast Sampling of Gaussian Markov Random Fields. Journal of the Royal Statistical Society, Series B, 63, 325–338.