Mixture Models, Monte Carlo, Bayesian Updating and Dynamic Models
Mike West
Computing Science and Statistics, Vol. 24, pp. 325-333, 1993


Page 1:

Mixture Models, Monte Carlo, Bayesian Updating and Dynamic Models
Mike West
Computing Science and Statistics, Vol. 24, pp. 325-333, 1993

Page 2:

Abstract

• The development of discrete mixture distributions as approximations to priors and posteriors in Bayesian analysis
  – Adaptive density estimation

Page 3:

Adaptive mixture modeling

• $p(\theta)$: the continuous posterior density function for a continuous parameter vector $\theta$.
• $g(\theta)$: approximating density used as the importance sampling function.
  – Typically a T-distribution.
• $\Theta = \{\theta_j,\ j = 1, \dots, n\}$: random sample from $g(\theta)$. $W = \{w_j,\ j = 1, \dots, n\}$: weights, with
  – $w_j = p(\theta_j) / (k\, g(\theta_j))$
  – $k = \sum_{j=1}^{n} p(\theta_j) / g(\theta_j)$
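A minimal sketch of these weight computations in Python; the normal-mixture target $p$ and the particular T importance function $g$ are illustrative assumptions, not choices fixed by the paper:

```python
import numpy as np
from scipy import stats

# Illustrative target density p (a two-component normal mixture) and a
# heavy-tailed T importance function g; both are assumptions for the sketch.
def p(theta):
    return 0.6 * stats.norm.pdf(theta, -1.0, 0.7) + 0.4 * stats.norm.pdf(theta, 2.0, 1.2)

g = stats.t(df=5, loc=0.0, scale=3.0)

n = 5000
theta = g.rvs(size=n, random_state=0)   # Theta = {theta_j}: random sample from g
ratios = p(theta) / g.pdf(theta)        # p(theta_j) / g(theta_j)
k = ratios.sum()                        # k = sum_{j=1}^n p(theta_j)/g(theta_j)
w = ratios / k                          # w_j = p(theta_j) / (k g(theta_j))

# Weighted Monte Carlo estimate of the posterior mean E[theta].
print("posterior mean ~=", np.sum(w * theta))
```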

Page 4:

Importance sampling and mixtures

• Univariate random sampling
  – Direct Bayesian interpretations (based on mixtures of Dirichlet processes)
• Multivariate kernel estimation
  – Weighted kernel estimator:

$$g(\theta) = \sum_{j=1}^{n} w_j\, d(\theta \mid \theta_j, h^2 V) \qquad (1)$$

where $d(\cdot \mid \theta_j, h^2 V)$ is a kernel with location $\theta_j$ and scale matrix $h^2 V$, $V$ being the Monte Carlo variance matrix and $h$ a window-width.
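A sketch of evaluating and sampling the weighted kernel mixture (1); the use of normal rather than T kernels, and the toy inputs standing in for $(\Theta, W)$, are assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy Monte Carlo sample and weights standing in for (Theta, W); assumptions.
theta = rng.normal(size=(500, 2))              # n points in 2 dimensions
w = np.full(500, 1 / 500)
V = np.cov(theta, rowvar=False, aweights=w)    # Monte Carlo variance matrix
h = 0.5                                        # window-width (assumed value)

def g_pdf(x):
    """Evaluate g(x) = sum_j w_j N(x | theta_j, h^2 V), the mixture (1)."""
    return sum(wj * stats.multivariate_normal.pdf(x, mean=tj, cov=h**2 * V)
               for wj, tj in zip(w, theta))

def g_rvs(size):
    """Draw from the mixture: pick a component by weight, then its kernel."""
    L = np.linalg.cholesky(h**2 * V)
    idx = rng.choice(len(w), size=size, p=w)
    return theta[idx] + rng.standard_normal((size, 2)) @ L.T

draws = g_rvs(1000)
print(g_pdf(np.zeros(2)), draws.shape)
```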

Page 5:

Adaptive methods of posterior approximation

• Possible patterns of local dependence exhibited by $p(\theta)$
  – Easy when the dependence structure is roughly constant over the parameter space.
• Different regions of parameter space are associated with rather different patterns of dependence.
  – $V$ is then allowed to vary locally, with a matrix $V_j$ depending more heavily on the neighborhood of $\theta_j$.

Page 6:

Adaptive importance sampling

• The importance sampling distribution is sequentially revised based on information derived from successive Monte Carlo samples.

Page 7:

AIS algorithm

1. Choose an initial importance sampling distribution with density $g_0(\theta)$, draw a small sample of size $n_0$ and compute weights, giving the summary $\{g_0, n_0, \Theta_0, W_0\}$. Compute the Monte Carlo estimates $\bar\theta_0$ and $V_0$ of the mean and variance of $p(\theta)$.
2. Construct a revised importance function $g_1(\theta)$ using (1) with sample size $n_0$, points $\theta_{0,j}$, weights $w_{0,j}$, and variance matrix $V_0$.
3. Draw a larger sample of size $n_1$ from $g_1(\theta)$, and replace the summary with $\{g_1, n_1, \Theta_1, W_1\}$.
4. Either stop, and base inferences on $\{g_1, n_1, \Theta_1, W_1\}$, or proceed, if desired, to a further revised version $g_2(\theta)$, constructed similarly. (A minimal sketch follows.)

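A one-refinement sketch of the algorithm in one dimension; the target $p$, the initial T importance function, and the window-width rule are all illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def p(theta):
    # Illustrative skewed target (an assumption, not the paper's example).
    return stats.norm.pdf(theta, 0.0, 1.0) * stats.norm.cdf(3.0 * theta)

def normalized_weights(theta, g_pdf):
    ratios = p(theta) / g_pdf(theta)
    return ratios / ratios.sum()

# Step 1: initial T importance function g0; small sample of size n0.
g0 = stats.t(df=4, loc=0.0, scale=2.0)
n0 = 500
theta0 = g0.rvs(n0, random_state=rng)
w0 = normalized_weights(theta0, g0.pdf)
mean0 = np.sum(w0 * theta0)                 # Monte Carlo mean estimate
V0 = np.sum(w0 * (theta0 - mean0) ** 2)     # Monte Carlo variance estimate

# Step 2: revised importance function g1 built from the kernel mixture (1).
h = n0 ** (-0.2)                            # an assumed kernel window-width rule
def g1_pdf(x):
    return np.sum(w0 * stats.norm.pdf(np.atleast_1d(x)[:, None],
                                      theta0, h * np.sqrt(V0)), axis=1)

def g1_rvs(size):
    idx = rng.choice(n0, size=size, p=w0)
    return rng.normal(theta0[idx], h * np.sqrt(V0))

# Steps 3-4: draw a larger sample n1 from g1 and base inferences on it.
n1 = 5000
theta1 = g1_rvs(n1)
w1 = normalized_weights(theta1, g1_pdf)
print("refined posterior mean ~=", np.sum(w1 * theta1))
```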

Page 8:

Approximating mixtures by mixtures

• The computational burden increases if further refinement with larger sample sizes.– Solution) Using a mixtures of several

thousand T

• Reducing the number of components by replacing ‘nearest neighboring’ components with some form of average

Page 9:

Clustering routine

1. Set $r = n$. Starting with the $r = n$ component mixture, choose $k < n$ as the number of components for the final, reduced mixture.
2. Sort the $r$ values $\theta_j$ in $\Theta$ in order of increasing weights $w_j$ in $W$.
3. Find the index $i$ such that $\theta_i$ is the nearest neighbor of $\theta_1$, and reduce the sets $\Theta$ and $W$ to sets of size $r - 1$ by removing components $1$ and $i$, and inserting the ‘average’ values

$$w^* = w_1 + w_i, \qquad \theta^* = (w_1 \theta_1 + w_i \theta_i) / (w_1 + w_i).$$

Page 10:

4. Return to step 2, stopping only when $r = k$.
5. The resulting mixture has locations based on the final $k$ averaged values, with associated combined weights, the same scale matrix $V$, but a new, larger window-width $h$ based on the current, reduced ‘sample size’ $r$ rather than $n$. (A sketch of the routine appears below.)
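A minimal sketch of the merging loop; the Euclidean nearest-neighbor distance and the toy inputs are assumptions:

```python
import numpy as np

def reduce_mixture(theta, w, k):
    """Merge lowest-weight components with nearest neighbors until k remain."""
    theta, w = theta.copy(), w.copy()
    r = len(w)
    while r > k:
        order = np.argsort(w)                      # step 2: increasing weights
        theta, w = theta[order], w[order]
        dists = np.linalg.norm(theta[1:] - theta[0], axis=1)
        i = 1 + int(np.argmin(dists))              # step 3: nearest neighbor of theta_1
        w_star = w[0] + w[i]
        theta_star = (w[0] * theta[0] + w[i] * theta[i]) / w_star
        keep = np.ones(r, dtype=bool)
        keep[[0, i]] = False
        theta = np.vstack([theta[keep], theta_star])
        w = np.append(w[keep], w_star)
        r -= 1                                     # step 4: repeat until r = k
    return theta, w

rng = np.random.default_rng(3)
theta, w = rng.normal(size=(1000, 2)), rng.dirichlet(np.ones(1000))
theta_k, w_k = reduce_mixture(theta, w, k=100)
print(theta_k.shape, round(w_k.sum(), 6))          # (100, 2); weights still sum to 1
```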

Page 11:

Sequential updating and dynamic models

• Updating a prior to a posterior distribution for a random quantity or parameter vector, based on received data summarized through a likelihood function for the parameter.

Page 12:

Dynamic models

• Observation model:

$$(Y_t \mid \theta_t) \sim p_0(Y_t \mid \theta_t)$$

• Evolution model:

$$(\theta_t \mid \theta_{t-1}) \sim p_e(\theta_t \mid \theta_{t-1})$$

Page 13:

Computations

• Evolution step
  – Compute the current prior for $\theta_t$:

$$p(\theta_t \mid D_{t-1}) = \int p_e(\theta_t \mid \theta_{t-1})\, p(\theta_{t-1} \mid D_{t-1})\, d\theta_{t-1}$$

• Updating step
  – Observing $Y_t$, compute the current posterior:

$$p(\theta_t \mid D_t) \propto p(\theta_t \mid D_{t-1})\, p_0(Y_t \mid \theta_t)$$

Page 14:

Computations: evolution step

1. Various features of the prior $p(\theta_t \mid D_{t-1})$ of interest can be computed directly using the Monte Carlo structure:

$$E[\theta_t \mid D_{t-1}] \approx \sum_{i=1}^{n_{t-1}} w_{t-1,i}\, E_e[\theta_t \mid \theta_{t-1,i}]$$

2. The prior density function can be evaluated by Monte Carlo integration at any point:

$$p(\theta_t \mid D_{t-1}) \approx \sum_{i=1}^{n_{t-1}} w_{t-1,i}\, p_e(\theta_t \mid \theta_{t-1,i})$$

Page 15:

3. The initial Monte Carlo samples $\Theta_t^*$ (with $\theta_{t,i}^*$ drawn from $p_e(\theta_t \mid \theta_{t-1,i})$) provide starting values for the evaluation of the prior.
4. $\Theta_t^*$ may be used with weights $W_{t-1}$ to construct a generalized kernel density estimate of the prior.
5. Monte Carlo computations can be performed to approximate forecast moments and probabilities, e.g.

$$E[Y_t \mid D_{t-1}] \approx \sum_{i=1}^{n_{t-1}} w_{t-1,i}\, E_0[Y_t \mid \theta_{t,i}^*]$$

(A sketch of these computations follows.)
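A sketch of the evolution-step quantities above, borrowing the nonlinear model of Example 3 (later in the slides) for $p_e$ and $p_0$; that choice, and the assumed time $t-1$ sample, are illustrative only:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# Evolution mean E_e[theta_t | theta_{t-1}] from the Example 3 model.
def evolve_mean(th):
    return 0.5 * th + 25.0 * th / (1.0 + th**2)

# Weighted Monte Carlo summary of p(theta_{t-1} | D_{t-1}); assumed inputs.
n = 2000
theta_prev = rng.normal(0.0, 5.0, size=n)
w_prev = np.full(n, 1.0 / n)

# 1. Prior mean: E[theta_t | D_{t-1}] ~= sum_i w_{t-1,i} E_e[theta_t | theta_{t-1,i}]
prior_mean = np.sum(w_prev * evolve_mean(theta_prev))

# 2. Prior density at any point: sum_i w_{t-1,i} p_e(theta | theta_{t-1,i})
def prior_pdf(theta):
    return np.sum(w_prev * norm.pdf(theta, evolve_mean(theta_prev), 1.0))

# 3./4. Starting values theta_t,i* ~ p_e(. | theta_{t-1,i}), reusable with
# weights W_{t-1} as locations of a kernel estimate of the prior.
theta_star = rng.normal(evolve_mean(theta_prev), 1.0)

# 5. Forecast moment: E[Y_t | D_{t-1}] ~= sum_i w_{t-1,i} E_0[Y_t | theta_t,i*],
# with E_0[Y_t | theta_t] = 0.05 theta_t^2 in the Example 3 model.
forecast_mean = np.sum(w_prev * 0.05 * theta_star**2)
print(prior_mean, prior_pdf(0.0), forecast_mean)
```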

Page 16:

Computations: updating step

• Adaptive Monte Carlo density estimation is applied at the updating step to approximate the posterior $p(\theta_t \mid D_t)$.

Page 17:

Examples

• Example 1
  – A normal, linear, first-order polynomial model
• Example 2
  – Non-normal; using T distributions
• Example 3
  – A bifurcating, nonlinear model:

$$(Y_t \mid \theta_t) \sim N[0.05\,\theta_t^2,\ 10]$$

$$(\theta_t \mid \theta_{t-1}) \sim N[0.5\,\theta_{t-1} + 25\,\theta_{t-1}/(1 + \theta_{t-1}^2),\ 1]$$
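A sketch that simulates the Example 3 model and runs a plain weighted Monte Carlo filter through the evolution and updating steps. The initial state, time-0 prior, and horizon are assumptions, and the paper's adaptive kernel smoothing between steps is omitted; note that since $Y_t$ depends on $\theta_t^2$, the posterior bifurcates into $\pm\theta$ modes:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

# Simulate: theta_t ~ N[0.5 th + 25 th/(1+th^2), 1], Y_t ~ N[0.05 th^2, 10].
T, th = 50, 0.1                                  # horizon and initial state (assumed)
theta_true, y = np.zeros(T), np.zeros(T)
for t in range(T):
    th = rng.normal(0.5 * th + 25.0 * th / (1.0 + th**2), 1.0)
    theta_true[t] = th
    y[t] = rng.normal(0.05 * th**2, np.sqrt(10.0))

# Sequential updating with weighted Monte Carlo samples.
n = 5000
particles = rng.normal(0.0, 5.0, size=n)         # assumed time-0 prior sample
w = np.full(n, 1.0 / n)
for t in range(T):
    # Evolution step: draw theta_t,i* ~ p_e(. | theta_{t-1,i}).
    particles = rng.normal(0.5 * particles + 25.0 * particles / (1.0 + particles**2), 1.0)
    # Updating step: reweight by the likelihood p_0(Y_t | theta_t).
    w *= norm.pdf(y[t], 0.05 * particles**2, np.sqrt(10.0))
    w /= w.sum()
    # (West's method would now rebuild the sample via the kernel mixture (1)
    # to avoid weight degeneracy; that refinement is omitted here.)
print("E[theta_T | D_T] ~=", np.sum(w * particles), "true:", theta_true[-1])
```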

Page 18:

Examples

• Example 4
  – Television advertising