1 Outline.
1. MSL.
2. MSM and Indirect Inference.
3. Example of MSM: Berry (1994) and BLP (1995).
4. Ackerberg’s Importance Sampler.
2 Some Examples.
• In this chapter, we will be concerned with examples of the form:

f(y|x, θ) = ∫ f(y|x, γ) g(γ|θ) dγ
• We shall discuss some methods for estimating these problems.
• Then we shall discuss, in detail, two examples.
• The first is the approach to demand estimation following Berry (1994) and BLP (1995).
• The second is discrete games using the Ackerberg importance sampler.
• That is, the density f(y|x, γ) is allowed to depend on parameters γ that vary within the population.
• The population distribution of these parameters is g(γ|θ).
• Examples of this are random effects models of unobserved heterogeneity.
• In both cases, integration needs to be performed in order to evaluate the likelihood function.
• We can evaluate the above integral using either deterministic methods for integration or stochastic methods based on drawing pseudo-random numbers.
• In practice, simulation is more common since the asymptotic theory takes account of the approximation error in evaluating the integral.
• To do Monte Carlo integration, we need to draw s = 1, ..., S pseudo-random deviates γ^(s) from the density g(γ).
• Our estimate of the integral is then:

f̂(y_i|x_i, θ) = (1/S) Σ_{s=1}^S f(y_i|x_i, γ_i^(s), θ)

• Note that we are drawing a separate set of deviates for each y_i and x_i in our simulator.
• Also note that we have assumed that θ does not enter into g(γ).
• In practice, we may have to carefully parameterize our problems to make sure that this is true.
• We shall consider an example of this shortly.
• Note that we could form a confidence interval to evaluate the accuracy of our integral.
• This simulation is unbiased and consistent as S → ∞.
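To make this concrete, here is a minimal Python sketch of Monte Carlo integration and the confidence interval just mentioned. The model (a binary probit with one normal random coefficient) and all names and values are hypothetical illustrations, not taken from the text:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def mc_density(y, x, theta, S=1000):
    """Monte Carlo estimate of f(y|x,theta) = int f(y|x,gamma) g(gamma|theta) dgamma
    for a hypothetical binary probit with random coefficient gamma ~ N(theta[0], theta[1]^2)."""
    gamma = theta[0] + theta[1] * rng.standard_normal(S)  # S draws from g(gamma|theta)
    p1 = norm.cdf(x * gamma)                              # f(y=1 | x, gamma)
    vals = p1 if y == 1 else 1.0 - p1
    fhat = vals.mean()                                    # unbiased simulator of the integral
    se = vals.std(ddof=1) / np.sqrt(S)                    # simulation standard error
    return fhat, (fhat - 1.96 * se, fhat + 1.96 * se)     # 95% CI for the integral

fhat, ci = mc_density(y=1, x=0.5, theta=(1.0, 0.5))
print(f"f-hat = {fhat:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```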
• The MSL (method of simulated likelihood) estimator is:

ln L_N(θ) = (1/N) Σ_i ln f̂(y_i|x_i, θ)

• f̂(y_i|x_i, θ) is a smooth and differentiable function of θ.
• Our estimator θ̂_MSL is defined as:

θ̂_MSL = argmax_θ ln L_N(θ)
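Below is a minimal fake-data sketch of the MSL estimator for a hypothetical random-coefficient probit. It follows the two practical points above: one fixed set of deviates per observation, reused at every trial θ, with θ entering only through the transformation γ = b + ση rather than through the density of the draws:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical fake data: y_i = 1{x_i * beta_i + e_i > 0}, beta_i ~ N(1.0, 0.5^2).
N, S = 500, 200
x = rng.standard_normal(N)
y = (x * (1.0 + 0.5 * rng.standard_normal(N)) + rng.standard_normal(N) > 0)

# One fixed set of deviates per observation, reused at every theta.
eta = rng.standard_normal((N, S))

def neg_log_likelihood(theta):
    b, log_s = theta
    gamma = b + np.exp(log_s) * eta                        # gamma_i^(s): random coefficients
    p1 = norm.cdf(x[:, None] * gamma)                      # f(y=1 | x, gamma)
    fhat = np.where(y[:, None], p1, 1 - p1).mean(axis=1)   # f-hat(y_i | x_i, theta)
    return -np.mean(np.log(fhat))                          # minus ln L_N(theta)

res = minimize(neg_log_likelihood, x0=[0.5, -1.0], method="BFGS")
print("b-hat =", res.x[0], " sigma-hat =", np.exp(res.x[1]))
```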
• Prop 21.1 from Gourieroux and Monfort (stated in the text) demonstrates that if
• (i) the likelihood satisfies the regularity conditions for asymptotic normality with limit variance Λ^{-1}(θ_0), where

Λ(θ_0) = −plim [ (1/N) Σ_{i=1}^N ∂² ln f(y|x, θ) / ∂θ∂θ′ ]

• (ii) the density is simulated with an unbiased simulator,
• (iii) S, N → ∞ with N^{1/2}/S → 0, then

N^{1/2}(θ̂_MSL − θ_0) →_d N(0, Λ^{-1}(θ_0))
• There are two approaches one can take to calculating the variance.
• A first is to bootstrap your standard errors.
• A second is based on differentiating ln f̂(y_i|x_i, θ) = ln[(1/S) Σ_{s=1}^S f(y_i|x_i, γ_i^(s), θ)].
• We shall talk about the mechanics of implementing this estimator in a demand estimation example from Berry (1994), which is closely related to BLP (1995).
3 MSM and Indirect Inference
• An MSM estimator starts by specifying a moment equation that depends on the distribution of some random variable:

m(y_i, x_i, θ) = ∫ h(y_i, x_i, γ, θ) g(γ) dγ

• We proceed analogously to the MSL estimator and use Monte Carlo to simulate this integral.
• We need to draw s = 1, ..., S pseudo-random deviates γ^(s) from the density g(γ):

m̂(y_i, x_i, θ) = (1/S) Σ_{s=1}^S h(y_i, x_i, γ_i^(s), θ)

• As in the case of MSL, we assume that the parameters θ do not enter into g(γ).
• This may require a careful parameterization of our model.
• If we could perfectly evaluate our integral, in the just identified case, our GMM estimator would be:

Q_N(θ) = [ Σ_{i=1}^N w_i m(y_i, x_i, θ) ]′ [ Σ_{i=1}^N w_i m(y_i, x_i, θ) ]

• where w_i corresponds to our weights.
• In MSM, we plug in the sample analogue:

Q_N(θ) = [ Σ_{i=1}^N w_i m̂(y_i, x_i, θ) ]′ [ Σ_{i=1}^N w_i m̂(y_i, x_i, θ) ]
• Under the regularity conditions stated in the text, MSM is consistent and asymptotically normal for a fixed S.
• Unlike MSL, we do not need to let S → ∞ in order to estimate the model.
• Given that we have asymptotic normality, this justifies the use of the bootstrap to compute our standard errors.
• A final method discussed in the text is indirect inference.
• This method is useful for models that are easy to simulate, but where it is hard to form MSL or MSM estimators.
• Suppose that our model specifies that y_i = f(x_i, η, θ), where x_i is a random variable, η is a stochastic shock and θ is a vector of parameters.
• Suppose that g(η) is the density for our shock.
• In indirect inference, we start with an auxiliary model.
• For example, we might specify ad hoc regressions of the dependent variable y_i on the exogenous variables x_i.
• We run this regression on the true data and come up with regression coefficients β̂.
• Given a vector of parameters θ, we could simulate our model to generate a sequence of pseudo-random (ỹ_i, x̃_i), i = 1, ..., N.
• We could then run our auxiliary model on the pseudo-random (ỹ_i, x̃_i) to come up with a β̂(θ).
• The indirect inference estimator is:

θ̂ = argmin_θ (β̂ − β̂(θ))′ Ω^{-1} (β̂ − β̂(θ))

• where Ω^{-1} is a weight matrix.
• Intuitively, indirect inference attempts to match the parameters of the auxiliary model on the real and simulated data.
• Essentially, it is an extremely convenient way to form moments for a simulation based model.
• Another attractive feature is that often the weight matrix Ω^{-1} can be formed using the data (y_i, x_i) and does not require simulating the model.
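As an illustration, here is a minimal sketch of indirect inference for a hypothetical nonlinear model y = exp(θx) + η, with an ad hoc quadratic regression as the auxiliary model and the identity matrix as Ω; the simulation shocks are drawn once and held fixed so the objective is smooth in θ:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Hypothetical "true" model: y = exp(theta * x) + eta, eta ~ N(0, 1).
N, theta_true = 1000, 0.5
x = rng.uniform(0.0, 2.0, N)
y = np.exp(theta_true * x) + rng.standard_normal(N)

def auxiliary_beta(y_, x_):
    """Ad hoc auxiliary model: OLS of y on (1, x, x^2)."""
    X = np.column_stack([np.ones_like(x_), x_, x_**2])
    return np.linalg.lstsq(X, y_, rcond=None)[0]

beta_hat = auxiliary_beta(y, x)          # auxiliary coefficients on the real data
eta_sim = rng.standard_normal(N)         # fixed simulation shocks

def ii_objective(theta):
    y_sim = np.exp(theta[0] * x) + eta_sim         # simulate the model at theta
    diff = beta_hat - auxiliary_beta(y_sim, x)     # mismatch in auxiliary coefficients
    return diff @ diff                             # Omega = identity

res = minimize(ii_objective, x0=[0.1], method="Nelder-Mead")
print("indirect inference estimate:", res.x[0])
```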
4 Example of MSM
• In Berry (1994) and BLP (1995), consumer preferences can be written as:

u(x_j, ξ_j, p_j, v_i; θ_d)

where:
• x_j = (x_{j,1}, ..., x_{j,K}) is a vector of K characteristics of product j that are observed by both the economist and the consumer.
• ξ_j is a characteristic of product j observed by the consumer but not by the economist.
• p_j is the price of good j.
• v_i is a vector of taste parameters for consumer i.
• θ_d is a vector of demand parameters.
• One commonly used specification is the logit model with random (normal) coefficients:
u_ij = x_j β_i − α p_j + ξ_j + ε_ij

• The K random coefficients are:

β_i,k = β_k + σ_k η_i,k,   η_i,k ∼ N(0, 1), iid
• Consumer i will purchase good j if and only if it is utility maximizing, just as in the previous lecture.
• Question: How do we interpret the parameters of this model?
• It is useful to decompose utility into two parts: the first is a “mean” level of utility and the second is a heteroskedastic error term that captures the effect of random taste parameters:

υ_ij = [ Σ_k x_{j,k} σ_k η_i,k ] + ε_ij

δ_j = x_j β − α p_j + ξ_j

• We can now write the utility of person i for product j as:

u_ij = δ_j + υ_ij
• Next, we will write the market shares for aggregate demand in a particularly convenient fashion. First define the set of “error terms” that make product j utility maximizing, given the J dimensional vector δ = (δ_j):

A_j(δ) = { υ_i = (υ_ij) : δ_j + υ_ij ≥ δ_j′ + υ_ij′ for all j′ ≠ j }

• The market share of product j can then be written as (assuming a law of large numbers):

s_j(δ(x, p, ξ), x, θ) = ∫_{A_j(δ)} f(υ) dυ
• In this case, the parameter θ is β, α and σ.
• Given θ and the demand for product j actually observed in the data, s̃_j, it must be the case that:

s̃_j = s_j(δ(x, p, ξ), x, θ)

• Given θ, this can be expressed as a system of J equations in J unknowns (the ξ_j).
• To estimate, we find a set of instruments for the ξ_j.
• We must find a set of instruments correlated with the endogenous variable p_j, but uncorrelated with the residual ξ_j.
Commonly used instruments:
1. The product characteristics.
2. Prices of products in other markets (interpret ξ_j as a demand shifter).
3. Measures of isolation in product space (Σ_{j′≠j} x_{j′,k}).
4. Cost shifters.
4.1 Computation.
• In this section, I shall outline some of the key steps needed to actually compute Berry (1994).
• A key step in many programming projects is to do a fake data experiment/Monte Carlo study.
• Simulate the model using fixed parameter values.
• Pretend you don’t know the parameter values and estimate.
• This tests the code and sometimes shows you limitations of the models.
• One of the best ways to really learn the econometrics in a paper is to do a fake data experiment.
• We shall consider as an example the random coefficient logit model.
There are basically four things we need to do to compute the value of the objective function in order to do GMM.
1. For a given value of σ and δ, compute the vector of market shares.
2. For a given value of σ, find the vector δ that equates the observed market shares and those predicted by the model, using the contraction mapping.
3. Given δ and β, α, compute the value of ξ.
4. Search for the value of θ = (β, α, σ) that minimizes the objective function.
• We shall consider these one at a time.
4.2 Computing Market Shares.
• In the random coefficient logit model, we can compute the market shares, given δ, as follows:

s_j(δ, σ) = ∫ [ exp(δ_j + Σ_k x_{j,k} η_i,k σ_k) / (1 + Σ_{j′} exp(δ_j′ + Σ_k x_{j′,k} η_i,k σ_k)) ] dF(η_i)
• In practice, the integral above is computed using simulation.
• Make a set of S simulation draws for each j and keep them fixed for the whole problem.
• Denote the draws as η_i^(s), s = 1, ..., S. Our simulated shares are:

ŝ_j(δ, σ) = (1/S) Σ_{s=1}^S [ exp(δ_j + Σ_k x_{j,k} η_i,k^(s) σ_k) / (1 + Σ_{j′} exp(δ_j′ + Σ_k x_{j′,k} η_i,k^(s) σ_k)) ]
• Sometimes importance sampling is useful in order to improve the speed/accuracy of the integration.
• Importance sampling is discussed in the text.
• We can compute confidence intervals using standard methods to see whether the simulated market shares are well estimated.
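A minimal sketch of the frequency simulator ŝ_j(δ, σ) follows; the dimensions, draws, and parameter values are hypothetical, and the outside good’s utility is normalized to zero:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulated_shares(delta, x, sigma, eta):
    """Frequency simulator of the random-coefficient logit shares s-hat_j(delta, sigma).
    delta: (J,) mean utilities; x: (J, K) characteristics;
    sigma: (K,) taste dispersions; eta: (S, K) fixed simulation draws."""
    v = delta[None, :] + (eta * sigma) @ x.T             # (S, J) utilities per simulated consumer
    ev = np.exp(v)
    probs = ev / (1.0 + ev.sum(axis=1, keepdims=True))   # outside good utility normalized to 0
    return probs.mean(axis=0)                            # average over the S draws

J, K, S = 5, 2, 2000
x = rng.standard_normal((J, K))
eta = rng.standard_normal((S, K))                        # draw once, keep fixed
shares = simulated_shares(np.zeros(J), x, np.array([0.5, 0.3]), eta)
print(shares, shares.sum())
```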
4.3 The contraction mapping.
• Next, we wish to find the δ that matches the observed market shares, given σ.
• Berry and BLP demonstrate that the following iteration is a contraction:

δ_j^(n+1) = δ_j^(n) + ln(s̃_j) − ln(ŝ_j(δ^(n), σ))

• Point: Market shares can be inverted very quickly in a fairly simple manner!
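A sketch of the inversion follows; share_fn stands in for any share simulator, e.g. simulated_shares from the previous sketch, and the plain-logit inversion is only a convenient starting value:

```python
import numpy as np

def invert_shares(s_obs, share_fn, tol=1e-12, max_iter=1000):
    """Berry's contraction: iterate delta <- delta + ln(s_obs) - ln(share_fn(delta))
    until the model shares match the observed shares s_obs."""
    delta = np.log(s_obs) - np.log(1.0 - s_obs.sum())   # plain-logit inversion as start value
    for _ in range(max_iter):
        delta_new = delta + np.log(s_obs) - np.log(share_fn(delta))
        if np.max(np.abs(delta_new - delta)) < tol:
            return delta_new
        delta = delta_new
    raise RuntimeError("contraction did not converge")
```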
5 Computing the value of ξ
• The next step is simple. Just let:

ξ_j = δ_j − (x_j β − α p_j)

where δ_j is computed using the contraction mapping.
5.1 Computing the value of the objective function.
• Let Z be the set of instruments.
• The objective function is formulated as in all MSM problems, assuming E(ξ|Z) = 0.
• The econometrician then chooses β, α, and σ in order to minimize the MSM objective function; a code sketch appears below.
• Standard mathematical programs (MATLAB, GAUSS, IMSL, NAG) contain software for optimization problems.
• One standard way to proceed is to do a rough global search first and then use a derivative based method second, once you have a very rough sense of the overall shape of the objective function.
• Multiple starting points are commonly used in order to search for multiple local solutions to the minimization problem.
• See Judd for an overview of numerical minimization.
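Putting the four steps together, here is a sketch of the MSM objective; it reuses simulated_shares and invert_shares from the sketches above, and the packing of θ and the weight matrix W are illustrative choices:

```python
import numpy as np

def msm_objective(theta, s_obs, x, p, Z, eta, W):
    """MSM objective for the random-coefficient logit, assuming E(xi|Z) = 0.
    theta packs (beta (K,), alpha, log sigma (K,)); reuses simulated_shares
    and invert_shares from the sketches above."""
    K = x.shape[1]
    beta, alpha, sigma = theta[:K], theta[K], np.exp(theta[K + 1:])
    delta = invert_shares(s_obs, lambda d: simulated_shares(d, x, sigma, eta))
    xi = delta - (x @ beta - alpha * p)       # structural residual
    g = Z.T @ xi / len(xi)                    # sample moment conditions
    return g @ W @ g                          # quadratic form in the moments

# e.g. scipy.optimize.minimize(msm_objective, theta_start,
#      args=(s_obs, x, p, Z, eta, W), method="Nelder-Mead")
```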
6 Ackerberg’s Importance Sampler.
• Finally, we consider a method that is useful when the model is difficult to compute but it is possible to “reparameterize” the model.
• With the reparameterization, the algorithm is parallel and can be computed more efficiently.
• Ackerberg (2006) describes this method in detail.
• As an example, we take the problem of discrete/normal form games as studied in Bajari, Hong and Ryan (2006).
• Consider a static entry game (see Bresnahan and Reiss (1990, 1991), Berry (1992), Tamer (2002), Ciliberto and Tamer (2003), and Manuszak and Cohen (2004)).
• The economist observes a cross section of markets.
• The players in the game are a finite set of potential entrants.
• In each market, the potential entrants simultaneously choose whether to enter.
• Let a_i = 1 denote entry and a_i = 0 denote nonentry.
• In applications, the function f_i takes a form such as:

f_i = θ_1 · x + δ Σ_{j≠i} a_j   if a_i = 1
f_i = 0                         if a_i = 0        (1)

• The covariates x are variables which influence the profitability of entering a market.
• These might include the number of consumers in the market, average income and market specific cost indicators.
• The term δ measures the influence of j’s choice on i’s entry decision.
• The ε_i(a) capture shocks to the profitability of entry that are commonly observed by all firms in the market.
• This is a simultaneous system of logit models!
• In the paper, we also discuss network effects and peer effects as other examples.
7 The Model.
• Simultaneous move game of complete information (normal form game).
• There are i = 1, ..., N players with a finite set of actions A_i.
• A = ∏_i A_i.
• Utility u_i : A → R, where R is the real line.
• Let π_i denote a mixed strategy.
• A Nash equilibrium is a profile of mutual best responses.
• Following Bresnahan and Reiss (1990, 1991), econometrically a game is a discrete choice model.
• Except actions of others are right-hand-side variables.
u_i(a) = f_i(x, a; θ_1) + ε_i(a). (2)

• Mean utility: f_i(x, a; θ_1), which depends on a, the vector of actions, covariates x, and parameters θ_1.
• ε_i(a) preference shocks.
• ε_i(a) ∼ g(ε|θ_2), iid.
• Standard random utility model, except utility depends on the actions of others.
• E(u) is the set of Nash equilibria given a vector of utilities u.
• λ(π; E(u), β) is the probability of equilibrium π ∈ E(u) given parameters β.
• λ(π; E(u), β) corresponds to a finite vector of probabilities.
• In an application, we might let λ depend on whether the equilibrium:
1. Satisfies a particular refinement concept (e.g. trembling hand perfection).
2. Is in pure strategies.
3. Maximizes joint payoffs (efficiency).
4. Maximizes profit of incumbent firms (as in airline examples).
• In practice, we could create dummy variables for whether a given equilibrium π ∈ E(u) satisfies 1-4 above.
• Let x(π, u) be this vector of dummies.
• A straightforward way to model λ is:

λ(π; E(u), β) = exp(β · x(π, u)) / Σ_{π′∈E(u)} exp(β · x(π′, u))   (3)
• Computing the set E(u), all of the equilibria of a normal form game, is a well understood problem.
• McKelvey and McLennan (1996) survey the available algorithms in detail.
• Software package Gambit.
• Also not hard to program directly.
8 Estimation.
• P(a|x, θ, β) is the probability of a given x, θ and β:

P(a|x, θ, β) = ∫ { Σ_{π∈E(u(x,θ_1,ε))} λ(π; E(u(x, θ_1, ε)), β) ∏_{i=1}^N π_i(a_i) } g(ε|θ_2) dε
• Computation of the above integral is facilitated by the importance sampling procedure of Ackerberg.
• Make a change of variables to integrate over latent utility u_i instead of over ε_i.
• With this change, we won’t need to recompute the equilibria of the game during estimation.
• Estimation not feasible without this insight.
• Often g(ε|θ_2) is a simple parametric distribution (e.g. normal, extreme value, etc.).
• For instance, suppose it is normal and let φ(·|μ, σ) denote the normal density.
• Then, the density h(u|θ, x) for the vNM utilities u is:

h(u|θ, x) = ∏_i ∏_{a∈A} φ(u_i(a); f_i(x, a; θ_1) + μ, σ)

where for all i and all a, ε_i(a) = u_i(a) − f_i(x, a; θ_1).
• Evaluating h(u|θ, x) is cheap.
• Draw s = 1, ..., S vectors of vNM utilities, u^(s) = (u_1^(s), ..., u_N^(s)), from an importance density q(u).
• We can then simulate P(a|x, θ, β) as follows:

P̂(a|x, θ, β) = (1/S) Σ_{s=1}^S { Σ_{π∈E(u^(s))} λ(π; E(u^(s)), β) ∏_{i=1}^N π_i(a_i) } · [ h(u^(s)|θ, x) / q(u^(s)) ]
• Precompute E(u^(s)) for a large number of randomly drawn games s = 1, ..., S.
• Evaluating P̂(a|x, θ, β) at new parameters DOES NOT REQUIRE RECOMPUTING E(u^(s)), s = 1, ..., S!
• Evaluating the simulation estimator P̂(a|x, θ, β) of P(a|x, θ, β) only requires “reweighting” of the equilibria by the new λ and h(u^(s)|θ, x)/q(u^(s)).
• This is a cheap computation.
• Normally, the computational expense of structural estimation comes from recomputing the equilibrium many times.
• Also, note that this approach is naturally parallel.
• This saves on the computational time by orders of magnitude.
• Given P̂(a|x, θ, β), we can simulate the likelihood function or simulate the moments.
• The asymptotics are standard.
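As a stylized illustration of the reweighting idea, consider a two-player entry game restricted to pure strategies, with λ set to the uniform distribution over the equilibrium set for simplicity. The latent payoffs are drawn once from an importance density q and their equilibrium sets are computed once; each new θ only changes the weights h/q. All densities and parameter values here are hypothetical simplifications of the general setup:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
S = 5000

# Latent entry payoffs: u[s, i, k] = player i's payoff from entering when the
# rival plays k (0 = out, 1 = in). Drawn ONCE from the importance density
# q = N(0, 2^2) per component; never redrawn during estimation.
q_scale = 2.0
u = q_scale * rng.standard_normal((S, 2, 2))
q_dens = norm.pdf(u, scale=q_scale).reshape(S, -1).prod(axis=1)

def pure_equilibria(u_s):
    """Enumerate pure-strategy Nash equilibria of the 2x2 entry game."""
    eq = []
    for a1 in (0, 1):
        for a2 in (0, 1):
            br1 = (u_s[0, a2] > 0) == bool(a1)   # player 1 best responds to a2
            br2 = (u_s[1, a1] > 0) == bool(a2)   # player 2 best responds to a1
            if br1 and br2:
                eq.append((a1, a2))
    return eq

# The expensive step, done once: equilibrium sets for every drawn game.
E = [pure_equilibria(u[s]) for s in range(S)]

def P_hat(a, theta, x):
    """Simulated P(a|x, theta): reweight the fixed draws by h/q; lambda is
    uniform over each equilibrium set. Model density h: u[i, k] ~
    N(theta1*x + delta*k, 1), with theta = (theta1, delta)."""
    theta1, delta = theta
    mean = theta1 * x + delta * np.array([0.0, 1.0])        # mean payoff by rival action
    h_dens = norm.pdf(u - mean).reshape(S, -1).prod(axis=1)
    w = h_dens / q_dens                                     # importance weights
    sel = np.array([((a in E[s]) / len(E[s])) if E[s] else 0.0 for s in range(S)])
    return np.mean(w * sel)

print(P_hat(a=(1, 0), theta=(0.5, -1.0), x=1.0))
```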
9 Overview of Bayes.
• In Bayesian econometrics, the econometrician acts as a rational decision maker, just like the agents in economic theory.
• The econometrician starts off with a prior distribution p(θ) about the model parameters.
• The econometrician observes some data y = [y_1, ..., y_n].
• The econometrician has a model, f(y|θ), which is the probability of observing y conditional on the parameters θ.
• The econometrician’s posterior probability, by Bayes’ Theorem, is:

p(θ|y) = p(θ) f(y|θ) / ∫ p(θ) f(y|θ) dθ

• In the last decade, there has been an explosion in the applications of Bayesian approaches in statistics and econometrics.
• In Markov chain Monte Carlo, the econometrician simulates the posterior distribution p(θ|y).
• This involves simulating a Markov chain where the invariant (or long run) distribution is exactly equal to the posterior.
• The output of this simulation is a sequence of pseudo-random numbers θ^(1), ..., θ^(S).
• Let f(θ^(1), ..., θ^(S)) denote the density that puts weight 1/S on each simulation draw.
• Then under suitable regularity conditions:

f(θ^(1), ..., θ^(S)) →_d p(θ|y)

• This posterior distribution expresses the econometrician’s beliefs about the parameters after seeing the data.
• We could use the posterior to:
1. Construct a 95 percent credible set (i.e. a set that has 95 percent posterior probability). This is analogous to a confidence interval.
2. Simulate the distribution of functions of the parameters, g(θ), as (1/S) Σ_s g(θ^(s)).
3. Construct a predictive distribution for forecasting, i.e. simulate the model for each pseudo-random draw θ^(1), ..., θ^(S).
• There are a few advantages to the Bayesian approach.
• The first is that it is very elegant numerically.
• There are some problems in latent variable, time series and other models that can only be solved through the use of Bayes.
• The second is that Bayes is exact in finite samples (up to our ability to approximate the posterior).
• Asymptotic theory depends on first order Taylor series expansions.
• Essentially, you are hoping that the first and second derivatives capture the behavior of the function.
• This can be a very poor approximation in finite samples in some cases.
• In Bayes, such linearizations are not required.
• However, you do need to be able to simulate the posterior accurately.
• Third, Bayes fits into decision theory and can help to guide rational decision making.
• For example, Rossi et al. (1997) study the problem of target marketing.
• You see each household in a scanner panel data set choose a handful of times.
• You can use Bayes’ theorem to update on the random coefficients of each household i.
• e.g. if there are 10,000 households, you have a posterior for the preference parameters for each of the 10,000 households conditional on its purchase history.
• Using this posterior, you can form your posterior beliefs about the profits from sending a coupon to an individual household.
• Fourth, in Bayes you can form posteriors over models.
• Suppose that you have m = 1, ..., M probability models, f_m(y|θ).
• Then f(y|θ) = Σ_{m=1}^M p_m f_m(y|θ), where p_m is the probability of model m.
• You can use Bayes theorem to express your posterior probability p_m about model m.
• This is a very elegant way to handle non-nested models and might be superior to classical approaches to non-nested testing.
• Finally, in Bayesian econometrics, you can work with models that are not identified or that do not exhibit normal asymptotics.
• A flat likelihood does not affect your ability to construct a posterior.
• We shall illustrate Bayesian methods by considering simulation of the multinomial probit.
10 Markov Chains.
• Two common ways to conduct MCMC are Gibbs sampling and Metropolis.
• A normal random walk Metropolis works as follows.
• First, the econometrician comes up with a rough guess θ_0 at the MLE.
• Second, come up with a rough guess I_0 at the information matrix using the Hessian of the MLE.
• A sequence of pseudorandom values θ^(1), ..., θ^(S) is drawn as follows. Given θ^(s), we draw θ^(s+1) as follows:
1. First, draw a candidate value θ̃ ∼ N(θ^(s), I_0).
2. Second, compute α = min{ p(θ̃)f(y|θ̃) / [p(θ^(s))f(y|θ^(s))], 1 }.
3. Set θ^(s+1) = θ̃ with probability α and θ^(s+1) = θ^(s) with probability 1 − α.
• Implementing this algorithm simply requires the econometrician to evaluate the likelihood repeatedly and draw normal deviates.
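A minimal sketch of the normal random walk Metropolis algorithm follows; the target posterior is a hypothetical stand-in, and working with log densities avoids numerical overflow in the acceptance ratio:

```python
import numpy as np

rng = np.random.default_rng(5)

def rw_metropolis(log_post, theta0, prop_cov, S=10000):
    """Normal random-walk Metropolis. log_post(theta) = log p(theta) + log f(y|theta)."""
    d = len(theta0)
    chol = np.linalg.cholesky(prop_cov)   # e.g. a rough inverse-Hessian guess I_0
    draws = np.empty((S, d))
    theta, lp = np.asarray(theta0, float), log_post(theta0)
    for s in range(S):
        cand = theta + chol @ rng.standard_normal(d)   # step 1: candidate draw
        lp_cand = log_post(cand)
        if np.log(rng.uniform()) < lp_cand - lp:       # steps 2-3: accept with prob. alpha
            theta, lp = cand, lp_cand
        draws[s] = theta
    return draws

# Hypothetical target: N(1, 0.5^2) posterior for a scalar theta.
draws = rw_metropolis(lambda t: -0.5 * ((t[0] - 1.0) / 0.5) ** 2,
                      theta0=[0.0], prop_cov=np.eye(1))
print(draws[1000:].mean(), draws[1000:].std())
```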
• A second algorithm for constructing a Markov chain is Gibbs sampling.
• Partition the parameters into blocks θ_1, ..., θ_d.
• Let p_k(θ_k|θ_1, ..., θ_{k−1}, θ_{k+1}, ..., θ_d) denote the conditional distribution of the kth block of parameters given the others.
• In some applications, this distribution can be convenient to form even if the entire likelihood is quite complicated!
• Starting with an initial value θ^(0), Gibbs sampling works as follows. Given θ^(s):
1. Draw θ_1^(s+1) ∼ p_1(θ_1|θ_2^(s), θ_3^(s), ..., θ_d^(s))
2. Draw θ_2^(s+1) ∼ p_2(θ_2|θ_1^(s+1), θ_3^(s), θ_4^(s), ..., θ_d^(s))
3. Draw θ_3^(s+1) ∼ p_3(θ_3|θ_1^(s+1), θ_2^(s+1), θ_4^(s), ..., θ_d^(s))
...
d. Draw θ_d^(s+1) ∼ p_d(θ_d|θ_1^(s+1), θ_2^(s+1), ..., θ_{d−1}^(s+1))
d+1. Return to 1.
11 Bayesian Analysis of Regression
• MCMC/Gibbs sampling are particularly powerful in problems with latent variables.
• In discrete choice, utility is latent to the econometrician.
• In a multinomial probit, if utility were observed by the econometrician, estimating parameters would boil down to linear regression.
• For our analysis, it will be useful to consider the Bayesian analysis of linear regression.
• A key step in Rossi et al.’s paper essentially involves the Bayesian analysis of a normal linear regression.
• The analysis here follows Geweke’s textbook, Contemporary Bayesian Econometrics (an excellent intro to the subject).
• y is a T × 1 vector of dependent variables.
• X is a T × k matrix of covariates (nonstochastic).
• Assume that the error terms are normal and homoskedastic:

y|β, h, X ∼ N(Xβ, h^{-1} I_T)

p(y|β, h, X) = (2π)^{-T/2} h^{T/2} exp(−h(y − Xβ)′(y − Xβ)/2)

• h is called the precision parameter (the inverse of the variance).
• I_T is the T × T identity matrix, so the covariance matrix is scalar.
• This is our likelihood function.
• Recall that Bayes theorem implies that the posterior distribution of the model parameters is proportional to the prior times the likelihood.
• We need to specify a prior distribution for our model parameters.
• p(β) = N(β_0, H^{-1}):

p(β) = (2π)^{-k/2} |H|^{1/2} exp(−(β − β_0)′H(β − β_0)/2)

• The prior on β is normal with prior mean β_0 and prior precision H.
• The prior distribution on h is s²h ∼ χ²(v):

p(h) = [2^{v/2} Γ(v/2)]^{-1} (s²)^{v/2} h^{(v−2)/2} exp(−s²h/2)
• Remark: this is essentially a gamma distribution, rewritten in a manner that will be convenient for reasons below.
• This form for the prior is chosen because of conjugacy, i.e. the posterior distribution can be written in an analytically convenient manner.
• Now recall that the posterior is proportional to the prior times the likelihood.
• That is, p(β, h|X, y) ∝ p(β)p(h)p(y|β, h,X)
• Combining the equations above yields p(β, h|X, y) ∝

(2π)^{-T/2} [2^{v/2} Γ(v/2)]^{-1} |H|^{1/2} (s²)^{v/2} h^{(T+v−2)/2} exp(−s²h/2) × exp(−(β − β_0)′H(β − β_0)/2 − h(y − Xβ)′(y − Xβ)/2)
• Now recall from Cameron and Trivedi that the idea behind Gibbs sampling is to “block” the parameters into a set of convenient conditional distributions.
• In this case, we will want to block p(β|h, X, y) and p(h|β, X, y).
• Let’s first derive p(β|h, X, y).
• It is obvious from the above expression that β is going to be normally distributed.
• We will want to complete the square inside the exp(·) to rewrite the expression in the form (β − β̃)′H̃(β − β̃).
• Then β̃ will be the posterior mean and H̃ the posterior precision.
• Distributing terms and completing the square in β yields:

(β − β_0)′H(β − β_0) + h(y − Xβ)′(y − Xβ)
= (β − β_0)′H(β − β_0) + h(ŷ − Xβ)′(ŷ − Xβ) + h(y − ŷ)′(y − ŷ), where ŷ = Xb are the fitted OLS values
= (β − β̃)′H̃(β − β̃) + Q

H̃ = H + hX′X
β̃ = H̃^{-1}(Hβ_0 + hX′Xb)

where b is the OLS estimate of β and Q is a constant that does not depend on β.
• Note that the posterior precision H̃ is the sum of the prior precision H and the data precision hX′X.
• The posterior mean β̃ is a weighted average of the prior mean and the OLS estimate.
• The weights depend on the posterior precision, the prior precision and the OLS estimate of the precision.
• As the sample size becomes sufficiently large, the data will “swamp” the prior.
• The number of terms in the likelihood is a function of T and grows with the sample size.
• The number of terms in the prior remains fixed.
• Next, we have to derive the posterior in h.
• If you look at the prior times the likelihood, it is obvious that the posterior p(h|β, X, y) will be of the form h^α exp(−hω).
• If we can derive α and ω, we can express the posterior.
• By some straightforward, albeit tedious, algebra we can write the posterior distribution p(h|β, X, y) as:

s̃²h ∼ χ²(ṽ), where s̃² = s² + (y − Xβ)′(y − Xβ) and ṽ = v + T
• A Gibbs sampler generates a pseudo-random sequence (h^(s), β^(s)), s = 1, ..., S, using the following Markov chain:
1. Given (h^(s), β^(s)), draw β^(s+1) ∼ p(β|h^(s), X, y)
2. Given β^(s+1), draw h^(s+1) ∼ p(h|β^(s+1), X, y)
3. Return to 1.
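Here is a minimal fake-data sketch of this two-block Gibbs sampler, using the conjugate posterior formulas derived above; the prior hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

# Fake data: y = X beta + N(0, h^{-1} I_T) with beta = (1, 2), h = 1.
T, k = 200, 2
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(T)

# Illustrative prior: beta ~ N(beta_0, H^{-1}), s^2 h ~ chi2(v).
beta_0, H = np.zeros(k), 0.01 * np.eye(k)
s2, v = 1.0, 3.0

S, h = 5000, 1.0
betas, hs = np.empty((S, k)), np.empty(S)
XtX, Xty = X.T @ X, X.T @ y
for s in range(S):
    # Block 1: beta | h is N(beta_tilde, H_tilde^{-1}) with
    # H_tilde = H + h X'X and beta_tilde = H_tilde^{-1}(H beta_0 + h X'y).
    cov = np.linalg.inv(H + h * XtX)
    beta = rng.multivariate_normal(cov @ (H @ beta_0 + h * Xty), cov)
    # Block 2: h | beta satisfies s2_tilde * h ~ chi2(v + T),
    # with s2_tilde = s2 + (y - X beta)'(y - X beta).
    resid = y - X @ beta
    h = rng.chisquare(v + T) / (s2 + resid @ resid)
    betas[s], hs[s] = beta, h

print("posterior mean of beta:", betas[1000:].mean(axis=0))
```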
• This example illustrates the importance of conjugacy.
• That is, choosing our prior and likelihoods to be the “right” functional form greatly simplifies the analysis of the posterior distribution.
• Standard texts in Bayesian statistics (e.g. Berger, Bernardo and Smith) have appendices that lay out conjugate distributions.
12 Bayesian Multinomial Probit
• The multinomial probit is closely related to the analysis of the normal linear model.
• The multinomial probit is defined as:

y_ij = x_ij β + ε_ij   (4)

var(ε_i1, ..., ε_iJ) = h^{-1} I_J

c_ij = 1 if y_ij > y_ij′ for all j′ ≠ j

• In the above, y_ij is the utility of person i for alternative j.
• εij is the stochastic preference shock
• xij are covariates that enter into i’s utility
• cij = 1 if i chooses j
• If the y_ij were known, then we could use the Gibbs sampler above to estimate β and h.
• However, the y_ij are latent variables and therefore we do data augmentation.
• The idea behind data augmentation is simple: we integrate out the distribution of the variables that we do not see.
• Following the notation in Cameron and Trivedi, let f(θ|y, y*) denote the posterior conditional on the observed variables y and the latent variables y*.
• Let f(y*|y, θ) denote the distribution of the latent variable conditional on y and the parameters.
• Then the posterior can be written as:

p(θ|y) = ∫ f(θ|y, y*) f(y*|y, θ) dy*
• Taking account of the latent variable simply involves an additional Gibbs step.
• The distribution of the latent utility y_ij is a truncated normal distribution.
• If c_ij = 1, y_ij is a truncated normal with mean parameter x_ij β, precision h and lower truncation point max{y_ij′ : j′ ≠ j}.
• If c_ij = 0, y_ij is a truncated normal with mean parameter x_ij β, precision h and upper truncation point max_j′ y_ij′.
• The Gibbs sampler for the multinomial probit simply adds the data augmentation step above:
• A Gibbs sampler generates a pseudo-random sequence (h^(s), β^(s), {y_ij^(s)}_{i∈I, j∈J}), s = 1, ..., S, using the following Markov chain:
1. Given (h^(s), β^(s)), draw β^(s+1) ∼ p(β|h^(s), X, y^(s), C)
2. Given β^(s+1), draw h^(s+1) ∼ p(h|β^(s+1), X, y^(s), C)
3. For each i, draw y_i1^(s+1) ∼ p(y_i1|β^(s+1), h^(s+1), X, y_i2^(s), ..., y_iJ^(s), C)
4. Draw y_i2^(s+1) ∼ p(y_i2|β^(s+1), h^(s+1), X, y_i1^(s+1), y_i3^(s), ..., y_iJ^(s), C)
5. ...
6. Draw y_iJ^(s+1) ∼ p(y_iJ|β^(s+1), h^(s+1), X, y_i1^(s+1), ..., y_i,J−1^(s+1), C)
7. Return to 1.
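A minimal sketch of the extra data augmentation step follows, for a single individual and assuming independent N(μ_j, h^{-1}) latent utilities with μ_j = x_ij β already computed; the β and h steps are as in the regression sampler above:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(7)

def draw_latent_utilities(y_lat, mu, h, choice):
    """One data-augmentation sweep for one individual: redraw each latent
    utility y_ij from N(mu_j, h^{-1}) truncated by the observed choice
    (choice is the index j with c_ij = 1)."""
    sd = 1.0 / np.sqrt(h)
    for j in range(len(mu)):
        if j == choice:
            lo, hi = np.delete(y_lat, j).max(), np.inf   # chosen utility beats all others
        else:
            lo, hi = -np.inf, y_lat[choice]              # non-chosen utility lies below it
        a, b = (lo - mu[j]) / sd, (hi - mu[j]) / sd      # truncation in standard units
        y_lat[j] = truncnorm.rvs(a, b, loc=mu[j], scale=sd, random_state=rng)
    return y_lat

# One sweep with hypothetical values: 3 alternatives, alternative 1 chosen.
y_lat = np.array([0.2, 0.9, -0.3])
print(draw_latent_utilities(y_lat, mu=np.array([0.0, 0.5, -0.2]), h=1.0, choice=1))
```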