1 Outline.
1. MSL.
2. MSM and Indirect Inference.
3. Example of MSM: Berry (1994) and BLP (1995).
4. Ackerberg’s Importance Sampler.
2 Some Examples.
• In this chapter, we will be concerned with examples of the form:

f(y|x, θ) = ∫ f(y|x, γ) g(γ|θ) dγ
• We shall discuss some methods for estimating these problems.
• Then we shall discuss, in detail, two examples.
• The first is the approach to demand estimation following Berry (1994) and BLP (1995).
• The second is discrete games using the Ackerberg importance sampler.
• That is, the density f(y|x, γ) is allowed to depend on parameters γ that vary within the population.
• The population distribution of these parameters is g(γ|θ).
• Examples of this are random effects models of unobserved heterogeneity.
• In both cases, integration needs to be performed in order to evaluate the likelihood function.
• We can evaluate the above integral using either deterministic methods for integration or stochastic methods based on drawing pseudo-random numbers.
• In practice, simulation is more common since the asymptotic theory takes account of the approximation error in evaluating the integral.
• To do Monte Carlo integration, we need to draw s = 1, ..., S pseudo-random deviates γ^(s) from the density g(γ).
• Our estimate of the integral is then:

f̂(y_i|x_i, θ) = (1/S) Σ_{s=1}^S f(y_i|x_i, γ_i^(s), θ)

• Note that we are drawing a separate set of deviates for each y_i and x_i in our simulator.
• Also note that we have assumed that θ does not enter into g(γ).
• In practice, we may have to carefully parameterize our problems to make sure that this is true.
• We shall consider an example of this shortly.
• Note that we could form a confidence interval to evaluate the accuracy of our integral.
• This simulation is unbiased and consistent as S → ∞.
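To make this concrete, here is a minimal Python sketch of Monte Carlo integration and the confidence interval just mentioned. The model (a binary probit with one normal random coefficient) and all names and values are hypothetical illustrations, not taken from the text:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def mc_density(y, x, theta, S=1000):
    """Monte Carlo estimate of f(y|x,theta) = int f(y|x,gamma) g(gamma|theta) dgamma
    for a hypothetical binary probit with random coefficient gamma ~ N(theta[0], theta[1]^2)."""
    gamma = theta[0] + theta[1] * rng.standard_normal(S)  # S draws from g(gamma|theta)
    p1 = norm.cdf(x * gamma)                              # f(y=1 | x, gamma)
    vals = p1 if y == 1 else 1.0 - p1
    fhat = vals.mean()                                    # unbiased simulator of the integral
    se = vals.std(ddof=1) / np.sqrt(S)                    # simulation standard error
    return fhat, (fhat - 1.96 * se, fhat + 1.96 * se)     # 95% CI for the integral

fhat, ci = mc_density(y=1, x=0.5, theta=(1.0, 0.5))
print(f"f-hat = {fhat:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```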
• The MSL (method of simulated likelihood) estimator is:

ln L_N(θ) = (1/N) Σ_i ln f̂(y_i|x_i, θ)

• f̂(y_i|x_i, θ) is a smooth and differentiable function of θ.
• Our estimator θ̂_MSL is defined as:

θ̂_MSL = argmax_θ ln L_N(θ)
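Below is a minimal fake-data sketch of the MSL estimator for a hypothetical random-coefficient probit. It follows the two practical points above: one fixed set of deviates per observation, reused at every trial θ, with θ entering only through the transformation γ = b + ση rather than through the density of the draws:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical fake data: y_i = 1{x_i * beta_i + e_i > 0}, beta_i ~ N(1.0, 0.5^2).
N, S = 500, 200
x = rng.standard_normal(N)
y = (x * (1.0 + 0.5 * rng.standard_normal(N)) + rng.standard_normal(N) > 0)

# One fixed set of deviates per observation, reused at every theta.
eta = rng.standard_normal((N, S))

def neg_log_likelihood(theta):
    b, log_s = theta
    gamma = b + np.exp(log_s) * eta                        # gamma_i^(s): random coefficients
    p1 = norm.cdf(x[:, None] * gamma)                      # f(y=1 | x, gamma)
    fhat = np.where(y[:, None], p1, 1 - p1).mean(axis=1)   # f-hat(y_i | x_i, theta)
    return -np.mean(np.log(fhat))                          # minus ln L_N(theta)

res = minimize(neg_log_likelihood, x0=[0.5, -1.0], method="BFGS")
print("b-hat =", res.x[0], " sigma-hat =", np.exp(res.x[1]))
```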
• Prop 21.1 from Gourieroux and Monfort (stated in the text) demonstrates that if
• (i) the likelihood satisfies the regularity conditions for asymptotic normality with limit variance Λ^{-1}(θ_0), where

Λ(θ_0) = −plim [ (1/N) Σ_{i=1}^N ∂² ln f(y|x, θ) / ∂θ∂θ′ ]

• (ii) the density is simulated with an unbiased simulator,
• (iii) S, N → ∞ with N^{1/2}/S → 0, then

N^{1/2}(θ̂_MSL − θ_0) →_d N(0, Λ^{-1}(θ_0))
• There are two approaches one can take to calculating the variance.
• A first is to bootstrap your standard errors.
• A second is based on differentiating ln f̂(y_i|x_i, θ) = ln[(1/S) Σ_{s=1}^S f(y_i|x_i, γ_i^(s), θ)].
• We shall talk about the mechanics of implementing this estimator in a demand estimation example from Berry (1994), which is closely related to BLP (1995).
3 MSM and Indirect Inference
• An MSM estimator starts by specifying a moment equation that depends on the distribution of some random variable:

m(y_i, x_i, θ) = ∫ h(y_i, x_i, γ, θ) g(γ) dγ

• We proceed analogously to the MSL estimator and use Monte Carlo to simulate this integral.
• We need to draw s = 1, ..., S pseudo-random deviates γ^(s) from the density g(γ):

m̂(y_i, x_i, θ) = (1/S) Σ_{s=1}^S h(y_i, x_i, γ_i^(s), θ)

• As in the case of MSL, we assume that the parameters θ do not enter into g(γ).
• This may require a careful parameterization of our model.
• If we could perfectly evaluate our integral, in the just identified case, our GMM estimator would be:

Q_N(θ) = [ Σ_{i=1}^N w_i m(y_i, x_i, θ) ]′ [ Σ_{i=1}^N w_i m(y_i, x_i, θ) ]

• where w_i corresponds to our weights.
• In MSM, we plug in the sample analogue:

Q_N(θ) = [ Σ_{i=1}^N w_i m̂(y_i, x_i, θ) ]′ [ Σ_{i=1}^N w_i m̂(y_i, x_i, θ) ]
• Under the regularity conditions stated in the text, MSM is consistent and asymptotically normal for a fixed S.
• Unlike MSL, we do not need to let S → ∞ in order to estimate the model.
• Given that we have asymptotic normality, this justifies the use of the bootstrap to compute our standard errors.
• A final method discussed in the text is indirect inference.
• This method is useful for models that are easy to simulate, but where it is hard to form MSL or MSM estimators.
• Suppose that our model specifies that y_i = f(x_i, η, θ), where x_i is a random variable, η is a stochastic shock and θ is a vector of parameters.
• Suppose that g(η) is the density for our shock.
• In indirect inference, we start with an auxiliary model.
• For example, we might specify ad hoc regressions of the dependent variable y_i on the exogenous variables x_i.
• We run this regression on the true data and come up with regression coefficients β̂.
• Given a vector of parameters θ, we could simulate our model to generate a sequence of pseudo-random (ỹ_i, x̃_i), i = 1, ..., N.
• We could then run our auxiliary model on the pseudo-random (ỹ_i, x̃_i) to come up with a β̂(θ).
• The indirect inference estimator is:

θ̂ = argmin_θ (β̂ − β̂(θ))′ Ω^{-1} (β̂ − β̂(θ))

• where Ω^{-1} is a weight matrix.
• Intuitively, indirect inference attempts to match the parameters of the auxiliary model on the real and simulated data.
• Essentially, it is an extremely convenient way to form moments for a simulation based model.
• Another attractive feature is that often the weight matrix Ω^{-1} can be formed using the data (y_i, x_i) and does not require simulating the model.
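As an illustration, here is a minimal sketch of indirect inference for a hypothetical nonlinear model y = exp(θx) + η, with an ad hoc quadratic regression as the auxiliary model and the identity matrix as Ω; the simulation shocks are drawn once and held fixed so the objective is smooth in θ:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Hypothetical "true" model: y = exp(theta * x) + eta, eta ~ N(0, 1).
N, theta_true = 1000, 0.5
x = rng.uniform(0.0, 2.0, N)
y = np.exp(theta_true * x) + rng.standard_normal(N)

def auxiliary_beta(y_, x_):
    """Ad hoc auxiliary model: OLS of y on (1, x, x^2)."""
    X = np.column_stack([np.ones_like(x_), x_, x_**2])
    return np.linalg.lstsq(X, y_, rcond=None)[0]

beta_hat = auxiliary_beta(y, x)          # auxiliary coefficients on the real data
eta_sim = rng.standard_normal(N)         # fixed simulation shocks

def ii_objective(theta):
    y_sim = np.exp(theta[0] * x) + eta_sim         # simulate the model at theta
    diff = beta_hat - auxiliary_beta(y_sim, x)     # mismatch in auxiliary coefficients
    return diff @ diff                             # Omega = identity

res = minimize(ii_objective, x0=[0.1], method="Nelder-Mead")
print("indirect inference estimate:", res.x[0])
```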
4 Example of MSM
• In Berry (1994) and BLP (1995), consumer preferences can be written as:

u(x_j, ξ_j, p_j, v_i; θ_d)

where:
• x_j = (x_{j,1}, ..., x_{j,K}) is a vector of K characteristics of product j that are observed by both the economist and the consumer.
• ξ_j is a characteristic of product j observed by the consumer but not by the economist.
• p_j is the price of good j.
• v_i is a vector of taste parameters for consumer i.
• θ_d is a vector of demand parameters.
• One commonly used specification is the logit model with random (normal) coefficients:
u_ij = x_j β_i − α p_j + ξ_j + ε_ij

• The K random coefficients are:

β_i,k = β_k + σ_k η_i,k,   η_i,k ∼ N(0, 1), iid
• Consumer i will purchase good j if and only if it is utility maximizing, just as in the previous lecture.
• Question: How do we interpret the parameters of this model?
• It is useful to decompose utility into two parts: the first is a “mean” level of utility and the second is a heteroskedastic error term that captures the effect of random taste parameters:

υ_ij = [ Σ_k x_{j,k} σ_k η_i,k ] + ε_ij

δ_j = x_j β − α p_j + ξ_j

• We can now write the utility of person i for product j as:

u_ij = δ_j + υ_ij
• Next, we will write the market shares for aggregate demand in a particularly convenient fashion. First define the set of “error terms” that make product j utility maximizing, given the J dimensional vector δ = (δ_j):

A_j(δ) = { υ_i = (υ_ij) : δ_j + υ_ij ≥ δ_j′ + υ_ij′ for all j′ ≠ j }

• The market share of product j can then be written as (assuming a law of large numbers):

s_j(δ(x, p, ξ), x, θ) = ∫_{A_j(δ)} f(υ) dυ
• In this case, the parameter θ is β, α and σ.
• Given θ and the demand for product j actually observed in the data, s̃_j, it must be the case that:

s̃_j = s_j(δ(x, p, ξ), x, θ)

• Given θ, this can be expressed as a system of J equations in J unknowns (the ξ_j).
• To estimate, we find a set of instruments for the ξ_j.
• We must find a set of instruments correlated with the endogenous variable p_j, but uncorrelated with the residual ξ_j.
Commonly used instruments:
1. The product characteristics.
2. Prices of products in other markets (interpret ξ_j as a demand shifter).
3. Measures of isolation in product space (Σ_{j′≠j} x_{j′,k}).
4. Cost shifters.
4.1 Computation.
• In this section, I shall outline some of the key steps needed to actually compute Berry (1994).
• A key step in many programming projects is to do a fake data experiment/Monte Carlo study.
• Simulate the model using fixed parameter values.
• Pretend you don’t know the parameter values and estimate.
• This tests the code and sometimes shows you limitations of the models.
• One of the best ways to really learn the econometrics in a paper is to do a fake data experiment.
• We shall consider as an example the random coefficient logit model.
There are basically four things we need to do to compute the value of the objective function in order to do GMM.
1. For a given value of σ and δ, compute the vector of market shares.
2. For a given value of σ, find the vector δ that equates the observed market shares and those predicted by the model, using the contraction mapping.
3. Given δ and β, α, compute the value of ξ.
4. Search for the value of θ = (β, α, σ) that minimizes the objective function.
• We shall consider these one at a time.
4.2 Computing Market Shares.
• In the random coefficient logit model, we can compute the market shares, given δ, as follows:

s_j(δ, σ) = ∫ [ exp(δ_j + Σ_k x_{j,k} η_i,k σ_k) / (1 + Σ_{j′} exp(δ_j′ + Σ_k x_{j′,k} η_i,k σ_k)) ] dF(η_i)
• In practice, the integral above is computed using simulation.
• Make a set of S simulation draws for each j and keep them fixed for the whole problem.
• Denote the draws as η_i^(s), s = 1, ..., S. Our simulated shares are:

ŝ_j(δ, σ) = (1/S) Σ_{s=1}^S [ exp(δ_j + Σ_k x_{j,k} η_i,k^(s) σ_k) / (1 + Σ_{j′} exp(δ_j′ + Σ_k x_{j′,k} η_i,k^(s) σ_k)) ]
• Sometimes importance sampling is useful in order to improve the speed/accuracy of the integration.
• Importance sampling is discussed in the text.
• We can compute confidence intervals using standard methods to see whether the simulated market shares are well estimated.
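A minimal sketch of the frequency simulator ŝ_j(δ, σ) follows; the dimensions, draws, and parameter values are hypothetical, and the outside good’s utility is normalized to zero:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulated_shares(delta, x, sigma, eta):
    """Frequency simulator of the random-coefficient logit shares s-hat_j(delta, sigma).
    delta: (J,) mean utilities; x: (J, K) characteristics;
    sigma: (K,) taste dispersions; eta: (S, K) fixed simulation draws."""
    v = delta[None, :] + (eta * sigma) @ x.T             # (S, J) utilities per simulated consumer
    ev = np.exp(v)
    probs = ev / (1.0 + ev.sum(axis=1, keepdims=True))   # outside good utility normalized to 0
    return probs.mean(axis=0)                            # average over the S draws

J, K, S = 5, 2, 2000
x = rng.standard_normal((J, K))
eta = rng.standard_normal((S, K))                        # draw once, keep fixed
shares = simulated_shares(np.zeros(J), x, np.array([0.5, 0.3]), eta)
print(shares, shares.sum())
```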
4.3 The contraction mapping.
• Next, we wish to find the δ that matches the observed market shares, given σ.
• Berry and BLP demonstrate that the following iteration is a contraction:

δ_j^(n+1) = δ_j^(n) + ln(s̃_j) − ln(ŝ_j(δ^(n), σ))

• Point: Market shares can be inverted very quickly in a fairly simple manner!
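A sketch of the inversion follows; share_fn stands in for any share simulator, e.g. simulated_shares from the previous sketch, and the plain-logit inversion is only a convenient starting value:

```python
import numpy as np

def invert_shares(s_obs, share_fn, tol=1e-12, max_iter=1000):
    """Berry's contraction: iterate delta <- delta + ln(s_obs) - ln(share_fn(delta))
    until the model shares match the observed shares s_obs."""
    delta = np.log(s_obs) - np.log(1.0 - s_obs.sum())   # plain-logit inversion as start value
    for _ in range(max_iter):
        delta_new = delta + np.log(s_obs) - np.log(share_fn(delta))
        if np.max(np.abs(delta_new - delta)) < tol:
            return delta_new
        delta = delta_new
    raise RuntimeError("contraction did not converge")
```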
5 Computing the value of ξ
• The next step is simple. Just let:

ξ_j = δ_j − (x_j β − α p_j)

where δ_j is computed using the contraction mapping.
5.1 Computing the value of the objective function.
• Let Z be the set of instruments.
• The objective function is formulated as in all MSM problems, assuming E(ξ|Z) = 0.
• The econometrician then chooses β, α, and σ in order to minimize the MSM objective function; a code sketch appears below.
• Standard mathematical programs (MATLAB, GAUSS, IMSL, NAG) contain software for optimization problems.
• One standard way to proceed is to do a rough global search first and then use a derivative based method second, once you have a very rough sense of the overall shape of the objective function.
• Multiple starting points are commonly used in order to search for multiple local solutions to the minimization problem.
• See Judd for an overview of numerical minimization.
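Putting the four steps together, here is a sketch of the MSM objective; it reuses simulated_shares and invert_shares from the sketches above, and the packing of θ and the weight matrix W are illustrative choices:

```python
import numpy as np

def msm_objective(theta, s_obs, x, p, Z, eta, W):
    """MSM objective for the random-coefficient logit, assuming E(xi|Z) = 0.
    theta packs (beta (K,), alpha, log sigma (K,)); reuses simulated_shares
    and invert_shares from the sketches above."""
    K = x.shape[1]
    beta, alpha, sigma = theta[:K], theta[K], np.exp(theta[K + 1:])
    delta = invert_shares(s_obs, lambda d: simulated_shares(d, x, sigma, eta))
    xi = delta - (x @ beta - alpha * p)       # structural residual
    g = Z.T @ xi / len(xi)                    # sample moment conditions
    return g @ W @ g                          # quadratic form in the moments

# e.g. scipy.optimize.minimize(msm_objective, theta_start,
#      args=(s_obs, x, p, Z, eta, W), method="Nelder-Mead")
```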
6 Ackerberg’s Importance Sampler.
• Finally, we consider a method that is useful when the model is difficult to compute but it is possible to “reparameterize” the model.
• With the reparameterization, the algorithm is parallel and can be computed more efficiently.
• Ackerberg (2006) describes this method in detail.
• As an example, we take the problem of discrete/normal form games as studied in Bajari, Hong and Ryan (2006).
• Consider a static entry game (see Bresnahan and Reiss (1990, 1991), Berry (1992), Tamer (2002), Ciliberto and Tamer (2003), and Manuszak and Cohen (2004)).
• The economist observes a cross section of markets.
• The players in the game are a finite set of potential entrants.
• In each market, the potential entrants simultaneously choose whether to enter.
• Let a_i = 1 denote entry and a_i = 0 denote nonentry.
• In applications, the function f_i takes a form such as:

f_i = θ_1 · x + δ Σ_{j≠i} a_j   if a_i = 1
f_i = 0                         if a_i = 0        (1)

• The covariates x are variables which influence the profitability of entering a market.
• These might include the number of consumers in the market, average income and market specific cost indicators.
• The term δ measures the influence of j’s choice on i’s entry decision.
• The ε_i(a) capture shocks to the profitability of entry that are commonly observed by all firms in the market.
• This is a simultaneous system of logit models!
• In the paper, we also discuss network effects and peer effects as other examples.
7 The Model.
• Simultaneous move game of complete information (normal form game).
• There are i = 1, ..., N players with a finite set of actions A_i.
• A = ∏_i A_i.
• Utility u_i : A → R, where R is the real line.
• Let π_i denote a mixed strategy.
• A Nash equilibrium is a profile of mutual best responses.
• Following Bresnahan and Reiss (1990, 1991), econometrically a game is a discrete choice model.
• Except actions of others are right-hand-side variables.
u_i(a) = f_i(x, a; θ_1) + ε_i(a). (2)

• Mean utility: f_i(x, a; θ_1), which depends on a, the vector of actions, covariates x, and parameters θ_1.
• ε_i(a) preference shocks.
• ε_i(a) ∼ g(ε|θ_2), iid.
• Standard random utility model, except utility depends on the actions of others.
• E(u) is the set of Nash equilibria given a vector of utilities u.
• λ(π; E(u), β) is the probability of equilibrium π ∈ E(u) given parameters β.
• λ(π; E(u), β) corresponds to a finite vector of probabilities.
• In an application, we might let λ depend on whether the equilibrium:
1. Satisfies a particular refinement concept (e.g. trembling hand perfection).
2. Is in pure strategies.
3. Maximizes joint payoffs (efficiency).
4. Maximizes profit of incumbent firms (as in airline examples).
• In practice, we could create dummy variables for whether a given equilibrium π ∈ E(u) satisfies 1-4 above.
• Let x(π, u) be this vector of dummies.
• A straightforward way to model λ is:

λ(π; E(u), β) = exp(β · x(π, u)) / Σ_{π′∈E(u)} exp(β · x(π′, u))   (3)
• Computing the set E(u), all of the equilibria of a normal form game, is a well understood problem.
• McKelvey and McLennan (1996) survey the available algorithms in detail.
• Software package Gambit.
• Also not hard to program directly.
8 Estimation.
• P(a|x, θ, β) is the probability of a given x, θ and β:

P(a|x, θ, β) = ∫ { Σ_{π∈E(u(x,θ_1,ε))} λ(π; E(u(x, θ_1, ε)), β) ∏_{i=1}^N π_i(a_i) } g(ε|θ_2) dε
• Computation of the above integral is facilitated by the importance sampling procedure of Ackerberg.
• Make a change of variables to integrate over latent utility u_i instead of over ε_i.
• With this change, we won’t need to recompute the equilibria of the game during estimation.
• Estimation not feasible without this insight.
• Often g(ε|θ_2) is a simple parametric distribution (e.g. normal, extreme value, etc.).
• For instance, suppose it is normal and let φ(·|μ, σ) denote the normal density.
• Then, the density h(u|θ, x) for the vNM utilities u is:

h(u|θ, x) = ∏_i ∏_{a∈A} φ(u_i(a); f_i(x, a; θ_1) + μ, σ)

where for all i and all a, ε_i(a) = u_i(a) − f_i(x, a; θ_1).
• Evaluating h(u|θ, x) is cheap.
• Draw s = 1, ..., S vectors of vNM utilities, u^(s) = (u_1^(s), ..., u_N^(s)), from an importance density q(u).
• We can then simulate P(a|x, θ, β) as follows:

P̂(a|x, θ, β) = (1/S) Σ_{s=1}^S { Σ_{π∈E(u^(s))} λ(π; E(u^(s)), β) ∏_{i=1}^N π_i(a_i) } · [ h(u^(s)|θ, x) / q(u^(s)) ]
• Precompute E(u^(s)) for a large number of randomly drawn games s = 1, ..., S.
• Evaluating P̂(a|x, θ, β) at new parameters DOES NOT REQUIRE RECOMPUTING E(u^(s)), s = 1, ..., S!
• Evaluating the simulation estimator P̂(a|x, θ, β) of P(a|x, θ, β) only requires “reweighting” of the equilibria by the new λ and h(u^(s)|θ, x)/q(u^(s)).
• This is a cheap computation.
• Normally, the computational expense of structural estimation comes from recomputing the equilibrium many times.
• Also, note that this approach is naturally parallel.
• This saves on the computational time by orders of magnitude.
• Given P̂(a|x, θ, β), we can simulate the likelihood function or simulate the moments.
• The asymptotics are standard.
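As a stylized illustration of the reweighting idea, consider a two-player entry game restricted to pure strategies, with λ set to the uniform distribution over the equilibrium set for simplicity. The latent payoffs are drawn once from an importance density q and their equilibrium sets are computed once; each new θ only changes the weights h/q. All densities and parameter values here are hypothetical simplifications of the general setup:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
S = 5000

# Latent entry payoffs: u[s, i, k] = player i's payoff from entering when the
# rival plays k (0 = out, 1 = in). Drawn ONCE from the importance density
# q = N(0, 2^2) per component; never redrawn during estimation.
q_scale = 2.0
u = q_scale * rng.standard_normal((S, 2, 2))
q_dens = norm.pdf(u, scale=q_scale).reshape(S, -1).prod(axis=1)

def pure_equilibria(u_s):
    """Enumerate pure-strategy Nash equilibria of the 2x2 entry game."""
    eq = []
    for a1 in (0, 1):
        for a2 in (0, 1):
            br1 = (u_s[0, a2] > 0) == bool(a1)   # player 1 best responds to a2
            br2 = (u_s[1, a1] > 0) == bool(a2)   # player 2 best responds to a1
            if br1 and br2:
                eq.append((a1, a2))
    return eq

# The expensive step, done once: equilibrium sets for every drawn game.
E = [pure_equilibria(u[s]) for s in range(S)]

def P_hat(a, theta, x):
    """Simulated P(a|x, theta): reweight the fixed draws by h/q; lambda is
    uniform over each equilibrium set. Model density h: u[i, k] ~
    N(theta1*x + delta*k, 1), with theta = (theta1, delta)."""
    theta1, delta = theta
    mean = theta1 * x + delta * np.array([0.0, 1.0])        # mean payoff by rival action
    h_dens = norm.pdf(u - mean).reshape(S, -1).prod(axis=1)
    w = h_dens / q_dens                                     # importance weights
    sel = np.array([((a in E[s]) / len(E[s])) if E[s] else 0.0 for s in range(S)])
    return np.mean(w * sel)

print(P_hat(a=(1, 0), theta=(0.5, -1.0), x=1.0))
```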
9 Overview of Bayes.
• In Bayesian econometrics, the econometrician acts as a rational decision maker, just like the agents in economic theory.
• The econometrician starts off with a prior distribution p(θ) about the model parameters.
• The econometrician observes some data y = [y_1, ..., y_n].
• The econometrician has a model, f(y|θ), which is the probability of observing y conditional on the parameters θ.
• The econometrician’s posterior probability, by Bayes’ Theorem, is:

p(θ|y) = p(θ) f(y|θ) / ∫ p(θ) f(y|θ) dθ

• In the last decade, there has been an explosion in the applications of Bayesian approaches in statistics and econometrics.
• In Markov chain Monte Carlo, the econometrician simulates the posterior distribution p(θ|y).
• This involves simulating a Markov chain where the invariant (or long run) distribution is exactly equal to the posterior.
• The output of this simulation is a sequence of pseudo-random numbers θ^(1), ..., θ^(S).
• Let f(θ^(1), ..., θ^(S)) denote the density that puts weight 1/S on each simulation draw.
• Then under suitable regularity conditions:

f(θ^(1), ..., θ^(S)) →_d p(θ|y)

• This posterior distribution expresses the econometrician’s beliefs about the parameters after seeing the data.
• We could use the posterior to:
1. Construct a 95 percent credible set (i.e. a set that has 95 percent posterior probability). This is analogous to a confidence interval.
2. Simulate the distribution of functions of the parameters, g(θ), as (1/S) Σ_s g(θ^(s)).
3. Construct a predictive distribution for forecasting, i.e. simulate the model for each pseudo-random draw θ^(1), ..., θ^(S).
• There are a few advantages to the Bayesian approach.
• The first is that it is very elegant numerically.
• There are some problems in latent variable, time series and other models that can only be solved through the use of Bayes.
• The second is that Bayes is exact in finite samples (up to our ability to approximate the posterior).
• Asymptotic theory depends on first order Taylor series expansions.
• Essentially, you are hoping that the first and second derivatives capture the behavior of the function.
• This can be a very poor approximation in finite samples in some cases.
• In Bayes, such linearizations are not required.
• However, you do need to be able to simulate the posterior accurately.
• Third, Bayes fits into decision theory and can help to guide rational decision making.
• For example, Rossi et al. (1997) study the problem of target marketing.
• You see each household in a scanner panel data set choose a handful of times.
• You can use Bayes’ theorem to update on the random coefficients of each household i.
• e.g. if there are 10,000 households, you have a posterior for the preference parameters for each of the 10,000 households conditional on its purchase history.
• Using this posterior, you can form your posterior beliefs about the profits from sending a coupon to an individual household.
• Fourth, in Bayes you can form posteriors over models.
• Suppose that you have m = 1, ..., M probability models, f_m(y|θ).
• Then f(y|θ) = Σ_{m=1}^M p_m f_m(y|θ), where p_m is the probability of model m.
• You can use Bayes theorem to express your posterior probability p_m about model m.
• This is a very elegant way to handle non-nested models and might be superior to classical approaches to non-nested testing.
• Finally, in Bayesian econometrics, you can work with models that are not identified or that do not exhibit normal asymptotics.
• A flat likelihood does not affect your ability to construct a posterior.
• We shall illustrate Bayesian methods by considering simulation of the multinomial probit.
10 Markov Chains.
• Two common ways to conduct MCMC are Gibbs sampling and Metropolis.
• A normal random walk Metropolis works as follows.
• First, the econometrician comes up with a rough guess θ_0 at the MLE.
• Second, come up with a rough guess I_0 at the information matrix using the Hessian of the MLE.
• A sequence of pseudorandom values θ^(1), ..., θ^(S) is drawn as follows. Given θ^(s), we draw θ^(s+1) as follows:
1. First, draw a candidate value θ̃ ∼ N(θ^(s), I_0).
2. Second, compute α = min{ p(θ̃)f(y|θ̃) / [p(θ^(s))f(y|θ^(s))], 1 }.
3. Set θ^(s+1) = θ̃ with probability α and θ^(s+1) = θ^(s) with probability 1 − α.
• Implementing this algorithm simply requires the econometrician to evaluate the likelihood repeatedly and draw normal deviates.
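A minimal sketch of the normal random walk Metropolis algorithm follows; the target posterior is a hypothetical stand-in, and working with log densities avoids numerical overflow in the acceptance ratio:

```python
import numpy as np

rng = np.random.default_rng(5)

def rw_metropolis(log_post, theta0, prop_cov, S=10000):
    """Normal random-walk Metropolis. log_post(theta) = log p(theta) + log f(y|theta)."""
    d = len(theta0)
    chol = np.linalg.cholesky(prop_cov)   # e.g. a rough inverse-Hessian guess I_0
    draws = np.empty((S, d))
    theta, lp = np.asarray(theta0, float), log_post(theta0)
    for s in range(S):
        cand = theta + chol @ rng.standard_normal(d)   # step 1: candidate draw
        lp_cand = log_post(cand)
        if np.log(rng.uniform()) < lp_cand - lp:       # steps 2-3: accept with prob. alpha
            theta, lp = cand, lp_cand
        draws[s] = theta
    return draws

# Hypothetical target: N(1, 0.5^2) posterior for a scalar theta.
draws = rw_metropolis(lambda t: -0.5 * ((t[0] - 1.0) / 0.5) ** 2,
                      theta0=[0.0], prop_cov=np.eye(1))
print(draws[1000:].mean(), draws[1000:].std())
```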
• A second algorithm for constructing a Markov chain is Gibbs sampling.
• Partition the parameters into blocks θ_1, ..., θ_d.
• Let p_k(θ_k|θ_1, ..., θ_{k−1}, θ_{k+1}, ..., θ_d) denote the conditional distribution of the kth block of parameters given the others.
• In some applications, this distribution can be convenient to form even if the entire likelihood is quite complicated!
• Starting with an initial value θ^(0), Gibbs sampling works as follows. Given θ^(s):
1. Draw θ_1^(s+1) ∼ p_1(θ_1|θ_2^(s), θ_3^(s), ..., θ_d^(s))
2. Draw θ_2^(s+1) ∼ p_2(θ_2|θ_1^(s+1), θ_3^(s), θ_4^(s), ..., θ_d^(s))
3. Draw θ_3^(s+1) ∼ p_3(θ_3|θ_1^(s+1), θ_2^(s+1), θ_4^(s), ..., θ_d^(s))
...
d. Draw θ_d^(s+1) ∼ p_d(θ_d|θ_1^(s+1), θ_2^(s+1), ..., θ_{d−1}^(s+1))
d+1. Return to 1.
11 Bayesian Analysis of Regression
• MCMC/Gibbs sampling are particularly powerful in problems with latent variables.
• In discrete choice, utility is latent to the econometrician.
• In a multinomial probit, if utility were observed by the econometrician, estimating parameters would boil down to linear regression.
• For our analysis, it will be useful to consider the Bayesian analysis of linear regression.
• A key step in Rossi et al.’s paper essentially involves the Bayesian analysis of a normal linear regression.
• The analysis here follows Geweke’s textbook, Contemporary Bayesian Econometrics (an excellent intro to the subject).
• y is a T × 1 vector of dependent variables.
• X is a T × k matrix of covariates (nonstochastic).
• Assume that the error terms are normal and homoskedastic:

y|β, h, X ∼ N(Xβ, h^{-1} I_T)

p(y|β, h, X) = (2π)^{-T/2} h^{T/2} exp(−h(y − Xβ)′(y − Xβ)/2)

• h is called the precision parameter (the inverse of the variance).
• I_T is the T × T identity matrix, so the covariance matrix is scalar.
• This is our likelihood function.
• Recall that Bayes theorem implies that the posterior distribution of the model parameters is proportional to the prior times the likelihood.
• We need to specify a prior distribution for our model parameters.
• p(β) = N(β_0, H^{-1}):

p(β) = (2π)^{-k/2} |H|^{1/2} exp(−(β − β_0)′H(β − β_0)/2)

• The prior on β is normal with prior mean β_0 and prior precision H.
• The prior distribution on h is s²h ∼ χ²(v):

p(h) = [2^{v/2} Γ(v/2)]^{-1} (s²)^{v/2} h^{(v−2)/2} exp(−s²h/2)
• Remark: this is essentially a gamma distribution, rewritten in a manner that will be convenient for reasons below.
• This form for the prior is chosen because of conjugacy, i.e. the posterior distribution can be written in an analytically convenient manner.
• Now recall that the posterior is proportional to the prior times the likelihood.
• That is, p(β, h|X, y) ∝ p(β)p(h)p(y|β, h,X)
• Combining the equations above yields p(β, h|X, y) ∝

(2π)^{-T/2} [2^{v/2} Γ(v/2)]^{-1} |H|^{1/2} (s²)^{v/2} h^{(T+v−2)/2} exp(−s²h/2) × exp(−(β − β_0)′H(β − β_0)/2 − h(y − Xβ)′(y − Xβ)/2)
• Now recall from Cameron and Trivedi that the idea behind Gibbs sampling is to “block” the parameters into a set of convenient conditional distributions.
• In this case, we will want to block p(β|h, X, y) and p(h|β, X, y).
• Let’s first derive p(β|h, X, y).
• It is obvious from the above expression that β is going to be normally distributed.
• We will want to complete the square inside the exp(·) to rewrite the expression in the form (β − β̃)′H̃(β − β̃).
• Then β̃ will be the posterior mean and H̃ the posterior precision.
• Distributing terms and completing the square in β yields:

(β − β_0)′H(β − β_0) + h(y − Xβ)′(y − Xβ)
= (β − β_0)′H(β − β_0) + h(ŷ − Xβ)′(ŷ − Xβ) + h(y − ŷ)′(y − ŷ), where ŷ = Xb are the fitted OLS values
= (β − β̃)′H̃(β − β̃) + Q

H̃ = H + hX′X
β̃ = H̃^{-1}(Hβ_0 + hX′Xb)

where b is the OLS estimate of β and Q is a constant that does not depend on β.
• Note that the posterior precision H̃ is the sum of the prior precision H and the data precision hX′X.
• The posterior mean β̃ is a weighted average of the prior mean and the OLS estimate.
• The weights depend on the posterior precision, the prior precision and the OLS estimate of the precision.
• As the sample size becomes sufficiently large, the data will “swamp” the prior.
• The number of terms in the likelihood is a function of T and grows with the sample size.
• The number of terms in the prior remains fixed.
• Next, we have to derive the posterior in h.
• If you look at the prior times the likelihood, it is obvious that the posterior p(h|β, X, y) will be of the form h^α exp(−hω).
• If we can derive α and ω, we can express the posterior.
• By some straightforward, albeit tedious, algebra we can write the posterior distribution p(h|β, X, y) as:

s̃²h ∼ χ²(ṽ), where s̃² = s² + (y − Xβ)′(y − Xβ) and ṽ = v + T
• A Gibbs sampler generates a pseudo-random sequence (h^(s), β^(s)), s = 1, ..., S, using the following Markov chain:
1. Given (h^(s), β^(s)), draw β^(s+1) ∼ p(β|h^(s), X, y)
2. Given β^(s+1), draw h^(s+1) ∼ p(h|β^(s+1), X, y)
3. Return to 1.
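Here is a minimal fake-data sketch of this two-block Gibbs sampler, using the conjugate posterior formulas derived above; the prior hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

# Fake data: y = X beta + N(0, h^{-1} I_T) with beta = (1, 2), h = 1.
T, k = 200, 2
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(T)

# Illustrative prior: beta ~ N(beta_0, H^{-1}), s^2 h ~ chi2(v).
beta_0, H = np.zeros(k), 0.01 * np.eye(k)
s2, v = 1.0, 3.0

S, h = 5000, 1.0
betas, hs = np.empty((S, k)), np.empty(S)
XtX, Xty = X.T @ X, X.T @ y
for s in range(S):
    # Block 1: beta | h is N(beta_tilde, H_tilde^{-1}) with
    # H_tilde = H + h X'X and beta_tilde = H_tilde^{-1}(H beta_0 + h X'y).
    cov = np.linalg.inv(H + h * XtX)
    beta = rng.multivariate_normal(cov @ (H @ beta_0 + h * Xty), cov)
    # Block 2: h | beta satisfies s2_tilde * h ~ chi2(v + T),
    # with s2_tilde = s2 + (y - X beta)'(y - X beta).
    resid = y - X @ beta
    h = rng.chisquare(v + T) / (s2 + resid @ resid)
    betas[s], hs[s] = beta, h

print("posterior mean of beta:", betas[1000:].mean(axis=0))
```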
• This example illustrates the importance of conjugacy.
• That is, choosing our prior and likelihoods to be the “right” functional form greatly simplifies the analysis of the posterior distribution.
• Standard texts in Bayesian statistics (e.g. Berger, Bernardo and Smith) have appendices that lay out conjugate distributions.
12 Bayesian Multinomial Probit
• The multinomial probit is closely related to the analysis of the normal linear model.
• The multinomial probit is defined as:

y_ij = x_ij β + ε_ij   (4)

var(ε_i1, ..., ε_iJ) = h^{-1} I_J

c_ij = 1 if y_ij > y_ij′ for all j′ ≠ j

• In the above, y_ij is the utility of person i for alternative j.
• εij is the stochastic preference shock
• xij are covariates that enter into i’s utility
• cij = 1 if i chooses j
• If the y_ij were known, then we could use the Gibbs sampler above to estimate β and h.
• However, the y_ij are latent variables and therefore we do data augmentation.
• The idea behind data augmentation is simple: we integrate out the distribution of the variables that we do not see.
• Following the notation in Cameron and Trivedi, let f(θ|y, y*) denote the posterior conditional on the observed variables y and the latent variables y*.
• Let f(y*|y, θ) denote the distribution of the latent variable conditional on y and the parameters.
• Then the posterior can be written as:

p(θ|y) = ∫ f(θ|y, y*) f(y*|y, θ) dy*
• Taking account of the latent variable simply involves an additional Gibbs step.
• The distribution of the latent utility y_ij is a truncated normal distribution.
• If c_ij = 1, y_ij is a truncated normal with mean parameter x_ij β, precision h and lower truncation point max{y_ij′ : j′ ≠ j}.
• If c_ij = 0, y_ij is a truncated normal with mean parameter x_ij β, precision h and upper truncation point max_j′ y_ij′.
• The Gibbs sampler for the multinomial probit simply adds the data augmentation step above:
• A Gibbs sampler generates a pseudo-random sequence (h^(s), β^(s), {y_ij^(s)}_{i∈I, j∈J}), s = 1, ..., S, using the following Markov chain:
1. Given (h^(s), β^(s)), draw β^(s+1) ∼ p(β|h^(s), X, y^(s), C)
2. Given β^(s+1), draw h^(s+1) ∼ p(h|β^(s+1), X, y^(s), C)
3. For each i, draw y_i1^(s+1) ∼ p(y_i1|β^(s+1), h^(s+1), X, y_i2^(s), ..., y_iJ^(s), C)
4. Draw y_i2^(s+1) ∼ p(y_i2|β^(s+1), h^(s+1), X, y_i1^(s+1), y_i3^(s), ..., y_iJ^(s), C)
5. ...
6. Draw y_iJ^(s+1) ∼ p(y_iJ|β^(s+1), h^(s+1), X, y_i1^(s+1), ..., y_i,J−1^(s+1), C)
7. Return to 1.
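A minimal sketch of the extra data augmentation step follows, for a single individual and assuming independent N(μ_j, h^{-1}) latent utilities with μ_j = x_ij β already computed; the β and h steps are as in the regression sampler above:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(7)

def draw_latent_utilities(y_lat, mu, h, choice):
    """One data-augmentation sweep for one individual: redraw each latent
    utility y_ij from N(mu_j, h^{-1}) truncated by the observed choice
    (choice is the index j with c_ij = 1)."""
    sd = 1.0 / np.sqrt(h)
    for j in range(len(mu)):
        if j == choice:
            lo, hi = np.delete(y_lat, j).max(), np.inf   # chosen utility beats all others
        else:
            lo, hi = -np.inf, y_lat[choice]              # non-chosen utility lies below it
        a, b = (lo - mu[j]) / sd, (hi - mu[j]) / sd      # truncation in standard units
        y_lat[j] = truncnorm.rvs(a, b, loc=mu[j], scale=sd, random_state=rng)
    return y_lat

# One sweep with hypothetical values: 3 alternatives, alternative 1 chosen.
y_lat = np.array([0.2, 0.9, -0.3])
print(draw_latent_utilities(y_lat, mu=np.array([0.0, 0.5, -0.2]), h=1.0, choice=1))
```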