5 Discrete choice models

Choice models attempt to analyze decision makers' preferences amongst alternatives. We'll primarily address the binary case to simplify the illustrations, though in principle any number of discrete choices can be analyzed. A key is that choices are mutually exclusive and exhaustive. This framing exercise impacts the interpretation of the data.

    5.1 Latent utility index models

Maximization of an expected utility representation implies that two choices $a$ and $b$ involve comparison of expected utilities such that $U_a > U_b$ (or the reverse, or the decision maker is indifferent). However, the analyst typically cannot observe all attributes that affect preferences. The functional representation of observable attributes affecting preferences, $Z_i$, is often called representative utility. Typically $U_i \neq Z_i$, and $Z_i$ is linear in the parameters, $X_i\beta$,^1

$$U_a = Z_a + \varepsilon_a = X_a\beta + \varepsilon_a$$
$$U_b = Z_b + \varepsilon_b = X_b\beta + \varepsilon_b$$

^1 Discrete response models are of the form
$$P_i \equiv E[Y_i \mid \Omega_i] = F(h(X_i, \beta))$$
This is a general specification; $F(X_i\beta)$ is more common. The key is to employ a transformation (link) function $F(X)$ with the properties $F(-\infty) = 0$, $F(\infty) = 1$, and $f(X) \equiv \frac{\partial F(X)}{\partial X} > 0$.


where $X$ is observable attributes, characteristics of the decision maker, etc., and $\varepsilon$ represents unobservable (to the analyst) features. Since utility is ordinal, addition of a constant to all $Z$ or scaling by $\alpha > 0$ has no substantive impact and is not identified for probability models. Consequently, $\beta_0$ for either $a$ or $b$ is fixed (at zero) and the scale is chosen ($\sigma = 1$ for probit). Hence, the estimated parameters are effectively $\beta/\sigma$ and reflect the differences in contribution to preference, $X_a - X_b$. Even the error distribution is based on the difference $\varepsilon_a - \varepsilon_b$ (the difference in preferences related to the unobservable attributes). Of course, this is why this is a probability model, as some attributes are not observed by the analyst. Hence, only probabilistic statements of choice can be offered.

$$\begin{aligned}
\Pr(U_a > U_b) &= \Pr(Z_a + \varepsilon_a > Z_b + \varepsilon_b) \\
&= \Pr(X_a\beta + \varepsilon_a > X_b\beta + \varepsilon_b) \\
&= \Pr(\varepsilon_b - \varepsilon_a < Z_a - Z_b) = F_\varepsilon(X\beta)
\end{aligned}$$
where $\varepsilon = \varepsilon_b - \varepsilon_a$ and $X = X_a - X_b$. This reinforces why the latent utility index model is sometimes written $Y^* = U_a - U_b = X\beta - V$, where $-V = \varepsilon_a - \varepsilon_b$.

We're often interested in the effect of a regressor on choice. Since the model is a probability model, this translates into the marginal probability effect. The marginal probability effect is
$$\frac{\partial F(X\beta)}{\partial x} = f(X\beta)\beta$$
where $x$ is a row of $X$. Hence, the marginal effect is proportional (not equal) to the parameter, with the proportionality changing over the sample. This is often summarized for the population by reference to the sample mean or some other population-level reference (see Greene [1997], ch. 21 for more details).
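To make the changing proportionality concrete, here is a minimal sketch (not from the text) of marginal probability effects evaluated at the sample mean. The data, the parameter vector `beta`, and the probit/logit link choice are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, logistic

def marginal_effects_at_mean(X, beta, link="probit"):
    """Return f(x_bar @ beta) * beta: marginal probability effects at the sample mean."""
    x_bar = X.mean(axis=0)
    index = x_bar @ beta
    density = norm.pdf(index) if link == "probit" else logistic.pdf(index)
    return density * beta

# Illustrative data: intercept plus two uniform regressors and an assumed parameter vector
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(size=(100, 2))])
beta = np.array([-0.5, 1.0, -1.0])
print(marginal_effects_at_mean(X, beta))           # proportional to beta, scaled by f(x_bar beta)
print(marginal_effects_at_mean(X, beta, "logit"))
```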

    5.2 Linear probability models

Linear probability models
$$Y = X\beta + \varepsilon$$
where $Y \in \{0, 1\}$ (in the binary case), are not really probability models as the predicted values are not bounded between 0 and 1. But they are sometimes employed for exploratory analysis or to identify relative starting values for MLE.
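As a brief hypothetical sketch of that use, the linear probability model can be fit by OLS and its slopes passed along as rough starting values for an ML routine; the simulated threshold DGP below is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.uniform(size=(500, 2))])
y = (X @ np.array([-0.5, 1.0, -1.0]) + rng.normal(size=500) > 0).astype(float)

beta_lpm, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS slopes from the linear probability model
p_hat = X @ beta_lpm                               # fitted values need not stay inside [0, 1]
print(beta_lpm, p_hat.min(), p_hat.max())          # beta_lpm could seed a probit/logit MLE
```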

    5.3 Logit (logistic regression) models

As the first demonstrated random utility model (RUM) consistent with expected utility maximizing behavior (Marschak [1960]), logit is the most popular discrete choice model. Standard logit models assume independence of irrelevant alternatives (IIA) (Luce [1959]), which can simplify experimentation but can also be


unduly restrictive and produce perverse interpretations.^2 We'll explore the emergence of this property when we discuss multinomial and conditional logit.

    A variety of closely related logit models are employed in practice. Interpretation

    of these models can be subtle and holds the key to their distinctions. A few popular

    variations are discussed below.

    5.3.1 Binary logit

The logit model employs the latent utility index model where $\varepsilon$ is extreme value distributed,
$$Y^* = U_a - U_b = X\beta + \varepsilon$$
where $X = X_a - X_b$ and $\varepsilon = \varepsilon_b - \varepsilon_a$. The logit model can be derived by assuming the log of the odds ratio equals an index function $X_t\beta$. That is, $\log\frac{P_t}{1 - P_t} = X_t\beta$ where $P_t = E[Y_t \mid \Omega_t] = F(X_t\beta)$ and $\Omega_t$ is the information set available at $t$. First, the logistic (cumulative distribution) function is
$$\Lambda(X) \equiv \left(1 + e^{-X}\right)^{-1} = \frac{e^X}{1 + e^X}$$
It has first derivative or density function
$$\lambda(X) \equiv \frac{e^X}{\left(1 + e^X\right)^2} = \Lambda(X)\Lambda(-X)$$
Solving the log-odds ratio for $P_t$ yields
$$\frac{P_t}{1 - P_t} = \exp(X_t\beta)$$
$$P_t = \frac{\exp(X_t\beta)}{1 + \exp(X_t\beta)} = \left[1 + \exp(-X_t\beta)\right]^{-1} = \Lambda(X_t\beta)$$

Notice if the regressors are all binary, the log-odds ratio provides a particularly straightforward and simple method for estimating the parameters. This also points out a difficulty with estimation that sometimes occurs if we encounter a perfect classifier. For a perfect classifier, there is some range of the regressor(s) for which $Y_t$ is always 1 or 0 (a separating hyperplane exists). Since $\beta$ is not identifiable (over a compact parameter space), we cannot obtain sensible estimates for $\beta$, as any sensible optimization approach will try to choose $\beta$ arbitrarily large or small.

^2 The connection between discrete choice models and RUM is reviewed in McFadden [1981, 2001].
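A small sketch of the log-odds idea with a single binary regressor (simulated data and illustrative parameter values, not from the text); it also shows where the perfect-classifier problem would surface.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.integers(0, 2, size=2000)                  # binary regressor
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))        # logit probabilities with assumed parameters
y = rng.binomial(1, p)

def log_odds(v):
    pbar = v.mean()
    return np.log(pbar / (1.0 - pbar))

alpha_hat = log_odds(y[x == 0])                    # estimates the intercept
beta_hat = log_odds(y[x == 1]) - alpha_hat         # estimates the slope
print(alpha_hat, beta_hat)
# If every y in a cell were 1 (or 0), the cell log-odds would be infinite --
# the perfect classifier case in which beta is not identifiable.
```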


    5.3.2 Multinomial logit

Multinomial logit is a natural extension of binary logit designed to handle $J + 1$ alternatives ($J = 1$ produces binary logit).^3 The probability of observing alternative $k$ is
$$\Pr(Y_t = k) = P_{kt} = \frac{\exp\left(X_t\beta_k\right)}{\sum_{j=0}^{J}\exp\left(X_t\beta_j\right)}$$
with parameter vector $\beta_0 = 0$.^4 Notice the vector of regressors remains constant, but additional parameter vectors are added for each additional alternative. It may be useful to think of the regressors as individual-specific characteristics rather than attributes of specific alternatives. This is a key difference between multinomial and conditional logit (see the discussion below). For multinomial logit, we have
$$\frac{P_{kt}}{P_{lt}} = \frac{\exp\left(X_t\beta_k\right)}{\exp\left(X_t\beta_l\right)} = \exp\left(X_t\left(\beta_k - \beta_l\right)\right)$$
That is, the odds of two alternatives depend on the regressors and the difference in their parameter vectors. Notice the odds ratio does not depend on other alternatives; hence IIA applies.

    5.3.3 Conditional logit

    The conditional logit model deals with Jalternatives where utility for alternativekis

    Y!k =Xk!+"k

    where"k isiidGumbel distributed. The Gumbel distribution has density function

    f(") = exp("")exp("e"!) and distribution functionexp ("e"!). The proba-bility that alternativek is chosen is theP r (Uk > Uj)forj!=k which is5

    Pr (Yt= k) = Pkt= exp (Xkt!)J%j=1

    exp (Xjt!)

    3Multinomial choice models can represent unordered or ordered choices. For simplicity, we focus

    on the unordered variety.4Hence,P0t =

    1

    1+

    J

    !j=1

    exp(xt"j)

    .

    5See Train [2003, section 3.10] for a derivation. The key is to rewrite Pr (Uk > Uj) asP r ("j


    Notice a vector of regressors is associated with each alternative ( J vectors ofregressors) and one parameter vector for the model. It may be useful to think of

    the conditional logit regressors as attributes associated with specific-alternatives.

    The odds ratio for conditional logit

    PktPlt

    =exp (Xkt!)

    exp (Xlt!) =exp (Xkt "Xlt)!

    is again independent of other alternatives; henceIIA.IIAarises as a result of prob-

    ability assignment to the unobservable component of utility. Next, we explore a

    variety of models that relax these restrictions in some manner.
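A hypothetical side-by-side sketch of the two probability formulas: multinomial logit carries one regressor vector per person and a parameter vector per alternative (with $\beta_0 = 0$), while conditional logit carries alternative-specific attributes and a single parameter vector. All numbers are made up for illustration.

```python
import numpy as np

def mnl_probs(x_t, betas):
    """x_t: (p,) person characteristics; betas: (J+1, p) with betas[0] fixed at zero."""
    v = betas @ x_t
    e = np.exp(v - v.max())                        # subtract max for numerical stability
    return e / e.sum()

def cl_probs(X_t, beta):
    """X_t: (J, p) alternative-specific attributes; beta: (p,) common parameter vector."""
    v = X_t @ beta
    e = np.exp(v - v.max())
    return e / e.sum()

x_t = np.array([1.0, 0.3])                                  # individual characteristics
betas = np.array([[0.0, 0.0], [0.2, -1.0], [-0.4, 0.8]])    # beta_0 normalized to zero
print(mnl_probs(x_t, betas))

X_t = np.array([[0.1, 2.0], [0.4, 1.0], [0.9, 0.5]])        # attributes of three alternatives
print(cl_probs(X_t, np.array([1.0, -0.5])))
```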

    5.3.4 GEV (generalized extreme value) models

GEV models are extreme value choice models that seek to relax the IIA assumption of conditional logit. McFadden [1978] developed a process to generate GEV models.^6 This process allows researchers to develop new GEV models that best fit the choice situation at hand.

Let $Y_j \equiv \exp(Z_j)$, where $Z_j$ is the observable part of utility associated with choice $j$, and let $G = G(Y_1, \ldots, Y_J)$ be a function that depends on $Y_j$ for all $j$. If $G$ satisfies the properties below, then
$$P_i = \frac{Y_i G_i}{G} = \frac{\exp(Z_i)\frac{\partial G}{\partial Y_i}}{G}$$
where $G_i = \frac{\partial G}{\partial Y_i}$.

Condition 5.1 $G \geq 0$ for all positive values of $Y_j$.

Condition 5.2 $G$ is homogeneous of degree one.^7

Condition 5.3 $G \rightarrow \infty$ as $Y_j \rightarrow \infty$ for any $j$.

Condition 5.4 The cross partial derivatives alternate in signs as follows: $G_i = \frac{\partial G}{\partial Y_i} \geq 0$, $G_{ij} = \frac{\partial^2 G}{\partial Y_i\,\partial Y_j} \leq 0$, $G_{ijk} = \frac{\partial^3 G}{\partial Y_i\,\partial Y_j\,\partial Y_k} \geq 0$ for $i$, $j$, and $k$ distinct, and so on.

These conditions are not economically intuitive, but it's straightforward to connect the ideas to some standard logit and GEV models as depicted in table 5.1 (person $n$ is suppressed in the probability descriptions).

    5.3.5 Nested logit models

Nested logit models relax IIA in a particular way. Suppose a decision maker faces a set of alternatives that can be partitioned into subsets or nests such that

^6 See Train [2003] section 4.6.
^7 $G(\alpha Y_1, \ldots, \alpha Y_J) = \alpha G(Y_1, \ldots, Y_J)$. Ben-Akiva and Francois [1983] show this condition can be relaxed. For simplicity, it's maintained for purposes of the present discussion.


Table 5.1: Variations of multinomial logits

ML: $G = \sum_{j=0}^{J} Y_j$ with $Y_0 = 1$; $\quad P_i = \dfrac{\exp(Z_i)}{\sum_{j=0}^{J}\exp(Z_j)}$

CL: $G = \sum_{j=1}^{J} Y_j$; $\quad P_i = \dfrac{\exp(Z_i)}{\sum_{j=1}^{J}\exp(Z_j)}$

NL: $G = \sum_{k=1}^{K}\left(\sum_{j \in B_k} Y_j^{1/\lambda_k}\right)^{\lambda_k}$ where $0 \leq \lambda_k \leq 1$; $\quad P_i = \dfrac{\exp\left(Z_i/\lambda_k\right)\left(\sum_{j \in B_k}\exp\left(Z_j/\lambda_k\right)\right)^{\lambda_k - 1}}{\sum_{\ell=1}^{K}\left(\sum_{j \in B_\ell}\exp\left(Z_j/\lambda_\ell\right)\right)^{\lambda_\ell}}$

GNL: $G = \sum_{k=1}^{K}\left(\sum_{j \in B_k}\left(\alpha_{jk}Y_j\right)^{1/\lambda_k}\right)^{\lambda_k}$ with $\alpha_{jk} \geq 0$ and $\sum_{k}\alpha_{jk} = 1$ for each $j$; $\quad P_i = \dfrac{\sum_{k}\left(\alpha_{ik}e^{Z_i}\right)^{1/\lambda_k}\left(\sum_{j \in B_k}\left(\alpha_{jk}e^{Z_j}\right)^{1/\lambda_k}\right)^{\lambda_k - 1}}{\sum_{\ell=1}^{K}\left(\sum_{j \in B_\ell}\left(\alpha_{j\ell}e^{Z_j}\right)^{1/\lambda_\ell}\right)^{\lambda_\ell}}$

Notes: $Z_j = X\beta_j$ for multinomial logit (ML) but $Z_j = X_j\beta$ for conditional logit (CL); NL refers to nested logit with $J$ alternatives in $K$ nests $B_1, \ldots, B_K$; GNL refers to generalized nested logit.

    Condition 5.5 IIA holds within each nest. That is, the ratio of probabilities for

    any two alternatives in the same nest is independent of other alternatives.

    Condition 5.6 IIA does not hold for alternatives in different nests. That is, the

    ratio of probabilities for any two alternatives in different nests can depend on

    attributes of other alternatives.

The nested logit probability can be decomposed into a marginal and conditional probability
$$P_i = P_{i|B_k}P_{B_k}$$
where the conditional probability of choosing alternative $i$ given that an alternative in nest $B_k$ is chosen is
$$P_{i|B_k} = \frac{\exp\left(Z_i/\lambda_k\right)}{\sum_{j \in B_k}\exp\left(Z_j/\lambda_k\right)}$$


and the marginal probability of choosing an alternative in nest $B_k$ is
$$P_{B_k} = \frac{\left(\sum_{j \in B_k}\exp\left(Z_j/\lambda_k\right)\right)^{\lambda_k}}{\sum_{\ell=1}^{K}\left(\sum_{j \in B_\ell}\exp\left(Z_j/\lambda_\ell\right)\right)^{\lambda_\ell}}$$
which can be rewritten (see Train [2003, ch. 4, p. 90])
$$P_{B_k} = \frac{e^{W_k}\left(\sum_{j \in B_k}\exp\left(Z_j/\lambda_k\right)\right)^{\lambda_k}}{\sum_{\ell=1}^{K}e^{W_\ell}\left(\sum_{j \in B_\ell}\exp\left(Z_j/\lambda_\ell\right)\right)^{\lambda_\ell}}$$
where observable utility is decomposed as $V_{nj} = W_{nk} + Z_{nj}$, $W_{nk}$ depends only on variables that describe nest $k$ (variation over nests but not over alternatives within each nest), and $Z_{nj}$ depends on variables that describe alternative $j$ (variation over alternatives within nests) for individual $n$.
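A minimal sketch of this decomposition (omitting a separate nest-level component $W_k$, i.e., treating all observable utility as $Z$); the utilities and nest parameters below are illustrative assumptions.

```python
import numpy as np

def nested_logit_probs(Z_nests, lam):
    """Z_nests: one array of representative utilities per nest; lam: nest parameters lambda_k."""
    inclusive = np.array([np.log(np.sum(np.exp(Z / l))) for Z, l in zip(Z_nests, lam)])  # I_k
    nest_num = np.exp(lam * inclusive)           # (sum_j exp(Z_j / lambda_k)) ** lambda_k
    P_nest = nest_num / nest_num.sum()           # marginal nest probabilities P_{B_k}
    P_within = [np.exp(Z / l) / np.sum(np.exp(Z / l)) for Z, l in zip(Z_nests, lam)]  # P_{i|B_k}
    return [pw * pn for pw, pn in zip(P_within, P_nest)]

Z_nests = [np.array([0.4, -0.1]), np.array([0.2, 0.0, -0.3])]   # two nests with 2 and 3 alternatives
lam = np.array([0.5, 0.8])
probs = nested_logit_probs(Z_nests, lam)
print(probs, sum(p.sum() for p in probs))        # the probabilities sum to one
```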

The parameter $\lambda_k$ indicates the degree of independence in unobserved utility among alternatives in nest $B_k$. The level of independence or correlation can vary across nests. If $\lambda_k = 1$ for all nests, there is independence among all alternatives in all nests and the nested logit reduces to a standard logit model. An example seems appropriate.

Example 5.1 For simplicity we consider two nests $k \in \{A, B\}$, two alternatives within each nest $j \in \{a, b\}$, and a single variable to differentiate each of the various choices.^8 The latent utility index is
$$U_{kj} = W_k\beta_1 + X_{kj}\beta_2 + \varepsilon$$
where $\varepsilon = \sqrt{\rho_k}\,\nu_k + \sqrt{1 - \rho_k}\,\nu_j$, $k \neq j$, and $\nu$ has a Gumbel (type I extreme value) distribution. This implies the lead term captures dependence within a nest. Samples of 1,000 observations are drawn with parameters $\beta_1 = 1$, $\beta_2 = -1$, and $\rho_A = \rho_B = 0.5$ for 1,000 simulations, with $W_k$ and $X_{kj}$ drawn from independent uniform(0,1) distributions. Observables are defined as $Y_{ka} = 1$ if $U_{ka} > U_{kb}$ and 0 otherwise, and $Y_A = 1$ if $\max\{U_{Aa}, U_{Ab}\} > \max\{U_{Ba}, U_{Bb}\}$ and 0 otherwise. Now, the log-likelihood can be written as
$$L = \sum\left[Y_A Y_{Aa}\log\left(P_{Aa}\right) + Y_A\left(1 - Y_{Aa}\right)\log\left(P_{Ab}\right) + Y_B Y_{Ba}\log\left(P_{Ba}\right) + Y_B\left(1 - Y_{Ba}\right)\log\left(P_{Bb}\right)\right]$$
where $P_{kj}$ is as defined in the table above. Results are reported in tables 5.2 and 5.3.

^8 There is no intercept in this model as the intercept is unidentified.
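A sketch of the data generating process in this example, assuming the error is built as $\sqrt{\rho_k}\,\nu_k + \sqrt{1-\rho_k}\,\nu_{kj}$ with independent standard Gumbel draws (one shared within each nest, one per alternative); estimation of the nested logit itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(5)
n, beta1, beta2, rho = 1000, 1.0, -1.0, 0.5

W = rng.uniform(size=(n, 2))            # nest-level regressors W_A, W_B
X = rng.uniform(size=(n, 2, 2))         # alternative-level regressors X_kj, k in {A,B}, j in {a,b}
nu_nest = rng.gumbel(size=(n, 2))       # Gumbel shock shared within each nest
nu_alt = rng.gumbel(size=(n, 2, 2))     # alternative-specific Gumbel shocks

U = (W[:, :, None] * beta1 + X * beta2
     + np.sqrt(rho) * nu_nest[:, :, None] + np.sqrt(1 - rho) * nu_alt)

Y_a = (U[:, :, 0] > U[:, :, 1]).astype(int)                              # within-nest choice (a vs b)
Y_A = (U[:, 0, :].max(axis=1) > U[:, 1, :].max(axis=1)).astype(int)      # nest choice (A vs B)
print(Y_a.mean(axis=0), Y_A.mean())
```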


The parameters are effectively recovered with nested logit. However, for conditional logit recovery is poorer. Suppose we vary the correlation in the within-nest errors (unobserved components). Tables 5.4 and 5.5 report comparative results with low correlation ($\rho_A = \rho_B = 0.01$) within the unobservable portions of the nests. As expected, conditional logit performs well in this setting.

Suppose we try high correlation ($\rho_A = \rho_B = 0.99$) within the unobservable portions of the nests. Table 5.6 reports nested logit results. As indicated in table 5.7, conditional logit performs poorly in this setting as the proportional relation between the parameters is substantially distorted.

    5.3.6 Generalizations

Generalized nested logit models involve nests of alternatives where each alternative can be a member of more than one nest. Their membership is determined by an allocation parameter $\alpha_{jk}$, which is non-negative and sums to one over the nests for any alternative. The degree of independence among alternatives is determined, as in nested logit, by the parameter $\lambda_k$. Higher $\lambda_k$ means greater independence and less correlation. Interpretation of GNL models is facilitated by decomposition of the probability
$$P_i = P_{i|B_k}P_k$$
where $P_k$ is the marginal probability of nest $k$
$$P_k = \frac{\left(\sum_{j \in B_k}\left(\alpha_{jk}\exp\left(Z_j\right)\right)^{1/\lambda_k}\right)^{\lambda_k}}{\sum_{\ell=1}^{K}\left(\sum_{j \in B_\ell}\left(\alpha_{j\ell}\exp\left(Z_j\right)\right)^{1/\lambda_\ell}\right)^{\lambda_\ell}}$$
and $P_{i|B_k}$ is the conditional probability of alternative $i$ given nest $k$
$$P_{i|B_k} = \frac{\left(\alpha_{ik}\exp\left(Z_i\right)\right)^{1/\lambda_k}}{\sum_{j \in B_k}\left(\alpha_{jk}\exp\left(Z_j\right)\right)^{1/\lambda_k}}$$

Table 5.2: Nested logit with moderate correlation ($\beta_1 = 1$, $\beta_2 = -1$, $\rho_A = \rho_B = 0.5$)

                          $\beta_1$        $\beta_2$         $\lambda_A$     $\lambda_B$
mean                      0.952            -0.949            0.683           0.677
std. dev.                 0.166            0.220             0.182           0.185
(.01, .99) quantiles      (0.58, 1.33)     (-1.47, -0.46)    (0.29, 1.13)    (0.30, 1.17)


Table 5.3: Conditional logit with moderate correlation ($\beta_1 = 1$, $\beta_2 = -1$, $\rho_A = \rho_B = 0.5$)

                          $\beta_1$        $\beta_2$
mean                      0.964            -1.253
std. dev.                 0.168            0.131
(.01, .99) quantiles      (0.58, 1.36)     (-1.57, -0.94)

Table 5.4: Nested logit with low correlation ($\beta_1 = 1$, $\beta_2 = -1$, $\rho_A = \rho_B = 0.01$)

                          $\beta_1$        $\beta_2$         $\lambda_A$     $\lambda_B$
mean                      0.994            -0.995            1.015           1.014
std. dev.                 0.167            0.236             0.193           0.195
(.01, .99) quantiles      (0.56, 1.33)     (-1.52, -0.40)    (0.27, 1.16)    (0.27, 1.13)

Table 5.5: Conditional logit with low correlation ($\beta_1 = 1$, $\beta_2 = -1$, $\rho_A = \rho_B = 0.01$)

                          $\beta_1$        $\beta_2$
mean                      0.993            -1.004
std. dev.                 0.168            0.132
(.01, .99) quantiles      (0.58, 1.40)     (-1.30, -0.72)

Table 5.6: Nested logit with high correlation ($\beta_1 = 1$, $\beta_2 = -1$, $\rho_A = \rho_B = 0.99$)

                          $\beta_1$        $\beta_2$         $\lambda_A$     $\lambda_B$
mean                      0.998            -1.006            0.100           0.101
std. dev.                 0.167            0.206             0.023           0.023
(.01, .99) quantiles      (0.62, 1.40)     (-1.51, -0.54)    (0.05, 0.16)    (0.05, 0.16)

Table 5.7: Conditional logit with high correlation ($\beta_1 = 1$, $\beta_2 = -1$, $\rho_A = \rho_B = 0.99$)

                          $\beta_1$        $\beta_2$
mean                      1.210            -3.582
std. dev.                 0.212            0.172
(.01, .99) quantiles      (0.73, 1.71)     (-4.00, -3.21)


    5.4 Probit models

Probit models involve weaker restrictions from a utility interpretation perspective (no IIA conditions) than logit. Probit models assume the same sort of latent utility index form except that $V$ is assigned a normal or Gaussian probability distribution. Some circumstances might argue that normality is an unduly restrictive or logically inconsistent mapping of unobservables into preferences.

A derivation of the latent utility probability model is as follows.
$$\begin{aligned}
\Pr(Y_t = 1) &= \Pr\left(Y_t^* > 0\right) \\
&= \Pr\left(X_t\beta + V_t > 0\right) \\
&= 1 - \Pr\left(V_t \leq -X_t\beta\right)
\end{aligned}$$
For symmetric distributions, like the normal (and logistic), $F(-X) = 1 - F(X)$ where $F(\cdot)$ refers to the cumulative distribution function. Hence,
$$\Pr(Y_t = 1) = 1 - \Pr\left(V_t \leq -X_t\beta\right) = 1 - F\left(-X_t\beta\right) = F\left(X_t\beta\right)$$

Briefly, first order conditions associated with maximization of the log-likelihood ($L$) for the binary case are
$$\frac{\partial L}{\partial\beta_j} = \sum_{t=1}^{n}\left[Y_t\frac{\phi\left(X_t\beta\right)}{\Phi\left(X_t\beta\right)}X_{jt} + \left(1 - Y_t\right)\frac{-\phi\left(X_t\beta\right)}{1 - \Phi\left(X_t\beta\right)}X_{jt}\right]$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ refer to the standard normal density and cumulative distribution functions, respectively, and scale is normalized to unity.^9 Also, the marginal probability effects associated with the regressors are
$$\frac{\partial p_t}{\partial X_{jt}} = \phi\left(X_t\beta\right)\beta_j$$

^9 See chapter 4 for a more detailed discussion of maximum likelihood estimation.
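Rather than solving these first order conditions analytically, a minimal sketch (with simulated data and a generic numerical optimizer, both assumptions of this illustration) maximizes the binary probit log-likelihood directly.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 1000
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 2))])
beta_true = np.array([-0.5, 1.0, -1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)   # scale normalized to one

def neg_loglik(beta):
    p = np.clip(norm.cdf(X @ beta), 1e-10, 1 - 1e-10)        # guard the log against 0 and 1
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(res.x)                                                 # approximately beta_true on average
```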

    5.4.1 Conditionally-heteroskedastic probit

Discrete choice model specification may be sensitive to changes in variance of the unobservable component of expected utility (see Horowitz [1991] and Greene [1997]). Even though choice models are normalized, as scale cannot be identified, parameter estimates (and marginal probability effects) can be sensitive to changes in the variability of the stochastic component as a function of the level of regressors. In other words, parameter estimates (and marginal probability effects) can be sensitive to conditional heteroskedasticity. Hence, it may be useful to consider a model specification check for conditional heteroskedasticity. Davidson and MacKinnon [1993] suggest a standard (restricted vs. unrestricted) likelihood ratio test where the restricted model assumes homoskedasticity and the unrestricted model assumes conditional heteroskedasticity.

Suppose we relax the latent utility specification of a standard probit model by allowing conditional heteroskedasticity,
$$Y^* = Z + \varepsilon$$
where $\varepsilon \sim N\left(0, \exp\left(2W\gamma\right)\right)$. In a probit frame, the model involves rescaling the index function by the conditional standard deviation
$$p_t = \Pr(Y_t = 1) = \Phi\left(\frac{X_t\beta}{\exp\left(W_t\gamma\right)}\right)$$
where the conditional standard deviation is given by $\exp\left(W_t\gamma\right)$ and $W$ refers to some rank-$q$ subset of the regressors $X$ (for notational convenience, subscripts are matched so that $X_j = W_j$) and, of course, cannot include an intercept (recall the scale or variance is not identifiable in discrete choice models).^10

Estimation and identification of marginal probability effects of regressors proceed as usual, but the expressions are more complex and convergence of the likelihood function is more delicate. Briefly, first order conditions associated with maximization of the log-likelihood ($L$) for the binary case are
$$\frac{\partial L}{\partial\beta_j} = \sum_{t=1}^{n}\left[Y_t\frac{\phi\left(\frac{X_t\beta}{\exp(W_t\gamma)}\right)}{\Phi\left(\frac{X_t\beta}{\exp(W_t\gamma)}\right)}\frac{X_{jt}}{\exp\left(W_t\gamma\right)} + \left(1 - Y_t\right)\frac{-\phi\left(\frac{X_t\beta}{\exp(W_t\gamma)}\right)}{1 - \Phi\left(\frac{X_t\beta}{\exp(W_t\gamma)}\right)}\frac{X_{jt}}{\exp\left(W_t\gamma\right)}\right]$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ refer to the standard normal density and cumulative distribution functions, respectively. Also, the marginal probability effects are^11
$$\frac{\partial p_t}{\partial W_{jt}} = \phi\left(\frac{X_t\beta}{\exp\left(W_t\gamma\right)}\right)\left(\frac{\beta_j - X_t\beta\,\gamma_j}{\exp\left(W_t\gamma\right)}\right)$$

^10 Discrete choice models are inherently conditionally heteroskedastic as a function of the regressors (MacKinnon and Davidson [1993]). Consider the binary case: the binomial setup produces variance equal to $p_j(1 - p_j)$ where $p_j$ is a function of the regressors $X$. Hence, the heteroskedastic probit model enriches the error (unobserved utility component).
^11 Because of the second term, the marginal effects are not proportional to the parameter estimates as in the standard discrete choice model. Rather, the sign of the marginal effect may be opposite that of the parameter estimate. Of course, if heteroskedasticity is a function of some variable not included as a regressor, the marginal effects are simpler:
$$\frac{\partial p_t}{\partial X_{jt}} = \phi\left(\frac{X_t\beta}{\exp\left(W_t\gamma\right)}\right)\frac{\beta_j}{\exp\left(W_t\gamma\right)}$$

5.4.2 Artificial regression specification test

Davidson and MacKinnon [2003] suggest a simple specification test for conditional heteroskedasticity. As this is not restricted to a probit model, we'll explore a general link function $F(\cdot)$. In particular, a test of $\gamma = 0$, implying homoskedasticity


with $\exp\left(W_t\gamma\right) = 1$, is based on the following artificial regression (see chapter 4 to review artificial regressions)
$$V_t^{-1/2}\left(Y_t - F_t\right) = V_t^{-1/2}f_t X_t b - V_t^{-1/2}f_t X_t\beta\, W_t c + \text{residual}$$
where $V_t = F_t\left(1 - F_t\right)$, $F_t = F\left(X_t\beta\right)$, $f_t = f\left(X_t\beta\right)$, and $\beta$ is the ML estimate under the hypothesis that $\gamma = 0$. The test for heteroskedasticity is based on the explained sum of squares (ESS) for the above BRMR, which is asymptotically distributed $\chi^2(q)$ under $\gamma = 0$. That is, under the null, neither term offers any explanatory power.
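A sketch of how the ESS statistic might be computed for a probit link, assuming a parameter vector `beta_hat` already estimated under the null of homoskedasticity; the candidate variance regressors `W` (no intercept) and the helper name are hypothetical.

```python
import numpy as np
from scipy.stats import norm, chi2

def brmr_ess(X, W, y, beta_hat):
    """BRMR heteroskedasticity check: X regressors (with intercept), W suspected variance regressors."""
    idx = X @ beta_hat
    F, f = norm.cdf(idx), norm.pdf(idx)
    V = np.clip(F * (1 - F), 1e-12, None)
    w = V ** -0.5
    lhs = w * (y - F)                                        # rescaled residuals
    rhs = np.column_stack([w[:, None] * f[:, None] * X,      # columns multiplying b
                           -(w * f * idx)[:, None] * W])     # columns multiplying c
    coef, *_ = np.linalg.lstsq(rhs, lhs, rcond=None)
    ess = np.sum((rhs @ coef) ** 2)                          # explained sum of squares
    q = W.shape[1]
    return ess, 1 - chi2.cdf(ess, df=q)                      # statistic and asymptotic p-value
```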

Let's explore the foundations of this test. The nonlinear regression for a discrete choice model is
$$Y_t = F\left(X_t\beta\right) + \upsilon_t$$
where the $\upsilon_t$ have zero mean, by construction, and variance
$$E\left[\upsilon_t^2\right] = E\left[\left(Y_t - F\left(X_t\beta\right)\right)^2\right] = F\left(X_t\beta\right)\left(1 - F\left(X_t\beta\right)\right)^2 + \left(1 - F\left(X_t\beta\right)\right)\left(0 - F\left(X_t\beta\right)\right)^2 = F\left(X_t\beta\right)\left(1 - F\left(X_t\beta\right)\right)$$
The simplicity, of course, is due to the binary nature of $Y_t$. Hence, even though the latent utility index representation here is homoskedastic, the nonlinear regression is heteroskedastic.^12

The Gauss-Newton regression (GNR) that corresponds to the above nonlinear regression is
$$Y_t - F\left(X_t\beta\right) = f\left(X_t\beta\right)X_t b + \text{residual}$$
as the estimate corresponds to the updating term, $-H_{(j-1)}g_{(j-1)}$, in Newton's method (see chapter 4)
$$b = \left(X^T f^2\left(X\beta\right)X\right)^{-1}X^T f\left(X\beta\right)\left(y - F\left(X\beta\right)\right)$$
where $f\left(X\beta\right)$ is a diagonal matrix. The artificial regression for binary response models (BRMR) is the above GNR after accounting for the heteroskedasticity noted above. That is,
$$V_t^{-1/2}\left(Y_t - F\left(X_t\beta\right)\right) = V_t^{-1/2}f\left(X_t\beta\right)X_t b + \text{residual}$$
where $V_t = F\left(X_t\beta\right)\left(1 - F\left(X_t\beta\right)\right)$. The artificial regression used for specification testing reduces to this BRMR when $\gamma = 0$ and therefore $c = 0$.

^12 The nonlinear regression for the binary choice problem could be estimated via iteratively reweighted nonlinear least squares using Newton's method (see chapter 4). Below we explore an alternative approach that is usually simpler and computationally faster.


The artificial regression used for specification testing follows simply from the development in chapter 4, which says that asymptotically the change in estimate via Newton's method is
$$-H_{(j)}g_{(j)} \overset{a}{=} \left(X_{(j)}^T X_{(j)}\right)^{-1}X_{(j)}^T\left(y - x_{(j)}\right)$$
Now, for the binary response model recall $x_{(j)} = F\left(\frac{X\beta_{(j)}}{\exp\left(W\gamma\right)}\right)$ and, after conveniently partitioning, the matrix of partial derivatives is
$$X_{(j)}^T = \left[\begin{array}{cc}\dfrac{\partial F\left(\frac{X\beta_{(j)}}{\exp(W\gamma)}\right)}{\partial\beta_{(j)}} & \dfrac{\partial F\left(\frac{X\beta_{(j)}}{\exp(W\gamma)}\right)}{\partial\gamma}\end{array}\right]$$
Under the hypothesis $\gamma = 0$,
$$X_{(j)}^T = \left[\begin{array}{cc}f\left(X\beta_{(j)}\right)X & -f\left(X\beta_{(j)}\right)X\beta_{(j)}W\end{array}\right]$$
Replace $\beta_{(j)}$ by the ML estimate for $\beta$ and recognize that a simple way to compute
$$\left(X_{(j)}^T X_{(j)}\right)^{-1}X_{(j)}^T\left(Y - x_{(j)}\right)$$
is via artificial regression. After rescaling for heteroskedasticity, we have the artificial regression used for specification testing
$$V_t^{-1/2}\left(Y_t - F_t\right) = V_t^{-1/2}f_t X_t b - V_t^{-1/2}f_t X_t\beta\, W_t c + \text{residual}$$
See Davidson and MacKinnon [2003, ch. 11] for additional discrete choice model specification tests. It's time to explore an example.

Example 5.2 Suppose the DGP is heteroskedastic,^13
$$Y^* = X_1 - X_2 + \varepsilon, \qquad \varepsilon \sim N\left(0, \left(\exp\left(X_1\right)\right)^2\right)$$
$Y^*$ is unobservable, but $Y = 1$ if $Y^* > 0$ and $Y = 0$ otherwise is observed. Further, $x_1$ and $x_2$ are both uniformly distributed between (0, 1) and the sample size is $n = 1{,}000$. First, we report standard (assumed homoskedastic) binary probit results based on 1,000 simulations in table 5.8. Though the parameter estimates remain proportional, they are biased towards zero.
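A sketch of one draw from this DGP, fitting both the (misspecified) homoskedastic probit and the heteroskedastic probit by ML; the use of a generic numerical optimizer is an assumption of the illustration, not the book's procedure.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(9)
n = 1000
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
y = ((x1 - x2 + np.exp(x1) * rng.normal(size=n)) > 0).astype(float)   # heteroskedastic latent index
X = np.column_stack([np.ones(n), x1, x2])

def neg_loglik_homo(beta):
    p = np.clip(norm.cdf(X @ beta), 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def neg_loglik_het(theta):
    beta, gamma = theta[:3], theta[3]
    p = np.clip(norm.cdf(X @ beta / np.exp(gamma * x1)), 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

print(minimize(neg_loglik_homo, np.zeros(3), method="BFGS").x)   # slopes attenuated toward zero
print(minimize(neg_loglik_het, np.zeros(4), method="BFGS").x)    # roughly (0, 1, -1) with gamma near 1
```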

Let's explore some variations of the above BRMR specification test. As reported in table 5.9, the appropriately specified heteroskedastic test has reasonable power: better than 50% of the simulations produce evidence of misspecification at the 80% confidence level.^14

Now, leave the DGP unaltered but suppose we suspect the variance changes due to another variable, say $X_2$. Misidentification of the source of heteroskedasticity

^13 To be clear, the second parameter is the variance so that the standard deviation is $\exp\left(X_1\right)$.
^14 As this is a specification test, a conservative approach is to consider a lower level (say, 80% vs. 95%) for our confidence intervals.


Table 5.8: Homoskedastic probit results with heteroskedastic DGP

                          $\beta_0$        $\beta_1$        $\beta_2$
mean                      -0.055           0.647            -0.634
std. dev.                 0.103            0.139            0.140
(0.01, 0.99) quantiles    (-0.29, 0.19)    (0.33, 0.98)     (-0.95, -0.31)

Table 5.9: BRMR specification test 1 with heteroskedastic DGP

                          $b_0$            $b_1$            $b_2$
mean                      0.045            0.261            -0.300
std. dev.                 0.050            0.165            0.179
(0.01, 0.99) quantiles    (-0.05, 0.18)    (-0.08, 0.69)    (-0.75, 0.08)

                          $c_1$            ESS ($\chi^2(1)$ prob)
mean                      0.972            3.721 (0.946)
std. dev.                 0.592            3.467
(0.01, 0.99) quantiles    (-0.35, 2.53)    (0.0, 14.8)

$$V_t^{-1/2}\left(Y_t - F_t\right) = V_t^{-1/2}f_t X_t b - V_t^{-1/2}f_t X_t\beta\, X_{1t}c_1 + \text{residual}$$

substantially reduces the power of the test, as depicted in table 5.10. Only between 25% and 50% of the simulations produce evidence of misspecification at the 80% confidence level.

Next, we explore changing variance as a function of both regressors. As demonstrated in table 5.11, the power is compromised relative to the proper specification, although better than 50% of the simulations still produce evidence of misspecification at the 80% confidence level. However, one might be inclined to drop $X_2 c_2$ and re-estimate.

Assuming the evidence is against homoskedasticity, we next report simulations for the heteroskedastic probit with standard deviation $\exp\left(X_1\gamma\right)$. As reported in table 5.12, on average, ML recovers the parameters of a properly-specified heteroskedastic probit quite effectively. Of course, we may incorrectly conclude the data are homoskedastic.

Now, we explore the extent to which the specification test is inclined to indicate heteroskedasticity when the model is homoskedastic. Everything remains the same except the DGP is homoskedastic,
$$Y^* = X_1 - X_2 + \varepsilon, \qquad \varepsilon \sim N(0, 1)$$
As before, we first report standard (assumed homoskedastic) binary probit results based on 1,000 simulations in table 5.13. On average, the parameter estimates are recovered effectively.


Table 5.10: BRMR specification test 2 with heteroskedastic DGP

                          $b_0$            $b_1$            $b_2$
mean                      0.029            -0.161           0.177
std. dev.                 0.044            0.186            0.212
(0.01, 0.99) quantiles    (-0.05, 0.16)    (-0.65, 0.23)    (-0.32, 0.73)

                          $c_2$            ESS ($\chi^2(1)$ prob)
mean                      -0.502           1.757 (0.815)
std. dev.                 0.587            2.334
(0.01, 0.99) quantiles    (-1.86, 0.85)    (0.0, 10.8)

$$V_t^{-1/2}\left(Y_t - F_t\right) = V_t^{-1/2}f_t X_t b - V_t^{-1/2}f_t X_t\beta\, X_{2t}c_2 + \text{residual}$$

Table 5.11: BRMR specification test 3 with heteroskedastic DGP

                          $b_0$            $b_1$            $b_2$
mean                      0.050            0.255            -0.306
std. dev.                 0.064            0.347            0.416
(0.01, 0.99) quantiles    (-0.11, 0.23)    (-0.56, 1.04)    (-1.33, 0.71)

                          $c_1$            $c_2$            ESS ($\chi^2(2)$ prob)
mean                      0.973            0.001            4.754 (0.907)
std. dev.                 0.703            0.696            3.780
(0.01, 0.99) quantiles    (-0.55, 2.87)    (-1.69, 1.59)    (0.06, 15.6)

$$V_t^{-1/2}\left(Y_t - F_t\right) = V_t^{-1/2}f_t X_t b - V_t^{-1/2}f_t X_t\beta\left[\begin{array}{cc}X_{1t} & X_{2t}\end{array}\right]\left[\begin{array}{c}c_1 \\ c_2\end{array}\right] + \text{residual}$$

Next, we explore some variations of the BRMR specification test. Based on table 5.14, the appropriately specified heteroskedastic test seems, on average, resistant to rejecting the null (when it should not reject). Fewer than 25% of the simulations produce evidence of misspecification at the 80% confidence level.

Now, leave the DGP unaltered but suppose we suspect the variance changes due to another variable, say $X_2$. Even though the source of heteroskedasticity is misidentified, the test reported in table 5.15 produces similar results, on average. Again, fewer than 25% of the simulations produce evidence of misspecification at the 80% confidence level.

Next, we explore changing variance as a function of both regressors. Again, we find similar specification results, on average, as reported in table 5.16. That is, fewer than 25% of the simulations produce evidence of misspecification at the 80% confidence level. Finally, assuming the evidence is against homoskedasticity, we next report simulations for the heteroskedastic probit with standard deviation $\exp\left(X_1\gamma\right)$.


Table 5.12: Heteroskedastic probit results with heteroskedastic DGP

                          $\beta_0$        $\beta_1$        $\beta_2$        $\gamma$
mean                      0.012            1.048            -1.048           1.0756
std. dev.                 0.161            0.394            0.358            0.922
(0.01, 0.99) quantiles    (-0.33, 0.41)    (0.44, 2.14)     (-2.0, -0.40)    (-0.30, 2.82)

Table 5.13: Homoskedastic probit results with homoskedastic DGP

                          $\beta_0$        $\beta_1$        $\beta_2$
mean                      -0.002           1.005            -0.999
std. dev.                 0.110            0.147            0.148
(0.01, 0.99) quantiles    (-0.24, 0.25)    (0.67, 1.32)     (-1.35, -0.67)

On average, table 5.17 results support the homoskedastic choice model as $\gamma$ is near zero. Of course, the risk remains that we may incorrectly conclude that the data are heteroskedastic.

    5.5 Robust choice models

    A few robust (relaxed distribution or link function) discrete choice models are

    briefly discussed next.

    5.5.1 Mixed logit models

Mixed logit is a highly flexible model that can approximate any random utility model. Unlike logit, it allows random taste variation, unrestricted substitution patterns, and correlation in unobserved factors (over time). Unlike probit, it is not restricted to normal distributions.

Mixed logit probabilities are integrals of standard logit probabilities over a density of parameters,
$$P = \int L(\beta)f(\beta)\,d\beta$$
where $L(\beta)$ is the logit probability evaluated at $\beta$ and $f(\beta)$ is a density function. In other words, the mixed logit is a weighted average of the logit formula evaluated at different values of $\beta$, with weights given by the density $f(\beta)$. The mixing distribution $f(\beta)$ can be discrete or continuous.
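A hypothetical sketch of the continuous-mixing case: draw parameter vectors from an assumed normal mixing density and average the conditional logit formula over the draws.

```python
import numpy as np

def mixed_logit_probs(X_alt, beta_mean, beta_sd, n_draws=5000, seed=0):
    """X_alt: (J, p) alternative attributes; mixing density N(beta_mean, diag(beta_sd**2))."""
    rng = np.random.default_rng(seed)
    draws = beta_mean + beta_sd * rng.normal(size=(n_draws, len(beta_mean)))
    v = draws @ X_alt.T                                      # (n_draws, J) utilities
    e = np.exp(v - v.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)                 # logit formula L(beta) at each draw
    return probs.mean(axis=0)                                # average over the mixing density f(beta)

X_alt = np.array([[1.0, 0.2], [0.5, 0.8], [0.0, 1.5]])       # illustrative attributes
print(mixed_logit_probs(X_alt, np.array([1.0, -0.5]), np.array([0.5, 0.3])))
```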

5.5.2 Semiparametric single index discrete choice models

Another "robust" choice model draws on kernel density-based regression (see chapter 6 for more details). In particular, the density-weighted average derivative estimator from an index function yields $E[Y_t \mid X_t] = G\left(X_t b\right)$ where $G(\cdot)$ is some general nonparametric function. As Stoker [1991] suggests, the bandwidth is chosen based on "critical smoothing." Critical smoothing refers to selecting the bandwidth as near the "optimal" bandwidth as possible such that monotonicity of probability in the index function is satisfied. Otherwise, the estimated "density" function involves negative values.


Table 5.14: BRMR specification test 1 with homoskedastic DGP

                          $b_0$            $b_1$            $b_2$
mean                      -0.001           -0.007           0.007
std. dev.                 0.032            0.193            0.190
(0.01, 0.99) quantiles    (-0.09, 0.08)    (-0.46, 0.45)    (-0.44, 0.45)

                          $c_1$            ESS ($\chi^2(1)$ prob)
mean                      -0.013           0.983 (0.679)
std. dev.                 0.382            1.338
(0.01, 0.99) quantiles    (-0.87, 0.86)    (0.0, 6.4)

$$V_t^{-1/2}\left(Y_t - F_t\right) = V_t^{-1/2}f_t X_t b - V_t^{-1/2}f_t X_t\beta\, X_{1t}c_1 + \text{residual}$$

Table 5.15: BRMR specification test 2 with homoskedastic DGP

                          $b_0$            $b_1$            $b_2$
mean                      -0.001           0.009            -0.010
std. dev.                 0.032            0.187            0.187
(0.01, 0.99) quantiles    (-0.08, 0.09)    (-0.42, 0.44)    (-0.46, 0.40)

                          $c_2$            ESS ($\chi^2(1)$ prob)
mean                      0.018            0.947 (0.669)
std. dev.                 0.377            1.362
(0.01, 0.99) quantiles    (-0.85, 0.91)    (0.0, 6.6)

$$V_t^{-1/2}\left(Y_t - F_t\right) = V_t^{-1/2}f_t X_t b - V_t^{-1/2}f_t X_t\beta\, X_{2t}c_2 + \text{residual}$$


5.5.3 Nonparametric discrete choice models

The nonparametric kernel density regression model can be employed to estimate very general (without index restrictions) probability models (see chapter 6 for more details),
$$E[Y_t \mid X_t] = m\left(X_t\right)$$
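A minimal Nadaraya-Watson style sketch of such an estimate for a binary outcome; the Gaussian kernel, the fixed bandwidth, and the simulated data are all illustrative assumptions rather than the chapter 6 treatment.

```python
import numpy as np

def kernel_prob(x0, X, y, h=0.2):
    """Kernel-weighted estimate of E[Y | X = x0]; X is (n, p), x0 is (p,)."""
    d2 = np.sum(((X - x0) / h) ** 2, axis=1)
    w = np.exp(-0.5 * d2)                      # product Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(11)
X = rng.uniform(size=(1000, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(size=1000) > 0).astype(float)
print(kernel_prob(np.array([0.8, 0.2]), X, y))   # estimated Pr(Y=1) at a point where it is high
```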


Table 5.16: BRMR specification test 3 with homoskedastic DGP

                          $b_0$            $b_1$            $b_2$
mean                      -0.000           0.004            -0.006
std. dev.                 0.045            0.391            0.382
(0.01, 0.99) quantiles    (-0.11, 0.12)    (-0.91, 0.93)    (-0.91, 0.86)

                          $c_1$            $c_2$            ESS ($\chi^2(2)$ prob)
mean                      -0.006           0.015            1.943 (0.621)
std. dev.                 0.450            0.696            3.443
(0.01, 0.99) quantiles    (-1.10, 1.10)    (-0.98, 1.06)    (0.03, 8.2)

$$V_t^{-1/2}\left(Y_t - F_t\right) = V_t^{-1/2}f_t X_t b - V_t^{-1/2}f_t X_t\beta\left[\begin{array}{cc}X_{1t} & X_{2t}\end{array}\right]\left[\begin{array}{c}c_1 \\ c_2\end{array}\right] + \text{residual}$$

Table 5.17: Heteroskedastic probit results with homoskedastic DGP

                          $\beta_0$        $\beta_1$        $\beta_2$        $\gamma$
mean                      0.001            1.008            -1.009           -0.003
std. dev.                 0.124            0.454            0.259            0.521
(0.01, 0.99) quantiles    (-0.24, 0.29)    (0.55, 1.67)     (-1.69, -0.50)   (-0.97, 0.90)

    5.6 Tobit (censored regression) models

Sometimes the dependent variable is censored at a value (we assume zero for simplicity),
$$Y_t^* = X_t\beta + \varepsilon_t$$
where $\varepsilon \sim N\left(0, \sigma^2 I\right)$, and $Y_t = Y_t^*$ if $Y_t^* > 0$ and $Y_t = 0$ otherwise. Then we have a mixture of discrete and continuous outcomes. Tobin [1958] proposed writing the likelihood function as a combination of a discrete choice model (binomial likelihood) and standard regression (normal likelihood), then estimating $\beta$ and $\sigma$ via maximum likelihood. The log-likelihood is
$$\sum_{y_t = 0}\log\Phi\left(\frac{-X_t\beta}{\sigma}\right) + \sum_{y_t > 0}\log\left[\frac{1}{\sigma}\phi\left(\frac{Y_t - X_t\beta}{\sigma}\right)\right]$$
where, as usual, $\phi(\cdot)$ and $\Phi(\cdot)$ are the unit normal density and distribution functions, respectively.
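A sketch of maximizing this log-likelihood numerically, with simulated censored data and $\sigma$ parameterized on the log scale to keep it positive (both choices are assumptions of the illustration).

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(12)
n = 1000
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y_star = X @ np.array([-0.5, 2.0]) + rng.normal(size=n)
y = np.where(y_star > 0, y_star, 0.0)                        # censoring at zero

def neg_loglik(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])
    xb = X @ beta
    ll_censored = norm.logcdf(-xb / sigma)                   # log Phi(-X beta / sigma)
    ll_observed = norm.logpdf((y - xb) / sigma) - np.log(sigma)
    return -np.sum(np.where(y > 0, ll_observed, ll_censored))

print(minimize(neg_loglik, np.zeros(3), method="BFGS").x)    # (beta, log sigma)
```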

    5.7 Bayesian data augmentation

Albert and Chib's [1993] idea is to treat the latent variable $Y^*$ as missing data and use Bayesian analysis to estimate the missing data. Typically, Bayesian analysis


draws inferences by sampling from the posterior distribution $p(\theta \mid Y)$. However, the marginal posterior distribution in the discrete choice setting is often not recognizable, though the conditional posterior distributions may be. In this case, Markov chain Monte Carlo (McMC) methods, in particular a Gibbs sampler, can be employed (see chapter 7 for more details on McMC and the Gibbs sampler).

Albert and Chib use a Gibbs sampler to estimate a Bayesian probit. The conditional posterior distribution for the parameters is
$$p\left(\beta \mid Y, X, Y^*\right) \sim N\left(b_1, \left(Q^{-1} + X^T X\right)^{-1}\right)$$
where
$$b_1 = \left(Q^{-1} + X^T X\right)^{-1}\left(Q^{-1}b_0 + X^T X b\right), \qquad b = \left(X^T X\right)^{-1}X^T Y^*$$
$b_0$ is the vector of prior means for $\beta$, and $Q = \left(X_0^T X_0\right)^{-1}$ is the prior on the covariance.^15 The conditional posterior distributions for the latent utility index are
$$p\left(Y^* \mid Y = 1, X, \beta\right) \sim N\left(X\beta, I \mid Y^* > 0\right)$$
$$p\left(Y^* \mid Y = 0, X, \beta\right) \sim N\left(X\beta, I \mid Y^* \leq 0\right)$$
For the latter two, random draws from a truncated normal (truncated from below for the first and truncated from above for the second) are employed.
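A sketch of the resulting Gibbs sampler, alternating truncated normal draws of $Y^*$ with normal draws of $\beta$; the prior precision, the simulated data, and the burn-in length are illustrative assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(13)
n = 500
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = ((X @ np.array([-0.5, 1.0]) + rng.normal(size=n)) > 0).astype(float)

b0 = np.zeros(2)                         # prior means
Q_inv = 0.01 * np.eye(2)                 # assumed prior precision Q^{-1}
post_cov = np.linalg.inv(Q_inv + X.T @ X)

beta, draws = np.zeros(2), []
for it in range(2000):
    # 1. Draw latent utilities from normals truncated below (y = 1) or above (y = 0) at zero
    mu = X @ beta
    lower = np.where(y == 1, -mu, -np.inf)
    upper = np.where(y == 1, np.inf, -mu)
    y_star = mu + truncnorm.rvs(lower, upper, size=n, random_state=rng)
    # 2. Draw beta from its conditional normal posterior N(b1, (Q^{-1} + X'X)^{-1})
    b1 = post_cov @ (Q_inv @ b0 + X.T @ y_star)
    beta = rng.multivariate_normal(b1, post_cov)
    if it >= 500:                        # discard burn-in draws
        draws.append(beta)

print(np.mean(draws, axis=0))            # posterior means near the data-generating values
```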

    5.8 Additional reading

There is an extensive literature addressing discrete choice models. Some favorite references are Train [2003] and McFadden's [2001] Nobel lecture. Connections between discrete choice models, nonlinear regression, and related specification tests are developed by Davidson and MacKinnon [1993, 2003]. Coslett [1981] discusses efficient estimation of discrete choice models with emphasis on choice-based sampling. Mullahy [1997] discusses instrumental variable estimation of count data models.

^15 Bayesian inference works as if we have data from the prior period $\{Y_0, X_0\}$ as well as from the sample period $\{Y, X\}$ from which $\beta$ is estimated ($b_0 = \left(X_0^T X_0\right)^{-1}X_0^T Y_0$ as if taken from the prior sample $\{X_0, Y_0\}$; see Poirier [1995], p. 527). Applying OLS yields
$$b_1 = \left(X_0^T X_0 + X^T X\right)^{-1}\left(X_0^T Y_0 + X^T Y\right) = \left(Q^{-1} + X^T X\right)^{-1}\left(Q^{-1}b_0 + X^T X b\right)$$
since $Q^{-1} = X_0^T X_0$, $X_0^T Y_0 = Q^{-1}b_0$, and $X^T Y = X^T X b$.