
Principles of Econometrics, 4th Edition

Chapter 16: Qualitative and Limited Dependent Variable Models


Walter R. Paczkowski Rutgers University



Chapter Contents

16.1 Models with Binary Dependent Variables
16.2 The Logit Model for Binary Choice
16.3 Multinomial Logit
16.4 Conditional Logit
16.5 Ordered Choice Models
16.6 Models for Count Data
16.7 Limited Dependent Variables



In this chapter, we:
– Examine models that are used to describe choice behavior, and which do not have the usual continuous dependent variable
– Introduce a class of models with dependent variables that are limited
  • They are continuous, but their range of values is constrained in some way, and their values are not completely observable
  • Alternatives to least squares estimation are needed since the least squares estimator is biased and inconsistent



16.1 Models with Binary Dependent Variables



Many of the choices that individuals and firms make are “either–or” in nature
– Such choices can be represented by a binary (indicator) variable that takes the value 1 if one outcome is chosen and the value 0 otherwise
– The binary variable describing a choice is the dependent variable rather than an independent variable



Examples:
– Models of why some individuals take a second or third job, and engage in “moonlighting”
– Models of why some legislators in the U.S. House of Representatives vote for a particular bill and others do not
– Models explaining why some loan applications are accepted and others are not at a large metropolitan bank
– Models explaining why some individuals vote for increased spending in a school board election and others vote against
– Models explaining why some female college students decide to study engineering and others do not




We represent an individual’s choice by the indicator variable:

\[ y = \begin{cases} 1 & \text{individual drives to work} \\ 0 & \text{individual takes bus to work} \end{cases} \tag{16.1} \]



If the probability that an individual drives to work is p, then P[y = 1] = p
– The probability that a person uses public transportation is P[y = 0] = 1 – p
– The probability function for such a binary random variable is:

\[ f(y) = p^{y}(1-p)^{1-y}, \quad y = 0, 1 \tag{16.2} \]

with

\[ E(y) = p, \qquad \operatorname{var}(y) = p(1-p) \]




For our analysis, define the explanatory variable as:

x = (commuting time by bus - commuting time by car)




16.1.1 The Linear Probability Model

We could model the indicator variable y using the linear model; however, there are several problems:
– It implies marginal effects of changes in continuous explanatory variables are constant, which cannot be the case for a probability model
  • This feature also can result in predicted probabilities outside the [0, 1] interval
– The linear probability model error term is heteroskedastic, so that a better estimator is generalized least squares



In regression analysis we break the dependent variable into fixed and random parts
– If we do this for the indicator variable y, we have:

\[ y = E(y) + e = p + e \tag{16.3} \]

– Assuming that the relationship is linear:

\[ E(y) = p = \beta_1 + \beta_2 x \tag{16.4} \]



The linear regression model for explaining the choice variable y is called the linear probability model:

\[ y = E(y) + e = \beta_1 + \beta_2 x + e \tag{16.5} \]



The probability density functions for y and e are:

y value    e value             Probability
1          1 − (β1 + β2x)      p = β1 + β2x
0          −(β1 + β2x)         1 − p = 1 − (β1 + β2x)



Using these values it can be shown that the variance of the error term e is:

\[ \operatorname{var}(e) = (\beta_1 + \beta_2 x)(1 - \beta_1 - \beta_2 x) \]

– The estimated variance of the error term is:

\[ \hat{\sigma}_i^2 = \widehat{\operatorname{var}}(e_i) = (b_1 + b_2 x_i)(1 - b_1 - b_2 x_i) \tag{16.6} \]



We can transform the data as:

\[ y_i^* = y_i / \hat{\sigma}_i, \qquad x_i^* = x_i / \hat{\sigma}_i \]

– And estimate the model:

\[ y_i^* = \beta_1 \hat{\sigma}_i^{-1} + \beta_2 x_i^* + e_i^* \]

by least squares to produce the feasible generalized least squares estimates
– Both least squares and feasible generalized least squares are consistent estimators of the regression parameters



If we estimate the parameters of Eq. 16.5 by least squares, we obtain the fitted model explaining the systematic portion of y
– This systematic portion is p:

\[ \hat{p} = b_1 + b_2 x \tag{16.7} \]

– By substituting alternative values of x, we can easily obtain values of the fitted probability that are less than zero or greater than one
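The out-of-range problem with the fitted linear probability model can be illustrated with a small sketch. The data below are hypothetical (they are not from the text), and `fit_lpm` is an illustrative helper name; the point is only that a least squares line through 0/1 outcomes can give fitted values below zero and above one.

```python
def fit_lpm(x, y):
    """Least squares fit of y = b1 + b2*x + e for a 0/1 outcome y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    b1 = y_bar - b2 * x_bar
    return b1, b2

# Hypothetical data: x is a commuting-time differential, y = 1 if the person drives.
x = [-5, -3, -1, 1, 3, 5]
y = [0, 0, 0, 1, 1, 1]

b1, b2 = fit_lpm(x, y)
fitted = [b1 + b2 * xi for xi in x]  # p-hat = b1 + b2*x, as in Eq. 16.7

print("b1 =", b1, "b2 =", b2)
print("smallest fitted value:", min(fitted))
print("largest fitted value:", max(fitted))
```

With these made-up numbers the fitted values at the extreme x values fall outside the [0, 1] interval, which is the difficulty the probit and logit models are designed to avoid.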



The underlying feature that causes these problems is that the linear probability model implicitly assumes that increases in x have a constant effect on the probability of choosing to drive

\[ \frac{dp}{dx} = \beta_2 \tag{16.8} \]



16.1.2 The Probit Model

To keep the choice probability p within the interval [0, 1], a nonlinear S-shaped relationship between x and p can be used






FIGURE 16.1 (a) Standard normal cumulative distribution function (b) Standard normal probability density function



A functional relationship that is used to represent such a curve is the probit function
– The probit function is related to the standard normal probability distribution:

\[ \phi(z) = \frac{1}{\sqrt{2\pi}} e^{-0.5 z^2} \]

– The probit function is:

\[ \Phi(z) = P[Z \le z] = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-0.5 u^2}\, du \tag{16.9} \]



The probit statistical model expresses the probability p that y takes the value 1 to be:

\[ p = P[Z \le \beta_1 + \beta_2 x] = \Phi(\beta_1 + \beta_2 x) \tag{16.10} \]

– The probit model is said to be nonlinear because Eq. 16.10 is a nonlinear function of β1 and β2



16.1.3 Interpretation of the Probit Model

We can examine the marginal effect of a one-unit change in x on the probability that y = 1 by considering the derivative:

\[ \frac{dp}{dx} = \frac{d\Phi(t)}{dt}\frac{dt}{dx} = \phi(\beta_1 + \beta_2 x)\beta_2 \tag{16.11} \]

where t = β1 + β2x



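Eq. 16.11 has the following implications:

1. Since φ(β1 + β2x) is a probability density function, its value is always positive, so the sign of dp/dx is determined by the sign of β2

2. As x changes, the value of the function φ(β1 + β2x) changes

3. If β1 + β2x is large, then the probability that the individual chooses to drive is very large and close to one, and a change in x has relatively little effect because φ(β1 + β2x) is nearly zero

  • Similarly, if β1 + β2x is a large negative value, the probability is close to zero and a change in x again has little effect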



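We estimate the probability p to be:

\[ \hat{p} = \Phi(\tilde{\beta}_1 + \tilde{\beta}_2 x) \tag{16.12} \]

where β̃1 and β̃2 are estimates of the unknown parameters

– By comparing to a threshold value, like 0.5, we can predict choice using the rule:

\[ \hat{y} = \begin{cases} 1 & \hat{p} > 0.5 \\ 0 & \hat{p} \le 0.5 \end{cases} \]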



16.1.4 Maximum Likelihood Estimation of the Probit Model

The probability function for y is combined with the probit model to obtain:

\[ f(y_i) = [\Phi(\beta_1 + \beta_2 x_i)]^{y_i}\,[1 - \Phi(\beta_1 + \beta_2 x_i)]^{1-y_i}, \quad y_i = 0, 1 \tag{16.13} \]



If we observe three individuals who are independently drawn, then their joint probability function is:

\[ f(y_1, y_2, y_3) = f(y_1)\, f(y_2)\, f(y_3) \]

– The probability of observing y1 = 1, y2 = 1, and y3 = 0 is:

\[ P[y_1 = 1, y_2 = 1, y_3 = 0] = f(1, 1, 0) = f(1)\, f(1)\, f(0) \]



We now have:

\[ P[y_1 = 1, y_2 = 1, y_3 = 0] = \Phi[\beta_1 + \beta_2(15)] \times \Phi[\beta_1 + \beta_2(6)] \times \{1 - \Phi[\beta_1 + \beta_2(7)]\} = L(\beta_1, \beta_2) \tag{16.14} \]

for x1 = 15, x2 = 6, and x3 = 7

– This function, which gives us the probability of observing the sample data, is called the likelihood function
  • The notation L(β1, β2) indicates that the likelihood function is a function of the unknown parameters, β1 and β2



In practice, instead of maximizing Eq. 16.14, we maximize the logarithm of Eq. 16.14, which is called the log-likelihood function:

\[ \begin{aligned} \ln L(\beta_1, \beta_2) &= \ln\big( \Phi[\beta_1 + \beta_2(15)] \times \Phi[\beta_1 + \beta_2(6)] \times \{1 - \Phi[\beta_1 + \beta_2(7)]\} \big) \\ &= \ln \Phi[\beta_1 + \beta_2(15)] + \ln \Phi[\beta_1 + \beta_2(6)] + \ln\{1 - \Phi[\beta_1 + \beta_2(7)]\} \end{aligned} \tag{16.15} \]

– The maximization of the log-likelihood function is easier than the maximization of Eq. 16.14
– The values that maximize the log-likelihood function also maximize the likelihood function
  • They are the maximum likelihood estimates
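The log-likelihood for the three observations can be evaluated directly, which may help make the idea of "searching over β1 and β2" concrete. This is a minimal sketch: the trial parameter values are hypothetical, `probit_loglik` is an illustrative name, and Φ is computed from the error function in Python's standard library.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(beta1, beta2, xs, ys):
    """Sum over observations of ln f(y_i), with f(y_i) as in Eq. 16.13."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = norm_cdf(beta1 + beta2 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

xs = [15, 6, 7]  # x1, x2, x3 from the text
ys = [1, 1, 0]   # observed choices y1, y2, y3 from the text

# Trial parameter values are hypothetical; they are not estimates from the text.
for b1, b2 in [(0.0, 0.0), (-0.5, 0.1), (-1.0, 0.1)]:
    print(b1, b2, probit_loglik(b1, b2, xs, ys))
```

A numerical optimizer would repeat this kind of evaluation, moving toward the parameter values that give the largest log-likelihood.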




A feature of the maximum likelihood estimation procedure is that while its properties in small samples are not known, we can show that in large samples the maximum likelihood estimator is normally distributed, consistent and best, in the sense that no competing estimator has smaller variance






16.1.5 A Transportation Example

Let DTIME = (BUSTIME - AUTOTIME) ÷ 10, which is the commuting time differential in 10-minute increments
– The probit model is:

P(AUTO = 1) = Φ(β1 + β2DTIME)

– The maximum likelihood estimates of the parameters are:

\[ \begin{array}{rcccc} \tilde{\beta}_1 + \tilde{\beta}_2 DTIME_i &=& -0.0644 &+& 0.3000\, DTIME_i \\ (\text{se}) & & (0.3992) & & (0.1029) \end{array} \]



The marginal effect of increasing public transportation time, given that travel via public transportation currently takes 20 minutes longer than auto travel (DTIME = 2), is:

\[ \frac{dp}{dDTIME} = \phi(\tilde{\beta}_1 + \tilde{\beta}_2 DTIME)\tilde{\beta}_2 = \phi(-0.0644 + 0.3000 \times 2)(0.3000) = \phi(0.5355)(0.3000) = 0.3456 \times 0.3000 = 0.1037 \]



If an individual is faced with the situation that it takes 30 minutes longer to take public transportation than to drive to work (DTIME = 3), then the estimated probability that auto transportation will be selected is:

\[ \hat{p} = \Phi(\tilde{\beta}_1 + \tilde{\beta}_2 DTIME) = \Phi(-0.0644 + 0.3000 \times 3) = 0.7983 \]

– Since 0.7983 > 0.5, we “predict” the individual will choose to drive
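The two calculations in this example can be checked with a few lines of code. The sketch below uses only the reported estimates (-0.0644 and 0.3000); the function names are illustrative, and the normal density and distribution functions are written out by hand using the standard library.

```python
import math

def norm_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

B1, B2 = -0.0644, 0.3000  # probit estimates reported in the text

def prob_auto(dtime):
    """Estimated probability of driving, Phi(b1 + b2*DTIME)."""
    return norm_cdf(B1 + B2 * dtime)

def marginal_effect(dtime):
    """Estimated marginal effect, phi(b1 + b2*DTIME) * b2."""
    return norm_pdf(B1 + B2 * dtime) * B2

me_at_2 = marginal_effect(2.0)               # bus takes 20 minutes longer
p_at_3 = prob_auto(3.0)                      # bus takes 30 minutes longer
predicted_choice = 1 if p_at_3 > 0.5 else 0  # 1 = drive, using the 0.5 threshold

print(round(me_at_2, 4), round(p_at_3, 4), predicted_choice)
```

The results should agree with the values in the text up to rounding.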



16.1.6 Further Post-Estimation Analysis

Rather than evaluate the marginal effect at a specific value, or the mean value, the average marginal effect (AME) is often considered:

\[ \widehat{AME} = \frac{1}{N}\sum_{i=1}^{N} \phi(\tilde{\beta}_1 + \tilde{\beta}_2 DTIME_i)\tilde{\beta}_2 \]

– For our problem, the estimated AME is 0.0484
– The sample standard deviation of the individual marginal effects is 0.0365
– Their minimum and maximum values are 0.0025 and 0.1153
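The average marginal effect is just the sample mean of the individual marginal effects. A minimal sketch follows; the DTIME values are hypothetical stand-ins (the text's data set is not reproduced here), so the printed AME will not match 0.0484.

```python
import math

def norm_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

B1, B2 = -0.0644, 0.3000  # probit estimates reported in the text

# Hypothetical DTIME values in 10-minute units; not the sample used in the text.
dtime = [-4.5, -2.0, -0.5, 0.0, 1.0, 2.0, 3.5, 6.0]

# One marginal effect per observation, then their average.
effects = [norm_pdf(B1 + B2 * d) * B2 for d in dtime]
ame = sum(effects) / len(effects)

print("AME:", ame)
print("min and max individual effects:", min(effects), max(effects))
```

Reporting the minimum, maximum, and standard deviation alongside the AME, as the text does, shows how much the effect varies across individuals.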



Consider the marginal effect:

\[ \frac{dp}{dDTIME} = \phi(\beta_1 + \beta_2 DTIME)\beta_2 = g(\beta_1, \beta_2) \]

– The marginal effect function g(β1, β2) is nonlinear in the parameters



16.2 The Logit Model for Binary Choice



Probit model estimation is numerically complicated because it is based on the normal distribution
– A frequently used alternative to the probit model for binary choice situations is the logit model
– These models differ only in the particular S-shaped curve used to constrain probabilities to the [0, 1] interval



If L is a logistic random variable, then its probability density function is:

\[ \lambda(l) = \frac{e^{-l}}{(1 + e^{-l})^2}, \quad -\infty < l < \infty \tag{16.16} \]

– The cumulative distribution function for a logistic random variable is:

\[ \Lambda(l) = p[L \le l] = \frac{1}{1 + e^{-l}} \tag{16.17} \]



The probability p that the observed value y takes the value 1 is:

\[ p = P[L \le \beta_1 + \beta_2 x] = \Lambda(\beta_1 + \beta_2 x) = \frac{1}{1 + e^{-(\beta_1 + \beta_2 x)}} \tag{16.18} \]



The probability that y = 1 is:

\[ p = \frac{1}{1 + e^{-(\beta_1 + \beta_2 x)}} = \frac{\exp(\beta_1 + \beta_2 x)}{1 + \exp(\beta_1 + \beta_2 x)} \]

The probability that y = 0 is:

\[ 1 - p = \frac{1}{1 + \exp(\beta_1 + \beta_2 x)} \]
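The two ways of writing the logit probability are algebraically identical, which a short numerical check can confirm. The parameter values below are hypothetical and the function names are illustrative.

```python
import math

def logit_prob(beta1, beta2, x):
    """P(y = 1) written as 1 / (1 + exp(-(beta1 + beta2*x)))."""
    return 1.0 / (1.0 + math.exp(-(beta1 + beta2 * x)))

def logit_prob_alt(beta1, beta2, x):
    """The same probability written as exp(.) / (1 + exp(.))."""
    e = math.exp(beta1 + beta2 * x)
    return e / (1.0 + e)

def logit_prob_zero(beta1, beta2, x):
    """P(y = 0) = 1 / (1 + exp(beta1 + beta2*x))."""
    return 1.0 / (1.0 + math.exp(beta1 + beta2 * x))

beta1, beta2 = -0.2, 0.5  # hypothetical parameter values

for x in (-4.0, 0.0, 4.0):
    p1 = logit_prob(beta1, beta2, x)
    p0 = logit_prob_zero(beta1, beta2, x)
    print(x, p1, logit_prob_alt(beta1, beta2, x), p1 + p0)
```

For every x the two forms agree and the probabilities of y = 1 and y = 0 sum to one, with p staying inside the [0, 1] interval.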



The shapes of the logistic and normal probability density functions are somewhat different, and maximum likelihood estimates of β1 and β2 will be different
– However, the marginal probabilities and the predicted probabilities differ very little in most cases



16.2.1 An Empirical Example from Marketing

Consider the Coke example with:

\[ COKE = \begin{cases} 1 & \text{if Coke is chosen} \\ 0 & \text{if Pepsi is chosen} \end{cases} \]



Based on “scanner” data on 1,140 individuals who purchased Coke or Pepsi, the probit and logit models for the choice are:

\[ p_{COKE} = E(COKE) = \Phi(\beta_1 + \beta_2 PRATIO + \beta_3 DISP\_COKE + \beta_4 DISP\_PEPSI) \]

\[ p_{COKE} = E(COKE) = \Lambda(\gamma_1 + \gamma_2 PRATIO + \gamma_3 DISP\_COKE + \gamma_4 DISP\_PEPSI) \]




Table 16.1 Coke-Pepsi Choice Models



The parameters and their estimates vary across the models and no direct comparison is very useful, but some rules of thumb exist
– Roughly:

\[ \tilde{\gamma}_{Logit} \cong 4\hat{\beta}_{LPM}, \qquad \tilde{\beta}_{Probit} \cong 2.5\hat{\beta}_{LPM}, \qquad \tilde{\gamma}_{Logit} \cong 1.6\tilde{\beta}_{Probit} \]



16.2.2 Wald Hypothesis Tests

If the null hypothesis is H0: βk = c, then the test statistic using the probit model is:

\[ t = \frac{\tilde{\beta}_k - c}{\operatorname{se}(\tilde{\beta}_k)} \overset{a}{\sim} t_{(N-K)} \]

– The t-test is based on the Wald principle



Using the probit model, consider the two hypotheses:

\[ \text{Hypothesis (1)} \quad H_0: \beta_3 = -\beta_4, \quad H_1: \beta_3 \ne -\beta_4 \]

\[ \text{Hypothesis (2)} \quad H_0: \beta_3 = 0,\ \beta_4 = 0, \quad H_1: \text{either } \beta_3 \text{ or } \beta_4 \text{ is not zero} \]



To test hypothesis (1) in a linear model, we would compute:

\[ t = \frac{\tilde{\beta}_{DISP\_COKE} + \tilde{\beta}_{DISP\_PEPSI}}{\operatorname{se}(\tilde{\beta}_{DISP\_COKE} + \tilde{\beta}_{DISP\_PEPSI})} \overset{a}{\sim} t_{(1136)}, \quad N - K = 1140 - 4 = 1136 \]

– Noting that it is a two-tail hypothesis, we reject the null hypothesis at the α = 0.05 level if t ≥ 1.96 or t ≤ -1.96
– The calculated t-value is t = -2.3247, so we reject the null hypothesis
  • We conclude that the effects of the Coke and Pepsi displays are not of equal magnitude with opposite sign



A generalization of the Wald statistic is used to test the joint null hypothesis (2) that neither the Coke nor Pepsi display affects the probability of choosing Coke



16.2.3 Likelihood Ratio Hypothesis Tests

When using maximum likelihood estimators, such as probit and logit, tests based on the likelihood ratio principle are generally preferred
– The idea is much like the F-test
  • One test component is the log-likelihood function value in the unrestricted, full model (ln LU) evaluated at the maximum likelihood estimates
  • The second ingredient is the log-likelihood function value from the model that is “restricted” by imposing the condition that the null hypothesis is true (ln LR)



The restricted probit model is obtained by imposing the condition β3 = -β4:

\[ \begin{aligned} p_{COKE} = E(COKE) &= \Phi(\beta_1 + \beta_2 PRATIO + \beta_3 DISP\_COKE + \beta_4 DISP\_PEPSI) \\ &= \Phi(\beta_1 + \beta_2 PRATIO - \beta_4 DISP\_COKE + \beta_4 DISP\_PEPSI) \\ &= \Phi(\beta_1 + \beta_2 PRATIO + \beta_4 (DISP\_PEPSI - DISP\_COKE)) \end{aligned} \]

– We have ln LR = -713.6595
– The likelihood ratio test statistic value is:

\[ LR = 2(\ln L_U - \ln L_R) = 2(-710.9486 - (-713.6595)) = 5.4218 \]
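The likelihood ratio arithmetic can be reproduced from the two log-likelihood values given above. The sketch below also computes a p-value, under the assumption (not stated explicitly in this section) that the statistic is compared with a chi-square distribution with one degree of freedom, since hypothesis (1) imposes a single restriction. The helper `chi2_sf_1df` is an illustrative name.

```python
import math

ln_L_U = -710.9486  # unrestricted log-likelihood, from the text
ln_L_R = -713.6595  # restricted log-likelihood, from the text

lr = 2.0 * (ln_L_U - ln_L_R)

def chi2_sf_1df(x):
    """P(chi-square with 1 df > x). For 1 df this equals erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

p_value = chi2_sf_1df(lr)  # assumes a chi-square(1) reference distribution

print("LR statistic:", round(lr, 4))
print("p-value under chi-square(1):", p_value)
```

Under that assumption the p-value is below 0.05, which is consistent with the rejection of hypothesis (1) reached with the t-test.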



To test the null hypothesis (2), use the restricted model E(COKE) = Φ(β1 + β2PRATIO)

– The value of the likelihood ratio test statistic is 19.55




16.3 Multinomial Logit



We are often faced with choices involving more than two alternatives
– These are called multinomial choice situations
  • If you are shopping for a laundry detergent, which one do you choose? Tide, Cheer, Arm & Hammer, Wisk, and so on
  • If you enroll in the business school, will you major in economics, marketing, management, finance, or accounting?



The estimation and interpretation of the models is, in principle, similar to that in logit and probit models
– The models go under the names
  • multinomial logit
  • conditional logit
  • multinomial probit




16.3.1 Multinomial Logit Choice Probabilities

As in the logit and probit models, we will try to explain the probability that the ith person will choose alternative j

\[ p_{ij} = P[\text{individual } i \text{ chooses alternative } j] \]

– Assume J = 3



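For a single explanatory factor, the choice probabilities are:

\[ p_{i1} = \frac{1}{1 + \exp(\beta_{12} + \beta_{22} x_i) + \exp(\beta_{13} + \beta_{23} x_i)}, \quad j = 1 \tag{16.19a} \]

\[ p_{i2} = \frac{\exp(\beta_{12} + \beta_{22} x_i)}{1 + \exp(\beta_{12} + \beta_{22} x_i) + \exp(\beta_{13} + \beta_{23} x_i)}, \quad j = 2 \tag{16.19b} \]

\[ p_{i3} = \frac{\exp(\beta_{13} + \beta_{23} x_i)}{1 + \exp(\beta_{12} + \beta_{22} x_i) + \exp(\beta_{13} + \beta_{23} x_i)}, \quad j = 3 \tag{16.19c} \]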



A distinguishing feature of the multinomial logit model in Eq. 16.19 is that there is a single explanatory variable that describes the individual, not the alternatives facing the individual
– Such variables are called individual specific
– To distinguish the alternatives, we give them different parameter values
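The three choice probabilities in Eq. 16.19 can be computed with a small function. This is a sketch with hypothetical parameter values (no estimates are being reproduced), and `mnl_probs` is an illustrative name. Alternative 1 is the base category, so its parameters are implicitly zero.

```python
import math

def mnl_probs(x, b12, b22, b13, b23):
    """Return (p_i1, p_i2, p_i3) from Eq. 16.19 for one individual with value x."""
    e2 = math.exp(b12 + b22 * x)
    e3 = math.exp(b13 + b23 * x)
    denom = 1.0 + e2 + e3  # the 1 comes from the base alternative, exp(0)
    return 1.0 / denom, e2 / denom, e3 / denom

# Hypothetical parameter values, chosen only for illustration.
params = dict(b12=0.5, b22=-0.3, b13=1.0, b23=-0.6)

for x in (1.0, 5.0, 9.0):
    p1, p2, p3 = mnl_probs(x, **params)
    print(x, round(p1, 4), round(p2, 4), round(p3, 4), p1 + p2 + p3)
```

Because the same individual-specific x enters every alternative, it is the alternative-specific parameters that make the three probabilities differ, and the probabilities always sum to one.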






16.3.2 Maximum Likelihood Estimation

Suppose that we observe three individuals, who choose alternatives 1, 2, and 3, respectively
– Assuming that their choices are independent, then the probability of observing this outcome is:

\[ P(y_{11} = 1, y_{22} = 1, y_{33} = 1) = p_{11} \times p_{22} \times p_{33} \]



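Or

\[ \begin{aligned} P(y_{11} = 1, y_{22} = 1, y_{33} = 1) &= \frac{1}{1 + \exp(\beta_{12} + \beta_{22} x_1) + \exp(\beta_{13} + \beta_{23} x_1)} \\ &\quad \times \frac{\exp(\beta_{12} + \beta_{22} x_2)}{1 + \exp(\beta_{12} + \beta_{22} x_2) + \exp(\beta_{13} + \beta_{23} x_2)} \\ &\quad \times \frac{\exp(\beta_{13} + \beta_{23} x_3)}{1 + \exp(\beta_{12} + \beta_{22} x_3) + \exp(\beta_{13} + \beta_{23} x_3)} \\ &= L(\beta_{12}, \beta_{22}, \beta_{13}, \beta_{23}) \end{aligned} \]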




Maximum likelihood estimation seeks those values of the parameters that maximize the likelihood or, more specifically, the log-likelihood function, which is easier to work with mathematically





16.3.3 Post-Estimation Analysis

For the value of the explanatory variable x0, we can calculate the predicted probabilities of each outcome being selected
– For alternative 1:

\[ \tilde{p}_{01} = \frac{1}{1 + \exp(\tilde{\beta}_{12} + \tilde{\beta}_{22} x_0) + \exp(\tilde{\beta}_{13} + \tilde{\beta}_{23} x_0)} \]

– Similarly for alternatives 2 and 3



The βs are not “slopes”
– The marginal effect is the effect of a change in x, everything else held constant, on the probability that an individual chooses alternative m = 1, 2, or 3:

\[ \left. \frac{\Delta p_{im}}{\Delta x_i} \right|_{\text{all else constant}} = \frac{\partial p_{im}}{\partial x_i} = p_{im}\left[ \beta_{2m} - \sum_{j=1}^{3} \beta_{2j}\, p_{ij} \right] \tag{16.20} \]



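Alternatively, and somewhat more simply, the difference in probabilities can be calculated for two specific values of xi

\[ \begin{aligned} \Delta \tilde{p}_1 = \tilde{p}_{b1} - \tilde{p}_{a1} &= \frac{1}{1 + \exp(\tilde{\beta}_{12} + \tilde{\beta}_{22} x_b) + \exp(\tilde{\beta}_{13} + \tilde{\beta}_{23} x_b)} \\ &\quad - \frac{1}{1 + \exp(\tilde{\beta}_{12} + \tilde{\beta}_{22} x_a) + \exp(\tilde{\beta}_{13} + \tilde{\beta}_{23} x_a)} \end{aligned} \]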



Another useful interpretive device is the probability ratio
– It shows how many times more likely category j is to be chosen relative to the first category

\[ \frac{P(y_i = j)}{P(y_i = 1)} = \frac{p_{ij}}{p_{i1}} = \exp(\beta_{1j} + \beta_{2j} x_i), \quad j = 2, 3 \tag{16.21} \]

– The effect on the probability ratio of changing the value of xi is given by the derivative:

\[ \frac{\partial (p_{ij} / p_{i1})}{\partial x_i} = \beta_{2j} \exp(\beta_{1j} + \beta_{2j} x_i), \quad j = 2, 3 \tag{16.22} \]



An interesting feature of the probability ratio Eq. 16.21 is that it does not depend on how many alternatives there are in total
– There is the implicit assumption in logit models that the probability ratio between any pair of alternatives is independent of irrelevant alternatives (IIA)
  • This is a strong assumption, and if it is violated, multinomial logit may not be a good modeling choice
  • It is especially likely to fail if several alternatives are similar



Tests for the IIA assumption work by dropping one or more of the available options from the choice set and then re-estimating the multinomial model
– If the IIA assumption holds, then the estimates should not change very much
– A statistical comparison of the two sets of estimates, one set from the model with a full set of alternatives, and the other from the model using a reduced set of alternatives, is carried out using a Hausman contrast test proposed by Hausman and McFadden



16.3.4 An Example

Table 16.2 Maximum Likelihood Estimates of PSE Choice




Table 16.3 Effects of Grades on Probability of PSE Choice



16.4 Conditional Logit



Variables like PRICE are individual- and alternative-specific because they vary from individual to individual and are different for each choice the consumer might make
– This type of information is very different from what we assumed was available in the multinomial logit model, where the explanatory variable xi was individual-specific; it did not change across alternatives



16.4.1 Conditional Logit

Consider a model for the probability that individual i chooses alternative j:

\[ p_{ij} = P[\text{individual } i \text{ chooses alternative } j] \]

The conditional logit model specifies these probabilities as:

\[ p_{ij} = \frac{\exp(\beta_{1j} + \beta_2 PRICE_{ij})}{\exp(\beta_{11} + \beta_2 PRICE_{i1}) + \exp(\beta_{12} + \beta_2 PRICE_{i2}) + \exp(\beta_{13} + \beta_2 PRICE_{i3})} \tag{16.23} \]



Set β13 = 0 as a normalization, since not all of the alternative-specific intercepts can be estimated

– Estimation of the unknown parameters is by maximum likelihood
  • Suppose that we observe three individuals, who choose alternatives one, two, and three, respectively






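We have:

\[ \begin{aligned} P(y_{11} = 1, y_{22} = 1, y_{33} = 1) &= p_{11} \times p_{22} \times p_{33} \\ &= \frac{\exp(\beta_{11} + \beta_2 PRICE_{11})}{\exp(\beta_{11} + \beta_2 PRICE_{11}) + \exp(\beta_{12} + \beta_2 PRICE_{12}) + \exp(\beta_2 PRICE_{13})} \\ &\quad \times \frac{\exp(\beta_{12} + \beta_2 PRICE_{22})}{\exp(\beta_{11} + \beta_2 PRICE_{21}) + \exp(\beta_{12} + \beta_2 PRICE_{22}) + \exp(\beta_2 PRICE_{23})} \\ &\quad \times \frac{\exp(\beta_2 PRICE_{33})}{\exp(\beta_{11} + \beta_2 PRICE_{31}) + \exp(\beta_{12} + \beta_2 PRICE_{32}) + \exp(\beta_2 PRICE_{33})} \\ &= L(\beta_{11}, \beta_{12}, \beta_2) \end{aligned} \]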



The own price effect is:

\[ \frac{\partial p_{ij}}{\partial PRICE_{ij}} = p_{ij}(1 - p_{ij})\beta_2 \tag{16.24} \]

The change in probability of alternative j being selected if the price of alternative k changes (k ≠ j) is:

\[ \frac{\partial p_{ij}}{\partial PRICE_{ik}} = -p_{ij}\, p_{ik}\, \beta_2 \tag{16.25} \]



An important feature of the conditional logit model is that the probability ratio between alternatives j and k is:

\[ \frac{p_{ij}}{p_{ik}} = \frac{\exp(\beta_{1j} + \beta_2 PRICE_{ij})}{\exp(\beta_{1k} + \beta_2 PRICE_{ik})} = \exp[(\beta_{1j} - \beta_{1k}) + \beta_2 (PRICE_{ij} - PRICE_{ik})] \]

– The probability ratio depends on the difference in prices, but not on the prices themselves
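The conditional logit formulas above can be checked numerically. The sketch below uses hypothetical parameter values and prices (none of these numbers are estimates from the text), with β13 = 0 as the normalization. It compares the own-price and cross-price effects in Eq. 16.24 and Eq. 16.25 with finite-difference approximations, and confirms that shifting every price by the same amount leaves the probabilities unchanged.

```python
import math

def clogit_probs(prices, b11, b12, b2):
    """Choice probabilities for three alternatives (Eq. 16.23) with beta_13 = 0."""
    intercepts = (b11, b12, 0.0)
    v = [math.exp(a + b2 * p) for a, p in zip(intercepts, prices)]
    total = sum(v)
    return [vi / total for vi in v]

b11, b12, b2 = 0.3, 0.1, -2.0  # hypothetical parameter values

base = clogit_probs([1.00, 1.25, 1.10], b11, b12, b2)
shifted = clogit_probs([1.50, 1.75, 1.60], b11, b12, b2)  # every price raised by 0.50

# Finite-difference check of the price effects for alternative 1's price.
h = 1e-6
bumped = clogit_probs([1.00 + h, 1.25, 1.10], b11, b12, b2)
numeric_own = (bumped[0] - base[0]) / h
analytic_own = base[0] * (1.0 - base[0]) * b2    # Eq. 16.24
numeric_cross = (bumped[1] - base[1]) / h
analytic_cross = -base[1] * base[0] * b2         # Eq. 16.25

print(base, shifted)
print(numeric_own, analytic_own)
print(numeric_cross, analytic_cross)
```

With a negative price coefficient, the own-price effect is negative and the cross-price effect is positive, as the signs in the two equations imply.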




Table 16.4a Conditional Logit Parameter Estimates




Table 16.4b Marginal Effect of Price on Probability of Pepsi Choice



Models that do not require the IIA assumption have been developed, but they are difficult to estimate
– These include the multinomial probit model, which is based on the normal distribution, and the nested logit and mixed logit models



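16.4.3 An Example

The predicted probability of a Pepsi purchase, given that the price of Pepsi is $1.00, the price of 7-Up is $1.25 and the price of Coke is $1.10 is:

\[ \hat{p}_{i1} = \frac{\exp(\tilde{\beta}_{11} + \tilde{\beta}_2 \times 1.00)}{\exp(\tilde{\beta}_{11} + \tilde{\beta}_2 \times 1.00) + \exp(\tilde{\beta}_{12} + \tilde{\beta}_2 \times 1.25) + \exp(\tilde{\beta}_2 \times 1.10)} = 0.4832 \]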



The standard error for this predicted probability is 0.0154, which is computed via “the delta method.”
– If we raise the price of Pepsi to $1.10, we estimate that the probability of its purchase falls to 0.4263 (se = 0.0135)
– If the price of Pepsi stays at $1.00 but we increase the price of Coke by 15 cents, then we estimate that the probability of a consumer selecting Pepsi rises by 0.0445 (se = 0.0033)
– These numbers indicate to us the responsiveness of brand choice to changes in prices, much like elasticities



16.5 Ordered Choice Models



The choice options in multinomial and conditional logit models have no natural ordering or arrangement
– However, in some cases choices are ordered in a specific way



Examples:

1. Results of opinion surveys in which responses can be strongly disagree, disagree, neutral, agree or strongly agree

2. Assignment of grades or work performance ratings

3. Standard and Poor’s rates bonds as AAA, AA, A, BBB and so on

4. Levels of employment are unemployed, part-time, or full-time



When modeling these types of outcomes, numerical values are assigned to the outcomes, but the numerical values are ordinal, and reflect only the ranking of the outcomes
– In the first example, we might assign a dependent variable y the values:

\[ y = \begin{cases} 1 & \text{strongly disagree} \\ 2 & \text{disagree} \\ 3 & \text{neutral} \\ 4 & \text{agree} \\ 5 & \text{strongly agree} \end{cases} \]



There may be a natural ordering to college choice
– We might rank the possibilities as:

\[ y = \begin{cases} 3 & \text{4-year college (the full college experience)} \\ 2 & \text{2-year college (a partial college experience)} \\ 1 & \text{no college} \end{cases} \tag{16.26} \]

– The usual linear regression model is not appropriate for such data, because in regression we would treat the y values as having some numerical meaning when they do not



16.5.1 Ordered Probit

When faced with a ranking problem, we develop a “sentiment” about how we feel concerning the alternative choices, and the higher the sentiment, the more likely a higher-ranked alternative will be chosen
– This sentiment is, of course, unobservable to the econometrician
– Unobservable variables that enter decisions are called latent variables




For college choice, the latent variable y* may depend on grades:

\[ y_i^* = \beta\, GRADES_i + e_i \]

– This model is not a regression model, because the dependent variable is unobservable
  • Consequently it is sometimes called an index model






FIGURE 16.2 Ordinal choices relative to thresholds



We can now specify:

\[ y = \begin{cases} 3\ (\text{4-year college}) & \text{if } y_i^* > \mu_2 \\ 2\ (\text{2-year college}) & \text{if } \mu_1 < y_i^* \le \mu_2 \\ 1\ (\text{no college}) & \text{if } y_i^* \le \mu_1 \end{cases} \]



If we assume that the errors have the standard normal distribution, N(0, 1), an assumption that defines the ordered probit model, then we can calculate the following:

\[ \begin{aligned} P(y_i = 1) &= P(y_i^* \le \mu_1) = P(\beta\, GRADES_i + e_i \le \mu_1) \\ &= P(e_i \le \mu_1 - \beta\, GRADES_i) \\ &= \Phi(\mu_1 - \beta\, GRADES_i) \end{aligned} \]



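Also:

\[ \begin{aligned} P(y_i = 2) &= P(\mu_1 < y_i^* \le \mu_2) = P(\mu_1 < \beta\, GRADES_i + e_i \le \mu_2) \\ &= P(\mu_1 - \beta\, GRADES_i < e_i \le \mu_2 - \beta\, GRADES_i) \\ &= \Phi(\mu_2 - \beta\, GRADES_i) - \Phi(\mu_1 - \beta\, GRADES_i) \end{aligned} \]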



Finally:

\[ \begin{aligned} P(y_i = 3) &= P(y_i^* > \mu_2) = P(\beta\, GRADES_i + e_i > \mu_2) \\ &= P(e_i > \mu_2 - \beta\, GRADES_i) \\ &= 1 - \Phi(\mu_2 - \beta\, GRADES_i) \end{aligned} \]
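The three ordered probit probabilities derived above can be collected in one function. This is a sketch only: the values of β, μ1, and μ2 below are hypothetical round numbers, not the estimates in Table 16.5, and `ordered_probit_probs` is an illustrative name.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(grades, beta, mu1, mu2):
    """Return (P(y=1), P(y=2), P(y=3)) for thresholds mu1 < mu2."""
    lower = norm_cdf(mu1 - beta * grades)
    upper = norm_cdf(mu2 - beta * grades)
    return lower, upper - lower, 1.0 - upper

# Hypothetical values for illustration; GRADES is scaled so that lower is better.
beta, mu1, mu2 = -0.3, -2.9, -2.1

for grades in (2.5, 4.5):
    p1, p2, p3 = ordered_probit_probs(grades, beta, mu1, mu2)
    print(grades, round(p1, 4), round(p2, 4), round(p3, 4), p1 + p2 + p3)
```

With a negative β and a scale on which lower GRADES values are better, the probability of the highest category falls as GRADES rises, while the three probabilities always sum to one.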



16.5.2 Estimation and Interpretation

If we observe a random sample of N = 3 individuals, with the first not going to college (y1 = 1), the second attending a two-year college (y2 = 2), and the third attending a four-year college (y3 = 3), then the likelihood function is:

\[ L(\beta, \mu_1, \mu_2) = P(y_1 = 1) \times P(y_2 = 2) \times P(y_3 = 3) \]



Econometric software includes options for both ordered probit, which depends on the errors being standard normal, and ordered logit, which depends on the assumption that the random errors follow a logistic distribution
– Most economists will use the normality assumption
– Many other social scientists use the logistic
– There is little difference between the results




The types of questions we can answer with this model are the following:

1. What is the probability that a high school graduate with GRADES = 2.5 (on a 13-point scale, with one being the highest) will attend a two-year college?

\[ \hat{P}(y = 2 \mid GRADES = 2.5) = \Phi(\tilde{\mu}_2 - \tilde{\beta} \times 2.5) - \Phi(\tilde{\mu}_1 - \tilde{\beta} \times 2.5) \]



The types of questions (Continued):

2. What is the difference in probability of attending a four-year college for two students, one with GRADES = 2.5 and another with GRADES = 4.5?

\[ \hat{P}(y = 3 \mid GRADES = 4.5) - \hat{P}(y = 3 \mid GRADES = 2.5) \]



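The types of questions (Continued):

3. If we treat GRADES as a continuous variable, what is the marginal effect on the probability of each outcome, given a one-unit change in GRADES?

\[ \frac{\partial P(y = 1)}{\partial GRADES} = -\phi(\mu_1 - \beta\, GRADES)\, \beta \]

\[ \frac{\partial P(y = 2)}{\partial GRADES} = [\phi(\mu_1 - \beta\, GRADES) - \phi(\mu_2 - \beta\, GRADES)]\, \beta \]

\[ \frac{\partial P(y = 3)}{\partial GRADES} = \phi(\mu_2 - \beta\, GRADES)\, \beta \]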



16.5.3 An Example

Table 16.5 Ordered Probit Parameter Estimates



16.6 Models for Count Data



When the dependent variable in a regression model is a count of the number of occurrences of an event, the outcome variable is y = 0, 1, 2, 3, …
– These numbers are actual counts, and thus different from ordinal numbers



Examples:
– The number of trips to a physician a person makes during a year
– The number of fishing trips taken by a person during the previous year
– The number of children in a household
– The number of automobile accidents at a particular intersection during a month
– The number of televisions in a household
– The number of alcoholic drinks a college student takes in a week



The probability distribution we use as a foundation is the Poisson, not the normal or the logistic
– If Y is a Poisson random variable, then its probability function is:

\[ f(y) = P(Y = y) = \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots \tag{16.27} \]

where

\[ y! = y \times (y - 1) \times (y - 2) \times \cdots \times 1 \]



In a regression model, we try to explain the behavior of E(Y) as a function of some explanatory variables
– We do the same here, keeping the value of E(Y) ≥ 0 by defining:

\[ E(Y) = \lambda = \exp(\beta_1 + \beta_2 x) \tag{16.28} \]

– This choice defines the Poisson regression model for count data



Suppose we randomly select N = 3 individuals from a population and observe that their counts are y1 = 0, y2 = 2, and y3 = 2, indicating 0, 2, and 2 occurrences of the event for these three individuals
– The likelihood function is the joint probability function of the observed data:

\[ L(\beta_1, \beta_2) = P(Y = 0) \times P(Y = 2) \times P(Y = 2) \]



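The log-likelihood function is:

\[ \ln L(\beta_1, \beta_2) = \ln P(Y = 0) + \ln P(Y = 2) + \ln P(Y = 2) \]

– Using Eq. 16.28 for λ, the log of the probability function is:

\[ \begin{aligned} \ln[P(Y = y)] &= \ln\left[ \frac{e^{-\lambda} \lambda^{y}}{y!} \right] \\ &= -\lambda + y \ln(\lambda) - \ln(y!) \\ &= -\exp(\beta_1 + \beta_2 x) + y(\beta_1 + \beta_2 x) - \ln(y!) \end{aligned} \]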



Then the log-likelihood function, given a sample of N observations, becomes:

\[ \ln L(\beta_1, \beta_2) = \sum_{i=1}^{N} \left\{ -\exp(\beta_1 + \beta_2 x_i) + y_i(\beta_1 + \beta_2 x_i) - \ln(y_i!) \right\} \]
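The Poisson log-likelihood can be coded directly from the summation above. In this sketch the counts are the ones from the three-person illustration, while the x values and the trial parameter values are hypothetical; ln(y!) is computed as `math.lgamma(y + 1)`, and the result is compared with the sum of the logged probabilities from Eq. 16.27.

```python
import math

def poisson_loglik(beta1, beta2, xs, ys):
    """Sum of -exp(b1 + b2*x_i) + y_i*(b1 + b2*x_i) - ln(y_i!)."""
    total = 0.0
    for x, y in zip(xs, ys):
        index = beta1 + beta2 * x
        total += -math.exp(index) + y * index - math.lgamma(y + 1)  # ln(y!) = lgamma(y + 1)
    return total

def poisson_pmf(y, lam):
    """Poisson probability function, Eq. 16.27."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

ys = [0, 2, 2]            # counts from the three-person illustration
xs = [1.0, 2.0, 3.0]      # hypothetical explanatory variable values
beta1, beta2 = 0.1, 0.2   # hypothetical trial parameter values

loglik = poisson_loglik(beta1, beta2, xs, ys)
direct = sum(math.log(poisson_pmf(y, math.exp(beta1 + beta2 * x))) for x, y in zip(xs, ys))

print(loglik, direct)
```

The two numbers agree, which confirms that the simplified expression is the log of the product of Poisson probabilities.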



16.6.2 Interpretation of the Poisson Regression Model

Prediction of the conditional mean of y is straightforward:

\[ \widehat{E(y_0)} = \tilde{\lambda}_0 = \exp(\tilde{\beta}_1 + \tilde{\beta}_2 x_0) \]

The probability of a particular number of occurrences can be estimated by inserting the estimated conditional mean into the probability function, as:

\[ \widehat{\Pr}(Y = y) = \frac{\exp(-\tilde{\lambda}_0)\, \tilde{\lambda}_0^{\,y}}{y!}, \quad y = 0, 1, 2, \ldots \]



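The marginal effect is:

\[ \frac{\partial E(y_i)}{\partial x_i} = \lambda_i \beta_2 \tag{16.29} \]

– This can be expressed as a percentage, which can be useful:

\[ \frac{\% \Delta E(y_i)}{\Delta x_i} = 100\, \frac{\partial E(y_i) / E(y_i)}{\partial x_i} = 100 \beta_2 \% \]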



Suppose the conditional mean function contains an indicator variable. How do we calculate its effect?

– If E(yi) = λi = exp(β1 + β2xi + δDi), we can examine the conditional expectation when D = 0 and when D = 1

\[ E(y_i \mid D_i = 0) = \exp(\beta_1 + \beta_2 x_i) \]

\[ E(y_i \mid D_i = 1) = \exp(\beta_1 + \beta_2 x_i + \delta) \]



The percentage change in the conditional mean is:

\[ 100 \left[ \frac{\exp(\beta_1 + \beta_2 x_i + \delta) - \exp(\beta_1 + \beta_2 x_i)}{\exp(\beta_1 + \beta_2 x_i)} \right] \% = 100 \left[ e^{\delta} - 1 \right] \% \]
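The simplification to 100(e^δ - 1)% means the percentage effect of the indicator does not depend on x. A short sketch with hypothetical parameter values illustrates this; `cond_mean` is an illustrative name.

```python
import math

def cond_mean(x, d, beta1, beta2, delta):
    """Poisson conditional mean exp(beta1 + beta2*x + delta*D)."""
    return math.exp(beta1 + beta2 * x + delta * d)

beta1, beta2, delta = 0.5, 0.1, 0.25  # hypothetical parameter values

shortcut = 100.0 * (math.exp(delta) - 1.0)
pct_changes = []
for x in (0.0, 2.0, 10.0):
    m0 = cond_mean(x, 0, beta1, beta2, delta)
    m1 = cond_mean(x, 1, beta1, beta2, delta)
    pct_changes.append(100.0 * (m1 - m0) / m0)

print(pct_changes, shortcut)
```

Every entry of `pct_changes` matches the shortcut formula, even though the level of the conditional mean changes considerably with x.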



16.6.3 An Example

Table 16.6 Poisson Regression Estimates



16.7 Limited Dependent Variables



When a model has a discrete dependent variable, the usual regression methods we have studied must be modified
– Now we present another case in which standard least squares estimation of a regression model fails



16.7.1 Censored Data

FIGURE 16.3 Histogram of wife’s hours of work in 1975



This is an example of censored data, meaning that a substantial fraction of the observations on the dependent variable take a limit value, which is zero in the case of market hours worked by married women



We previously showed the probability density functions for the dependent variable y, at different x-values, centered on the regression function

\[ E(y \mid x) = \beta_1 + \beta_2 x \tag{16.30} \]

– This leads to sample data being scattered along the regression function

– Least squares regression works by fitting a line through the center of a data scatter, and in this case such a strategy works fine, because the true regression function also fits through the middle of the data scatter



For our new problem, when a substantial number of observations have dependent variable values taking the limit value of zero, the regression function E(y|x) is no longer given by Eq. 16.30
– Instead E(y|x) is a complicated nonlinear function of the regression parameters β1 and β2, the error variance σ², and x
– The least squares estimators of the regression parameters obtained by running a regression of y on x are biased and inconsistent, so least squares estimation fails



16.7.2 A Monte Carlo Experiment

In this example we give the parameters the specific values β1 = -9 and β2 = 1

– The observed sample is obtained within the framework of an index or latent variable model:

\[ y_i^* = \beta_1 + \beta_2 x_i + e_i = -9 + x_i + e_i \tag{16.31} \]

– We assume:

\[ e_i \sim N(0, \sigma^2 = 16) \]

\[ y_i = \begin{cases} 0 & \text{if } y_i^* \le 0 \\ y_i^* & \text{if } y_i^* > 0 \end{cases} \]




FIGURE 16.4 Uncensored sample data and regression function



In Figure 16.5 we show the estimated regression function for the 200 observed y-values, which is given by:

$$\begin{array}{lcc} \hat{y} = & -2.1477 \; + & 0.5161\,x \\ (\text{se}) & (0.3706) & (0.0326) \end{array} \qquad \text{(Eq. 16.32a)}$$

– If we restrict our sample to include only the 100 positive y-values, the fitted regression is:

$$\begin{array}{lcc} \hat{y} = & -3.1399 \; + & 0.6388\,x \\ (\text{se}) & (1.2055) & (0.0827) \end{array} \qquad \text{(Eq. 16.32b)}$$
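The two fits in Eq. 16.32a and Eq. 16.32b can be imitated with a small simulation. The sketch below is only an illustration: it assumes x is drawn uniformly on (0, 20), which this section does not state, and the helper names `simulate_censored` and `ols` are made up for the example. It draws from the latent model of Eq. 16.31, censors at zero, and runs least squares first on all observations and then on the positive ones only.

```python
import random

def simulate_censored(n=200, beta1=-9.0, beta2=1.0, sigma=4.0, seed=12345):
    """Draw one sample from the latent model and censor it at zero."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(0.0, 20.0)          # ASSUMPTION: x ~ Uniform(0, 20)
        y_star = beta1 + beta2 * x + rng.gauss(0.0, sigma)
        xs.append(x)
        ys.append(max(0.0, y_star))         # y = 0 if y* <= 0, else y = y*
    return xs, ys

def ols(xs, ys):
    """Least squares for a simple regression; returns (intercept, slope)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2

xs, ys = simulate_censored()

# Regression on all observations, including the zeros
b1_all, b2_all = ols(xs, ys)

# Regression on the positive observations only
pos = [(x, y) for x, y in zip(xs, ys) if y > 0]
b1_pos, b2_pos = ols([x for x, _ in pos], [y for _, y in pos])

print(f"all observations : intercept {b1_all:.4f}, slope {b2_all:.4f}")
print(f"positive y only  : intercept {b1_pos:.4f}, slope {b2_pos:.4f}")
```

Under these assumptions both slopes should fall well below the true value β2 = 1, which is the pattern seen in Eq. 16.32a and Eq. 16.32b. The exact numbers depend on the seed and on the assumed distribution of x.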




FIGURE 16.5 Censored sample data, and latent regression function and least squares fitted line



We can compute the average value of the estimates, which is the Monte Carlo “expected value”:

$$E_{MC}(b_k) = \frac{1}{NSAM} \sum_{m=1}^{NSAM} b_{k(m)} \qquad \text{(Eq. 16.33)}$$

where b_k(m) is the estimate of βk in the mth Monte Carlo sample
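Eq. 16.33 is an average of the estimates over repeated samples. A minimal sketch of that calculation follows. It assumes x ~ Uniform(0, 20) and uses 200 samples of 200 observations each; these settings and the function names are illustrative choices, not values taken from this section.

```python
import random

def ols(xs, ys):
    """Least squares for a simple regression; returns (intercept, slope)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2

def mc_expected_values(nsam=200, n=200, beta1=-9.0, beta2=1.0, sigma=4.0, seed=2024):
    """Average the least squares estimates over nsam censored samples (Eq. 16.33)."""
    rng = random.Random(seed)
    sum_b1 = 0.0
    sum_b2 = 0.0
    for _ in range(nsam):
        xs = [rng.uniform(0.0, 20.0) for _ in range(n)]   # ASSUMPTION on x
        ys = [max(0.0, beta1 + beta2 * x + rng.gauss(0.0, sigma)) for x in xs]
        b1, b2 = ols(xs, ys)
        sum_b1 += b1
        sum_b2 += b2
    return sum_b1 / nsam, sum_b2 / nsam

e_b1, e_b2 = mc_expected_values()
print(f"E_MC(b1) = {e_b1:.4f}, E_MC(b2) = {e_b2:.4f}")
```

If least squares were unbiased here, the two averages would be close to −9 and 1. With censored data they should stay far from those values however many samples are drawn.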



If the dependent variable is censored, having a lower limit and/or an upper limit, then the least squares estimators of the regression parameters are biased and inconsistent

– We can apply an alternative estimation procedure, which is called Tobit




Tobit is a maximum likelihood procedure that recognizes that we have data of two sorts:

1. The limit observations (y = 0)

2. The nonlimit observations (y > 0)

– The two types of observations that we observe, the limit observations and those that are positive, are generated by the latent variable y* crossing the zero threshold or not crossing that threshold




The (probit) probability that y = 0 is:

$$P(y_i = 0) = P(y_i^* \le 0) = 1 - \Phi\left[\frac{\beta_1 + \beta_2 x_i}{\sigma}\right]$$
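As a quick numerical illustration, the probability of a limit observation can be evaluated with the Monte Carlo parameter values β1 = −9, β2 = 1 and σ = 4. The function name `prob_limit` and the x-values are chosen only for this sketch.

```python
from statistics import NormalDist

def prob_limit(x, beta1=-9.0, beta2=1.0, sigma=4.0):
    """P(y = 0 | x) = 1 - Phi((beta1 + beta2 * x) / sigma)."""
    return 1.0 - NormalDist().cdf((beta1 + beta2 * x) / sigma)

# Illustrative x-values; the probability of observing zero falls as x rises
for x in (0, 5, 9, 15, 20):
    print(f"x = {x:2d}: P(y = 0) = {prob_limit(x):.4f}")
```

At x = 9 the latent mean is exactly zero, so the probability of a limit observation is one half.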



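The full likelihood function is the product of the probabilities that the limit observations occur times the probability density functions for all the positive, nonlimit, observations:

$$L(\beta_1, \beta_2, \sigma) = \prod_{y_i = 0} \left\{ 1 - \Phi\left( \frac{\beta_1 + \beta_2 x_i}{\sigma} \right) \right\} \times \prod_{y_i > 0} \left\{ \left( 2\pi\sigma^2 \right)^{-1/2} \exp\left( -\frac{1}{2\sigma^2} \left( y_i - \beta_1 - \beta_2 x_i \right)^2 \right) \right\}$$

– The maximum likelihood estimator is consistent and asymptotically normal, with a known covariance matrix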
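The likelihood above is usually maximized in its logarithmic form. The following sketch codes the log-likelihood directly and evaluates it on artificial data; it does not perform the maximization, which in practice would be left to a numerical optimizer. The function name, the seed, and the assumption x ~ Uniform(0, 20) are illustrative.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def tobit_loglik(beta1, beta2, sigma, xs, ys):
    """Log of the Tobit likelihood for data censored from below at zero."""
    ll = 0.0
    for x, y in zip(xs, ys):
        mu = beta1 + beta2 * x
        if y > 0:
            # Nonlimit observation: log of the normal density of y
            ll += -0.5 * math.log(2.0 * math.pi * sigma ** 2)
            ll -= (y - mu) ** 2 / (2.0 * sigma ** 2)
        else:
            # Limit observation: log P(y* <= 0) = log[1 - Phi(mu / sigma)]
            ll += math.log(N01.cdf(-mu / sigma))
    return ll

# Artificial data from the latent model of Eq. 16.31 (x distribution is an ASSUMPTION)
rng = random.Random(7)
xs = [rng.uniform(0.0, 20.0) for _ in range(200)]
ys = [max(0.0, -9.0 + x + rng.gauss(0.0, 4.0)) for x in xs]

ll_true = tobit_loglik(-9.0, 1.0, 4.0, xs, ys)
ll_ols = tobit_loglik(-2.1477, 0.5161, 4.0, xs, ys)  # coefficients of Eq. 16.32a

print(f"log-likelihood at the true parameters : {ll_true:.2f}")
print(f"log-likelihood at the Eq. 16.32a values: {ll_ols:.2f}")
```

The log-likelihood should be clearly higher at the true parameter values than at the least squares coefficients, which is why maximizing it pulls the estimates back toward β1 = −9 and β2 = 1.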



For artificial data, we estimate:

$$\begin{array}{lcc} \tilde{y}_i = & -10.2773 \; + & 1.0487\,x_i \\ (\text{se}) & (1.0970) & (0.0790) \end{array} \qquad \text{(Eq. 16.34)}$$





Table 16.7 Censored Data Monte Carlo Results



16.7.4 Tobit Model Interpretation

In the Tobit model the parameters β1 and β2 are the intercept and slope of the latent variable model Eq. 16.31

– In practice we are interested in the marginal effect of a change in x on either the regression function of the observed data E(y|x) or the regression function conditional on y > 0, E(y|x, y > 0)



The slope of E(y|x) is:

$$\frac{\partial E(y \mid x)}{\partial x} = \beta_2\, \Phi\left( \frac{\beta_1 + \beta_2 x}{\sigma} \right) \qquad \text{(Eq. 16.35)}$$
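Eq. 16.35 can be checked numerically. The sketch below relies on the standard closed form for the mean of a normal variable censored at zero, E(y|x) = Φ(z)μ + σφ(z) with μ = β1 + β2x and z = μ/σ. That expression is not given in this section and is brought in here as an outside result; the function names are illustrative.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def expected_y(x, beta1, beta2, sigma):
    """E(y|x) when y* ~ N(mu, sigma^2) is censored from below at zero."""
    mu = beta1 + beta2 * x
    z = mu / sigma
    return N01.cdf(z) * mu + sigma * N01.pdf(z)

def tobit_slope(x, beta1, beta2, sigma):
    """Marginal effect from Eq. 16.35: beta2 * Phi((beta1 + beta2*x)/sigma)."""
    return beta2 * N01.cdf((beta1 + beta2 * x) / sigma)

beta1, beta2, sigma = -9.0, 1.0, 4.0   # Monte Carlo values from Eq. 16.31
h = 1e-5                               # step for the central finite difference
for x in (2.0, 9.0, 16.0):
    numeric = (expected_y(x + h, beta1, beta2, sigma)
               - expected_y(x - h, beta1, beta2, sigma)) / (2.0 * h)
    analytic = tobit_slope(x, beta1, beta2, sigma)
    print(f"x = {x:4.1f}: analytic {analytic:.6f}, finite difference {numeric:.6f}")
```

The scale factor Φ(·) lies between 0 and 1, so the marginal effect on observed y is always smaller in magnitude than the latent slope β2, and it approaches β2 only where censoring is rare.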



The marginal effect can be decomposed into two factors, called the “McDonald-Moffitt” decomposition:

$$\frac{\partial E(y \mid x)}{\partial x} = \text{Prob}(y > 0)\, \frac{\partial E(y \mid x, y > 0)}{\partial x} + E(y \mid x, y > 0)\, \frac{\partial\, \text{Prob}(y > 0)}{\partial x}$$

– The first factor accounts for the marginal effect of a change in x for the portion of the population whose y-data is observed already

– The second factor accounts for changes in the proportion of the population who switch from the y-unobserved category to the y-observed category when x changes
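The two terms of the decomposition can be computed separately. The sketch below uses standard truncated-normal results that are not stated in this section: E(y|x, y > 0) = μ + σλ(z) with λ = φ/Φ, and dλ/dz = −λ(z + λ). These, together with the function name, are assumptions of the example.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def mcdonald_moffitt(x, beta1, beta2, sigma):
    """Return the two terms of the decomposition of dE(y|x)/dx."""
    mu = beta1 + beta2 * x
    z = mu / sigma
    cdf, pdf = N01.cdf(z), N01.pdf(z)
    lam = pdf / cdf                                    # inverse Mills ratio at z
    prob_pos = cdf                                     # Prob(y > 0)
    mean_pos = mu + sigma * lam                        # E(y | x, y > 0)
    d_mean_pos = beta2 * (1.0 - z * lam - lam ** 2)    # dE(y | x, y > 0)/dx
    d_prob_pos = beta2 * pdf / sigma                   # dProb(y > 0)/dx
    first = prob_pos * d_mean_pos
    second = mean_pos * d_prob_pos
    return first, second

beta1, beta2, sigma = -9.0, 1.0, 4.0   # Monte Carlo values from Eq. 16.31
for x in (5.0, 9.0, 15.0):
    first, second = mcdonald_moffitt(x, beta1, beta2, sigma)
    total = beta2 * N01.cdf((beta1 + beta2 * x) / sigma)
    print(f"x = {x:4.1f}: first {first:.4f} + second {second:.4f} = {first + second:.4f}"
          f" (Eq. 16.35 gives {total:.4f})")
```

The two terms should add up to the slope in Eq. 16.35 at every x, which is a useful check on any implementation of the decomposition.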




FIGURE 16.6 Censored sample data, and regression functions for observed and positive y-values



16.7.5 An Example

Consider the regression model:

$$HOURS = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 AGE + \beta_5 KIDSL6 + e \qquad \text{(Eq. 16.36)}$$




Table 16.8 Estimates of Labor Supply Function



The calculated scale factor is:

$$\tilde{\Phi} = 0.3638$$

– The marginal effect on observed hours of work of another year of education is:

$$\frac{\partial E(HOURS)}{\partial EDUC} = \tilde{\beta}_2 \tilde{\Phi} = 73.29 \times 0.3638 = 26.34$$

• Another year of education will increase a wife’s hours of work by about 26 hours, conditional upon the assumed values of the explanatory variables



16.7.6 Sample Selection

If the data are obtained by random sampling, then classic regression methods, such as least squares, work well

– However, if the data are obtained by a sampling procedure that is not random, then standard procedures do not work well

– Economists regularly face such data problems



If we wish to study the determinants of the wages of married women, we face a sample selection problem

– We only observe data on market wages when the woman chooses to enter the workforce

– If we observe only the working women, then our sample is not a random sample

• The data we observe are “selected” by a systematic process for which we do not account




A solution to this problem is a technique called Heckit

– This procedure uses two estimation steps:

1. A probit model is first estimated explaining why a woman is in the labor force or not

– This is the selection equation

2. A least squares regression is estimated relating the wage of a working woman to education, experience, and so on, and a variable called the “inverse Mills ratio,” or IMR

– The IMR is created from the first step probit estimation and accounts for the fact that the observed sample of working women is not random




16.7.6a The Econometric Model

The selection equation

– It is expressed in terms of a latent variable z_i* that depends on one or more explanatory variables w_i, and is given by:

$$z_i^* = \gamma_1 + \gamma_2 w_i + u_i, \quad i = 1, \ldots, N \qquad \text{(Eq. 16.37)}$$

– The latent variable is not observed, but we do observe the indicator variable:

$$z_i = \begin{cases} 1 & z_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \text{(Eq. 16.38)}$$



The second equation is the linear model of interest:

$$y_i = \beta_1 + \beta_2 x_i + e_i, \quad i = 1, \ldots, n, \quad N > n \qquad \text{(Eq. 16.39)}$$

– A selectivity problem arises when y_i is observed only when z_i = 1 and if the errors of the two equations are correlated

• In such a situation the usual least squares estimators of β1 and β2 are biased and inconsistent



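Consistent estimators are based on the conditional regression function:

$$E(y_i \mid z_i^* > 0) = \beta_1 + \beta_2 x_i + \beta_\lambda \lambda_i, \quad i = 1, \ldots, n \qquad \text{(Eq. 16.40)}$$

where the additional variable λ_i is the “inverse Mills ratio”:

$$\lambda_i = \frac{\phi(\gamma_1 + \gamma_2 w_i)}{\Phi(\gamma_1 + \gamma_2 w_i)} \qquad \text{(Eq. 16.41)}$$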
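The inverse Mills ratio in Eq. 16.41 is simple to compute once the index γ1 + γ2w_i is known. A minimal sketch, with an illustrative function name and illustrative index values:

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def inverse_mills_ratio(index):
    """lambda = phi(index) / Phi(index), where index = gamma1 + gamma2 * w."""
    return N01.pdf(index) / N01.cdf(index)

# The ratio is large when selection is unlikely and shrinks toward zero
# as the probability of being selected approaches one.
for index in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"index = {index:5.1f}: P(z = 1) = {N01.cdf(index):.4f}, "
          f"IMR = {inverse_mills_ratio(index):.4f}")
```

Observations that were unlikely to be selected receive a large value of λ_i, so the term β_λλ_i in Eq. 16.40 adjusts most strongly for exactly those observations.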



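The parameters γ1 and γ2 can be estimated using a probit model, based on the observed binary outcome z_i, so that the estimated IMR is:

$$\tilde{\lambda}_i = \frac{\phi(\tilde{\gamma}_1 + \tilde{\gamma}_2 w_i)}{\Phi(\tilde{\gamma}_1 + \tilde{\gamma}_2 w_i)}$$

– The estimating equation is therefore:

$$y_i = \beta_1 + \beta_2 x_i + \beta_\lambda \tilde{\lambda}_i + v_i, \quad i = 1, \ldots, n \qquad \text{(Eq. 16.42)}$$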



16.7.6b Heckit Example: Wages of Married Women

An estimated model is:

$$\begin{array}{lccc} \ln(WAGE) = & -0.4002 \; + & 0.1095\,EDUC \; + & 0.0157\,EXPER \\ (t) & (-2.10) & (7.73) & (3.90) \end{array} \qquad R^2 = 0.1484 \qquad \text{(Eq. 16.43)}$$

The Heckit procedure starts by estimating a probit model:

$$\begin{array}{lcccc} P(LFP = 1) = \Phi(1.1923 & - \; 0.0206\,AGE & + \; 0.0838\,EDUC & - \; 0.3139\,KIDS & - \; 1.3939\,MTR) \\ (t) & (-2.93) & (3.61) & (-2.54) & (-2.26) \end{array}$$

The inverse Mills ratio is:

$$\widetilde{IMR} = \frac{\phi(1.1923 - 0.0206\,AGE + 0.0838\,EDUC - 0.3139\,KIDS - 1.3939\,MTR)}{\Phi(1.1923 - 0.0206\,AGE + 0.0838\,EDUC - 0.3139\,KIDS - 1.3939\,MTR)}$$



The final combined model is:

$$\begin{array}{lcccc} \ln(WAGE) = & 0.8105 \; + & 0.0585\,EDUC \; + & 0.0163\,EXPER \; - & 0.8664\,IMR \\ (t) & (1.64) & (2.45) & (4.08) & (-2.65) \\ (t\text{-adj}) & (1.33) & (1.97) & (3.88) & (-2.17) \end{array} \qquad \text{(Eq. 16.44)}$$
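To see how the two steps fit together, the sketch below takes the estimated probit index, converts it to an inverse Mills ratio, and plugs that into the combined equation Eq. 16.44. The coefficient signs follow the signs of the reported t-values. The values chosen for AGE, EDUC, KIDS, MTR and EXPER describe a hypothetical woman and are not taken from the data; the function names are also illustrative.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def probit_index(age, educ, kids, mtr):
    """Estimated index of the first-step probit for labor force participation."""
    return 1.1923 - 0.0206 * age + 0.0838 * educ - 0.3139 * kids - 1.3939 * mtr

def estimated_imr(age, educ, kids, mtr):
    """Estimated inverse Mills ratio: phi(index) / Phi(index)."""
    idx = probit_index(age, educ, kids, mtr)
    return N01.pdf(idx) / N01.cdf(idx)

def ln_wage_heckit(educ, exper, imr):
    """Fitted ln(WAGE) from the combined equation, Eq. 16.44."""
    return 0.8105 + 0.0585 * educ + 0.0163 * exper - 0.8664 * imr

# HYPOTHETICAL values, chosen only to illustrate the arithmetic
age, educ, kids, mtr, exper = 40, 12, 1, 0.7, 10

idx = probit_index(age, educ, kids, mtr)
imr = estimated_imr(age, educ, kids, mtr)
lnw = ln_wage_heckit(educ, exper, imr)

print(f"probit index   = {idx:.4f}")
print(f"P(LFP = 1)     = {N01.cdf(idx):.4f}")
print(f"IMR            = {imr:.4f}")
print(f"fitted ln(WAGE) = {lnw:.4f}")
```

Because the coefficient on IMR is negative, a woman with a lower predicted probability of participation, and hence a larger IMR, has a lower fitted log wage for the same education and experience.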



In most instances it is preferable to estimate the full model, both the selection equation and the equation of interest, jointly by maximum likelihood

– The maximum likelihood estimated wage equation is:

$$\begin{array}{lccc} \ln(WAGE) = & 0.6686 \; + & 0.0658\,EDUC \; + & 0.0118\,EXPER \\ (t) & (2.84) & (3.96) & (2.87) \end{array}$$

– The standard errors based on the full information maximum likelihood procedure are smaller than those yielded by the two-step estimation method




Key Words




binary choice models

censored data

conditional logit

count data models

feasible generalized least squares

Heckit

identification problem

independence of irrelevant alternatives (IIA)

index models

individual and alternative specific variables

individual specific variables

latent variables

likelihood function

limited dependent variables

linear probability model

logistic random variable

logit

log-likelihood function

marginal effect

maximum likelihood estimation

multinomial choice models

multinomial logit




odds ratio

ordered choice models

ordered probit

ordinal variables

Poisson random variable

Poisson regression model

probit

selection bias

Tobit model

truncated data




Appendices



16A Probit Marginal Effects: Details

16A.1 Standard Error of Marginal Effect at a Given Point

Consider the probit model p = Φ(β1 + β2x)

– The marginal effect at x = x0 is:

$$\left. \frac{dp}{dx} \right|_{x = x_0} = \phi(\beta_1 + \beta_2 x_0)\, \beta_2 = g(\beta_1, \beta_2)$$

– The estimator of the marginal effect, based on maximum likelihood, is:

$$g(\tilde{\beta}_1, \tilde{\beta}_2)$$



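The variance is:

$$\text{var}\left[ g(\tilde{\beta}_1, \tilde{\beta}_2) \right] \cong \left[ \frac{\partial g(\beta_1, \beta_2)}{\partial \beta_1} \right]^2 \text{var}(\tilde{\beta}_1) + \left[ \frac{\partial g(\beta_1, \beta_2)}{\partial \beta_2} \right]^2 \text{var}(\tilde{\beta}_2) + 2 \left[ \frac{\partial g(\beta_1, \beta_2)}{\partial \beta_1} \right] \left[ \frac{\partial g(\beta_1, \beta_2)}{\partial \beta_2} \right] \text{cov}(\tilde{\beta}_1, \tilde{\beta}_2) \qquad \text{(Eq. 16A.1)}$$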



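To implement the delta method we require the derivative:

$$\begin{aligned} \frac{\partial g(\beta_1, \beta_2)}{\partial \beta_1} &= \frac{\partial \left[ \phi(\beta_1 + \beta_2 x_0)\, \beta_2 \right]}{\partial \beta_1} \\ &= \frac{\partial \phi(\beta_1 + \beta_2 x_0)}{\partial \beta_1} \times \beta_2 + \phi(\beta_1 + \beta_2 x_0) \times \frac{\partial \beta_2}{\partial \beta_1} \\ &= -\phi(\beta_1 + \beta_2 x_0) \times (\beta_1 + \beta_2 x_0) \times \beta_2 \end{aligned}$$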



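To obtain the final result, we used ∂β2/∂β1 = 0 and:

$$\begin{aligned} \frac{\partial \phi(\beta_1 + \beta_2 x_0)}{\partial \beta_1} &= \frac{\partial}{\partial \beta_1} \left[ \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} (\beta_1 + \beta_2 x_0)^2} \right] \\ &= \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} (\beta_1 + \beta_2 x_0)^2} \times \left( -\frac{1}{2} \times 2\, (\beta_1 + \beta_2 x_0) \right) \\ &= -\phi(\beta_1 + \beta_2 x_0) \times (\beta_1 + \beta_2 x_0) \end{aligned}$$

Similar steps give the other key derivative:

$$\frac{\partial g(\beta_1, \beta_2)}{\partial \beta_2} = \phi(\beta_1 + \beta_2 x_0) \left[ 1 - (\beta_1 + \beta_2 x_0) \times \beta_2 x_0 \right]$$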



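Using the transportation data, we get:

$$\begin{bmatrix} \widehat{\text{var}}(\tilde{\beta}_1) & \widehat{\text{cov}}(\tilde{\beta}_1, \tilde{\beta}_2) \\ \widehat{\text{cov}}(\tilde{\beta}_1, \tilde{\beta}_2) & \widehat{\text{var}}(\tilde{\beta}_2) \end{bmatrix} = \begin{bmatrix} 0.1593956 & 0.0003261 \\ 0.0003261 & 0.0105817 \end{bmatrix}$$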



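For DTIME = 2 (x0 = 2), the calculated values of the derivatives are:

$$\frac{\partial g(\tilde{\beta}_1, \tilde{\beta}_2)}{\partial \beta_1} = -0.055531 \quad \text{and} \quad \frac{\partial g(\tilde{\beta}_1, \tilde{\beta}_2)}{\partial \beta_2} = 0.2345835$$

The estimated variance and standard error of the marginal effect are:

$$\widehat{\text{var}}\left[ g(\tilde{\beta}_1, \tilde{\beta}_2) \right] = 0.0010653 \quad \text{and} \quad \text{se}\left[ g(\tilde{\beta}_1, \tilde{\beta}_2) \right] = 0.0326394$$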
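The variance and standard error reported for DTIME = 2 can be reproduced from Eq. 16A.1 using the two derivatives and the covariance matrix given for the transportation data. In this sketch the derivative with respect to β1 is entered as negative, which is the sign implied by the formula −φ(·)(·)β2 and the one that reproduces the reported variance. The function name is illustrative.

```python
import math

def delta_method_variance(grad, cov):
    """Eq. 16A.1 for a function of two parameters.

    grad: (dg/dbeta1, dg/dbeta2)
    cov:  ((var1, cov12), (cov12, var2))
    """
    d1, d2 = grad
    var1, cov12 = cov[0]
    _, var2 = cov[1]
    return d1 ** 2 * var1 + d2 ** 2 * var2 + 2.0 * d1 * d2 * cov12

# Values reported for the transportation data at DTIME = 2
grad = (-0.055531, 0.2345835)
cov = ((0.1593956, 0.0003261),
       (0.0003261, 0.0105817))

var_me = delta_method_variance(grad, cov)
se_me = math.sqrt(var_me)

print(f"var = {var_me:.7f}")
print(f"se  = {se_me:.7f}")
```

The result should agree with the reported 0.0010653 and 0.0326394 up to rounding of the inputs.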



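16A.2 Standard Error of Average Marginal Effect

The average marginal effect of this continuous variable is:

$$AME = \frac{1}{N} \sum_{i=1}^{N} \phi(\beta_1 + \beta_2 DTIME_i)\, \beta_2 = g_2(\beta_1, \beta_2)$$

We require the derivatives:

$$\begin{aligned} \frac{\partial g_2(\beta_1, \beta_2)}{\partial \beta_1} &= \frac{\partial}{\partial \beta_1} \left[ \frac{1}{N} \sum_{i=1}^{N} \phi(\beta_1 + \beta_2 DTIME_i)\, \beta_2 \right] \\ &= \frac{1}{N} \sum_{i=1}^{N} \frac{\partial}{\partial \beta_1} \left[ \phi(\beta_1 + \beta_2 DTIME_i)\, \beta_2 \right] \\ &= \frac{1}{N} \sum_{i=1}^{N} \frac{\partial g(\beta_1, \beta_2)}{\partial \beta_1} \end{aligned}$$

where each term in the last sum is the derivative from Section 16A.1 evaluated at x0 = DTIME_i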



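Similarly:

$$\begin{aligned} \frac{\partial g_2(\beta_1, \beta_2)}{\partial \beta_2} &= \frac{\partial}{\partial \beta_2} \left[ \frac{1}{N} \sum_{i=1}^{N} \phi(\beta_1 + \beta_2 DTIME_i)\, \beta_2 \right] \\ &= \frac{1}{N} \sum_{i=1}^{N} \frac{\partial}{\partial \beta_2} \left[ \phi(\beta_1 + \beta_2 DTIME_i)\, \beta_2 \right] \\ &= \frac{1}{N} \sum_{i=1}^{N} \frac{\partial g(\beta_1, \beta_2)}{\partial \beta_2} \end{aligned}$$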
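The derivatives of the average marginal effect are averages of the point derivatives from Section 16A.1, one per observation. The sketch below codes that averaging and checks it against finite differences. The coefficient values and the DTIME values are hypothetical, chosen only so the example runs, and the function names are illustrative.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def ame(b1, b2, xs):
    """Average marginal effect: mean of phi(b1 + b2*x_i) * b2."""
    return sum(N01.pdf(b1 + b2 * x) * b2 for x in xs) / len(xs)

def ame_gradient(b1, b2, xs):
    """Derivatives of the AME with respect to b1 and b2 (averages of point derivatives)."""
    n = len(xs)
    d1 = sum(-N01.pdf(b1 + b2 * x) * (b1 + b2 * x) * b2 for x in xs) / n
    d2 = sum(N01.pdf(b1 + b2 * x) * (1.0 - (b1 + b2 * x) * b2 * x) for x in xs) / n
    return d1, d2

# HYPOTHETICAL coefficients and data, for illustration only
b1, b2 = -0.05, 0.3
dtime = [-5.0, -2.0, 0.5, 1.0, 3.0, 6.0]

d1, d2 = ame_gradient(b1, b2, dtime)

# Central finite differences as an independent check
h = 1e-6
d1_fd = (ame(b1 + h, b2, dtime) - ame(b1 - h, b2, dtime)) / (2.0 * h)
d2_fd = (ame(b1, b2 + h, dtime) - ame(b1, b2 - h, dtime)) / (2.0 * h)

print(f"AME = {ame(b1, b2, dtime):.6f}")
print(f"dAME/db1: analytic {d1:.6f}, finite difference {d1_fd:.6f}")
print(f"dAME/db2: analytic {d2:.6f}, finite difference {d2_fd:.6f}")
```

The resulting pair of derivatives can then be combined with the estimated covariance matrix, using the same form as Eq. 16A.1, to obtain the variance of the average marginal effect.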



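For the transportation data:

$$\frac{\partial g_2(\tilde{\beta}_1, \tilde{\beta}_2)}{\partial \beta_1} = -0.00185 \quad \text{and} \quad \frac{\partial g_2(\tilde{\beta}_1, \tilde{\beta}_2)}{\partial \beta_2} = -0.032366$$

The estimated variance and standard error of the average marginal effect are:

$$\widehat{\text{var}}\left[ g_2(\tilde{\beta}_1, \tilde{\beta}_2) \right] = 0.0000117 \quad \text{and} \quad \text{se}\left[ g_2(\tilde{\beta}_1, \tilde{\beta}_2) \right] = 0.003416$$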
