dummy variable classification with two categories example: the cost of running a school depends on...


Upload: blaze-mccarthy

Post on 18-Dec-2015


DUMMY VARIABLE CLASSIFICATION WITH TWO CATEGORIES

Example: the cost of running a school depends on the number of pupils, but it also depends on whether the school is an occupational school.

Dummy variables always have two values, 0 or 1. If OCC is equal to 0, the cost function becomes that for regular schools. If OCC is equal to 1, the cost function becomes that for occupational schools.

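[Figure: COST plotted against N. Two parallel lines: regular schools with intercept $\beta_1$ and occupational schools with intercept $\beta_1 + \delta$.]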

Combined equation: $COST = \beta_1 + \delta\,OCC + \beta_2 N + u$

OCC = 0, regular school: $COST = \beta_1 + \beta_2 N + u$

OCC = 1, occupational school: $COST = \beta_1 + \delta + \beta_2 N + u$
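As a rough illustration of how the intercept dummy works, the sketch below generates noise-free costs from the combined equation and recovers the coefficients by least squares. The parameter values, the six schools, and the names `beta1`, `delta`, and `beta2` (intercept, coefficient on OCC, slope on N) are all made up for the example; they are not taken from the slides.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
beta1, delta, beta2 = 50_000.0, 90_000.0, 300.0

# Six made-up schools: enrolment N and the occupational dummy OCC.
N = np.array([200.0, 400.0, 800.0, 300.0, 600.0, 1000.0])
OCC = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Noise-free costs generated from the combined equation (u = 0).
COST = beta1 + delta * OCC + beta2 * N

# Design matrix: constant, dummy, enrolment.
X = np.column_stack([np.ones_like(N), OCC, N])
coef, *_ = np.linalg.lstsq(X, COST, rcond=None)
b1, d, b2 = coef

print(f"regular intercept:      {b1:.1f}")
print(f"occupational intercept: {b1 + d:.1f}")
print(f"common slope:           {b2:.1f}")
```

The two school types share the slope on N; the dummy only shifts the intercept.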

DUMMY CLASSIFICATION WITH MORE THAN TWO CATEGORIES

Now the qualitative variable has four categories. The standard procedure is to choose one category as the reference category and to define dummy variables for each of the others.

Note: you must leave out the reference category. Otherwise the dummy variables sum to 1 for every observation, duplicating the constant term, and the model is perfectly collinear!

$COST = \beta_1 + \delta_T TECH + \delta_W WORKER + \delta_V VOC + \beta_2 N + u$

General school (TECH = WORKER = VOC = 0): $COST = \beta_1 + \beta_2 N + u$

Technical school (TECH = 1; WORKER = VOC = 0): $COST = (\beta_1 + \delta_T) + \beta_2 N + u$

Skilled workers’ school (WORKER = 1; TECH = VOC = 0): $COST = (\beta_1 + \delta_W) + \beta_2 N + u$

Vocational school (VOC = 1; TECH = WORKER = 0): $COST = (\beta_1 + \delta_V) + \beta_2 N + u$
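The collinearity warning can be checked numerically. The sketch below builds a design matrix with and without a dummy for the reference category and compares column counts with matrix ranks. The eight schools, the enrolment figures, and the extra dummy name `GEN` are hypothetical.

```python
import numpy as np

# Hypothetical school types and enrolments for eight schools.
types = np.array(["general", "technical", "worker", "vocational",
                  "general", "technical", "worker", "vocational"])
N = np.array([250.0, 400.0, 320.0, 610.0, 900.0, 150.0, 700.0, 480.0])

const = np.ones(len(types))
TECH = (types == "technical").astype(float)
WORKER = (types == "worker").astype(float)
VOC = (types == "vocational").astype(float)
GEN = (types == "general").astype(float)  # dummy for the reference category

# Correct specification: the reference category has no dummy.
X_ok = np.column_stack([const, TECH, WORKER, VOC, N])
# Dummy variable trap: GEN + TECH + WORKER + VOC equals the constant column.
X_trap = np.column_stack([const, GEN, TECH, WORKER, VOC, N])

rank_ok = np.linalg.matrix_rank(X_ok)
rank_trap = np.linalg.matrix_rank(X_trap)

print("columns / rank without GEN:", X_ok.shape[1], rank_ok)
print("columns / rank with GEN:   ", X_trap.shape[1], rank_trap)
```

When the rank is lower than the number of columns, the coefficients are not separately identified, which is what perfect collinearity means in practice.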

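[Figure: COST plotted against N. Four parallel lines: general schools (intercept $\beta_1$), technical schools ($\beta_1 + \delta_T$), skilled workers’ schools ($\beta_1 + \delta_W$), and vocational schools ($\beta_1 + \delta_V$).]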

The diagram illustrates the model graphically. The coefficients are the extra overhead costs of running technical, skilled workers’, and vocational schools, relative to the overhead cost of general schools.


We chose general academic schools as the reference (omitted) category and defined dummy variables for the other categories. This means that we can only compare other schools to general schools, and not to each other.


However, suppose that we were interested in testing whether the overhead costs of skilled workers’ schools were different from those of the other types of school. How could we do this? It is simplest to re-run the regression making skilled workers’ schools the reference category.
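A short derivation shows what the re-run regression estimates. It is a sketch under assumed notation: GEN is a hypothetical dummy equal to 1 for general schools, primes mark the parameters of the new specification, and $\delta_T$, $\delta_W$, $\delta_V$ denote the coefficients on TECH, WORKER, and VOC in the original specification.

```latex
\begin{align*}
COST &= \beta_1' + \delta_G'\,GEN + \delta_T'\,TECH + \delta_V'\,VOC + \beta_2 N + u \\
\beta_1' &= \beta_1 + \delta_W \\
\delta_G' &= -\delta_W, \qquad
\delta_T' = \delta_T - \delta_W, \qquad
\delta_V' = \delta_V - \delta_W
\end{align*}
```

The fitted cost for every school is unchanged. Only the baseline moves, so the t statistics on the primed coefficients test each school type directly against skilled workers’ schools.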


TWO SETS OF DUMMY VARIABLES

The explanatory variables in a regression model may include multiple sets of dummy variables. Now you need to think about every combination, and the reference category is the one in which all dummy variables are zero.

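$COST = \beta_1 + \delta\,OCC + \varepsilon\,RES + \beta_2 N + u$

Regular, nonresidential (OCC = RES = 0): $COST = \beta_1 + \beta_2 N + u$

Regular, residential (OCC = 0; RES = 1): $COST = (\beta_1 + \varepsilon) + \beta_2 N + u$

Occupational, nonresidential (OCC = 1; RES = 0): $COST = (\beta_1 + \delta) + \beta_2 N + u$

Occupational, residential (OCC = RES = 1): $COST = (\beta_1 + \delta + \varepsilon) + \beta_2 N + u$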


In the case of a non-residential occupational school, RES is 0 and OCC is 1, so the overhead cost increases by $\delta$, the coefficient on OCC. If the school is both occupational and residential, it increases by $(\delta + \varepsilon)$, where $\varepsilon$ is the coefficient on RES.


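[Figure: COST plotted against N. Four parallel lines: regular nonresidential (intercept $\beta_1$), regular residential ($\beta_1 + \varepsilon$), occupational nonresidential ($\beta_1 + \delta$), and occupational residential ($\beta_1 + \delta + \varepsilon$).]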

The diagram illustrates the model graphically. Note that the effects of the different components of the model are assumed to be separate and additive in this specification. In particular, we are assuming that the extra overhead cost of a residential school is the same for regular and occupational schools: there is no interaction effect.
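The additivity assumption can be made explicit by differencing intercepts. In this sketch $\delta$ denotes the coefficient on OCC and $\varepsilon$ the coefficient on RES; the symbols are assumed here for illustration.

```latex
\begin{align*}
\text{Residential premium, regular schools:} \quad
  & (\beta_1 + \varepsilon) - \beta_1 = \varepsilon \\
\text{Residential premium, occupational schools:} \quad
  & (\beta_1 + \delta + \varepsilon) - (\beta_1 + \delta) = \varepsilon
\end{align*}
```

Both differences equal $\varepsilon$, which is exactly the restriction that an interaction term would relax.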

SLOPE DUMMY VARIABLES


The specification of the model incorporates the assumption that the marginal cost per student is the same for occupational and regular schools. Hence the cost functions have the same slope: the same coefficient on N. This is a restriction we have placed on the model.

[Figure: scatter diagram of COST (vertical axis, -100,000 to 700,000) against N (horizontal axis, 0 to 1,400), with occupational schools and regular schools marked separately.]

This is not a realistic assumption. Occupational schools incur expenditure on training materials that is related to the number of students. Also, the staff-student ratio has to be higher in occupational schools.


Looking at the scatter diagram, you can see that the cost function for the occupational schools should be steeper, and that for the regular schools should be flatter. The two lines should have different slopes.


We will relax the assumption of the same marginal cost by introducing what is known as a slope dummy variable. This is NOCC, defined as the product of N and OCC.

For example, in the case of an occupational school, OCC is equal to 1 and NOCC is equal to N. The equation simplifies as shown.

$COST = \beta_1 + \delta\,OCC + \beta_2 N + \lambda\,NOCC + u$

Regular school (OCC = NOCC = 0): $COST = \beta_1 + \beta_2 N + u$

Occupational school (OCC = 1; NOCC = N): $COST = (\beta_1 + \delta) + (\beta_2 + \lambda) N + u$
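The slope dummy can be illustrated the same way as an intercept dummy. The sketch below builds NOCC as the product of N and OCC, generates noise-free costs, and recovers a separate intercept and slope for each school type. All numbers are hypothetical, as are the names `beta1`, `delta`, `beta2`, and `lam` (intercept, coefficient on OCC, slope on N, coefficient on NOCC).

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
beta1, delta, beta2, lam = 50_000.0, -30_000.0, 150.0, 280.0

# Eight made-up schools: four regular, four occupational.
N = np.array([200.0, 500.0, 900.0, 1200.0, 300.0, 600.0, 800.0, 1100.0])
OCC = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
NOCC = N * OCC  # slope dummy: equals N for occupational schools, 0 otherwise

# Noise-free costs from the model with both dummies (u = 0).
COST = beta1 + delta * OCC + beta2 * N + lam * NOCC

# Design matrix: constant, intercept dummy, enrolment, slope dummy.
X = np.column_stack([np.ones_like(N), OCC, N, NOCC])
coef, *_ = np.linalg.lstsq(X, COST, rcond=None)
b1, d, b2, l = coef

regular = (b1, b2)               # intercept and slope for regular schools
occupational = (b1 + d, b2 + l)  # intercept and slope for occupational schools

print("regular:      intercept %.1f, slope %.1f" % regular)
print("occupational: intercept %.1f, slope %.1f" % occupational)
```

The coefficient on NOCC is the difference between the two marginal costs per student, so a t test on it is a test of equal slopes.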

[Figure: COST plotted against N. Regular schools: line with intercept $\beta_1$. Occupational schools: steeper line with intercept $\beta_1 + \delta$.]

The diagram illustrates the model graphically.


INTERACTING DUMMY VARIABLES

If we interact dummy variables, we get new dummy variables, but we must interpret carefully. The reference category is obtained by setting all dummies equal to zero. Then write down the earnings function for each subgroup separately to make the effects of various coefficients clear.

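$LGEARN = \beta_1 + \beta_2 S + \delta_F F + \delta_W W + \delta_{FW} FW + u$

Black male (F = W = 0): $LGEARN = \beta_1 + \beta_2 S + u$

White male (F = 0; W = 1): $LGEARN = \beta_1 + \beta_2 S + \delta_W + u$

Black female (F = 1; W = 0): $LGEARN = \beta_1 + \beta_2 S + \delta_F + u$

White female (F = W = 1): $LGEARN = \beta_1 + \beta_2 S + \delta_F + \delta_W + \delta_{FW} + u$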
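Writing out each subgroup can also be done mechanically. The sketch below tabulates the intercept for every combination of F and W; the coefficient values and the names `d_F`, `d_W`, and `d_FW` (coefficients on F, W, and the interaction FW) are hypothetical, chosen only to show the bookkeeping.

```python
from itertools import product

# Hypothetical coefficient values, chosen only for illustration.
beta1 = 1.0    # intercept for the reference category (F = W = 0)
d_F = -0.30    # coefficient on F
d_W = 0.10     # coefficient on W
d_FW = 0.05    # coefficient on the interaction FW


def intercept(F: int, W: int) -> float:
    """Intercept of the earnings function for the subgroup (F, W)."""
    return beta1 + d_F * F + d_W * W + d_FW * F * W


for F, W in product((0, 1), repeat=2):
    print(f"F={F}, W={W}: intercept = {intercept(F, W):.2f}")

# The effect of F depends on W once the interaction is included.
gap_when_W0 = intercept(1, 0) - intercept(0, 0)  # equals d_F
gap_when_W1 = intercept(1, 1) - intercept(0, 1)  # equals d_F + d_FW
```

The two gaps differ by the interaction coefficient, which is why the coefficient on F alone cannot be read as the effect of F for every subgroup.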

Copyright Christopher Dougherty 2000–2006. This slideshow may be freely copied for personal use.

24.06.06