
Working Paper – Master's Thesis

Instituto de Economía

www.economia.puc.cl

One for the Road - Estimating the Drunk-Driving Externality in Chile

William Mullins.

2004

Economics M.Sc. Thesis in Public Policy – William Mullins – 2nd Semester 2004

Drink-driving is a classic negative externality. Nonetheless, it has failed to attract

economic attention in Chile. This study estimates the relative risk of drunk drivers in causing

serious accidents1, and the aggregate externality generated by drunk-driving in Chile.

As an epidemiological phenomenon drunk-driving warrants attention: between the ages of 10 and

45 it is the joint second highest ranking cause of death for Chileans. Conaset puts the number of

accidents caused by drink-driving (DD) between 2001 and 2003 at 8,137, in which 472 people

died and 2,240 were seriously injured. However, such estimates lack a clear methodological

grounding, confusing the simple presence of alcohol with a causal role. This study aims to

separate alcohol's causal effect from the baseline serious accident risk faced by all drivers.

The methodology used in this study is taken from Stephen Levitt and Jack Porter's 2001 JPE

article “How Dangerous are Drinking Drivers?” They find that drinking drivers in the US

(including those not legally classified as drunk) are at least 7 times more likely to cause a fatal

crash than sober drivers (θ ≥ 7), while for legally drunk drivers θ ≥ 13. A lower bound for this

relative risk and an upper bound for the proportion of drunk drivers on the roads

are the parameters estimated in this paper, allowing an approximate calculation of the aggregate

externality caused by DD in Chile.

Levitt and Porter estimate the lower bound of the aggregate US DD externality associated with

lost lives (no other costs are considered) to be around USD 9 billion in 1993. Chile's only

approximately comparable estimate is from a study commissioned by the Ministry of Public

Works (MOP) from the consulting firm CITRA in 1996. They estimate that the total annual cost

of all road accidents in Chile is around USD 600-700 million, or between 7 and 8 percent of the US

external cost estimate for drink driving alone. This number is used within government as the sole

basis for public investment proposals, thus according it a policy importance far beyond that of

most studies. This paper aims to provide a more rigorous estimate of the external costs that will

allow some perspective as to the magnitude estimated by CITRA.

1 Serious accidents are defined as accidents that result in at least one death or serious injury.

The study begins with a review of the theoretical issues that bear on drunk-driving, and follows

with a review of the evidence on alcohol and crash risk. Sections 4, 5 and 6 detail methodology,

data and results respectively. Section 7 considers which deaths and serious injuries are rightly

classed as externalities, and calculates the aggregate spillover. Section 8 concludes.



2 – Economic Theory and Drunk Driving

General Considerations

The accident literature in the US and West European countries often prefaces its remarks on

alcohol with comments such as “alcohol consumption is involved in x% of fatal crashes” and

conveys the impression that alcohol causes all accidents it is “involved” in. However, without an

estimate of the number of drunk drivers on the roads, this figure is meaningless – if the same

percentage (x%) of drivers have been drinking then alcohol is no more a crash risk factor than

orange juice. This tendency to demonize alcohol in terms of its crash causation must first be laid

aside if we are to consider objectively the external cost of drink driving in Chile.
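The orange-juice point can be made concrete with a small calculation. The sketch below is illustrative only: the function name `relative_risk` and the input shares are hypothetical and are not figures from this study. It combines the share of crash-involved drivers who had been drinking with the share of drinking drivers on the road to give the implied relative risk.

```python
def relative_risk(crash_share: float, road_share: float) -> float:
    """Crash risk of drinking drivers relative to sober drivers.

    crash_share: fraction of crash-involved drivers who had been drinking.
    road_share:  fraction of all drivers on the road who had been drinking.
    Both inputs are hypothetical illustration values, not study data.
    """
    if not (0 < crash_share < 1 and 0 < road_share < 1):
        raise ValueError("shares must lie strictly between 0 and 1")
    # Crashes per driver in each group, up to a common scale factor.
    drinking_rate = crash_share / road_share
    sober_rate = (1 - crash_share) / (1 - road_share)
    return drinking_rate / sober_rate


# Same share in crashes and on the road: alcohol adds no risk.
print(relative_risk(0.30, 0.30))
# Same crash share, but drinkers are rare on the road: large relative risk.
print(relative_risk(0.30, 0.05))
```

With identical shares the ratio is one, which is why an "involved in x% of crashes" figure says nothing on its own about causation.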

Moreover, drinking is only one risk factor among many. As Borkenstein et al. note in their

seminal 1974 study:

“traffic accidents are the result of interactions among drivers, vehicles and the physical environment. No

single cause of traffic accident exists. It is not possible to consider a separate element of the accident

complex in the abstract. These elements operate only in the context of the remaining elements.”2

Speeding is an example of one such “remaining element.” It also increases the relative risk of

crashing, and to an extent comparable to drink driving: “driving 65 mph when the speed limit is

55 mph increases risk of involvement in a fatal crash by a factor of 2.0, similar to the risk

increase associated with driving with BAC = 0.08% compared to driving at BAC = 0.”3

Moreover, drivers below the US legal limit of 0.08% are also extremely dangerous, making up

70% of drivers with a measured BAC in the 2002 US Fatal Accident Reporting System (FARS)

data. In the 2000-2004 Chilean data, approximately 78% of drivers involved in accidents

resulting in serious injuries or deaths are recorded as being sober. While this is an over-estimate,

as will be discussed, most accidents are not caused by alcohol.4 What is also true, however, is that

alcohol, and perhaps speeding – uniquely among crash risk factors – are perceived by lawmakers

to be particularly reckless ways to endanger the lives of others, and are directly chosen by the

drivers involved. As a result, the law assigns property rights to sober, non-speeding drivers. Other

significant risk factors such as sex and age also increase relative serious crash risk, with young

men unsurprisingly emerging as the highest risk group: Levitt and Porter report that sober drivers

under 25 years old pose a fatal crash risk 2.78 times greater than sober drivers over 25, while the

comparable sober male-female relative risk is 1.36.

2 Borkenstein et al. 1974, p17

3 L. Evans, Ch 10 (2004). The WHO report (2004) cites similar figures (Ch 3 p77)

4 L. Evans (2004) notes that even if alcohol miraculously disappeared from the roads, 66% of US fatalities in 2002 would remain.

Public policy cannot, of course, focus on removing male drivers from the roads. It focuses instead

on reducing drink driving and speeding. Moreover, alcohol and speeding dwarf other risk factors

in terms of the magnitude of the increase in relative risk they provoke. At the Chilean legal DWI

(Driving While Intoxicated) limit (BAC 0.1%) a driver has a relative crash risk of 4.79, while at

BAC 0.2% it is approximately 82 times that of a sober driver.5

A model of the consumption of risky goods

Thus it should be clear that alcohol is not the sole cause of the devastation often caused by traffic

accidents. Driving is a dangerous activity per se, in the same way that extreme sports are

dangerous activities: they increase the risk of death.

5 Compton et al. 2002, p42

A simple model, developed in Rosen (1981), formalizes how agents determine their optimal

consumption of risky goods (those that increase risk of death) and beneficial goods (reduce risk

of death). Define the probability of surviving a single period as q, and utility conditional on

survival as U(C1,…Cn) for the n available consumption goods. If we consider that consumption

of certain goods (such as drunk driving) can affect survival probability we can write q =

q(a1C1,… anCn), where a1…an are non-negative constants. For a good whose consumption

reduces survival probability the partial derivative qi is negative. If we assume a budget constraint

of $Y = \sum_j p_j C_j$ and maximize expected utility (q(..)U(..)) we obtain the following optimality

condition6:

$$\frac{U_i}{U_n} = \frac{p_i}{p_n} - a_i q_i V; \qquad V = \text{value of a statistical life}$$

The relevant point here is that the rational consumer 'self-regulates', in Rosen's words. If good i

is drunk driving (assumed to reduce survival probability by increasing crash risk) then qi is

negative, making the entire second term positive. This indicates that the ratio of marginal

consumption utilities must be higher than in the case where consumption of good i does not affect

risk. In short, this model illustrates the fact that a rational agent takes into account all risks to

himself: he consumes less of good i given its negative health effects. What the model omits is the

fact that when this agent crashes while “consuming” drunk driving, the risk is borne in part by his

passengers and himself, and in part by unfortunate pedestrians or occupants of other cars. The

risk to these others constitutes a negative externality, and is not factored into the consumption

decision of our drinking agent.
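For readers who want to see where the optimality condition comes from, the following is a hedged sketch of one derivation. The thesis defers its own derivation to Appendix 1, which is not reproduced here, so three elements below are assumptions made for illustration: the Lagrange multiplier $\lambda$, the treatment of good $n$ as a numeraire that does not affect survival ($a_n = 0$), and the expression $V \equiv U/(qU_n)$ for the value of a statistical life.

```latex
\begin{align*}
\mathcal{L} &= q(a_1 C_1,\dots,a_n C_n)\,U(C_1,\dots,C_n)
              + \lambda\Big(Y - \sum_j p_j C_j\Big) \\[4pt]
\text{FOC for good } i:\quad & a_i q_i U + q\,U_i = \lambda p_i \\
\text{FOC for good } n \ (a_n = 0):\quad & q\,U_n = \lambda p_n \\[4pt]
\text{Dividing the first by the second:}\quad
  & \frac{U_i}{U_n} = \frac{p_i}{p_n} - a_i q_i\,\frac{U}{q\,U_n}
    = \frac{p_i}{p_n} - a_i q_i V,
  \qquad V \equiv \frac{U}{q\,U_n}
\end{align*}
```

Under these assumptions, a risky good ($q_i < 0$) adds the positive term $-a_i q_i V$ to the price ratio, which is the self-regulation effect described above.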

This model can also be used to highlight the offsetting effects that result from rational

consumers' reactions to any change in road safety, something that should not be overlooked in

any cost study such as this. Consider a change that makes driving safer, such as the introduction

of superior safety technology (e.g. crumple zones, airbags)7, or better enforcement of drunk

driving laws. The latter reduces the dangerousness of interacting with drunk drivers at night –

who are present in greater proportions than in daytime hours – and ceteris paribus reduces the

overall risk of night driving. This safer driving environment should induce more night-time

driving. In terms of the model, if night driving is good j, then aj will fall (as night driving is less

dangerous per unit) and the total amount of night driving by sober drivers will rise. If, as has been

supposed, night driving is an activity that reduces the agent's survival probability8, then the

reduction in the number of fatal accidents resulting from superior enforcement of traffic laws will be partially

offset by more accidents caused by sober drivers' increased night driving. Thus any study that

purports to show the lives saved if drink driving were eliminated is implicitly holding offsetting

activity by other drivers to zero, leading to an overestimate of the benefits of such an outcome9.

6 See Appendix 1 for derivation of this result and for derivation of V, the value of a statistical life.

7 Peltzman 1975 notes that “safety regulation has had no effect on the highway death toll…[it] may have increased the share of this toll borne by pedestrians and increased the total number of accidents” (p677). This is because new safety devices have resulted in responses from drivers – such as riskier or faster driving – almost completely offsetting the increase in safety brought about by regulation.

8 This is a reasonable assumption: “in times of economic growth, traffic volumes increase, along with the number of crashes and injuries…reductions in alcohol-related crashes have also been observed to coincide with periods of economic depression” (WHO 2004, p72).

The Economic Issue

The economic issue at the heart of this paper is the negative accident externality generated by

drink driving (DD). The externality – defined as a net cost to other members of society not borne

by the causing agent10 – results from the higher crash risk of drinking drivers relative to sober

drivers. Crashes often involve third parties (other drivers, passengers, pedestrians) or their

property, and given that the law assigns “property rights” over the road to sober drivers, a higher

crash risk causes a negative spillover effect11. This is not to say that only drunk drivers crash –

we all face a risk of crashing when driving, a risk that depends on numerous characteristics such

as tiredness, age, experience, and road conditions. This is termed the baseline crash risk. The

negative externality caused by drink driving is the additional crash risk beyond the baseline level.

If drinking drivers do not bear the full cost of their actions (because they are not required to or

cannot fully compensate their victims) then they will choose an individually optimal amount of

drink driving that is excessive (and thus inefficient) from society's viewpoint: for their marginal

units of drink driving the cost to society is greater than the benefits obtained by such drivers. In

the Rosen model above this can be seen by noting that the agent considers only the impact that

consuming DD will have on his health, not on others'. This is the economic reason behind the

legal penalties for drunk driving: an optimal tax reduces the individually optimal amount of DD

so that it coincides with the socially optimal amount, by forcing the driver to internalize the costs

of his dangerous driving.

9 However, the Rosen model also shows that the marginal willingness to pay for small changes in a_i is $dY/da_i = q_i C_i V p_n$. Thus even if a complete offset ensures that P(survival) does not change, the willingness to pay may be positive and large, making the exercise worthwhile.

10 This definition should include the caveat that another agent's actions do not constitute an externality if they change market prices (this is a pecuniary 'externality' and generates no inefficiency). A related definition links spillovers to the absence of functional markets.

11 Both drivers are equally responsible for the accident from an economic point of view: were either of them to have stayed at home, the accident would not have occurred. It is the legal definition of property rights that establishes the blame with one party; such is the case with alcohol-involved driving.

Its external effects notwithstanding12, drink driving - like speeding - creates private benefits for

the drivers involved because its avoidance can be costly in terms of time or money. Indeed, this

discussion is not intended to make the point that drunk driving should be eliminated: it is possible

that the socially optimal amount of drink driving is not zero – and the fact that the legal BAC

limit is 0.05 and not zero is a testament to this fact13.

Does Insurance make a (theoretical) difference? Do Private Lawsuits?

The issue of insurance is important and must be considered: if a person injured by a drunk driver

is insured then does an external cost exist? The answer: almost certainly, as only if the drunk

driver is successfully sued by the victim will the externality be fully eliminated. A system with

perfectly defined and enforced property rights would ensure this, but as most authors consider

that the probability of a successful private suit is low in the US, it can be confidently assumed

that it is even lower in Chile. Moreover, even in the most favourable case in which the private

suit is successful and substantial damages are awarded, it seems unlikely that any financial

compensation can fully restore the utility lost by dying – the agent himself has disappeared. If the

basic unit of society is held to be the household the question becomes: can money fully replace a

lost family member? While the answer depends on the dead individual, some uncompensated

external cost must surely remain, whatever the payout.

Another relevant limit to the role of private legal suits is that the wealth level of the driver is a

binding upper limit to judicially dictated compensation. Given that most plausible estimates of

the statistical value of life in Chile range from 0.3 to 1.4 million USD, the average drinking

driver is in no financial position to fully compensate the victim(s).

12 Driving involves continuous interaction with other drivers, making it rife with non alcohol-related externalities, most notably congestion and accident spillovers. These refer to the fact that an additional driver adds to the overall congestion level, increasing the travel time of all drivers, while also increasing the general accident risk.

13 If the optimal amount of DD is in fact zero then any tax above the marginal damage that DD causes will attain the optimal internalizing outcome.

In short, insuring victims does not solve the problem posed by the DD externality, as the social

cost of the activity remains above the social benefits despite the existence of insurance14.

Moreover, the possibility of effective private lawsuits does not provide the necessary deterrence.

Hence, a potential drunk driver may be under-deterred by such a system.

Optimal Law Enforcement

The 'optimal tax' that internalizes the spillover is a deceptively simple term for what is in fact a

complex instrument made up of 2 broad policy tools: the penalty paid when an offender is

apprehended and the probability of detection or apprehension.

The standard textbook solution to a negative externality is the Pigou tax, in which the probability

of detection of an “offence” (p) is approximately one and the optimal fine (penalty) that offenders

face is equal to the marginal damage caused by their actions. However, in the real world, the cost

of a p approximately equal to one is likely to outweigh the damage done by DD: it would require

huge expenditures on police and surveillance equipment, and severe violations of individual

liberties.

The economic theory of law enforcement (see Polinsky & Shavell, 2000) makes use of an

intuitive and simple result: for risk neutral agents a combination of a high p and a low penalty

(assume it is a fine) results in the same level of deterrence as a low p, high fine combination. As

it is costly to catch offenders (i.e. Drinking Drivers, DD) with a high probability, the latter

combination is a more cost-effective way to generate deterrence.

Moreover, deterrence in this context is exactly what is required, as if it is set at the right level it

makes the expected penalties the DD will have to pay equal to the harm caused to society by their

externality. If we define F*RN as the optimal fine for a risk neutral DD, and h as the harm he

does to society, then the following equation illustrates the efficient solution in a static context:

p F*RN = h i.e. F*RN = h/p

14 Moreover, liability insurance for drivers removes even the slight deterrent effect of possible lawsuits from victims.

An accident caused by a DD also causes costs to society as a whole, such as the (judicial and

police) costs of imposing the fine (k) and those of investigating and prosecuting the accident (s).

Moreover, given that many cases do not result in fines, we must also include the probability that

a fine will be imposed as a result of the prosecution stage (q). Incorporating these costs to the

model results in a new, larger optimal fine, as drunk drivers also generate these costs in addition

to the direct externality:

F*RN = (h/pq) + (s/q) + k
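As a rough numerical illustration of the two fine formulas, the sketch below evaluates F*RN = h/p and its extension with enforcement costs. The function name `optimal_fine` and every number in the example are hypothetical and chosen only to show the mechanics; none is an estimate from this study.

```python
def optimal_fine(h: float, p: float, q: float = 1.0,
                 s: float = 0.0, k: float = 0.0) -> float:
    """Optimal fine for a risk-neutral drinking driver.

    h: harm done to society
    p: probability of detection
    q: probability that prosecution ends in a fine (1.0 reproduces F = h/p)
    s: cost of investigating and prosecuting
    k: cost of imposing the fine
    All inputs here are hypothetical illustration values.
    """
    if not (0 < p <= 1 and 0 < q <= 1):
        raise ValueError("p and q must be probabilities in (0, 1]")
    return h / (p * q) + s / q + k


harm = 1000.0
# Simple case: high p with a low fine and low p with a high fine
# give the same expected penalty, p * F = h.
for p in (0.5, 0.1):
    fine = optimal_fine(harm, p)
    print(p, fine, p * fine)

# Extended case with enforcement costs and an uncertain prosecution stage.
print(optimal_fine(harm, p=0.1, q=0.5, s=200.0, k=50.0))
```

The loop shows the equivalence used by the enforcement literature: lowering p while raising the fine leaves expected penalties, and hence deterrence for risk-neutral drivers, unchanged.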

However, if p were slightly reduced from the level that generates the equality above, then no first

order social costs would ensue, as the marginal drunk drivers induced to drive because of the

change generate only slightly higher social costs than benefits. The advantage of reducing p

however, is that enforcement costs are saved. Thus, with costly enforcement, some

under-deterrence is optimal (i.e. in the simple version p F*RN < h). How much p should be lowered

depends on the balance of savings in enforcement in comparison to the costs of under-deterrence.



3 – How Alcohol Affects Driver Risk – a Review of the Literature

The effects of alcohol on drivers can be usefully divided into three main categories: survivability,

performance, and behaviour15. Survivability refers to the fact that vehicle occupants with positive

BAC are more likely to die from the same physical impact than occupants with zero BAC.16 As

this only affects drinkers, it does not constitute an externality. The second effect – performance –

refers to the functioning of driving relevant skills under the influence of alcohol. There is little

room for doubt, after hundreds of laboratory experiments that alcohol reduces driver performance

in terms of coordination, reaction time, spatial orientation and other relevant skills, and that it

does so to an increasing extent as BAC rises17.

The third category of effect produced by alcohol is that of a detrimental effect on driver

behaviour. It appears that drinking, by reducing social inhibitions, also encourages more

aggressive and riskier driving. It is an empirical regularity that alcohol is present in an increasing

fraction of drivers as severity increases. This suggests that it contributes most to severe crashes.

In the Chilean 2000-2004 data (which significantly under-reports alcohol involvement as will be

shown later) this relationship can be observed in the following table:

% of crashes and collisions involving at least 1 drinking driver

Crashes and collisions involving:             Share

all outcomes: injuries & no injuries           6.7%

any kind of injury                             9.0%

medium severity injuries or worse             11.9%

serious and fatal injuries only               13.4%

fatal injuries only                           18.1%

Source: Carabineros data, 2000 - Sept 2004

As noted by Evans, this relationship suggests that alcohol's most salient effect is in changing

driver behaviour towards taking greater risks and driving at higher speeds. If the performance

effect were most important, then drinking drivers' increased driver error would be present at all

crash severities, and alcohol prevalence would not increase with crash severity. “It appears that

drivers do things when they are drunk that they would not attempt when sober, rather than merely

executing poorly the same things they would do more skilfully when sober.”18 Thus behavioural

effects, although neglected by the literature, appear to be a key factor behind the observed BAC–

crash relationship in accident data19.

15 I owe this structure to Evans, op.cit.

16 According to Evans (ibid), a vehicle occupant with a 0.08 BAC is 73% more likely to die from the same crash than one with zero BAC.

17 Moskowitz et al. (2000) have demonstrated that alcohol significantly affects some driving skills for some subjects at BACs as low as 0.02%.

Is the alcohol-crash association due to other factors correlated with drinking?

The possibility that alcohol is not the causal agent behind actual (as opposed to laboratory

simulated) crashes has yet to be fully discarded. It is possible that a high level of relative risk for

drinking relative to non-drinking drivers could result from an association between drinking

behaviour and other dangerous driving characteristics. That is, dangerous drivers tend to drink,

but drinking itself is not what causes increased crash risk. While implausible in the light of

laboratory and behavioural evidence cited above, such a possibility must be discarded if we are to

have confidence in the analysis undertaken by this paper, given that additional driver

characteristics (beyond age, sex and licence type) are not collected in Chile, and thus cannot be

controlled for. To this end the epidemiological literature is briefly reviewed to make manifest the

causal effect of alcohol on crash risk in real driving situations.

The Three Main Methods of Determining the Contribution of Alcohol to Crash Risk

A – The Case-Control Method

Three main methods exist for determining alcohol crash risk. The first is the case-control

method20, in which the BAC levels of drivers involved in traffic crashes are compared with those

of a control group of drivers matching the accident drivers as closely as possible. From a

comparison of these groups – and controlling for possibly confounding covariates – a relative risk

(RR) curve is estimated:

[Figure: Compton et al. (2002) covariate-adjusted relative risk curve. Horizontal axis: BAC (0.0 to 1.3); vertical axis: RR of crash involvement (0 to 13).]

18 Evans, op cit.

19 Evans notes that much work remains to be done regarding the effect of alcohol on speed choice, citing an Australian study that found that drivers with BACs over 0.05% were driving faster when apprehended.

20 A more detailed review of this method is provided in Appendix 2.

If US drivers have a similar RR to Chilean drivers, then at the Chilean DUI limit of 0.5 g/l BAC (0.05%) the

RR is 1.38, while drivers are 4.79 times more likely to cause a crash (of any severity) at the DWI

limit of 1.0 g/l BAC (0.1%)21.

B – Combining Accident data with Roadside Surveys

A similar but more general methodology is that which uses national accident data (such as the

Carabineros data used in this study, or the US FARS data) and a roadside survey to provide

exposure data (e.g. Zador 1991). The key differences between the case control and the roadside

survey-FARS methodologies are in the selection of control drivers and their 'representative-ness'.

Clearly the case control studies have a more reliable measure of exposure, as the drivers in their

control groups are driving in exactly the same place as the accident drivers. However, in their

precision lies their weakness: they may simply (albeit accurately) reflect local driving. Moreover,

using national data rather than data from a single locality may be more reliable in that it can work

with degrees of freedom orders of magnitude above those available in case control studies.

The most recent of such studies in the US is Zador et al (2000), which uses FARS data and the

1996 National Roadside Survey. Their results are somewhat above the case control studies, and

are somewhat suspect as a result: the major case control studies have obtained similar relative

risk curves, suggesting that the RR curves they generate are more worthy of confidence.

21 We cannot be sure that they are: a relative risk curve compares sober to drunk drivers. Chilean drivers may be very different - both when sober and when drunk, and to different degrees – to US drivers.

C – The Levitt-Porter Methodology

The third and final methodology is that used by Levitt and Porter, in which no separate set of

control data is used. Instead, the proportion of drivers in each group is estimated once the RR (θ)

has been estimated. Identification strategy is discussed more closely in the methodology section.

Levitt and Porter note that the case-control studies fit well with their results given reasonable

distributions of BACs across drinking drivers.

This methodology is unable to separate RR by fine gradations of BAC because of degrees of

freedom restrictions in smaller countries such as Chile, and because the BAC level is often not

exactly recorded. This methodology has not, to my knowledge22, been applied elsewhere. Hence

only their results for the US, as cited above, are available for direct comparison. Moreover, no

estimates for Chile are available, either of the externality, or of the relative risk of drinking as

opposed to sober drivers – or of any other RR curve.

22 No relevant papers cite the Levitt and Porter paper according to IDEAS.

4 – Methodology

The Levitt-Porter model's first virtue is the minimal data required for its estimation. All that is

required is fatal crash data such as the date, time, type of accident, alcohol-involvement (i.e.

whether drivers had been drinking), and the number of deaths and injuries. At first, inferring both

the relative risk θ, and N from this data alone appears impossible. As the authors note:

“Separately identifying the fraction of drinking drivers on the road and their relative risk of a

fatal crash using only the fraction of drinking drivers in fatal crashes is ostensibly equivalent…to

separately identifying per capita income and population on the basis of only aggregate income

data.”23

They achieve the apparently impossible by recognizing that for 2 car crashes (hereafter called

collisions) the relative frequency of accidents involving 2 sober, 2 drinking or 1-drinking-and-1-

sober drivers contains enough information to estimate θ. The idea is that the number of fatal

collision opportunities is given by the trinomial distribution – equivalent to randomly drawing

coloured balls from a bag. If drinking does not increase RR then the crash data will closely mimic

the trinomial distribution of crash opportunities. If the data for actual crashes differs from that

given by crash opportunities (trinomial distribution) we are able to identify how much more

dangerous drinking drivers are (θ) as N can be eliminated from the model. Once we have θ, then

we are able to obtain N.

The basic assumptions of the Levitt-Porter (2001) model24

Notation:

Ni = the total number of drivers of type i

I = an indicator variable equal to 1 if two cars interact25

A = an indicator variable equal to 1 if two cars collide, resulting in a fatal crash

P(i,j | I=1) = Probability that the drivers are of type i and j, given that an interaction takes place.

23 Levitt and Porter (2001), p1199-1200

24 This section follows the Levitt and Porter 2001 paper very closely, for obvious reasons. A few sentences are taken verbatim and are not explicitly quoted to avoid uninteresting footnotes.

25 An interaction is a 2 car crash opportunity: 2 cars pass on the street. Given an interaction, a single driver error can cause a collision.

Assumptions

1. There are 2 driver types: D and S

a. This is easily generalized to more types

b. Thus ND + NS = NTotal

2. There is equal mixing of D and S drivers on the roads, i.e.

a. The number of interactions a driver has with other cars is independent of the driver's type:

$$P(i \mid I=1) = \frac{N_i}{N_D + N_S}$$

b. A driver's type does not affect the composition of the driver types with which he interacts:

$$P(i, j \mid I=1) = P(i \mid I=1)\,P(j \mid I=1)$$

3. A fatal crash results from a single driver's error

4. The composition of driver types in a crash is independent of the composition of driver types in other crashes

5. A drinking driver is at least slightly more likely to make an error resulting in a crash than a sober

driver, i.e. θD > θS

The assumption doing the most work is assumption 2: equal mixing of sober and drunk drivers.

Over a small enough area and time period, it is reasonable. Over an entire country and year it

becomes less so. Assumption 2 gives the joint distribution for a pair of driver types, conditional

on an interaction between two drivers:

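$$P(i, j \mid I=1) = \frac{N_i N_j}{(N_D + N_S)^2} \tag{1}$$

Assumption 3 implies that the likelihood of a fatal crash is the sum of the probabilities that either

driver makes a fatal error, minus the probability that both drivers make a mistake. The latter

probability is extremely small, and is ignored:

$$P(A=1 \mid I=1, i, j) = \theta_i + \theta_j - \theta_i\theta_j \approx \theta_i + \theta_j \tag{2}$$

where θi denotes the probability that a driver of type i makes an error resulting in a fatal crash.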

Developing the model

Multiplying equations (1) and (2), we obtain the joint probability of driver types and a fatal crash

conditional on an interaction between two drivers:

$$P(i, j, A=1 \mid I=1) = \frac{N_i N_j(\theta_i + \theta_j)}{(N_D + N_S)^2} \tag{3}$$

In words, given that two random drivers interact, the probability that a fatal crash occurs and that

the drivers involved are of the specified types, is simply equal to the likelihood that two drivers

passing on the road are of the specified types multiplied by the probability that a fatal crash

occurs when these drivers interact.

The key relationship we seek is the probability of driver types conditional on a fatal accident

occurring, rather than conditional on an interaction taking place. That value can be obtained from

equation (3) through an application of Bayes' Theorem (dropping the I=1 condition):

From the definition of conditional probability, $P(i, j, A=1) = P(A=1)\,P(i, j \mid A=1)$, and we want

to isolate the final term. To this end note that $P(A=1) = \sum_{i,j} P(A=1 \mid i, j)\,P(i, j)$, i.e. P(A=1) is

simply equation (3) summed over all possible combinations of i and j.

Thus,

$$P(i, j \mid A=1) = \frac{P(A=1, i, j)}{\sum_{i,j} P(A=1 \mid i, j)\,P(i, j)}$$

and we obtain:

$$P(i, j \mid A=1) = \frac{N_i N_j(\theta_i + \theta_j)}{2\left[\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2\right]} \tag{4}$$

Let Pij represent the probability that the drivers are of type i and j given that a fatal crash occurs.

We can explicitly state the values of Pij by simply substituting for i and j in equation (4):

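$$P_{DD} = P(i = D, j = D \mid A = 1) = \frac{\theta_D (N_D)^2}{\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2} \tag{5}$$

$$P_{DS} = P(i = D, j = S \mid A = 1) + P(i = S, j = D \mid A = 1) = \frac{(\theta_D + \theta_S) N_D N_S}{\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2} \tag{6}$$

$$P_{SS} = P(i = S, j = S \mid A = 1) = \frac{\theta_S (N_S)^2}{\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2} \tag{7}$$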

Note that the ordering of the driver types does not matter. Thus, in equation (6) the probability of

a mixed drinking-sober crash is the sum of the probability that i is sober and j is drinking plus the

probability that j is sober and i is drinking.



Examination of equations (5)-(7) reveals that there are only three equations, but four unknown parameters ($\theta_D$, $\theta_S$, $N_D$, $N_S$). As a result, the four parameters cannot be separately identified: only the ratios are identifiable. Therefore, let $\theta = \theta_D/\theta_S$ and $N = N_D/N_S$. θ is the relative likelihood that a drinking driver will cause a fatal two-car crash compared to a sober driver, and N is the ratio of drinking to sober drivers on the road at a particular place and time. Dividing both numerator and denominator of equations (5)-(7) by $\theta_S (N_S)^2$ expresses them in terms of θ and N as follows:

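$$P_{DD}(\theta, N \mid A) = \frac{\theta N^2}{\theta N^2 + (\theta + 1) N + 1} \tag{8}$$

$$P_{DS}(\theta, N \mid A) = \frac{(\theta + 1) N}{\theta N^2 + (\theta + 1) N + 1} \tag{9}$$

$$P_{SS}(\theta, N \mid A) = \frac{1}{\theta N^2 + (\theta + 1) N + 1} \tag{10}$$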
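As a quick numerical check of equations (8)-(10), the three probabilities can be computed directly from θ and N. The sketch below is illustrative only: the function name `crash_type_probs` and the example values of θ and N are assumptions made here, and N is read as the ratio of drinking to sober drivers, which is the reading consistent with the $N^2$ term in the numerator of (8).

```python
def crash_type_probs(theta, n):
    """Return (P_DD, P_DS, P_SS) as given by equations (8)-(10).

    theta: relative risk of a drinking driver (theta_D / theta_S)
    n:     ratio of drinking to sober drivers (assumed reading of N)
    """
    denom = theta * n ** 2 + (theta + 1) * n + 1
    p_dd = theta * n ** 2 / denom
    p_ds = (theta + 1) * n / denom
    p_ss = 1 / denom
    return p_dd, p_ds, p_ss


# Hypothetical values, chosen only so the arithmetic is easy to follow.
p_dd, p_ds, p_ss = crash_type_probs(theta=4.0, n=0.25)
print(p_dd, p_ds, p_ss)  # the three shares sum to one
```

With θ = 4 and N = 0.25 the denominator is 2.5, so the shares are 0.1, 0.5 and 0.4: even with drinking drivers a minority on the road, mixed crashes dominate.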

The next step is to derive the likelihood function.

Aij is defined as the number of fatal crashes involving one type j and one type i driver. Given

assumption 4 (independence across fatal collisions) and the total number of fatal crashes the joint

distribution of driver types is given by the trinomial distribution:

$$P(A_{DD}, A_{DS}, A_{SS} \mid A_{TOTAL}) = \frac{(A_{DD} + A_{DS} + A_{SS})!}{A_{DD}!\,A_{DS}!\,A_{SS}!}\,(P_{DD})^{A_{DD}} (P_{DS})^{A_{DS}} (P_{SS})^{A_{SS}} \tag{11}$$

Substituting PSS, PDD and PDS into the equation above produces the likelihood function, and the equation is estimated by maximum likelihood, with the following fairly self-evident result:

$$\hat{P}_{DD} = \frac{A_{DD}}{A_{TOTAL}}; \qquad \hat{P}_{DS} = \frac{A_{DS}}{A_{TOTAL}}; \qquad \hat{P}_{SS} = \frac{A_{SS}}{A_{TOTAL}}$$

Levitt and Porter then take advantage of the fact that in the binomial distribution $(A_{DS})^2$ is in fixed proportion to the product of $A_{DD}$ and $A_{SS}$ to eliminate N from the equation:

$$\frac{(A_{DS})^2}{A_{DD} A_{SS}} = \frac{(P_{DS})^2}{P_{DD} P_{SS}} = \frac{(\theta + 1)^2 N^2}{\theta N^2} = \theta + 2 + \frac{1}{\theta} \tag{12}$$



Thus we can estimate θ from the observed distribution of crashes ONLY. Defining $(A_{DS})^2 / (A_{DD} A_{SS})$ as R, and multiplying by θ, we obtain a quadratic equation: $\theta^2 + (2 - R)\theta + 1 = 0$. If R = 4 then θ = 1, that is, the observed distribution of collisions matches the distribution of fatal crash opportunities: drinking does not affect driving. If R < 4 no real solution exists. If R > 4 then 2 solutions always exist, one with θ < 1 and one with θ > 1. By assumption 5 the former is discarded, and we have an estimate of θ.
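The quadratic can be solved in a few lines. The following is a minimal sketch, not the code used for the thesis: the function name `estimate_theta` is an assumption, and the crash counts are the two-car night-time counts reported in the Results section (reduced SML sample, and Carabineros data).

```python
import math


def estimate_theta(a_dd, a_ds, a_ss):
    """Larger root of theta^2 + (2 - R) * theta + 1 = 0, or None if R < 4.

    a_dd, a_ds, a_ss: numbers of drunk-drunk, drunk-sober and
    sober-sober two-car crashes.
    """
    r = a_ds ** 2 / (a_dd * a_ss)
    discriminant = (r - 2) ** 2 - 4
    if discriminant < 0:
        return None  # R < 4: no real theta is compatible with the data
    # The two roots multiply to one, so the larger root is the one with theta >= 1.
    return ((r - 2) + math.sqrt(discriminant)) / 2


theta_sml = estimate_theta(a_dd=15, a_ds=81, a_ss=72)
theta_carabineros = estimate_theta(a_dd=34, a_ds=200, a_ss=539)
print(theta_sml)          # roughly 3.8
print(theta_carabineros)  # None: R is about 2.2, below the threshold of 4
```

Returning `None` rather than raising an error is a design choice made for this sketch, so that the incompatible case can be inspected alongside the compatible one.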

Estimating N

One car crashes are incorporated into the model, but their identification depends on having first estimated N (the ratio of drinking to sober drivers) from collisions. N is estimated from the following FOC of the likelihood function, substituting the $\hat{\theta}$ estimated above for θ:

$$\hat{N} = \frac{\frac{1}{\hat{\theta} + 1} A_{DS} + A_{DD}}{\frac{\hat{\theta}}{\hat{\theta} + 1} A_{DS} + A_{SS}} \tag{13}$$

Standard errors are derived using the delta method, as described in Appendix 3.
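To make the estimator of N concrete, the sketch below evaluates it at the rounded θ of 3.8 and the reduced-sample SML crash counts from the Results section. The function name `estimate_n` is an assumption. The formula is written as $\hat{N} = [A_{DS}/(\theta+1) + A_{DD}] / [\theta A_{DS}/(\theta+1) + A_{SS}]$, which is the form that returns N exactly when the counts are proportional to equations (8)-(10). Standard errors are not computed here.

```python
def estimate_n(theta, a_dd, a_ds, a_ss):
    """Ratio of drinking to sober drivers implied by theta and the crash counts."""
    mixed_share = a_ds / (theta + 1)  # the A_DS / (theta + 1) term
    return (a_dd + mixed_share) / (a_ss + theta * mixed_share)


n_hat = estimate_n(theta=3.8, a_dd=15, a_ds=81, a_ss=72)
share_drinking = n_hat / (1 + n_hat)  # drinking drivers as a share of all drivers
print(n_hat, share_drinking)  # about 0.23 and about 0.19
```

Converting the ratio N into a share of all drivers, N / (1 + N), gives a figure of about 19%, in line with the percentage of drinking drivers reported in the results.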

Estimating the RR of one car crashes, λ

Lambda (λ), or the relative risk of one car crashes for drunk drivers (analogous to θ), is estimated

in similar fashion. Let QD and QS denote the probabilities that a drunk or a sober driver is

involved in a given one car crash:

$$Q_j = P(i = j \mid \text{Crash} = 1) = \frac{\lambda_j N_j}{\lambda_D N_D + \lambda_S N_S}, \qquad j = D, S \tag{14}$$

We can define λ as $\lambda_D / \lambda_S$, and equation (14) for both j = D and j = S can be combined to give $\lambda N = Q_D / Q_S$, where both $Q_D$ and $Q_S$ are obtained from the accident data. As can be observed, lambda can be estimated only by using the estimator of N.
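A short sketch of this last step follows. The one-car crash shares used below are hypothetical, chosen only to keep the arithmetic simple, and the function name `estimate_lambda` is an assumption. N is again read as the ratio of drinking to sober drivers.

```python
def estimate_lambda(q_d, q_s, n_hat):
    """One-car relative risk from lambda * N = Q_D / Q_S."""
    return q_d / (q_s * n_hat)


# Hypothetical shares: half of one-car crashes involve a drinking driver.
lam = estimate_lambda(q_d=0.5, q_s=0.5, n_hat=0.25)
print(lam)  # 4.0
```

If drinking drivers are a quarter as numerous as sober drivers yet account for half of one-car crashes, each of them must be four times as likely to crash, which is what the ratio returns.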



Violations of the assumptions

The next step is to consider violations of the assumptions. The key point is that violations of the

assumptions generate downward biases for θ, making it a reliable lower bound. Possible

violations are discussed in more detail in Appendix 4, but violations of Assumption 2 are

sufficiently important to merit consideration here.

Relaxing the Equal Mixing assumption

The model requires an equal mixing assumption (Assumption 2), which holds that over a given

geographical and temporal area (for example on weekend nights in Santiago) drivers of both

types are homogeneously distributed. This assumption, as you may recall, has two parts:

a. The number of interactions a driver has with other cars is independent of driver type:

$$P(i \mid I = 1) = \frac{N_i}{N_D + N_S} \tag{A2(a)}$$

b. A driver’s type does not affect the composition of the driver types with which he interacts:

$$P(i, j \mid I = 1) = P(i \mid I = 1)\,P(j \mid I = 1) \tag{A2(b)}$$

Combining the two results in equation (1), as noted earlier:

$$P(i, j \mid I = 1) = \frac{N_i N_j}{(N_D + N_S)^2}$$

A2 is a demanding assumption: we must consider only very small space-time areas to be sure of its holding, as driving conditions change from hour to hour and between neighbourhoods. The Suecia district on a Saturday night between 4 and 5 am is very different from the Alameda area on a weekday at 9 pm: Suecia is likely to have a far higher share of drunk drivers, as would a night-time period compared with a daytime period. As Borkenstein et al. note: “within a fairly short period of time (not exceeding one hour) the driving conditions or exposure for a driver using a specific “block” tended to remain relatively constant.”²⁶ Thus ideally the model should be estimated for 50-60 minute periods, over a 5-10 city block area, something that real-world data simply cannot make possible.

²⁶ Borkenstein et al. (1974), p. 22.



Thus we can be relatively sure that A2 will not hold perfectly in the data. That is, drunk drivers

will be somewhat concentrated in certain space-time regions (near bars and at night), and the

same will hold for sober drivers in other areas. This results in a lower number of drunk-sober

interactions (crash opportunities) than predicted by the model under the equal mixing assumption,

and a higher number of same type (DD and SS) interactions, exerting a downward bias on our

estimate of θ (R falls) and an upward bias on our estimated N (as more DD crashes occur, and θ

is lower, N must rise). As the units of temporal and spatial analysis shrink, violations of this

assumption will be less severe, and the downward bias it exerts on estimates of θ should fall.

This prediction is borne out by Levitt and Porter’s results. Applying their model to US fatal

accident data assuming equal mixing over the whole US for their entire sample (1983-93, 8pm-

5am) they estimate θ = 3.79 (s.d. = 0.14). As they reduce the space-time areas over which they

assume equal mixing – weakening the bias caused by violation of A2 – their estimated θ rises to

7.51.
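The direction of this bias can be reproduced with a small numerical experiment. The sketch below is an illustration under stated assumptions, not a result from the thesis or from Levitt and Porter. Two hypothetical zones share the same true θ = 5 but have different ratios of drinking to sober drivers (0.4 and 0.1). Expected crash counts in each zone are taken to be proportional to equations (8)-(10), with equal weight on the two zones.

```python
import math


def theta_from_counts(a_dd, a_ds, a_ss):
    """Larger root of theta^2 + (2 - R) * theta + 1 = 0 (requires R >= 4)."""
    r = a_ds ** 2 / (a_dd * a_ss)
    return ((r - 2) + math.sqrt((r - 2) ** 2 - 4)) / 2


def expected_counts(theta, n):
    """Expected (DD, DS, SS) crash counts, up to a common scale factor."""
    return (theta * n ** 2, (theta + 1) * n, 1.0)


TRUE_THETA = 5.0
zone_a = expected_counts(TRUE_THETA, n=0.4)  # e.g. near bars, late at night
zone_b = expected_counts(TRUE_THETA, n=0.1)  # e.g. a quieter area
pooled = tuple(a + b for a, b in zip(zone_a, zone_b))

theta_a = theta_from_counts(*zone_a)
theta_b = theta_from_counts(*zone_b)
theta_pooled = theta_from_counts(*pooled)
print(theta_a, theta_b, theta_pooled)  # about 5, about 5, and about 3
```

Estimated zone by zone, the model recovers θ = 5. Estimated on the pooled counts, as if mixing were equal across both zones, it returns a value close to 3, because pooling inflates same-type crashes relative to mixed crashes and so lowers R.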

Levitt and Porter suggest amending the model to allow an increased probability of same-type interaction, while still maintaining the reasonable assumption that the number of interactions a driver type has is proportional to the percentage it makes up in the overall driver population, i.e. that

$$P(i \mid I = 1) = \frac{N_i}{N_D + N_S}$$

should still hold.²⁷ This does not wholly eliminate the problem posed by the violation of the equal mixing assumption, but it does allow a reduction in the downward bias it produces.

They do not explicitly solve the amended model. They simply note that for Δ = 0.1 (a 10% increase in DD interactions), their estimates of both one and two car relative risks rise by approximately 25%.

²⁷ This is a reasonable requirement, as otherwise we would be requiring that either drunk or sober drivers have proportionately more interactions than the other type, i.e. that one type passes more cars than the other by driving in more congested areas or simply by driving longer distances. Such a requirement would necessarily be entirely arbitrary.



Developing the Model with Unequal Mixing

Define the parameter representing the increased probability of a Drinking-Drinking (DD)

interaction as Δ. Thus equation (1) changes from:

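$$P(D, D \mid I = 1) = \frac{(N_D)^2}{(N_D + N_S)^2} = \frac{(N_D)^2}{(N_{Total})^2}$$

to:

$$P(D, D \mid I = 1) = \frac{(1 + \Delta)(N_D)^2}{(N_{Total})^2} \tag{15}$$

Similarly, for Sober-Sober (SS) and Drunk-Sober (DS) interactions:

$$P(S, S \mid I = 1) = \frac{(1 + x)(N_S)^2}{(N_{Total})^2} \tag{16}$$

$$P(D, S \mid I = 1) = \frac{(1 + z) N_D N_S}{(N_{Total})^2} \tag{17}$$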

The amended model has 3 additional parameters (Δ, x and z), as described by equations (15)-(17). x and Δ should be positive, reflecting the fact that the clustering of same-type drivers leads to more same-type interactions, and z should be negative. To solve for x and z in terms of N and Δ we must impose the condition mentioned earlier regarding P(i | I=1). This term reflects the probability that an interaction involves a driver of type i and can be expressed as $P(i \mid I = 1) = \sum_j P(i, j \mid I = 1)$. Thus:

$$P(D \mid I = 1) = P(D, S \mid I = 1) + P(D, D \mid I = 1) \tag{18}$$

$$P(S \mid I = 1) = P(S, D \mid I = 1) + P(S, S \mid I = 1) \tag{19}$$

Substituting (15), (16) and (17) in (18) and (19), along with A2(a), yields:

$$x = \Delta N^2, \qquad z = -\Delta N$$
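The intermediate algebra is short enough to spell out. The derivation below is a reconstruction offered as a check rather than text from the original. It writes $N_{Total} = N_D + N_S$ and takes N to be $N_D / N_S$.

```latex
\begin{align*}
P(D \mid I = 1)
  &= \frac{(1 + z) N_D N_S + (1 + \Delta) N_D^2}{N_{Total}^2}
   = \frac{N_D}{N_{Total}}
  \;\Longrightarrow\; z N_S + \Delta N_D = 0
  \;\Longrightarrow\; z = -\Delta N, \\[4pt]
P(S \mid I = 1)
  &= \frac{(1 + z) N_D N_S + (1 + x) N_S^2}{N_{Total}^2}
   = \frac{N_S}{N_{Total}}
  \;\Longrightarrow\; z N_D + x N_S = 0
  \;\Longrightarrow\; x = -z N = \Delta N^2 .
\end{align*}
```

Both lines use $N_D N_{Total} = N_D^2 + N_D N_S$ (and its sober counterpart) to cancel the terms that do not involve Δ, x or z. The signs agree with the requirement that x be positive and z negative whenever Δ is positive.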

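Solving the model with these modifications produces an equation for R in which N does not cancel out, unlike the standard model:

$$R = \frac{(\theta + 1)^2 (1 + z)^2}{\theta (1 + \Delta)(1 + x)} = \frac{(\theta + 1)^2 (1 - \Delta N)^2}{\theta (1 + \Delta)(1 + \Delta N^2)} \tag{20}$$

Define

$$B = \frac{(1 - \Delta N)^2}{(1 + \Delta)(1 + \Delta N^2)}$$

Then:

$$R = \frac{(\theta + 1)^2 (1 - \Delta N)^2}{\theta (1 + \Delta)(1 + \Delta N^2)} = \frac{B(\theta + 1)^2}{\theta} \quad\Longrightarrow\quad B\theta^2 + (2B - R)\theta + B = 0$$

Recalling that the value of R is obtained from the dataset:

$$\theta = \frac{(R - 2B) \pm \sqrt{(2B - R)^2 - 4B^2}}{2B} \tag{21}$$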

Thus θ cannot be estimated without a value for N and Δ, as these are components of B.

Identification of delta from the model is impossible, and a value for this parameter must be

assumed. However, N was estimated using the following iterative procedure.

First, equation (20) was solved for an arbitrary value of N²⁸ and a given delta, obtaining an estimate of θ ($\hat{\theta}_i$). Then, using $\hat{\theta}_i$, N was estimated using equation (13) (unchanged by the amendments to the model):

$$\hat{N}_i = \frac{\frac{1}{\hat{\theta}_i + 1} A_{DS} + A_{DD}}{\frac{\hat{\theta}_i}{\hat{\theta}_i + 1} A_{DS} + A_{SS}} \tag{13}$$

This estimate $\hat{N}_i$ was then used to re-estimate θ using equation (20), resulting in $\hat{\theta}_{i+1}$, and the whole procedure was repeated until $\hat{\theta}_i$ converged, with this final $\hat{\theta}_i$ used to obtain a final $\hat{N}_i$.

²⁸ The initial value of N makes no difference to the final estimates of N and θ at any relevant level of accuracy.
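The iterative procedure can be sketched as follows. This is an illustrative reconstruction, not the original estimation code. The function names, the starting value, the tolerance and the use of the larger root of the quadratic in θ are assumptions. The crash counts are the reduced-sample SML counts from the Results section, and Δ = 0.1 is the value Levitt and Porter discuss.

```python
import math


def solve_theta(r, n, delta):
    """Larger root of B*theta^2 + (2B - R)*theta + B = 0."""
    b = (1 - delta * n) ** 2 / ((1 + delta) * (1 + delta * n ** 2))
    discriminant = (2 * b - r) ** 2 - 4 * b ** 2
    if discriminant < 0:
        raise ValueError("no real solution for theta")
    return ((r - 2 * b) + math.sqrt(discriminant)) / (2 * b)


def estimate_n(theta, a_dd, a_ds, a_ss):
    """Ratio of drinking to sober drivers given theta and the crash counts."""
    mixed_share = a_ds / (theta + 1)
    return (a_dd + mixed_share) / (a_ss + theta * mixed_share)


def iterate(a_dd, a_ds, a_ss, delta, n_start=0.5, tol=1e-10, max_iter=1000):
    """Alternate between theta given N and N given theta until theta settles."""
    r = a_ds ** 2 / (a_dd * a_ss)
    theta = solve_theta(r, n_start, delta)
    for _ in range(max_iter):
        n = estimate_n(theta, a_dd, a_ds, a_ss)
        new_theta = solve_theta(r, n, delta)
        if abs(new_theta - theta) < tol:
            return new_theta, estimate_n(new_theta, a_dd, a_ds, a_ss)
        theta = new_theta
    raise RuntimeError("iteration did not converge")


theta_equal, n_equal = iterate(15, 81, 72, delta=0.0)      # standard model
theta_unequal, n_unequal = iterate(15, 81, 72, delta=0.1)  # amended model
print(theta_equal, theta_unequal)
```

With Δ = 0 the factor B equals one and the procedure collapses to the standard model. With Δ = 0.1 the estimate of θ rises by roughly a quarter on these counts, and different starting values of N lead to the same fixed point.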



5 - The Data

Two complementary datasets are used in this paper. The first is the Carabineros de Chile dataset on all reported traffic accidents in Chile from 1 January 2000 to 30 September 2004. Earlier data either do not exist or are of no use, as drinking status and rut identification numbers were not recorded.

Restricting the Dataset

Because our interest is in drunk-driving we limit the sample to the period in which drunk driving is most common: night-time driving. There are three reasons for this decision. The first is consistency with the literature: for this study to be comparable to the relative risk estimates that have gone before, the focus must be exclusively on night driving. All the studies referred to in the review of the literature, without exception, focus on night driving, including Levitt and Porter.²⁹ Some, unlike this study, narrow the focus further, concentrating exclusively on weekend nights³⁰, when the concentration of drinking drivers reaches its peak. Nonetheless, a relative risk curve based solely on night driving raises the suspicion that such a curve might not be readily applicable to drunk driving in the daytime.

This is a serious issue, and one that admits of no easy solution, as during the day the proportion of accident-involved drinking drivers falls dramatically, as it does on weekdays, making estimation difficult outside of these timetables:

| Occurs: | at night | in daytime | on a weekend | on a weekday |
|---|---|---|---|---|
| % of accidents involving at least 1 DD | 18.6% | 2.7% | 11% | 3.2% |

Source: Carabineros data 2000-2004

As can be seen from the table, during the day on a weekday there are proportionately very few drinking drivers. In fact, in the whole dataset Carabineros recorded only 3 Drinking-Drinking 2-car collisions over the 2000 - September 2004 period. Thus, the second reason for this restriction is that estimation is virtually impossible without it.

²⁹ Levitt and Porter (2001), p. 1213: “we limit our sample to those hours (8:00 p.m. – 5:00 a.m.) in which drinking and driving is most common”.

³⁰ See Lund & Wolfe (1991) and Zador (1991).

The third and final reason for this restriction is that the night is essentially a distinct driving environment from the daytime, with a much lower traffic density, different traffic light settings, and a different accident distribution. This is reflected in the fact that only 14.7% of daytime accidents are serious, as opposed to 25% of night-time accidents.³¹

To avoid arbitrarily choosing at exactly what time “the night” begins and ends, the UOCT (Unidad Operativa de Control de Tránsito) for the metropolitan region provided an average timetable for the night programming of traffic lights, which is based on detailed traffic flow studies. This is designed to capture the marked change in traffic conditions from night to day, and provides a useful reference point. Interestingly, the UOCT timetable closely resembles the timetable that maximizes the proportion of drinking drivers.

| | Metropolitan Region | The rest of Chile |
|---|---|---|
| UOCT definition | | |
| Weekdays | 2300 – 0630 | 2130 – 0730 |
| Friday night - Saturday | 2300 – 0900 | 2130 – 0900 |
| Saturday night - Sunday | 2200 – 1000 | 2100 – 1000 |
| Sunday night - Monday | 2100 – 0630 | 2100 – 0730 |
| Definition used in this study | | |
| Weekdays | 2300 – 0630 | 2200 – 0700 |
| Friday night - Saturday | 2300 – 0900 | 2200 – 0900 |
| Saturday night - Sunday | 2300 – 1000 | 2200 – 1000 |
| Sunday night - Monday | 2200 – 0630 | 2200 – 0700 |

The definition used in this study is slightly different to both the UOCT timetable and the timetable that maximizes the proportion of drunk drivers in the accident data. The UOCT timetable has the virtue of capturing a large proportion of drivers, while the maximizing timetable ensures a high proportion of drinking drivers in the data. The definition used in this paper was chosen as a middle path between the two: it stays close to the UOCT timetable (capturing more drivers than the maximizing timetable), while also ensuring a high proportion of drinking drivers.³² As can be observed from the table above, differences between the definition used and the UOCT timetable are slight.

³¹ The last two figures exclude non-passenger vehicles and buses, the final restriction discussed.

The final restriction on our dataset involves excluding trucks, buses, tractors, motorcycles and

bicycles. In short, we examine only car crashes. The logic for this restriction is direct: any crash

involving such a non-car road vehicle is extremely likely to be serious, due to the mass (either

high or low) of the vehicles involved. Thus for these vehicles almost every crash is serious

(exaggerating somewhat), making serious crashes much more numerous and much less related to

poor driver performance such as that caused by alcohol. Moreover, drivers of such vehicles

(excluding bi- and motor-cycles) represent a sub-group of the driver population with entirely

distinct characteristics: they are likely to have more driving experience, drive far greater

distances and for different purposes, and value their licences more highly – hence they are less

likely to be drunk. As this study aims to find an RR for the general driver population, this sub-group represents a confounding influence and is removed.³³

An examination of the effects of these restrictions on the data is available in Appendix 5.

The Alcohol Involvement variable

The measure of alcohol involvement is classified in the following categories:

| Alcohol involvement measure | Threshold |
|---|---|
| Physically unimpaired | BAC = 0 |
| Physically deficient driving condition | 0 < BAC < 0.5 |
| Under the influence of alcohol (DUI) | 0.5 ≤ BAC < 1.0 |
| Intoxicated (alcohol DWI) | BAC ≥ 1.0 |
| Under the influence of drugs | - |

This table also shows the BAC levels that correspond to each category in theory. In practice, the relatively few breathalyzer units (220 in the whole country, 59 in the MR) possessed by Carabineros de Chile are rarely on hand for an exact measure. Instead, the officer’s assessment is used to gauge impairment, the same measure used by Levitt and Porter for their main results. It is likely that this measure is biased, although the extent and direction of the bias is impossible to predict a priori: for example, Carabineros could classify all drivers with traces of alcohol on their breath as drunk, or systematically under-report alcohol involvement, as Lund and Wolfe (1991) report.

³² While the definition used is a middle path between the two, using the UOCT timetable hardly changes the RR estimates, while the max DD timetable raises them slightly.

³³ Levitt & Porter and all other studies also eliminate such drivers, even eliminating taxis, which is not done in this study, although this makes little difference to our results.

A more accurate measure of alcohol involvement is obtained from the blood samples that are legally required for all drivers after every accident involving serious injuries. These tests are carried out by the Metropolitan Region’s Servicio Médico Legal (SML) or State Coroner, and by regional SMLs or ad hoc coroners in more remote areas. The Metropolitan Region (MR) SML covers approximately 35% of all accidents³⁴, and over 70% of random breath test samples. It is the MR SML’s database of rut and BAC data covering the period 1 January 2000 to 30 December 2003 that is used in this paper.

Combining Carabineros and SML data

As the SML database contains only the date the sample was received, BAC and rut, and includes many random breath test samples (i.e. not from an accident), combining it with the Carabineros database is essential: it ties the BAC measures to specific accidents and drivers. However, for a successful match to occur several significant hurdles must be surmounted. Firstly, individual rut data must be correctly collected and correctly entered into the database. For Carabineros and the SML each of these tasks is a formidable obstacle.³⁵ For example, in the case of data entry many illegible numbers result in 8, 7 or even 3 digit rut identification numbers in the database instead of the required 9 digits. Furthermore, poor transportation ensures that many blood samples are broken in transit, resulting in congealed samples and hence missing BAC measures.

In addition to unusable or missing data, the date of the accident and the date the BAC sample was received by the SML are rarely exact matches. This is because the date the sample was received or taken is often the day after the accident, given that many such accidents occur before midnight. Moreover, if the crash victim dies, then several days may elapse before the blood sample arrives at the SML. This is especially true of cases in which the crash victim dies after days or weeks in hospital. In these cases it is usual to have 2 BAC samples: one taken relatively soon after the accident occurred, and one taken near the time of death. The former is preferred in the measure used in this paper, since it is taken closer to the time of the crash.

³⁴ That is, they measure BAC for all dead traffic victims (drivers, passengers and pedestrians) and drivers’ BAC in all accidents involving at least some degree of serious injury. Percentages are from SML data for 2000 and 2001.

³⁵ The fact that the rut identification number is 9 digits long, and must be deciphered from an often illegible (according to the SML Statistics department) hand-written accident or sample data form is especially conducive to error. Outdated, non-21st-century-compliant or error-ridden database software at both the SML and Carabineros de Chile is an additional hindrance.

Thus it should be clear that formidable missing or corrupted data problems hinder a simple and direct merging of the datasets. Moreover, exact date matching cannot be done because of the nature of the data. The final combination of the two databases was the result of a two stage process. The first used the following match criteria: an exactly matching rut and a match on the month and year of the accident. Thus, for a BAC sample to be matched to a particular driver, the SML and Carabineros databases’ recorded ruts had to be the same and both the blood sample and the accident had to be recorded in the same month of the same year.

There is an obvious source of error: the same person might have been involved in two accidents in the same month and year, so that the blood sample is associated with the wrong accident. This is certainly possible, and appears to occur remarkably frequently in the dataset. To tackle this, records in the Carabineros dataset with a lag of more than 3 days between the date of the accident and reception of the sample are removed from the “matched” group, making it extremely unlikely that a retained sample and accident do not belong together.

Another source of error is that by combining the databases by month, we are ignoring all crashes occurring near the end of each month in which the sample was taken or received a few days later, in the following month. To address this a second matching process is used, attempting to match the Carabineros data with SML data using the same criteria as before, but changing the month in the SML data to the month before. This second stage uses only records not matched in the first stage from both databases, to avoid re-assigning a blood sample that has already been matched to an accident.³⁶

³⁶ More refinements to the matching process were in fact used, such as making the match conditional on driver/passenger/pedestrian status, and only using records in the second stage within a 2 week period of each other.
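The two-stage match can be sketched in plain Python. The sketch below is a simplified illustration, not the procedure actually run on the databases. The record layout, field names and example ruts are hypothetical, the 3-day lag limit is applied uniformly in both stages, and the further refinements mentioned in the footnote are omitted.

```python
from datetime import date


def month_key(d):
    """(year, month) of a date."""
    return (d.year, d.month)


def previous_month_key(d):
    """(year, month) of the month before a date."""
    return (d.year - 1, 12) if d.month == 1 else (d.year, d.month - 1)


def match_records(accidents, samples, max_lag_days=3):
    """Map accident ids to sample ids using rut, month and a maximum lag."""
    matched = {}
    used_samples = set()
    # Stage 1 compares the sample's own month; stage 2 shifts it back one month.
    for sample_key in (month_key, previous_month_key):
        for acc in accidents:
            if acc["id"] in matched:
                continue
            for smp in samples:
                if smp["id"] in used_samples or smp["rut"] != acc["rut"]:
                    continue
                lag = (smp["received"] - acc["date"]).days
                same_month = sample_key(smp["received"]) == month_key(acc["date"])
                if same_month and 0 <= lag <= max_lag_days:
                    matched[acc["id"]] = smp["id"]
                    used_samples.add(smp["id"])
                    break
    return matched


# Hypothetical records for illustration only.
accidents = [
    {"id": "A1", "rut": "123456789", "date": date(2002, 5, 10)},
    {"id": "A2", "rut": "987654321", "date": date(2002, 5, 31)},
    {"id": "A3", "rut": "111111111", "date": date(2002, 5, 2)},
]
samples = [
    {"id": "S1", "rut": "123456789", "received": date(2002, 5, 11)},
    {"id": "S2", "rut": "987654321", "received": date(2002, 6, 1)},
    {"id": "S3", "rut": "111111111", "received": date(2002, 5, 20)},
]
matches = match_records(accidents, samples)
print(matches)
```

In this example A1 is matched in the first stage, A2 only in the second stage because its sample arrived in the following month, and A3 is left unmatched because its sample arrived 18 days after the accident.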



This discussion has two main aims. Firstly it has highlighted the difficulties inherent in working

with multiple databases, and the care that has been taken in circumventing them. It is highly

unlikely that matches between SML and Carabineros data that result from this procedure are

spurious, as they have the same recorded rut (a nine digit code!), occur in the same month and

year and the dating discrepancy is at most 3 days for non-fatal blood samples. Nonetheless,

mismatches are possible. No matching process can realistically claim otherwise, given the data.

However, the likelihood of the same driver being involved in two crashes within 3 days of each

other, and having the wrong crash associated with a blood sample is extremely low, making the

error safely negligible.³⁷

The second aim of the discussion is to make clear why the match rate of drivers to SML records

is 25% of all drivers, making up approximately 40% of MR drivers. This figure, although

apparently low, is made up of matches that are to be trusted. Relaxing any of the matching

criteria increases the proportion of records matched, but to the detriment of accuracy. Moreover,

it must be considered that blood samples are taken only in accidents involving serious injuries,

which make up 28.5% of all accidents. Thus the figure of 62,144 uncorrupted blood samples,

although apparently low, makes up a much larger proportion of drivers involved in serious

accidents.

The final numbers of matched driver records from the databases are the following:

| Sample: 2000-2003 | Number | % of all drivers | % of MR drivers |
|---|---|---|---|
| Total matched drivers | 63,967 | 25.0% | 41.9% |
| Total matched drivers with uncorrupted BAC data | 62,144 | 24.3% | 40.7% |

³⁷ In a visual inspection of over 1,000 matches I found no obvious discrepancies in SML and Carabineros recorded dates. A separate error involving duplicate assignment of a single BAC to multiple crashes, though unavoidable, has been identified and eliminated almost completely from the results presented. Any further effects from this problem are negligible.

Biases in the Data

SML data

The fact that we have reliable (i.e. SML) alcohol involvement measures for only a quarter of

crash-involved drivers (all of which are in the metropolitan region) means that the sample is not

directly representative of the country as a whole in one crucial sense: it is highly skewed towards

urban crashes. Only approximately 5% of metropolitan region crashes are classed as having

occurred in rural areas – the same percent as in the sample.

Along other dimensions of the data there is no reason to suspect a systematic bias in the sample,

given that it is composed of those observations that are free from data collection and entry errors

in both databases. By way of example, the number of observations by year that are included in

the sample indicates no bias:

| Year | 2000 | 2001 | 2002 | 2003 | Total |
|---|---|---|---|---|---|
| Un-matched drivers | 44,833 | 50,256 | 46,485 | 50,417 | 191,991 |
| Matched drivers | 14,375 | 15,381 | 17,020 | 17,191 | 63,967 |
| % of year's data | 24.3% | 23.4% | 26.8% | 25.4% | 25.0% |

Along other observable dimensions such as sex³⁸ there are no clear biases, making it possible to

cautiously infer that inclusion in the sample is effectively random within the urban subset of the

data.

The urban bias cannot be directly corrected. However, for the purpose of extrapolating the urban

SML sample to rural areas it is necessary to prove – at the very minimum – that rural crashes are

at least as likely to involve drinking drivers as urban crashes. This is in fact the case: Carabineros

records indicate that 6% of urban crashes involve at least one drinking driver, as opposed to 11%

of rural crashes. Moreover, the percentage of drinking drivers in severe urban car crashes is 9%,

somewhat lower than the rural 12.5% rate. Obviously we cannot be sure how closely matched

urban and rural accident data are in terms of alcohol involvement. With the evidence at hand it

seems that if anything rural accidents involve more drinking drivers than urban crashes. This

permits tentative extrapolation of urban data to rural accidents, but the extent to which this is

justified is unclear.

³⁸ For example, in both matched and non-matched driver groups women make up approximately 13% of the sample.

Under-reporting of fatalities

Before moving on, it is worth noting a bias that is present in all the data, not just the sample: the systematic under-reporting of fatalities by Carabineros, who report as fatalities only those victims that die within 24 hours of the crash. Naturally, many more die in the days following the crash, and these are classed as serious injuries, not deaths, by Carabineros. For example, between January 2000 and the end of 2002 Carabineros recorded 4,974 traffic deaths, while the Ministry of Health put the figure at 6,279³⁹, 26% more than Carabineros. Thus, when considering the drink driving externality we would do well to consider this systematic under-reporting of deaths in the database.

This issue also strengthens the case for examining accidents resulting in both fatalities and

serious injuries, as many of the latter are effectively deaths. Moreover, serious injuries are absent

from US FARS data, making the availability of this data an advantage of this study over studies

using US data.

Under-reporting of Alcohol-Involvement

The combination of the Carabineros and SML databases provides an ideal opportunity to

compare the alcohol measures in each. The SML measure is obviously more trustworthy.

However, it is well to note that even the SML measure is not exact: given that at least one and

often many more hours transpire between the accident occurring and a blood sample being taken,

the sample is a downwardly biased measure. This is because alcohol is eliminated from the

bloodstream at approximately 0.15 BAC units per hour, a rate fast enough to tip many DUI

(0.49



Legally classified as Drunk 253 2,270 2,523

Total 57,202 4,839 62,041

The number of cases in which Carabineros fails to realize that a driver is in fact drunk is

highlighted in red and is larger than those that are accurately identified. The number in blue is the

opposite, upward bias: cases in which Carabineros identifies drivers as legally drunk (DUI or

DWI), when in fact they are not according to the SML. The explanation for at least a third of

these cases is direct: these had alcohol in their blood and their legal alcohol reading is likely to

reflect the delay between the accident and the taking of the blood sample⁴⁰, i.e. they were drunk

at the time of the accident, but by the time the sample was taken their BAC had fallen to

permissible levels.

The extent of the downward bias in the Carabineros alcohol measure (identifying a drunk driver

as sober) is such that Carabineros correctly identified only 47% of drunk drivers. This is

surprising, but not unprecedented. US measures of police under-reporting of alcohol have found

similar figures: 71% correctly identified (Soderstrom et al 1990), 57.1% (Maull et al 1984) and

51.7% (Dischinger et al 1989). Moreover, the Dischinger study found an even lower

identification rate of 28.6% for drivers with a BAC below 1.0, which overlap with our illegal

measures, suggesting that the Chilean rate of 47% is well within the expected range. A final and

more recent reference point is Blincoe et al (2002) which found that in the US state of Maryland

police identified 74% of cases with BAC ≥ 1.0 and only 46% of cases where BACs were between

1.0 and zero.

A closer look at the misclassified cases suggests some explanations: compared to correctly

identified drivers, double the proportion (15%) of those mistakenly reported as sober suffered

either fatal or very serious injuries, and approximately a quarter suffered serious injuries. Thus Carabineros probably did not have close access to these drivers, as they may have been immediately transported to hospital. The remaining 75% of misclassified drivers pose

a problem: how were they misclassified given that they are overwhelmingly (70%) registered as

DWI (BAC>1), rather than the less serious DUI (0.49



explanation is a combination of Carabineros’ lack of equipment and training coupled with the fact that detection of even quite severe impairment (BAC > 1) might be somewhat harder than it appears.

In terms of rural alcohol under-reporting the SML data cannot provide many cases. Nonetheless,

some areas of the Greater Santiago Metropolitan region are effectively rural, and the alcohol

detection table closely parallels that for the whole MR:

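Rural areas, 2000-03 (rows: Carabineros data; columns: SML data)

| Carabineros data | SML: Legally Sober (BAC < 0.5) | SML: Legally Drunk | Total |
|---|---|---|---|
| Legally Sober (BAC < 0.5) | 2,881 | 164 | 3,045 |
| Legally Drunk | 24 | 179 | 203 |
| Total | 2,905 | 343 | 3,248 |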

This provides some support for our extrapolation of SML data to the whole country. However, as the data collection is exclusively by MR police precincts, it is possible that the biases in the rural accident data are completely different to those in the MR.
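The comparison between the rural and whole-MR detection tables comes down to one ratio: the share of drivers found legally drunk by the SML whom Carabineros also classified as drunk. The sketch below computes it from the figures in the two tables. It assumes that, in both tables, rows are Carabineros classifications and columns are SML classifications. The function name `detection_rate` is an assumption.

```python
def detection_rate(drunk_in_both, drunk_per_sml):
    """Share of SML-confirmed drunk drivers also classed as drunk by Carabineros."""
    return drunk_in_both / drunk_per_sml


rural_rate = detection_rate(drunk_in_both=179, drunk_per_sml=343)
mr_rate = detection_rate(drunk_in_both=2270, drunk_per_sml=4839)
print(round(rural_rate, 3), round(mr_rate, 3))  # 0.522 0.469
```

The rural rate of about 52% is close to the 47% found for the MR as a whole, which is the sense in which the two tables parallel each other.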

Hit and Run accidents

So-called “hit and run” accidents are cases in which one or more drivers flee the scene of an

accident, resulting in missing alcohol-involvement data for those drivers. The evidence

available41 suggests that these drivers have high BACs (which is perhaps why they flee) and thus

are of direct relevance to our study, constituting yet another source of downward bias in our

parameters. However, any assumption made about such a group would be arbitrary, and these cases have

thus been ignored.

41

The Compton et al study pursued hit-and-run drivers, apprehending 94 of 603. Of these, over 69% had positive BACs, typically at high levels. Moskowitz et al 2002 report that police in La Puente, California made an effort to

apprehend hit-and-run drivers, and that 65% of those apprehended had positive BACs. However, those caught

are not necessarily representative, as higher BACs may make successful escape less likely.



6 – Results

The 2000-2003 database was used to identify the following two-car night-time crashes causing

serious injury or death:

Period: 2000-2003                           drunk-sober crashes   drunk-drunk crashes   sober-sober crashes
Carabineros data 2000-Sept2004              200                   34                    539
SML data (reduced sample)                   81                    15                    72
SML data (extrapolating for whole sample)   317                   59                    282

In terms of the model the following results were obtained:

Period: 2000-2003                           θ         N         % of drinking drivers on the road
Carabineros data 2000-Sept2004              -         -         -
SML data (reduced sample)                   3.8       0.23      18.96%
                                            (2.35)    (0.094)
SML data (extrapolating for whole sample)   3.8       0.23
                                            (1.181)   (0.047)

It is immediately evident that the Carabineros data by itself is of no use in estimating the RR

parameter, θ: the data is simply incompatible with the binomial distribution - no parameters exist

that could have generated the data. This is the direct result of the downward bias in the

Carabineros alcohol measure, and makes the model unworkable.

Moreover, this incompatibility with the model is not due to a failure of the equal mixing assumption:

the problem here is that the numbers of both Drunk-Sober and Drunk-Drunk crashes are

extremely low, not just that Drunk-Sober crashes are too few for the model to function.
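The incompatibility described above can be checked from the crash counts alone. The sketch below is not taken from the thesis: it assumes the standard two-type, equal-mixing setup of Levitt and Porter, in which the crash ratio R = DS²/(DD·SS) equals (θ+1)²/θ, so that a real solution for θ requires R ≥ 4. It also reads N as the ratio of drinking to sober drivers, an interpretation that is consistent with the 18.96% share reported in the table. The function names are hypothetical.

```python
import math


def crash_ratio(ds, dd, ss):
    """R = DS^2 / (DD * SS), computed from two-car crash counts."""
    return ds ** 2 / (dd * ss)


def estimate(ds, dd, ss):
    """Return (theta, ratio, share) or None if no real solution exists.

    ASSUMPTION: under equal mixing R = (theta + 1)^2 / theta, which gives
    theta^2 - (R - 2) * theta + 1 = 0; the larger root is the relative risk.
    ratio is drinking drivers per sober driver; share is their fraction of all drivers.
    """
    r = crash_ratio(ds, dd, ss)
    if r < 4:
        return None  # no theta can generate counts with R below 4
    theta = ((r - 2) + math.sqrt(r * r - 4 * r)) / 2
    ratio = math.sqrt(dd / (ss * theta))
    share = ratio / (1 + ratio)
    return theta, ratio, share


if __name__ == "__main__":
    samples = {
        "Carabineros": (200, 34, 539),
        "SML (reduced sample)": (81, 15, 72),
    }
    for name, (ds, dd, ss) in samples.items():
        print(name, round(crash_ratio(ds, dd, ss), 2), estimate(ds, dd, ss))
```

With the counts from the table, the Carabineros data give an R of roughly 2.2, below the threshold, while the reduced SML sample gives an R of roughly 6.1 and a θ close to 3.8. The sketch yields point estimates only and says nothing about the standard errors.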

The more trustworthy, although still somewhat downwardly biased, SML data do provide a large

enough value for R (R ≥ 4) and yield an estimate of θ of 3.8. That is, drunk drivers are 3.8 times

more likely than sober drivers to cause a crash involving serious injury or death42. However, the

standard error in the penultimate row of the table is too large for the parameter to be statistically

significant. This is because in this case we are using only those cases in which we have a direct

42

Using the Compton et al RR curve suggests an average BAC for Chilean night-time car drivers of approximately

0.95, similar to the average BAC for dead drivers, 0.75. However, here we are deriving a RR of causing a severe or

fatal accident, while the Compton et al curve is for accidents of all severities. As alcohol is implicated to an

increasing extent as BAC rises, we can infer that the average Chilean BAC implied by the RR estimate is above 0.95.



BAC measurement, and as this is only 25% of the drivers in the sample, the variance of the

estimate is too large. Extrapolating (saying, effectively, that our sample is representative, and

simulating the data) provides a much lower s.e. – though of course this is a mere simulation.

It is important to note the context of this result. In addition to the data problems that have resulted

in our having to use a much reduced sample – the SML data – this is a result that has been

estimated for Chilean car drivers, at night. Even setting aside the small sample size, which leaves the

result statistically insignificant, it cannot strictly be interpreted as applying to drivers of other vehicles,

or as applying in the daytime. Since drink-driving is principally a night-time

phenomenon, the latter caveat is not as serious as it first appears. However, the fact that we cannot be

sure that the result applies to vehicles other than cars (trucks, buses etc.) is more serious if the result is to

inform policy, given that such vehicles are involved in 38% of serious night-time

crashes.

What the result would provide us with – if standard errors were not a problem – would be a

trustworthy relative risk estimate of a fatal or serious injury causing crash for night car drivers,

which make up the vast majority of night drivers, and an even larger proportion of potential

night-time drivers. As such, this result would be of great value to driver education programs and

would be of substantial use in determining penalties for night-time drink driving. At present it

would appear that no relative risk calculation is involved in the penalty-setting process.

Estimating the parameters taking Under-Reported Alcohol-Involvement into account

It should be clear that the Carabineros dataset by itself cannot yield a parameter estimate.

However, if the degree of alcohol under-reporting can be estimated, then an approximate RR

parameter can be estimated for the whole dataset, and not just the SML sample.

To estimate the extent of alcohol under-reporting we focus on the cases where we have both SML

and Carabineros data on alcohol. These are almost all (157 of 164) the cases used in the SML-

only estimation of θ in the preceding section. These cases are then examined to determine the



extent of the downward bias in the alcohol measure, as we have the more accurate SML measure

at hand43.

There are two polar cases by which a biased alcohol measure could affect the number of cases in

each of the three crucial collision categories: Drunk-Drunk (DDC), Drunk-Sober (DSC) and

Sober-Sober crashes (SSC). The first is the least plausible: if a police officer misreports one

driver (as being sober when he is in fact drunk) at a particular crash site, then he will never

misreport the other. This polar case lowers the value of R44 and hence cannot

serve as a useful guide, as it makes estimating θ even more difficult. The other case is when a

police officer has perfectly correlated reporting errors: if he misreports one driver at a crash then

he always misreports the other. This increases the value of R because in this case the number of

SSC in the data is artificially inflated: many SSC are actually DSC or DDC.

While the latter scenario (correlated police reporting errors) is more plausible it is unlikely to

hold fully: sometimes a police officer will misreport only one of the two drivers at a particular

crash. An examination of the data reveals this to be exactly the case: there are approximately the

same number of cases in which Carabineros misreport one driver and not the other as cases where

both are misreported. However, if we assume that reporting errors are perfectly correlated then

we under-estimate the downward bias in the alcohol measure caused by misreporting: if we

assume an intermediate correlation of reporting errors then the downward bias required to obtain

the same error-laden data is greater.

Thus, in the simulation below we have assumed perfectly correlated reporting errors. Under this

assumption, the downward alcohol reporting bias required to generate the Carabineros alcohol

measure for those 157 cases in which we have the more accurate SML measure is 51%. That is, if

51% of drunk drivers are misclassified as sober, and reporting errors are perfectly correlated, then

real accident data like the 157 cases under consideration will be reported as Carabineros has

reported them. It is important to note the earlier point: this is the lowest possible value for the

43

By themselves they deliver an R of 1.73, well below the minimum level for estimation.

44

The key issue is whether Drunk-Drunk collisions are shifted to the Sober-Sober or Sober-Drunk categories. If a

police officer never misreports both drivers then all misreported DDC are reported as being DSC, and thus the

number of DSC is artificially inflated and must be lowered, reducing R.



downward bias in the alcohol measure, and the evidence indicates that the assumption of

perfectly correlated reporting errors does not hold. In all probability the downward bias is higher.

Using this degree of downward bias, and applying it to the 2000-September 2004 dataset, we

obtain the following simulated parameter values, with simulated s.e. in parentheses:

Carabineros data 2000-Sept2004
Extent of downward bias   Implied θ   Implied N
32%                       1.2         0.31
                          (6.89)      (0.13)
40%                       2.8         0.23
                          (1.11)      (0.773)
51%                       6.0         0.20
                          (1.975)     (0.001)
60%                       13.6        0.18
                          (6.2)       (0.0005)

Thus, if the degree of downward bias in the alcohol measure is in fact 51% and Carabineros

misreporting is perfectly correlated, the implied θ is 6: drinking drivers are on average 6 times more

likely45 to cause a fatal or serious injury causing crash than sober drivers in Chile. If the downward

bias is higher, then the relative risk (θ) will also be higher, as can be seen from the table, and vice versa.

45

According to the Compton et al. RR curve for US drivers, this suggests that the average BAC for Chilean

night-time car drivers is above 1.15.
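The logic of the bias correction can be illustrated with a deterministic calculation. This is a sketch under stated assumptions, not the simulation used in the thesis: it assumes that, with perfectly correlated errors, a fraction equal to the bias of all truly Drunk-Sober and Drunk-Drunk crashes is recorded as Sober-Sober, and it reuses the assumed equal-mixing relation R = (θ+1)²/θ. The function names are hypothetical. Applied to the Carabineros counts (200, 34, 539), it gives point estimates close to those in the table, but it does not produce the simulated standard errors.

```python
import math


def undo_correlated_misreporting(ds, dd, ss, bias):
    """Recover 'true' crash counts from reported ones.

    ASSUMPTION: with perfectly correlated errors, a fraction `bias` of all
    crashes involving a drunk driver is recorded as sober-sober.
    """
    keep = 1.0 - bias
    true_ds = ds / keep
    true_dd = dd / keep
    # Crashes wrongly recorded as sober-sober are removed from that category.
    true_ss = ss - (true_ds - ds) - (true_dd - dd)
    return true_ds, true_dd, true_ss


def theta_and_ratio(ds, dd, ss):
    """Relative risk and drinking-to-sober driver ratio, or None if R < 4."""
    r = ds ** 2 / (dd * ss)
    if r < 4:
        return None
    theta = ((r - 2) + math.sqrt(r * r - 4 * r)) / 2
    return theta, math.sqrt(dd / (ss * theta))


if __name__ == "__main__":
    reported = (200, 34, 539)  # Carabineros DS, DD, SS counts
    for bias in (0.32, 0.40, 0.51, 0.60):
        corrected = undo_correlated_misreporting(*reported, bias)
        theta, ratio = theta_and_ratio(*corrected)
        print(f"bias {bias:.0%}: theta = {theta:.1f}, N = {ratio:.2f}")
```

The steep rise of θ with the assumed bias shows how sensitive the relative risk estimate is to the degree of under-reporting.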



Taking Unequal Mixing into account

As noted in the Methodology section, violations of the equal mixing assumption (i.e. cases in which drivers

have proportionately more interactions with other drivers of their own type) bias θ downwards

and N upwards. Estimating the model for different values of the parameter Δ46 (reflecting degrees

of unequal mixing) we obtain the following table:

Δ      θ (relative risk)   N        % of drinking drivers
0      3.81                0.23     19.46%
       (2.35)              (0.09)
0.1    4.63                0.21     17.05%
       (2.48)              (0.07)
0.2    5.54                0.18     15.26%
       (2.68)              (0.06)
0.5    8.19                0.13     11.71%
       (3.34)              (0.04)

As can be observed in the table above, as the Δ we use rises the estimated value of θ increases

(by 26% in moving from Δ=0 to Δ=0.1) and the value of N falls. A table containing more values of Δ

is available in the appendix. We cannot know which value of Δ most closely approximates

the degree of unequal mixing found in practice, but as Δ rises our estimated θ becomes

statistically significant.

An attempt to choose the value of Δ that best fits the data by maximizing the value of the

likelihood function (V) with respect to Δ47 was unsuccessful: the difference between the highest

and lowest values of V (each of which is the probability of observing the sample) using a

thousand values of Δ between 0 and 0.6 is a mere 0.000000001730901. Moreover, because a

numerical optimization method is being used, this difference shrinks as the number of iterations

is increased. In short, a graph of V (on the Y axis) against Δ would be flat, implying that

this method is of no practical use, and that the true value of Δ must be approximated by some

other, perhaps experimental, method.

The Alcohol-Involvement of Dead Pedestrians and Dead Drivers

See Appendix 6 for details.

46

A Δ of 0.1 implies that drunk drivers are 10% more likely to interact (not crash, just interact) with other drunk

drivers than with sober drivers.

47

This is because the value of the likelihood function (V) represents the probability of observing this particular

sample. Hence the value of Δ that results in the highest value of V maximizes this probability.



7 – Estimating the External Costs of Drink-Driving

What counts as an externality?

The debate as to the policy relevant costs of alcohol is one of long standing, with estimates of the

total annual social cost of US alcohol use ranging from USD 9.3bn to over 130bn (Heien 1995-6).

Much of the debate revolves indirectly around the issue of whether abuse of alcohol is

appropriately considered rational behaviour.

The Becker & Murphy theory of rational addiction holds that addiction is in no way conclusive

evidence of irrational behaviour – in fact it can be rational. Habits and addictions (extreme habits)

stem from consumption preferences being connected intertemporally. Several effects are in play,

among them what is termed 'reinforcement', where past consumption of the good increases the

marginal utility of present consumption. In addition, the good itself may raise already high

discount rates48, rationally transforming a habit into an addiction. In short, a coherent and

empirically successful theory exists to justify the claim that consumers are rational when they consume

alcohol in 'excess'.

Standard economic theory holds that the relevant social costs of alcohol-involved driving are only

those that can be classified as spillovers. If drinking drivers kill themselves while driving,

causing no other harm, then there is no social cost involved. If their passengers die, those deaths are not

externalities either, as passengers exercise their free will and internalize the risk in choosing to ride

with such a driver49. Moreover, drinking drivers, even those who die, receive positive utility from

their consumption choices ex ante.

Some economists have argued that a rigid adherence to consumer sovereignty (the consumer is

rational and takes an optimal consumption path – who are we to say what is best for him?) is not

convincing in this case, and that strict externalities do not capture the full social cost. Pogue and

Sgontz (1989) argue that the optimal tax on alcohol rises markedly if alcoholism is considered to

48

People with high discount rates are more likely to develop habits that may become addictions as they weigh the

future risk of becoming an addict less heavily.

49

Heien notes that Perrine et al 1988 indicates that 83.3% of the passenge
