DOCUMENTO DE TRABAJO (Working Paper)
Instituto de Economía
TESIS DE MAGÍSTER (Master's Thesis)
www.economia.puc.cl
One for the Road - Estimating the Drunk-Driving Externality in Chile
William Mullins.
2004
Economics M.Sc. Thesis – William Mullins – 2nd Semester 2004
One for the Road – Estimating the Drunk-Driving
Externality in Chile
Economics M.Sc. Thesis in Public Policy
William Mullins
Drink-driving is a classic negative externality. Nonetheless, it has failed to attract
economic attention in Chile. This study estimates the relative risk of drunk drivers causing
serious accidents¹, and the aggregate externality generated by drunk-driving in Chile.
As an epidemiological phenomenon drunk-driving warrants attention: between the ages of 10 and
45 it is the joint second highest ranking cause of death for Chileans. Conaset puts the number of
accidents caused by drink-driving (DD) between 2001 and 2003 at 8,137, in which 472 people
died and 2,240 were seriously injured. However, such estimates lack a clear methodological
grounding, confusing the simple presence of alcohol with a causal role. This study aims to
separate alcohol's causal effect from the baseline serious accident risk faced by all drivers.
The methodology used in this study is taken from Stephen Levitt and Jack Porter's 2001 JPE
article “How Dangerous are Drinking Drivers?” They find that drinking drivers in the US
(including those not legally classified as drunk) are at least 7 times more likely to cause a fatal
crash than sober drivers (θ ≥ 7), while for legally drunk drivers θ ≥ 13. This paper estimates the
same two parameters for Chile: a lower bound for this relative risk and an upper bound for the
proportion of drunk drivers on the roads. Together they allow an approximate calculation of the
aggregate externality caused by DD in Chile.
Levitt and Porter estimate the lower bound of the aggregate US DD externality associated with
lost lives (no other costs are considered) to be around USD 9 billion in 1993. Chile's only
approximately comparable estimate is from a study commissioned by the Ministry of Public
Works (MOP) from the consulting firm CITRA in 1996. It estimates that the total annual cost
of all road accidents in Chile is around USD 600-700 million, or between 7 and 8 percent of the US
external cost estimate for drink driving alone. This number is used within government as the sole
basis for public investment proposals, which gives it a policy importance far beyond that of
most studies. This paper aims to provide a more rigorous estimate of the external costs, which will
put the magnitude estimated by CITRA into perspective.
¹ Serious accidents are defined as accidents that result in at least one death or serious injury.
The study begins with a review of the theoretical issues that bear on drunk-driving, and follows
with a review of the evidence on alcohol and crash risk. Sections 4, 5 and 6 detail methodology,
data and results respectively. Section 7 considers which deaths and serious injuries are rightly
classed as externalities, and calculates the aggregate spillover. Section 8 concludes.
2 – Economic Theory and Drunk Driving
General Considerations
The accident literature in the US and West European countries often prefaces its remarks on
alcohol with comments such as “alcohol consumption is involved in x% of fatal crashes” and
conveys the impression that alcohol causes all accidents it is “involved” in. However, without an
estimate of the number of drunk drivers on the roads, this figure is meaningless – if the same
percentage (x%) of drivers have been drinking then alcohol is no more a crash risk factor than
orange juice. This tendency to demonize alcohol in terms of its crash causation must first be laid
aside if we are to consider objectively the external cost of drink driving in Chile.
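The point about exposure can be made concrete with a small calculation. The sketch below is illustrative only: the function name `relative_risk` and the shares passed to it are hypothetical, not figures from this study. It compares the share of drinking drivers among crash-involved drivers with their share among all drivers on the road.

```python
def relative_risk(share_in_crashes, share_on_road):
    """Crash involvement rate of drinking drivers relative to sober drivers.

    Both arguments are fractions strictly between 0 and 1 (hypothetical inputs).
    """
    # Crashes per drinking driver are proportional to share_in_crashes / share_on_road;
    # crashes per sober driver to (1 - share_in_crashes) / (1 - share_on_road).
    drinking_rate = share_in_crashes / share_on_road
    sober_rate = (1.0 - share_in_crashes) / (1.0 - share_on_road)
    return drinking_rate / sober_rate


# Same share in crashes as on the road: alcohol adds no risk (the orange juice case).
same_share = relative_risk(0.30, 0.30)
# Same crash share, but only 5% of drivers on the road have been drinking.
small_exposure = relative_risk(0.30, 0.05)
```

With identical shares the ratio is one; the same crash share combined with a much smaller share of drinking drivers on the road implies a relative risk several times larger. This is why the "involved in x% of crashes" figure cannot be interpreted without an exposure estimate.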
Moreover, drinking is only one risk factor among many. As Borkenstein et al. note in their
seminal 1974 study:
“traffic accidents are the result of interactions among drivers, vehicles and the physical environment. No
single cause of traffic accident exists. It is not possible to consider a separate element of the accident
complex in the abstract. These elements operate only in the context of the remaining elements.”²
Speeding is an example of one such “remaining element.” It also increases the relative risk of
crashing, and to an extent comparable to drink driving: “driving 65 mph when the speed limit is
55 mph increases risk of involvement in a fatal crash by a factor of 2.0, similar to the risk
increase associated with driving with BAC = 0.08% compared to driving at BAC = 0.”³
Moreover, drivers below the US legal limit of 0.08% are also extremely dangerous, making up
70% of drivers with a measured BAC in the 2002 US Fatality Analysis Reporting System (FARS)
data. In the 2000-2004 Chilean data, approximately 78% of drivers involved in accidents
resulting in serious injuries or deaths are recorded as being sober. While this is an over-estimate,
as will be discussed, most accidents are not caused by alcohol.⁴ What is also true, however, is that
alcohol, and perhaps speeding – uniquely among crash risk factors – are perceived by lawmakers
to be particularly reckless ways to endanger the lives of others, and are directly chosen by the
drivers involved. As a result, the law assigns property rights to sober, non-speeding drivers. Other
significant risk factors such as sex and age also increase relative serious crash risk, with young
men unsurprisingly emerging as the highest risk group: Levitt and Porter report that sober drivers
under 25 years old pose a fatal crash risk 2.78 times greater than sober drivers over 25, while the
comparable sober male-female relative risk is 1.36.
² Borkenstein et al. (1974), p17
³ L. Evans (2004), Ch 10. The WHO report (2004) cites similar figures (Ch 3, p77).
⁴ L. Evans (2004) notes that even if alcohol miraculously disappeared from the roads, 66% of US fatalities in 2002
would remain.
Public policy cannot, of course, focus on removing male drivers from the roads. It focuses instead
on reducing drink driving and speeding. Moreover, alcohol and speeding dwarf other risk factors
in terms of the magnitude of the increase in relative risk they provoke. At the Chilean legal DWI
(Driving While Intoxicated) limit (BAC 0.1%) a driver has a relative crash risk of 4.79, while at
BAC 0.2% it is approximately 82 times that of a sober driver.⁵
A model of the consumption of risky goods
Thus it should be clear that alcohol is not the sole cause of the devastation often caused by traffic
accidents. Driving is a dangerous activity per se, in the same way that extreme sports are
dangerous activities: they increase the risk of death.
A simple model, developed in Rosen (1981), formalizes how agents determine their optimal
consumption of risky goods (those that increase risk of death) and beneficial goods (those that
reduce risk of death). Define the probability of surviving a single period as q, and utility
conditional on survival as $U(C_1,\dots,C_n)$ for the n available consumption goods. If we consider
that consumption of certain goods (such as drunk driving) can affect survival probability we can
write $q = q(a_1C_1,\dots,a_nC_n)$, where $a_1,\dots,a_n$ are non-negative constants. For a good whose
consumption reduces survival probability the partial derivative $q_i$ is negative. If we assume a
budget constraint of $Y = \sum_j p_j C_j$ and maximize expected utility $q(\cdot)\,U(\cdot)$ we obtain the
following optimality condition⁶:
$$\frac{U_i}{U_n} = \frac{p_i}{p_n} - a_i q_i V, \qquad V = \text{value of a statistical life}$$
⁵ Compton et al. (2002), p42
The relevant point here is that the rational consumer 'self-regulates', in Rosen's words. If good i
is drunk driving (assumed to reduce survival probability by increasing crash risk) then qi is
negative, making the entire second term positive. This indicates that the ratio of marginal
consumption utilities must be higher than in the case where consumption of good i does not affect
risk. In short, this model illustrates the fact that a rational agent takes into account all risks to
himself: he consumes less of good i given its negative health effects. What the model omits is the
fact that when this agent crashes while “consuming” drunk driving, the risk is borne in part by his
passengers and himself, and in part by unfortunate pedestrians or occupants of other cars. The
risk to these others constitutes a negative externality, and is not factored into the consumption
decision of our drinking agent.
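The self-regulation result can be illustrated numerically. The sketch below is not Rosen's specification: it assumes a hypothetical linear survival function q = 1 - a·C and a hypothetical utility U = sqrt(C_risky · C_other), and simply grid-searches the budget line for the bundle that maximizes expected utility q·U. All names and parameter values are illustrative.

```python
def best_risky_consumption(a_risky, income=10.0, p_risky=1.0, p_other=1.0, steps=1000):
    """Grid-search the expected-utility-maximising amount of the risky good.

    Assumed forms (hypothetical, for illustration only):
    survival q = 1 - a_risky * c_risky, utility U = sqrt(c_risky * c_other).
    """
    best_value, best_c = -1.0, 0.0
    for k in range(1, steps):
        c_risky = (income / p_risky) * k / steps
        c_other = (income - p_risky * c_risky) / p_other  # budget constraint
        survival = max(0.0, 1.0 - a_risky * c_risky)
        expected_utility = survival * (c_risky * c_other) ** 0.5
        if expected_utility > best_value:
            best_value, best_c = expected_utility, c_risky
    return best_c


no_risk = best_risky_consumption(a_risky=0.0)     # good does not affect survival
with_risk = best_risky_consumption(a_risky=0.05)  # good lowers survival probability
```

Under these assumptions the agent consumes less of the good once it carries a survival cost, which is the private self-regulation described above. Nothing in the agent's problem accounts for the risk imposed on third parties.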
This model can also be used to highlight the offsetting effects that result from rational
consumers' reactions to any change in road safety, something that should not be overlooked in
any cost study such as this. Consider a change that makes driving safer, such as the introduction
of superior safety technology (e.g. crumple zones, airbags)⁷, or better enforcement of drunk
driving laws. The latter reduces the danger of interacting with drunk drivers at night –
who are present in greater proportions than in daytime hours – and ceteris paribus reduces the
overall risk of night driving. This safer driving environment should induce more night-time
driving. In terms of the model, if night driving is good j, then $a_j$ will fall (as night driving is less
dangerous per unit) and the total amount of night driving by sober drivers will rise. If, as has been
supposed, night driving is an activity that reduces the agent's survival probability⁸, then the
reduction in fatal accidents resulting from superior enforcement of traffic laws will be partially
offset by more accidents caused by sober drivers' increased night driving. Thus any study that
purports to show the lives saved if drink driving were eliminated is implicitly holding offsetting
activity by other drivers to zero, leading to an overestimate of the benefits of such an outcome⁹.
⁶ See Appendix 1 for the derivation of this result and of V, the value of a statistical life.
⁷ Peltzman (1975) notes that “safety regulation has had no effect on the highway death toll…[it] may have increased
the share of this toll borne by pedestrians and increased the total number of accidents” (p677). This is because new
safety devices have resulted in responses from drivers – such as riskier or faster driving – almost completely
offsetting the increase in safety brought about by regulation.
⁸ This is a reasonable assumption: “in times of economic growth, traffic volumes increase, along with the number of
crashes and injuries…reductions in alcohol-related crashes have also been observed to coincide with periods of
economic depression” (WHO 2004, p72).
The Economic Issue
The economic issue at the heart of this paper is the negative accident externality generated by
drink driving (DD). The externality – defined as a net cost to other members of society not borne
by the causing agent¹⁰ – results from the higher crash risk of drinking drivers relative to sober
drivers. Crashes often involve third parties (other drivers, passengers, pedestrians) or their
property, and given that the law assigns “property rights” over the road to sober drivers, a higher
crash risk causes a negative spillover effect¹¹. This is not to say that only drunk drivers crash –
we all face a risk of crashing when driving, a risk that depends on numerous characteristics such
as tiredness, age, experience, and road conditions. This is termed the baseline crash risk. The
negative externality caused by drink driving is the additional crash risk beyond the baseline level.
If drinking drivers do not bear the full cost of their actions (because they are not required to or
cannot fully compensate their victims) then they will choose an individually optimal amount of
drink driving that is excessive (and thus inefficient) from society's viewpoint: for their marginal
units of drink driving the cost to society is greater than the benefits obtained by such drivers. In
the Rosen model above this can be seen by noting that the agent considers only the impact that
consuming DD will have on his own health, not on that of others. This is the economic reason behind
the legal penalties for drunk driving: an optimal tax reduces the individually optimal amount of DD
so that it coincides with the socially optimal amount, by forcing the driver to internalize the costs
of his dangerous driving.
⁹ However, the Rosen model also shows that the marginal willingness to pay for small changes in $a_i$ is
$\frac{dY}{da_i} = q_i C_i V p_n$. Thus even if a complete offset ensures that P(survival) does not change, the
willingness to pay may be positive and large, making the exercise worthwhile.
¹⁰ This definition should include the caveat that another agent's actions do not constitute an externality if they change
market prices (this is a pecuniary 'externality' and generates no inefficiency). A related definition links spillovers to
the absence of functional markets.
¹¹ Both drivers are equally responsible for the accident from an economic point of view: were either of them to have
stayed at home, the accident would not have occurred. It is the legal definition of property rights that establishes the
blame with one party; such is the case with alcohol-involved driving.
Its external effects notwithstanding¹², drink driving, like speeding, creates private benefits for
the drivers involved because its avoidance can be costly in terms of time or money. Indeed, this
discussion is not intended to make the point that drunk driving should be eliminated: it is possible
that the socially optimal amount of drink driving is not zero – and the fact that the legal BAC
limit is 0.05% and not zero is a testament to this¹³.
Does Insurance make a (theoretical) difference? Do Private Lawsuits?
The issue of insurance is important and must be considered: if a person injured by a drunk driver
is insured then does an external cost exist? The answer: almost certainly, as only if the drunk
driver is successfully sued by the victim will the externality be fully eliminated. A system with
perfectly defined and enforced property rights would ensure this, but as most authors consider
that the probability of a successful private suit is low in the US, it can be confidently assumed
that it is even lower in Chile. Moreover, even in the most favourable case in which the private
suit is successful and substantial damages are awarded, it seems unlikely that any financial
compensation can fully restore the utility lost by dying – the agent himself has disappeared. If the
basic unit of society is held to be the household the question becomes: can money fully replace a
lost family member? While the answer depends on the dead individual, some uncompensated
external cost must surely remain, whatever the payout.
Another relevant limit to the role of private legal suits is that the wealth level of the driver is a
binding upper limit to judicially dictated compensation. Given that most plausible estimates of
the statistical value of life in Chile range from 0.3 to 1.4 million USD, the average drinking
driver is in no financial position to fully compensate the victim(s).
¹² Driving involves continuous interaction with other drivers, making it rife with non alcohol-related externalities,
most notably congestion and accident spillovers. These refer to the fact that an additional driver adds to the overall
congestion level, increasing the travel time of all drivers, while also increasing the general accident risk.
¹³ If the optimal amount of DD is in fact zero then any tax above the marginal damage that DD causes will attain the
optimal internalizing outcome.
In short, insuring victims does not solve the problem posed by the DD externality, as the social
cost of the activity remains above the social benefits despite the existence of insurance¹⁴.
Moreover, the possibility of effective private lawsuits does not provide the necessary deterrence.
Hence, a potential drunk driver may be under-deterred by such a system.
Optimal Law Enforcement
The 'optimal tax' that internalizes the spillover is a deceptively simple term for what is in fact a
complex instrument made up of two broad policy tools: the penalty paid when an offender is
apprehended and the probability of detection or apprehension.
The standard textbook solution to a negative externality is the Pigou tax, in which the probability
of detection of an “offence” (p) is approximately one and the optimal fine (penalty) that offenders
face is equal to the marginal damage caused by their actions. However, in the real world, the cost
of a p approximately equal to one is likely to outweigh the damage done by DD: it would require
huge expenditures on police and surveillance equipment, and severe violations of individual
liberties.
The economic theory of law enforcement (see Polinsky & Shavell, 2000) makes use of an
intuitive and simple result: for risk neutral agents a combination of a high p and a low penalty
(assume it is a fine) results in the same level of deterrence as a low p, high fine combination. As
it is costly to catch offenders (i.e. Drinking Drivers, DD) with a high probability, the latter
combination is a more cost-effective way to generate deterrence.
Moreover, deterrence in this context is exactly what is required: if it is set at the right level, it
makes the expected penalty the DD will have to pay equal to the harm caused to society by their
externality. If we define $F^*_{RN}$ as the optimal fine for a risk neutral DD, and h as the harm he
does to society, then the following equation illustrates the efficient solution in a static context:
$$p\,F^*_{RN} = h, \quad \text{i.e.} \quad F^*_{RN} = h/p$$
¹⁴ Moreover, liability insurance for drivers removes even the slight deterrent effect of possible lawsuits from victims.
An accident caused by a DD also imposes costs on society as a whole, such as the (judicial and
police) costs of imposing the fine (k) and those of investigating and prosecuting the accident (s).
Moreover, given that many cases do not result in fines, we must also include the probability that
a fine will be imposed as a result of the prosecution stage (q). Incorporating these costs into the
model results in a new, larger optimal fine, as drunk drivers also generate these costs in addition
to the direct externality:
$$F^*_{RN} = \frac{h}{pq} + \frac{s}{q} + k$$
However, if p were slightly reduced from the level that generates the equality above, then no first
order social costs would ensue, as the marginal drunk drivers induced to drive because of the
change generate only slightly higher social costs than benefits. The advantage of reducing p,
however, is that enforcement costs are saved. Thus, with costly enforcement, some
under-deterrence is optimal (i.e. in the simple version $p\,F^*_{RN} < h$). How much p should be lowered
depends on the balance between the savings in enforcement and the costs of under-deterrence.
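The two fine formulas above can be evaluated directly. The following is a minimal sketch: the function name, argument names and all numerical values are hypothetical, chosen only to show how a lower detection probability and the additional enforcement costs raise the fine that a risk-neutral offender must face.

```python
def optimal_fine(harm, p_detect, p_fined=1.0, prosecution_cost=0.0, fine_cost=0.0):
    """Fine for a risk-neutral offender: F = h/(p*q) + s/q + k.

    harm = h, p_detect = p, p_fined = q, prosecution_cost = s, fine_cost = k.
    With the defaults (q = 1, s = k = 0) this reduces to F = h/p.
    """
    if not (0.0 < p_detect <= 1.0 and 0.0 < p_fined <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return harm / (p_detect * p_fined) + prosecution_cost / p_fined + fine_cost


# Hypothetical numbers: harm of 1000, one offender in ten detected.
simple = optimal_fine(harm=1000.0, p_detect=0.1)
# Same case, but only half of prosecutions end in a fine, with s = 50 and k = 20.
with_costs = optimal_fine(harm=1000.0, p_detect=0.1, p_fined=0.5,
                          prosecution_cost=50.0, fine_cost=20.0)
```

In the hypothetical case the simple fine is ten times the harm, and it roughly doubles once the probability of a fine being imposed and the enforcement costs are included. The sketch does not capture the under-deterrence argument, which requires a model of how enforcement costs vary with p.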
3 – How Alcohol Affects Driver Risk – a Review of the Literature
The effects of alcohol on drivers can be usefully divided into three main categories: survivability,
performance, and behaviour¹⁵. Survivability refers to the fact that vehicle occupants with positive
BAC are more likely to die from the same physical impact than occupants with zero BAC.¹⁶ As
this only affects drinkers, it does not constitute an externality. The second effect – performance –
refers to the functioning of driving-relevant skills under the influence of alcohol. There is little
room for doubt, after hundreds of laboratory experiments, that alcohol reduces driver performance
in terms of coordination, reaction time, spatial orientation and other relevant skills, and that it
does so to an increasing extent as BAC rises¹⁷.
The third category of effect produced by alcohol is that of a detrimental effect on driver
behaviour. It appears that drinking, by reducing social inhibitions, also encourages more
aggressive and riskier driving. It is an empirical regularity that alcohol is present in an increasing
fraction of drivers as severity increases. This suggests that it contributes most to severe crashes.
In the Chilean 2000-2004 data (which significantly under-reports alcohol involvement as will be
shown later) this relationship can be observed in the following table:
% of crashes and collisions involving at least 1 drinking driver

Crashes and collisions involving:              % with a drinking driver
  all outcomes: injuries & no injuries           6.7%
  any kind of injury                             9.0%
  medium severity injuries or worse             11.9%
  serious and fatal injuries only               13.4%
  fatal injuries only                           18.1%

Source: Carabineros data 2000-Sept 2004
As noted by Evans, this relationship suggests that alcohol's most salient effect is in changing
driver behaviour towards taking greater risks and driving at higher speeds. If the performance
effect were most important, then drinking drivers' increased driver error would be present at all
crash severities, and alcohol prevalence would not increase with crash severity. “It appears that
drivers do things when they are drunk that they would not attempt when sober, rather than merely
executing poorly the same things they would do more skilfully when sober.”¹⁸ Thus behavioural
effects, although neglected by the literature, appear to be a key factor behind the observed
BAC–crash relationship in accident data¹⁹.
¹⁵ I owe this structure to Evans, op. cit.
¹⁶ According to Evans (ibid), a vehicle occupant with a 0.08 BAC is 73% more likely to die from the same crash than
one with zero BAC.
¹⁷ Moskowitz et al. (2000) have demonstrated that alcohol significantly affects some driving skills for some subjects at BACs as low as 0.02%.
Is the alcohol-crash association due to other factors correlated with drinking?
The possibility that alcohol is not the causal agent behind actual (as opposed to laboratory
simulated) crashes has yet to be fully discarded. It is possible that a high level of relative risk for
drinking relative to non-drinking drivers could result from an association between drinking
behaviour and other dangerous driving characteristics. That is, dangerous drivers tend to drink,
but drinking itself is not what causes increased crash risk. While implausible in the light of
laboratory and behavioural evidence cited above, such a possibility must be discarded if we are to
have confidence in the analysis undertaken by this paper, given that additional driver
characteristics (beyond age, sex and licence type) are not collected in Chile, and thus cannot be
controlled for. To this end the epidemiological literature is briefly reviewed to make manifest the
causal effect of alcohol on crash risk in real driving situations.
The Three Main Methods of Determining the Contribution of Alcohol to Crash Risk
A – The Case – Control method
Three main methods exist for determining alcohol crash risk. The first is the case-control
method²⁰, in which the BAC levels of drivers involved in traffic crashes are compared with those
of a control group of drivers matching the accident drivers as closely as possible. From a
comparison of these groups – and controlling for possibly confounding covariates – a relative risk
(RR) curve is estimated:
[Figure: Compton et al. (2002) covariate-adjusted relative risk curve. Horizontal axis: BAC (0.0 to 1.3); vertical axis: RR of crash involvement (0 to 13).]
¹⁸ Evans, op. cit.
¹⁹ Evans notes that much work remains to be done regarding the effect of alcohol on speed choice, citing an Australian study that found that drivers with BACs over 0.05% were driving faster when apprehended.
²⁰ A more detailed review of this method is provided in Appendix 2.
If Chilean drivers have a similar RR curve to US drivers, then at the Chilean DUI limit of 0.5 g/l
(BAC 0.05%) the RR is 1.38, while drivers are 4.79 times more likely to cause a crash (of any
severity) at the DWI limit of 1.0 g/l (BAC 0.1%)²¹.
B – Combining Accident data with Roadside Surveys
A similar but more general methodology is that which uses national accident data (such as the
Carabineros data used in this study, or the US FARS data) and a roadside survey to provide
exposure data (e.g. Zador 1991). The key differences between the case control and the roadside
survey-FARS methodologies are in the selection of control drivers and their 'representativeness'.
Clearly the case control studies have a more reliable measure of exposure, as the drivers in their
control groups are driving in exactly the same place as the accident drivers. However, in their
precision lies their weakness: they may simply (albeit accurately) reflect local driving. Moreover,
using national data rather than data from a single locality may be more reliable in that it can work
with degrees of freedom orders of magnitude above those available in case control studies.
The most recent of such studies in the US is Zador et al (2000), which uses FARS data and the
1996 National Roadside Survey. Their relative risk estimates are somewhat above those of the case
control studies, and are somewhat suspect as a result: the major case control studies have obtained
similar relative risk curves, suggesting that the RR curves they generate are more worthy of confidence.
²¹ We cannot be sure that they are: a relative risk curve compares sober to drunk drivers. Chilean drivers may be very
different - both when sober and when drunk, and to different degrees – to US drivers.
C – The Levitt-Porter Methodology
The third and final methodology is that used by Levitt and Porter, in which no separate set of
control data is used. Instead, the proportion of drivers in each group is estimated once the RR (θ)
has been estimated. Identification strategy is discussed more closely in the methodology section.
Levitt and Porter note that the case-control studies fit well with their results given reasonable
distributions of BACs across drinking drivers.
This methodology is unable to separate RR by fine gradations of BAC because of degrees of
freedom restrictions in smaller countries such as Chile, and because the BAC level is often not
exactly recorded. This methodology has not, to my knowledge²², been applied elsewhere. Hence
only their results for the US, as cited above, are available for direct comparison. Moreover, no
estimates for Chile are available, either of the externality, or of the relative risk of drinking as
opposed to sober drivers – or of any other RR curve.
²² No relevant papers cite the Levitt and Porter paper according to IDEAS.
4 - Methodology
The Levitt-Porter model's first virtue is the minimal data required for its estimation. All that is
required is fatal crash data such as the date, time, type of accident, alcohol-involvement (i.e.
whether drivers had been drinking), and the number of deaths and injuries. At first, inferring both
the relative risk θ and N (the relative number of drinking and sober drivers on the road) from
these data alone appears impossible. As the authors note:
“Separately identifying the fraction of drinking drivers on the road and their relative risk of a
fatal crash using only the fraction of drinking drivers in fatal crashes is ostensibly equivalent…to
separately identifying per capita income and population on the basis of only aggregate income
data.”²³
They achieve the apparently impossible by recognizing that for 2 car crashes (hereafter called
collisions) the relative frequency of accidents involving 2 sober, 2 drinking or 1-drinking-and-1-
sober drivers contains enough information to estimate θ. The idea is that the number of fatal
collision opportunities is given by the trinomial distribution – equivalent to randomly drawing
coloured balls from a bag. If drinking does not increase RR then the crash data will closely mimic
the trinomial distribution of crash opportunities. If the data for actual crashes differs from that
given by crash opportunities (trinomial distribution) we are able to identify how much more
dangerous drinking drivers are (θ) as N can be eliminated from the model. Once we have θ, then
we are able to obtain N.
The basic assumptions of the Levitt-Porter (2001) model²⁴
Notation:
$N_i$ = the total number of drivers of type i
I = an indicator variable equal to 1 if two cars interact²⁵
A = an indicator variable equal to 1 if two cars collide, resulting in a fatal crash
$P(i,j \mid I=1)$ = probability that the drivers are of type i and j, given that an interaction takes place.
²³ Levitt and Porter (2001), p1199-1200
²⁴ This section follows the Levitt and Porter 2001 paper very closely, for obvious reasons. A few sentences are taken
verbatim and are not explicitly quoted to avoid uninteresting footnotes.
²⁵ An interaction is a 2 car crash opportunity: 2 cars pass on the street. Given an interaction, a single driver error can
cause a collision.
Assumptions
1. There are 2 driver types: D and S
a. This is easily generalized to more types
b. Thus $N_D + N_S = N_{Total}$
2. There is equal mixing of D and S drivers on the roads, i.e.
a. The number of interactions a driver has with other cars is independent of the driver's type:
$$P(i \mid I=1) = \frac{N_i}{N_D + N_S}$$
b. A driver's type does not affect the composition of the driver types with which he interacts:
$$P(i,j \mid I=1) = P(i \mid I=1)\,P(j \mid I=1)$$
3. A fatal crash results from a single driver's error
4. The composition of driver types in a crash is independent of the composition of driver types in other
crashes
5. A drinking driver is at least slightly more likely to make an error resulting in a crash than a sober
driver, i.e. $\theta_D > \theta_S$
The assumption doing the most work is assumption 2: equal mixing of sober and drunk drivers.
Over a small enough area and time period, it is reasonable. Over an entire country and year it
becomes less so. Assumption 2 gives the joint distribution for a pair of driver types, conditional
on an interaction between two drivers:
$$P(i,j \mid I=1) = \frac{N_i N_j}{(N_D + N_S)^2} \qquad (1)$$
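Assumption 3 implies that the likelihood of a fatal crash is the sum of the probabilities that either
driver makes a fatal error, minus the probability that both drivers make a mistake, where $\theta_i$
denotes the probability that a driver of type i makes a fatal error. The latter
probability is extremely small, and is ignored:
$$P(A=1 \mid I=1,i,j) = \theta_i + \theta_j - \theta_i\theta_j \approx \theta_i + \theta_j \qquad (2)$$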
Developing the model
Multiplying equations (1) and (2), we obtain the joint probability of driver types and a fatal crash
conditional on an interaction between two drivers:
$$P(i,j,A=1 \mid I=1) = \frac{N_i N_j(\theta_i + \theta_j)}{(N_D + N_S)^2} \qquad (3)$$
In words, given that two random drivers interact, the probability that a fatal crash occurs and that
the drivers involved are of the specified types, is simply equal to the likelihood that two drivers
passing on the road are of the specified types multiplied by the probability that a fatal crash
occurs when these drivers interact.
The key relationship we seek is the probability of driver types conditional on a fatal accident
occurring, rather than conditional on an interaction taking place. That value can be obtained from
equation (3) through an application of Bayes' Theorem (dropping the I=1 condition).
From the definition of conditional probability, $P(i,j,A=1) = P(A=1)\,P(i,j \mid A=1)$, and we want
to isolate the final term. To this end note that
$$P(A=1) = \sum_{i,j} P(A=1 \mid i,j)\,P(i,j),$$
i.e. P(A=1) is simply equation (3) summed over all possible combinations of i and j.
Thus,
$$P(i,j \mid A=1) = \frac{P(A=1,i,j)}{\sum_{i,j} P(A=1 \mid i,j)\,P(i,j)}$$
and we obtain:
$$P(i,j \mid A=1) = \frac{N_i N_j(\theta_i + \theta_j)}{2[\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2]} \qquad (4)$$
Let Pij represent the probability that the drivers are of type i and j given that a fatal crash occurs.
We can explicitly state the values of Pij by simply substituting for i and j in equation (4):
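$$P_{DD} = P(i = D, j = D \mid A = 1) = \frac{\theta_D (N_D)^2}{\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2} \tag{5}$$
$$P_{DS} = P(i = D, j = S \mid A = 1) + P(i = S, j = D \mid A = 1) = \frac{(\theta_D + \theta_S) N_D N_S}{\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2} \tag{6}$$
$$P_{SS} = P(i = S, j = S \mid A = 1) = \frac{\theta_S (N_S)^2}{\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2} \tag{7}$$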
Note that the ordering of the driver types does not matter. Thus, in equation (6) the probability of
a mixed drinking-sober crash is the sum of the probability that i is sober and j is drinking plus the
probability that j is sober and i is drinking.
Examination of equations (5)-(7) reveals that there are only three equations, but four unknown parameters (θD, θS, ND, NS). As a result, all four parameters cannot be separately identified - only the ratios are identifiable. Therefore, let θ=θD/θS and N=ND/NS. θ is the relative likelihood that a drinking driver will cause a fatal two-car crash compared to a sober driver, and N is the ratio of drinking to sober drivers on the road at a particular place and time. Dividing both numerator and denominator of equations (5)-(7) by $\theta_S (N_S)^2$ expresses them in terms of θ and N as follows:
$$P_{DD}(\theta, N \mid A) = \frac{\theta N^2}{\theta N^2 + (\theta + 1) N + 1} \tag{8}$$
$$P_{DS}(\theta, N \mid A) = \frac{(\theta + 1) N}{\theta N^2 + (\theta + 1) N + 1} \tag{9}$$
$$P_{SS}(\theta, N \mid A) = \frac{1}{\theta N^2 + (\theta + 1) N + 1} \tag{10}$$
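As a rough numerical check of the identification argument, the sketch below evaluates equations (5)-(7) for two different sets of raw parameters that share the same ratios, and compares them with equations (8)-(10). The function names and all parameter values are illustrative assumptions, not figures from the data.

```python
def crash_type_probs_raw(theta_d, theta_s, n_d, n_s):
    """Equations (5)-(7): P_DD, P_DS, P_SS from the four raw parameters."""
    denom = theta_d * n_d**2 + (theta_d + theta_s) * n_d * n_s + theta_s * n_s**2
    p_dd = theta_d * n_d**2 / denom
    p_ds = (theta_d + theta_s) * n_d * n_s / denom
    p_ss = theta_s * n_s**2 / denom
    return p_dd, p_ds, p_ss


def crash_type_probs(theta, n):
    """Equations (8)-(10): the same probabilities from theta = theta_D/theta_S and N = N_D/N_S."""
    denom = theta * n**2 + (theta + 1) * n + 1
    return theta * n**2 / denom, (theta + 1) * n / denom, 1 / denom


# Hypothetical values: both raw parameter sets imply theta = 7 and N = 0.25,
# but differ in scale, so they cannot be told apart from crash shares alone.
raw_a = crash_type_probs_raw(theta_d=7e-6, theta_s=1e-6, n_d=250, n_s=1000)
raw_b = crash_type_probs_raw(theta_d=7e-4, theta_s=1e-4, n_d=5, n_s=20)
ratio_form = crash_type_probs(theta=7, n=0.25)
```

All three tuples coincide, which is the sense in which only θ and N, and not the four underlying parameters, can be recovered from the crash shares.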
The next step is to derive the likelihood function.
Aij is defined as the number of fatal crashes involving one type j and one type i driver. Given
assumption 4 (independence across fatal collisions) and the total number of fatal crashes the joint
distribution of driver types is given by the trinomial distribution:
$$P(A_{DD}, A_{DS}, A_{SS} \mid A_{TOTAL}) = \frac{(A_{DD} + A_{DS} + A_{SS})!}{A_{DD}!\, A_{DS}!\, A_{SS}!}\, (P_{DD})^{A_{DD}} (P_{DS})^{A_{DS}} (P_{SS})^{A_{SS}} \tag{11}$$
Substituting PSS, PDD and PDS into the equation above produces the likelihood function, and the
equation is estimated by maximum likelihood, with the following fairly self-evident result:
$$\hat{P}_{DD} = \frac{A_{DD}}{A_{TOTAL}}; \quad \hat{P}_{DS} = \frac{A_{DS}}{A_{TOTAL}}; \quad \hat{P}_{SS} = \frac{A_{SS}}{A_{TOTAL}}$$
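The maximum likelihood result can be checked numerically. The sketch below evaluates the log of equation (11) at the sample shares and at a slightly perturbed set of probabilities; the counts are those of the reduced SML sample reported in the results section, while the function name and the size of the perturbation are assumptions made for illustration.

```python
import math


def trinomial_loglik(a_dd, a_ds, a_ss, p_dd, p_ds, p_ss):
    """Log of equation (11) for observed counts and candidate probabilities."""
    total = a_dd + a_ds + a_ss
    # log of the multinomial coefficient, using lgamma(n + 1) = log(n!)
    log_coef = (math.lgamma(total + 1) - math.lgamma(a_dd + 1)
                - math.lgamma(a_ds + 1) - math.lgamma(a_ss + 1))
    return (log_coef + a_dd * math.log(p_dd)
            + a_ds * math.log(p_ds) + a_ss * math.log(p_ss))


counts = (15, 81, 72)                      # (A_DD, A_DS, A_SS)
total = sum(counts)
p_hat = tuple(a / total for a in counts)   # sample shares A_ij / A_TOTAL

ll_at_shares = trinomial_loglik(*counts, *p_hat)
# Arbitrary perturbation that still sums to one.
ll_perturbed = trinomial_loglik(*counts, p_hat[0] + 0.02, p_hat[1] - 0.01, p_hat[2] - 0.01)
```

The log-likelihood at the sample shares exceeds the perturbed value, as expected of the maximum.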
Levitt and Porter then take advantage of the fact that in the binomial distribution $(A_{DS})^2$ is in fixed proportion to the product of $A_{DD}$ and $A_{SS}$ to eliminate N from the equation:
$$\frac{(A_{DS})^2}{A_{DD} A_{SS}} = \frac{(P_{DS})^2}{P_{DD} P_{SS}} = \frac{(\theta + 1)^2 N^2}{\theta N^2} = \theta + 2 + \frac{1}{\theta} \tag{12}$$
Thus we can estimate θ from the observed distribution of crashes ONLY. Defining $R = \frac{(A_{DS})^2}{A_{DD} A_{SS}}$ and multiplying by θ we obtain a quadratic equation: $\theta^2 + (2 - R)\theta + 1 = 0$. If R = 4 then θ = 1, that is, the observed distribution of collisions matches the distribution of fatal crash opportunities: drinking does not affect driving. If R < 4 the equation has no real solution. If R > 4 then 2 solutions always exist, one with θ < 1 and one with θ > 1. By assumption 5 the former is discarded, and we have an estimate of θ.
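A minimal sketch of this step follows, assuming a hypothetical helper name and hypothetical counts chosen so that R = 64/7, which corresponds to θ = 7 exactly. The discriminant of the quadratic is $(R - 2)^2 - 4 = R(R - 4)$, which is why R = 4 is the threshold.

```python
import math


def theta_from_counts(a_dd, a_ds, a_ss):
    """Root theta >= 1 of theta^2 + (2 - R)*theta + 1 = 0, or None when R < 4."""
    r = a_ds**2 / (a_dd * a_ss)
    disc = (r - 2)**2 - 4            # equals R * (R - 4)
    if disc < 0:
        return None                  # no real root: counts incompatible with the model
    return ((r - 2) + math.sqrt(disc)) / 2


theta_hat = theta_from_counts(a_dd=7, a_ds=8, a_ss=1)      # R = 64/7
theta_neutral = theta_from_counts(a_dd=1, a_ds=2, a_ss=1)  # R = 4
theta_missing = theta_from_counts(a_dd=1, a_ds=1, a_ss=1)  # R = 1 < 4
```

The two roots of the quadratic multiply to one, so keeping the larger root is the same as discarding the root below one.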
Estimating N
One car crashes are incorporated into the model, but their identification depends on having first estimated N (the ratio of drinking to sober drivers) from collisions. N is estimated from the following FOC of the likelihood function, substituting the $\hat{\theta}$ estimated above for θ:
$$\hat{N} = \frac{1}{\theta} \cdot \frac{A_{DS}\left(\frac{\theta}{1 + \theta}\right) + A_{DD}}{A_{DS}\left(\frac{1}{1 + \theta}\right) + A_{SS}} \tag{13}$$
Standard errors are derived using the delta method, as described in Appendix 3.
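The estimator of N can be sanity-checked on counts generated exactly by the model. In the sketch below the function name and the values θ = 7 and N = 0.25 are assumptions; the counts are set proportional to equations (8)-(10), so equation (13) should return the N that produced them.

```python
def n_from_counts(theta, a_dd, a_ds, a_ss):
    """Equation (13): estimate N = N_D / N_S given theta and two-car crash counts."""
    drinking = a_dd + a_ds * theta / (1 + theta)
    sober = a_ss + a_ds / (1 + theta)
    return drinking / (theta * sober)


theta, n_true = 7.0, 0.25
# Counts proportional to P_DD, P_DS, P_SS from equations (8)-(10).
a_dd, a_ds, a_ss = theta * n_true**2, (theta + 1) * n_true, 1.0

n_hat = n_from_counts(theta, a_dd, a_ds, a_ss)
share_drinking = n_hat / (1 + n_hat)   # N_D / (N_D + N_S)
```

Here the share of drinking drivers follows from N as N / (1 + N), which is 20% for N = 0.25.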
Estimating the RR of one car crashes, λ
Lambda (λ), or the relative risk of one car crashes for drunk drivers (analogous to θ), is estimated in similar fashion. Let QD and QS denote the probabilities that a drunk or a sober driver is involved in a given one car crash:
$$Q_j = P(i = j \mid Crash = 1) = \frac{\lambda_j N_j}{\lambda_D N_D + \lambda_S N_S}; \quad \text{with } j = D, S \tag{14}$$
We can define λ as λD/λS, and equation (14) for both j = D and S can be combined to give $\lambda N = \frac{Q_D}{Q_S}$, while both QD and QS are obtained from the accident data. As can be observed, lambda can be estimated only by using the estimator of N.
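A short sketch of the one-car calculation follows, under purely hypothetical inputs (a 40% drinking share in one-car crashes and N = 0.25). The first function implements equation (14) and the second inverts it; both names are assumptions.

```python
def one_car_shares(lam, n):
    """Equation (14) with lambda_S and N_S normalised to one: returns (Q_D, Q_S)."""
    q_d = lam * n / (lam * n + 1)
    return q_d, 1 - q_d


def lambda_from_shares(q_d, q_s, n):
    """lambda = Q_D / (N * Q_S), with N = N_D / N_S taken from the two-car model."""
    return q_d / (n * q_s)


lam_hat = lambda_from_shares(q_d=0.4, q_s=0.6, n=0.25)
q_d_check, q_s_check = one_car_shares(lam_hat, n=0.25)
```

Because λ enters only through the product λN, any error in the estimate of N passes directly into the estimate of λ.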
Violations of the assumptions
The next step is to consider violations of the assumptions. The key point is that violations of the
assumptions generate downward biases for θ, making it a reliable lower bound. Possible
violations are discussed in more detail in Appendix 4, but violations of Assumption 2 are
sufficiently important to merit consideration here.
Relaxing the Equal Mixing assumption
The model requires an equal mixing assumption (Assumption 2), which holds that over a given
geographical and temporal area (for example on weekend nights in Santiago) drivers of both
types are homogeneously distributed. This assumption, as you may recall, has two parts:
a. The number of interactions a driver has with other cars is independent of driver type:
$$P(i \mid I = 1) = \frac{N_i}{N_D + N_S} \tag{A2(a)}$$
b. A driver's type does not affect the composition of the driver types with which he interacts:
$$P(i, j \mid I = 1) = P(i \mid I = 1)\, P(j \mid I = 1) \tag{A2(b)}$$
Combining the two results in equation (1), as noted earlier:
$$P(i, j \mid I = 1) = \frac{N_i N_j}{(N_D + N_S)^2}$$
A2 is a demanding assumption: we must consider only very small space time areas to be sure of
its holding, as driving conditions change from hour to hour, and between neighbourhoods: the
Suecia district on a Saturday night between 4 and 5 am is very different to the Alameda area on a
weekday at 9 pm – it is likely that Suecia would have a far higher share of drunk drivers, as
would a night time period as opposed to a daytime period, for example. As Borkenstein et al note:
“within a fairly short period of time (not exceeding one hour) the driving conditions or exposure
for a driver using a specific “block” tended to remain relatively constant.”26
Thus ideally the
model should be estimated for 50-60 minute periods, over a 5-10 city block area – something that
real world data simply cannot make possible.
26 Borkenstein et al. (1974) p 22
Thus we can be relatively sure that A2 will not hold perfectly in the data. That is, drunk drivers
will be somewhat concentrated in certain space-time regions (near bars and at night), and the
same will hold for sober drivers in other areas. This results in a lower number of drunk-sober
interactions (crash opportunities) than predicted by the model under the equal mixing assumption,
and a higher number of same type (DD and SS) interactions, exerting a downward bias on our
estimate of θ (R falls) and an upward bias on our estimated N (as more DD crashes occur, and θ
is lower, N must rise). As the units of temporal and spatial analysis shrink, violations of this
assumption will be less severe, and the downward bias it exerts on estimates of θ should fall.
This prediction is borne out by Levitt and Porter's results. Applying their model to US fatal accident data assuming equal mixing over the whole US for their entire sample (1983-93, 8pm-5am) they estimate θ = 3.79 (s.d. = 0.14). As they reduce the space-time areas over which they assume equal mixing – weakening the bias caused by violation of A2 – their estimated θ rises to 7.51.
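The direction of this bias can be illustrated with a small deterministic calculation. The sketch below assumes two hypothetical areas in which equal mixing holds locally with the same true θ = 7 but different shares of drinking drivers, and then pools their expected crash counts as if equal mixing held across both. All names and numbers are assumptions for illustration.

```python
import math


def expected_counts(theta, n_d, n_s):
    """Expected (DD, DS, SS) crash counts, up to a common scale, within one area."""
    return theta * n_d**2, (theta + 1) * n_d * n_s, n_s**2


def theta_from_counts(a_dd, a_ds, a_ss):
    """Larger root of theta^2 + (2 - R)*theta + 1 = 0 (requires R >= 4)."""
    r = a_ds**2 / (a_dd * a_ss)
    return ((r - 2) + math.sqrt((r - 2)**2 - 4)) / 2


true_theta = 7.0
area_a = expected_counts(true_theta, n_d=30, n_s=100)   # e.g. near bars: N = 0.3
area_b = expected_counts(true_theta, n_d=10, n_s=100)   # elsewhere: N = 0.1
pooled = tuple(a + b for a, b in zip(area_a, area_b))

theta_area_a = theta_from_counts(*area_a)
theta_area_b = theta_from_counts(*area_b)
theta_pooled = theta_from_counts(*pooled)
```

Each area on its own returns the true θ, while the pooled counts return a smaller value, which is the downward bias described above.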
Levitt and Porter suggest amending the model to allow an increased probability of same-type interaction, while still maintaining the reasonable assumption that the number of interactions a driver type has is proportional to the percentage it makes up in the overall driver population, i.e. that $P(i \mid I = 1) = \frac{N_i}{N_D + N_S}$ should still hold27. This does not wholly eliminate the problem posed by the violation of the equal mixing assumption, but it does allow a reduction in the downward bias it produces.
They do not explicitly solve the amended model. They simply note that for Δ = 0.1 (a 10%
increase in DD interactions), their estimates of both one and two car relative risks rise by
approximately 25%.
27 This is a reasonable requirement, as otherwise we would be requiring that either drunk or sober drivers have proportionately more interactions than the other type, i.e. that one type passes more cars than the other by driving in more congested areas or simply by driving longer distances. Such a requirement would necessarily be entirely arbitrary.
Developing the Model with Unequal Mixing
Define the parameter representing the increased probability of a Drinking-Drinking (DD) interaction as Δ. Thus equation (1) changes from:
$$P(D, D \mid I = 1) = \frac{(N_D)^2}{(N_D + N_S)^2} = \frac{(N_D)^2}{(N_{Total})^2}$$
to:
$$P(D, D \mid I = 1) = \frac{(1 + \Delta)(N_D)^2}{(N_{Total})^2} \tag{15}$$
Similarly, for Sober-Sober (SS) and Drunk-Sober (DS) interactions:
$$P(S, S \mid I = 1) = \frac{(1 + x)(N_S)^2}{(N_{Total})^2} \tag{16}$$
$$P(D, S \mid I = 1) = \frac{(1 + z) N_D N_S}{(N_{Total})^2} \tag{17}$$
The amended model has 3 additional parameters (Δ, x and z), as described by equations (15)-(17). x and Δ should be positive, reflecting the fact that the clustering of same-type drivers leads to more same-type interactions, and z should be negative. To solve for these in terms of N, θ and Δ we must impose the condition mentioned earlier regarding P(i | I=1). This term reflects the probability that an interaction involves a driver of type i and can be expressed as $P(i \mid I = 1) = \sum_j P(i, j \mid I = 1)$. Thus:
$$P(D \mid I = 1) = P(D, S \mid I = 1) + P(D, D \mid I = 1) \tag{18}$$
$$P(S \mid I = 1) = P(S, D \mid I = 1) + P(S, S \mid I = 1) \tag{19}$$
Substituting (15), (16) and (17) in (18) and (19) along with A2(a) yields:
$$x = \Delta N^2$$
$$z = -\Delta N$$
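Solving the model with these modifications produces an equation for R in which N does not cancel out, unlike the standard model:
$$R = \frac{(\theta + 1)^2 (1 + z)^2}{\theta (1 + \Delta)(1 + x)} = \frac{(\theta + 1)^2 (1 - \Delta N)^2}{\theta (1 + \Delta)(1 + \Delta N^2)} \tag{20}$$
Define
$$B = \frac{(1 - \Delta N)^2}{(1 + \Delta)(1 + \Delta N^2)}$$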
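Then:
$$R = \frac{(\theta + 1)^2 (1 - \Delta N)^2}{\theta (1 + \Delta)(1 + \Delta N^2)} = \frac{B (\theta + 1)^2}{\theta}$$
$$B\theta^2 + (2B - R)\theta + B = 0$$
Recalling that the value of R is obtained from the dataset:
$$\theta = \frac{(R - 2B) \pm \sqrt{(2B - R)^2 - 4B^2}}{2B} \tag{21}$$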
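As an illustration of equation (21), the sketch below computes θ for a given R, N and Δ, taking the larger root in line with assumption 5. R is built from the reduced SML sample counts reported in the results section; holding N fixed at 0.23 (the equal-mixing estimate) is a simplifying assumption, as are the function names.

```python
import math


def mixing_factor(n, delta):
    """B from the amended model; B = 1 when delta = 0."""
    return (1 - delta * n)**2 / ((1 + delta) * (1 + delta * n**2))


def theta_unequal_mixing(r, n, delta):
    """Larger root of B*theta^2 + (2B - R)*theta + B = 0, or None if no real root."""
    b = mixing_factor(n, delta)
    disc = (2 * b - r)**2 - 4 * b**2
    if disc < 0:
        return None
    return ((r - 2 * b) + math.sqrt(disc)) / (2 * b)


r = 81**2 / (15 * 72)   # R = A_DS^2 / (A_DD * A_SS)
theta_equal = theta_unequal_mixing(r, n=0.23, delta=0.0)   # standard model
theta_delta = theta_unequal_mixing(r, n=0.23, delta=0.1)   # 10% more DD interactions
```

With Δ = 0 the formula collapses to the standard quadratic, and a positive Δ raises the estimate of θ.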
Thus θ cannot be estimated without a value for N and Δ, as these are components of B.
Identification of delta from the model is impossible, and a value for this parameter must be
assumed. However, N was estimated using the following iterative procedure.
First, equation (20) was solved for an arbitrary value of N28 and a given delta, obtaining an estimate of θ ($\hat{\theta}_i$). Then, using $\hat{\theta}_i$, N was estimated using equation (13) (unchanged by the amendments to the model):
$$\hat{N} = \frac{1}{\theta} \cdot \frac{A_{DS}\left(\frac{\theta}{1 + \theta}\right) + A_{DD}}{A_{DS}\left(\frac{1}{1 + \theta}\right) + A_{SS}} \tag{13}$$
This estimate, $\hat{N}_i$, was then used to re-estimate θ using equation (20), resulting in $\hat{\theta}_{i+1}$, and the whole procedure was repeated until $\hat{\theta}_i$ converged, with this final $\hat{\theta}_i$ used to obtain a final $\hat{N}_i$.
28 The initial value of N makes no difference to the final estimates of N and θ at any relevant level of accuracy.
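The iterative procedure can be sketched as a simple fixed-point loop. The version below is a minimal illustration, not the code used for the estimates: the function names, tolerance, starting values and the choice Δ = 0.1 are assumptions, and the counts are those of the reduced SML sample from the results section.

```python
import math


def theta_given_n(r, n, delta):
    """Larger root of equation (21) for a given N and delta."""
    b = (1 - delta * n)**2 / ((1 + delta) * (1 + delta * n**2))
    return ((r - 2 * b) + math.sqrt((2 * b - r)**2 - 4 * b**2)) / (2 * b)


def n_given_theta(theta, a_dd, a_ds, a_ss):
    """Equation (13)."""
    return (a_dd + a_ds * theta / (1 + theta)) / (theta * (a_ss + a_ds / (1 + theta)))


def iterate(a_dd, a_ds, a_ss, delta, n_start, tol=1e-10, max_iter=1000):
    """Alternate between equations (21) and (13) until theta stops changing."""
    r = a_ds**2 / (a_dd * a_ss)
    theta = theta_given_n(r, n_start, delta)
    for _ in range(max_iter):
        n = n_given_theta(theta, a_dd, a_ds, a_ss)
        new_theta = theta_given_n(r, n, delta)
        if abs(new_theta - theta) < tol:
            return new_theta, n_given_theta(new_theta, a_dd, a_ds, a_ss)
        theta = new_theta
    raise RuntimeError("iteration did not converge")


theta_a, n_a = iterate(15, 81, 72, delta=0.1, n_start=0.5)
theta_b, n_b = iterate(15, 81, 72, delta=0.1, n_start=0.05)
```

Starting from two quite different values of N gives the same limit here, in line with the remark in footnote 28.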
5 - The Data
Two complementary datasets are used in this paper. The first is the Carabineros de Chile dataset on all reported traffic accidents in Chile from 1 January 2000 to 30th September 2004. Earlier data either does not exist or is of no use as drinking status and rut identification numbers were not recorded.
Restricting the Dataset
Because our interest is in drunk-driving we limit the sample to the period in which drunk driving is most common: night-time driving. There are three reasons for this decision. The first is consistency with the literature: for this study to be comparable to the relative risk estimates that have gone before, the focus must be exclusively on night driving. All the studies – without exception – referred to in the review of the literature focus on night driving, including Levitt and Porter29. Some – unlike this study – narrow the focus further, concentrating exclusively on weekend nights30, when the concentration of drinking drivers reaches its peak. Nonetheless, a relative risk curve based solely on night driving raises the suspicion that such a curve might not be readily applicable to drunk driving in the daytime.
This is a serious issue, and one that admits of no easy solution, as during the day the proportion of accident-involved drinking drivers falls dramatically, as it does on weekdays, making estimation difficult outside of these timetables:
Occurs:                                   at night   in daytime   on a weekend   on a weekday
% of accidents involving at least 1 DD    18.6%      2.7%         11%            3.2%
Source: Carabineros data 2000-2004
As can be seen from the table, during the day on a weekday there are proportionately very few drinking drivers. In fact, in the whole dataset Carabineros recorded only 3 Drinking-Drinking 2 car collisions over the 2000 - September 2004 period. Thus, the second reason for this restriction is that estimation is virtually impossible without it.
29 Levitt and Porter (2001) p 1213 “we limit our sample to those hours (8:00 p.m – 5:00 a.m.) in which drinking and driving is most common”
30 see Lund & Wolfe (1991) and Zador (1991)
The third and final reason for this restriction is that the night is essentially a distinct driving environment to the daytime, with a much lower traffic density, different traffic light settings, and a different accident distribution. This is reflected in the fact that only 14.7% of serious accidents occur in the daytime, as opposed to 25% at night31.
To avoid arbitrarily choosing at exactly what time “the night” begins and ends, the UOCT (Unidad Operativa de Control de Tránsito) for the metropolitan region provided an average timetable for the night programming of traffic lights, which is based on detailed traffic flow studies. This is designed to capture the marked change in traffic conditions from night to day, and provides a useful reference point. Interestingly, the UOCT timetable closely resembles the timetable that maximizes the proportion of drinking drivers.
                                Metropolitan Region   The rest of Chile
UOCT definition
Weekdays                        2300 - 0630           2130 - 0730
Friday night - Saturday         2300 - 0900           2130 - 0900
Saturday night - Sunday         2200 - 1000           2100 - 1000
Sunday night - Monday           2100 - 0630           2100 - 0730
Definition used in this study
Weekdays                        2300 - 0630           2200 - 0700
Friday night - Saturday         2300 - 0900           2200 - 0900
Saturday night - Sunday         2300 - 1000           2200 - 1000
Sunday night - Monday           2200 - 0630           2200 - 0700
The definition used in this study is slightly different to both the UOCT timetable and the timetable that maximizes the proportion of drunk drivers in the accident data. The UOCT timetable has the virtue of capturing a large proportion of drivers, while the maximizing timetable ensures a high proportion of drinking drivers in the data. The definition used in this paper was chosen as a middle path between the two: it stays close to the UOCT timetable (capturing more drivers than the maximizing timetable), while also ensuring a high proportion of drinking drivers32. As can be observed from the table above, differences between the definition used and the UOCT timetable are slight.
31 The last two figures exclude non passenger vehicles and buses – the final restriction discussed.
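The night definition used in this study can be written as a small lookup, sketched below. The structure, the names `NIGHT_WINDOWS` and `is_night`, the reading of "Weekdays" as nights starting Monday to Thursday, and the treatment of the morning end time as exclusive are all assumptions; the times themselves come from the table above.

```python
from datetime import datetime, time, timedelta

# Night windows keyed by the weekday on which the night starts (Monday = 0).
# Assumption: "Weekdays" means nights starting Monday to Thursday.
NIGHT_WINDOWS = {
    "MR": {
        **{d: (time(23, 0), time(6, 30)) for d in range(0, 4)},
        4: (time(23, 0), time(9, 0)),    # Friday night - Saturday
        5: (time(23, 0), time(10, 0)),   # Saturday night - Sunday
        6: (time(22, 0), time(6, 30)),   # Sunday night - Monday
    },
    "REST": {
        **{d: (time(22, 0), time(7, 0)) for d in range(0, 4)},
        4: (time(22, 0), time(9, 0)),
        5: (time(22, 0), time(10, 0)),
        6: (time(22, 0), time(7, 0)),
    },
}


def is_night(moment, region="MR"):
    """True if `moment` falls inside the study's night window for the region."""
    windows = NIGHT_WINDOWS[region]
    start_tonight, _ = windows[moment.weekday()]
    if moment.time() >= start_tonight:
        return True                              # evening part of tonight's window
    previous_day = (moment - timedelta(days=1)).weekday()
    _, end_this_morning = windows[previous_day]
    return moment.time() < end_this_morning      # morning part of last night's window
```

For example, `is_night(datetime(2003, 3, 8, 8, 30))` asks whether 08:30 on a Saturday morning in the Metropolitan Region still belongs to the Friday night window.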
The final restriction on our dataset involves excluding trucks, buses, tractors, motorcycles and bicycles. In short, we examine only car crashes. The logic for this restriction is direct: any crash involving such a non-car road vehicle is extremely likely to be serious, due to the mass (either high or low) of the vehicles involved. Thus for these vehicles almost every crash is serious (exaggerating somewhat), making serious crashes much more numerous and much less related to poor driver performance such as that caused by alcohol. Moreover, drivers of such vehicles (excluding bi- and motor-cycles) represent a sub-group of the driver population with entirely distinct characteristics: they are likely to have more driving experience, drive far greater distances and for different purposes, and value their licences more highly – hence they are less likely to be drunk. As this study aims to find a RR for the general driver population, this sub-group represents a confounding influence and is removed33.
An examination of the effects of these restrictions on the data is available in Appendix 5.
The Alcohol Involvement variable
The measure of alcohol involvement is classified in the following categories:
Alcohol Involvement measure                 Thresholds
Physically unimpaired                       BAC = 0
Physically deficient driving condition      0 < BAC < 0.5
Under the influence of alcohol (DUI)        0.5 ≤ BAC < 1.0
Intoxicated (alcohol DWI)                   BAC ≥ 1.0
Under the influence of drugs                -
This table also shows the BAC levels that correspond to each category in theory. In practice, the relatively few breathalyzer units [220 in the whole country, 59 in MR] possessed by Carabineros de Chile are rarely on hand for an exact measure. Instead, the officer's assessment is used to gauge impairment – the same measure used by Levitt and Porter for their main results. It is likely that this measure is biased, although the extent and direction of the bias is impossible to predict a priori: for example, Carabineros could classify all drivers with traces of alcohol on their breath as drunk, or systematically under-report alcohol involvement as Lund and Wolfe (1991) report.
32 While the definition used is a middle path between the two, using the UOCT timetable hardly changes the RR estimates, while the max DD timetable raises them slightly.
33 Levitt & Porter and all other studies also eliminate such drivers, even eliminating taxis, which is not done in this study, although this makes little difference to our results.
A more accurate measure of alcohol involvement is obtained from the blood samples that are legally required for all drivers after every accident involving serious injuries. These are carried out by the Metropolitan Region's Servicio Médico Legal (SML) or State Coroner, and regional SMLs or ad hoc coroners in more remote areas. The Metropolitan Region (MR) SML covers approximately 35% of all accidents34, and over 70% of random breath test samples. It is the MR SML's database of rut and BAC data covering the period 1st January 2000 to 30th December 2003 that is used in this paper.
Combining Carabineros and SML data
As the SML database contains only the date the sample was received, BAC and rut, and includes many random breath test (i.e. not from an accident) samples, combining it with the Carabineros database is essential: it ties the BAC measures to specific accidents and drivers. However, for a successful match to occur several substantial hurdles must be surmounted. Firstly, individual rut data must be correctly collected and correctly entered into the database. For Carabineros and the SML each of these tasks is a formidable obstacle35. For example, in the case of data entry many illegible numbers result in 8, 7 or even 3 digit rut identification numbers in the database instead of the required 9 digits. Furthermore, poor transportation ensures that many blood samples are broken in transit, resulting in congealed samples and hence missing BAC measures.
In addition to unusable or missing data, the date of the accident and the date the BAC sample was received by the SML are rarely exact matches. This is because the date the sample was received or taken is often the day after the accident, given that many such accidents occur before midnight. Moreover, if the crash victim dies, then several days may elapse before the blood sample arrives at the SML. This is especially true of cases in which the crash victim dies after days or weeks in hospital. In these cases it is usual to have 2 BAC samples: one taken relatively soon after the accident occurred, and one taken near the time of death. The former is preferred in the measure used in this paper, for obvious reasons.
34 That is, they measure BAC for all dead traffic victims (drivers, passengers and pedestrians) and drivers' BAC in all accidents involving at least some degree of serious injury. Percentages are from SML data for 2000 and 2001.
35 The fact that the rut identification number is 9 digits long, and must be deciphered from an often illegible (according to the SML Statistics department) hand-written accident or sample data form is especially conducive to error. Outdated, non 21st century compliant or error ridden database software at both the SML and Carabineros de Chile is an additional hindrance.
Thus it should be clear that formidable missing or corrupted data problems hinder a simple and direct merging of the datasets. Moreover, exact date matching cannot be done because of the nature of the data. The final combination of the two databases was the result of a two stage process. The first used the following match criteria: an exactly matching rut and a match on the month and year of the accident. Thus, for a BAC sample to be matched to a particular driver, the SML and Carabineros databases' recorded ruts had to be the same and both the blood sample and the accident had to be recorded in the same month of the same year.
There is an obvious source of error: that the same person might have been involved in two accidents in the same month and year, and that the blood sample is being associated with the wrong accident. This is certainly possible, and appears to occur remarkably frequently in the dataset. To tackle this, those records in the Carabineros dataset that have more than a 3 day lag between the date of the accident and reception of the sample are removed from the “matched” group, to make it extremely unlikely that they do not come from the same accident.
Another source of error is that by combining the databases by month, we are ignoring all crashes occurring near the end of each month, in which the sample was taken or received a few days later, in the following month. To address this a second matching process is used, attempting to match the Carabineros data with SML data using the same criteria as before, but changing the month in the SML data to the month before. This second stage uses only records not matched in the first stage from both databases to avoid re-assigning a blood sample that has already been matched to an accident36.
36 More refinements to the matching process were in fact used, such as making the match conditional on driver/passenger/pedestrian status, and only using records in the second stage within a 2 week period of each other.
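The two stage match described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure actually run: the record layout (dicts with `rut`, `date`, `received`, `bac`), the function names and the example records are hypothetical, the 3 day lag limit is applied in both stages, and the further refinements mentioned in footnote 36 are left out.

```python
from datetime import date


def month_key(d, shift=0):
    """(year, month) of date d, moved back by `shift` months."""
    index = d.year * 12 + (d.month - 1) - shift
    return index // 12, index % 12 + 1


def match_stage(accidents, samples, shift, max_lag_days=3):
    """Match on identical rut and same (shifted) month, keeping lags of 0..max_lag_days."""
    by_key = {}
    for s in samples:
        by_key.setdefault((s["rut"], month_key(s["received"], shift)), []).append(s)
    matches, used = [], set()
    for a in accidents:
        for s in by_key.get((a["rut"], month_key(a["date"])), []):
            lag = (s["received"] - a["date"]).days
            if 0 <= lag <= max_lag_days and id(s) not in used:
                matches.append((a, s))
                used.add(id(s))
                break
    return matches


def two_stage_match(accidents, samples):
    """Stage 1: same month. Stage 2: leftovers, with the SML month moved back by one."""
    first = match_stage(accidents, samples, shift=0)
    matched_a = {id(a) for a, _ in first}
    matched_s = {id(s) for _, s in first}
    rest_a = [a for a in accidents if id(a) not in matched_a]
    rest_s = [s for s in samples if id(s) not in matched_s]
    return first + match_stage(rest_a, rest_s, shift=1)


# Hypothetical records.
accidents = [
    {"rut": "123456789", "date": date(2002, 5, 10)},
    {"rut": "987654321", "date": date(2002, 5, 31)},   # sample arrives in June
    {"rut": "555555555", "date": date(2002, 5, 3)},    # sample arrives 17 days later
]
samples = [
    {"rut": "123456789", "received": date(2002, 5, 11), "bac": 0.8},
    {"rut": "987654321", "received": date(2002, 6, 1), "bac": 1.2},
    {"rut": "555555555", "received": date(2002, 5, 20), "bac": 0.0},
]
pairs = two_stage_match(accidents, samples)
```

In this toy example the first driver is matched in stage one, the second only in stage two, and the third is dropped because of the lag between accident and sample.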
This discussion has two main aims. Firstly it has highlighted the difficulties inherent in working
with multiple databases, and the care that has been taken in circumventing them. It is highly
unlikely that matches between SML and Carabineros data that result from this procedure are
spurious, as they have the same recorded rut (a nine digit code!), occur in the same month and
year and the dating discrepancy is at most 3 days for non-fatal blood samples. Nonetheless,
mismatches are possible. No matching process can realistically claim otherwise, given the data.
However, the likelihood of the same driver being involved in two crashes within 3 days of each
other, and having the wrong crash associated with a blood sample is extremely low, making the
error safely negligible.37
The second aim of the discussion is to make clear why the match rate of drivers to SML records
is 25% of all drivers, making up approximately 40% of MR drivers. This figure, although
apparently low, is made up of matches that are to be trusted. Relaxing any of the matching
criteria increases the proportion of records matched, but to the detriment of accuracy. Moreover,
it must be considered that blood samples are taken only in accidents involving serious injuries,
which make up 28.5% of all accidents. Thus the figure of 62,144 uncorrupted blood samples,
although apparently low, makes up a much larger proportion of drivers involved in serious
accidents.
The final numbers of matched driver records from the databases are the following:
Sample: 2000-2003                                   number    % of all drivers   % of MR drivers
Total matched drivers                               63,967    25.0%              41.9%
Total matched drivers with uncorrupted BAC data     62,144    24.3%              40.7%
Biases in the Data
SML data
37 Despite a visual inspection of over 1000 matches I found no obvious discrepancies in SML and Carabineros recorded dates. A separate error involving duplicate assignment of a single BAC to multiple crashes, though unavoidable, has been identified and eliminated almost completely from the results presented. Any further effects from this problem are negligible.
The fact that we have reliable (i.e. SML) alcohol involvement measures for only a quarter of
crash-involved drivers (all of which are in the metropolitan region) means that the sample is not
directly representative of the country as a whole in one crucial sense: it is highly skewed towards
urban crashes. Only approximately 5% of metropolitan region crashes are classed as having
occurred in rural areas – the same percent as in the sample.
Along other dimensions of the data there is no reason to suspect a systematic bias in the sample,
given that it is composed of those observations that are free from data collection and entry errors
in both databases. By way of example, the number of observations by year that are included in
the sample indicates no bias:
Year                  2000     2001     2002     2003     Total
Un-matched drivers    44,833   50,256   46,485   50,417   191,991
Matched drivers       14,375   15,381   17,020   17,191   63,967
% of year's data      24.3%    23.4%    26.8%    25.4%    25.0%
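The percentages in the table follow directly from the matched and un-matched counts, as the short check below shows. The variable names are assumptions; the figures are those of the table.

```python
unmatched = {2000: 44833, 2001: 50256, 2002: 46485, 2003: 50417}
matched = {2000: 14375, 2001: 15381, 2002: 17020, 2003: 17191}

# Share of each year's drivers that were matched to an SML record.
rates = {year: matched[year] / (matched[year] + unmatched[year]) for year in matched}

total_matched = sum(matched.values())
total_unmatched = sum(unmatched.values())
overall_rate = total_matched / (total_matched + total_unmatched)
```

The yearly rates stay within a few percentage points of the overall rate, which is the basis for the claim that inclusion does not vary systematically by year.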
Along other observable dimensions such as sex38 there are no clear biases, making it possible to cautiously infer that inclusion in the sample is effectively random within the urban subset of the data.
The urban bias cannot be directly corrected. However, for the purpose of extrapolating the urban
SML sample to rural areas it is necessary to prove – at the very minimum – that rural crashes are
at least as likely to involve drinking drivers as urban crashes. This is in fact the case: Carabineros
records indicate that 6% of urban crashes involve at least one drinking driver, as opposed to 11%
of rural crashes. Moreover, the percentage of drinking drivers in severe urban car crashes is 9%,
somewhat lower than the rural 12.5% rate. Obviously we cannot be sure how closely matched
urban and rural accident data are in terms of alcohol involvement. With the evidence at hand it
seems that if anything rural accidents involve more drinking drivers than urban crashes. This
permits tentative extrapolation of urban data to rural accidents, but the extent to which this is
justified is unclear.
38 For example in both matched and non-matched driver groups women make up approximately 13% of the sample.
Under-reporting of fatalities
Before moving on, one bias present in all the data, and not just the sample, should be noted: the systematic under-reporting of fatalities by Carabineros, as they report as fatalities only those victims that die within 24 hours of the crash. Naturally, many more die in the days following the crash, and these are classed as serious injuries, not deaths, by Carabineros. For example, between January 2000 and the end of 2002 Carabineros recorded 4974 traffic deaths, while the Ministry of Health put the figure at 6279 (footnote 39), 26% more than Carabineros. Thus, when considering the drink driving externality we would do well to consider this systematic under-reporting of deaths in the database.
This issue also strengthens the case for examining accidents resulting in both fatalities and
serious injuries, as many of the latter are effectively deaths. Moreover, serious injuries are absent
from US FARS data, making the availability of this data an advantage of this study over studies
using US data.
Under-reporting of Alcohol-Involvement
The combination of the Carabineros and SML databases provides an ideal opportunity to
compare the alcohol measures in each. The SML measure is obviously more trustworthy.
However, it is well to note that even the SML measure is not exact: given that at least one and
often many more hours transpire between the accident occurring and a blood sample being taken,
the sample is a downwardly biased measure. This is because alcohol is eliminated from the
bloodstream at approximately 0.15 BAC units per hour, a rate fast enough to tip many DUI
(0.49
Legally classified as Drunk 253 2,270 2,523
Total 57,202 4,839 62,041
The number of cases in which Carabineros fails to realize that a driver is in fact drunk is
highlighted in red and is larger than those that are accurately identified. The number in blue is the
opposite, upward bias: cases in which Carabineros identifies drivers as legally drunk (DUI or
DWI), when in fact they are not according to the SML. The explanation for at least a third of these cases is direct: these had alcohol in their blood and their legal alcohol reading is likely to reflect the delay between the accident and the taking of the blood sample40, i.e. they were drunk at the time of the accident, but by the time the sample was taken their BAC had fallen to permissible levels.
The extent of the downward bias in the Carabineros alcohol measure (identifying a drunk driver
as sober) is such that Carabineros correctly identified only 47% of drunk drivers. This is
surprising, but not unprecedented. US measures of police under-reporting of alcohol have found
similar figures: 71% correctly identified (Soderstrom et al 1990), 57.1% (Maull et al 1984) and
51.7% (Dischinger et al 1989). Moreover, the Dischinger study found an even lower
identification rate of 28.6% for drivers with a BAC below 1.0, which overlap with our illegal
measures, suggesting that the Chilean rate of 47% is well within the expected range. A final and
more recent reference point is Blincoe et al (2002) which found that in the US state of Maryland
police identified 74% of cases with BAC ≥ 1.0 and only 46% of cases where BACs were between
1.0 and zero.
A closer look at the misclassified cases suggests some explanations: compared to correctly
identified drivers, double the proportion (15%) of those mistakenly reported as sober suffered
either fatal or very serious injuries, and approximately a quarter suffered serious injuries. Thus Carabineros probably did not have close access to these drivers, as they may have been immediately transported to hospital. The remaining 75% of misclassified drivers pose
a problem: how were they misclassified given that they are overwhelmingly (70%) registered as
DWI (BAC>1), rather than the less serious DUI (0.49
explanation is a combination of Carabineros' lack of equipment and training coupled with the fact
that detection of even quite severe impairment (BAC >1) might be somewhat harder than it
appears.
In terms of rural alcohol under-reporting the SML data cannot provide many cases. Nonetheless,
some areas of the Greater Santiago Metropolitan region are effectively rural, and the alcohol
detection table closely parallels that for the whole MR:
Carabineros Data 2000-03 (rural)   SML data (rural)
                                   Legally Sober (BAC < 0.5)   Legally Drunk   Total
Legally Sober (BAC < 0.5)          2,881                       164             3,045
Legally Drunk                      24                          179             203
Total                              2,905                       343             3,248
This provides some support for our extrapolation of SML data to the whole country. However, as the data collection is exclusively by MR police precincts, it is possible that the biases in the rural accident data are completely different to those in the MR.
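The comparison between the rural table and the whole-MR table can be made concrete by computing, for each, the share of SML-confirmed drunk drivers that Carabineros also classified as drunk. The sketch below uses the counts from the two detection tables; the function and variable names are assumptions.

```python
def detection_rate(classified_drunk_and_sml_drunk, sml_drunk_total):
    """Share of drivers drunk according to the SML that Carabineros also classed as drunk."""
    return classified_drunk_and_sml_drunk / sml_drunk_total


mr_rate = detection_rate(2270, 4839)     # whole Metropolitan Region table
rural_rate = detection_rate(179, 343)    # rural table
```

The two rates are of similar size, roughly one half in each case, which is the sense in which the rural table parallels the one for the whole MR.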
Hit and Run accidents
So-called “hit and run” accidents are cases in which one or more drivers flee the scene of an
accident, resulting in missing alcohol-involvement data for those drivers. The available
evidence[41] suggests that these drivers have high BACs (which is perhaps why they flee). They
are thus of direct relevance to our study, and constitute yet another source of downward bias in our
parameters. However, any assumptions made about such a group would be arbitrary, and these
drivers have therefore been left out of the analysis.
[41] The Compton et al. study pursued hit-and-run drivers, apprehending 94 of 603. Of these, over 69% had positive BACs, typically at high levels. Moskowitz et al. (2002) report that police in La Puente, California, made an effort to
apprehend hit-and-run drivers, and that 65% of those apprehended had positive BACs. However, those caught
are not necessarily representative, as higher BACs may make successful escape less likely.
6 – Results
The 2000-2003 database was used to identify the following two-car night-time crashes causing
serious injury or death:

Period: 2000-2003                            Drunk-sober crashes   Drunk-drunk crashes   Sober-sober crashes
Carabineros data 2000-Sept2004               200                   34                    539
SML data (reduced sample)                    81                    15                    72
SML data (extrapolating for whole sample)    317                   59                    282
In terms of the model, the following results were obtained (standard errors in parentheses):

Period: 2000-2003                            θ          N          % of drinking drivers on the road
Carabineros data 2000-Sept2004               -          -          -
SML data (reduced sample)                    3.8        0.23       18.96%
                                             (2.35)     (0.094)
SML data (extrapolating for whole sample)    3.8        0.23
                                             (1.181)    (0.047)
It is immediately evident that the Carabineros data by themselves are of no use in estimating the RR
parameter, θ: the data are simply incompatible with the binomial distribution, in that no parameter
values exist that could have generated them. This is the direct result of the downward bias in the
Carabineros alcohol measure, and makes the model unworkable.
Moreover, this incompatibility with the model is not due to a failure of the equal mixing
assumption: the problem here is that the numbers of Drunk-Sober and Drunk-Drunk crashes are both
extremely low, not just that Drunk-Sober crashes are too few for the model to function.
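The incompatibility can be illustrated with a short sketch. It assumes the two-type, equal-mixing form of the Levitt and Porter model, in which crash shares are proportional to θN² (drunk-drunk), (θ+1)N (drunk-sober) and 1 (sober-sober), with N the ratio of drinking to sober drivers. This closed form is a reading of the Methodology section rather than a quotation from it, and the function name is illustrative. Under that assumption R = DS²/(DD·SS) = (θ+1)²/θ, which has a real solution only when R ≥ 4.

```python
import math

def solve_theta_n(ds, dd, ss):
    """Closed-form solution of the assumed equal-mixing model.

    Returns (R, theta, n_ratio, share); theta, n_ratio and share are None
    when R < 4, in which case no real theta exists.
    """
    r = ds ** 2 / (dd * ss)            # R = DS^2 / (DD * SS) = (theta + 1)^2 / theta
    if r < 4:
        return r, None, None, None
    # Larger root of theta^2 + (2 - R) * theta + 1 = 0, i.e. the root with theta >= 1.
    theta = ((r - 2) + math.sqrt(r * r - 4 * r)) / 2
    n_ratio = math.sqrt(dd / (ss * theta))   # from DD / SS = theta * N^2
    share = n_ratio / (1 + n_ratio)          # drinking drivers as a share of all drivers
    return r, theta, n_ratio, share

carabineros = solve_theta_n(ds=200, dd=34, ss=539)
sml_reduced = solve_theta_n(ds=81, dd=15, ss=72)
print(carabineros)
print(sml_reduced)
```

With the crash counts reported above, the Carabineros data give an R of about 2.2, so no θ exists, while the reduced SML sample gives an R of about 6.1 and values of θ, N and the share of drinking drivers close to those in the results table.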
The more trustworthy, although still somewhat downwardly biased, SML data do provide a large
enough value for R (R ≥ 4) and yield an estimate of θ of 3.8. That is, drunk drivers are 3.8 times
more likely than sober drivers to cause a crash resulting in serious injury or death[42]. However, the
standard error for the reduced SML sample in the table (2.35) is too large for the parameter to be statistically
significant. This is because in this case we are using only those cases in which we have a direct
BAC measurement, and as this is only 25% of the drivers in the sample, the variance of the
estimate is too large. Extrapolating (saying, in effect, that our sample is representative, and
simulating the data) provides a much lower s.e., though of course this is a mere simulation.

[42] Using the Compton et al. RR curve suggests an average BAC for Chilean night-time car drivers of approximately
0.95, similar to the average BAC for dead drivers, 0.75. However, here we are deriving a RR of causing a severe or
fatal accident, while the Compton et al. curve is for accidents of all severities. As alcohol is implicated to an
increasing extent as BAC rises, we can infer that the average Chilean BAC implied by the RR estimate is above 0.95.
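A rough check on the extrapolated standard errors is sketched below. It assumes, which the text does not state, that the simulation keeps the crash proportions fixed and simply scales up the number of crashes, so that standard errors shrink with the square root of the sample size. The helper name is illustrative.

```python
import math

reduced = {"ds": 81, "dd": 15, "ss": 72}          # SML reduced sample
extrapolated = {"ds": 317, "dd": 59, "ss": 282}   # SML extrapolated to the whole sample

# Ratio of sample sizes: the reduced sample is roughly a quarter of the total.
scale = sum(extrapolated.values()) / sum(reduced.values())

def rescale_se(se, scale):
    """Standard error expected after multiplying the sample size by `scale`."""
    return se / math.sqrt(scale)

se_theta = rescale_se(2.35, scale)    # reported extrapolated s.e. of theta: 1.181
se_n = rescale_se(0.094, scale)       # reported extrapolated s.e. of N: 0.047
print(round(scale, 2), round(se_theta, 3), round(se_n, 3))
```

The rescaled values come out close to the extrapolated standard errors in the results table, which suggests the lower s.e. reflects little more than the larger assumed sample.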
It is important to note the context of this result. In addition to the data problems that have
forced us to use a much reduced sample (the SML data), this is a result that has been
estimated for Chilean car drivers, at night. Even leaving aside the small sample size, which means the
result is not statistically significant, it cannot strictly be interpreted as applying to drivers of other vehicles,
or as applying in the daytime. Since drink-driving is principally a night-time
phenomenon, the second caveat is not as serious as it first appears. However, the fact that we cannot be
sure that the result applies to vehicles that are not cars (trucks, buses, etc.) would be more serious were any policy
relevance to be given to it, given that such vehicles are involved in 38% of serious night-time
crashes.
What the result would provide us with, if standard errors were not a problem, would be a
trustworthy estimate of the relative risk of causing a fatal or serious-injury crash for night-time car drivers,
who make up the vast majority of night drivers, and an even larger proportion of potential
night-time drivers. As such, this result would be of great value to driver education programs and
would be of substantial use in determining penalties for night-time drink driving. At present it
would appear that no relative risk calculation is involved in the penalty-setting process.
Estimating the parameters taking Under-Reported Alcohol-Involvement into account
It should be clear that the Carabineros dataset by itself cannot yield a parameter estimate.
However, if the degree of alcohol under-reporting can be estimated, then an approximate RR
parameter can be estimated for the whole dataset, and not just the SML sample.
To estimate the extent of alcohol under-reporting we focus on the cases where we have both SML
and Carabineros data on alcohol. These are almost all (157 of 164) of the cases used in the SML-only
estimation of θ in the preceding section. These cases are then examined to determine the
extent of the downward bias in the Carabineros alcohol measure, as we have the more accurate SML measure
at hand[43].
There are two polar cases by which a biased alcohol measure could affect the number of cases in
each of the three crucial collision categories: Drunk-Drunk (DDC), Drunk-Sober (DSC) and
Sober-Sober crashes (SSC). The first is the least plausible: if a police officer misreports one
driver (as being sober when he is in fact drunk) at a particular crash site, then he will never
misreport the other. This polar case results in a lowering of the value of R[44] and hence cannot
serve as a useful guide, as it makes estimating θ even more difficult. The other case is when a
police officer has perfectly correlated reporting errors: if he misreports one driver at a crash then
he always misreports the other. This increases the value of R because in this case the number of
SSC in the data is artificially inflated: many SSC are actually DSC or DDC.
While the latter scenario (correlated police reporting errors) is more plausible, it is unlikely to
hold fully: sometimes a police officer will misreport only one of the two drivers at a particular
crash. An examination of the data confirms this: there are approximately as many
cases in which Carabineros misreport one driver and not the other as cases in which
both are misreported. If we nonetheless assume that reporting errors are perfectly correlated, then
we under-estimate the downward bias in the alcohol measure caused by misreporting: under
an intermediate correlation of reporting errors, the downward bias required to generate
the same error-laden data is greater.
Thus, in the simulation below we have assumed perfectly correlated reporting errors. Under this
assumption, the downward alcohol reporting bias required to generate the Carabineros alcohol
measure for those 157 cases in which we have the more accurate SML measure is 51%. That is, if
51% of drunk drivers are misclassified as sober, and reporting errors are perfectly correlated, then
real accident data like the 157 cases under consideration will be reported as Carabineros has
reported them. It is important to recall the earlier point: this is the lowest possible value for the
downward bias in the alcohol measure, and the evidence indicates that the assumption of
perfectly correlated reporting errors does not hold. In all probability the downward bias is higher.

[43] By themselves they deliver an R of 1.73, well below the minimum level for estimation.
[44] The key issue is whether Drunk-Drunk collisions are shifted to the Sober-Sober or Sober-Drunk categories. If a
police officer never misreports both drivers then all misreported DDC are reported as being DSC, and thus the
number of DSC is artificially inflated and must be lowered, reducing R.
Using this degree of downward bias, and applying it to the 2000-September 2004 dataset, we
obtain the following simulated parameter values, with simulated s.e. in parentheses:

Carabineros data 2000-Sept2004
Extent of downward bias    Implied θ    Implied N
32%                        1.2          0.31
                           (6.89)       (0.13)
40%                        2.8          0.23
                           (1.11)       (0.773)
51%                        6.0          0.20
                           (1.975)      (0.001)
60%                        13.6         0.18
                           (6.2)        (0.0005)

Thus, the implied θ, if the degree of downward bias in the alcohol measure is in fact 51% and
Carabineros misreporting is perfectly correlated, is 6: drinking drivers are on average 6 times more
likely[45] than sober drivers to cause a crash resulting in death or serious injury in Chile. If the downward
bias is higher, then the relative risk (θ) will also be higher, as can be seen from the table, and vice
versa.

[45] According to the Compton et al. RR curve for US drivers, this suggests that the average BAC for Chilean
night-time car drivers is above 1.15.
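The point estimates in this table can be approximately reproduced with the following sketch. It rests on assumptions that are not spelled out in the text: that a downward bias of b means a fraction b of true drunk-sober and drunk-drunk crashes are wholly recorded as sober-sober (perfectly correlated errors), and that θ and N then follow from the closed-form equal-mixing relation R = (θ+1)²/θ. The function names are illustrative, and the standard errors are not reproduced.

```python
import math

def undo_bias(ds_obs, dd_obs, ss_obs, bias):
    """Recover 'true' crash counts when a share `bias` of alcohol-involved
    crashes was wholly misrecorded as sober-sober (perfectly correlated errors)."""
    ds_true = ds_obs / (1 - bias)
    dd_true = dd_obs / (1 - bias)
    # Crashes wrongly sitting in the sober-sober cell are moved back out of it.
    ss_true = ss_obs - (ds_true - ds_obs) - (dd_true - dd_obs)
    return ds_true, dd_true, ss_true

def implied_theta_n(ds, dd, ss):
    """Closed-form theta and N under the assumed equal-mixing model (needs R >= 4)."""
    r = ds ** 2 / (dd * ss)
    theta = ((r - 2) + math.sqrt(r * r - 4 * r)) / 2
    n_ratio = math.sqrt(dd / (ss * theta))
    return theta, n_ratio

# Carabineros counts for 2000-Sept 2004: drunk-sober, drunk-drunk, sober-sober.
observed = (200, 34, 539)
results = {}
for bias in (0.32, 0.40, 0.51, 0.60):
    theta, n_ratio = implied_theta_n(*undo_bias(*observed, bias))
    results[bias] = (theta, n_ratio)
    print(f"bias {bias:.0%}: theta = {theta:.1f}, N = {n_ratio:.2f}")
```

Under these assumptions the implied θ rises steeply with the assumed bias, because each additional point of bias both inflates the corrected drunk-sober count and shrinks the corrected sober-sober count.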
Taking Unequal Mixing into account
As noted in the Methodology section, violations of the equal mixing assumption (i.e. cases in which drivers
have proportionately more interactions with other drivers of their own type) bias θ downwards
and N upwards. Estimating the model for different values of the parameter Δ[46] (reflecting degrees
of unequal mixing) we obtain the following table (standard errors in parentheses):

Δ      θ (relative risk)    N         % of drinking drivers
0      3.81                 0.23      19.46%
       (2.35)               (0.09)
0.1    4.63                 0.21      17.05%
       (2.48)               (0.07)
0.2    5.54                 0.18      15.26%
       (2.68)               (0.06)
0.5    8.19                 0.13      11.71%
       (3.34)               (0.04)

As can be observed in the table above, as the Δ we use rises the estimated value of θ
increases (by 26% in moving from Δ=0 to Δ=0.1) and the value of N falls. A table
containing more values of delta is available in the appendix. Obviously, we cannot know
which value of Δ most closely approximates the degree of unequal mixing found in practice, but as Δ rises our
estimated θ becomes statistically significant (in the table, the ratio of θ to its standard error first
exceeds 2 at Δ=0.2).

An attempt to choose the value of Δ that best fits the data by maximizing the value of the
likelihood function (V) with respect to Δ[47] was unsuccessful: the difference between the highest
and lowest values of V (each of which is the probability of observing the sample) using a
thousand values of delta between 0 and 0.6 is a mere 0.000000001730901. Moreover, as a
numerical optimization method is being used, this difference shrinks as the number of iterations is
increased. In short, a graph of V (on the Y axis) against Δ would be flat, implying that
this method is of no practical use, and that the true value of delta must be approximated by some
other, perhaps experimental, method.

The Alcohol-Involvement of Dead Pedestrians and Dead Drivers
See Appendix 6 for details.

[46] A Δ of 0.1 implies that drunk drivers are 10% more likely to interact (not crash, just interact) with other drunk
drivers than with sober drivers.
[47] This is because the value of the likelihood function (V) represents the probability of observing this particular
sample. Hence the value of Δ that results in the highest value of V maximizes this probability.
7 – Estimating the External Costs of Drink-Driving
What counts as an externality?
The debate as to the policy relevant costs of alcohol is one of long standing, with estimates of the
total annual social cost of US alcohol use ranging from USD 9.3bn to over 130bn (Heien 1995-6).
Much of the debate revolves indirectly around the issue of whether abuse of alcohol is
appropriately considered rational behaviour.
The Becker & Murphy theory of rational addiction holds that addiction is in no way conclusive
evidence of irrational behaviour; in fact it can be rational. Habits and addictions (extreme habits)
stem from consumption preferences being connected intertemporally. Several effects are in play.
One is what is termed ‘reinforcement’, where past consumption of the good increases the
marginal utility of present consumption. Another is that the good itself may raise already high
discount rates[48], rationally transforming a habit into an addiction. In short, a coherent and
empirically successful theory exists to support the view that consumers are rational when they consume
alcohol in ‘excess’.
Standard economic theory holds that the relevant social costs of alcohol-involved driving are only
those that can be classified as spillovers. If drinking drivers kill themselves while driving,
causing no other harm, then there is no social cost involved. If their passengers die, these deaths are not
externalities either, as passengers exercise their free will and internalize the risk in choosing to ride
with such a driver[49]. Moreover, drinking drivers, even those who die, receive positive utility from
their consumption choices ex ante.
Some economists have argued that a rigid adherence to consumer sovereignty (the consumer is
rational and takes an optimal consumption path – who are we to say what is best for him?) is not
convincing in this case, and that strict externalities do not capture the full social cost. Pogue and
Sgontz (1989) argue that the optimal tax on alcohol rises markedly if alcoholism is considered to
[48] People with high discount rates are more likely to develop habits that may become addictions, as they weigh the
future risk of becoming an addict less heavily.
[49] Heien notes that Perrine et al 1988 indicates that 83.3% of the passenge