DOCUMENTO DE TRABAJO Instituto de Economía TESIS de MAGÍSTER INSTITUTO DE ECONOMÍA www.economia.puc.cl One for the Road - Estimating the Drunk-Driving Externality in Chile William Mullins. 2004



    Economics M.Sc. Thesis – William Mullins – 2nd Semester 2004

    One for the Road – Estimating the Drunk-Driving Externality in Chile

    Economics M.Sc. Thesis in Public Policy

    William Mullins

    Drink-driving is a classic negative externality. Nonetheless, it has failed to attract

    economic attention in Chile. This study estimates the relative risk of drunk drivers in causing

    serious accidents1, and the aggregate externality generated by drunk-driving in Chile.

    As an epidemiological phenomenon drunk-driving warrants attention: between the ages of 10 and

    45 it is the joint second highest ranking cause of death for Chileans. Conaset puts the number of

    accidents caused by drink-driving (DD) between 2001 and 2003 at 8,137, in which 472 people

    died and 2,240 were seriously injured. However, such estimates lack a clear methodological

    grounding, confusing the simple presence of alcohol with a causal role. This study aims to

    separate alcohol's causal effect from the baseline serious accident risk faced by all drivers.

    The methodology used in this study is taken from Stephen Levitt and Jack Porter's 2001 JPE

    article “How Dangerous are Drinking Drivers?” They find that drinking drivers in the US

    (including those not legally classified as drunk) are at least 7 times more likely to cause a fatal

    crash than sober drivers (θ ≥ 7), while for legally drunk drivers θ ≥ 13. This paper estimates a lower

    bound for this relative risk and an upper bound for the proportion of drunk drivers on the roads.

    Together these parameters allow an approximate calculation of the aggregate

    externality caused by DD in Chile.

    Levitt and Porter estimate the lower bound of the aggregate US DD externality associated with lost lives (no other costs are considered) to be around USD 9 billion in 1993. Chile's only approximately comparable estimate comes from a study that the Ministry of Public Works (MOP) commissioned from the consulting firm CITRA in 1996. It estimates that the total annual cost of all road accidents in Chile is around USD 600-700 million, or between 7 and 8 percent of the US external cost estimate for drink driving alone. This number is used within government as the sole basis for public investment proposals, thus according it a policy importance far beyond that of most studies. This paper aims to provide a more rigorous estimate of the external costs that will allow some perspective as to the magnitude estimated by CITRA.

    1 Serious accidents are defined as accidents that result in at least one death or serious injury.

    The study begins with a review of the theoretical issues that bear on drunk-driving, and follows

    with a review of the evidence on alcohol and crash risk. Sections 4, 5 and 6 detail methodology,

    data and results respectively. Section 7 considers which deaths and serious injuries are rightly

    classed as externalities, and calculates the aggregate spillover. Section 8 concludes.


    2 – Economic Theory and Drunk Driving

    General Considerations

    The accident literature in the US and West European countries often prefaces its remarks on

    alcohol with comments such as “alcohol consumption is involved in x% of fatal crashes” and

    conveys the impression that alcohol causes all accidents it is “involved” in. However, without an

    estimate of the number of drunk drivers on the roads, this figure is meaningless – if the same

    percentage (x%) of drivers have been drinking then alcohol is no more a crash risk factor than

    orange juice. This tendency to demonize alcohol in terms of its crash causation must first be laid

    aside if we are to consider objectively the external cost of drink driving in Chile.
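The orange-juice point can be made concrete with a small calculation. The sketch below is not part of the original study: it assumes that a group's share of crash involvements is proportional to its share of drivers on the road multiplied by its per-driver crash risk. The function name `relative_risk` and the shares used are purely illustrative.

```python
def relative_risk(share_in_crashes: float, share_on_road: float) -> float:
    """Relative crash risk of drinking versus sober drivers.

    share_in_crashes: fraction of crash-involved drivers who had been drinking
    share_on_road:    fraction of all drivers on the road who had been drinking

    Assumption: crash involvements of each group are proportional to
    (number of drivers in the group) * (per-driver crash risk of the group).
    """
    if not (0 < share_in_crashes < 1 and 0 < share_on_road < 1):
        raise ValueError("shares must lie strictly between 0 and 1")
    crash_odds = share_in_crashes / (1 - share_in_crashes)
    road_odds = share_on_road / (1 - share_on_road)
    return crash_odds / road_odds


# Hypothetical shares: alcohol is "involved" in 30% of crashes in both cases.
same_exposure = relative_risk(0.30, 0.30)  # drinkers are also 30% of drivers
low_exposure = relative_risk(0.30, 0.05)   # drinkers are only 5% of drivers
```

With identical shares the relative risk is 1, so the "involved in x% of crashes" figure carries no information on its own. With the same crash share but only 5% of drivers drinking, the implied relative risk is a little above 8.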

    Moreover, drinking is only one risk factor among many. As Borkenstein et al. note in their

    seminal 1974 study:

    “traffic accidents are the result of interactions among drivers, vehicles and the physical environment. No

    single cause of traffic accident exists. It is not possible to consider a separate element of the accident

    complex in the abstract. These elements operate only in the context of the remaining elements.”2

    Speeding is an example of one such “remaining element.” It also increases the relative risk of

    crashing, and to an extent comparable to drink driving: “driving 65 mph when the speed limit is

    55 mph increases risk of involvement in a fatal crash by a factor of 2.0, similar to the risk

    increase associated with driving with BAC = 0.08% compared to driving at BAC = 0.”3

    Moreover, drivers below the US legal limit of 0.08% are also extremely dangerous, making up 70% of drivers with a measured BAC in the 2002 US Fatal Accident Reporting System (FARS) data. In the 2000-2004 Chilean data, approximately 78% of drivers involved in accidents resulting in serious injuries or deaths are recorded as being sober. While this is an over-estimate, as will be discussed, most accidents are not caused by alcohol.4 What is also true, however, is that alcohol, and perhaps speeding – uniquely among crash risk factors – are perceived by lawmakers to be particularly reckless ways to endanger the lives of others, and are directly chosen by the drivers involved. As a result, the law assigns property rights to sober, non-speeding drivers. Other significant risk factors such as sex and age also increase relative serious crash risk, with young men unsurprisingly emerging as the highest risk group: Levitt and Porter report that sober drivers under 25 years old pose a fatal crash risk 2.78 times greater than sober drivers over 25, while the comparable sober male-female relative risk is 1.36.

    2 p17 Borkenstein et al. 1974

    3 L. Evans, Ch 10 (2004). The WHO report (2004) cites similar figures (Ch 3 p77)

    4 L. Evans (2004) notes that even if alcohol miraculously disappeared from the roads, 66% of US fatalities in 2002 would remain.

    Public policy cannot, of course, focus on removing male drivers from the roads. It focuses instead

    on reducing drink driving and speeding. Moreover, alcohol and speeding dwarf other risk factors

    in terms of the magnitude of the increase in relative risk they provoke. At the Chilean legal DWI

    (Driving While Intoxicated) limit (BAC 0.1%) a driver has a relative crash risk of 4.79, while at

    BAC 0.2% it is approximately 82 times that of a sober driver.5

    A model of the consumption of risky goods

    Thus it should be clear that alcohol is not the sole cause of the devastation often caused by traffic

    accidents. Driving is a dangerous activity per se, in the same way that extreme sports are

    dangerous activities: they increase the risk of death.

    A simple model, developed in Rosen (1981), formalizes how agents determine their optimal consumption of risky goods (those that increase risk of death) and beneficial goods (those that reduce risk of death). Define the probability of surviving a single period as $q$, and utility conditional on survival as $U(C_1, \dots, C_n)$ for the $n$ available consumption goods. If we consider that consumption of certain goods (such as drunk driving) can affect survival probability we can write $q = q(a_1 C_1, \dots, a_n C_n)$, where $a_1, \dots, a_n$ are non-negative constants. For a good whose consumption reduces survival probability the partial derivative $q_i$ is negative. If we assume a budget constraint of $Y = \sum_j p_j C_j$ and maximize expected utility $q(\cdot)\,U(\cdot)$ we obtain the following optimality condition6:

    $$\frac{U_i}{U_n} = \frac{p_i}{p_n} - a_i q_i V, \qquad V = \text{value of a statistical life}$$

    5 Compton et al. 2002, p42

    The relevant point here is that the rational consumer 'self-regulates', in Rosen's words. If good i

    is drunk driving (assumed to reduce survival probability by increasing crash risk) then qi is

    negative, making the entire second term positive. This indicates that the ratio of marginal

    consumption utilities must be higher than in the case where consumption of good i does not affect

    risk. In short, this model illustrates the fact that a rational agent takes into account all risks to

    himself: he consumes less of good i given its negative health effects. What the model omits is the

    fact that when this agent crashes while “consuming” drunk driving, the risk is borne in part by his

    passengers and himself, and in part by unfortunate pedestrians or occupants of other cars. The

    risk to these others constitutes a negative externality, and is not factored into the consumption

    decision of our drinking agent.
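The self-regulation result can be illustrated numerically. The sketch below is only an illustration under assumed functional forms that do not come from Rosen (1981) or from this thesis: a Cobb-Douglas utility, an exponential survival function, and arbitrary prices and income. The function name `best_risky_consumption` is hypothetical.

```python
import math


def best_risky_consumption(a1, alpha=0.5, income=100.0, p1=1.0, p2=1.0, steps=10_000):
    """Grid-search the amount C1 of good 1 that maximizes q(a1*C1) * U(C1, C2).

    Assumed (hypothetical) functional forms:
      survival probability  q = exp(-a1 * C1)
      utility if alive      U = C1**alpha * C2**(1 - alpha)
    The budget constraint income = p1*C1 + p2*C2 pins down C2 for each C1.
    """
    best_value, best_c1 = -math.inf, None
    for k in range(1, steps):
        c1 = (income / p1) * k / steps          # candidate consumption of good 1
        c2 = (income - p1 * c1) / p2            # remaining budget spent on good 2
        expected_utility = math.exp(-a1 * c1) * c1 ** alpha * c2 ** (1 - alpha)
        if expected_utility > best_value:
            best_value, best_c1 = expected_utility, c1
    return best_c1


c1_harmless = best_risky_consumption(a1=0.0)   # good 1 has no effect on survival
c1_risky = best_risky_consumption(a1=0.01)     # good 1 lowers survival probability
```

Under these assumed values the agent chooses 50 units of good 1 when it is harmless and roughly 29 units when it reduces his own survival probability. The risk he imposes on third parties appears nowhere in the objective, which is the omission described above.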

    This model can also be used to highlight the offsetting effects that result from rational consumers' reactions to any change in road safety, something that should not be overlooked in any cost study such as this. Consider a change that makes driving safer, such as the introduction of superior safety technology (e.g. crumple zones, airbags)7, or better enforcement of drunk driving laws. The latter reduces the dangerousness of interacting with drunk drivers at night – who are present in greater proportions than in daytime hours – and ceteris paribus reduces the overall risk of night driving. This safer driving environment should induce more night-time driving. In terms of the model, if night driving is good j, then $a_j$ will fall (as night driving is less dangerous per unit) and the total amount of night driving by sober drivers will rise. If, as has been supposed, night driving is an activity that reduces the agent's survival probability8, then the reduction in the number of fatal accidents resulting from superior enforcement of traffic laws will be partially offset by more accidents caused by sober drivers' increased night driving. Thus any study that purports to show the lives saved if drink driving were eliminated is implicitly holding offsetting activity by other drivers to zero, leading to an overestimate of the benefits of such an outcome9.

    6 See Appendix 1 for derivation of this result and for derivation of V, the value of a statistical life

    7 Peltzman 1975 notes that “safety regulation has had no effect on the highway death toll…[it] may have increased the share of this toll borne by pedestrians and increased the total number of accidents” (p677). This is because new safety devices have resulted in responses from drivers – such as riskier or faster driving – almost completely offsetting the increase in safety brought about by regulation.

    8 This is a reasonable assumption: “in times of economic growth, traffic volumes increase, along with the number of crashes and injuries…reductions in alcohol-related crashes have also been observed to coincide with periods of economic depression” p72 WHO (2004).

    The Economic Issue

    The economic issue at the heart of this paper is the negative accident externality generated by drink driving (DD). The externality – defined as a net cost to other members of society not borne by the causing agent10 – results from the higher crash risk of drinking drivers relative to sober drivers. Crashes often involve third parties (other drivers, passengers, pedestrians) or their property, and given that the law assigns “property rights” over the road to sober drivers, a higher crash risk causes a negative spillover effect11. This is not to say that only drunk drivers crash – we all face a risk of crashing when driving, a risk that depends on numerous characteristics such as tiredness, age, experience, and road conditions. This is termed the baseline crash risk. The negative externality caused by drink driving is the additional crash risk beyond the baseline level.

    If drinking drivers do not bear the full cost of their actions (because they are not required to or cannot fully compensate their victims) then they will choose an individually optimal amount of drink driving that is excessive (and thus inefficient) from society's viewpoint: for their marginal units of drink driving the cost to society is greater than the benefits obtained by such drivers. In the Rosen model above this can be seen by noting that the agent considers only the impact that consuming DD will have on his health, not on others'. This is the economic reason behind the legal penalties for drunk driving: an optimal tax reduces the individually optimal amount of DD so that it coincides with the socially optimal amount, by forcing the driver to internalize the costs of his dangerous driving.

    9 However, the Rosen model also shows that the marginal willingness to pay for small changes in $a_i$ is $\frac{dY}{da_i} = q_i C_i V p_n$. Thus even if a complete offset ensures that P(survival) does not change, the willingness to pay may be positive and large, making the exercise worthwhile.

    10 This definition should include the caveat that another agent's actions do not constitute an externality if they change market prices (this is a pecuniary 'externality' and generates no inefficiency). A related definition links spillovers to the absence of functional markets.

    11 Both drivers are equally responsible for the accident from an economic point of view: were either of them to have stayed at home, the accident would not have occurred. It is the legal definition of property rights that establishes the blame with one party; such is the case with alcohol-involved driving.

    Its external effects notwithstanding12, drink driving - like speeding - creates private benefits for the drivers involved because its avoidance can be costly in terms of time or money. Indeed, this discussion is not intended to make the point that drunk driving should be eliminated: it is possible that the socially optimal amount of drink driving is not zero – and the fact that the legal BAC limit is 0.05% and not zero is a testament to this fact13.

    Does Insurance make a (theoretical) difference? Do Private Lawsuits?

    The issue of insurance is important and must be considered: if a person injured by a drunk driver

    is insured then does an external cost exist? The answer: almost certainly, as only if the drunk

    driver is successfully sued by the victim will the externality be fully eliminated. A system with

    perfectly defined and enforced property rights would ensure this, but as most authors consider

    that the probability of a successful private suit is low in the US, it can be confidently assumed

    that it is even lower in Chile. Moreover, even in the most favourable case in which the private

    suit is successful and substantial damages are awarded, it seems unlikely that any financial

    compensation can fully restore the utility lost by dying – the agent himself has disappeared. If the

    basic unit of society is held to be the household the question becomes: can money fully replace a

    lost family member? While the answer depends on the dead individual, some uncompensated

    external cost must surely remain, whatever the payout.

    Another relevant limit to the role of private legal suits is that the wealth level of the driver is a

    binding upper limit to judicially dictated compensation. Given that most plausible estimates of

    the statistical value of life in Chile range from 0.3 to 1.4 million USD, the average drinking

    driver is in no financial position to fully compensate the victim(s).

    12 Driving involves continuous interaction with other drivers, making it rife with non alcohol-related externalities, most notably congestion and accident spillovers. These refer to the fact that an additional driver adds to the overall congestion level, increasing the travel time of all drivers, while also increasing the general accident risk.

    13 If the optimal amount of DD is in fact zero then any tax above the marginal damage that DD causes will attain the optimal internalizing outcome.

    In short, insuring victims does not solve the problem posed by the DD externality, as the social cost of the activity remains above the social benefits despite the existence of insurance14. Moreover, the possibility of effective private lawsuits does not provide the necessary deterrence. Hence, a potential drunk driver may be under-deterred by such a system.

    Optimal Law Enforcement

    The 'optimal tax' that internalizes the spillover is a deceptively simple term for what is in fact a

    complex instrument made up of 2 broad policy tools: the penalty paid when an offender is

    apprehended and the probability of detection or apprehension.

    The standard textbook solution to a negative externality is the Pigou tax, in which the probability

    of detection of an “offence” (p) is approximately one and the optimal fine (penalty) that offenders

    face is equal to the marginal damage caused by their actions. However, in the real world, the cost

    of a p approximately equal to one is likely to outweigh the damage done by DD: it would require

    huge expenditures on police and surveillance equipment, and severe violations of individual

    liberties.

    The economic theory of law enforcement (see Polinsky & Shavell, 2000) makes use of an

    intuitive and simple result: for risk neutral agents a combination of a high p and a low penalty

    (assume it is a fine) results in the same level of deterrence as a low p, high fine combination. As

    it is costly to catch offenders (i.e. Drinking Drivers, DD) with a high probability, the latter

    combination is a more cost-effective way to generate deterrence.

    Moreover, deterrence in this context is exactly what is required: set at the right level, it makes the expected penalty the DD will have to pay equal to the harm caused to society by their externality. If we define $F^*_{RN}$ as the optimal fine for a risk neutral DD, and $h$ as the harm he does to society, then the following equation illustrates the efficient solution in a static context:

    $p F^*_{RN} = h$, i.e. $F^*_{RN} = h/p$

    14 Moreover, liability insurance for drivers removes even the slight deterrent effect of possible lawsuits from victims.

    An accident caused by a DD also causes costs to society as a whole, such as the (judicial and

    police) costs of imposing the fine (k) and those of investigating and prosecuting the accident (s).

    Moreover, given that many cases do not result in fines, we must also include the probability that

    a fine will be imposed as a result of the prosecution stage (q). Incorporating these costs into the

    model results in a new, larger optimal fine, as drunk drivers also generate these costs in addition

    to the direct externality:

    $F^*_{RN} = \frac{h}{pq} + \frac{s}{q} + k$
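The two fine formulas above can be evaluated directly. The following sketch simply codes them; the function name `optimal_fine` and all monetary values are hypothetical and chosen only to show how the fine scales with the detection probability.

```python
def optimal_fine(h, p, q=1.0, s=0.0, k=0.0):
    """Risk-neutral optimal fine F* = h/(p*q) + s/q + k.

    h: harm caused to society by the offence
    p: probability of detection or apprehension
    q: probability that a fine is imposed once the case is prosecuted
    s: cost of investigating and prosecuting
    k: cost of imposing the fine
    With q = 1 and s = k = 0 this reduces to the simple case F* = h/p.
    """
    if not (0 < p <= 1 and 0 < q <= 1):
        raise ValueError("p and q must be probabilities in (0, 1]")
    return h / (p * q) + s / q + k


# Hypothetical harm of 1,000 (in any currency unit).
fine_high_p = optimal_fine(h=1000.0, p=0.5)    # frequent detection, low fine
fine_low_p = optimal_fine(h=1000.0, p=0.01)    # rare detection, high fine
fine_with_costs = optimal_fine(h=1000.0, p=0.1, q=0.5, s=200.0, k=50.0)
```

In the simple case both combinations give the same expected penalty, since p times the fine equals h in each. Adding the hypothetical enforcement costs raises the fine, as the text states.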

    However, if p were slightly reduced from the level that generates the equality above, then no first order social costs would ensue, as the marginal drunk drivers induced to drive because of the change generate only slightly higher social costs than benefits. The advantage of reducing p, however, is that enforcement costs are saved. Thus, with costly enforcement, some under-deterrence is optimal (i.e. in the simple version $p F^*_{RN} < h$). How much p should be lowered depends on the balance between the savings in enforcement costs and the costs of under-deterrence.


    3 – How Alcohol Affects Driver Risk – a Review of the Literature

    The effects of alcohol on drivers can be usefully divided into three main categories: survivability, performance, and behaviour15. Survivability refers to the fact that vehicle occupants with positive BAC are more likely to die from the same physical impact than occupants with zero BAC.16 As this only affects drinkers, it does not constitute an externality. The second effect – performance – refers to the functioning of driving-relevant skills under the influence of alcohol. There is little room for doubt, after hundreds of laboratory experiments, that alcohol reduces driver performance in terms of coordination, reaction time, spatial orientation and other relevant skills, and that it does so to an increasing extent as BAC rises17.

    The third category of effect produced by alcohol is that of a detrimental effect on driver

    behaviour. It appears that drinking, by reducing social inhibitions, also encourages more

    aggressive and riskier driving. It is an empirical regularity that alcohol is present in an increasing

    fraction of drivers as crash severity increases. This suggests that it contributes most to severe crashes.

    In the Chilean 2000-2004 data (which significantly under-reports alcohol involvement as will be

    shown later) this relationship can be observed in the following table:

    % of crashes and collisions involving at least 1 drinking driver

    Crashes and collisions involving:         % with at least 1 drinking driver
    all outcomes: injuries & no injuries       6.7%
    any kind of injury                         9.0%
    medium severity injuries or worse         11.9%
    serious and fatal injuries only           13.4%
    fatal injuries only                       18.1%

    Source: Carabineros data 2000-Sept 2004

    As noted by Evans, this relationship suggests that alcohol's most salient effect is in changing driver behaviour towards taking greater risks and driving at higher speeds. If the performance effect were most important, then drinking drivers' increased driver error would be present at all crash severities, and alcohol prevalence would not increase with crash severity. “It appears that drivers do things when they are drunk that they would not attempt when sober, rather than merely executing poorly the same things they would do more skilfully when sober.”18 Thus behavioural effects, although neglected by the literature, appear to be a key factor behind the observed BAC–crash relationship in accident data19.

    15 I owe this structure to Evans, op.cit.

    16 According to Evans (ibid), a vehicle occupant with a 0.08 BAC is 73% more likely to die from the same crash than one with zero BAC.

    17 Moskowitz et al. (2000) have demonstrated that alcohol significantly affects some driving skills for some subjects at BACs as low as 0.02%.

    Is the alcohol-crash association due to other factors correlated with drinking?

    The possibility that alcohol is not the causal agent behind actual (as opposed to laboratory

    simulated) crashes has yet to be fully discarded. It is possible that a high level of relative risk for

    drinking relative to non-drinking drivers could result from an association between drinking

    behaviour and other dangerous driving characteristics. That is, dangerous drivers tend to drink,

    but drinking itself is not what causes increased crash risk. While implausible in the light of

    laboratory and behavioural evidence cited above, such a possibility must be discarded if we are to

    have confidence in the analysis undertaken by this paper, given that additional driver

    characteristics (beyond age, sex and licence type) are not collected in Chile, and thus cannot be

    controlled for. To this end the epidemiological literature is briefly reviewed to make manifest the

    causal effect of alcohol on crash risk in real driving situations.

    The Three Main Methods of Determining the Contribution of Alcohol to Crash Risk

    A – The Case-Control Method

    Three main methods exist for determining alcohol crash risk. The first is the case-control method20, in which the BAC levels of drivers involved in traffic crashes are compared with those of a control group of drivers matching the accident drivers as closely as possible. From a comparison of these groups – and controlling for possibly confounding covariates – a relative risk (RR) curve is estimated:

    [Figure: Compton et al. (2002) covariate-adjusted relative risk curve. Horizontal axis: BAC, 0.0 to 1.3; vertical axis: RR of crash involvement, 0 to 13.]

    18 Evans, op cit.

    19 Evans notes that much work remains to be done regarding the effect of alcohol on speed choice, citing an Australian study that found that drivers with BACs over 0.05% were driving faster when apprehended.

    20 A more detailed review of this method is provided in Appendix 2.

    If US drivers have a similar RR to Chilean drivers, then at the Chilean DUI limit of 0.5 g/l (BAC 0.05%) the RR is 1.38, while drivers are 4.79 times more likely to cause a crash (of any severity) at the DWI limit of 1.0 g/l (BAC 0.1%)21.
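The core of the case-control comparison can be sketched as a crude odds ratio. This is a simplified illustration rather than the covariate-adjusted procedure used by Compton et al.: the counts are invented, the function name `odds_ratio` is hypothetical, and the odds ratio only approximates the relative risk when crashes are rare events.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Crude (unadjusted) odds ratio from a 2x2 case-control table.

    cases:    crash-involved drivers
    controls: comparable drivers sampled at the same places and times
    exposed:  drivers with a positive BAC in the band of interest
    """
    case_odds = cases_exposed / cases_unexposed
    control_odds = controls_exposed / controls_unexposed
    return case_odds / control_odds


# Invented counts: 60 of 300 crash drivers and 30 of 600 control drivers
# fall in the BAC band being studied.
crude_or = odds_ratio(cases_exposed=60, cases_unexposed=240,
                      controls_exposed=30, controls_unexposed=570)
```

Repeating the calculation for successive BAC bands, each compared against zero-BAC drivers, would trace out an unadjusted version of a relative risk curve.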

    B – Combining Accident data with Roadside Surveys

    A similar but more general methodology is that which uses national accident data (such as the

    Carabineros data used in this study, or the US FARS data) and a roadside survey to provide

    exposure data (e.g. Zador 1991). The key differences between the case control and the roadside

    survey-FARS methodologies are in the selection of control drivers and their 'representativeness'.

    Clearly the case control studies have a more reliable measure of exposure, as the drivers in their

    control groups are driving in exactly the same place as the accident drivers. However, in their

    precision lies their weakness: they may simply (albeit accurately) reflect local driving. Moreover,

    using national data rather than data from a single locality may be more reliable in that it can work

    with degrees of freedom orders of magnitude above those available in case control studies.

    The most recent of such studies in the US is Zador et al (2000), which uses FARS data and the 1996 National Roadside Survey. Their results are somewhat above those of the case control studies, and are somewhat suspect as a result: the major case control studies have obtained relative risk curves similar to one another, suggesting that the RR curves they generate are more worthy of confidence.

    21 We cannot be sure that they are: a relative risk curve compares sober to drunk drivers. Chilean drivers may be very different - both when sober and when drunk, and to different degrees – to US drivers.

    C – The Levitt-Porter Methodology

    The third and final methodology is that used by Levitt and Porter, in which no separate set of

    control data is used. Instead, the proportion of drivers in each group is estimated once the RR (θ)

    has been estimated. The identification strategy is discussed more closely in the methodology section.

    Levitt and Porter note that the case-control studies fit well with their results given reasonable

    distributions of BACs across drinking drivers.

    This methodology is unable to separate RR by fine gradations of BAC because of degrees of freedom restrictions in smaller countries such as Chile, and because the BAC level is often not exactly recorded. This methodology has not, to my knowledge22, been applied elsewhere. Hence only their results for the US, as cited above, are available for direct comparison. Moreover, no estimates for Chile are available, either of the externality, or of the relative risk of drinking as opposed to sober drivers – or of any other RR curve.

    22 No relevant papers cite the Levitt and Porter paper according to IDEAS.

    4 - Methodology

    The Levitt-Porter model's first virtue is the minimal data required for its estimation. All that is

    required is fatal crash data such as the date, time, type of accident, alcohol-involvement (i.e.

    whether drivers had been drinking), and the number of deaths and injuries. At first, inferring both

    the relative risk θ and N (the share of drinking drivers on the road) from this data alone appears impossible. As the authors note:

    “Separately identifying the fraction of drinking drivers on the road and their relative risk of a

    fatal crash using only the fraction of drinking drivers in fatal crashes is ostensibly equivalent…to

    separately identifying per capita income and population on the basis of only aggregate income

    data.”23

    They achieve the apparently impossible by recognizing that for 2 car crashes (hereafter called

    collisions) the relative frequency of accidents involving 2 sober, 2 drinking or 1-drinking-and-1-

    sober drivers contains enough information to estimate θ. The idea is that the number of fatal

    collision opportunities is given by the trinomial distribution – equivalent to randomly drawing

    coloured balls from a bag. If drinking does not increase RR then the crash data will closely mimic

    the trinomial distribution of crash opportunities. If the data for actual crashes differs from that

    given by crash opportunities (trinomial distribution) we are able to identify how much more

    dangerous drinking drivers are (θ) as N can be eliminated from the model. Once we have θ, then

    we are able to obtain N.
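The identification idea can be sketched in a few lines. The code below is an illustration and not the estimation procedure used in this study: it assumes that, under the equal-mixing and single-error assumptions set out in this section, the probability of a fatal two-car crash between types i and j is proportional to N_i N_j (θ_i + θ_j), and it normalizes the sober drivers' error probability to 1 so that θ is the relative risk. All function names and the input values are hypothetical.

```python
import math


def expected_shares(n_d, theta):
    """Expected shares of two-car fatal crashes by pair type (DD, DS, SS).

    n_d:   share of drinking drivers on the road (so sober share is 1 - n_d)
    theta: relative risk of a drinking driver (sober risk normalized to 1)
    """
    n_s = 1.0 - n_d
    w_dd = n_d * n_d * (theta + theta)        # two drinking drivers
    w_ds = 2 * n_d * n_s * (theta + 1.0)      # mixed pair, either ordering
    w_ss = n_s * n_s * (1.0 + 1.0)            # two sober drivers
    total = w_dd + w_ds + w_ss
    return w_dd / total, w_ds / total, w_ss / total


def theta_from_counts(a_dd, a_ds, a_ss):
    """Recover theta from crash counts (or shares) of the three pair types.

    Under the assumptions above, a_ds**2 / (a_dd * a_ss) = (theta + 1)**2 / theta,
    which does not depend on the number of drinking drivers.
    """
    r = a_ds ** 2 / (a_dd * a_ss)
    if r < 4.0 - 1e-9:
        raise ValueError("ratio below 4 is inconsistent with the model")
    b = r - 2.0
    return (b + math.sqrt(max(b * b - 4.0, 0.0))) / 2.0  # root with theta >= 1


def drinking_share(a_dd, a_ss, theta):
    """Recover the share of drinking drivers once theta is known."""
    ratio = math.sqrt(a_dd / (theta * a_ss))  # N_D / N_S
    return ratio / (1.0 + ratio)


shares = expected_shares(n_d=0.10, theta=7.0)           # hypothetical inputs
theta_hat = theta_from_counts(*shares)                   # recovers theta
share_hat = drinking_share(shares[0], shares[2], theta_hat)
```

If drinking conferred no extra risk (θ = 1) the ratio would equal 4, the value implied by random mixing alone. Any excess over 4 is attributed to θ, and the share of drinking drivers then follows from the ratio of drinking-drinking to sober-sober crashes.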

    The basic assumptions of the Levitt-Porter (2001) model24

    Notation:

    $N_i$ = the total number of drivers of type i

    $I$ = an indicator variable equal to 1 if two cars interact25

    $A$ = an indicator variable equal to 1 if two cars collide, resulting in a fatal crash

    $P(i, j \mid I=1)$ = probability that the drivers are of type i and j, given that an interaction takes place.

    23 Levitt and Porter (2001), p1199-1200

    24 This section follows the Levitt and Porter 2001 paper very closely, for obvious reasons. A few sentences are taken verbatim and are not explicitly quoted to avoid uninteresting footnotes.

    25 An interaction is a 2 car crash opportunity: 2 cars pass on the street. Given an interaction, a single driver error can cause a collision.

    Assumptions

    1. There are 2 driver types: D and S

        a. This is easily generalized to more types

        b. Thus $N_D + N_S = N_{Total}$

    2. There is equal mixing of D and S drivers on the roads, i.e.

        a. The number of interactions a driver has with other cars is independent of the driver's type:

           $$P(i \mid I=1) = \frac{N_i}{N_D + N_S}$$

        b. A driver's type does not affect the composition of the driver types with which he interacts:

           $$P(i, j \mid I=1) = P(i \mid I=1)\,P(j \mid I=1)$$

    3. A fatal crash results from a single driver's error

    4. The composition of driver types in a crash is independent of the composition of driver types in other crashes

    5. A drinking driver is at least slightly more likely to make an error resulting in a crash than a sober driver, i.e. $\theta_D > \theta_S$

    The assumption doing the most work is assumption 2: equal mixing of sober and drunk drivers.

    Over a small enough area and time period, it is reasonable. Over an entire country and year it

    becomes less so. Assumption 2 gives the joint distribution for a pair of driver types, conditional

    on an interaction between two drivers:

    $$P(i,j \mid I=1) = \frac{N_i N_j}{(N_D + N_S)^2} \tag{1}$$

    Assumption 3 implies that the likelihood of a fatal crash is the sum of the probabilities that either

    driver makes a fatal error, minus the probability that both drivers make a mistake. The latter

    probability is extremely small, and is ignored:

    $$P(A=1 \mid I=1, i, j) = \theta_i + \theta_j - \theta_i\theta_j \approx \theta_i + \theta_j \tag{2}$$

    Developing the model

    Multiplying equations (1) and (2), we obtain the joint probability of driver types and a fatal crash, conditional on an interaction between two drivers:

    $$P(i,j,A=1 \mid I=1) = \frac{N_i N_j(\theta_i + \theta_j)}{(N_D + N_S)^2} \tag{3}$$


    In words, given that two random drivers interact, the probability that a fatal crash occurs and that

    the drivers involved are of the specified types, is simply equal to the likelihood that two drivers

    passing on the road are of the specified types multiplied by the probability that a fatal crash

    occurs when these drivers interact.

    The key relationship we seek is the probability of driver types conditional on a fatal accident

    occurring, rather than conditional on an interaction taking place. That value can be obtained from

    equation (3) through an application of Bayes' Theorem (dropping the I=1 condition):

    From the definition of conditional probability, $P(i,j,A=1) = P(A=1)\,P(i,j \mid A=1)$, and we want to isolate the final term. To this end note that $P(A=1) = \sum_{i,j} P(A=1 \mid i,j)\,P(i,j)$, i.e. P(A=1) is simply equation (3) summed over all possible combinations of i and j.

    Thus,

    $$P(i,j \mid A=1) = \frac{P(A=1,i,j)}{\sum_{i,j} P(A=1 \mid i,j)\,P(i,j)}$$

    and we obtain:

    $$P(i,j \mid A=1) = \frac{N_i N_j(\theta_i + \theta_j)}{2\left[\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2\right]} \tag{4}$$

    Let Pij represent the probability that the drivers are of type i and j given that a fatal crash occurs.

    We can explicitly state the values of Pij by simply substituting for i and j in equation (4):

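    $$P_{DD} = P(i=D, j=D \mid A=1) = \frac{\theta_D (N_D)^2}{\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2} \tag{5}$$

    $$P_{DS} = P(i=D, j=S \mid A=1) + P(i=S, j=D \mid A=1) = \frac{(\theta_D + \theta_S) N_D N_S}{\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2} \tag{6}$$

    $$P_{SS} = P(i=S, j=S \mid A=1) = \frac{\theta_S (N_S)^2}{\theta_D (N_D)^2 + (\theta_D + \theta_S) N_D N_S + \theta_S (N_S)^2} \tag{7}$$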

    Note that the ordering of the driver types does not matter. Thus, in equation (6) the probability of

    a mixed drinking-sober crash is the sum of the probability that i is sober and j is drinking plus the

    probability that j is sober and i is drinking.


    Examination of equations (5)-(7) reveals that there are only three equations, but four unknown parameters ($\theta_D$, $\theta_S$, $N_D$, $N_S$). As a result, all four parameters cannot be separately identified - only the ratios are identifiable. Therefore, let $\theta = \theta_D/\theta_S$ and $N = N_D/N_S$. θ is the relative likelihood that a drinking driver will cause a fatal two-car crash compared to a sober driver, and N is the ratio of drinking to sober drivers on the road at a particular place and time. Dividing both numerator and denominator of equations (5)-(7) by $\theta_S (N_S)^2$ expresses them in terms of θ and N as follows:

    $$P_{DD}(\theta, N \mid A) = \frac{\theta N^2}{\theta N^2 + (\theta + 1)N + 1} \tag{8}$$

    $$P_{DS}(\theta, N \mid A) = \frac{(\theta + 1)N}{\theta N^2 + (\theta + 1)N + 1} \tag{9}$$

    $$P_{SS}(\theta, N \mid A) = \frac{1}{\theta N^2 + (\theta + 1)N + 1} \tag{10}$$
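As an illustrative sketch (not part of the original thesis), equations (8)-(10) can be evaluated directly for any θ and N, taking N as the ratio of drinking to sober drivers. The function name `crash_type_probs` and the example values θ = 7 and N = 0.1 are assumptions chosen purely for illustration.

```python
def crash_type_probs(theta, n):
    """Return (P_DD, P_DS, P_SS) as given by equations (8)-(10).

    theta: relative risk of a drinking driver (theta_D / theta_S)
    n:     ratio of drinking to sober drivers (N_D / N_S)
    """
    denom = theta * n**2 + (theta + 1) * n + 1
    p_dd = theta * n**2 / denom
    p_ds = (theta + 1) * n / denom
    p_ss = 1 / denom
    return p_dd, p_ds, p_ss


# Hypothetical values, for illustration only.
p_dd, p_ds, p_ss = crash_type_probs(theta=7.0, n=0.1)
print(p_dd, p_ds, p_ss)
```

The three probabilities sum to one by construction, and the ratio `p_ds**2 / (p_dd * p_ss)` depends on θ alone, which is the property the identification strategy relies on.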

    The next step is to derive the likelihood function.

    Aij is defined as the number of fatal crashes involving one type j and one type i driver. Given

    assumption 4 (independence across fatal collisions) and the total number of fatal crashes the joint

    distribution of driver types is given by the trinomial distribution:

    $$P(A_{DD}, A_{DS}, A_{SS} \mid A_{TOTAL}) = \frac{(A_{DD} + A_{DS} + A_{SS})!}{A_{DD}!\,A_{DS}!\,A_{SS}!}\,(P_{DD})^{A_{DD}} (P_{DS})^{A_{DS}} (P_{SS})^{A_{SS}} \tag{11}$$

    Substituting PSS, PDD and PDS into the equation above produces the likelihood function, and the

    equation is estimated by maximum likelihood, with the following fairly self-evident result:

    $$\hat{P}_{DD} = \frac{A_{DD}}{A_{TOTAL}}; \quad \hat{P}_{DS} = \frac{A_{DS}}{A_{TOTAL}}; \quad \hat{P}_{SS} = \frac{A_{SS}}{A_{TOTAL}}$$

    Levitt and Porter then take advantage of the fact that in the binomial distribution $(A_{DS})^2$ is in fixed proportion to the product of $A_{DD}$ and $A_{SS}$ to eliminate N from the equation:

    $$\frac{(A_{DS})^2}{A_{DD} A_{SS}} = \frac{(P_{DS})^2}{P_{DD} P_{SS}} = \frac{(\theta + 1)^2 N^2}{\theta N^2} = \theta + 2 + \frac{1}{\theta} \tag{12}$$


    Thus we can estimate θ from the observed distribution of crashes ONLY. Defining $R = \frac{(A_{DS})^2}{A_{DD} A_{SS}}$ and multiplying by θ, we obtain a quadratic equation:

    $$\theta^2 + (2 - R)\theta + 1 = 0$$

    If R = 4 then θ = 1, that is, the observed distribution of collisions matches the distribution of fatal crash opportunities: drinking does not affect driving. If R < 4 the quadratic has no real solution. If R > 4 then 2 solutions always exist, one with θ < 1 and one with θ > 1. By assumption 5 the former is discarded, and we have an estimate of θ.
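A minimal sketch of this step, assuming the quadratic takes the form θ² + (2 − R)θ + 1 = 0 with R = A_DS² / (A_DD · A_SS). The function name `solve_theta` and the crash counts are hypothetical and are not taken from the thesis data.

```python
import math


def solve_theta(a_dd, a_ds, a_ss):
    """Estimate theta from two-car crash counts.

    Solves theta**2 + (2 - R) * theta + 1 = 0, where
    R = A_DS**2 / (A_DD * A_SS), and keeps the root with theta >= 1
    (assumption 5). Returns None when R < 4, where no real root exists.
    """
    r = a_ds**2 / (a_dd * a_ss)
    disc = (r - 2) ** 2 - 4
    if disc < 0:
        return None
    return ((r - 2) + math.sqrt(disc)) / 2


# Hypothetical crash counts, for illustration only.
theta_hat = solve_theta(a_dd=20, a_ds=160, a_ss=140)
print(theta_hat)
```

Because the two roots of the quadratic are reciprocals of each other, keeping the larger one is equivalent to discarding the solution with θ < 1.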

    Estimating N

    One car crashes are incorporated into the model, but their identification depends on having first estimated N (the ratio of drinking to sober drivers) from collisions. N is estimated from the following FOC of the likelihood function, substituting the $\hat{\theta}$ estimated above for θ:

    $$\hat{N} = \frac{1}{\hat{\theta}} \cdot \frac{A_{DS}\left(\frac{\hat{\theta}}{\hat{\theta} + 1}\right) + A_{DD}}{A_{DS}\left(\frac{1}{\hat{\theta} + 1}\right) + A_{SS}} \tag{13}$$

    Standard errors are derived using the delta method, as described in Appendix 3.
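The estimate of N can be sketched as follows. This assumes equation (13) has the form N = (1/θ) · [A_DD + A_DS · θ/(θ+1)] / [A_SS + A_DS/(θ+1)], which is the form implied by equations (8)-(10): mixed crashes are attributed to the drinking driver with probability θ/(θ+1). The function name and the counts are hypothetical.

```python
def estimate_n(theta, a_dd, a_ds, a_ss):
    """Estimate N (ratio of drinking to sober drivers) given theta.

    Assumed form of equation (13): crashes attributed to drinking drivers,
    divided by crashes attributed to sober drivers, divided by theta.
    """
    caused_by_drinking = a_dd + a_ds * theta / (theta + 1)
    caused_by_sober = a_ss + a_ds / (theta + 1)
    return caused_by_drinking / (theta * caused_by_sober)


# Hypothetical crash counts, for illustration only.
n_hat = estimate_n(theta=7.0, a_dd=20, a_ds=160, a_ss=140)
print(n_hat)
```

With these inputs the estimate reproduces the identity P_DD / P_SS = θN² from equations (8) and (10), which is a quick consistency check on any implementation.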

    Estimating the RR of one car crashes, λ

    Lambda (λ), or the relative risk of one car crashes for drunk drivers (analogous to θ), is estimated in similar fashion. Let $Q_D$ and $Q_S$ denote the probabilities that a drunk or a sober driver is involved in a given one car crash:

    $$Q_j = P(i=j \mid Crash=1) = \frac{\lambda_j N_j}{\lambda_D N_D + \lambda_S N_S}; \quad \text{with } j = D, S \tag{14}$$

    We can define λ as $\lambda_D/\lambda_S$, and equation (14) for both j = D and S can be combined to give $\lambda N = \frac{Q_D}{Q_S}$, while both $Q_D$ and $Q_S$ are obtained from the accident data. As can be observed, lambda can be estimated only by using the estimator of N.
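A short sketch of the one-car step, assuming the combined relation is λN = Q_D / Q_S with N the ratio of drinking to sober drivers. The function name and the input shares are hypothetical.

```python
def estimate_lambda(q_d, q_s, n_hat):
    """Relative risk of one-car crashes: lambda = Q_D / (Q_S * N)."""
    return q_d / (q_s * n_hat)


# Hypothetical shares of one-car crashes and a hypothetical N estimate.
lambda_hat = estimate_lambda(q_d=0.4, q_s=0.6, n_hat=0.1)
print(lambda_hat)
```

The sketch makes the dependence explicit: any error in the estimate of N passes through to λ in inverse proportion.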


    Violations of the assumptions

    The next step is to consider violations of the assumptions. The key point is that violations of the

    assumptions generate downward biases for θ, making it a reliable lower bound. Possible

    violations are discussed in more detail in Appendix 4, but violations of Assumption 2 are

    sufficiently important to merit consideration here.

    Relaxing the Equal Mixing assumption

    The model requires an equal mixing assumption (Assumption 2), which holds that over a given

    geographical and temporal area (for example on weekend nights in Santiago) drivers of both

    types are homogeneously distributed. This assumption, as you may recall, has two parts:

    a. The number of interactions a driver has with other cars is independent of driver type:

    $$P(i \mid I=1) = \frac{N_i}{N_D + N_S} \qquad \text{A2(a)}$$

    b. A driver's type does not affect the composition of the driver types with which he interacts:

    $$P(i,j \mid I=1) = P(i \mid I=1)\,P(j \mid I=1) \qquad \text{A2(b)}$$

    Combining the two results in equation (1), as noted earlier:

    $$P(i,j \mid I=1) = \frac{N_i N_j}{(N_D + N_S)^2}$$

    A2 is a demanding assumption: we must consider only very small space-time areas to be sure of

    its holding, as driving conditions change from hour to hour, and between neighbourhoods: the

    Suecia district on a Saturday night between 4 and 5 am is very different to the Alameda area on a

    weekday at 9 pm – it is likely that Suecia would have a far higher share of drunk drivers, as

    would a night time period as opposed to a daytime period, for example. As Borkenstein et al note:

    “within a fairly short period of time (not exceeding one hour) the driving conditions or exposure

    for a driver using a specific “block” tended to remain relatively constant.”26

    Thus ideally the

    model should be estimated for 50-60 minute periods, over a 5-10 city block area – something that

    real world data simply cannot make possible.

    26 Borkenstein et al. (1974), p. 22.


    Thus we can be relatively sure that A2 will not hold perfectly in the data. That is, drunk drivers

    will be somewhat concentrated in certain space-time regions (near bars and at night), and the

    same will hold for sober drivers in other areas. This results in a lower number of drunk-sober

    interactions (crash opportunities) than predicted by the model under the equal mixing assumption,

    and a higher number of same type (DD and SS) interactions, exerting a downward bias on our

    estimate of θ (R falls) and an upward bias on our estimated N (as more DD crashes occur, and θ

    is lower, N must rise). As the units of temporal and spatial analysis shrink, violations of this

    assumption will be less severe, and the downward bias it exerts on estimates of θ should fall.

    This prediction is borne out by Levitt and Porter's results. Applying their model to US fatal

    accident data assuming equal mixing over the whole US for their entire sample (1983-93, 8pm-

    5am) they estimate θ = 3.79 (s.d. = 0.14). As they reduce the space-time areas over which they

    assume equal mixing – weakening the bias caused by violation of A2 – their estimated θ rises to

    7.51.
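The direction of this bias can be illustrated with a small numerical sketch. It assumes two areas that each satisfy equal mixing with the same true θ but different shares of drinking drivers, and then estimates θ from the pooled counts as if equal mixing held over both areas together. All names and values are hypothetical.

```python
import math


def expected_counts(theta, n, crashes):
    """Expected (A_DD, A_DS, A_SS) under equal mixing, per equations (8)-(10)."""
    denom = theta * n**2 + (theta + 1) * n + 1
    return (crashes * theta * n**2 / denom,
            crashes * (theta + 1) * n / denom,
            crashes / denom)


def theta_from_counts(a_dd, a_ds, a_ss):
    """Larger root of theta**2 + (2 - R) * theta + 1 = 0."""
    r = a_ds**2 / (a_dd * a_ss)
    return ((r - 2) + math.sqrt((r - 2) ** 2 - 4)) / 2


TRUE_THETA = 7.0  # hypothetical true relative risk
area_a = expected_counts(TRUE_THETA, n=0.2, crashes=1000)  # more drinking drivers
area_b = expected_counts(TRUE_THETA, n=0.1, crashes=1000)  # fewer drinking drivers
pooled = [a + b for a, b in zip(area_a, area_b)]

theta_a = theta_from_counts(*area_a)
theta_pooled = theta_from_counts(*pooled)
print(theta_a, theta_pooled)
```

Estimated within a single area, θ is recovered; estimated on the pooled counts, it comes out below the true value, because pooling produces fewer mixed crashes relative to same-type crashes than equal mixing over the combined area would predict.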

    Levitt and Porter suggest amending the model to allow an increased probability of same-type interaction, while still maintaining the reasonable assumption that the number of interactions a driver type has is proportional to the percentage it makes up in the overall driver population, i.e. that $P(i \mid I=1) = \frac{N_i}{N_D + N_S}$ should still hold.27 This does not wholly eliminate the problem posed by the violation of the equal mixing assumption, but it does allow a reduction in the downward bias it produces.

    They do not explicitly solve the amended model. They simply note that for Δ = 0.1 (a 10%

    increase in DD interactions), their estimates of both one and two car relative risks rise by

    approximately 25%.

    27 This is a reasonable requirement, as otherwise we would be requiring that either drunk or sober drivers have proportionately more interactions than the other type, i.e. that one type passes more cars than the other by driving in more congested areas or simply by driving longer distances. Such a requirement would necessarily be entirely arbitrary.


    Developing the Model with Unequal Mixing

    Define the parameter representing the increased probability of a Drinking-Drinking (DD) interaction as Δ. Thus equation (1) changes from:

    $$P(D,D \mid I=1) = \frac{(N_D)^2}{(N_D + N_S)^2} = \frac{(N_D)^2}{(N_{Total})^2}$$

    to:

    $$P(D,D \mid I=1) = \frac{(1 + \Delta)(N_D)^2}{(N_{Total})^2} \tag{15}$$

    Similarly, for Sober-Sober (SS) and Drunk-Sober (DS) interactions:

    $$P(S,S \mid I=1) = \frac{(1 + x)(N_S)^2}{(N_{Total})^2} \tag{16}$$

    $$P(D,S \mid I=1) = \frac{(1 + z) N_D N_S}{(N_{Total})^2} \tag{17}$$

    The amended model has 3 additional parameters (Δ, x and z), as described by equations (15-17).

    x and Δ should be positive, reflecting the fact that the clustering of same-type drivers leads to

    more same-type interactions and z should be negative. To solve for these in terms of N, θ and Δ

    we must impose the condition mentioned earlier regarding P(i | I=1). This term reflects the probability that an interaction involves a driver of type i and can be expressed as $P(i \mid I=1) = \sum_j P(i,j \mid I=1)$. Thus:

    $$P(D \mid I=1) = P(D,S \mid I=1) + P(D,D \mid I=1) \tag{18}$$

    $$P(S \mid I=1) = P(S,D \mid I=1) + P(S,S \mid I=1) \tag{19}$$

    Substituting (15), (16) and (17) in (18) and (19) along with A2(a) yields:

    $$x = \Delta N^2$$

    $$z = -\Delta N$$
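As a hedged numerical check, the sketch below assumes the solution is x = ΔN² and z = −ΔN, with N = N_D / N_S, and confirms that the amended interaction probabilities still reproduce the marginal shares required by A2(a). The function name and the driver counts are hypothetical.

```python
def amended_interaction_probs(n_d, n_s, delta):
    """Interaction probabilities under unequal mixing, per equations (15)-(17).

    Assumes x = delta * N**2 and z = -delta * N with N = n_d / n_s.
    Returns (P(D,D), P(D,S), P(S,S)), where P(D,S) is for one ordering only.
    """
    total = n_d + n_s
    n = n_d / n_s
    x = delta * n**2
    z = -delta * n
    p_dd = (1 + delta) * n_d**2 / total**2
    p_ds = (1 + z) * n_d * n_s / total**2
    p_ss = (1 + x) * n_s**2 / total**2
    return p_dd, p_ds, p_ss


# Hypothetical population: 100 drinking and 900 sober drivers, delta = 0.1.
p_dd, p_ds, p_ss = amended_interaction_probs(100, 900, 0.1)
print(p_dd + p_ds, p_ss + p_ds)  # marginal shares of D and S drivers
```

With Δ positive, same-type interactions become more likely and mixed interactions less likely, while each type's overall share of interactions is unchanged.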

    Solving the model with these modifications produces an equation for R in which N does not

    cancel out, unlike the standard model:

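    $$R = \frac{(\theta + 1)^2 (1 + z)^2}{\theta (1 + \Delta)(1 + x)} = \frac{(\theta + 1)^2 (1 - \Delta N)^2}{\theta (1 + \Delta)(1 + \Delta N^2)} \tag{20}$$

    Define $B = \frac{(1 - \Delta N)^2}{(1 + \Delta)(1 + \Delta N^2)}$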


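    Then:

    $$R = \frac{(\theta + 1)^2 (1 - \Delta N)^2}{\theta (1 + \Delta)(1 + \Delta N^2)} = \frac{(\theta + 1)^2}{\theta} B$$

    $$B\theta^2 + (2B - R)\theta + B = 0$$

    Recalling that the value of R is obtained from the dataset:

    $$\theta = \frac{(R - 2B) \pm \sqrt{(2B - R)^2 - 4B^2}}{2B} \tag{21}$$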

    Thus θ cannot be estimated without a value for N and Δ, as these are components of B.

    Identification of delta from the model is impossible, and a value for this parameter must be

    assumed. However, N was estimated using the following iterative procedure.

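    First, equation (20) was solved for an arbitrary value of N (footnote 28) and a given delta, obtaining an estimate of θ ($\hat{\theta}_i$). Then, using $\hat{\theta}_i$, N was estimated using equation (13) (unchanged by the amendments to the model):

    $$\hat{N}_i = \frac{1}{\hat{\theta}_i} \cdot \frac{A_{DS}\left(\frac{\hat{\theta}_i}{\hat{\theta}_i + 1}\right) + A_{DD}}{A_{DS}\left(\frac{1}{\hat{\theta}_i + 1}\right) + A_{SS}} \tag{13}$$

    This estimate $\hat{N}_i$ was then used to re-estimate θ using equation (20), resulting in $\hat{\theta}_{i+1}$, and the whole procedure was repeated until $\hat{\theta}_i$ converged, with this final $\hat{\theta}_i$ used to obtain a final $\hat{N}_i$.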
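The iterative procedure can be sketched as below. This is an illustration under stated assumptions rather than the author's implementation: it takes B = (1 − ΔN)² / ((1 + Δ)(1 + ΔN²)), uses the larger root of Bθ² + (2B − R)θ + B = 0, and assumes equation (13) has the form N = (1/θ) · [A_DD + A_DS · θ/(θ+1)] / [A_SS + A_DS/(θ+1)]. The function name, starting value, tolerance and crash counts are all hypothetical.

```python
import math


def estimate_with_unequal_mixing(a_dd, a_ds, a_ss, delta,
                                 n_start=0.5, tol=1e-12, max_iter=200):
    """Alternate between solving for theta given N and for N given theta."""
    r = a_ds**2 / (a_dd * a_ss)
    n = n_start
    theta = None
    for _ in range(max_iter):
        # Step 1: theta from the amended quadratic, given the current N.
        b = (1 - delta * n) ** 2 / ((1 + delta) * (1 + delta * n**2))
        disc = (2 * b - r) ** 2 - 4 * b**2
        if disc < 0:
            raise ValueError("no real solution for theta at this N and delta")
        new_theta = ((r - 2 * b) + math.sqrt(disc)) / (2 * b)
        # Step 2: N from the assumed form of equation (13), given the new theta.
        n = (a_dd + a_ds * new_theta / (new_theta + 1)) / (
            new_theta * (a_ss + a_ds / (new_theta + 1)))
        converged = theta is not None and abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta, n


# Hypothetical crash counts and an assumed delta of 0.1.
theta_hat, n_hat = estimate_with_unequal_mixing(220, 1176, 1004, delta=0.1)
print(theta_hat, n_hat)
```

Setting `delta=0` makes B equal to one, so the procedure collapses to the standard model in a single pass.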

    28 The initial value of N makes no difference to the final estimates of N and θ at any relevant level of accuracy.


    5 - The Data

    Two complementary datasets are used in this paper. The first is the Carabineros de Chile dataset on all reported traffic accidents in Chile from 1 January 2000 to 30th September 2004. Earlier data either do not exist or are of no use, as drinking status and rut identification numbers were not recorded.

    Restricting the Dataset

    Because our interest is in drunk-driving we limit the sample to the period in which drunk driving

    is most common: night-time driving. There are three reasons for this decision. The first is

    consistency with the literature: for this study to be comparable to the relative risk estimates that

    have gone before, the focus must be exclusively on night driving. All the studies – without

    exception – referred to in the review of the literature focus on night driving, including Levitt and

    Porter.29 Some – unlike this study – narrow the focus further, concentrating exclusively on

    weekend nights,30 when the concentration of drinking drivers reaches its peak. Nonetheless, a

    relative risk curve based solely on night driving raises the suspicion that such a curve might not

    be readily applicable to drunk driving in the daytime.

    This is a serious issue, and one that admits of no easy solution, as during the day the proportion of accident-involved drinking drivers falls dramatically, as it does on weekdays, making estimation difficult outside of these timetables:

    Occurs:                                   at night   in daytime   on a weekend   on a weekday
    % of accidents involving at least 1 DD    18.6%      2.7%         11%            3.2%

    Source: Carabineros data 2000-2004

    As can be seen from the table, during the day on a weekday there are proportionately very few

    drinking drivers. In fact, in the whole dataset Carabineros recorded only 3 Drinking-Drinking 2

    29 Levitt and Porter (2001) p 1213: “we limit our sample to those hours (8:00 p.m – 5:00 a.m.) in which drinking and driving is most common”

    30 See Lund & Wolfe (1991) and Zador (1991).


    car collisions over the 2000 - September 2004 period. Thus, the second reason for this restriction

    is that estimation is virtually impossible without it.

    The third and final reason for this restriction is that the night is essentially a distinct driving environment from the daytime, with a much lower traffic density, different traffic light settings, and a different accident distribution. This is reflected in the fact that only 14.7% of daytime accidents are serious, as opposed to 25% of night-time accidents.31

    To avoid arbitrarily choosing at exactly what time “the night” begins and ends, the UOCT

    (Unidad Operativa de Control de Tránsito) for the metropolitan region provided an average

    timetable for the night programming of traffic lights, which are based on detailed traffic flow

    studies. This is designed to capture the marked change in traffic conditions from night to day, and

    provides a useful reference point. Interestingly, the UOCT timetable closely resembles the

    timetable that maximizes the proportion of drinking drivers.

                                     Metropolitan Region   The rest of Chile

    UOCT definition

    Weekdays                         2300 - 0630           2130 - 0730

    Friday night - Saturday          2300 - 0900           2130 - 0900

    Saturday night - Sunday          2200 - 1000           2100 - 1000

    Sunday night - Monday            2100 - 0630           2100 - 0730

    Definition used in this study

    Weekdays                         2300 - 0630           2200 - 0700

    Friday night - Saturday          2300 - 0900           2200 - 0900

    Saturday night - Sunday          2300 - 1000           2200 - 1000

    Sunday night - Monday            2200 - 0630           2200 - 0700

    The definition used in this study is slightly different to both the UOCT timetable and the

    timetable that maximizes the proportion of drunk drivers in the accident data. The UOCT

    timetable has the virtue of capturing a large proportion of drivers, while the maximizing timetable

    ensures a high proportion of drinking drivers in the data. The definition used in this paper was

    chosen as a middle path between the two: it stays close to the UOCT timetable (capturing more

    31 The last two figures exclude non passenger vehicles and buses – the final restriction discussed.


    drivers than the maximizing timetable), while also ensuring a high proportion of drinking

    drivers.32 As can be observed from the table above, differences between the definition used and

    the UOCT timetable are slight.
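The night-time definition used in this study can be expressed as a small classification routine. The sketch below is illustrative only: it assumes that "Weekdays" covers the nights beginning Monday through Thursday, that window start times are inclusive and end times exclusive, and the names `MR_WINDOWS`, `REST_WINDOWS` and `is_night` are hypothetical.

```python
from datetime import datetime, time

# Night windows keyed by the weekday on which the night begins
# (Monday=0 ... Sunday=6). Assumption: "Weekdays" means Monday-Thursday nights.
MR_WINDOWS = {
    **{d: (time(23, 0), time(6, 30)) for d in range(4)},
    4: (time(23, 0), time(9, 0)),    # Friday night - Saturday
    5: (time(23, 0), time(10, 0)),   # Saturday night - Sunday
    6: (time(22, 0), time(6, 30)),   # Sunday night - Monday
}
REST_WINDOWS = {
    **{d: (time(22, 0), time(7, 0)) for d in range(4)},
    4: (time(22, 0), time(9, 0)),
    5: (time(22, 0), time(10, 0)),
    6: (time(22, 0), time(7, 0)),
}


def is_night(moment, metropolitan):
    """Return True if `moment` falls inside the study's night-time window."""
    windows = MR_WINDOWS if metropolitan else REST_WINDOWS
    today = moment.weekday()
    yesterday = (today - 1) % 7
    started_tonight = moment.time() >= windows[today][0]
    still_last_night = moment.time() < windows[yesterday][1]
    return started_tonight or still_last_night


# 8 March 2003 was a Saturday: 08:30 still belongs to Friday night's window.
print(is_night(datetime(2003, 3, 8, 8, 30), metropolitan=True))
```

Keying each window by the evening on which it begins keeps the handling of the midnight crossover in one place: a timestamp is checked against tonight's start and last night's end.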

    The final restriction on our dataset involves excluding trucks, buses, tractors, motorcycles and

    bicycles. In short, we examine only car crashes. The logic for this restriction is direct: any crash

    involving such a non-car road vehicle is extremely likely to be serious, due to the mass (either

    high or low) of the vehicles involved. Thus for these vehicles almost every crash is serious

    (exaggerating somewhat), making serious crashes much more numerous and much less related to

    poor driver performance such as that caused by alcohol. Moreover, drivers of such vehicles

    (excluding bi- and motor-cycles) represent a sub-group of the driver population with entirely

    distinct characteristics: they are likely to have more driving experience, drive far greater

    distances and for different purposes, and value their licences more highly – hence they are less

    likely to be drunk. As this study aims to find a RR for the general driver population, this sub-group represents a confounding influence and is removed.33

    An examination of the effects of these restrictions on the data is available in Appendix 5.

    The Alcohol Involvement variable

    The measure of alcohol involvement is classified in the following categories:

    Alcohol Involvement measure               thresholds

    Physically unimpaired                     BAC = 0

    Physically deficient driving condition    0 < BAC < 0.5

    Under the influence of alcohol (DUI)      0.5 ≤ BAC < 1.0

    Intoxicated (alcohol DWI)                 BAC ≥ 1.0

    Under the influence of drugs              -

    This table also shows the BAC levels that correspond to each category in theory. In practice, the relatively few breathalyzer units [220 in the whole country, 59 in MR] possessed by Carabineros de Chile are rarely on hand for an exact measure. Instead, the officer's assessment is used to gauge impairment – the same measure used by Levitt and Porter for their main results. It is likely that this measure is biased,

    32 While the definition used is a middle path between the two, using the UOCT timetable hardly changes the RR estimates, while the max DD timetable raises them slightly.

    33 Levitt & Porter and all other studies also eliminate such drivers, even eliminating taxis, which is not done in this study, although this makes little difference to our results.


    although the extent and direction of the bias is impossible to predict a priori: for example,

    Carabineros could classify all drivers with traces of alcohol on their breath as drunk, or

    systematically sub-report alcohol involvement as Lund and Wolfe (1991) report.

    A more accurate measure of alcohol involvement is obtained from the blood samples that are

    legally required for all drivers after every accident involving serious injuries. These are carried

    out by the Metropolitan Region's Servicio Médico Legal (SML) or State Coroner, and regional SMLs or ad hoc coroners in more remote areas. The Metropolitan Region (MR) SML covers approximately 35% of all accidents,34 and over 70% of random breath test samples. It is the MR SML's database of rut and BAC data covering the period 1st January 2000 to 30th December 2003 that is used in this paper.

    Combining Carabineros and SML data

    As the SML database contains only the date the sample was received, BAC and rut, and includes

    many random breath test (i.e. not from an accident) samples, combining it with the Carabineros

    database is essential: it ties the BAC measures to specific accidents and drivers. However, for a

    successful match to occur many not inconsequential hurdles must be surmounted. Firstly,

    individual rut data must be correctly collected and correctly entered into the database. For

    Carabineros and the SML each of these tasks is a formidable obstacle.35 For example, in the case

    of data entry many illegible numbers result in 8, 7 or even 3 digit rut identification numbers in the

    database instead of the required 9 digits. Furthermore, poor transportation ensures that many

    blood samples are broken in transit, resulting in congealed samples and hence missing BAC

    measures.

    In addition to unusable or missing data, the date of the accident and the date the BAC sample was

    received by the SML are rarely exact matches. This is because the date the sample was received

    or taken is often the day after the accident, given that many such accidents occur before midnight.

    34 That is, they measure BAC for all dead traffic victims (drivers, passengers and pedestrians) and drivers' BAC in all accidents involving at least some degree of serious injury. Percentages are from SML data for 2000 and 2001.

    35 The fact that the rut identification number is 9 digits long, and must be deciphered from an often illegible (according to the SML Statistics department) hand-written accident or sample data form is especially conducive to error. Outdated, non 21st century compliant or error ridden database software at both the SML and Carabineros de Chile is an additional hindrance.


    Moreover, if the crash victim dies, then several days may elapse before the blood sample arrives

    at the SML. This is especially true of cases in which the crash victim dies after days or weeks in

    hospital. In these cases it is usual to have 2 BAC samples: one taken relatively soon after the

    accident occurred, and one taken near the time of death. The former is preferred in the measure

    used in this paper, for obvious reasons.

    Thus it should be clear that formidable missing or corrupted data problems hinder a simple and

    direct merging of the datasets. Moreover, exact date matching cannot be done because of the

    nature of the data. The final combination of the two databases was the result of a two stage

    process. The first used the following match criteria: an exactly matching rut and a match on the

    month and year of the accident. Thus, for a BAC sample to be matched to a particular driver, the

    SML and Carabineros databases' recorded ruts had to be the same and both the blood sample and

    the accident had to be recorded in the same month of the same year.

    There is an obvious source of error: that the same person might have been involved in two

    accidents in the same month and year, and that the blood sample is being associated with the

    wrong accident. This is certainly possible, and appears to occur remarkably frequently in the

    dataset. To tackle this, those records in the Carabineros dataset that have more than a 3 day lag

    between the date of the accident and reception of the sample are removed from the “matched”

    group, so that it is extremely unlikely that the remaining matches do not come from the same accident.

    Another source of error is that by combining the databases by month, we are ignoring all crashes

    occurring near the end of each month, in which the sample was taken or received a few days later,

    in the following month. To address this, a second matching process is used, attempting to match the

    Carabineros data with SML data using the same criteria as before, but changing the month in the

    SML data to the month before. This second stage uses only records not matched in the first stage

    from both databases to avoid re-assigning a blood sample that has already been matched to an

    accident.36

    36 More refinements to the matching process were in fact used, such as making the match conditional on driver/passenger/pedestrian status, and only using records in the second stage within a 2 week period of each other.
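The two-stage matching described above can be sketched as follows. This is a simplified illustration, not the author's code: it rejects long-lag candidates up front rather than removing them afterwards, it assumes the sample is never received before the accident, and it reads footnote 36 as allowing a lag of up to 14 days in the second stage. The function names, record layout and rut values are hypothetical.

```python
from datetime import date


def month_key(d):
    """(year, month) of a date."""
    return (d.year, d.month)


def previous_month(d):
    """(year, month) of the month before the date's month."""
    return (d.year - 1, 12) if d.month == 1 else (d.year, d.month - 1)


def match_records(accidents, samples, stage1_lag_days=3, stage2_lag_days=14):
    """Match (rut, accident_date) rows to (rut, received_date, bac) rows.

    Returns a list of (accident_index, sample_index) pairs.
    """
    matches = []
    used_accidents, used_samples = set(), set()

    def run_stage(sample_key, lag_limit):
        for ai, (a_rut, a_date) in enumerate(accidents):
            if ai in used_accidents:
                continue
            for si, (s_rut, s_date, _bac) in enumerate(samples):
                if si in used_samples or s_rut != a_rut:
                    continue
                lag = (s_date - a_date).days
                same_period = sample_key(s_date) == month_key(a_date)
                if same_period and 0 <= lag <= lag_limit:
                    matches.append((ai, si))
                    used_accidents.add(ai)
                    used_samples.add(si)
                    break

    run_stage(month_key, stage1_lag_days)       # stage 1: same month and year
    run_stage(previous_month, stage2_lag_days)  # stage 2: sample dated the following month
    return matches


# Hypothetical records, for illustration only.
accidents = [("123456789", date(2002, 5, 10)),
             ("987654321", date(2002, 5, 31)),
             ("111111111", date(2002, 5, 2))]
samples = [("123456789", date(2002, 5, 11), 0.8),
           ("987654321", date(2002, 6, 1), 0.0),
           ("111111111", date(2002, 5, 20), 1.2)]
print(match_records(accidents, samples))
```

The second stage only considers records left unmatched by the first, so a blood sample already assigned to an accident is never re-assigned.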


    This discussion has two main aims. Firstly it has highlighted the difficulties inherent in working

    with multiple databases, and the care that has been taken in circumventing them. It is highly

    unlikely that matches between SML and Carabineros data that result from this procedure are

    spurious, as they have the same recorded rut (a nine digit code!), occur in the same month and

    year and the dating discrepancy is at most 3 days for non-fatal blood samples. Nonetheless,

    mismatches are possible. No matching process can realistically claim otherwise, given the data.

    However, the likelihood of the same driver being involved in two crashes within 3 days of each

    other, and having the wrong crash associated with a blood sample is extremely low, making the

    error safely negligible.37

    The second aim of the discussion is to make clear why the match rate of drivers to SML records

    is 25% of all drivers, making up approximately 40% of MR drivers. This figure, although

    apparently low, is made up of matches that are to be trusted. Relaxing any of the matching

    criteria increases the proportion of records matched, but to the detriment of accuracy. Moreover,

    it must be considered that blood samples are taken only in accidents involving serious injuries,

    which make up 28.5% of all accidents. Thus the figure of 62,144 uncorrupted blood samples,

    although apparently low, makes up a much larger proportion of drivers involved in serious

    accidents.

    The final numbers of matched driver records from the databases are the following:

    Sample: 2000-2003                                  number    % of all drivers   % of MR drivers

    Total matched drivers                              63,967    25.0%              41.9%

    Total matched drivers with uncorrupted BAC data    62,144    24.3%              40.7%

    Biases in the Data

    SML data

    37 In a visual inspection of over 1000 matches I found no obvious discrepancies in SML and Carabineros recorded dates. A separate error involving duplicate assignment of a single BAC to multiple crashes, though unavoidable, has been identified and eliminated almost completely from the results presented. Any further effects from this problem are negligible.


    The fact that we have reliable (i.e. SML) alcohol involvement measures for only a quarter of

    crash-involved drivers (all of which are in the metropolitan region) means that the sample is not

    directly representative of the country as a whole in one crucial sense: it is highly skewed towards

    urban crashes. Only approximately 5% of metropolitan region crashes are classed as having

    occurred in rural areas – the same percent as in the sample.

    Along other dimensions of the data there is no reason to suspect a systematic bias in the sample,

    given that it is composed of those observations that are free from data collection and entry errors

    in both databases. By way of example, the number of observations by year that are included in

    the sample indicates no bias:

    Year                  2000     2001     2002     2003     Total

    Un-matched drivers    44,833   50,256   46,485   50,417   191,991

    Matched drivers       14,375   15,381   17,020   17,191   63,967

    % of year's data      24.3%    23.4%    26.8%    25.4%    25.0%

    Along other observable dimensions such as sex (footnote 38) there are no clear biases, making it possible to

    cautiously infer that inclusion in the sample is effectively random within the urban subset of the

    data.

    The urban bias cannot be directly corrected. However, for the purpose of extrapolating the urban

    SML sample to rural areas it is necessary to prove – at the very minimum – that rural crashes are

    at least as likely to involve drinking drivers as urban crashes. This is in fact the case: Carabineros

    records indicate that 6% of urban crashes involve at least one drinking driver, as opposed to 11%

    of rural crashes. Moreover, the percentage of drinking drivers in severe urban car crashes is 9%,

    somewhat lower than the rural 12.5% rate. Obviously we cannot be sure how closely matched

    urban and rural accident data are in terms of alcohol involvement. With the evidence at hand it

    seems that if anything rural accidents involve more drinking drivers than urban crashes. This

    permits tentative extrapolation of urban data to rural accidents, but the extent to which this is

    justified is unclear.

    [38] For example, in both matched and non-matched driver groups women make up approximately 13% of the sample.


    Under-reporting of fatalities

    Before moving on, it is worth noting a bias that is present in all the data, and not just the sample: the systematic under-reporting of fatalities by Carabineros, who report as fatalities only those victims who die within 24 hours of the crash. Naturally, many more die in the days following the crash, and these are classed as serious injuries, not deaths, by Carabineros. For example, between January 2000 and the end of 2002 Carabineros recorded 4,974 traffic deaths, while the Ministry of Health put the figure at 6,279[39], 26% more than Carabineros. Thus, when considering the drink-driving externality we would do well to take this systematic under-reporting of deaths in the database into account.

    This issue also strengthens the case for examining accidents resulting in both fatalities and

    serious injuries, as many of the latter are effectively deaths. Moreover, serious injuries are absent

    from US FARS data, making the availability of this data an advantage of this study over studies

    using US data.
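    The size of the fatality under-reporting follows directly from the two counts quoted above. The short Python sketch below reproduces the 26% figure; the helper `adjusted_deaths` is purely hypothetical and assumes that the under-reporting is proportional across crashes, which the text does not claim.

```python
carabineros_deaths = 4974       # deaths within 24 hours, Jan 2000 to end of 2002
health_ministry_deaths = 6279   # Ministry of Health figure for the same period

# Proportional excess of the Ministry of Health count over the Carabineros count.
excess = health_ministry_deaths / carabineros_deaths - 1
print(f"Ministry of Health count is {excess:.0%} higher")


def adjusted_deaths(reported):
    """Hypothetical correction: scale a Carabineros death count by the
    aggregate ratio between the two sources (assumes proportionality)."""
    return reported * health_ministry_deaths / carabineros_deaths
```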

    Under-reporting of Alcohol-Involvement

    The combination of the Carabineros and SML databases provides an ideal opportunity to

    compare the alcohol measures in each. The SML measure is obviously more trustworthy. However, it is worth noting that even the SML measure is not exact: given that at least one and often many more hours elapse between the accident and the blood sample being taken, the sample is a downwardly biased measure. This is because alcohol is eliminated from the bloodstream at approximately 0.15 BAC units per hour, a rate fast enough to tip many DUI

    (0.49


    Legally classified as Drunk 253 2,270 2,523

    Total 57,202 4,839 62,041

    The number of cases in which Carabineros fails to realize that a driver is in fact drunk is highlighted in red and is larger than the number that are accurately identified. The number in blue is the opposite, upward bias: cases in which Carabineros identifies drivers as legally drunk (DUI or DWI) when according to the SML they are not. The explanation for at least a third of these cases is direct: these drivers had alcohol in their blood, and their legal alcohol reading is likely to reflect the delay between the accident and the taking of the blood sample[40], i.e. they were drunk at the time of the accident, but by the time the sample was taken their BAC had fallen to permissible levels.
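    The effect of the sampling delay can be illustrated with a simple linear back-extrapolation. The sketch below uses the elimination rate of roughly 0.15 BAC units per hour quoted earlier and the 0.5 legal threshold; the linear form, the function names and the example readings are illustrative assumptions, not part of the thesis method. It only makes sense for drivers with a positive measured BAC.

```python
ELIMINATION_RATE = 0.15  # BAC units per hour, the approximate rate quoted in the text
LEGAL_LIMIT = 0.5        # drivers with BAC below this are classed as legally sober


def bac_at_crash(measured_bac, hours_elapsed):
    """Linear back-extrapolation of BAC to the time of the crash.

    Assumption: constant elimination rate and a positive measured BAC.
    """
    return measured_bac + ELIMINATION_RATE * hours_elapsed


def was_over_limit(measured_bac, hours_elapsed):
    """True if the back-extrapolated BAC reaches the legal limit."""
    return bac_at_crash(measured_bac, hours_elapsed) >= LEGAL_LIMIT


# Hypothetical driver: measured at 0.35 two hours after the crash.
print(was_over_limit(0.35, 2))    # the estimated BAC at the crash is about 0.65
print(was_over_limit(0.35, 0.5))  # with a short delay the estimate stays below 0.5
```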

    The extent of the downward bias in the Carabineros alcohol measure (identifying a drunk driver

    as sober) is such that Carabineros correctly identified only 47% of drunk drivers. This is

    surprising, but not unprecedented. US measures of police under-reporting of alcohol have found

    similar figures: 71% correctly identified (Soderstrom et al 1990), 57.1% (Maull et al 1984) and

    51.7% (Dischinger et al 1989). Moreover, the Dischinger study found an even lower identification rate of 28.6% for drivers with a BAC below 1.0, a range that overlaps with our illegal measures, suggesting that the Chilean rate of 47% is well within the expected range. A final and more recent reference point is Blincoe et al (2002), which found that in the US state of Maryland police identified 74% of cases with BAC ≥ 1.0 and only 46% of cases where BACs were between zero and 1.0.
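    The 47% identification rate can be recovered from the totals in the Carabineros-versus-SML table. The sketch below assumes that the table's columns are the SML classification (sober, drunk, total) and its rows the Carabineros classification; the count of missed drunk drivers is obtained by subtraction and is therefore a derived figure, not one copied from the source.

```python
# Counts from the Carabineros-versus-SML table (column reading is an assumption).
sml_drunk_total = 4839             # drivers legally drunk according to the SML
identified_by_carabineros = 2270   # of those, also classed as drunk by Carabineros

# Drunk drivers (per the SML) that Carabineros recorded as sober: derived value.
missed = sml_drunk_total - identified_by_carabineros
detection_rate = identified_by_carabineros / sml_drunk_total

print(f"Missed: {missed}, identified: {identified_by_carabineros}")
print(f"Detection rate: {detection_rate:.0%}")
```

    On this reading the missed cases outnumber the correctly identified ones, consistent with the statement about the cell highlighted in red.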

    A closer look at the misclassified cases suggests some explanations: compared to correctly

    identified drivers, double the proportion (15%) of those mistakenly reported as sober suffered

    either fatal or very serious injuries, and approximately a quarter suffered serious injuries. Thus Carabineros are unlikely to have had close access to these drivers, as they may have been immediately transported to hospital. The remaining 75% of misclassified drivers pose

    a problem: how were they misclassified given that they are overwhelmingly (70%) registered as

    DWI (BAC>1), rather than the less serious DUI (0.49


    explanation is a combination of Carabineros' lack of equipment and training, coupled with the fact that detection of even quite severe impairment (BAC > 1) might be somewhat harder than it appears.

    In terms of rural alcohol under-reporting the SML data cannot provide many cases. Nonetheless,

    some areas of the Greater Santiago Metropolitan region are effectively rural, and the alcohol

    detection table closely parallels that for the whole MR:

    Carabineros data 2000-03 (rural)   SML data (rural)
                                       Legally Sober (BAC < 0.5)   Legally Drunk   Total
    Legally Sober (BAC < 0.5)          2,881                       164             3,045
    Legally Drunk                      24                          179             203
    Total                              2,905                       343             3,248

    This provides some support for our extrapolation of SML data to the whole country. However, as the data are collected exclusively by MR police precincts, it is possible that the biases in the rural accident data elsewhere are completely different from those in the MR.
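    To make the comparison between the rural subset and the whole MR concrete, the detection and false-positive rates can be computed side by side. This is a sketch that reads both tables with Carabineros classifications in rows and SML classifications in columns (an assumption for the whole-MR table, whose header is not fully legible); the function names are illustrative.

```python
def detection_rate(both_drunk, sml_drunk_total):
    """Share of SML-drunk drivers that Carabineros also classed as drunk."""
    return both_drunk / sml_drunk_total


def false_positive_share(sml_sober_flagged, carabineros_drunk_total):
    """Share of Carabineros-drunk drivers that the SML classed as sober."""
    return sml_sober_flagged / carabineros_drunk_total


rural_detection = detection_rate(179, 343)
mr_detection = detection_rate(2270, 4839)
rural_false_pos = false_positive_share(24, 203)
mr_false_pos = false_positive_share(253, 2523)

print(f"Detection rate:       rural {rural_detection:.1%}, whole MR {mr_detection:.1%}")
print(f"False-positive share: rural {rural_false_pos:.1%}, whole MR {mr_false_pos:.1%}")
```

    Both rates differ by only a few percentage points between the two tables, which is the sense in which the rural table parallels that for the whole MR.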

    Hit and Run accidents

    So-called “hit and run” accidents are cases in which one or more drivers flee the scene of an

    accident, resulting in missing alcohol-involvement data for those drivers. The evidence available[41] suggests that these drivers have high BACs (which is perhaps why they flee) and thus

    are of direct relevance to our study, making up yet another source of downward bias to our

    parameters. However, any assumptions made about such a group must be arbitrary and they have

    thus been ignored.

    [41] The Compton et al study pursued hit-and-run drivers, apprehending 94 of 603. Of these, over 69% had positive BACs, typically at high levels. Moskowitz et al 2002 report that police in La Puente, California made an effort to apprehend hit-and-run drivers, and that 65% of those apprehended had positive BACs. However, those caught are not necessarily representative, as higher BACs may make successful escape less likely.


    6 – Results

    The 2000-2003 database was used to identify the following 2-car night-time crashes causing serious injury or death:

    Period: 2000-2003                            drunk-sober crashes   drunk-drunk crashes   sober-sober crashes
    Carabineros data 2000-Sept2004               200                   34                    539
    SML data (reduced sample)                    81                    15                    72
    SML data (extrapolating for whole sample)    317                   59                    282

    In terms of the model the following results were obtained:

    Period: 2000-2003                            θ         N         % of drinking drivers on the road
    Carabineros data 2000-Sept2004               -         -         -
    SML data (reduced sample)                    3.8       0.23      18.96%
                                                 (2.35)    (0.094)
    SML data (extrapolating for whole sample)    3.8       0.23      18.96%
                                                 (1.181)   (0.047)

    It is immediately evident that the Carabineros data by itself is of no use in estimating the RR

    parameter, θ: the data is simply incompatible with the binomial distribution - no parameters exist

    that could have generated the data. This is the direct result of the downward bias in the

    Carabineros alcohol measure, and makes the model unworkable.

    Moreover, this incompatibility with the model is not due to a failure of the equal mixing assumption: the problem here is that the numbers of both Drunk-Sober and Drunk-Drunk crashes are extremely low, not just that Drunk-Sober crashes are too few for the model to function.
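    The role of R can be illustrated with a short Python sketch. It assumes, following the Levitt and Porter two-car model under equal mixing, that crash counts are proportional to SS : DS : DD = 1 : N(θ+1) : θN², so that R = DS²/(DD·SS) = θ + 2 + 1/θ, which can never fall below 4. The thesis estimates the model by maximum likelihood; the closed-form solution below is a simplification for illustration, and the function names are not from the source.

```python
import math


def crash_ratio(ds, dd, ss):
    """R = DS^2 / (DD * SS) for two-car crash counts."""
    return ds ** 2 / (dd * ss)


def estimate(ds, dd, ss):
    """Return (theta, N, share of drinking drivers), or None when R < 4.

    Assumes crash counts proportional to 1 : N(theta + 1) : theta * N^2.
    """
    r = crash_ratio(ds, dd, ss)
    if r < 4:
        return None  # no (theta, N) can generate these counts
    s = r - 2  # equals theta + 1/theta
    # Take the root with theta >= 1 (drinking drivers assumed at least as risky).
    theta = (s + math.sqrt(s * s - 4)) / 2
    n = math.sqrt(dd / (ss * theta))  # from DD / SS = theta * N^2
    return theta, n, n / (1 + n)


carabineros = estimate(200, 34, 539)  # Carabineros counts from the table
sml = estimate(81, 15, 72)            # reduced SML sample

print("Carabineros R:", round(crash_ratio(200, 34, 539), 2), "->", carabineros)
print("SML R:", round(crash_ratio(81, 15, 72), 2), "->", sml)
```

    With the Carabineros counts R is about 2.2, so no solution exists, while the reduced SML sample gives R of about 6.1 and values close to the reported θ of 3.8 and N of 0.23.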

    The more trustworthy, although still somewhat downwardly biased, SML data do provide a large enough value for R (R ≥ 4) and yield an estimate of θ of 3.8. That is, drunk drivers are 3.8 times more likely than sober drivers to cause a crash resulting in serious injury or death[42]. However, the standard error for the reduced SML sample (2.35) is too large for the parameter to be statistically significant. This is because in this case we are using only those cases in which we have a direct BAC measurement, and as this is only 25% of the drivers in the sample, the variance of the estimate is too large. Extrapolating (saying, effectively, that our sample is representative, and simulating the data) provides a much lower s.e., though of course this is a mere simulation.

    [42] Using the Compton et al RR curve suggests an average BAC for Chilean night-time car drivers of approximately 0.95, similar to the average BAC for dead drivers, 0.75. However, here we are deriving a RR of causing a severe or fatal accident, while the Compton et al curve is for accidents of all severities. As alcohol is implicated to an increasing extent as BAC rises, we can infer that the average Chilean BAC implied by the RR estimate is above 0.95.
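    The drop in the standard errors under extrapolation is roughly what one would expect from the larger number of crashes alone. The sketch below assumes that standard errors scale with the inverse square root of the number of crashes when the crash-type proportions are held fixed, a standard large-sample approximation rather than the author's stated procedure.

```python
import math

reduced_total = 81 + 15 + 72          # crashes in the reduced SML sample
extrapolated_total = 317 + 59 + 282   # crashes after extrapolating to the whole sample

# Under the assumed 1/sqrt(n) scaling, standard errors shrink by this factor.
scale = math.sqrt(reduced_total / extrapolated_total)

se_theta_reduced, se_n_reduced = 2.35, 0.094  # reported for the reduced sample
se_theta_scaled = se_theta_reduced * scale
se_n_scaled = se_n_reduced * scale

print(f"scale factor: {scale:.3f}")
print(f"scaled s.e. of theta: {se_theta_scaled:.3f}, of N: {se_n_scaled:.4f}")
```

    The scaled values are close to the reported 1.181 and 0.047, which suggests that the lower standard errors reflect only the inflated crash count and carry no new information.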

    It is important to note the context of this result. In addition to the data problems that have resulted

    in our having to use a much reduced sample – the SML data – this is a result that has been

    estimated for Chilean car drivers, at night. Even setting aside the small sample size, which means that the result is not statistically significant, it cannot be strictly interpreted as applying to drivers of other vehicles, or as applying in the daytime. Since drink-driving is principally a night-time phenomenon, the second caveat is not as serious as it first appears. However, the fact that we cannot be sure that the result applies to vehicles that are not cars (trucks, buses etc) is more serious if any policy relevance is to be given to it, given that such vehicles are involved in 38% of serious night-time crashes.

    What the result would provide us with – if standard errors were not a problem – would be a

    trustworthy relative risk estimate of a fatal or serious injury causing crash for night car drivers,

    which make up the vast majority of night drivers, and an even larger proportion of potential

    night-time drivers. As such, this result would be of great value to driver education programs and

    would be of substantial use in determining penalties for night-time drink driving. At present it

    would appear that no relative risk calculation is involved in the penalty-setting process.

    Estimating the parameters taking Under-Reported Alcohol-Involvement into account

    It should be clear that the Carabineros dataset by itself cannot yield a parameter estimate.

    However, if the degree of alcohol under-reporting can be estimated, then an approximate RR

    parameter can be estimated for the whole dataset, and not just the SML sample.

    To estimate the extent of alcohol under-reporting we focus on the cases where we have both SML and Carabineros data on alcohol. These are almost all (157 of 164) of the cases used in the SML-only estimation of θ in the preceding section. These cases are then examined to determine the extent of the downward bias in the alcohol measure, as we have the more accurate SML measure at hand[43].

    There are two polar cases by which a biased alcohol measure could affect the number of cases in each of the three crucial collision categories: Drunk-Drunk (DDC), Drunk-Sober (DSC) and Sober-Sober crashes (SSC). The first is the less plausible: if a police officer misreports one driver (as being sober when he is in fact drunk) at a particular crash site, then he will never misreport the other. This polar case results in a lowering of the value of R[44] and hence cannot serve as a useful guide, as it makes estimating θ even more difficult. The other case is when a police officer has perfectly correlated reporting errors: if he misreports one driver at a crash then he always misreports the other. This increases the value of R because in this case the number of SSC in the data is artificially inflated: many SSC are actually DSC or DDC.

    While the latter scenario (correlated police reporting errors) is more plausible it is unlikely to

    hold fully: sometimes a police officer will misreport only one of the two drivers at a particular

    crash. An examination of the data reveals this to be exactly the case: there are approximately the

    same number of cases in which Carabineros misreport one driver and not the other as cases where

    both are misreported. However, if we assume that reporting errors are perfectly correlated then we under-estimate the downward bias in the alcohol measure caused by misreporting: under an intermediate correlation of reporting errors, the downward bias required to obtain the same error-laden data is greater.

    Thus, in the simulation below we have assumed perfectly correlated reporting errors. Under this assumption, the downward alcohol reporting bias required to generate the Carabineros alcohol measure for those 157 cases in which we have the more accurate SML measure is 51%. That is, if 51% of drunk drivers are misclassified as sober, and reporting errors are perfectly correlated, then real accident data like the 157 cases under consideration will be reported as Carabineros has reported them. It is important to note the earlier point: this is the lowest possible value for the downward bias in the alcohol measure, and the evidence indicates that the assumption of perfectly correlated reporting errors does not hold. In all probability the downward bias is higher.

    [43] By themselves they deliver an R of 1.73, well below the minimum level for estimation.

    [44] The key issue is whether Drunk-Drunk collisions are shifted to the Sober-Sober or Sober-Drunk categories. If a police officer never misreports both drivers then all misreported DDC are reported as being DSC, and thus the number of DSC is artificially inflated and must be lowered, reducing R.

    Using this degree of downward bias, and applying it to the 2000-September 2004 dataset, we obtain the following simulated parameter values, with simulated s.e. in parentheses:

    Carabineros data 2000-Sept2004
    Extent of downward bias   Implied θ   Implied N
    32%                       1.2         0.31
                              (6.89)      (0.13)
    40%                       2.8         0.23
                              (1.11)      (0.773)
    51%                       6.0         0.20
                              (1.975)     (0.001)
    60%                       13.6        0.18
                              (6.2)       (0.0005)

    Thus, the implied θ if the degree of downward bias in the alcohol measure is in fact 51% and Carabineros misreporting is perfectly correlated is 6: drinking drivers are on average 6 times more likely[45] to cause a fatal or serious injury causing crash than sober drivers in Chile. If the downward bias is higher, then the relative risk (θ) will also be higher, as can be seen from the table, and vice versa.

    [45] According to the Compton et al. RR curve for US drivers, this suggests that the average BAC for Chilean night-time car drivers is above 1.15.
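    The correction for perfectly correlated misreporting can be sketched as follows. The sketch assumes that a share b of all crashes involving drinking drivers (both Drunk-Sober and Drunk-Drunk) is recorded as Sober-Sober, undoes that shift on the Carabineros counts (200, 34, 539), and then applies the closed-form solution of the equal-mixing model, in which crash counts are proportional to 1 : N(θ+1) : θN². Both the inversion and the closed form are illustrative assumptions; the thesis obtains its values by simulation.

```python
import math


def estimate(ds, dd, ss):
    """Closed-form (theta, N) under equal mixing, or None when R < 4."""
    r = ds ** 2 / (dd * ss)
    if r < 4:
        return None
    s = r - 2  # equals theta + 1/theta
    theta = (s + math.sqrt(s * s - 4)) / 2  # root with theta >= 1
    n = math.sqrt(dd / (ss * theta))
    return theta, n


def undo_correlated_misreporting(ds_obs, dd_obs, ss_obs, bias):
    """Recover crash counts when a share `bias` of crashes with drinking
    drivers was recorded as sober-sober (perfectly correlated errors)."""
    ds = ds_obs / (1 - bias)
    dd = dd_obs / (1 - bias)
    # Crashes wrongly recorded as sober-sober are removed from that category.
    ss = ss_obs - (ds - ds_obs) - (dd - dd_obs)
    return ds, dd, ss


results = {}
for bias in (0.32, 0.40, 0.51, 0.60):
    counts = undo_correlated_misreporting(200, 34, 539, bias)
    results[bias] = estimate(*counts)
    theta, n = results[bias]
    print(f"{bias:.0%}: theta = {theta:.1f}, N = {n:.2f}")
```

    Under these assumptions the sketch gives the same implied θ and N as the table at the precision shown; it does not attempt to reproduce the simulated standard errors.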


    Taking Unequal Mixing into account

    As noted in the Methodology section, violations of the equal mixing assumption (i.e. cases in which drivers have proportionately more interactions with other drivers of their own type) bias θ downwards and N upwards. Estimating the model for different values of the parameter Δ[46] (reflecting degrees of unequal mixing) we obtain the following table:

    Δ      θ (relative risk)   N        % of drinking drivers
    0      3.81                0.23     19.46%
           (2.35)              (0.09)
    0.1    4.63                0.21     17.05%
           (2.48)              (0.07)
    0.2    5.54                0.18     15.26%
           (2.68)              (0.06)
    0.5    8.19                0.13     11.71%
           (3.34)              (0.04)

    As can be observed in the table above, as the Δ we use rises the estimated value of θ increases (by 26% in moving from Δ=0 to Δ=0.1) and the value of N falls. A table containing more values of delta is available in the appendix. Obviously, we cannot know which value of Δ most closely approximates the degree of unequal mixing found in practice, but as Δ rises our estimated θ becomes statistically significant.

    An attempt to choose the value of Δ that best fits the data by maximizing the value of the likelihood function (V) with respect to Δ[47] was unsuccessful: the difference between the highest and lowest values of V (each of which is the probability of observing the sample) using a thousand values of delta between 0 and 0.6 is a mere 0.000000001730901. Moreover, because a numerical optimization method is being used, this difference shrinks as the number of iterations is increased. In short, a graph of V (on the Y axis) against Δ would be flat, implying that this method is of no practical use, and that the true value of delta must be approximated by some other, perhaps experimental, method.

    The Alcohol-Involvement of Dead Pedestrians and Dead Drivers

    See Appendix 6 for details.

    [46] A Δ of 0.1 implies that drunk drivers are 10% more likely to interact (not crash, just interact) with other drunk drivers than with sober drivers.

    [47] This is because the value of the likelihood function (V) represents the probability of observing this particular sample. Hence the value of Δ that results in the highest value of V maximizes this probability.


    7 – Estimating the External Costs of Drink-Driving

    What counts as an externality?

    The debate as to the policy relevant costs of alcohol is one of long standing, with estimates of the

    total annual social cost of US alcohol use ranging from USD 9.3bn to over 130bn (Heien 1995-6).

    Much of the debate revolves indirectly around the issue of whether abuse of alcohol is

    appropriately considered rational behaviour.

    The Becker & Murphy theory of rational addiction holds that addiction is in no way conclusive evidence of irrational behaviour; in fact it can be rational. Habits and addictions (extreme habits) stem from consumption preferences being connected intertemporally. Several effects are in play. The first is what is termed ‘reinforcement’, where past consumption of the good increases the marginal utility of present consumption. Secondly, the good itself may raise already high discount rates[48], rationally transforming a habit into an addiction. In short, a coherent and empirically successful theory exists to justify the claim that consumers are rational when they consume alcohol in ‘excess’.

    Standard economic theory holds that the relevant social costs of alcohol-involved driving are only those that can be classified as spillovers. If drinking drivers kill themselves while driving, causing no other harm, then there is no social cost involved. If their passengers die, these deaths are not externalities either, as the passengers exercise their free will and internalize the risk in choosing to ride with such a driver[49]. Moreover, drinking drivers, even those who die, receive positive utility from their consumption choices ex ante.

    Some economists have argued that a rigid adherence to consumer sovereignty (the consumer is

    rational and takes an optimal consumption path – who are we to say what is best for him?) is not

    convincing in this case, and that strict externalities do not capture the full social cost. Pogue and

    Sgontz (1989) argue that the optimal tax on alcohol rises markedly if alcoholism is considered to

    [48] People with high discount rates are more likely to develop habits that may become addictions, as they weigh the future risk of becoming an addict less heavily.

    [49]

    Heien notes that Perrine et al 1988 indicates that 83.3% of the passenge