

Efficient Provision of Experience Goods: Evidence from Antidepressant Choice∗

Michael J. Dickstein†

Stanford University

This Version: January 2012

Abstract

In the market for depression care, physicians face uncertainty about how a newly diagnosed patient will respond to available treatments. I design a framework to analyze how price and drug promotion influence the physician’s recommendation as she sequentially searches for the best match. Given the large number of treatments available for depression, I specify a novel decision rule that accommodates large choice sets and permits correlation in the learning process. In the model, the physician updates her priors on both the drug tried and drugs with similar characteristics. I find that insurer policies that prioritize treatments with high predicted patient adherence outperform commonly-used tiered pricing, both in reducing spending and improving health.

Keywords: Learning, dynamic discrete choice, pharmaceutical demand, health insurance design

∗ I owe special thanks to David Cutler, Guido Imbens, Ariel Pakes, and Dennis Yao for their guidance and suggestions. I thank Gary Chamberlain, Andrew Ching, Greg Lewis, Eduardo Morales, Julie Mortimer, Amanda Starc, and seminar participants at the Harvard Industrial Organization Lunch and Harvard Econometrics Lunch for helpful comments. I gratefully acknowledge support from the Cowles Foundation at Yale University, where I conducted part of this research.

† Department of Economics, Stanford University, Stanford, CA 94305, [email protected]


1 Introduction

In markets for experience goods, buyers learn about the quality of available products through experimentation. After testing out one option, they reevaluate the alternatives to identify those that better match their preferences. In their next decision, buyers may choose one of the preferred products or may select a less well understood option to sharpen their opinions further. Many markets feature this search behavior, including consumers’ purchases of non-durable goods; manufacturers’ choices of suppliers; and physicians’ recommendations of prescription drugs for their patients.¹ The market for prescription drugs is a particularly important example because improving the pace and precision of the patient and physician’s learning process has consequences for health. The health benefits come in two forms: patients may realize better outcomes while on more tailored treatment and may adhere to the proper drug regimen at higher rates.

I develop a dynamic model of prescription drug choice to ask: what set of insurer pricing policies and drug promotions can improve physician decision-making, maximizing patient adherence and health while minimizing insurer cost? In the model, these policies both enter the physician’s initial treatment choice and interact with the learning process. If the incentives target treatments with better expected outcomes and tolerability, physicians will search among those drugs most likely to relieve symptoms and least likely to cause side effects that discourage patients from continuing treatment.

I focus specifically on the antidepressant market. The potential improvement in depression care is substantial, as few patients continue treatment to the six-month threshold recommended by the American Psychiatric Association. In my empirical setting, half of the patients discontinue treatment by the first month and over 90% exit care before the recommended six months.² In addition, physicians face uncertainty in this market about how a new patient will respond to the treatments available. Six subclasses of drugs compose the choice set, each affecting a distinct set of chemicals in the brain; physicians cannot predict ex ante which mechanism will work best for a particular patient.

¹ Erdem and Keane (1996) and Ackerberg (2003) study consumer products; Crawford and Shum (2005) examine demand for prescription drugs.

² Berndt et al. (2002) find that such short treatment spells limit the probability of short-run recovery. Inadequate treatment duration also nearly doubles the rate of illness relapse (Melfi et al. (1998)). The resulting increase in depressive symptoms inflicts serious pain: in surveys, patients equate 10 years living with depression to only 6 years living without it (Fryback et al. (1993)).

To analyze this dynamic choice problem, I begin by specifying the physician’s prior beliefs on the effectiveness of each option. In the model, the physician recommends a treatment to the patient based in part on the patient’s outcome and in part on her own private considerations, as in Hellerstein (1998); the patient then decides whether to adhere to the treatment selected. Using data on the initial treatment choice, I estimate the individual-specific priors that enter the joint patient and physician decision. Individuals differ in their preferences over a rich set of drug characteristics, including price. This helps rationalize the diffuse market shares at the initial decision.

The estimates of these priors illustrate that physicians respond to high patient copayments by recommending cheaper options; they favor specific highly-promoted options; and supply-side incentives, such as paying physicians using a capitated rate, can shift prescribing toward cost-effective options. Finding this effect at the initial choice, I analyze how price and promotion influence the sequence of treatments provided to a patient over the duration of his illness.

Measuring the dynamic effects of alternative policies in this market requires methodological innovation. Two features of the market complicate the application of existing tools. First, physicians select from among twenty unique products in the antidepressant class. The agent (and the econometrician) must hold in memory the expected outcomes and the covariance in outcomes across the twenty drugs when evaluating possible treatment regimens. This large number of required state variables reflects the “curse of dimensionality”; traditional computational methods cannot easily handle a problem of this size.³ Second, when searching for the drug best suited to a particular patient’s illness, physicians learn about the quality of each drug in a correlated fashion. After a poor outcome on drug A, physicians may also avoid drug B if it shares those characteristics of A that the patient disliked, such as side effects. The optimal sequence of products to sample thus depends directly on the similarity among subsets of treatments within the choice set. If several products share a characteristic, physicians might start with a drug in this set to learn rapidly about the patient’s match to all of the drug’s close substitutes.

³ See Aguirregabiria and Mira (2010) for a survey of dynamic discrete choice models and a discussion of the computational requirements.

I accommodate these features by representing preferences over drugs as a function of individual sensitivities and a set of product attributes, similar to the approach Berry et al. (1995) employ in the static discrete choice setting. I parameterize outcomes under each drug as a linear function of these product attributes. Physicians and patients begin with priors on the coefficients in this parameterization. After trying out a medication, they update their priors on the coefficients associated with the attributes of the drug sampled. Unlike earlier work on dynamic drug choice, including Crawford and Shum (2005) and Ching (2010), this learning process permits spillovers: when updating priors on the coefficients, the physician and patient change their expectation of outcomes under all drugs that share characteristics with the drug selected previously.
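The spillover in this learning process can be illustrated with a small sketch. It assumes a standard conjugate normal update for a linear model with known noise variance and a full covariance matrix over the coefficients; the attribute vectors, prior values, and function names are hypothetical and chosen only for illustration, not taken from the paper's specification.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def update_beliefs(mean, cov, x, outcome, noise_var=1.0):
    """Conjugate normal update of coefficient beliefs after one observed outcome.

    mean: prior means of the k attribute coefficients
    cov: k x k prior covariance matrix (list of lists)
    x: attribute vector of the drug that was tried
    """
    cov_x = [dot(row, x) for row in cov]      # covariance of each coefficient with x'theta
    pred_var = dot(x, cov_x) + noise_var      # variance of the predicted outcome
    surprise = outcome - dot(x, mean)         # realized minus expected outcome
    new_mean = [m + c * surprise / pred_var for m, c in zip(mean, cov_x)]
    new_cov = [[cov[i][j] - cov_x[i] * cov_x[j] / pred_var
                for j in range(len(x))] for i in range(len(x))]
    return new_mean, new_cov


# Hypothetical attributes: [acts on serotonin, acts on norepinephrine, causes nausea]
drug_a = [1.0, 0.0, 1.0]
drug_b = [1.0, 0.0, 0.0]   # shares the serotonin attribute with drug A
drug_c = [0.0, 1.0, 0.0]   # shares no attribute with drug A

prior_mean = [0.0, 0.0, 0.0]
prior_cov = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A poor outcome on drug A lowers the expected outcome on drug B but not on drug C.
post_mean, post_cov = update_beliefs(prior_mean, prior_cov, drug_a, outcome=-1.0)
expected_b = dot(drug_b, post_mean)
expected_c = dot(drug_c, post_mean)
```

Under these assumed values, the expected outcome on drug B falls after the poor outcome on drug A because the two share an attribute, while beliefs about drug C are unchanged.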

Defining a product as a function of attributes also reduces the dimensionality of the dynamic problem. With k the number of independent coefficients in the linear parameterization, the physician need only manage 2k variables: the mean and variance of each coefficient. Choosing a small value of k, I could proceed using a direct approximation of the dynamic programming solution to the physician’s problem, such as the method Keane and Wolpin (1994) propose. However, because I define a choice purely in terms of observable characteristics, I select a relatively large k to feature many of the attributes that physicians suggest prompt patient complaints. These drug features include: (1) efficacy, which differs largely by the drug’s subclass; (2) the side effect profile; and (3) the frequency of dosing per day. Given this choice of k, solving the decision problem using backward induction for each patient-physician pair still requires computing many high-dimensional integrals.
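The size of the reduction can be checked with simple counting. The sketch below assumes that a product-level state consists of one mean per drug plus the distinct entries of a symmetric covariance matrix; the value k = 8 is a hypothetical placeholder, since the text does not state k at this point.

```python
def state_size_by_product(n_products):
    """Means plus distinct entries of a symmetric covariance matrix."""
    means = n_products
    covariance_terms = n_products * (n_products + 1) // 2
    return means + covariance_terms


def state_size_by_attribute(k):
    """One mean and one variance per independent coefficient."""
    return 2 * k


products_state = state_size_by_product(20)    # 20 means + 210 covariance terms
attribute_state = state_size_by_attribute(8)  # k = 8 is an assumed value
```

With twenty products the product-level state holds 230 numbers, against 16 for the assumed k = 8.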

I avoid computing these multidimensional integrals by appealing to an alternative solution to the physician’s sequence problem developed in the statistics literature by Gittins and Jones (1979) and applied previously in economics contexts by Miller (1984) and Bergemann and Valimaki (1996), among others. In the drug choice setting, a physician can always return to an option that she overlooked initially and it will yield the same outcome regardless of its place in the choice sequence. The physician and patient can therefore apply forward induction to solve the sequence problem. Gittins and Jones (1979) provide a two-step decision rule for this setting that reduces the multidimensional integrals to a series of simpler one-dimensional problems. First, the physician considers each choice independently, as if it were the preferred option. Given her beliefs at the time of the decision, she calculates the cumulative outcome the patient would realize from taking that drug until it is optimal to quit. This sum for each drug is its “index”. Second, the physician recommends the option with the largest index. The physician and patient update their beliefs after realizing an outcome and then repeat this procedure.
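A minimal sketch of an index rule of this kind follows. It is not the paper's specification: it assumes independent arms with binary outcomes (say, whether the patient adheres in a given month), Beta beliefs summarized by success and failure counts, a discount factor of 0.9, and a finite-horizon truncation. The index is approximated by the usual calibration argument: find the per-period "retirement" payoff at which the decision maker is indifferent between trying the drug (with the option to quit later) and retiring immediately. All function and variable names are hypothetical.

```python
def gittins_index(successes, failures, discount=0.9, horizon=100, tol=1e-4):
    """Approximate Gittins index of a Bernoulli arm with Beta(successes, failures) beliefs."""
    n0 = successes + failures
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        retire = lam / (1.0 - discount)   # value of taking lam in every period forever
        # Terminal depth: either retire or play forever at the current posterior mean.
        values = [max(retire, (successes + s) / (n0 + horizon) / (1.0 - discount))
                  for s in range(horizon + 1)]
        # Backward induction over the number of additional trials already observed.
        for depth in range(horizon - 1, -1, -1):
            new_values = []
            for s in range(depth + 1):
                p = (successes + s) / (n0 + depth)   # posterior mean after s extra successes
                cont = (p * (1.0 + discount * values[s + 1])
                        + (1.0 - p) * discount * values[s])
                # At the root keep the continuation value itself for the comparison below.
                new_values.append(cont if depth == 0 else max(retire, cont))
            values = new_values
        if values[0] > retire:
            lo = lam    # trying the arm beats retiring, so the index exceeds lam
        else:
            hi = lam
    return 0.5 * (lo + hi)


def choose_drug(beliefs):
    """Step two of the rule: recommend the option with the largest index."""
    return max(beliefs, key=lambda name: gittins_index(*beliefs[name]))


# Two hypothetical drugs with the same posterior mean but different amounts of evidence.
beliefs = {"drug_a": (1, 1), "drug_b": (10, 10)}
index_a = gittins_index(*beliefs["drug_a"])
index_b = gittins_index(*beliefs["drug_b"])
recommended = choose_drug(beliefs)
```

Both drugs have a posterior mean of 0.5, but the less well understood drug carries the larger index, which is the incentive for experimentation that a myopic rule would lack.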

The approach vastly reduces computation and also captures optimizing behavior using a policy that appears more like a ‘rule of thumb’ that physicians may follow in practice. I discuss the theoretical properties of this approach and show how it improves upon predictions from both a static model and a Bayesian learning model that lacks an incentive for experimentation.

With the dynamic model, I test the effect of insurer copayment designs and promotion schemes on both the insurer’s long-run costs and patient health, translated to dollar terms. I find “value-based” insurance designs that set lower copayments for drugs with higher effectiveness can boost adherence, leading to a dollar gain in patient health that exceeds the change in cost. These policies outperform commonly-used tiered copayment schemes. For promotion, I find that marketing the most effective branded products can enhance patient outcomes by encouraging experimentation with those products most likely to speed recovery. Insurers may value this practice if the drug’s wholesale cost is modest.

Finally, I use the information on quit rates in the large census of patient episodes, filtered through the structural model, to recommend a set of treatments. When patients quit care early, holding price and other incentives constant, it reveals information on the effectiveness of the drug chosen. The approach of judging efficacy using attrition rates follows from work by Chan and Hamilton (2006) and Philipson and DeSimone (1997) in the setting of randomized clinical trials. I extend this idea to observational data. Relative to the managers of clinical trials, practicing physicians exert far less effort to encourage compliance with the prescribed treatment. As a result, the adherence rates observed in claims data, while lacking the internal validity of a randomized trial, nonetheless provide an externally valid impression of patient satisfaction with a drug.

Observed switching patterns in the data identify five treatments with 5-10% higher rates of adherence and 10-20% fewer switches relative to other options. Existing protocols built from clinical trial outcomes recommend a larger list of products, some of which far underperform the above treatments in terms of adherence. Thus, in the case of judging drug efficacy (and possibly for policy evaluation more generally), my results suggest observed adherence rates provide valuable information missed by treatment effects in randomized trials.

I proceed in the paper by first describing the market for depression care, the dataset, and the variation used to identify the parameters of the model in Section 2. In Section 3, I describe the estimation used to initialize the physician’s priors. I then provide detail on the learning model and the estimation algorithm used to recover its parameters in Sections 4 and 5. I describe the results and fit of the dynamic model in Section 6. Finally, I run counterfactual simulations to test the effect of insurer policies on costs and patient health in Section 7. Section 8 concludes.

2 Background and Data

2.1 Depression Care

Major depression affects 6.5% of adults in the United States each year. The illness causes patients to suffer significant impairment in their productivity, with surveys finding 60% have symptoms severe enough to keep them from performing daily tasks (Kessler et al. (2003)). I study patients with one of five categories of depression: major depression; dysthymia and depression with anxiety; prolonged depressive reaction; adjustment disorder with depressed mood; and depression not otherwise specified.⁴ I exclude less severe diagnoses and episodes where the individual does not receive a formal diagnosis. For the serious conditions I study, the American Psychiatric Association recommends at least 6 months of treatment. I interpret exit before this threshold as a poor outcome, since few patients with severe diagnoses will recover within that time frame.

The volume of diagnoses in the United States leads to a large market for drug treatment. In 2008, patients filled 164 million monthly prescriptions for antidepressants. As a class, only cholesterol regulators and pain medicines exceeded this volume of sales. In dollar terms, sales of antidepressants reached $9.6 billion in 2008, roughly 3.3% of the U.S. prescription drug market.⁵ If advertising expenditure is roughly proportional to the class’ share in sales, manufacturers spent $370 million in 2008 on promotion of antidepressants. I consider how this advertising might affect prescribing behavior in later counterfactuals.

⁴ The depression diagnoses listed match the International Classification of Diseases (ICD-9-CM) codes of: 296.2, 296.3, 300.4, 309.0, 309.1, and 311. Melfi et al. (1998), Pomerantz et al. (2004), and Akincigil et al. (2007) use similar diagnostic codes in selecting a sample of depression sufferers.

⁵ “Top Therapeutic Classes by US Sales”, IMS National Sales Perspectives. IMS Health, 2008.

The antidepressants studied fall into six distinct subclasses according to their effect on the concentration of serotonin, norepinephrine, and dopamine in the brain. Patients react idiosyncratically to changes in the concentrations of these chemicals, creating uncertainty regarding the efficacy of any one biological mechanism for a patient (Murphy et al. (2009)). The medical literature groups the subclasses themselves into two generations, with “second generation” antidepressants generally dominating the older subclasses. Of the drugs I study, only tricyclic antidepressants (TCAs) fall into the first generation. They act non-selectively on chemicals in the brain, leading to greater issues with tolerability and higher risk of overdose. As a result, “selective” second generation classes have largely replaced TCAs as the preferred treatment. The second generation includes: selective serotonin reuptake inhibitors (SSRIs); serotonin-norepinephrine reuptake inhibitors (SNRIs); norepinephrine and dopamine reuptake inhibitors (NDRIs); noradrenergic and specific serotonergic antidepressants (NaSSAs); and serotonin modulators (SMs) (Gartlehner et al. (2007)).

2.2 Formation of the Sample

I draw a sample from Thomson Reuters’ Commercial Claims and Encounters MarketScan database, collecting its patient-level insurance claims data for outpatient visits and prescription drug services in the years 2003 through 2005. I link these data to patient demographics and plan design features from the Benefit Plan Design MarketScan database. The individuals recorded in the data include active employees working for a group of large, self-insured firms in the United States; the employees’ dependents and some classes of retirees enter the database as well.

To form a sample, I identify patients in the outpatient data who receive a new diagnosis in one of the depression categories listed earlier. Starting from this index office visit, I collect panel data on the patients’ prescription drug use over the remaining months of the sample. By conditioning on significant diagnoses in creating the sample, I avoid observations that reflect off-label use of antidepressants.⁶ I impose the following additional screens in creating the analysis dataset: patients cannot have a concurrent diagnosis of bipolar disorder or schizophrenia or receive drugs that signal these conditions, as these more complicated illnesses require distinct prescribing behavior;⁷ the patients’ age must be between 18 and 64, the range for which the data are complete; patients must visit a health professional with the ability to prescribe all possible treatments in the choice set; and patients must not be pregnant, a condition that involves serious drug contraindications. To avoid patients whose illnesses began before the sample period, I restrict the sample to exclude individuals prescribed medications in the first six months of data.
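The screens listed above amount to a simple predicate on each candidate patient. The sketch below is only illustrative: the record layout and field names are hypothetical and do not correspond to actual MarketScan variable names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    age: int
    has_bipolar_or_schizophrenia: bool   # concurrent diagnosis or signalling drugs
    is_pregnant: bool
    prescriber_can_prescribe_all: bool   # professional can prescribe every option
    first_rx_month: int                  # 1 = first month of the data


def passes_screens(c: Candidate) -> bool:
    """Apply the sample screens described in the text to one candidate."""
    return (
        18 <= c.age <= 64
        and not c.has_bipolar_or_schizophrenia
        and not c.is_pregnant
        and c.prescriber_can_prescribe_all
        and c.first_rx_month > 6         # drop prescriptions in the first six months
    )


kept = passes_screens(Candidate(35, False, False, True, 9))
dropped = passes_screens(Candidate(35, False, False, True, 3))
```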

⁶ In addition, I exclude patients who suffer from depression but receive either another diagnosis or none at all. My sample thus reflects care for patients with more identifiable depressive symptoms.

⁷ Patients excluded due to comorbidities have a diagnosis with one of the following ICD-9-CM codes: bipolar and manic disorders (296.0, 296.1, 296.4-.8) and schizophrenic disorders (295.0-295.9).

The initial filters lead to a dataset of 102,780 unique patient episodes of depression care, comprising 267,390 observations on antidepressant prescriptions. The individuals are members of 307 insurance plans, each with a distinct set of required drug copayments. Plans also differ in their physician payment schemes, with some offering capitation contracts to primary care physicians. These contracts specify an annual lump sum payment per patient treated rather than reimbursing the physician for each office visit.

To define the end of a patient’s treatment episode, I look for a gap of 90 days within the treatment history where the patient neither fills a prescription nor visits his physician. If such a gap occurs, I end the episode at the last prescription observed before the 90 day wash-out period. If the individual appears again in the data after this period, I treat the new observations as composing an independent episode.⁸ Using this definition of an episode, 35% of patients do not fill a prescription in their episode; 22% of patients fill one; 13% fill two; 9% fill three; and 6% fill four. The remaining 15% have episodes involving between 5 and 30 filled prescriptions.
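The episode definition can be sketched as a pass over a patient's dated events. The sketch assumes that prescription fills and office visits are pooled into one list of dates and that a gap of 90 days or more starts a new episode; the function name and the example dates are hypothetical, and the sketch does not separately trim an episode to its last prescription.

```python
from datetime import date


def split_episodes(event_dates, gap_days=90):
    """Group a patient's event dates into episodes separated by long gaps."""
    episodes = []
    for d in sorted(event_dates):
        if episodes and (d - episodes[-1][-1]).days < gap_days:
            episodes[-1].append(d)    # continues the current episode
        else:
            episodes.append([d])      # first event, or a wash-out gap has passed
    return episodes


events = [date(2004, 1, 1), date(2004, 2, 1), date(2004, 3, 1), date(2004, 8, 1)]
episodes = split_episodes(events)
```

In this example the first three events form one episode and the August event, which follows a gap of more than 90 days, begins an independent one.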

In July 2003, physicians chose a treatment from a set of 17 unique products.⁹ The choice set evolved over the sample period. Citalopram, the generic form of Celexa, entered in November 2004 and a new SNRI, Cymbalta (duloxetine), entered in August 2004, providing 19 drug options and one non-drug outside option by the end of the sample period. I allow the patient to face the latest choice set at each decision point in his observed episode.

2.3 Variation for Model of Initial Choice

The model of the physician’s initial treatment choice conditions on the insurer’s cost, the patient’s copayment, drug efficacy, side effects, and dosing convenience. The MarketScan database contains the important price variables. Its “ingredient cost” variable reflects the insurer’s cost for a drug excluding rebates; the copayment reported is the plan-specific and drug-specific cost patients face when filling a prescription. In Figure 1, I plot by drug and year the average patient copayment per month supply, taking the mean across plans in the sample. The error bars illustrate the standard deviation of the prices across plans. Copayments vary along three dimensions. First, for most drugs, average copayments increase over the sample period. Second, across products in a given year, copayments vary considerably, most sharply when comparing branded drugs to generic drugs. The largest copayments tend to be for off-patent branded drugs with generic equivalents. Third, for a given product and year, the standard deviation in copayments across plans is large. Average plans charge patients $20-30 for a 30-day supply of a branded drug and about $5-10 for a 30-day supply of a generic drug. The most generous plans require copayments of $0-5 while the least generous require levels exceeding $50 for branded drugs. Given this important variation, I use plan-specific and year-specific prices in the later estimation. The insurer cost varies similarly, as reported in Table 1. Generics cost insurers as little as $5-8 per month supply, with an average near $40; patented drugs cost two to three times as much.

Figure 1: Patient copayments by product and year. Error bars show the standard deviation in copayments across insurance plans in the sample.

⁸ Roughly 8% of the unique patient identifiers repeat in a new episode over the course of my sample period. The initial choice in these repeat episodes contributes to the first stage model, and may bias the mean estimates if they represent a selected sample of drugs. In the data, however, the repeat episodes begin with a distribution of drugs similar to drugs chosen contemporaneously at the start of non-repeated episodes.

⁹ In defining the choice set, I include only drugs maintaining at least a 0.3% market share. This cutoff rule excludes extremely rare, older generation treatments, such as MAOIs (monoamine oxidase inhibitors). Their shares are simply too small to permit estimation and to collect price information across distinct plans.
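The copayment summary plotted in Figure 1 (a mean and a standard deviation across plans for each drug and year) can be sketched as follows. The record layout, drug labels, and copayment values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
from collections import defaultdict
from statistics import mean, stdev


def copay_summary(records):
    """records: iterable of (drug, year, plan_id, copay) tuples.

    Returns {(drug, year): (mean copay, standard deviation across plans)}.
    """
    by_cell = defaultdict(list)
    for drug, year, _plan_id, copay in records:
        by_cell[(drug, year)].append(copay)
    return {
        cell: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for cell, values in by_cell.items()
    }


# Made-up plan-level copayments for one branded and one generic product.
records = [
    ("brand_x", 2003, "plan_1", 20.0),
    ("brand_x", 2003, "plan_2", 30.0),
    ("generic_y", 2003, "plan_1", 5.0),
    ("generic_y", 2003, "plan_2", 10.0),
]
summary = copay_summary(records)
```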

Information on side effects and daily dosing comes from Gartlehner et al. (2007). The authors examine 2,099 relevant citations from the medical literature, collecting information on treatment effects and side effects from each study examined. I report the dosing requirements in Table 1. I illustrate the variation in side effects by drug compound in Figure 2. Side effects vary importantly across drug subclasses and, to a lesser degree, across drugs in a given subclass. Choice behavior both initially and in the learning model I build later will relate to this variation in side effects.

Table 1: Summary statistics on variables entering the estimation algorithm

Panel 1: Drug Characteristics

                                                       Monthly Insurer Cost ($)³   Monthly Drug Copayment ($)³             Market Share (%)⁴
Product Name       Subclass  Brand?  Daily dosing       2003      2004      2005      2003      2004      2005      2003      2004      2005
None               None      -       -                     -         -         -         -         -         -      35.2      39.3      34.2
Citalopram HBr     SSRI      No      1                     -      36.0      19.9         -       7.2       8.7         -       0.2       4.2
Celexa             SSRI      Yes     1                  72.9      74.0      75.3      22.6      31.7      35.4       4.1       2.7       0.1
Lexapro            SSRI      Yes     1                  63.2      63.3      68.9      22.7      24.8      28.8      13.8      13.0      12.0
Fluoxetine HCL     SSRI      No      1-2                30.9      17.7      17.0       9.9       6.9       7.4       7.3      11.5      10.6
Paroxetine HCL     SSRI      No      1                  68.7      49.3      37.0      11.0       8.3       8.8       1.7       4.3       4.4
Paxil CR           SSRI      Yes     1                  81.4      83.9      87.5      21.8      21.0      25.1       6.4       3.6       1.6
Zoloft             SSRI      Yes     1                  79.9      81.9      85.0      20.4      25.6      28.2      12.5      10.6       9.6
Cymbalta           SNRI      Yes     1-2                   -     103.8     111.8         -      27.8      34.6         -       0.5       2.4
Effexor-XR         SNRI      Yes     1                 109.5     116.0     118.7      21.6      22.4      26.1       8.6       7.5       7.0
Bupropion HCL      NDRI      No      3                  38.6      49.9      50.7       8.5       9.0       9.8       0.3       3.4       4.5
Wellbutrin XL      NDRI      Yes     1                 104.8     104.1     113.5      21.2      21.2      26.8       6.5       5.7       5.1
Amitriptyline HCL  TCA       No      1                   5.9       4.2       4.0       6.9       5.3       5.3       0.7       0.9       0.7
Mirtazapine        NaSSA     No      1                  60.0      42.4      31.9      11.2       8.8       8.0       0.5       0.7       0.6
Trazodone HCL      SM        No      3                   8.0       6.1       5.8       7.2       5.8       6.0       1.3       1.8       1.6
Total patient episodes:                                                                                           11,341    41,285    50,154

Panel 2: Individual covariates in product choice model

                                                                            Share of Data (%)
Patient diagnosed with major depressive disorder                                         27.2
Patient’s plan is a capitated Health Maintenance Organization (HMO)                      33.2
Patient’s plan is a non-capitated HMO                                                    13.8
Patient’s plan is a Preferred Provider Organization (PPO)                                53.0
Patient visits a general practitioner (internal medicine/family medicine)                46.7
Patient visits non-psychiatric specialist (obstetrician, etc)                            27.9
Patient visits psychiatrist                                                              25.5
Total # unique patients:                                                              102,780
Total # of observations:                                                              267,390

Notes:
1. Source - Thomson Reuters’ MarketScan database, years 2003-2005.
2. Sample collected for individuals newly diagnosed with depression in an outpatient office visit; the data include information on subsequent office visits and prescriptions dispensed over the patient’s episode. Pregnant women and individuals with psychiatric comorbidities are excluded from the sample, as are patients whose episodes begin within the first 6 months of data (to prevent left-censoring).
3. Values reflect the mean across the 307 plans in the data. Each plan sets distinct copayments and records different insurer costs by drug.
4. Shares calculated using only the initial prescription written for each patient episode. Selected antidepressants shown for illustration.

Finally, in Panel 2 of Table 1, I include statistics on the patient and physician attributes that enter the initial choice model. In the sample collected from the MarketScan database, 27% of patients suffer from major depression, the most severe diagnosis examined. Of those patients diagnosed with depression, 26% visit a psychiatrist, 47% visit general practitioners and the remaining share visit other specialists, such as obstetricians. For plan types, 33% of patients receive coverage under a capitated Health Maintenance Organization (HMO) plan, 14% have non-capitated HMO plans, and the remaining 53% have Preferred Provider Organization (PPO) or comprehensive plans with fewer restrictions on office visits and other aspects of care.

Figure 2: Rates of side effects reported during clinical trials. Percentage rates shown for each drug, for a selected group of subclasses.

2.4 Variation for Learning Model

Identification of the learning model parameters requires a sufficiently rich set of observed switches across products and subclasses with distinct characteristics. I summarize the switching patterns seen in the panel data in Table 2. Conditional on the subclass choice at (t − 1), the table lists the fraction of individuals who (1) persist, (2) switch to another compound within class, (3) switch to another inside class, or (4) exit treatment at period t.¹⁰ The statistics show that individuals tend to persist on treatment at a rate of 55-70%. The degree of persistence increases over the course of a patient’s episode and depends importantly on the subclass of the initial choice. Individuals persist at greater rates when beginning on an SSRI or SNRI, relative to NDRIs, TCAs, and NaSSAs. Of those who switch at t, 70-80% stop taking medications altogether, 15-20% switch to another subclass, and the balance switch to a new compound within the same subclass.

To illustrate the timing of switches, I include in matrix form in Table 3 the share of

patients who remain on the initial choice at different decision points within an episode.

Conditional on the level of adherence, about 10-15% of patients switch at the first deci-

sion point. Physicians appear to learn quickly about a patient’s match to a treatment.

3 Initial Condition

To initialize the learning process over the choice of drug regimen, I estimate a reduced

form model of behavior. In the model, the physician recommends a treatment to

the patient who then decides whether to fill the prescription. The utility underlying

the physician’s choice is a joint function of the patient and physician’s preferences,

as in Hellerstein (1998). I represent preferences over drugs as a function of patient

demographic data, the physician’s specialty, and drug characteristics.

I relate the first period choice to the important drug features in a linear specifica-

tion.11 The patient cares about an option’s: (1) copayment; (2) efficacy, summarized

10 Decision points, t, roughly correspond to one month of treatment. In cases where the physician writes a 90 day prescription, the decision point corresponds to three months. When a patient returns for follow-up before completing the initial prescription, the decision point corresponds to the observed number of days between office visits.

11 Unlike earlier work on drug choice, including Crawford and Shum (2005), I choose a utility form for the physician that assumes risk neutrality. I model risk aversion indirectly via compound-level indicators. These capture the physician’s concern about a drug’s tendency to cause severe side effects or prompt increased rates of suicide. I interpret the pace with which an individual switches treatments as an indication of the precision of the learning process rather than as risk aversion.


Table 2: Observed continuation decisions (switch vs. persist)

                                              Fraction of decisions (%)
Time of        Drug class chosen              Switch         Switch to       Switch to
decision (t)   at period (t-1)     Persist    within class   another class   outside option   Total Obs.
t=2            SSRI                68.9       3.5            2.6             25.0             19,663
               NDRI                64.7       3.0            7.8             24.5              3,974
               SNRI                73.8       1.5            4.5             20.1              4,166
               TCA                 57.9       0.2            12.7            29.2                497
               NaSSA               52.6       -              14.3            33.1                293
t=3            SSRI                74.3       2.5            2.6             20.6             13,289
               NDRI                69.6       2.3            5.9             22.2              2,668
               SNRI                77.2       0.7            3.8             18.3              3,081
               TCA                 57.3       -              14.6            28.1                335
               NaSSA               67.7       -              13.4            18.9                164

Notes:
1. Observed switches recorded for the 50,000 patient sample drawn from the MarketScan database for use in estimation. Roughly 35% of patients receive no drug care at their initial visit and so do not appear in the above switch statistics.
2. Selected antidepressant classes shown. The number of individual drugs modeled in each subclass are: SSRI, 8; NDRI, 2; SNRI, 2; TCA, 2; NaSSA, 1. Thus, no patients switch "within class" when starting on a NaSSA.

Table 3: Patients persisting on the initial drug choice (in %)

Prescription count     Length of treatment episode (# monthly prescriptions dispensed)
in episode                 1       2       3       4       5       6       7       8
1                      100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0
2                               85.1    86.8    88.9    89.4    89.6    90.4    91.4
3                                       84.4    84.2    84.7    85.8    86.4    88.2
4                                               82.3    82.0    83.0    83.9    86.4
5                                                       80.2    81.0    82.2    84.4
6                                                               79.6    80.1    82.6
7                                                                       79.3    80.7
8                                                                               79.9
Share of dataset       56.7%   12.9%    9.3%    6.4%    4.4%    3.1%    2.1%    1.7%

Notes:
1. Source - Medstat Marketscan Commercial Claims and Encounters Database, years 2003-2005.
2. Sample contains 102,720 unique patients, with 267,390 observed prescriptions or office visits across patient episodes.


by drug subclass indicators; (3) side effect profile; (4) daily dosing frequency; and (5)

branded status. To simplify computation, I rely on a discretized version of the nausea

variable to proxy for the overall side effect profile of a drug.12

Products in this market are differentiated both horizontally and vertically. With

the same out-of-pocket cost, patients would prefer products with lower side effects and

more convenient dosing, albeit to different degrees. However, if these attributes and the

patient copayment were set equal across drug options, physicians would still prescribe

a diffuse set of drugs because the treatment effect of each drug differs by patient. I

account for these horizontal and vertical components by introducing the copayment

level, side effects, dosing frequency, as well as drug category indicators in $X^{r,\text{Patient}}_{ijt}$.

These variables have random coefficients, allowing patients to differ in their sensitivity

to each.

Two additional characteristics influence the expected outcome under a treatment.

First, the likelihood of recovery under each of the drug subclasses may depend on

the patient’s specific diagnosis. To allow expected outcomes to differ by diagnosis, I

interact an indicator for the severity of the patient’s diagnosis with the drug category

indicators. Second, I allow patients to prefer branded treatments, all else equal, by

creating an indicator variable for branded products. These additional variables enter

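$X^{f,\text{Patient}}_{ij}$ with coefficients fixed across individuals:

$$U^{\text{Patient}}_{ijt} = a_i X^{r,\text{Patient}}_{ijt} + b X^{f,\text{Patient}}_{ij} + \varepsilon^{\text{Patient}}_{ijt} \qquad (1)$$

Here, $\varepsilon^{\text{Patient}}_{ijt}$ accounts for a patient’s unobserved utility for choice j at time t. This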

might reflect, for example, the patient’s exposure to advertising for drug j at time t.

The physician incorporates the patient’s utility when making her recommendation.

However, she may have distinct preferences. I explicitly allow for both observable and

unobservable physician-specific components in the utility function. First, I include

indicators for each drug compound in the specification.13 These indicators together

capture the physician’s unobserved private benefits to prescribing drug j. For example,

if the manufacturer of drug j routinely sends detailing agents to the physician, she may

respond by prescribing the drug more often. The estimation will then return a positive

coefficient on the j-specific constant.

12 In interviews I conducted, physicians report switching drugs most often following complaints of nausea.

13 Because there are discrete product characteristics in the model, I include only those drug indicators the data can identify. Table 4 includes results from a specification without drug-level indicators.


Second, I allow the insurer’s costs to enter the specification. Insurers may pay

physicians using lump-sum capitated payments or may design bonus schemes that pay

out with a reduction in the total cost of the prescriptions a physician writes. I include

the insurer’s cost in the utility model with a random coefficient. I also interact a

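capitation indicator with this cost. $X^{r,\text{Physician}}_{ijt}$ represents the insurer’s cost. All other
effects enter $X^{f,\text{Physician}}_{ijt}$ with coefficients common across individuals. Finally, $\varepsilon^{\text{Physician}}_{ijt}$
captures time-varying unobservables, such as whether the physician has manufacturer
samples on hand at period t. The joint utility, incorporating the patient’s utility in (1),
takes the following form:

$$
\begin{aligned}
U^{\text{Physician}}_{ijt} &= \gamma U^{\text{Patient}}_{ijt} + c_i X^{r,\text{Physician}}_{ijt} + d X^{f,\text{Physician}}_{ijt} + \varepsilon^{\text{Physician}}_{ijt} && (2) \\
&= (\gamma a_i, c_i) X^{r}_{ijt} + (\gamma b, d) X^{f}_{ijt} + \gamma \varepsilon^{\text{Patient}}_{ijt} + \varepsilon^{\text{Physician}}_{ijt} && (3) \\
&= \beta_i' X^{r}_{ijt} + \alpha' X^{f}_{ijt} + \varepsilon_{ijt} && (4)
\end{aligned}
$$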

I cannot separately identify the relative weight of the patient’s preferences, denoted

γ, in the physician’s recommendation. I focus instead on estimating the parameters of

the joint utility function, (βi, α), with a single idiosyncratic term, εijt, that combines

patient and physician-specific unobservables. I allow the sensitivity to price, side effects,

subclass indicators, and dosing frequency to be specific to the physician-patient pair.14

I use a Bayesian Markov Chain Monte Carlo (MCMC) procedure for estimation.15
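To make the mixed logit structure in (4) concrete, the sketch below simulates choice probabilities by averaging standard logit probabilities over draws of the random coefficients. It is an illustration only: it assumes independent normal random coefficients and a Type I extreme value idiosyncratic term, and the function name, the attribute values, and the parameter values are hypothetical rather than taken from the estimates. The estimation itself uses MCMC, not this simulator.

```python
import numpy as np

def mixed_logit_probs(X_r, X_f, beta_mean, beta_sd, alpha, n_draws=2000, seed=0):
    """Simulated choice probabilities for one physician-patient pair.

    X_r: (J, K_r) attributes with random coefficients
    X_f: (J, K_f) attributes with fixed coefficients
    beta_mean, beta_sd: (K_r,) mean and std. dev. of the random coefficients
    alpha: (K_f,) fixed coefficients
    """
    rng = np.random.default_rng(seed)
    # Draw beta_i ~ N(beta_mean, diag(beta_sd^2)), one row per simulation draw.
    betas = beta_mean + beta_sd * rng.standard_normal((n_draws, len(beta_mean)))
    # Deterministic utility for every draw and every drug: shape (n_draws, J).
    v = betas @ X_r.T + X_f @ alpha
    # Logit probabilities per draw (subtract the row max for numerical stability).
    v = v - v.max(axis=1, keepdims=True)
    expv = np.exp(v)
    probs = expv / expv.sum(axis=1, keepdims=True)
    # Mixing step: average the logit probabilities over the draws of beta_i.
    return probs.mean(axis=0)

# Hypothetical example: 3 drugs, random coefficients on (log copay, nausea),
# one fixed coefficient on a branded indicator.
X_r = np.array([[np.log(10.0), 1.0],
                [np.log(25.0), 0.0],
                [np.log(40.0), 0.0]])
X_f = np.array([[0.0], [1.0], [1.0]])
p = mixed_logit_probs(X_r, X_f,
                      beta_mean=np.array([-0.5, -1.0]),
                      beta_sd=np.array([0.6, 0.5]),
                      alpha=np.array([0.3]))
print(p.round(3))
```

Because each draw of the coefficients shifts the utilities of all drugs that share an attribute, the averaged probabilities do not satisfy the Independence of Irrelevant Alternatives property of a simple logit.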

Four key findings emerge from the parameter estimates, reported in Table 4. First,

physicians respond importantly to patient price: the coefficient on log copayment equals

-.52 and is highly significant. The estimated standard error of the random coefficient is

.36, suggesting that 82% of individuals respond negatively to price increases. Physicians

may respond to copayment levels due to requests from patients or after receiving so-

called “blue letters”–warnings from an insurer when a physician prescribes expensive

medications too often. The implied elasticities fall near the center of the range of -

.8 to -.1 identified in a meta-regression analysis by Gemmill et al. (2007). Demand

for off-patent branded drugs and older generation treatments, like TCAs, prove most

elastic.

14 Mixing over the distribution of sensitivity parameters produces correlations in the unobserved portion of utility for products with similar observed characteristics. This breaks the Independence of Irrelevant Alternatives property of a simple logit model, which is undesirable in this setting (McFadden and Train (2000)).

15 To reduce computation time, I estimate the model using a random sample of 50,000 patients drawn from the raw data, retaining the entire time series for the selected patients.


Table 4: Mixed logit estimates based on first choice data

[Table 4 layout; the numeric entries are not recoverable from the extracted text.]

Columns: Est., Std. Err, and T-stat for each of specifications (1), (2), and (3).

Mean, random coefficients: 1{TCA}; 1{NDRI}; 1{SSRI}; 1{SNRI}; 1{NaSSA}; 1{SM}; 1{Dosing (1 pill per day)}; nausea reports; log(insurer cost); log(patient copayment).
Variance, random coefficients: log(insurer cost); log(patient copayment).
Fixed coefficients: 1{Branded}; log(insurer cost) * 1{Capitated}.
Specification rows: Drug-specific fixed effects included?; Interactions of specialty with subclass?; Interactions of diagnosis with subclass?

Notes:
1. Source - Thomson Reuters’ MarketScan database, years 2003-2005.
2. Estimation based on a random sample of 50,000 patients from the full dataset. Coefficients of the hierarchical model estimated using a Markov Chain Monte Carlo procedure.


Table 5: Comparison of estimated drug fixed effects with advertising expenditures

[Table 5 layout; the nonzero numeric entries are not recoverable from the extracted text.]

Columns: Subclass; Compound; Fixed Effect; Std. Error; Average monthly detailing ($ 000s), reported for Yr 2003 and as an Average, 2001-2003; Average monthly volume of samples provided to physicians (000s), reported for Yr 2003 and as an Average, 2001-2003.

Rows: SSRI, Escitalopram (Lexapro); SSRI, Sertraline HCl (Zoloft); SSRI, Fluoxetine HCl (Prozac); SSRI, Citalopram HBr (Celexa); TCA, Nortriptyline HCl (Pamelor); SNRI, Duloxetine HCl (Cymbalta). The detailing and sampling entries for Pamelor and Cymbalta are all 0.

Notes:
1. Detailing expenditure reflects professional promotion recorded in Verispan’s Personal Selling Audit/Hospital Personal Selling Audit.
2. Sampling volume reflects the package volume of a product provided as samples to office-based physicians through various delivery methods (IMS Health).
3. Fixed effects estimated in first stage mixed logit model, specification (1).
4. Cymbalta entered the antidepressant market in 2003, after the period of the advertising data available for analysis.

Second, the coefficients on subclass indicators show that physicians have preferences

over classes, despite little evidence from clinical trials that second generation antide-

pressants differ in efficacy. Holding other observables constant, physicians on average

prefer SSRIs, SNRIs, and serotonin modulators. The mean effects will feed into the

dynamic model and influence the design of a new treatment protocol.

Third, paying physicians by capitation can influence decision making. Specification

(2) includes an interaction that allows capitation to change the physician’s sensitivity

to insurer costs. The coefficient of this interaction is negative and highly significant,

demonstrating that physicians paid by capitation are more sensitive to the total cost

of a drug. There are multiple hypotheses to explain precisely how capitation influences

decisions. Dickstein (2011) tests these hypotheses using both patient and physician-

level data.

Fourth, the drug-specific fixed effects in the model suggest a possible role for mar-

keting in the physician’s choice. In Table 5, I compare the estimated fixed effects with

promotional data from the pre-sample years of 2001 to 2003.16 The data include both

manufacturer expenditures on detailing as well as the volume of samples provided to

clinicians for each drug. The raw correlation between drug effects and various measures

16 Personal Selling Audit/Hospital Personal Selling Audit. Verispan, Inc., 2003. Also, Integrated Promotional Services, IMS Health, 2003.


of advertising lies between .73 and .88. I exploit the correlation between advertising

and the fixed effects in later counterfactuals.

4 Learning Model

Patients react idiosyncratically to drug treatments, requiring the physician to adjust

her recommendation after the patient reports his experience. The divide between the

physician-agent and the patient complicates this dynamic choice: following a poor

reaction to a drug’s side effects or if the drug provides no relief of symptoms, the

patient may choose to quit treatment entirely. To ensure the best long-term outcome,

the physician must therefore recommend a sequence of treatments that balances the

expected relief from symptoms with a level of side effects or costs that will not cause

the patient to quit treatment prematurely.

I feature this trade-off in the joint patient and physician utility function by including

drug efficacy, price, side effects, and dosing convenience in the specification. I describe

below the Bayesian updating process physicians and patients use and the decision rule

they employ after updating.17

4.1 Utility and Outcomes

I reframe the joint patient and physician utility in (4) to separate the drug features that

patients and physicians learn about over time from those that are known ex ante. The

utility for individual i under treatment j in period t will be a function of: (1) known

elements, fijt; (2) a measure of the unknown effectiveness of drug j for patient i, vij;

and (3) a noise component, Wijt :

$$U_{ijt} = f_{ijt} + v_{ij} + W_{ijt} \qquad (5)$$

The variables in fijt include patient copayments, insurer prices, the branded status

of j, and drug-specific effects. At time t, both the agent and the econometrician know

their values; physicians do not learn about these attributes from experience. I include

an idiosyncratic logit shock, εijt, in fijt to capture time-varying unobservables, like

17 The patient can remain in treatment or exit care but cannot switch to another physician during a treatment episode. As a result, changes in medication reflect learning within one physician-patient pair. I eliminate episodes from the sample that involve switching from a general practitioner to a specialist; this occurs in less than 1% of episodes.


the availability of manufacturer samples. The econometrician does not observe this

element, though the physician and patient do. I apply the mean coefficients estimated

in the initial stage model to form fijt, implying that prior preferences are well-captured

using the cross section of initial choices.

The second component, vij, reflects the patient’s match to drug j in terms of health.

In interviews I conducted, physicians report switching treatments when the initial drug

failed to improve the patient’s symptoms, when it led to intolerable side effects, or when

it proved inconvenient to take at the required daily intervals. I define vij as a linear

function of those drug attributes, zj, that cause physicians to reevaluate a patient’s

treatment regimen: (1) drug subclass variables to proxy for efficacy; (2) a discretized

nausea side effect variable built from the clinical trials literature, and (3) a dosing

frequency indicator.

The characteristics of drug j, zj, are known to the physician and the econometrician.

However, neither knows ex ante how the patient will react to these characteristics. The

coefficients on the zj are thus individual-specific. Physicians and patients form normal

priors on this coefficient vector, βi, to update with patient i’s experience:

$$v_{ij} = z_j \beta_i, \quad \text{where } \beta_i \sim N(\beta_{i,0}, V_0) \qquad (6)$$

Here, I set the mean of the distribution of βi equal to the estimates from the initial

condition. The elements of βi are independent such that V0 is a diagonal matrix.

Finally, Wijt is an unknown shock to outcomes that is independent of the components
in fijt and vij and may vary by (i, j, t). It follows a normal distribution with
variance $\sigma_W^2$ and a mean set equal to zero. It reflects individual or drug-specific
attributes affecting the patient’s health outcome that are not captured by the included

characteristics. For example, if a patient loses his job or has a family crisis in period

t, the physician cannot easily determine whether the patient’s outcome relates to the

drug he tried or to external factors.18

The observed outcome depends on both the drug’s effectiveness and environmental
features:

$$y_{ijt} = v_{ij} + W_{ijt} = U_{ijt} - f_{ijt}$$

18 Randomness in outcomes may also relate to the individual’s temperament. Physicians may have difficulty identifying the effectiveness of the treatment when patients report a range of symptoms or have multiple comorbidities. To capture this, one could extend this model to allow the variance to differ by blocks according to the Charlson comorbidity index, a measure of a patient’s overall health (see Charlson et al. (1987)).


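Conditional on βi, yijt follows a normal distribution with variance $\sigma_W^2$. I collect
the outcomes from the T independent realizations, t = 1, ..., T:

$$
\begin{pmatrix} y_{i,1} \\ \vdots \\ y_{i,T} \end{pmatrix} \Bigg|\; z(d_{i,1}), \ldots, z(d_{i,T}), \beta_i \;\sim\; N\!\left( \begin{pmatrix} z(d_{i,1}) \\ \vdots \\ z(d_{i,T}) \end{pmatrix} \beta_i,\; \sigma_W^2 I_T \right) \qquad (7)
$$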

Here, di,t is the identity of the individual’s choice at t and z(di,t) is the observable

vector of characteristics of that choice. The distribution is not degenerate because of

the unobservable, Wijt. In the structural model of the agent’s choice, the decision at

time t depends only on yi,s from prior periods s = 1, ..., t− 1.

4.2 Bayesian Updating

The physician’s goal is to learn about the patient’s sensitivities to a set of drug char-

acteristics. By specifying the patient’s preferences as a function of drug attributes, I

permit spillovers in learning. The physician and patient learn about the patient’s match

to all drugs that share at least one characteristic with the drug sampled.

The physician and patient update their beliefs about the coefficients, βi, after observ-

ing realized outcomes under previous drug recommendations. Formally, using Bayes’

rule, the posterior for βi at time (T + 1) is the product of two components: a likelihood

on the T outcome realizations to date, Yi,T = (yi1, ..., yiT )′, and a prior distribution on

βi:

P (βi|Yi,T , Z) = L(Yi,T |βi, Z)g(βi|βi,0, V0)

where Z is a TxK matrix holding the covariates, zj, in row t if the physician chose drug

j at t. I repeat expressions (6) and (7), in which I specify a normal prior on the K-

dimensional βi and independent normal distributions for yi,t at t = 1, ..., T , conditional

on (Z, βi):

$$\beta_i \sim N(\beta_{i,0}, V_0)$$

$$Y_{i,T} \mid Z, \beta_i \sim N(Z\beta_i, \sigma_W^2 I_T)$$

Given the normal likelihood and prior, the posterior is normal with parameters


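$(\beta^*_{i,T}, V^*_{i,T})$ after updating from T periods of experience:

$$
\begin{aligned}
\beta_i \mid Z, Y_{i,T} &\sim N(\beta^*_{i,T}, V^*_{i,T}) \\
V^*_{i,T} &= \left(V_0^{-1} + Z'Z(\sigma_W^2)^{-1}\right)^{-1} \\
\beta^*_{i,T} &= V^*_{i,T}\left[V_0^{-1}\beta_{i,0} + Z'Z(\sigma_W^2)^{-1}\hat{\beta}_i\right], \quad \text{where } \hat{\beta}_i = (Z'Z)^{-1}Z'Y_{i,T}
\end{aligned}
$$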

The updating process is akin to a Bayesian analysis of the classical regression model,

here a normal linear model (See Gelman et al. (2004) or Zellner (1971)). This appears

clearly from the manner in which Yi,T enters the posterior mean: the predicted $\hat{\beta}_i$ uses
the standard formula for a regression coefficient. The updated variance, $V^*_{i,T}$, combines
two sources of variation. The first, $V_0^{-1}$, reflects uncertainty about the true value of βi;
the second term is due to fundamental variability from the model.
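The updating step, and the spillovers in learning it produces, can be sketched in a few lines. The example below is a minimal illustration of the posterior formulas above under assumed values: the three drugs, their characteristic vectors, the prior, and the noise variance are all hypothetical. It uses the identity Z′Z times the least squares coefficient equals Z′Y, so the least squares coefficient never has to be formed explicitly; this matters because Z′Z is typically singular when only a few drugs have been tried.

```python
import numpy as np

def update_beliefs(beta0, V0, Z, y, sigma2_w):
    """Posterior mean and covariance of beta_i after T observed outcomes.

    beta0: (K,) prior mean; V0: (K, K) prior covariance
    Z: (T, K) characteristics of the drugs tried; y: (T,) observed outcomes
    sigma2_w: variance of the outcome noise W
    """
    V0_inv = np.linalg.inv(V0)
    # Posterior precision = prior precision + data precision.
    V_post = np.linalg.inv(V0_inv + Z.T @ Z / sigma2_w)
    # Z'Z * beta_hat = Z'y, so the data term can be written with Z'y directly.
    beta_post = V_post @ (V0_inv @ beta0 + Z.T @ y / sigma2_w)
    return beta_post, V_post

# Hypothetical characteristics: [SSRI indicator, nausea indicator, once-daily dosing].
z_A = np.array([1.0, 1.0, 1.0])
z_B = np.array([1.0, 0.0, 1.0])   # shares two characteristics with A
z_C = np.array([0.0, 1.0, 0.0])   # shares one characteristic with A

beta0 = np.zeros(3)               # assumed prior mean
V0 = np.eye(3)                    # assumed diagonal prior covariance
Z = z_A[None, :]                  # the patient has tried drug A once
y = np.array([-2.0])              # and reported a poor outcome

beta_post, V_post = update_beliefs(beta0, V0, Z, y, sigma2_w=1.0)
for name, z in [("A", z_A), ("B", z_B), ("C", z_C)]:
    print(name, round(float(z @ beta_post), 3))
```

In this toy case the expected match value falls to -1.5 for A, -1.0 for B, and -0.5 for C, even though only A was sampled: the more characteristics an untried drug shares with the drug sampled, the more the physician revises her belief about it.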

The values of the posterior parameters over time depend on the rows of Z and on
the size of $\sigma_W^2$. The larger $\sigma_W^2$ is, the less information the agent gains from experience,
since a large $\sigma_W^2$ implies noisy outcomes. In the expressions for $(V^*_{i,T}, \beta^*_{i,T})$, the terms
involving $(\sigma_W^2)^{-1}$ then go to zero, and the posteriors $(\beta^*_{i,T}, V^*_{i,T})$ approximately equal their
priors, $(\beta_{i,0}, V_0)$.
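The limiting behavior can be written out directly from the posterior expressions in this section. The first two lines restate the noisy-outcome limit; the last line is the opposite limit, which I add as a check and which assumes that Z′Z is invertible, a condition that need not hold early in an episode.

```latex
\begin{align*}
\lim_{\sigma_W^2 \to \infty} V^*_{i,T}
  &= \left(V_0^{-1} + 0\right)^{-1} = V_0 \\
\lim_{\sigma_W^2 \to \infty} \beta^*_{i,T}
  &= V_0 \left[ V_0^{-1} \beta_{i,0} + 0 \right] = \beta_{i,0} \\
\lim_{\sigma_W^2 \to 0} \beta^*_{i,T}
  &= \lim_{\sigma_W^2 \to 0}
     \left(\sigma_W^2 V_0^{-1} + Z'Z\right)^{-1}
     \left(\sigma_W^2 V_0^{-1} \beta_{i,0} + Z'Y_{i,T}\right)
   = (Z'Z)^{-1} Z'Y_{i,T}
\end{align*}
```

With very precise outcomes the prior is discarded and the posterior mean coincides with the least squares coefficient; with very noisy outcomes experience is uninformative and beliefs stay at the prior.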

4.3 Physician’s Recommendation

After updating her beliefs on βi conditional on outcome realizations, Yi,T , from periods

t = 1, ..., T , the physician must recommend an option in period T + 1. After processing

the information from past experience, she can follow one of two approaches in her next

recommendation. First, the physician may learn from experience but fail to consider the

future implications of her decision. I label this behavior ‘Bayesian-myopic’, following

the terminology of Brezzi and Lai (2002). The Bayesian-myopic physician maximizes

expected utility, given past experience:

$$\max_{j \in \{1,\ldots,J\}} E(U_{i,j,T+1} \mid \beta_{i,0}, Z, Y_{i,T}) \qquad (8)$$

She updates her priors at T + 1 using outcomes, Yi,T , from t = 1 up to T .
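A minimal sketch of the Bayesian-myopic rule in (8) follows. It assumes the posterior mean of βi has already been computed and treats the known component of utility as a given vector; the function name and all numerical values are hypothetical.

```python
import numpy as np

def myopic_choice(f, Z_all, beta_post):
    """Pick the drug with the highest posterior expected utility.

    f: (J,) known utility components for each drug at T+1
    Z_all: (J, K) characteristics of every drug in the choice set
    beta_post: (K,) posterior mean of beta_i after T periods
    """
    expected_utility = f + Z_all @ beta_post
    return int(np.argmax(expected_utility)), expected_utility

# Hypothetical choice set of three drugs with three characteristics each.
Z_all = np.array([[1.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
f = np.array([0.2, 0.0, 0.1])             # assumed known components
beta_post = np.array([-0.5, -0.5, -0.5])  # assumed posterior mean

choice, eu = myopic_choice(f, Z_all, beta_post)
print(choice, eu)
```

The rule uses only the posterior mean; the posterior variance plays no role, which is why it cannot rationalize switches toward options with low expected outcomes without very diffuse beliefs.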

In the data, physicians occasionally switch a patient from the first drug chosen to

one with a much lower expected outcome. To rationalize this pattern with the Bayesian-

myopic model requires very large variances on the estimated coefficients, βi. That is,


the patient’s outcome would have to be quite extreme for the physician to update her

priors in a way that raises the expected outcome of a seemingly weak option above all

others in the choice set.

I rationalize the observed patterns using a second approach. I allow the physician to

look forward, accounting for the patient’s outcome over his entire episode of care. Under

this assumption, the physician may experiment with lesser-known options to identify

those that produce a superior outcome for her patient. Below, I define a decision rule

that solves the physician’s dynamic discrete choice problem. I compare the fit of this

forward-looking decision rule against estimates from a Bayesian-myopic model.

4.3.1 Dynamic Programming Solutions to the Drug Choice Problem

Given the set of choices defined in characteristic space, j = 1, ..., J , the physician and

patient choose a sequence of drugs to maximize the discounted sum of returns:

$$\int \cdots \int E_{\beta_1,\ldots,\beta_k}\left(\sum_{t=1}^{\infty} \delta^{t-1} Y_t\right) d\Pi^{(1)}(\beta_1) \cdots d\Pi^{(k)}(\beta_k) \qquad (9)$$

Here, δ is given and βk are the unknown coefficients in the linear parameterization for

the mean outcome of Yj. The agent forms independent priors, Π, on the elements of β.

The observed state variables include the mean and variances of the coefficient vector,

β.

Most empirical work on dynamic discrete choice problems directly applies dynamic

programming to solve the sequence problem in (9), using the methodology Rust (1987)

or Hotz and Miller (1993) propose. This approach requires conditional independence:

the observed state variables are independent of the unobservables in the problem, con-

ditional on the past choice and past observed state. The unobserved state variables

themselves have independent and identical distributions. In this set-up, the observed

state vector is a sufficient statistic for the current choice. The computational burden

depends only on the dimension of the observed state variables.

For the drug choice setting and for similar choice problems involving learning, the

conditional independence assumption is unlikely to hold. First, there is an unobserved

level of effectiveness of product j for patient i that is persistent over time. The observed

state variables are then no longer a sufficient statistic for the contemporaneous choice;

lagged choices contain information on this permanent unobserved component of utility.

Second, the transitory unobserved shocks to utility may be correlated across choices.


The integrated value function in Rust (1987) then no longer has a convenient closed

form.

Keane and Wolpin (1994) adjust the dynamic programming framework to accommo-

date these features. Their approach assumes a finite mixture over the persistent unob-

served state variable and uses simulation to compute the multidimensional integrals in

the value function. Crawford and Shum (2005) apply this approach in related work on

dynamic drug choice. Unfortunately, with larger choice sets, the dynamic programming

solution Keane and Wolpin (1994) provide faces the classic “curse of dimensionality.”

In the antidepressant setting with twenty distinct choices, the method is impossible to

compute. Even when I parameterize choices as a linear function of a smaller set of

k characteristics, the simulation needed to compute k-dimensional integrals for each

patient-physician pair remains costly.

The backward induction process underlying the dynamic programming approach

drives the computation time. I exploit features of the drug choice setting that allow

me to apply solutions that follow forward induction. Under forward induction, I can

transform the multidimensional integrals in the dynamic problem into a series of one-

dimensional integrals, easing the computational burden. I describe this approach below.

4.3.2 An Index Rule for Dynamic Drug Choice

A forward induction rule requires two maximizations. In an inner maximization, the

agent considers each choice as if it were optimal and determines a stopping time at

which she would prefer to retire rather than continue taking the drug. The agent solves

a one-dimensional optimal stopping problem of this type for each choice. The sum of

returns under each choice, were it taken for the optimal length of time, is the “index”

of the choice. In an outer maximization step, the agent selects the option with the

largest index. This procedure repeats in each period after the agent updates her priors

from past outcomes. I provide a formal description of this procedure in Section 9.1 of

the technical appendix.

Gittins and Jones (1979) prove that the forward induction rule solves the sequence

problem in (9) if the following conditions hold: (1) the decision-maker selects only one

option at t; (2) the options not selected do not contribute to the individual’s outcome;

(3) the options not chosen will produce the same average outcome in later periods as


they would in the initial period; and, (4) the options are independent.19

The setting of antidepressant choice satisfies conditions (1) and (2): physicians
recommend only one drug per period and patients do not experience side effects or health

benefits from drugs they do not take. Condition (3) also holds in this setting because

the effectiveness of a drug is not path dependent. When a patient begins on drug A,

all the alternatives, call one B, will remain available with the same effectiveness. That

is, prescribing drug A, followed by B, would be equivalent to choosing B then A, up to

a discount rate.20

The linear structure I place on outcomes under each drug allows me to apply a

decision rule for independent options, under Condition (4). A ‘choice’ is a unique

bundle of k attributes contained in the vector of characteristics, z. Agents learn only

about the independent coefficients, β, that serve as weights on the covariates in z.

Rusmevichientong and Tsitsiklis (2010) connect this linearly-parameterized problem to

the standard independent choice problem.21 When the set of unique combinations of

covariates is near infinite, the authors provide a distinct decision rule and show it has

desirable properties. When the space of covariates has a limited number of extreme

points, a standard index rule succeeds because the correlation across drugs enters the

updating process alone; the inner maximization step in the forward induction rule

remains simply a comparison among unique bundles of covariates.

To apply a forward induction rule in my setting, I must compute the one-dimensional

optimal stopping problem for each unique combination of covariates and at each period

in the patient’s episode. I employ a closed form approximation to the optimal stopping

problem provided by Chang and Lai (1987). This numerical approximation is asymp-

totically optimal as the discount rate, δ, approaches 1. I provide more detail on this

19 See Mahajan and Teneketzis (2007). Whittle (1981) proves that Gittins’ index rule is also optimal in settings in which new products enter the choice set, provided the entry is independent of the decision maker’s past choices.

20 This would not be true if the initial drug changed the individual’s physiology such that it foreclosed use of a second treatment. In the antidepressant example, ignoring minor discontinuation periods, physicians can prescribe drugs in any order without changing their treatment effects.

21 I replicate the argument in Section 4.1 of Rusmevichientong and Tsitsiklis (2010). If the set of choices, Zk, is a polyhedral set, then Zk has a finite set of extreme points. I denote this set E(Zk). By a standard linear programming result, for all $\beta \in \mathbb{R}^k$:

$$\max_{z \in Z_k} z'\beta = \max_{z \in E(Z_k)} z'\beta, \quad \text{where } \beta \in \mathbb{R}^k$$

We can treat each z ∈ E(Zk) as a unique choice and proceed using algorithms appropriate for settings with independent options. This is feasible only when the set, Zk, has a small number of extreme points.

23

Page 25: E cient Provision of Experience Goods: Evidence from

final computational step in Section 9.2 of the Technical Appendix.

The patient and physician maximize the following index calculated for each choice:

$$\max_{j \in \{1,\dots,J\}} \; E(U_{i,j,T+1} \mid \beta_{i,0}, Z, Y_{i,T}) + \sigma_{i,j,T+1} \, h(\delta)$$

where E(Ui,j,T+1|.) is option j’s expected utility at (T + 1) and σi,j,T+1 is the posterior

standard deviation of j’s outcome. I update the coefficients in the normal linear model

for Ui,j,t via the least squares approach described in Section 4.2. In contrast to the

Bayesian-myopic rule in (8), the index rule accounts for the variance in outcomes under

each choice. This rule matches the intuition described earlier: the physician balances

her impulse to choose the drug with the highest expected utility against the incentive

to experiment with lesser-known but potentially superior treatments.22

The function, h(δ), represents a numerical approximation to the boundary of the

one-dimensional optimal stopping problem for each drug. It declines when physicians

discount the future more heavily. When δ is small or when past experience diminishes

σi,j,T+1, the second term shrinks and the decision rule closely resembles the Bayesian-

myopic rule.
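The contrast between the index rule and the Bayesian-myopic rule can be illustrated with a minimal sketch. It assumes the posterior mean and standard deviation of each option are already available and treats h(δ) as a supplied number rather than computing the Chang and Lai (1987) approximation; the function names and the numerical inputs are hypothetical.

```python
def index_value(mean, sd, h_delta):
    """Index for one option: expected utility plus an exploration bonus."""
    return mean + sd * h_delta


def choose(means, sds, h_delta):
    """Return the position of the option with the largest index."""
    scores = [index_value(m, s, h_delta) for m, s in zip(means, sds)]
    return max(range(len(scores)), key=lambda j: scores[j])


# Hypothetical inputs: option 0 is well understood, option 1 is uncertain.
means = [1.0, 0.8]
sds = [0.1, 1.0]

# With h(delta) = 0 the rule collapses to picking the highest expected utility.
myopic_pick = choose(means, sds, h_delta=0.0)

# With a positive h(delta) the uncertain option earns a larger bonus.
forward_pick = choose(means, sds, h_delta=0.5)

print(myopic_pick, forward_pick)
```

In this example the second option has the lower mean but the higher index once the bonus is positive, which is the experimentation incentive described above.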

5 Algorithm

5.1 Forming Choice Probabilities

I combine information from the initial condition and the learning model to form choice

probabilities. The probabilities begin with the index rule described above. Since the

rule balances the expected mean utility of a choice with its posterior variance, it depends

directly on fijt, the known component of mean utility; on (βi,0, V0), the priors in the

effectiveness component; and on random outcome realizations from periods 1 through

(t − 1), collected in the vector, Yi,t−1. Here, fijt includes an additive extreme value error. The probabilities therefore take a logit form:

$$h_{ijt}(\beta_{i,0}, V_0, Y_{i,t-1}) = \frac{\exp(G_{ijt}(\beta_{i,0}, V_0, Y_{i,t-1}))}{1 + \sum_{k} \exp(G_{ikt}(\beta_{i,0}, V_0, Y_{i,t-1}))}$$

where Gijt is the index rule.23 However, I cannot use this directly in estimation, since, as the econometrician, I do not observe either the vector Yi,t−1 or βi,0. I therefore integrate over their distributions. Yi,t−1 follows the multivariate normal distribution in expression (7). I integrate the probability of the observed sequence of choices up to and including period t with respect to past outcome realizations from periods 1 through t − 1:

$$\int_{Y_{i,t-1}} h_{it}(\beta_{i,0}, V_0, Y_{i,t-1}) \cdots h_{i2}(\beta_{i,0}, V_0, Y_{i,1}) \, f(Y_{i,t-1} \mid \beta_{i,0}, \sigma_W^2) \, dY$$

22 The statistics literature labels decision rules of this form “upper confidence bound” designs (Ginebra and Clayton (1995)). The approach accounts for the physician’s incentive to select a drug from a subclass with a large number of options available. She does so to learn quickly about the quality of the entire group of products. The prior mean of β for such subclasses will be high both when the expected quality is high and when there is potential to learn quickly about many options in the choice set.

βi,0 should come from the initial stage mixed logit model. However, again I do not

have a direct estimate of it. Since the initial stage relies only on a single observed

choice per individual, I can consistently estimate the population parameters, (b,W ), of

the distribution of βi,0 but not βi,0 itself. Instead, I use Bayes’ rule to transform the

population parameters from the initial stage into draws from a distribution of β that

conditions on di1, the observed drug choice in the initial period:

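$$k(\beta \mid d_{i1}, X_i, b, W, \alpha) = \frac{P(d_{i1} \mid X_i, \beta, \alpha)\, g(\beta \mid b, W)}{\int P(d_{i1} \mid X_i, \beta, \alpha)\, g(\beta \mid b, W)\, d\beta}$$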

The conditional posterior reflects the distribution of β for the subset of agents who

would choose di1 when Xi defines the choice situation.24 This posterior does not have a

closed form. I collect R simulation draws from it via a Metropolis-Hastings algorithm.25

Thus, the choice probabilities I use in the likelihood function involve simulation:

$$P_i = \sum_{r=1}^{R} \left[ \int_{Y_{i,t-1}} h_{it}(\beta^r_{i,0}, V_0, Y_{i,t-1}) \cdots h_{i2}(\beta^r_{i,0}, V_0, Y_{i,1}) \, f(Y_{i,t-1} \mid \beta^r_{i,0}, \sigma_W^2) \, dY \right]$$

23 I normalize the index for the outside option to zero, such that exp(Gi,Outside,t(.)) = 1.

24 As discussed by Train (2003), conditional on the underlying parameters of β, (b, W), the distribution k(.) is exact. However, I estimate (b, W) as (b̂, Ŵ). In my bootstrap procedure to estimate the standard errors of the second stage parameters, I account for the variation of (b̂, Ŵ) around (b, W).

25 Collecting these draws requires no additional effort. Because I estimate the initial stage mixed logit via a hierarchical Markov Chain Monte Carlo procedure, I have in storage draws from this conditional posterior as a by-product of the estimation routine.

I integrate out the unobserved outcomes, Yi,t−1, over the smooth functions his(.) using

Gauss-Hermite quadrature.26
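As a rough illustration of these two ingredients, the sketch below computes logit probabilities with the outside option's index normalized to zero and averages one such probability over a normally distributed outcome using a three-point Gauss-Hermite rule. The linear dependence of the index on the past outcome, the parameter values, and the function names are assumptions made for the example; the estimation itself uses the index rule and the outcome distribution described above.

```python
import math


def logit_probs(indices):
    """Logit probabilities for the inside options.

    The outside option's index is normalized to zero, so it contributes
    exp(0) = 1 to the denominator.
    """
    expd = [math.exp(g) for g in indices]
    denom = 1.0 + sum(expd)
    return [e / denom for e in expd]


# Three-point Gauss-Hermite nodes and weights for the weight exp(-x^2).
GH_NODES = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
GH_WEIGHTS = [
    math.sqrt(math.pi) / 6.0,
    2.0 * math.sqrt(math.pi) / 3.0,
    math.sqrt(math.pi) / 6.0,
]


def expect_normal(func, mu, sigma):
    """Approximate E[func(Y)] for Y ~ N(mu, sigma^2) by quadrature."""
    total = 0.0
    for x, w in zip(GH_NODES, GH_WEIGHTS):
        # Change of variables y = mu + sqrt(2) * sigma * x.
        total += w * func(mu + math.sqrt(2.0) * sigma * x)
    return total / math.sqrt(math.pi)


def prob_drug0(y):
    # Hypothetical indices: drug 0's index rises with the past outcome y,
    # while drug 1's index does not depend on it.
    return logit_probs([0.5 + 0.3 * y, 0.2])[0]


# Choice probability for drug 0 after integrating out the unobserved outcome.
avg_prob = expect_normal(prob_drug0, mu=0.0, sigma=1.0)
print(round(avg_prob, 4))
```

A three-point rule is exact only for polynomials up to degree five, so a practical implementation would likely use more nodes; the small rule keeps the example self-contained.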

5.2 Simulated Maximum Likelihood

With these simulated choice probabilities for the sequences chosen, I form the following

log likelihood to maximize:

$$\log(L) = \sum_{i=1}^{N} \log(P_i)$$

I search over the diagonal elements of V0 and the variance of the idiosyncratic out-

come term.27 I find the appropriate standard errors of the resulting estimates using a

bootstrap procedure described in the Technical Appendix (Section 9.3). In brief, I take

random draws from the initial stage posterior distributions and use each in a bootstrap

sampling procedure that computes the learning model. I account for variation in βi,0

away from E(βi,0) and for the fact that I plug in estimates of the population parameters

of βi,0 found in the initial stage.

5.3 Identification

There are three distinct sets of parameters to identify. First, I estimate the param-

eters of a mixed logit model to set the physician and patient’s priors. In the initial

condition problem, each patient faces plan-specific choice characteristics that also vary

by the calendar year of the patient’s visit. The variation in the choice characteristics

produces distinct initial period drug shares, which identifies the fixed coefficients and

the population mean and variance of the random coefficient distribution.28

Second, using the panel data, I estimate the variances on the coefficients in the linear outcome function. Similar to Crawford and Shum (2005), the identification of these variances depends on how aggregate drug shares change from the first period of treatment through to later periods, calculated across patients. If the shares of a product change little, then the model would fit smaller variances on the coefficients of characteristics that compose that product. For example, if a drug in the SNRI class maintains roughly the same market share at each point across the sample of patient episodes, the variance on the SNRI indicator’s coefficient will be small. That the physician rarely switches away from the initial drug suggests she had tight priors on the likely outcome under the treatment.

26 Each element of Yi,t−1 = (yi,1, ..., yi,t−1) follows a normal distribution. The mean and variance of the distribution in each period s depend on the choice at s. This allows me to simplify the dimension of the integration to be only as large as the number of distinct choices in the sequence. See Judd (1998) for a discussion of the numerical properties of quadrature.

27 As a check on the consistency of my estimation via simulated maximum likelihood, I increase tenfold the number of simulation draws used in the optimization. The results change very little.

28 Bajari et al. (2009) and Berry and Haile (2009) provide a formal nonparametric identification argument for the distribution of the random coefficients in a mixed logit model.

Finally, I estimate the variance of the idiosyncratic shock to the patient’s outcome.

This variance, σ2W , relates to the rate of learning within each patient’s episode. With

small variance in the shock, switching will occur faster after the initial period. As

the variance in the idiosyncratic component grows larger, physicians and patients learn

relatively less about the quality of the match to a treatment from experience. Physicians

will then require a greater number of observed outcomes before recommending a switch.
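The link between the shock variance and the speed of learning can be seen in a simplified scalar case. The sketch below assumes a single unknown match value with a normal prior and independent normal signals, which is a deliberate simplification of the correlated linear model in the text; the function names and the numerical values are hypothetical.

```python
def posterior_variance(prior_var, noise_var, n_signals):
    """Posterior variance of a normal mean after n independent normal signals."""
    return 1.0 / (1.0 / prior_var + n_signals / noise_var)


def signals_needed(prior_var, noise_var, target_var):
    """Smallest number of signals that brings the posterior variance to the target."""
    n = 0
    while posterior_variance(prior_var, noise_var, n) > target_var:
        n += 1
    return n


# Same prior uncertainty, different amounts of idiosyncratic noise.
few = signals_needed(prior_var=4.0, noise_var=1.0, target_var=1.0)
many = signals_needed(prior_var=4.0, noise_var=16.0, target_var=1.0)
print(few, many)
```

With the noisier outcome, many more observations are required to reach the same posterior precision, which is the sense in which a larger shock variance slows switching.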

6 Dynamic Model Estimation

6.1 Results

The parameters identified from the switching patterns in the panel data include the

variance of the idiosyncratic shock to outcomes and the variances of the coefficients in

the linear outcome function. I report the estimated standard deviations in Table 6,

along with bootstrapped standard errors.

The variances provide insight into the physician’s priors, conditional on the means

fixed from the initial condition. A large variance on the coefficient for a particular

subclass, for example, implies significant prior uncertainty on the efficacy of the class

for any patient. Conditional on the mean, the larger the variance of such coefficients,

the greater is the experimentation incentive in the index rule for drugs with these

characteristics.29 This invites physicians to try out the drug. Following experience,

rates of switching away from the treatment will also be high, since the index value can

drop substantially when the posterior variance of a drug’s effect shrinks. In addition, large variances on the idiosyncratic component of the outcome imply slow learning: the physician and patient update very little from experience since outcomes are noisy signals of a drug’s effectiveness.

29 Large variance in outcomes from a treatment might be undesirable from a malpractice perspective if it increases the risk of suicide or adverse reactions. As noted in the development of the initial period model, the level of the estimated drug effect and other discrete components can account for the risk of rare side effects.

Table 6: Model estimates

[Table body not recoverable from the source text; the numeric entries are garbled. Columns: Variable, Estimate, Std. Error. Upper panel: Means, split into coefficients on outcome characteristics (note 1) and fixed characteristics (note 2). Lower panel: Standard deviations of the coefficients on outcome characteristics and of the idiosyncratic component of outcomes.]

Notes:
1. The means of the coefficients in the parameterization of outcomes are estimated in the initial stage model. The variances (lower panel) come from the subsequent maximum likelihood routine.
2. The fixed coefficients are estimated in the initial stage model. The model also includes interactions between patient diagnoses and subclass indicators; these coefficients are suppressed above for clarity.
3. Standard errors for the second stage parameters estimated via a bootstrap resampling procedure described in the Technical Appendix.

Looking at the estimates, the NDRI and SSRI subclasses have the largest variances,

holding constant nausea effects and dosing. These are often the first treatments pre-

scribed to mildly sick patients, and there is a great deal of uncertainty at this stage

about the value of drug treatment in general. The SSRI subclass also has the largest

number of compounds available within it, making it a desirable class to test out initially.

The significant variance on the side effect variable again suggests physicians begin

with uncertainty about a specific patient’s sensitivity to this drug characteristic. The

variances on dosing and on several of the class indicators are not significantly different

from zero, implying that physicians have tight priors on the relationship between these

characteristics and outcomes. Finally, the size of the idiosyncratic variance fits the high

observed persistence on treatment of 60-70% across decision points. Experience offers

too noisy a signal of a patient’s true match to a drug to allow substantial updating in

each period.


6.2 Model Fit

I assess model fit in two ways. First, I use the model to predict the aggregate prob-

abilities of switching to the outside good. I compare the observed transitions against

predictions from the dynamic model, from the Bayesian-myopic model, and from a static

model. Second, I compare how well each of these three models predicts the precise drug

selected for the patient at each point within the treatment sequence.

To determine the static predictions, I rerun the mixed logit specification from the

initial stage, feeding in the entire sequence of observed choices for each patient’s episode.

The procedure is static in that it treats all prescriptions chosen within an episode as

independent. For the Bayesian-myopic model, I run an estimation procedure similar

to the full dynamic analysis. The physician and patient update their priors from past

experience, but use a decision rule that maximizes only the expected mean of a choice

without future considerations.

Figure 3: Share of Patients on Drug Care over Six Decision Points. The static model predicts a share of 55% at all points (shown as a line in the figure).

[Figure: share of patients on drug care (in %) by decision point within a patient's treatment episode (t=2 to t=6). Series: Observed Data, Dynamic Model Prediction, Bayesian Myopic Prediction, Static Model Prediction (55%).]

Looking first at aggregate transitions, Figure 3 illustrates that the dynamic model

matches closely the observed pattern of patient exit from drug care, with the exception

of decision point 2. The static model fits poorly both at decision point 2 and at all future


decision points: it predicts a constant share over time, which the data clearly reject.

Figure 3 also illustrates precisely how predictions from the dynamic model differ from

those of the Bayesian-myopic model. Relative to the observed share, the Bayesian-

myopic model underestimates the share of patients remaining in care. Without the

dynamic incentive to try out goods that are less well understood, the Bayesian-myopic

model predicts greater exit to the outside option of no drug treatment.

In a second fit measure, I compute how often the dynamic model correctly predicts

an individual’s observed choice over six decision points. The dynamic model is unlikely

to fit perfectly, given the restrictions placed on the set of covariates. However, as shown

in Table 7, the observed choice matches one of the top three predictions for that patient

at a rate of around 65%. Expanding to the top five choices, the model matches the

observed choice in 77-82% of patients. I compare this fit to the Bayesian-myopic model

and a static model based on market shares. I report the results in Table 7. The dynamic

model selects the observed choice nearly twice as often as does the static model.
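The top-three and top-five fit statistics can be computed along the following lines. This is a minimal sketch that assumes each model supplies a score per option for each patient and that a higher score means a higher predicted rank; the function name and the toy data are hypothetical and are not drawn from the estimation sample.

```python
def top_k_hit_rate(predicted_scores, observed, k):
    """Share of patients whose observed choice is among the k highest-scored options."""
    hits = 0
    for scores, choice in zip(predicted_scores, observed):
        # Rank option positions from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if choice in ranked[:k]:
            hits += 1
    return hits / len(observed)


# Toy example: three patients, four options each.
predicted_scores = [
    [0.10, 0.50, 0.30, 0.10],
    [0.40, 0.30, 0.20, 0.10],
    [0.25, 0.25, 0.40, 0.10],
]
observed = [1, 2, 3]

top1 = top_k_hit_rate(predicted_scores, observed, k=1)
top3 = top_k_hit_rate(predicted_scores, observed, k=3)
print(top1, top3)
```

Running the same function on scores from the dynamic, Bayesian-myopic, and share-based models would give comparable statistics for each.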

The model faces the most difficulty in predicting the exact date the patient will

exit treatment. In the second set of columns in Table 7, I omit from the statistics

predictions at decision points at which an individual exits treatment. For the subset

of periods where the individual chooses an inside good, the model’s top 3 and top 5

predicted choices match the observed choice at rates of 80-85% and 85-90%, respectively.

This fit measure also illustrates sharply the trade-off between using the full dy-

namic model and the Bayesian-myopic model. In simulations reported in Table 7, the

Bayesian-myopic model outperforms the full dynamic model only in early periods where

the individual exits treatment. In later periods and in periods in which the physician

recommends an inside good, the dynamic model more often predicts the observed choice.

By allowing the physician to reason forward in her selection, the dynamic model can

explain persistence in treatment and the tendency in the data for physicians to try out

lesser-known alternatives.

7 Counterfactual Experiments and Patient Outcomes

I return to the public policy question motivating the empirical work, employing the

dynamic model to predict how copayment policies and promotion schemes affect the

insurer’s net drug costs and adherence.

Table 7: Model fit - Matching predicted subclass and product choices to observed selections by patient.

Predictions for inside and outside goods

Examination / Statistic                            t=2      t=3      t=4      t=5      t=6
% of patients for whom the observed choice equals one of top 3 choices
  "top 3" ranked by dynamic model                 64.8     66.2     65.7     64.7     63.4
  "top 3" ranked by Bayesian myopic model         85.4     76.9     69.9     59.1     37.7
  "top 3" ranked by share                         33.5     32.4     31.8     31.7     31.2
% of patients for whom the observed choice equals one of top 5 choices
  "top 5" ranked by dynamic model                 76.7     82.4     82.1     80.9     78.9
  "top 5" ranked by Bayesian myopic model         92.7     85.8     80.5     70.9     48.0
  "top 5" ranked by share                         46.6     45.3     44.7     44.5     43.9
Total patient observations:                     29,358   19,943   13,993    9,901    7,062

Predictions for inside goods only

Examination / Statistic                            t=2      t=3      t=4      t=5      t=6
% of patients for whom the observed choice equals one of top 3 choices
  "top 3" ranked by dynamic model                 84.5     82.3     79.0     75.3     71.3
  "top 3" ranked by Bayesian myopic model         80.7     71.4     63.9     52.7     32.8
  "top 3" ranked by share                         30.7     30.6     30.5     30.5     30.2
% of patients for whom the observed choice equals one of top 5 choices
  "top 5" ranked by dynamic model                 91.6     88.2     84.7     82.0     79.0
  "top 5" ranked by Bayesian myopic model         90.4     82.1     75.9     64.7     41.8
  "top 5" ranked by share                         44.7     44.6     44.4     44.5     44.1
Total patient observations:                     22,164   15,819   11,296    8,072    5,840

Notes:
1. Fit statistics rely on estimated parameters from steps 1-2 of the estimation procedure. The statistics record the percentage of predicted choice sequences that match the observed sequence in a given period, according to the criterion of the statistic. "t" denotes the period of the decision. After period 1, the predictions for periods t=2,3,4 condition on the observed choices from s=1,...,t-1.
2. "Random" ranking probabilities calculated using the hypergeometric distribution (sampling without replacement) based on observed market shares.

I simulate the experience of 10,000 sample patients diagnosed in 2005, the latest year in my data. I run two sets of counterfactuals. In the first, I consider a series of copayment schemes, including uniform pricing and detailed “value-based” insurance designs, as described by Chernew et al. (2007). The insurer acts as a central planner, changing the entire vector of prices for the choice set patients and physicians face. In the second set of counterfactuals, I simulate the impact of promotion for generics and for two branded medications. I proxy for advertising by adjusting the drug fixed effects that are correlated with manufacturer promotion. I report the simulation results in Table 8. I use these predictions later to build a new treatment protocol.

To quantify changes in patient health under the alternative policies, I translate the

rate of adherence to an expected number of weeks the patient suffers from depressive

symptoms, assigning a dollar value to the relief from symptoms. I describe the calcu-

lation in Section 9.4 of the Technical Appendix. In brief, I use summary results from a

survey of psychiatric specialists that Berndt et al. (2002) conduct. The results include

the panel’s prediction of the median probability of full and partial recovery from de-

pressive symptoms under different treatment categories. They define categories by the

subclass of the treatment chosen and the length of time the patient actually adheres

to the treatment. I use the dynamic model to predict the drug/adherence category for

each patient in my sample; I assign patients the matching recovery probability from the

survey. The number of patients in my sample that fall into each category will differ by

counterfactual policy.

The recovery rates for a patient determine the expected number of weeks in the

four months following the diagnosis that he will suffer from depression. I rely on Lave

et al. (1998) and Cutler (2004) to translate this expected number of weeks into a

dollar measure. In the baseline case, the distribution of treatments prescribed leads

to an average reduction in depressive symptoms after 16 weeks worth $1957, versus a

drug cost of roughly $147. I compare the dollar value of symptomatic relief under the

counterfactual policies against the cost of drug care in Table 9.

7.1 Insurance Design

I measure the effect of three alternative copayment policies on insurer costs and patient

health. For the baseline case, I use the tiered pricing policies defined by the 307 benefit

plans in the database. In the first counterfactual, I set a uniform copayment of $0 for

all treatments. In the second, I set a “value-based” policy, as Chernew et al. (2007)

suggest. I set the copayments for generic treatments to $5 for a 30 day supply. For the

specific branded SSRI and SNRI treatments that lead to the fewest switches and highest

adherence in the raw data and under the model parameters, I also set the copayment

equal to $5. For all other branded treatments, including those with both moderate and

poor predicted adherence, I set the copayment equal to $50 per month supply, the 95th

percentile of observed copayments.

Table 8: Counterfactual prescription drug costs to the insurer and adherence.

                                                        Decision Point
Counterfactual Scenario / Variable                    1          2          3

Baseline
  Total monthly drug costs (note 1) ($)           422,396    423,841    258,738
  Costs per patient (note 2) ($/day)                 1.41       1.41       0.86
  Patients adhering to treatment (%)               100.0%      95.2%      70.5%

Effects of Copayment Policies

All copayments set to $0
  Change in adherence                                0.0%       1.9%       8.5%
  Change in monthly drug costs                      57.6%      57.5%      66.4%
    due to change in adherence                       2.8%       2.8%       9.1%
    due to shifts in drugs chosen                   54.8%      54.7%      57.3%

Copay for generics and preferred SSRI/SNRI=$0, all others=$50
  Change in adherence                                0.0%       1.7%       7.4%
  Change in monthly drug costs                     -18.6%     -18.4%     -20.0%
    due to change in adherence                       1.2%       1.2%       3.7%
    due to shifts in drugs chosen                  -19.5%     -19.3%     -17.5%

Effects of Promotional Policies

Ban branded advertising
  Change in adherence                                0.0%      -0.6%      -2.2%
  Change in monthly drug costs                     -22.1%     -22.1%     -25.2%
    due to change in adherence                      -0.6%      -0.6%      -1.0%
    due to shifts in drugs chosen                  -21.5%     -21.5%     -24.2%

Advertise heavily for Zoloft
  Change in adherence                                0.0%       1.4%       5.9%
  Change in monthly drug costs                      23.1%      23.0%      27.7%
    due to change in adherence                       1.7%       1.7%       4.7%
    due to shifts in drugs chosen                   21.3%      21.3%      23.0%

Advertise heavily for Cymbalta
  Change in adherence                                0.0%       0.4%       1.5%
  Change in monthly drug costs                       1.7%       1.5%       9.0%
    due to change in adherence                       0.6%       0.6%       1.5%
    due to shifts in drugs chosen                    1.1%       1.1%       7.5%

Notes:
1. Total monthly drug costs equal the recorded insurer cost per 30 day supply less the patient's drug copay per 30 day supply, for the set of patients on active treatment.
2. Costs per patient reflect monthly dollar costs divided by the total number of patients the insurer serves, whether on active treatment or not.
3. Values calculated for 10,000 sample patients at individual-specific prices.

For all three policies, I simulate rates of adherence over three decision points and

calculate net costs to the insurer, which equal the cost charged by the manufacturer less

the copayments paid by patients.30 I translate adherence to a reduction in depressive symptoms, denominated in dollar terms, and judge the performance of each policy by its effect on the trade-off between insurer cost and patient health.

30 Copayments typically return to the insurer indirectly via a bargaining process with the drug manufacturer (see Levy (1999)).

Looking first at the $0 uniform copayment policy, the main effect is to increase

the use of branded drugs relative to their generic counterparts. Physicians in the data

express strong preferences for particular brand names and so prescribe them readily

when price effects disappear. The policy leads to an 8.5% increase in adherence by period

3, stemming largely from the increased use of effective SSRIs and SNRIs. However, the

improvement comes at an added cost equal to 58-66% of the baseline level. By lowering

all copayments to zero, the insurer loses the revenue from these patient contributions

and fails to encourage the use of cost-effective treatments. The policy leads to a $115

increase in the value of health above the baseline average of $1957 over four months of

care, but at an added cost of $88.

The “value-based” design offers a four month payoff of $100, below but similar to the

$0 uniform price case. Patients shift to the most effective SSRIs and SNRIs, incentivized

by copayments of $5 per month relative to $50 for branded drugs with moderate or poor

expected adherence. As a result of this shift, adherence increases by 7.4% into the third

month of care. The change in the distribution of treatments prompts an 18-20% decline

in the cost of care, largely due to the shift toward less expensive generic options. The

price incentives that produce the $100 increase in health also reduce the average cost

of care by about $28 over 16 weeks.31

7.2 Promotion

Promotion of particular products can influence physician decisions and alter the effi-

ciency of the depression care provided to patients. The goal of the following exercises

is not to identify the effect of any particular channel of manufacturer advertising, since

I lack information at the individual level on the types of marketing tools employed.

Instead, I study how changing the physician’s prior information set interacts with her

learning process. If manufacturer advertising skews prescribing toward a less effective

drug initially, generating poor adherence over time, policymakers might suitably ban

such ads. On the other hand, if promotion, even of inferior products, leads more patients to begin treatment and realize a positive outcome on drug care, advertising may improve health.

31 The projected health gains reflect only a lower bound. If the policy encourages physicians to use drugs that inspire lower rates of relapse and decreased hospitalizations, the total gains in health may be higher.

Table 9: Patient health vs. drug costs, under counterfactual scenarios.

                                              Drug costs per        Value of utility gain from
                                              patient (in $)        symptomatic recovery (in $)
Baseline                                          147.33                  1,957.33

                                              Change in 4 month     Change in symptomatic
                                              drug costs per        recovery (in $)
                                              patient (in $)
Effects of Copayment Policies
  All copayments set to $0                         87.84                    115.45
  Value-based design (SSRI/SNRI preferred)        (27.78)                    99.51

Effects of Promotional Policies
  Ban branded advertising                         (33.62)                   (39.55)
  Advertise heavily for Zoloft                     35.51                     82.69
  Advertise heavily for Cymbalta                    4.91                     23.23

Notes:
1. Per patient monthly drug costs equal the average recorded insurer cost for 16 weeks of treatment less the patient's drug copay, for the set of patients on active treatment.
2. The dollar value of the utility gain from symptomatic relief is calculated via the procedure in the Technical Appendix. The change in symptomatic effect by policy stems from the policy's effect on both the initiation of treatment and adherence. This changes the distribution of treatments used and therefore the probability of symptomatic relief over time. The value is for a 16 week interval.
3. Values calculated for 10,000 sample patients at individual-specific prices.

To test the role of promotion, I undertake three examinations. In each, I proxy for

promotion by adjusting drug-specific effects, which are correlated with average adver-

tising expenditure (See Table 5).32 First, I mimic a scenario under which public policy

bans branded advertising by lowering the expected mean outcome for branded medica-

tions. Second, I mimic promotion of Zoloft (sertraline), a branded SSRI with generally

high tolerability and efficacy but also a high cost. I increase Zoloft’s expected mean

outcome threefold, to proxy for a percentage increase in advertising to match the drug

with the largest observed monthly advertising in 2003. Third, I increase promotion for

Cymbalta (duloxetine), a newer SNRI treatment with above average treatment effects but lower average tolerability. I increase Cymbalta’s expected mean effect by the same percentage as for Zoloft.

32 The existing level of advertising may be endogenous: if advertising is less effective at encouraging the use of a treatment, manufacturers likely reduce promotion. I do not model supply-side decisions here, but merely examine how changing the physician’s information set alters the learning process.

Banning branded promotion leads to a 2.2% decrease in the use of drug care, either

because physicians write fewer prescriptions or patients decide not to fill a prescription

in hand. The main effect, however, is to shift the distribution of prescriptions. Generic

SSRIs gain 27% in share, with generic NDRIs gaining 5% in share in the first period.

The change in distribution leads to a cost improvement of around 22-25%. Adherence falls both because the policy fails to increase initiation and because it incentivizes all generics, even those with poorer properties. Under the policy, the value of health decreases by

$40 while the cost savings per patient equal only $34.

Increasing Zoloft advertising produces a majority share for Zoloft of 62% from a

baseline level of 16%. After three periods, that share remains around 45%. Adherence

increases by 5.9% into period 3, while costs rise by 23-28%. About 4/5 of this cost in-

crease comes from the greater use of Zoloft, which carries a higher manufacturer price

(before rebates) than generics; the remaining one-fifth is due to improved adherence.

While the increase in adherence shows the value of advertising for effective products,

the cost/health trade-off compares unfavorably with that of “value-based” copayment

design: health, in dollars, increases by $83, but at an added monthly cost of $36. In

addition, if the insurer were to mimic the manufacturer’s advertising to increase adher-

ence, it would incur promotional expense. To provide some comparison, in 2003, the

manufacturer of Zoloft spent $8.75 million a month on detailing to clinicians, delivering

1 million sample packages to physicians’ offices.33

In the final examination, the increase in Cymbalta’s advertising boosts its share

from a baseline of 1.5% to 12% in period 1. Adherence improves only 1.5% over the

baseline case, possibly due to Cymbalta’s higher rate of side effects. Costs rise between

2 and 9%, leading to an added average cost over 16 weeks equal to $5. This is smaller

than the modest value of the gain in health, worth $23.
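The comparisons above reduce to simple arithmetic. The Python sketch below nets the per-patient change in health value against the change in cost for the two policies whose figures are quoted over the same horizon; the dictionary layout and the `net_value` helper are illustrative assumptions, not part of the original analysis. The Zoloft case is left out because its cost figure is quoted per month rather than per episode.

```python
# Per-patient dollar figures quoted in the text. Pairing them into a single
# "net value" number is an illustrative assumption.
policies = {
    "ban on branded promotion": {"health_change": -40.0, "cost_change": -34.0},
    "Cymbalta promotion": {"health_change": 23.0, "cost_change": 5.0},
}


def net_value(health_change: float, cost_change: float) -> float:
    """Change in health value minus change in cost (savings are negative costs)."""
    return health_change - cost_change


net = {name: net_value(**figures) for name, figures in policies.items()}

for name, value in net.items():
    print(f"{name}: net per-patient value = ${value:+.0f}")
```

Under this reading, the ban loses more in health than it saves in cost, while the Cymbalta promotion gains more in health than it adds in cost.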

7.3 Discussion of Policies

The simulations highlight two key features of decisions in this market. First, after ac-

counting for learning, the largest gains in patient outcomes occur when policies promote

treatments with above-average efficacy and tolerability. Costs improve when the policies

use either advertisements or lower prices to encourage the cheapest products within the

33Personal Selling Audit/Hospital Personal Selling Audit. Verispan, Inc., 2003.

set of effective options. This is precisely the approach of value-based insurance design,

on the price dimension, and “academic detailing”, as described by Soumerai and Avorn

(1990), on the promotion dimension. However, the policies differ in costs. Given in-

formation technology, the creation of multiple tiers for pricing involves little additional

expense. Advertising, however, is costly. A small-scale academic detailing program in

Pennsylvania in 2006 required $1 million a year in funding.34

Second, manufacturer promotion may be beneficial as a matchmaker even if its

purpose seems more to persuade than inform (See Anand and Shachar (2011)). The

results from the promotion of Zoloft and Cymbalta suggest that while promoting an

effective product boosts adherence and health more, promoting a lesser option can

improve health if it nonetheless encourages experimentation. Consumers then learn

about their match to specific characteristics of the product, find better options, and

adhere at higher rates.

7.4 New Protocol

Finally, using the estimated rates of adherence by drug treatment from the baseline

model, I build a new treatment protocol. Rather than employ price and promotion

incentives to encourage physicians to prescribe the most cost-effective products, one

can instead present information on adherence directly to physicians.

Past theoretical and empirical work supports the notion that adherence or attrition

rates can provide valuable insight on effectiveness. Philipson and DeSimone (1997),

for example, argue that randomization and blinding in clinical trials may not actually

produce unbiased treatment effects when trial subjects are Bayesian and actively learn

about the effectiveness of the experimental drug. Attrition rates, in contrast, summarize

the effect of all unobserved aspects of a treatment that make it undesirable. Chan and

Hamilton (2006) judge the value of attrition rates empirically using data from a trial

of HIV treatments. I extend this notion to adherence rates in insurance claims data.

Since physicians in practice expend considerably less effort to ensure patient adherence than the managers of clinical trials do, the observed rates of quitting treatment provide an externally valid measure of a drug’s effectiveness.

In Table 10, I report the predicted rates of adherence by drug compound along with

the recommendations for antidepressant treatment from three published protocols. In

34Scott Hensley, “Negative Advertising: As Drug Bill Soars, Some Doctors Get An ‘Unsales’ Pitch,” The Wall Street Journal, March 13, 2006, sec. A, p.1.

Table 10: Treatment recommendations from published protocols vs. the dynamic model’s recommendations. X’s within a row indicate the protocol recommends the drug compound.

Recommendation columns: (1) Texas Medication Algorithm Project, Report for Major Depression (1998); (2) APA Guidelines, Second Edition (2000, 2005); (3) Comparative effectiveness review, AHRQ (2007); (4) Dynamic Model Predictions.

                                      % Patients quitting medication, model
Molecule                   Class      Decision Point 2    Decision Point 3    X's across (1)-(4)
Celexa (Citalopram)        SSRI       20.3                33.0                3
Lexapro (Escitalopram)     SSRI       21.5                34.5                3
Paxil (Paroxetine)         SSRI       15.1                23.7                4
Prozac (Fluoxetine)        SSRI       19.9                32.5                4
Zoloft (Sertraline)        SSRI       21.4                34.4                3
Cymbalta (Duloxetine)      SNRI       16.3                25.0                3
Effexor (Venlafaxine)      SNRI       16.3                24.9                4
Wellbutrin (Bupropion)     NDRI       21.8                35.1                3
Nortriptyline              TCA        24.9                38.5                1
Mirtazapine                NaSSA      23.7                37.1                1
Nefazodone                 SM         17.3                26.1                2
Trazodone                  SM         25.8                39.5                0

Notes:
1. Percentage of patients who quit their medications at decision points 2 and 3, predicted using the experience of 10,000 sample patients applied against the dynamic model estimates. The recommendations built from adherence patterns differ very little when using observed copayments or when setting all copayments equal to $0.
2. Sources:
(1) Crismon, M. L., et al. (1999). The Texas medication algorithm project: Report of the Texas consensus conference panel on medication treatment of major depressive disorder. J Clin Psychiatry, 60, 142–156.
(2) Karasu, T. B., et al. (2000). Practice guideline for the treatment of patients with major depressive disorder: Second edition. Arlington, VA: American Psychiatric Association. Also, Fochtmann, L. J., & Gelenberg, A. J. (2005). Guideline watch: Practice guideline for the treatment of patients with major depressive disorder, 2nd edition. Arlington, VA: American Psychiatric Association.
(3) Gartlehner, G., et al. (2007). Comparative effectiveness of second-generation antidepressants in the pharmacologic treatment of adult depression. Agency for Healthcare Research and Quality, US Dept of Health and Human Services.

the final column, I include my own protocol built from the dynamic model’s predic-

tions.35 I use a quit rate of 20% at the second decision point as an upper threshold for

recommending a product. There are two main distinctions between the model-based

protocol and existing recommendations. First, the new protocol de-emphasizes NDRIs

and TCAs. Although clinical trials generally show little difference in efficacy between

subclasses, these two classes prompt high rates of switching in the panel data. Second,

while no existing protocol differentiates among available SSRIs, the model suggests us-

ing off-patent compounds fluoxetine (Prozac) and paroxetine (Paxil) over alternatives.

This preference arises both from the lower cost of these drugs and from greater tolerability. The comparative effectiveness review by Gartlehner et al. (2007) provides some support for the model’s recommendation. Sertraline (Zoloft) produced a slightly higher rate of gastrointestinal side effects than comparable SSRIs; the model predicts relatively weaker adherence for it.
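The threshold rule behind the model-based protocol is mechanical enough to sketch in a few lines of Python. The quit rates below are the decision-point-2 predictions as read from Table 10; the variable names and the strict inequality are assumptions made for illustration, since no rate sits exactly at 20%.

```python
# Predicted share of patients (%) quitting at decision point 2, as read from
# Table 10. Names and data layout are illustrative assumptions.
QUIT_RATE_DP2 = {
    "Celexa": 20.3, "Lexapro": 21.5, "Paxil": 15.1, "Prozac": 19.9,
    "Zoloft": 21.4, "Cymbalta": 16.3, "Effexor": 16.3, "Wellbutrin": 21.8,
    "Nortriptyline": 24.9, "Mirtazapine": 23.7, "Nefazodone": 17.3,
    "Trazodone": 25.8,
}

THRESHOLD = 20.0  # upper bound on the quit rate for a recommendation


def recommend(quit_rates: dict, threshold: float = THRESHOLD) -> list:
    """Return the drugs whose predicted quit rate falls below the threshold."""
    return sorted(name for name, rate in quit_rates.items() if rate < threshold)


recommended = recommend(QUIT_RATE_DP2)
print(recommended)
```

Among the SSRIs, only Paxil and Prozac pass the cutoff under this sketch, which matches the preference for paroxetine and fluoxetine described in the text.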

The results suggest observed attrition rates can enhance existing protocols. Crismon

et al. (1999), in their algorithm design project, argued for using such analyses, but found

few credible studies to incorporate. The methodology developed here can help fill this

evidence gap cheaply.

8 Conclusion

I specify a model of learning that permits patients and physicians to search within

a large set of potentially correlated experience goods. To estimate the parameters of

this learning process, I depart from standard dynamic programming solutions due to

computational limitations. Employing a forward-induction rule, I design an estimation

framework that accommodates the large choice set of the antidepressant setting.

With this framework, I measure the importance of a key trade-off in incentive de-

sign: policies which promote cheaper but less effective options may lead to decreased

adherence. When patients quit treatment early, they suffer from depressive symptoms

for longer periods. Patient health declines and the insurer’s long-run costs may in-

crease. Managing this trade-off requires “value-based” copayment schemes or academic

detailing programs that emphasize treatment regimens that balance high efficacy with

tolerability.

35As a sensitivity check, I predict adherence in the setting where all copayments equal $0. The product recommendations built from these rates are similar to those in the main analysis.

References

Ackerberg, D. A. (2003): “Advertising, learning, and consumer choice in experience

good markets: an empirical examination,” International Economic Review, 44, 1007–

1040. [1]

Aguirregabiria, V. and P. Mira (2010): “Dynamic discrete choice structural mod-

els: A survey,” Journal of Econometrics, 156, 38–67. [2]

Akincigil, A., J. R. Bowblis, C. Levin, J. T. Walkup, S. Jan, and S. Crys-

tal (2007): “Adherence to antidepressant treatment among privately insured pa-

tients diagnosed with depression,” Medical Care, 45, 363. [5]

Anand, B. N. and R. Shachar (2011): “Advertising, the matchmaker,” The RAND

Journal of Economics, 42, 205–245. [37]

Bajari, P., J. T. Fox, K. il Kim, and S. P. Ryan (2009): “The random coefficients

logit model is identified,” NBER working paper. [26]

Bergemann, D. and J. Valimaki (1996): “Learning and strategic pricing,” Econo-

metrica, 64, 1125–1149. [3]

Berndt, E. R., A. Bir, S. H. Busch, R. G. Frank, and S.-L. T. Normand

(2002): “The medical treatment of depression, 1991-1996: productive inefficiency,

expected outcome variations, and price indexes,” Journal of Health Economics, 21,

373–396. [1, 32, 49]

Berry, S., J. Levinsohn, and A. Pakes (1995): “Automobile Prices in Market

Equilibrium,” Econometrica, 63, 841–890. [2]

Berry, S. T. and P. A. Haile (2009): “Nonparametric identification of multinomial

choice demand models with heterogeneous consumers,” NBER Working Paper. [26]

Brezzi, M. and T. L. Lai (2002): “Optimal learning and experimentation in bandit

problems,” Journal of Economic Dynamics and Control, 27, 87–108. [20, 46]

Chan, T. Y. and B. H. Hamilton (2006): “Learning, private information, and the

economic evaluation of randomized experiments,” Journal of Political Economy, 114,

997–1040. [4, 37]

Chang, F. and T. L. Lai (1987): “Optimal stopping and dynamic allocation,”

Advances in Applied Probability, 19, 829–853. [23, 45, 47]

Charlson, M. E., P. Pompei, K. L. Ales, and C. R. Mackenzie (1987): “A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation,” Journal of Chronic Diseases, 40, 373–383. [18]

Chernew, M. E., A. B. Rosen, and A. M. Fendrick (2007): “Value-based

insurance design,” Health Affairs, 26, 195. [31, 32]

Ching, A. T. (2010): “Consumer learning and heterogeneity: Dynamics of demand

for prescription drugs after patent expiration,” International Journal of Industrial

Organization. [3]

Crawford, G. S. and M. Shum (2005): “Uncertainty and Learning in Pharmaceu-

tical Demand,” Econometrica, 73, 1137–1173. [1, 3, 11, 22, 26]

Crismon, M. L., M. Trivedi, T. A. Pigott, A. J. Rush, R. M. A. Hirschfeld,

D. A. Kahn, C. DeBattista, J. C. Nelson, A. A. Nierenberg, H. A. Sack-

eim, and M. E. Thase (1999): “The Texas Medication Algorithm Project: report

of the Texas consensus conference panel on medication treatment of major depressive

disorder,” Journal of Clinical Psychiatry, 60, 142–156. [39]

Cutler, D. M. (2004): Your money or your life, Oxford University Press. [32, 49]

Dickstein, M. J. (2011): “Physician vs. patient incentives in prescription drug

choice,” . [16]

Efron, B. and R. J. Tibshirani (1993): An introduction to the bootstrap, Chapman

and Hall. [48]

Erdem, T. and M. P. Keane (1996): “Decision-making under uncertainty: Captur-

ing dynamic brand choice processes in turbulent consumer goods markets,” Marketing

Science, 15, 1–20. [1]

Fochtmann, L. J. and A. J. Gelenberg (2005): Guideline Watch: Practice Guide-

line for the Treatment of Patients with Major Depressive Disorder, 2nd Edition, Ar-

lington, VA: American Psychiatric Association. [-]

Fryback, D. G., E. J. Dasbach, R. Klein, B. E. K. Klein, N. Dorn, K. Pe-

terson, and P. A. Martin (1993): “The Beaver Dam Health Outcomes Study:

initial catalog of health-state quality factors,” Medical Decision Making, 13, 89. [1]

Gartlehner, G., R. A. Hansen, P. Thieda, A. M. DeVeaugh-Geiss, B. N.

Gaynes, E. E. Krebs, L. J. Lux, L. C. Morgan, J. A. Shumate, and L. G.

Monroe (2007): Comparative effectiveness of second-generation antidepressants in

the pharmacologic treatment of adult depression, Agency for Healthcare Research and

Quality (US). [6, 8, 39]

Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (2004): Bayesian Data

Analysis: Texts in Statistical Science, Boca Raton, FL: Chapman and Hall/CRC. [20]

Gemmill, M. C., J. Costa-Font, and A. McGuire (2007): “In search of a cor-

rected prescription drug elasticity estimate: a meta-regression approach,” Health

Economics, 16, 627–643. [14]

Ginebra, J. and M. K. Clayton (1995): “Response surface bandits,” Journal of the Royal Statistical Society. Series B (Methodological), 771–784. [24]

Gittins, J. C. and D. M. Jones (1979): “A dynamic allocation index for the

discounted multiarmed bandit problem,” Biometrika, 66, 561–565. [3, 22, 45]

Hellerstein, J. K. (1998): “The Importance of the Physician in the Generic versus

Trade-Name Prescription Decision,” The RAND Journal of Economics, 29, 108–136.

[2, 11]

Hotz, V. J. and R. A. Miller (1993): “Conditional choice probabilities and the

estimation of dynamic models,” The Review of Economic Studies, 60, 497. [21]

Judd, K. L. (1998): Numerical Methods in Economics, The MIT Press. [26]

Karasu, T. B., A. Gelenberg, A. Merriam, and P. Wang (2000): Practice

Guideline for the Treatment of Patients With Major Depressive Disorder: Second

Edition, Arlington, VA: American Psychiatric Association. [-]

Keane, M. P. and K. I. Wolpin (1994): “The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence,” The Review of Economics and Statistics, 76, 648–672. [3, 22]

Kessler, R. C., P. Berglund, O. Demler, R. Jin, D. Koretz, K. R.

Merikangas, A. J. Rush, E. E. Walters, and P. S. Wang (2003): “The

epidemiology of major depressive disorder: results from the National Comorbidity

Survey Replication (NCS-R),” JAMA, 289, 3095. [5]

Lave, J. R., R. G. Frank, H. C. Schulberg, and M. S. Kamlet (1998): “Cost-

effectiveness of treatments for major depression in primary care practice,” Archives

of General Psychiatry, 55, 645. [32, 49]

Levy, R. (1999): The pharmaceutical industry: a discussion of competitive and

antitrust issues in an environment of change, Washington, DC: Bureau of Eco-

nomics/Federal Trade Commission staff report. [33]

Mahajan, A. and D. Teneketzis (2007): Multi-armed bandit problems, Foundations

and Applications of Sensor Management, Springer. [23]

McFadden, D. and K. Train (2000): “Mixed MNL models for discrete response,”

Journal of Applied Econometrics, 15, 447–470. [14]

Melfi, C. A., A. J. Chawla, T. W. Croghan, M. P. Hanna, S. Kennedy, and

K. Sredl (1998): “The effects of adherence to antidepressant treatment guidelines

on relapse and recurrence of depression,” Archives of General Psychiatry, 55, 1128.

[1, 5]

Miller, R. A. (1984): “Job matching and occupational choice,” The Journal of Po-

litical Economy, 92, 1086–1120. [3]

Murphy, K. M. and R. H. Topel (2002): “Estimation and inference in two-step

econometric models,” Journal of Business and Economic Statistics, 20, 88–97. [47]

Murphy, M. J., R. L. Cowan, and L. I. Sederer (2009): Blueprints: Psychiatry, Philadelphia, PA: Lippincott Williams & Wilkins, 5th ed. [5]

Newey, W. K. (1984): “A method of moments interpretation of sequential estima-

tors,” Economics Letters, 14, 201–206. [47]

Philipson, T. and J. DeSimone (1997): “Experiments and subject sampling,”

Biometrika, 84, 619. [4, 37]

Pomerantz, J. M., S. N. Finkelstein, E. R. Berndt, A. W. Poret, L. E.

Walker, R. C. Alber, V. Kadiyam, M. Das, D. T. Boss, and T. H. Ebert

(2004): “Prescriber intent, off-label usage, and early discontinuation of antidepres-

sants: a retrospective physician survey and data analysis,” Journal of Clinical Psy-

chiatry, 65, 395–404. [5]

Rusmevichientong, P. and J. N. Tsitsiklis (2010): “Linearly parameterized ban-

dits,” Mathematics of Operations Research, 35, 395. [23]

Rust, J. (1987): “Optimal replacement of GMC bus engines: An empirical model of

Harold Zurcher,” Econometrica: Journal of the Econometric Society, 999–1033. [21,

22]

Soumerai, S. B. and J. Avorn (1990): “Principles of educational outreach (’aca-

demic detailing’) to improve clinical decision making,” JAMA, 263, 549. [37]

Train, K. (2003): Discrete Choice Methods with Simulation, Cambridge Univ Press.

[25]

Whittle, P. (1980): “Multi-armed bandits and the Gittins index,” Journal of the Royal Statistical Society. Series B (Methodological), 42, 143–149. [45, 46]

——— (1981): “Arm-acquiring bandits,” The Annals of Probability, 284–292. [23]

Zellner, A. (1971): An Introduction to Bayesian Inference in Econometrics, J. Wiley.

[20]

9 Technical Appendix

9.1 Gittins’ Index rule

I consider a market with $J$ options available. The outcome under option $j$, $Y_j$, is a draw from a univariate distribution, $f(y; \theta_j)$, where $\theta_j$ is an unknown parameter vector. Gittins and Jones (1979) and the broader literature on “multi-armed bandits” sought to solve the following basic but non-trivial question: how should a decision-maker select from among the $J$ choices in each period $t$ to maximize the expected discounted sum of his payoffs:

$$E_{\theta_1,\ldots,\theta_J}\left[\sum_{t=1}^{\infty}\delta^{t-1}Y_t\right]$$

for a given $\delta$. Here, $(\theta_1, \ldots, \theta_J)$ are unknown. Agents form independent priors, $\Pi^{(j)}$, on $\theta_j$ for $j = 1, \ldots, J$. The optimal allocation rule maximizes:

$$\int\cdots\int E_{\theta_1,\ldots,\theta_J}\left(\sum_{t=1}^{\infty}\delta^{t-1}Y_t\right)d\Pi^{(1)}(\theta_1)\cdots d\Pi^{(J)}(\theta_J)$$

One can solve this sequence problem using a dynamic programming approach. Git-

tins and Jones (1979) and Whittle (1980) show that in the case in which agents form

independent priors on (θ1, ..., θJ) and several other conditions hold, there is a simpler

solution. The agent can apply a forward induction rule or “index” rule. He computes J

one-dimensional optimal stopping problems to determine the value of J indices in each

period t. The agent then chooses the option with the largest index. I describe briefly the

form of this index rule; the original contributions cited above provide detailed proofs.

In every period, the agent selects one of $J$ choices and realizes an outcome $Y_{jt}$. Let $\psi(t)$ represent the action the agent takes at $t$.36 I denote the number of times option $j$ has been selected up to and including period $t$ as:

$$n_j(t) = \sum_{s=1}^{t} 1\{\psi(s) = j\}$$

I previously defined $\Pi^{(j)}$ as the agent’s prior beliefs on the distribution of $\theta_j$. The posterior distribution at time $t$ for $j$ after $n_j(t)$ outcome realizations under $j$ is $\Pi^{(j)}_{n_j(t)}$.

36From Chang and Lai (1987), the allocation rule, ψ(t), can be specified by a sequence of random variables, ψ(1), ψ(2), ..., for periods t = 1, 2, ... such that the event ψ(t + 1) = j for j = 1, ..., J belongs to the σ-field generated by the past observed sequence ψ(1), Y1, ..., ψ(t), Yt.

That is, the agent observes $(Y_{j,1}, \ldots, Y_{j,n_j(t)})$ for the $n_j(t)$ periods when he selects $j$ and updates his prior beliefs on the distribution of $\theta_j$ from $\Pi^{(j)}$ to $\Pi^{(j)}_{n_j(t)}$.

Gittins’ Index is a function of the prior distribution of each independent option, $j$:

$$G(\Pi^{(j)}) = \sup_{\tau}\left\{\frac{\int E_{\theta_j}\left(\sum_{s=0}^{\tau-1}\delta^{s}Y_{j,s+1}\right)d\Pi^{(j)}(\theta_j)}{\int E_{\theta_j}\left(\sum_{s=0}^{\tau-1}\delta^{s}\right)d\Pi^{(j)}(\theta_j)}\right\}$$

where the supremum is over all stopping times $\tau \geq 1$ defined on $\{Y_{j,1}, Y_{j,2}, \ldots\}$. After $n_j(t)$ realizations under choice $j$, the Gittins’ Index for $j$ becomes $G(\Pi^{(j)}_{n_j(t)})$. The agent need only calculate $\{G(\Pi^{(1)}_{n_1(t)}), \ldots, G(\Pi^{(J)}_{n_J(t)})\}$ at $t$. He then chooses the option with the maximal $G(\cdot)$.

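An alternative way to express Gittins’ Index given $\Pi^{(j)} = \Pi^{(j)}_0$ is as the infimum of the set of solutions $M$ of the equation:

$$\sup_{\tau}\int E_{\theta_j}\left\{\sum_{s=0}^{\tau-1}\delta^{s}\int\mu(\theta_j)\,d\Pi^{(j)}_{s}(\theta_j) + M\sum_{s=\tau}^{\infty}\delta^{s}\right\}d\Pi^{(j)}(\theta_j) = M\sum_{s=0}^{\infty}\delta^{s}$$

where $\mu(\theta_j) = E_{\theta_j}(Y_j)$, the mean outcome under $j$ conditional on $\theta_j$.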

Intuitively, the agent solves an optimal stopping problem in which he decides between the choice $j$ and a standard project that yields a constant reward, $M$ (see Whittle (1980) and Brezzi and Lai (2002)). This ‘retirement’ value is specific to each option. The agent maximizes the expected discounted value of playing choice $j$ for the optimal number of periods $\tau$ given the posterior distribution $\Pi^{(j)}_{n_j(t)}$. In the remaining time from $\tau$ forward, he receives $M$. The infimum of the set of solutions for $M$ is equivalent to the Gittins’ Index, $G(\Pi^{(j)}_{n_j(t)})$, given a particular posterior distribution.
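The retirement-value formulation lends itself to a simple numerical illustration. The Python sketch below is not the normal-outcome setting of the main analysis: it assumes Bernoulli outcomes with a Beta(a, b) prior, truncates the dynamic program at a finite depth, and finds by bisection the per-period reward M at which playing the arm once more is worth exactly as much as retiring. The function name `gittins_index_bernoulli` and all parameter choices are illustrative assumptions.

```python
def gittins_index_bernoulli(a, b, delta=0.9, depth=100, tol=1e-6):
    """Approximate Gittins index of a Bernoulli arm with a Beta(a, b) prior.

    Calibration against a standard project paying M per period: the index is
    the M at which the value of pulling the arm at least once equals the
    retirement value M / (1 - delta). The horizon is truncated at `depth`.
    """

    def advantage(M):
        retire = M / (1.0 - delta)
        # Terminal layer (after `depth` pulls): either retire or keep the arm
        # forever at its current posterior mean.
        V = [max(retire, (a + i) / (a + b + depth) / (1.0 - delta))
             for i in range(depth + 1)]
        for n in range(depth - 1, -1, -1):
            layer = []
            for i in range(n + 1):          # i successes observed in n pulls
                p = (a + i) / (a + b + n)   # posterior mean of the success rate
                cont = p + delta * (p * V[i + 1] + (1.0 - p) * V[i])
                # At the root we force one pull; elsewhere retiring is allowed.
                layer.append(cont if n == 0 else max(retire, cont))
            V = layer
        return V[0] - retire

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if advantage(mid) > 0.0:
            lo = mid        # the arm still beats retirement: index is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Index rule: compute one index per option and pick the largest.
priors = {"option_1": (1, 1), "option_2": (2, 1), "option_3": (1, 2)}
indices = {name: gittins_index_bernoulli(a, b) for name, (a, b) in priors.items()}
chosen = max(indices, key=indices.get)
```

Each index exceeds the corresponding posterior mean because it also prices in the value of learning about the option.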

In the main analysis, outcomes take the following form:

$$Y_{j,t} = \mu_j + W_{j,t}, \quad \text{where } W_{j,t} \sim N(0, \sigma_j)$$

Again, agents do not know $\theta_j$ for $j = 1, \ldots, J$ at the time of the decision. They form a prior distribution over $\theta$, where $\theta_j = (\mu_j, \sigma_j)$. I reduce the dimensionality of $\theta = (\theta_1, \ldots, \theta_J)$ by setting $\mu_j = z_j'\beta$. Then, $\theta$ becomes $\theta = (\beta_1, \ldots, \beta_k, \sigma_1, \ldots, \sigma_k)$. Agents form priors $\Pi^{(r)}$ for the $r = 1, \ldots, k$ elements in $\theta$.

9.2 Diffusion Approximation to the Index Rule

To compute the index rule, I rely on a numerical solution provided by Chang and Lai

(1987). The authors numerically solve the optimal stopping problem underlying the

index rule by using a change of variables to reframe the problem as a Brownian motion.

Chang and Lai (1987) carry out Monte Carlo simulations to demonstrate the robustness

of this numerical solution.

The closed form approximation to the optimal stopping boundary in the Brownian motion problem takes the following form in the finite horizon case:

$$h(s) = \begin{cases}
\left\{2\log(s^{-1}) - \log(\log(s^{-1})) - \log(16\pi) + .99\exp(-.038\,s^{-1/2})\right\}^{1/2} & \text{if } 0 < s \leq .01 \\
-1.58\sqrt{s} + 1.53 + 0.07\,s^{-1/2} & \text{if } .01 < s \leq .28 \\
-.576\,s^{3/2} + .299\,s^{1/2} + .403\,s^{-1/2} & \text{if } .28 < s \leq .86 \\
s^{-1}(1 - s)^{1/2}\left\{.639 - .403(s^{-1} - 1)\right\} & \text{if } .86 < s \leq 1
\end{cases}$$

where $s = t/T$, with $T$ the finite horizon and $t$ the point in the episode along the sequence to $T$.

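In the discounted infinite horizon case, Chang and Lai (1987) replace $s$ with $(1-\beta)^{-1}$ and replace the closed-form $h(s)$ with $\psi(s)$:

$$\psi(s) = \begin{cases}
\sqrt{s/2} & \text{if } s \leq 0.2 \\
.49 - .11\,s^{-1/2} & \text{if } 0.2 < s \leq 1 \\
.63 - .26\,s^{-1/2} & \text{if } 1 < s \leq 5 \\
.77 - .58\,s^{-1/2} & \text{if } 5 < s \leq 15 \\
\left\{2\log(s) - \log(\log(s)) - \log(16\pi)\right\}^{1/2} & \text{if } s > 15
\end{cases}$$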
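Both boundary approximations are piecewise closed forms, so they translate directly into code. The Python sketch below is one possible transcription under two stated assumptions: the last finite-horizon branch depends on s alone (it uses $s^{-1} - 1$), and the point s = .28 belongs to the second branch. The function names `h_finite` and `psi_discounted` are hypothetical.

```python
import math


def h_finite(s: float) -> float:
    """Finite-horizon boundary approximation h(s), for 0 < s <= 1."""
    if not 0.0 < s <= 1.0:
        raise ValueError("s must lie in (0, 1]")
    inv = 1.0 / s
    if s <= 0.01:
        inner = (2.0 * math.log(inv) - math.log(math.log(inv))
                 - math.log(16.0 * math.pi)
                 + 0.99 * math.exp(-0.038 * math.sqrt(inv)))
        return math.sqrt(inner)
    if s <= 0.28:   # assumption: s = .28 is assigned to this branch
        return -1.58 * math.sqrt(s) + 1.53 + 0.07 / math.sqrt(s)
    if s <= 0.86:
        return -0.576 * s ** 1.5 + 0.299 * math.sqrt(s) + 0.403 / math.sqrt(s)
    # assumption: this branch is a function of s only
    return inv * math.sqrt(1.0 - s) * (0.639 - 0.403 * (inv - 1.0))


def psi_discounted(s: float) -> float:
    """Discounted infinite-horizon boundary approximation psi(s), for s > 0."""
    if s <= 0.0:
        raise ValueError("s must be positive")
    if s <= 0.2:
        return math.sqrt(s / 2.0)
    if s <= 1.0:
        return 0.49 - 0.11 / math.sqrt(s)
    if s <= 5.0:
        return 0.63 - 0.26 / math.sqrt(s)
    if s <= 15.0:
        return 0.77 - 0.58 / math.sqrt(s)
    return math.sqrt(2.0 * math.log(s) - math.log(math.log(s))
                     - math.log(16.0 * math.pi))


boundary_mid_episode = h_finite(0.5)   # halfway through a finite episode
boundary_unit = psi_discounted(1.0)    # discounted case evaluated at s = 1
```

Evaluating neighbouring branches at their shared breakpoints is a quick way to check a transcription like this one, since the finite-horizon pieces nearly agree there.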

9.3 Procedure for Second Stage Standard Errors

I design a procedure to approximate the two-step maximum likelihood asymptotic variance-covariance matrix derived in Newey (1984) and Murphy and Topel (2002). The two-step estimator satisfies the following conditions on the log-likelihood of the outcomes, $Y$:

$$\sum_i \frac{\partial L_1(Y_1; \hat{\theta}_1)}{\partial \theta_1} = 0, \qquad \sum_i \frac{\partial L_2(Y_2; \hat{\theta}_1, \hat{\theta}_2)}{\partial \theta_2} = 0$$

In cases where $\hat{\theta}_1$ is a consistent estimate of $\theta_1$, if the true parameters are $(\theta_1^*, \theta_2^*)$, the second stage maximization of $\frac{1}{n}\sum L_2(Y_2; \hat{\theta}_1, \theta_2)$ with respect to $\theta_2$ is equivalent to the maximization of $\frac{1}{n}\sum L_2(Y_2; \theta_1^*, \theta_2)$. Using expansions around $(\theta_1^*, \theta_2^*)$ and relying on the Central Limit Theorem to find the joint distribution of the partial derivatives of the likelihood functions, one can derive the variance of the limiting normal distribution for $\hat{\theta}_2$. It is a complex function of the partial derivatives of the likelihoods; these derivatives are difficult to compute in my setting. I approximate this solution using a bootstrap resampling procedure (see Efron and Tibshirani (1993)). Simulation here is particularly straightforward given that I have in storage draws from the posterior distribution of the mixed logit parameters, a consequence of my Bayesian MCMC procedure from the initial stage. I carry out the following steps:

1. Save the estimated hyperparameters of the βi normal distribution in the matrix

θ, where θ = (E(βi), var(βi)). From the Bayesian implementation of the mixed

logit estimation, I will have S draws on θ saved, along with S draws on the fixed

coefficient vector, α. I set S = 100.

2. For each of the ss = 1, ..., S vectors of parameter draws of (θ, α), take a draw

of β|di1, Xi, Zi, θ, α from its conditional posterior distribution using a Metropolis-

Hastings approach. Label it βi. Collect this for each individual in the sample,

conditioning on individual i’s initial decision, di1. I set a sample size of 3000

drawn from the full sample of patients using a multinomial distribution.

3. With (βi, α) for draw ss on θ, maximize the 2nd stage likelihood for the sample

of 3000 patients.

4. Repeat steps 1-3 for Ns samples drawn from the full sample of patients using a

multinomial distribution. I set Ns=20.

5. Save the vector of model variances that the optimization finds for each of ss = 1 :

S and nn = 1 : Ns. The vector of model variances accounts for sampling variation

in the initial stage mean parameters and in βi around its mean. The standard

error across these points approximates the standard error of the 2nd stage model

estimates.
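The structure of the steps above can be sketched in Python with NumPy. This is a simplified illustration, not the estimation code: the Metropolis-Hastings draw of the individual coefficients and the second-stage likelihood are folded into a hypothetical user-supplied function `fit_second_stage`, and the toy data, toy first-stage draws, and toy estimator are assumptions chosen only to make the example runnable.

```python
import numpy as np


def two_step_bootstrap_se(data, first_stage_draws, fit_second_stage,
                          n_resamples=20, sample_size=3000, seed=0):
    """Standard errors for second-stage estimates via resampling.

    For each of `n_resamples` multinomial resamples of the patients and each
    stored first-stage posterior draw, re-estimate the second stage; the
    spread of the estimates approximates the second-stage standard error.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    estimates = []
    for _ in range(n_resamples):
        # Multinomial resampling: how many times each patient is drawn.
        counts = rng.multinomial(sample_size, np.full(n, 1.0 / n))
        sample = data[np.repeat(np.arange(n), counts)]
        for draw in first_stage_draws:
            estimates.append(np.atleast_1d(fit_second_stage(sample, draw)))
    return np.std(np.asarray(estimates), axis=0, ddof=1)


# Toy illustration (all values hypothetical): the first stage supplies draws
# of a mean, and the second stage estimates a variance around that mean.
toy_rng = np.random.default_rng(1)
toy_data = toy_rng.normal(loc=2.0, scale=1.5, size=5000)
toy_draws = toy_rng.normal(loc=2.0, scale=0.05, size=100)   # S = 100 draws


def toy_second_stage(sample, first_stage_mean):
    """Maximum likelihood variance estimate given the first-stage mean."""
    return np.mean((sample - first_stage_mean) ** 2)


se = two_step_bootstrap_se(toy_data, toy_draws, toy_second_stage)
```

The resulting spread reflects both sources of variation named in step 5: sampling variation across resampled patients and uncertainty in the first-stage parameters.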

9.4 Determining the Dollar Value of Symptomatic Relief

To assign a dollar value to the relief of symptoms that prescription drug treatments

provide, I carry out the following procedure.

First, I save the predicted sequence of treatments for 10,000 patients from the sim-

ulation of the baseline policy and five counterfactual policies. I divide the sample of

patients in each counterfactual into 5 categories based on the treatments used within

their episode: (1) no drug care; (2) use of a TCA for one month or less, followed by exit

from drug care; (3) use of a TCA for greater than one month; (4) use of any non-TCA

or second generation drug class for one month or less; and, (5) use of any non-TCA

drug class for greater than 1 month. I choose these divisions to match a subset of the

treatment categories studied by Berndt et al. (2002). They employ a panel of med-

ical experts to rate the likelihood of both full and partial recovery given a patient’s

background and a treatment regime and duration. For example, in Table 1 of Berndt

et al. (2002), treatment under an SSRI with 1-3 office visits produces a probability of

full remission of .20, and a probability of partial remission of .45 over 16 weeks. If the

SSRI treatment lasts for greater than 30 days, those probabilities increase to .28 and

.60, respectively.

Given the probabilities of full and partial remission over 16 weeks for each of the 5

categories above, I calculate the expected number of weeks out of 16 that an individual

suffers either the full symptoms of depression or partial symptoms. I follow a similar

procedure to that of Cutler (2004). Following the medical literature, I set the recovery

rate at 0% in the first two weeks of treatment, before an antidepressant takes full

effect. In every week for the next 16 weeks, I set a fixed rate of full and partial recovery

for those who have yet to recover such that the share of full and partial recovery

patients at the end of the 16 week period equals the median rate predicted by the

expert panel in Berndt et al. (2002) for the category of treatment. I then use these

per-week probabilities to calculate the expected number of weeks without full or partial

recovery. In the case without drug care, the patient suffers from depression for 11.9 out

of the 16 weeks, with partial recovery in an additional 2.8 weeks. The expected weeks of depression and of partial depression under drug treatment differ somewhat by the type and duration of care: with less than one month of TCAs or SSRIs, the expected number of weeks equals approximately 10.3 and 3.9; for longer duration on SSRIs, the expected number of weeks of full and partial depression changes to 7.4 and 5.9, respectively.

Given the expected duration of complete or partial depressive symptoms, I multiply

the savings in “depression weeks” against the quality disutility from depression, esti-

mated in Lave et al. (1998) to equal -.41 for full depression. That is, patients equate 10

years living with depression to roughly 6 years living without depression. As in Cutler

(2004), I set the disutility from partial depression at half the full depression rate. Multi-

plying this savings in utility against $100,000 as the value of a year of life, I find the per

patient dollar value of treatment in each of the four treatment categories, normalizing

by the dollar value of no treatment. I multiply this by the share of patients in each

category in a particular counterfactual, summing across categories to get a total dollar

savings. Dividing by 10,000, I have the 16-week dollar savings of an average patient

under the counterfactual policy from decreased depressive symptoms. I report these

results in Table 9.
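The valuation step can be traced with a short Python sketch using the figures quoted above. The conversion of 52 weeks per year is an assumption not stated in the text, as are the function names and the example category shares, so the dollar amounts are illustrative rather than a reproduction of Table 9.

```python
DISUTILITY_FULL = 0.41                      # quality loss, full depression
DISUTILITY_PARTIAL = DISUTILITY_FULL / 2.0  # half the full-depression rate
VALUE_OF_LIFE_YEAR = 100_000.0              # dollars per year of life
WEEKS_PER_YEAR = 52.0                       # assumption: conversion factor


def quality_weeks_lost(full_weeks: float, partial_weeks: float) -> float:
    """Quality-adjusted weeks lost to full and partial depressive symptoms."""
    return DISUTILITY_FULL * full_weeks + DISUTILITY_PARTIAL * partial_weeks


def value_vs_no_care(full_weeks, partial_weeks,
                     base_full=11.9, base_partial=2.8):
    """Dollar value of a treatment category relative to no drug care."""
    saved = (quality_weeks_lost(base_full, base_partial)
             - quality_weeks_lost(full_weeks, partial_weeks))
    return saved / WEEKS_PER_YEAR * VALUE_OF_LIFE_YEAR


# Expected weeks of full and partial depression quoted in the text.
value_short = value_vs_no_care(10.3, 3.9)   # one month or less of drug care
value_long = value_vs_no_care(7.4, 5.9)     # longer duration on SSRIs

# Hypothetical category shares, only to show the averaging across patients.
shares = {"no_care": 0.3, "short": 0.3, "long": 0.4}
values = {"no_care": 0.0, "short": value_short, "long": value_long}
average_value = sum(shares[k] * values[k] for k in shares)
```

Comparing `average_value` across two share vectors, one per policy, gives the per-patient difference in health value between those policies.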

The counterfactual policies change the distribution of treatment types, driving the

observed differences in the dollar value gained. When a policy increases adherence and

promotes use of SSRIs, for example, patients shift toward treatment regimens with

higher recovery probabilities; the effect is to increase the expected number of weeks

with a relief of symptoms in the counterfactual relative to the baseline.
