
This presentation contains confidential and proprietary information of Caremark and cannot be reproduced, distributed, or printed without written permission from Caremark.

More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data: Adherence Dimension and Boosted Regression

M. Christopher Roebuck & Joshua N. Liberman

American Society of Health Economists Inaugural Conference
Madison, Wisc.
June 6, 2006


Predictive Modeling

- Forecasts health services utilization and costs of insurance plan members
- Identifies candidates for disease and therapy management interventions
- Infers disease state and severity; the statistical/econometric methods employed depend on the classification system used
- Includes well-known claims- and diagnosis-based “groupers,” such as the chronic disease score, adjusted clinical groups, diagnostic cost groups and episode risk groups

Study Background

- Pharmacy claims are less costly and contain fewer coding errors than medical data
- Pharmacy health dimensions (PHD): a pharmacy-based risk index that categorizes a year of prescription data into 62 disease indicators
- A previous study assessed PHD accuracy at predicting prospective total annual healthcare costs, using several econometric techniques to deal with skewness and kurtosis [1]
- This study is an extension of that work

Source: 1. Powers, C.A., C.M. Meyer, M.C. Roebuck, and B. Vaziri. 2005. “Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data: A Comparison of Alternative Econometric Cost Modeling Techniques.” Medical Care 43(11): 1065-1072.

Study Objectives

- Examine the relationship between plan participant adherence to drug therapy and future total healthcare costs by augmenting PHD with an adherence dimension of predictors
- Evaluate the use of boosted regression modeling as an alternative to other econometric approaches for predicting (commonly) skewed and kurtotic healthcare cost data

About Adherence

- Adherence is defined as “the extent to which a person’s behavior – taking medication, following a diet and/or executing lifestyle changes – corresponds with agreed recommendations from a healthcare provider.” [1]
- “Poor adherence to the treatment of chronic diseases is a worldwide problem of striking magnitude. The impact of poor adherence grows as the burden of chronic disease grows.” [1]
- Adherence to drug therapy is measured as both:
  - Compliance: the extent to which a plan participant takes medicine as prescribed (e.g., medication possession ratio).
  - Persistence: the extent to which a plan participant follows the prescribed length of therapy (e.g., length of continuous therapy in days).

Source: 1. World Health Organization. 2003. “Adherence to Long-term Therapies – Evidence for action.”
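The two adherence measures can be illustrated with a short Python sketch. The slides do not give the exact formulas, so everything below is an assumption made for illustration: compliance is approximated by a medication possession ratio (days supplied divided by days in the observation period, capped at 1), and persistence is counted from the first fill until supply lapses for longer than a hypothetical 30-day grace period. The function names and the fill-record layout are likewise hypothetical.

```python
from datetime import date, timedelta


def medication_possession_ratio(fills, period_start, period_end):
    """Assumed MPR: days supplied in the period / days in the period, capped at 1.

    fills is a list of (fill_date, days_supplied) tuples (hypothetical layout).
    """
    period_days = (period_end - period_start).days + 1
    supplied = sum(days for fill_date, days in fills
                   if period_start <= fill_date <= period_end)
    return min(supplied / period_days, 1.0)


def days_persistent(fills, period_end, grace_days=30):
    """Assumed persistence: days from the first fill until supply runs out
    and is not refilled within grace_days (an illustrative threshold)."""
    if not fills:
        return 0
    ordered = sorted(fills)
    first_fill, first_supply = ordered[0]
    run_out = first_fill + timedelta(days=first_supply)
    for fill_date, supply in ordered[1:]:
        if (fill_date - run_out).days > grace_days:
            break  # gap too long: therapy counted as discontinued
        # Early refills carry leftover supply forward.
        run_out = max(run_out, fill_date) + timedelta(days=supply)
    end = min(run_out, period_end + timedelta(days=1))
    return (end - first_fill).days


example_fills = [(date(2003, 1, 1), 30), (date(2003, 2, 5), 30), (date(2003, 6, 1), 30)]
print(medication_possession_ratio(example_fills, date(2003, 1, 1), date(2003, 12, 31)))
print(days_persistent(example_fills, date(2003, 12, 31)))
```

In this example the third fill arrives long after the second one has run out, so it adds to the possession ratio but not to the persistence count. Other definitions (different grace periods, different denominators) would give different values.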

About Boosted Regression

- Came out of computational learning, where the approach is called “boosting” [1]
- Extends to generalized linear models through the Gradient Boosting Machine [2, 3], which recast the algorithm in a likelihood framework
- Fits a regression tree to the residuals from the previously fitted regression tree (beginning with a “guess” of the response variable)
- Updates the fit sequentially to include all previously estimated regression trees, subject to two parameters specified a priori:
  - Number of “splits” (or N-way interactions)
  - Maximum number of iterations

Sources:
1. Freund, Y. and R.E. Schapire. 1997. “A decision-theoretic generalization of online learning and an application to boosting.” Journal of Computer and System Sciences 55(1): 119-139.
2. Friedman, J.H. 2001. “Greedy function approximation: a gradient boosting machine.” Annals of Statistics 29(5): 1189-1232.
3. Friedman, J.H. 2002. “Stochastic gradient boosting.” Computational Statistics and Data Analysis 38(4): 367-378.
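The residual-fitting idea can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the algorithm as implemented in any particular package: it uses squared-error loss, a single predictor, one-split trees ("stumps") and no bagging, and all function names are hypothetical.

```python
def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) for the one-split tree that
    minimizes the squared error of the residuals. Assumes x has at least two
    distinct values."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost_fit(x, y, max_iter=200, shrink=0.1):
    """Start from a constant guess, then repeatedly fit a stump to the
    current residuals and add a shrunken copy of it to the prediction."""
    initial_guess = sum(y) / len(y)
    predictions = [initial_guess] * len(y)
    stumps = []
    for _ in range(max_iter):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [p + shrink * (left_mean if xi <= threshold else right_mean)
                       for xi, p in zip(x, predictions)]
    return initial_guess, shrink, stumps


def boost_predict(model, xi):
    """Sum the initial guess and every shrunken stump contribution."""
    initial_guess, shrink, stumps = model
    return initial_guess + sum(
        shrink * (left_mean if xi <= threshold else right_mean)
        for threshold, left_mean, right_mean in stumps)


# Toy, made-up data with a skewed response.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [100, 120, 90, 110, 5000, 5200, 4800, 50000]
model = boost_fit(x, y)
print([round(boost_predict(model, xi)) for xi in x])
```

Here `max_iter` and `shrink` play the roles of the maximum number of iterations and the shrinkage rate; allowing more than one split per tree is what would let the model capture interactions between predictors.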

Data

- Utilized integrated medical and pharmacy claims data from a large (N=369,985) U.S. health plan
- Studied 2003 and 2004 data, which allowed for a baseline/follow-up design
- Included plan participants continuously eligible for pharmacy benefits for the entire study period
- Allowed for no other exclusions or restrictions (e.g., all ages and all claims remained in the study)
- Partitioned data randomly into 70% training and 30% validation samples
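A random 70/30 partition of this kind can be sketched as follows. The slides say only that the data were partitioned randomly, so the shuffling approach, the function name and the seed value are assumptions for illustration.

```python
import random


def split_train_validation(member_ids, train_fraction=0.7, seed=1):
    """Shuffle member IDs reproducibly and cut them into two samples."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(member_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


training, validation = split_train_validation(range(1000))
print(len(training), len(validation))
```

Splitting at the member level keeps each participant's baseline and follow-up records in the same sample.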

Methods

- Estimated five multivariate predictive models of total annual healthcare costs (pharmacy and medical) in the follow-up year for each of four conditions: diabetes, congestive heart failure, hypercholesterolemia, hypertension
- Included independent variables:
  - Continuous measure of baseline pharmacy costs
  - 14 age/gender categories
  - 62 PHD disease indicators
  - Average co-pay per day supplied
  - Percent mail service days supplied
  - Four adherence dimension variables:
    - Compliance and compliance²
    - Days persistent
    - Number of different drugs

Methods

Included five econometric modeling techniques:

- Ordinary least squares (OLS)
- Robust regression
- Two-part model: probit/OLS
- Two-part model: probit/GLM (gamma, log link)
- Boosted regression, with Stata command syntax:

  boost THC2004_T50 $RHS $OTH, influence distribution(normal) trainfraction(0.7) maxiter(1000) seed(1) bag(0.5) predict(HATS`m') interaction(3) shrink(0.01)

Results

Descriptive statistics

                                  Diabetes      Congestive Heart Failure  Hypercholesterolemia  Hypertension
                                  (N=13,202)    (N=22,243)                (N=33,597)            (N=60,028)
Total 2004 healthcare costs (medical + pharmacy)
  Minimum                         $0            $0                        $0                    $0
  Maximum                         $1,719,645    $3,398,680                $3,398,680            $3,398,680
  Mean                            $17,551       $16,127                   $13,366               $13,612
  Median                          $4,640        $3,673                    $3,726                $3,204
Mean compliance                   0.80          0.72                      0.80                  0.83
Mean days persistent              300           244                       278                   300
Mean number of different drugs    1.88          1.18                      1.21                  1.70

Results

Adherence dimension coefficient estimates from OLS model of prospective total annual healthcare costs (untruncated)

                            Diabetes      Congestive Heart Failure  Hypercholesterolemia  Hypertension
Days persistent             -$6           -$14***                   -$6*                  -$16***
Compliance                  $6,686        $20,562**                 -$4,390               $13,936***
Compliance²                 -$9,406       -$18,149***               $851                  -$12,851***
Number of different drugs   $370          $9,328***                 $177                  $1,199***

***p<0.01; **p<0.05; *p<0.10
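Because compliance enters the model with a squared term, its association with cost depends on the level of compliance. A minimal sketch of that arithmetic, using the congestive heart failure coefficients above and the mean compliance of 0.72 from the descriptive statistics, is given here. It is an interpretive aid only: it ignores standard errors, and the function names are hypothetical.

```python
def marginal_effect(b_linear, b_squared, compliance):
    """Derivative of b_linear*c + b_squared*c**2 with respect to c."""
    return b_linear + 2.0 * b_squared * compliance


def turning_point(b_linear, b_squared):
    """Compliance level at which the fitted quadratic changes direction."""
    return -b_linear / (2.0 * b_squared)


# Congestive heart failure, untruncated OLS model (values from the slide).
chf_linear, chf_squared = 20562.0, -18149.0
chf_mean_compliance = 0.72

print(marginal_effect(chf_linear, chf_squared, chf_mean_compliance))
print(turning_point(chf_linear, chf_squared))
```

For this column the fitted curve turns downward at a compliance of roughly 0.57, so at the sample mean of 0.72 a small increase in compliance is associated with lower predicted cost (a slope of about -$5,573 per unit of compliance).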

Results

Adherence dimension coefficient estimates from OLS model of prospective total annual healthcare costs (truncated at $50,000)

                            Diabetes      Congestive Heart Failure  Hypercholesterolemia  Hypertension
Days persistent             -$2           -$3***                    -$4***                -$4***
Compliance                  $330          $3,882**                  $2,234                $3,819***
Compliance²                 -$1,361       -$4,015***                -$2,314*              -$3,821***
Number of different drugs   $356**        $1,920***                 $585***               $658***

***p<0.01; **p<0.05; *p<0.10

Results

Validation sample summary results from predictive models of prospective total annual healthcare costs (untruncated)

Model                               Diabetes      Congestive Heart Failure  Hypercholesterolemia  Hypertension
OLS                                 20,559        19,547                    15,223                16,383
Robust                              15,131        14,464                    11,354                11,836
Two-Part: Probit-OLS                16,371        15,591                    13,915                14,730
Two-Part: Probit-GLM (gamma/log)    16,261        18,940                    15,755                18,215
Boosted                             17,322        19,844                    15,777                16,800

Values are mean absolute prediction errors.

Results

Validation sample summary results from predictive models of prospective total annual healthcare costs (truncated at $50,000)

Model                               Measure Diabetes      Congestive Heart Failure  Hypercholesterolemia  Hypertension
OLS                                 R²      .123          .142                      .117                  .120
                                    MAPE    8,816         8,086                     7,455                 7,496
Robust                              R²      .093          .119                      .098                  .107
                                    MAPE    7,450         6,815                     6,266                 6,118
Two-Part: Probit-OLS                R²      .085          .085                      .094                  .092
                                    MAPE    8,122         7,484                     7,126                 7,246
Two-Part: Probit-GLM (gamma/log)    R²      .065          .011                      .004                  .006
                                    MAPE    8,136         8,353                     7,912                 8,264
Boosted                             R²      .132          .148                      .124                  .131
                                    MAPE    8,769         8,061                     7,453                 7,467

MAPE = mean absolute prediction error.
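The two accuracy measures reported for the validation sample can be computed as in the sketch below. The slides do not state how R² was calculated out of sample, so the formula used here (one minus the ratio of residual to total sum of squares) is an assumption, as are the function names and the toy numbers.

```python
def truncate(costs, cap=50000.0):
    """Cap each cost at the truncation threshold."""
    return [min(c, cap) for c in costs]


def mean_absolute_prediction_error(actual, predicted):
    """Average absolute gap between actual and predicted cost."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Assumed definition: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up validation costs and predictions, for illustration only.
actual_costs = truncate([0.0, 1000.0, 4000.0, 120000.0])
predicted_costs = [500.0, 1500.0, 3000.0, 40000.0]
print(mean_absolute_prediction_error(actual_costs, predicted_costs))
print(r_squared(actual_costs, predicted_costs))
```

Because R² squares each error while the mean absolute prediction error does not, a model can rank well on one measure and poorly on the other, which is the pattern seen between the boosted and robust regressions.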

Conclusions

- Increased compliance was associated with a decrease in next year’s total healthcare costs
- Each additional day of persistent therapy was associated with a decrease of between $6 and $16 in next year’s total healthcare costs
- The magnitude of this association varied, as expected, by disease state
- The number of different drugs (filled within a given year and indicated for that disease state) was associated with an increase in next year’s total healthcare costs, likely signifying:
  - Treatment resistance/failure
  - Therapeutic aggressiveness/intensity
  - Disease severity

Conclusions cont.

- PHD provided classification and predictive power similar to other prescription-only risk-adjustment groupers
- Robust regression, as expected, always returned the lowest mean absolute prediction error
- While boosted regression did offer higher R², overfitting was evident in untruncated models (Note: this study did not attempt to respecify the boosting parameters)
- Unfortunately, the currently available user-written command BOOST does not output the regression tree structure for application in other samples
- Boosting is useful in uncovering important interaction terms

Limitations

- The potential endogeneity of the adherence measures was not examined.
- Non-adherent behavior may not alter next year’s total healthcare costs, but may affect future periods’ total healthcare costs.
- The study sample came from a single national health plan, so the results may not be generalizable.
- Other measures of accuracy (positive predictive value) need to be considered.
- The boosting specification needs to be adjusted to reduce overfitting.
