This presentation contains confidential and proprietary information of Caremark and cannot be reproduced, distributed, or printed without written permission from Caremark.
©2006 Caremark. All rights reserved.
More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data: Adherence Dimension and Boosted Regression
M. Christopher Roebuck & Joshua N. Liberman
American Society of Health Economists Inaugural Conference
Madison, Wisc., June 6, 2006
Caremark proprietary and confidential information. Not for distribution.
Predictive Modeling
- Forecasts health services utilization and costs of insurance plan members
- Identifies candidates for disease and therapy management interventions
- Infers disease state, severity and […] statistical/econometric methods employed, depending on the classification system used
- Includes well-known claims- and diagnosis-based “groupers,” such as the chronic disease score, adjusted clinical groups, diagnostic cost groups and episode risk groups
Study Background
- Pharmacy claims are less costly and contain fewer coding errors than medical data
- Pharmacy health dimensions (PHD): a pharmacy-based risk index that categorizes a year of prescription data into 62 disease indicators
- A previous study assessed the accuracy of PHD at predicting prospective total annual healthcare costs, using several econometric techniques to deal with skewness and kurtosis¹
- This study is an extension of that work

¹ Powers, C.A., C.M. Meyer, M.C. Roebuck, and B. Vaziri. 2005. “Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data: A Comparison of Alternative Econometric Cost Modeling Techniques.” Medical Care 43(11): 1065-1072.
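As a rough illustration of how a pharmacy-based index can turn a year of prescription claims into disease indicators, the sketch below assigns each member a 0/1 flag per indicator. The PHD's actual 62 indicators and their drug definitions are not given in this presentation; `DRUG_CLASS_TO_INDICATOR`, the drug-class labels and the function name are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical drug-class-to-indicator mapping; the real PHD uses 62
# indicators whose definitions are not reproduced here.
DRUG_CLASS_TO_INDICATOR = {
    "insulin": "diabetes",
    "sulfonylurea": "diabetes",
    "statin": "hypercholesterolemia",
    "ace_inhibitor": "hypertension",
}


def disease_indicators(claims, indicators):
    """Collapse one year of (member_id, drug_class) claims into 0/1 flags.

    Returns {member_id: {indicator: 0 or 1}} for every member seen in claims.
    """
    flags = defaultdict(lambda: dict.fromkeys(indicators, 0))
    for member_id, drug_class in claims:
        row = flags[member_id]  # creates an all-zero row on first sight
        indicator = DRUG_CLASS_TO_INDICATOR.get(drug_class)
        if indicator is not None:
            row[indicator] = 1
    return dict(flags)
```

With claims `[("A", "statin"), ("A", "insulin"), ("B", "antibiotic")]`, member A is flagged for diabetes and hypercholesterolemia, while member B (an unmapped drug class) still appears in the output with all flags at 0.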
Study Objectives
- Examine the relationship between plan participant adherence to drug therapy and future total healthcare costs by augmenting PHD with an adherence dimension of predictors
- Evaluate boosted regression modeling as an alternative to other econometric approaches for predicting healthcare cost data, which are commonly skewed and kurtotic
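Healthcare cost data are typically right-skewed and heavy-tailed; in the descriptive statistics later in this deck, mean costs run roughly 3.6 to 4.4 times the medians. A minimal sketch of the moment-based skewness and excess kurtosis statistics, using population (not bias-corrected) moments as a simplifying assumption; the function name and sample data are illustrative only.

```python
def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) computed from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5             # > 0 for a long right tail
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurt


# A small, made-up cost sample: mostly low values plus one very large claim.
costs = [0, 0, 100, 200, 5000]
skew, kurt = skewness_kurtosis(costs)
```

A symmetric sample such as `[1, 2, 3, 4, 5]` gives a skewness of 0, whereas the made-up cost sample gives a clearly positive skewness driven by its single large value.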
About Adherence
- Adherence is defined as “the extent to which a person’s behavior—taking medication, following a diet and/or executing lifestyle changes—corresponds with agreed recommendations from a healthcare provider.”¹
- “Poor adherence to the treatment of chronic diseases is a worldwide problem of striking magnitude. The impact of poor adherence grows as the burden of chronic disease grows.”¹
- Adherence to drug therapy is measured as both:
  - Compliance: the extent to which a plan participant takes medicine as prescribed (e.g., medication possession ratio)
  - Persistence: the extent to which a plan participant follows the prescribed length of therapy (e.g., length of continuous therapy in days)

Source: ¹ World Health Organization. 2003. “Adherence to Long-term Therapies: Evidence for Action.”
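Both adherence measures can be computed from fill dates and days supplied. The sketch below shows one common way to operationalize them, not necessarily the study's: the MPR denominator is assumed to be the full observation period with the ratio capped at 1, and persistence is assumed to end at the first gap longer than a 30-day grace period (`max_gap_days`). Function names are illustrative.

```python
from datetime import date, timedelta


def medication_possession_ratio(fills, period_start, period_end):
    """Days supplied by fills within the period / days in the period, capped at 1.

    fills: list of (fill_date, days_supply) tuples. Supply extending past
    period_end is not trimmed (a simplification).
    """
    period_days = (period_end - period_start).days + 1  # inclusive of both ends
    supplied = sum(days for fill_date, days in fills
                   if period_start <= fill_date <= period_end)
    return min(supplied / period_days, 1.0)


def days_persistent(fills, max_gap_days=30):
    """Days of continuous therapy from the first fill until the first gap
    longer than max_gap_days (an assumed grace period) between the end of
    the on-hand supply and the next fill."""
    fills = sorted(fills)
    start, first_days = fills[0]
    supply_end = start + timedelta(days=first_days)  # first day without drug
    for fill_date, days in fills[1:]:
        if (fill_date - supply_end).days > max_gap_days:
            break
        # Early refills carry over: new supply starts when the old one runs out.
        supply_end = max(supply_end, fill_date) + timedelta(days=days)
    return (supply_end - start).days


fills = [(date(2003, 1, 1), 30), (date(2003, 1, 31), 30), (date(2003, 3, 2), 30)]
mpr = medication_possession_ratio(fills, date(2003, 1, 1), date(2003, 12, 31))
persist = days_persistent(fills)  # three back-to-back 30-day fills: 90 days
```

Here three consecutive 30-day fills give an MPR of 90/365 (about 0.25) for calendar 2003 but 90 days of persistence, which illustrates why the two dimensions are modeled separately.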
About Boosted Regression
- Came out of computational learning, where it is called “boosting”¹
- Extends to generalized linear models via the Gradient Boosting Machine²,³, which recast the algorithm in a likelihood framework
- Fits a regression tree to the residuals from the previously fitted regression tree (beginning with an initial “guess” of the response variable)
- Updates the model sequentially so that it includes all previously estimated regression trees; two parameters are specified a priori:
  - Number of “splits” per tree (i.e., N-way interactions)
  - Maximum number of iterations

Sources:
1. Freund, Y., and R.E. Schapire. 1997. “A decision-theoretic generalization of on-line learning and an application to boosting.” Journal of Computer and System Sciences 55(1): 119-139.
2. Friedman, J.H. 2001. “Greedy function approximation: a gradient boosting machine.” Annals of Statistics 29(5): 1189-1232.
3. Friedman, J.H. 2002. “Stochastic gradient boosting.” Computational Statistics and Data Analysis 38(4): 367-378.
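The fitting loop described above can be sketched for squared-error loss, where the negative gradient is simply the residual. This toy version is deliberately simpler than the Stata boost call used in the study: it uses one predictor, single-split trees (stumps, so no interactions), no subsampling, and a fixed shrinkage factor. The function names are hypothetical.

```python
def fit_stump(x, r):
    """Fit a one-split regression tree to residuals r on a single predictor x.

    Returns (threshold, left_value, right_value); observations with
    x <= threshold get left_value, the rest get right_value.
    """
    pairs = sorted(zip(x, r))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split point between tied x values
        left = [ri for _, ri in pairs[:i]]
        right = [ri for _, ri in pairs[i:]]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((v - lmean) ** 2 for v in left)
               + sum((v - rmean) ** 2 for v in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, lmean, rmean)
    if best is None:  # all x identical: a constant fit on the mean residual
        m = sum(r) / len(r)
        return float("inf"), m, m
    return best[1], best[2], best[3]


def gradient_boost(x, y, n_iter=100, shrink=0.1):
    """Squared-error gradient boosting with stumps; returns a model tuple."""
    f0 = sum(y) / len(y)  # initial "guess": the mean response
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_iter):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        t, lv, rv = fit_stump(x, residuals)
        stumps.append((t, lv, rv))
        pred = [pi + shrink * (lv if xi <= t else rv)
                for xi, pi in zip(x, pred)]
    return f0, shrink, stumps


def boost_predict(model, xi):
    """Sum the initial guess and every shrunken stump's contribution."""
    f0, shrink, stumps = model
    return f0 + shrink * sum(lv if xi <= t else rv for t, lv, rv in stumps)
```

On `x = [1, 2, 3, 4, 5, 6]` and `y = [1, 1, 1, 5, 5, 5]`, each iteration removes 10% of the remaining residual, so after 100 iterations the fitted values are within about 0.0001 of the observed ones. A smaller shrinkage (the study used 0.01) slows learning and usually needs more iterations (the study allowed up to 1,000); the study's bag(0.5) option appears to correspond to the random subsampling of stochastic gradient boosting (Friedman 2002), which this sketch omits.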
Data
- Utilized integrated medical and pharmacy claims data from a large (N=369,985) U.S. health plan
- Studied 2003 and 2004 data, which allowed for a baseline/follow-up design
- Included plan participants continuously eligible for pharmacy benefits for the entire study period
- Allowed for no other exclusions or restrictions (e.g., all ages and all claims remained in the study)
- Partitioned data randomly into 70% training and 30% validation samples
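A random 70/30 partition at the member level can be sketched as follows. The fixed seed is an assumption, echoing the seed(1) option in the study's Stata command, and makes the split reproducible; the function name is illustrative.

```python
import random


def train_validation_split(member_ids, train_fraction=0.7, seed=1):
    """Randomly partition member IDs into training and validation samples."""
    rng = random.Random(seed)  # private, reproducible generator
    shuffled = list(member_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

For N=369,985 members this puts 258,989 members in training and 110,996 in validation. Splitting by member, rather than by claim, keeps all of a member's claims in one sample.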
Methods
- Used five multivariate modeling techniques to predict total annual healthcare costs (pharmacy and medical) in the follow-up year, estimated separately for four conditions: diabetes, congestive heart failure, hypercholesterolemia and hypertension
- Included independent variables:
  - Continuous measure of baseline pharmacy costs
  - 14 age/gender categories
  - 62 PHD disease indicators
  - Average co-pay per day supplied
  - Percent mail service days supplied
  - Four adherence dimension variables:
    - Compliance and compliance² (squared)
    - Days persistent
    - Number of different drugs
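Several of these predictors are simple member-level aggregates of baseline-year claims. A minimal sketch, assuming each fill is a dict with hypothetical keys `drug`, `copay`, `days_supply` and `mail` (True for mail-service fills), and that compliance has already been computed (e.g., as an MPR):

```python
def member_features(fills, compliance):
    """Build a few of the study's predictors for one member's baseline year."""
    if not fills:
        raise ValueError("member has no fills")
    total_days = sum(f["days_supply"] for f in fills)
    mail_days = sum(f["days_supply"] for f in fills if f["mail"])
    return {
        "avg_copay_per_day": sum(f["copay"] for f in fills) / total_days,
        "pct_mail_days": 100.0 * mail_days / total_days,
        "compliance": compliance,
        "compliance_sq": compliance ** 2,  # quadratic term lets the effect bend
        "n_different_drugs": len({f["drug"] for f in fills}),
    }
```

Because the study counted only drugs indicated for the disease state, the fills would need to be filtered to that condition's drugs before counting `n_different_drugs`.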
Methods
- Included five econometric modeling techniques:
  - Ordinary least squares (OLS)
  - Robust regression
  - Two-part model: probit/OLS
  - Two-part model: probit/GLM (gamma, log link)
  - Boosted regression, using the Stata command:

boost THC2004_T50 $RHS $OTH, influence distribution(normal) trainfraction(0.7) maxiter(1000) seed(1) bag(0.5) predict(HATS`m') interaction(3) shrink(0.01)
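The two-part models predict expected cost as the probability of any cost times the expected cost among members with any: E[y|x] = Φ(xγ) · E[y | y > 0, x]. A sketch of that prediction step, given already-estimated coefficients; the coefficient vectors are hypothetical inputs, and the OLS second part is assumed to be fit on untransformed costs. The gamma family affects how the GLM is estimated, not this mean formula.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, used for the probit first part."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_part_prediction(x, gamma, beta, second_part="glm_log"):
    """Return P(y > 0 | x) * E[y | y > 0, x].

    gamma: probit coefficients; beta: second-part coefficients
    (intercept first in both). 'ols' uses x'beta; 'glm_log' uses exp(x'beta).
    """
    xb_probit = gamma[0] + sum(g * xi for g, xi in zip(gamma[1:], x))
    xb = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    p_any = normal_cdf(xb_probit)
    if second_part == "glm_log":
        cond_mean = math.exp(xb)  # log link keeps predictions positive
    elif second_part == "ols":
        cond_mean = xb
    else:
        raise ValueError("second_part must be 'glm_log' or 'ols'")
    return p_any * cond_mean
```

With a probit index of 0 (a 50% chance of any cost) and a conditional mean of 1,000, both versions predict an expected cost of 500; they differ only in how the conditional mean responds to the covariates.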
Results
Descriptive statistics

| Statistic | Diabetes (N=13,202) | Congestive heart failure (N=22,243) | Hypercholesterolemia (N=33,597) | Hypertension (N=60,028) |
|---|---|---|---|---|
| Total 2004 healthcare costs (medical + pharmacy): minimum | $0 | $0 | $0 | $0 |
| Total 2004 healthcare costs: maximum | $1,719,645 | $3,398,680 | $3,398,680 | $3,398,680 |
| Total 2004 healthcare costs: mean | $17,551 | $16,127 | $13,366 | $13,612 |
| Total 2004 healthcare costs: median | $4,640 | $3,673 | $3,726 | $3,204 |
| Mean compliance | 0.80 | 0.72 | 0.80 | 0.83 |
| Mean days persistent | 300 | 244 | 278 | 300 |
| Mean number of different drugs | 1.88 | 1.18 | 1.21 | 1.70 |
Results
Adherence dimension coefficient estimates from OLS model of prospective total annual healthcare costs (untruncated)

| Variable | Diabetes | Congestive heart failure | Hypercholesterolemia | Hypertension |
|---|---|---|---|---|
| Days persistent | -$6 | -$14*** | -$6* | -$16*** |
| Compliance | $6,686 | $20,562** | -$4,390 | $13,936*** |
| Compliance² | -$9,406 | -$18,149*** | $851 | -$12,851*** |
| Number of different drugs | $370 | $9,328*** | $177 | $1,199*** |

***p<0.01; **p<0.05; *p<0.10
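Because compliance enters the OLS model as a quadratic, its coefficient cannot be read in isolation: the marginal effect at compliance c is b1 + 2·b2·c, and when b2 < 0 the fitted curve peaks at c* = -b1/(2·b2). The sketch below applies this arithmetic to the untruncated point estimates above for the two conditions where both terms were statistically significant, evaluated at each condition's mean compliance from the descriptive statistics. It ignores sampling uncertainty, and the function names are illustrative.

```python
def compliance_effect(b_linear, b_quad, compliance):
    """Marginal effect d(cost)/d(compliance) = b1 + 2*b2*c."""
    return b_linear + 2.0 * b_quad * compliance


def turning_point(b_linear, b_quad):
    """Compliance level where the fitted quadratic peaks (b2 < 0): -b1/(2*b2)."""
    return -b_linear / (2.0 * b_quad)


# (b1, b2, mean compliance): untruncated OLS estimates and descriptive means.
estimates = {
    "CHF": (20562.0, -18149.0, 0.72),
    "Hypertension": (13936.0, -12851.0, 0.83),
}
for name, (b1, b2, c_mean) in estimates.items():
    print(name, round(turning_point(b1, b2), 3), round(compliance_effect(b1, b2, c_mean)))
# CHF 0.566 -5573
# Hypertension 0.542 -7397
```

Under this arithmetic, predicted costs rise with compliance up to about 0.57 (CHF) or 0.54 (hypertension) and fall thereafter. At the mean compliance levels (0.72 and 0.83), the marginal effects are roughly -$5,573 and -$7,397 per unit of compliance, i.e., about -$557 and -$740 for a 0.10 increase.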
Results
Adherence dimension coefficient estimates from OLS model of prospective total annual healthcare costs (truncated at $50,000)

| Variable | Diabetes | Congestive heart failure | Hypercholesterolemia | Hypertension |
|---|---|---|---|---|
| Days persistent | -2 | -3*** | -4*** | -4*** |
| Compliance | 330 | 3,882** | 2,234 | 3,819*** |
| Compliance² | -1,361 | -4,015*** | -2,314* | -3,821*** |
| Number of different drugs | 356** | 1,920*** | 585*** | 658*** |

***p<0.01; **p<0.05; *p<0.10
Results
Validation sample summary results from predictive models of prospective total annual healthcare costs (untruncated)

| Model | Metric | Diabetes | Congestive heart failure | Hypercholesterolemia | Hypertension |
|---|---|---|---|---|---|
| OLS | R² | .020 | .026 | .036 | .025 |
| OLS | MAPE* | 20,559 | 19,547 | 15,223 | 16,383 |
| Robust | R² | .016 | .018 | .031 | .023 |
| Robust | MAPE | 15,131 | 14,464 | 11,354 | 11,836 |
| Two-part: probit-OLS | R² | .021 | .031 | .027 | .028 |
| Two-part: probit-OLS | MAPE | 16,371 | 15,591 | 13,915 | 14,730 |
| Two-part: probit-GLM (gamma/log) | R² | .013 | .020 | .001 | .007 |
| Two-part: probit-GLM (gamma/log) | MAPE | 16,261 | 18,940 | 15,755 | 18,215 |
| Boosted | R² | .005 | .017 | .005 | .007 |
| Boosted | MAPE | 17,322 | 19,844 | 15,777 | 16,800 |

* Mean absolute prediction error
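Both validation metrics can be computed from actual and predicted follow-up-year costs in the hold-out sample. In this sketch R² is assumed to be 1 - SSE/SST; the study may instead have used the squared correlation between actual and predicted costs. Unlike in-sample R², this version can be negative out of sample. The function name is illustrative.

```python
def validation_metrics(actual, predicted):
    """Return (R2, mean absolute prediction error) on a hold-out sample."""
    n = len(actual)
    mean_y = sum(actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_y) ** 2 for a in actual)
    mape = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    return 1.0 - sse / sst, mape
```

Predicting every member at the sample mean gives an R² of 0 but can still yield a moderate MAPE, which is one reason the two metrics can rank models differently, as in the table above.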
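Results
Validation sample summary results from predictive models of prospective total annual healthcare costs (truncated at $50,000)

| Model | Metric | Diabetes | Congestive heart failure | Hypercholesterolemia | Hypertension |
|---|---|---|---|---|---|
| OLS | R² | .123 | .142 | .117 | .120 |
| OLS | MAPE | 8,816 | 8,086 | 7,455 | 7,496 |
| Robust | R² | .093 | .119 | .098 | .107 |
| Robust | MAPE | 7,450 | 6,815 | 6,266 | 6,118 |
| Two-part: probit-OLS | R² | .085 | .085 | .094 | .092 |
| Two-part: probit-OLS | MAPE | 8,122 | 7,484 | 7,126 | 7,246 |
| Two-part: probit-GLM (gamma/log) | R² | .065 | .011 | .004 | .006 |
| Two-part: probit-GLM (gamma/log) | MAPE | 8,136 | 8,353 | 7,912 | 8,264 |
| Boosted | R² | .132 | .148 | .124 | .131 |
| Boosted | MAPE | 8,769 | 8,061 | 7,453 | 7,467 |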
Conclusions
- Increased compliance was associated with a decrease in next year’s total healthcare costs
- Each additional day of persistent therapy was associated with a decrease of between $6 and $16 in next year’s total healthcare costs (untruncated OLS models)
- The magnitude of these associations varied, as expected, by disease state
- The number of different drugs (filled within a given year and indicated for that disease state) was associated with higher total healthcare costs the next year, likely signifying:
  - Treatment resistance/failure
  - Therapeutic aggressiveness/intensity
  - Disease severity
Conclusions (cont.)
- PHD provided classification and predictive power similar to other prescription-only risk-adjustment groupers
- Robust regression, as expected, always returned the lowest mean absolute prediction error
- Boosted regression offered the highest R² in the truncated models, but overfitting was evident in the untruncated models (note: this study did not attempt to respecify the boosting parameters)
- Unfortunately, the currently available user-written Stata command boost does not output the regression tree structure for application in other samples
- Boosting is useful in uncovering important interaction terms
Limitations
- The potential endogeneity of the adherence measures was not examined.
- Non-adherent behavior may not alter next year’s total healthcare costs, but may affect total healthcare costs in later periods.
- The study sample was drawn from a single national health plan, so the results are not necessarily generalizable.
- Need to consider other measures of accuracy (e.g., positive predictive value).
- Need to tune the boosting specification to reduce overfitting.