TRANSCRIPT
Evaluating Risk Adjustment Models
Andy Bindman MD
Department of Medicine, Epidemiology and Biostatistics
Evaluating Model’s Predictive Power
Linear regression (continuous outcomes)
Logistic regression (dichotomous outcomes)
Evaluating Linear Regression Models
R2 is the percentage of variation in outcomes explained by the model - best for continuous dependent variables
– Length of stay
– Health care costs
Ranges from 0-100%
Generally, more is better
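As an illustration, R2 can be computed directly from observed and predicted outcomes; the length-of-stay numbers below are made up for the sketch:

```python
import numpy as np

def r_squared(y, y_pred):
    """Proportion of variation in outcomes explained by the model."""
    ss_res = np.sum((y - y_pred) ** 2)       # unexplained variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation around the mean
    return 1 - ss_res / ss_tot

# toy example: observed length of stay (days) vs. model predictions
y = np.array([2.0, 5.0, 3.0, 8.0, 4.0])
y_pred = np.array([3.0, 4.5, 3.5, 7.0, 4.0])
print(f"R^2 = {r_squared(y, y_pred):.2f}")   # -> R^2 = 0.88
```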
Risk Adjustment Models
Typically explain only 20-25% of variation in health care utilization
Explaining this amount of variation can be important if remaining variation is extremely random
Example: supports equitable allocation of capitation payments from health plans to providers
More to Modeling than Numbers
R2 biased upward by more predictors
The approach to categorizing outliers can affect R2, since predicting less skewed data gives a higher R2
Model subject to random tendencies of particular dataset
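The upward bias from adding predictors can be countered with adjusted R2, a standard correction not named on the slide; the R2 values and sample sizes below are hypothetical:

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2: n = observations, k = predictors (excluding intercept).
    Penalizes extra predictors, offsetting the upward bias of plain R^2."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# plain R^2 creeps up with every added predictor, even a useless one;
# adjusted R^2 rises only if the predictor earns its keep
print(adjusted_r_squared(0.25, 1000, 10))    # ~0.242
print(adjusted_r_squared(0.251, 1000, 30))   # ~0.228: tiny R^2 gain, lower adjusted R^2
```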
Evaluating Logistic Models
Discrimination - accuracy of predicting outcomes among all individuals depending on their characteristics
Calibration - how well prediction works across the range of risk
Discrimination
C index - compares all random pairs of individuals in each outcome group (alive vs dead) to see if risk adjustment model predicts a higher likelihood of death for those who died (concordant)
Ranges from 0-1 based on proportion of concordant pairs and half of ties
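A minimal sketch of the pairwise c index calculation described above, using made-up predicted risks of death:

```python
from itertools import product

def c_index(pred_dead, pred_alive):
    """Proportion of concordant pairs plus half of ties, over all pairs
    of one decedent and one survivor."""
    concordant = ties = 0
    for risk_dead, risk_alive in product(pred_dead, pred_alive):
        if risk_dead > risk_alive:
            concordant += 1          # model ranked the decedent higher
        elif risk_dead == risk_alive:
            ties += 1
    n_pairs = len(pred_dead) * len(pred_alive)
    return (concordant + 0.5 * ties) / n_pairs

died = [0.9, 0.6, 0.4]    # predicted risks for patients who died
alive = [0.1, 0.4, 0.3]   # predicted risks for survivors
print(c_index(died, alive))   # 8 concordant pairs + half of 1 tie, of 9 pairs
```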
Adequacy of Risk Adjustment Models
C index of 0.5 no better than random
C index of 1.0 indicates perfect prediction
Typical risk adjustment models 0.7-0.8
C statistic
Area under ROC curve for a predictive model no better than chance at predicting death is 0.5
Models with improved prediction of death by:
– 0.5 SDs better than chance results in c statistic = 0.64
– 1.0 SDs better than chance results in c statistic = 0.76
– 1.5 SDs better than chance results in c statistic = 0.86
– 2.0 SDs better than chance results in c statistic = 0.92
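These figures are consistent with the equal-variance binormal model, under which the c statistic equals Phi(d/sqrt(2)) when the two groups' risk scores are unit-variance normals separated by d standard deviations; a sketch assuming that model:

```python
import math

def c_statistic(d_sds):
    """Area under the ROC curve for two unit-variance normal score
    distributions whose means are d_sds standard deviations apart."""
    # Phi(d/sqrt(2)) written via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * (1 + math.erf(d_sds / 2))

for d in (0.5, 1.0, 1.5, 2.0):
    print(f"{d} SDs better than chance -> c = {c_statistic(d):.2f}")
```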
Best Model Doesn’t Always Have Biggest C statistic
Adding health conditions that result from complications will raise the c statistic of a model but not make the model better for predicting quality.
Spurious Assessment of Model Performance
Missing values can lead to some patients being dropped from models
Be certain when comparing models that the same group of patients is being used for all models; otherwise, comparisons may reflect more than model performance
Calibration - Hosmer-Lemeshow
Size of C index does not indicate how well model performs across range of risk
Stratify individuals into groups (e.g. 10 groups) of equal size according to predicted likelihood of adverse outcome (eg death)
Compare actual vs expected outcomes for each stratum
Want a non-significant p value for each stratum and across strata (Hosmer-Lemeshow statistic)
Hosmer-Lemeshow
For k strata the chi squared has k-2 degrees of freedom
Can obtain a false negative (non-significant p value) by having too few cases in a stratum
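A sketch of the Hosmer-Lemeshow calculation on simulated data; the closed-form chi-squared survival function below is valid only for an even number of degrees of freedom, which holds here since k - 2 = 8:

```python
import math
import numpy as np

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-squared variable; closed form for even df."""
    m = df // 2
    term, total = 1.0, 1.0
    for i in range(1, m):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

def hosmer_lemeshow(y, p, k=10):
    """y: 0/1 outcomes; p: predicted probabilities; k equal-size strata."""
    order = np.argsort(p)
    strata = np.array_split(order, k)          # equal-size groups by predicted risk
    hl = 0.0
    for idx in strata:
        n = len(idx)
        obs = y[idx].sum()                     # observed adverse outcomes in stratum
        exp = p[idx].sum()                     # expected adverse outcomes in stratum
        hl += (obs - exp) ** 2 / (exp * (1 - exp / n))
    return hl, chi2_sf_even_df(hl, k - 2)      # k-2 degrees of freedom

rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.5, 1000)
y = (rng.uniform(size=1000) < p).astype(int)   # outcomes drawn from the model itself
hl, pval = hosmer_lemeshow(y, p)
print(f"HL = {hl:.2f}, p = {pval:.3f}")        # typically non-significant here,
                                               # since the model is well calibrated by construction
```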
Calculating Expected Outcomes
Solve the multivariate model incorporating an individual’s specific characteristics
For continuous outcomes the predicted values are the expected values
For dichotomous outcomes the sum of the derived predictor variables produces a “logit” which can be algebraically converted to a probability:
p = e^(logit) / (1 + e^(logit))
Individual’s CABG Mortality Risk
A 65-year-old obese non-white woman with diabetes and a serum creatinine of 1 mg/dl presents with an urgent need for CABG surgery. What is her risk of death?
Individual’s Predicted CABG Mortality Risk
A 65-year-old obese non-white woman with diabetes presents with an urgent need for CABG surgery. What is her risk of death?
Log odds = -9.74 + 65(0.06) + 0.37 + 0.16 + 0.42 + 0.26 + 1(1.15) + 0.09 = -3.39
Probability of death = e^(-3.39)/(1 + e^(-3.39)) = 0.034/1.034 = 3.3%
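The worked example can be reproduced in a few lines; the pairing of each coefficient with a patient characteristic is my reading of the slide and should be treated as an assumption:

```python
import math

# coefficients as listed on the slide (assumed order: intercept, age,
# obesity, non-white, female, diabetes, creatinine per mg/dl, urgent surgery)
log_odds = -9.74 + 65 * 0.06 + 0.37 + 0.16 + 0.42 + 0.26 + 1 * 1.15 + 0.09

# convert the logit to a probability: p = e^logit / (1 + e^logit)
prob = math.exp(log_odds) / (1 + math.exp(log_odds))
print(f"log odds = {log_odds:.2f}, risk of death = {prob:.1%}")  # -3.39, 3.3%
```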
Observed CABG Mortality Risk
Actual outcome of whether individual lived or died
Observed rate for a group is number of deaths per the number of people in that group
Actual and Expected CABG Surgery Mortality Rates by Patient Severity of Illness in New York
Range of Expected Mortality Rate, %   No. of Patients   Actual Hospital Mortality Rate, %   Expected Hospital Mortality Rate, %   P for Difference
0.20-0.70 5719 0.38 0.57 .06
0.70-0.92 5718 0.73 0.82 .50
0.92-1.13 5719 0.75 1.01 .05
1.13-1.38 5719 1.26 1.25 .96
1.38-1.68 5719 1.40 1.52 .45
1.68-2.12 5718 1.94 1.89 .79
2.12-2.74 5719 2.36 2.40 .84
2.74-3.82 5719 3.29 3.22 .78
3.82-6.25 5718 5.33 4.82 .07
6.25-93.20 5719 13.60 13.55 .90
Overall chi-squared p = .16
Goodness-of-fit tests for AMI mortality models
Model A Model B
Number of cases 5,442 5,415
Number of deaths 1,044 1,039
Death rate, % 19.18 19.19
Model chi square 721.73 1,276.49
df 13 25
p value 0.0001 0.0001
C statistic 0.759 0.830
Hosmer-Lemeshow statistic 14.92 27.24
df 8 8
p value 0.0607 0.0006
Stratifying by Risk
Hosmer-Lemeshow provides a summary statistic of how well a model is calibrated
Also useful to look at how well model performs at extremes (high risk and low risk)
Validating Model – Eye Ball Test
Face validity/Content validity
Does empirically derived model correspond to a pre-determined conceptual model?
If not, is that because of highly correlated predictors? A dataset limitation? A modeling error?
Validating Model in Other Datasets: Predicting Mortality following CABG
STS NY VA Duke MN
C statistic .759 .768 .722 .789 .752
Jones et al, JACC, 1996
Recalibrating Risk Adjustment Models
Necessary when the observed outcome rate differs from the expected rate derived from a different population
This could reflect quality of care or differences in coding practices
Assumption is that the relative weights of the predictors to one another are correct
Recalibration is an adjustment to all predictor coefficients to force average expected outcome rate to equal observed outcome rate
Recalibrating Risk Adjustment Models
New York AMI mortality rate is 15%
California AMI mortality rate is 13%
Is care or coding different?
If you want to use the New York-derived risk adjustment model to predict expected deaths in California, you need to adjust the model’s predicted rates (e.g. multiply by 13/15)
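A sketch of the ratio recalibration described above, on made-up expected risks:

```python
import numpy as np

def recalibrate(expected, observed_rate):
    """Scale each expected probability by observed/expected rate ratio,
    forcing the average expected rate to equal the observed rate."""
    return expected * (observed_rate / expected.mean())

# NY-derived model predicts a mean AMI mortality of 15% in California,
# where the observed rate is 13% (illustrative numbers from the slide)
expected = np.array([0.05, 0.10, 0.15, 0.20, 0.25])   # mean = 0.15
adjusted = recalibrate(expected, 0.13)                # each risk scaled by 13/15
print(f"{adjusted.mean():.2f}")                       # -> 0.13
```

Note that simple ratio scaling can push very high predicted risks above 1; adjusting the model intercept instead is a common alternative.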
Summary
Summary statistics provide a means for evaluating the predictive power of multivariate models
Care should be taken to look beyond summary statistics to ensure that the model is not overspecified and that it conforms to a conceptual model
Models should be validated with internal and ideally external data
Next time we will review how a risk-adjustment model can be used to identify providers who perform better and worse than expected given their patient mix