Imputation-enhanced Prediction of Septic Shock in ICU Patients
Joyce C. Ho, Cheng H. Lee and Joydeep Ghosh
University of Texas at Austin
HI-KDD 2012: ACM SIGKDD Workshop on Health Informatics
Presenter: Kiyana Zolfaghar
Outline
Motivation
Challenges of Clinical Data
Predictive models for sepsis risk and septic shock
Impact of imputation methods on prediction
Results
Sepsis and Septic Shock
• Sepsis: a severe, systemic inflammatory response with a presumed or identified source of infection.
• Severe sepsis: sepsis with one or more of organ dysfunction, hypoperfusion, or hypotension.
• Septic shock: a complication characterized by low blood pressure despite treatment with >600 mL of fluid inputs in the last hour.
Motivation: Septic Shock as a Severe Illness
• The most common cause of death in western societies
• 25% of ICU bed utilization in western countries
• Mortality rates range from 12.8% for sepsis to 45.7% for septic shock
Motivation for Prediction of Septic Shock in ICU Patients
• Early intervention and therapy can improve patient outcomes
• Treatment transition: from treatment by critical care physicians in later phases to proactive treatment in early phases
Prediction of Sepsis and Septic Shock
• Data mining approach for identifying patients at risk of developing sepsis
• Predictive models: regression methods, support vector machines, decision trees, Bayesian classification, …
• Issues regarding classification and prediction: data preparation (feature selection; data cleaning to remove or reduce noise and to treat missing values)
Challenges of Clinical Data
• Typically noisy and inconsistently gathered
• Patient data are recorded manually at irregular intervals
• Accurate measurement of physiological variables requires invasive techniques
Naïve Solution: simply ignore subjects or features with missing data
• Clinical studies contain large amounts of missing data
• Dramatic decrease in sample size or feature space
• Bias in the results
The Paper's Contribution
• Investigates the role and impact of imputation methods while building predictive models for sepsis risk and septic shock
Methodology of Research
• Data selection
• Building predictive models for sepsis and septic shock
• Applying different imputation methods to the data
• Results
Dataset Description
MIMIC-II Database (Multiparameter Intelligent Monitoring in Intensive Care)
• Publicly and freely available
• Includes a very large population of ICU patients
• Contains high temporal resolution data, including lab results, electronic documentation, and monitor trends and waveforms
• Funded by the National Institute of Biomedical Imaging and Bioengineering
Overview of the Data Categories: Clinical Records in MIMIC-II
• General: patient demographics; hospital admission and discharge information; room tracking; death dates; ICD-9 codes
• Physiological measures: hourly vital sign metrics
• Medication records
• Lab test results
• Fluid balance: input and output records
• Notes and reports: discharge summaries, nursing progress notes, radiology and echo reports
Data Selection and Target Classes
Dataset size: 12,179 patients
• Exclude patients younger than 18 at the time of admission
• Include patients with at least ten observations of BP, TEMP, HR, …
Target classes
• Sepsis risk prediction: patients identified by ICD-9 codes ("995.91" or "995.92"); ~10.8% of the dataset (1,310 patients)
• Septic shock prediction: patients with hypotension and total fluid intake >600 mL; ~44.7% of sepsis patients (586 patients)
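The two target definitions can be sketched as simple label functions. This is a minimal illustration, not the authors' code: the record layout (`sbp`, `fluid_in_ml`), the function names, and the 90 mmHg systolic cut-off for hypotension are all assumptions, since the slides do not say how hypotension was defined. The last lines only re-derive the cohort shares from the counts reported on the slide.

```python
SEPSIS_ICD9_CODES = {"995.91", "995.92"}  # the two codes named on the slide


def has_sepsis(icd9_codes):
    """Return True if any of the patient's ICD-9 codes marks sepsis."""
    return any(code in SEPSIS_ICD9_CODES for code in icd9_codes)


def has_septic_shock(hourly_records, sbp_threshold=90.0, fluid_threshold_ml=600.0):
    """Return True if some hour shows hypotension despite >600 mL of fluid input.

    Each record is a dict with hypothetical keys "sbp" (systolic blood
    pressure, mmHg) and "fluid_in_ml" (fluid input during that hour).
    The 90 mmHg cut-off is an assumed definition of hypotension.
    """
    return any(
        r["sbp"] < sbp_threshold and r["fluid_in_ml"] > fluid_threshold_ml
        for r in hourly_records
    )


def percent(part, whole):
    """Share of `part` in `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Cohort sizes reported on the slide.
sepsis_share = percent(1310, 12179)  # sepsis patients among all selected patients
shock_share = percent(586, 1310)     # septic shock patients among sepsis patients
```

With these counts, `sepsis_share` and `shock_share` reproduce the approximate percentages quoted on the slide.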
Predictive Model for Sepsis Risk
Features
• Patient's clinical history: demographic data (gender and age), medical history, basic health data (weight ..)
• Measurements of physiological variables
Logistic regression as the prediction model, fitted on four feature sets
• Only the clinical history features
• Clinical history features after stepwise regression
• All available features
• All available features after stepwise regression
Stepwise Logistic Regression Model
• Logistic regression: a type of regression analysis used for predicting the outcome of a categorical target variable
• Stepwise regression: the choice of predictive variables is carried out by an automatic procedure
1. Start with no variables in the model
2. Test the addition of each variable using a chosen model comparison criterion
3. Add the variable (if any) that improves the model the most
4. Repeat this process until no variable improves the model
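The four steps of forward selection can be sketched as a greedy loop. This is a minimal illustration under stated assumptions: `score` is a hypothetical callable standing in for the "chosen model comparison criterion" (for example, a penalized fit measure of a logistic regression trained on the given variables, where higher is better), and the variable names and gains in the toy example are invented.

```python
from typing import Callable, List, Sequence


def forward_stepwise(candidates: Sequence[str],
                     score: Callable[[List[str]], float]) -> List[str]:
    """Greedy forward selection; `score` is higher-is-better."""
    selected: List[str] = []          # step 1: start with no variables
    best_score = score(selected)
    remaining = list(candidates)
    while remaining:
        # Step 2: test the addition of each remaining variable.
        trials = [(score(selected + [name]), name) for name in remaining]
        trial_score, trial_name = max(trials)
        # Step 4: stop once no addition improves the model.
        if trial_score <= best_score:
            break
        # Step 3: add the variable that improves the model the most.
        selected.append(trial_name)
        remaining.remove(trial_name)
        best_score = trial_score
    return selected


# Toy criterion: invented per-variable gains minus a small complexity penalty.
TOY_GAINS = {"lactate": 0.10, "heart_rate": 0.05, "bed_number": 0.0}


def toy_score(variables: List[str]) -> float:
    return 0.5 + sum(TOY_GAINS[v] for v in variables) - 0.01 * len(variables)


chosen = forward_stepwise(list(TOY_GAINS), toy_score)
```

In the toy run the uninformative `bed_number` is never added, because its penalty outweighs its zero gain. Backward stepwise selection follows the same pattern but starts from all variables and removes one at a time.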
Septic Shock Prediction Model
Features
• Physiologic and laboratory values
Importance of time in septic shock
• Feature matrices are created at reference times of 30, 60, 90, and 120 minutes prior to the onset of septic shock
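One way to build a feature row at a reference time is to take, for each variable, the most recent observation recorded at or before that time. This is only a sketch of the idea: the slides do not describe how the feature matrices were actually assembled, so the carry-forward rule, the data layout, and the variable names (`hr`, `map`) are assumptions.

```python
def features_at_reference_time(observations, onset_minute, lead_minutes):
    """Build one feature row `lead_minutes` before septic shock onset.

    `observations` maps a variable name to a list of (minute, value)
    pairs sorted by time. For each variable the last value recorded at
    or before the reference time is used (an assumed rule); variables
    with no earlier observation are left as None, i.e. missing.
    """
    cutoff = onset_minute - lead_minutes
    row = {}
    for name, series in observations.items():
        past_values = [value for minute, value in series if minute <= cutoff]
        row[name] = past_values[-1] if past_values else None
    return row


# Hypothetical patient: heart rate ("hr") and mean arterial pressure ("map").
patient = {
    "hr": [(0, 80), (45, 95), (100, 120)],
    "map": [(70, 60)],
}
onset = 120  # minute at which septic shock begins

# One row per reference time used in the slides.
rows = {lead: features_at_reference_time(patient, onset, lead)
        for lead in (30, 60, 90, 120)}
```

The earlier reference times leave `map` empty because it had not been measured yet, which shows how irregular recording turns directly into missing values in the feature matrix.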
Prediction Models
• Logistic regression
• Support vector machine
• Classification tree
Feature sets
• All available features
• Feature set after forward stepwise regression
• Feature set after backward stepwise regression
Decision Tree Learning
Goal
• Create a model that predicts the value of a target variable based on several input variables
Learning a decision tree
• Recursive partitioning based on a selected attribute
• Partitioning stops when all samples at a given node belong to the same class
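Decision tree types: classification trees and regression trees
[Figure: example classification tree that splits on Sex (Male / Female), Age (<= 9.5 / > 9.5) and sibsp (<= 2.5 / > 2.5), with leaves labelled "Survived" or "died"; leaf percentages shown: 36%, 61%, 2%, 2%]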
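Recursive partitioning with the stopping rule "all samples at a node belong to the same class" can be sketched in a few functions. This is a minimal illustration, not the classification tree used in the paper: choosing splits by Gini impurity is an assumption (the slides only say "selected attribute"), and the toy rows of (age, sibsp) values are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (impurity, feature, threshold) of the best split, or None."""
    best = None
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        for threshold in values[:-1]:  # skip the maximum so both sides are non-empty
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold)
    return best


def build_tree(rows, labels):
    """Recursively partition until every node holds a single class."""
    if len(set(labels)) == 1:          # stopping rule from the slide
        return labels[0]
    split = best_split(rows, labels)
    if split is None:                  # identical rows with mixed labels
        return Counter(labels).most_common(1)[0][0]
    _, feature, threshold = split
    left = [(row, y) for row, y in zip(rows, labels) if row[feature] <= threshold]
    right = [(row, y) for row, y in zip(rows, labels) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left]),
        "right": build_tree([r for r, _ in right], [y for _, y in right]),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Invented toy data: each row is (age, sibsp).
toy_rows = [(5, 1), (8, 4), (7, 0), (30, 0), (40, 1)]
toy_labels = ["survived", "died", "survived", "died", "died"]
toy_tree = build_tree(toy_rows, toy_labels)
```

Growing the tree until every leaf is pure fits the training data exactly, which is why practical implementations usually add depth limits or pruning.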
Missing Value Imputation
Missing data in MIMIC-II
• Excluding records with missing values reduces the dataset size by 47.2%
Imputation Methods
1) Mean Feature Values (Mean for Subgroup)
• Derived from the patient's gender and age group
• Accounts for fundamental physiological differences between genders and among age groups
Challenges
• Mean substitution is especially problematic when there are many missing values
• It distorts the distribution and variance
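Subgroup mean imputation can be sketched as follows. This is a minimal illustration under stated assumptions: the age bins, the record keys (`gender`, `age`, `hr`), and the choice to leave a value missing when its subgroup has no observed values are all hypothetical, since the slides do not give those details.

```python
from collections import defaultdict


def age_group(age):
    """Hypothetical age bins; the slides do not give the actual cut-offs."""
    if age < 40:
        return "18-39"
    if age < 65:
        return "40-64"
    return "65+"


def impute_subgroup_mean(patients, feature):
    """Fill missing `feature` values with the mean of the same gender/age group.

    `patients` is a list of dicts; a missing value is represented by None.
    Values stay None when the subgroup has no observed value at all.
    """
    totals = defaultdict(lambda: [0.0, 0])  # subgroup -> [sum, count]
    for p in patients:
        if p[feature] is not None:
            key = (p["gender"], age_group(p["age"]))
            totals[key][0] += p[feature]
            totals[key][1] += 1

    imputed = []
    for p in patients:
        filled = dict(p)  # copy so the input records are not modified
        if filled[feature] is None:
            total, count = totals.get((p["gender"], age_group(p["age"])), (0.0, 0))
            if count > 0:
                filled[feature] = total / count
        imputed.append(filled)
    return imputed


# Invented example records with a heart-rate feature "hr".
cohort = [
    {"gender": "F", "age": 30, "hr": 70.0},
    {"gender": "F", "age": 35, "hr": 80.0},
    {"gender": "F", "age": 32, "hr": None},
    {"gender": "M", "age": 70, "hr": None},
]
completed = impute_subgroup_mean(cohort, "hr")
```

Every imputed patient in a subgroup receives the same value, which is the source of the variance distortion noted on the slide.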
Imputation Methods
2) Matrix Factorization-based Approaches (very popular in bioinformatics)
• SVDImpute: uses a linear combination of the k most significant eigenvectors to predict the missing value
• Probabilistic Principal Component Analysis (PPCA): combines an Expectation-Maximization (EM) approach to Principal Component Analysis (PCA) with a probabilistic model; uses a likelihood function to penalize data far from the training set
• Bayesian PCA (BPCA): EM approach plus a Bayesian model to calculate the likelihood of the constructed data
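The idea behind low-rank imputation can be sketched with a deliberately simplified variant: fill the gaps with column means, compute a rank-1 approximation of the filled matrix, overwrite only the missing entries with the approximation, and repeat. This is not the published SVDImpute algorithm, PPCA, or BPCA; the fixed rank of 1, the power-iteration solver, the iteration counts, and the function names are all assumptions made to keep the example self-contained.

```python
import math


def rank1_approx(matrix, iters=50):
    """Best rank-1 approximation of `matrix`, found by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]  # assumes the matrix is not all zeros
    # With unit-norm v, u = M v equals sigma * (left singular vector).
    u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


def low_rank_impute(rows, outer_iters=300):
    """Iteratively replace missing entries (None) with a rank-1 reconstruction."""
    n_rows, n_cols = len(rows), len(rows[0])
    missing = [(i, j) for i in range(n_rows) for j in range(n_cols)
               if rows[i][j] is None]
    filled = [list(row) for row in rows]
    # Initial guess: mean of the observed values in each column
    # (assumes every column has at least one observed value).
    for j in range(n_cols):
        observed = [rows[i][j] for i in range(n_rows) if rows[i][j] is not None]
        column_mean = sum(observed) / len(observed)
        for i in range(n_rows):
            if filled[i][j] is None:
                filled[i][j] = column_mean
    # Refine only the missing entries; observed values are never changed.
    for _ in range(outer_iters):
        approx = rank1_approx(filled)
        for i, j in missing:
            filled[i][j] = approx[i][j]
    return filled


# Toy matrix whose rows are multiples of (1, 2, 3); the hidden entry is 9.
toy = [[1, 2, 3],
       [2, 4, 6],
       [3, 6, None]]
toy_completed = low_rank_impute(toy)
```

On this toy matrix the column-mean starting guess for the gap is 4.5, and the iterations move it toward the value implied by the other rows. Real implementations choose the rank from the data and use a full SVD rather than a rank-1 power iteration.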
Sepsis Risk Prediction Results
• No baseline model to compare the results with
• Evaluation metric: AUC (area under the curve)
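AUC can be read as the probability that a randomly chosen positive patient receives a higher risk score than a randomly chosen negative one. The sketch below computes it directly from that definition by comparing every positive/negative pair; the function name and the example scores are illustrative, and libraries typically use a faster rank-based computation.

```python
def auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons.

    `labels` holds 1 for positive and 0 for negative cases; `scores` holds
    the predicted risk for each case. Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented scores: three of the four positive/negative pairs are ordered correctly.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.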
Septic Shock Prediction Results
Baseline: the septic shock EWS
• Prediction model: logistic regression
• Predicts the onset of septic shock one hour in advance
• Uses invasively gathered data from the MIMIC waveform data
Imputation-enhanced Prediction of Septic Shock
• Impact of various imputation methods at different reference times
• Compared with the baseline, using a logistic regression model
AUC Curves for predicting septic shock 60 minutes before onset
Septic shock prediction 60 minutes before onset for three types of models
Effect of Imputation on Logistic Regression Coefficients for Predicting Septic Shock
• Coefficients are consistent across different imputation methods
• Coefficient values differ between models built with and without imputation
• The non-imputed model suffers from over-fitting
Conclusion
• Imputing missing data can improve model performance, especially when dealing with larger, noisier, and more incomplete datasets
• Matrix factorization imputation methods such as BPCA lead to models with better predictive accuracy than simpler approaches such as group means