
Imputation-enhanced Prediction of Septic Shock in ICU Patients

Joyce C. Ho, Cheng H. Lee and Joydeep Ghosh

University of Texas at Austin

HI-KDD 2012: ACM SIGKDD Workshop on Health Informatics

Presenter: Kiyana Zolfaghar

Outline

Motivation

Challenges of Clinical Data

Predictive model for Sepsis Risk

Septic Shock

Impact of imputation methods on prediction

Results

Sepsis and Septic Shock

Sepsis: a severe, systemic inflammatory response with a presumed or identified source of infection.

Severe sepsis: sepsis with one or more organ dysfunctions, hypoperfusion, or hypotension.

Septic shock: a complication characterized by low blood pressure despite treatment with >600 mL of fluid input in the last hour.

Motivation: Septic Shock as a Severe Illness

• One of the most common causes of death in Western societies
• Accounts for 25% of ICU bed utilization in Western countries
• Mortality rates range from 12.8% for sepsis to 45.7% for septic shock

Motivation for Prediction of Septic Shock in ICU Patients

Early intervention and therapy can improve patient outcomes. Prediction supports a treatment transition:
• From: treatment by critical care physicians in later phases
• To: proactive treatment in early phases

Prediction of Sepsis and Septic Shock

A data mining approach for identifying patients at risk of developing sepsis: predictive models

Issues Regarding Classification and Prediction

Data preparation:
• Feature selection
• Data cleaning: remove or reduce noise; treat missing values

Classification and prediction methods:
• Regression methods
• Support vector machines
• Decision trees
• Bayesian classification
• …

Challenges of Clinical Data

Clinical data are typically noisy and inconsistently gathered:
• Patient data are recorded manually at irregular intervals
• Accurate measurement of physiological variables requires invasive techniques

Naïve solution: simply ignore subjects or features with missing data. Because clinical studies contain large amounts of missing data, this leads to:
• A dramatic decrease in sample size or feature space
• Bias in the results
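To make the naïve solution concrete, here is a minimal sketch on a hypothetical toy table (the variable names and values are invented, not MIMIC-II data). It contrasts dropping incomplete subjects (rows) with dropping incomplete features (columns); with scattered missingness, both shrink the data quickly.

```python
# Hypothetical toy records; None marks a missing measurement.
records = [
    {"hr": 88, "temp": 37.1, "lactate": None},
    {"hr": 102, "temp": None, "lactate": 2.4},
    {"hr": 95, "temp": 38.2, "lactate": 1.9},
    {"hr": None, "temp": 36.8, "lactate": None},
]

def complete_cases(rows):
    """Ignore subjects: keep only rows with no missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def complete_features(rows):
    """Ignore features: keep only features observed in every row."""
    return [k for k in rows[0] if all(r[k] is not None for r in rows)]

print(len(complete_cases(records)), "of", len(records), "rows kept")  # 1 of 4 rows kept
print(complete_features(records))  # [] (every feature has a gap somewhere)
```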

The Paper's Contribution

Investigates the role and impact of imputation methods when building predictive models for:
• Sepsis risk
• Septic shock

Research Methodology

1. Data selection
2. Building predictive models for sepsis and septic shock
3. Applying different imputation methods to the data
4. Results

Dataset Description: MIMIC-II Database (Multiparameter Intelligent Monitoring in Intensive Care)

• Publicly and freely available
• Includes a very large population of ICU patients
• Contains high temporal resolution data, including lab results, electronic documentation, and monitor trends and waveforms

Funded by: National Institute of Biomedical Imaging and Bioengineering

Overview of the Data Categories: Clinical Records in MIMIC-II

General
• Patient demographics
• Hospital admission and discharge info
• Room tracking, death dates
• ICD-9 codes

Physiological measures
• Hourly vital sign metrics

Medication records

Lab test results

Fluid balance
• Input and output records

Notes and reports
• Discharge summaries, nursing progress notes
• Radiology and echo reports

Data Selection and Target Classes

Dataset size: 12,179 patients
• Exclude patients younger than 18 at the time of admission
• Include only patients with at least ten observations of BP, TEMP, HR, …
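The cohort filter above can be sketched as a simple predicate. This is a hypothetical illustration: the field names and the `REQUIRED` tuple are assumptions (the slide lists BP, TEMP, HR and further unnamed variables), not the authors' code or MIMIC-II column names.

```python
# Hypothetical per-patient summaries; field names are illustrative only.
patients = [
    {"id": 1, "age_at_admission": 67, "n_obs": {"BP": 40, "TEMP": 12, "HR": 45}},
    {"id": 2, "age_at_admission": 16, "n_obs": {"BP": 30, "TEMP": 20, "HR": 30}},
    {"id": 3, "age_at_admission": 54, "n_obs": {"BP": 25, "TEMP": 4, "HR": 30}},
]

REQUIRED = ("BP", "TEMP", "HR")  # assumption: subset of the variables on the slide
MIN_OBS = 10                     # at least ten observations per variable

def eligible(p):
    """Adults only, with enough observations of each required vital sign."""
    if p["age_at_admission"] < 18:
        return False
    return all(p["n_obs"].get(var, 0) >= MIN_OBS for var in REQUIRED)

cohort = [p["id"] for p in patients if eligible(p)]
print(cohort)  # [1]: patient 2 is under 18, patient 3 has only 4 TEMP readings
```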

Target classes

Sepsis risk prediction
• Patients identified by ICD-9 codes "995.91" or "995.92"
• ~10.8% of the dataset (1,310 patients)

Septic shock prediction
• Patients with hypotension and total fluid intake >600 mL
• ~44.7% of sepsis patients (586 patients)
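A minimal labeling sketch for the two target classes. The ICD-9 codes and the 600 mL fluid threshold come from the slides; the hypotension cutoff (systolic BP below 90 mmHg) and the hourly input layout are assumptions, since the slide does not state how hypotension was defined. "Low blood pressure despite fluids" is approximated here as both conditions holding in the same hour.

```python
SEPSIS_ICD9 = {"995.91", "995.92"}  # from the slide
SBP_HYPOTENSION_MMHG = 90           # assumption: threshold not given on the slide
FLUID_THRESHOLD_ML = 600            # from the slide

def has_sepsis(icd9_codes):
    """Sepsis-risk target: any of the two sepsis ICD-9 codes is present."""
    return any(code in SEPSIS_ICD9 for code in icd9_codes)

def septic_shock_onset(icd9_codes, hourly):
    """Return the first hour index meeting the septic-shock criteria, else None.

    hourly: list of (systolic_bp_mmHg, fluid_in_last_hour_mL) tuples (hypothetical layout).
    """
    if not has_sepsis(icd9_codes):
        return None
    for hour, (sbp, fluid_ml) in enumerate(hourly):
        if sbp < SBP_HYPOTENSION_MMHG and fluid_ml > FLUID_THRESHOLD_ML:
            return hour
    return None

print(septic_shock_onset(["995.92"], [(110, 200), (85, 300), (82, 750)]))  # 2
```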

Predictive Model for Sepsis Risk

Features:
• Patient's clinical history: demographic data (gender and age), medical history, basic health data (weight, …)
• Measurements of physiological variables

Logistic regression is the prediction model, trained on four feature sets:
1. Clinical history features only
2. Clinical history features after stepwise regression
3. All available features
4. All available features after stepwise regression

Stepwise Logistic Regression Model

• Logistic regression
  • A type of regression analysis used for predicting the outcome of a categorical target variable

• Stepwise regression
  • The choice of predictive variables is carried out by an automatic procedure:
    1. Start with no variables in the model
    2. Test the addition of each variable using a chosen model comparison criterion
    3. Add the variable (if any) that improves the model the most
    4. Repeat this process until no variable improves the model
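The four-step procedure above is forward selection. Below is a minimal numpy sketch, assuming AIC as the "chosen model comparison criterion" (the slide does not name one) and a plain Newton-Raphson logistic regression fit; the function names are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson. X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                        # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian
        # Tiny ridge term keeps the solve stable if the Hessian is near-singular.
        beta += np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, log_lik

def aic(X, y):
    """Akaike information criterion: 2 * (number of parameters) - 2 * log-likelihood."""
    _, log_lik = fit_logistic(X, y)
    return 2 * X.shape[1] - 2 * log_lik

def forward_stepwise(X, y):
    """Greedy forward selection of feature indices, using AIC (lower is better)."""
    n, d = X.shape
    ones = np.ones((n, 1))
    selected = []
    best = aic(ones, y)  # 1. start with no variables (intercept only)
    while True:
        candidates = [j for j in range(d) if j not in selected]
        if not candidates:
            break
        # 2. test the addition of each remaining variable
        scores = {j: aic(np.hstack([ones, X[:, selected + [j]]]), y) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:  # 4. stop when no variable improves the model
            break
        selected.append(j_best)     # 3. add the variable that improves it most
        best = scores[j_best]
    return selected
```

Backward stepwise regression, used later for the septic shock models, runs the same loop in reverse: start from all features and repeatedly drop the one whose removal most lowers the criterion.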

Septic Shock Prediction Model

Features: physiologic and laboratory values

Importance of time in septic shock:
• Feature matrices are created at reference times of 30, 60, 90, and 120 minutes prior to the onset of septic shock
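A sketch of building one feature matrix per reference time. It assumes each patient has a known onset (or reference) time and time-stamped measurements, and it takes the most recent value at or before onset minus the offset; this last-observation rule and the dict layout are assumptions, not details from the slides. Entries with no earlier measurement stay `None`, which is exactly where imputation comes in.

```python
OFFSETS_MIN = (30, 60, 90, 120)  # reference times before onset, from the slide

def last_value_before(series, t_ref):
    """Most recent value observed at or before t_ref (minutes), else None."""
    past = [(t, v) for t, v in series if t <= t_ref]
    if not past:
        return None
    return max(past, key=lambda tv: tv[0])[1]

def feature_matrices(patients, variables):
    """Build one matrix per offset: rows are patients, columns are variables.

    Each patient is a dict with a hypothetical 'onset_min' time and a
    'series' dict mapping variable name -> list of (minute, value) pairs.
    """
    matrices = {}
    for offset in OFFSETS_MIN:
        rows = []
        for p in patients:
            t_ref = p["onset_min"] - offset
            rows.append([last_value_before(p["series"].get(var, []), t_ref)
                         for var in variables])
        matrices[offset] = rows
    return matrices

patient = {"onset_min": 240,
           "series": {"hr": [(0, 90), (100, 110), (200, 125)],
                      "lactate": [(150, 3.1)]}}
m = feature_matrices([patient], ["hr", "lactate"])
# m[30] == [[125, 3.1]]; m[120] == [[110, None]] (no lactate before minute 120)
```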

Prediction Models

Logistic Regression

Support Vector Machine

Classification tree

Feature sets: all available features; the feature set after forward stepwise regression; the feature set after backward stepwise regression

Decision Tree Learning

Goal:
• Create a model that predicts the value of a target variable based on several input variables

Learning a decision tree:
• Recursive partitioning based on a selected attribute
• Stop partitioning when all samples for a given node belong to the same class

Types of decision trees: classification trees and regression trees
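A compact recursive-partitioning sketch for a classification tree, assuming numeric features, binary threshold splits, and Gini impurity as the split criterion (the slide does not name a criterion). It stops when a node is pure, as on the slide, and also when no split reduces impurity. The labels in the example are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for thr in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, thr), score
    return best

def build_tree(rows, labels):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    if len(set(labels)) == 1:  # all samples at this node share one class
        return labels[0]
    split = best_split(rows, labels)
    if split is None:          # no split helps: predict the majority class
        return Counter(labels).most_common(1)[0][0]
    f, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))

def predict(tree, row):
    """Walk from the root to a leaf by comparing the row against each threshold."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree

rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
labels = ["stable", "stable", "shock", "shock"]
tree = build_tree(rows, labels)    # (0, 2.0, 'stable', 'shock')
print(predict(tree, [3.5, 1.0]))   # shock
```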

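Example classification tree (survival prediction; percentages give the share of observations reaching each leaf):
• Sex = female → survived (36%)
• Sex = male, age > 9.5 → died (61%)
• Sex = male, age <= 9.5, sibsp > 2.5 → died (2%)
• Sex = male, age <= 9.5, sibsp <= 2.5 → survived (2%)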

Missing Value Imputation

Missing data in MIMIC-II: excluding records with missing values reduces the dataset size by 47.2%.

Imputation Methods

1) Mean Feature Values (Mean for Subgroup)
• Derived from the patients' gender and age group
• Accounts for fundamental physiological differences between genders and among age groups

Challenges:
• Mean substitution is especially problematic when there are many missing values
• It distorts the distribution and variance
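A minimal sketch of subgroup mean imputation on hypothetical heart-rate records; the age bins and values are invented for illustration. Each gap is filled with its (gender, age group) mean, so imputed values add no within-group spread and the within-group variance is understated; in this toy example the overall variance drops as well.

```python
from statistics import mean, pvariance

# Hypothetical (gender, age_group, heart_rate) records; None marks a missing value.
records = [
    ("F", "18-44", 72.0), ("F", "18-44", None), ("F", "18-44", 80.0),
    ("M", "65+", 95.0), ("M", "65+", None), ("M", "65+", 105.0),
]

def group_mean_impute(rows):
    """Replace each missing value with the mean of its (gender, age_group) subgroup."""
    observed = {}
    for gender, age_group, value in rows:
        if value is not None:
            observed.setdefault((gender, age_group), []).append(value)
    group_means = {key: mean(vals) for key, vals in observed.items()}
    # Fall back to the overall mean if a subgroup has no observed values at all.
    overall = mean(v for vals in observed.values() for v in vals)
    return [
        (g, a, v if v is not None else group_means.get((g, a), overall))
        for g, a, v in rows
    ]

imputed = group_mean_impute(records)
observed_values = [v for _, _, v in records if v is not None]
filled_values = [v for _, _, v in imputed]
print(pvariance(observed_values), pvariance(filled_values))  # 164.5 vs about 157.67
```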

Imputation Methods

2) Matrix Factorization-based Approaches (very popular in bioinformatics)
• SVDImpute: uses a linear combination of the k most significant eigenvectors to predict the missing values
• Probabilistic Principal Component Analysis (PPCA): combines an Expectation-Maximization (EM) approach to Principal Component Analysis (PCA) with a probabilistic model; uses a likelihood function to penalize data far from the training set
• Bayesian PCA (BPCA): combines the EM approach with a Bayesian model to calculate the likelihood for the constructed data
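A simplified relative of SVDImpute, sketched in numpy: iterative rank-k SVD imputation. It starts from column means, then repeatedly replaces only the missing cells with the best rank-k reconstruction until they stop changing. The original SVDImpute regresses each record on the k most significant eigenvectors; this loop is a common variant, and the parameter names `k`, `n_iter` and `tol` are illustrative choices.

```python
import numpy as np

def svd_impute(X, k=2, n_iter=100, tol=1e-6):
    """Iterative rank-k SVD imputation.

    X: 2-D array with np.nan marking missing entries. Every column needs at
    least one observed value (otherwise its starting mean is undefined).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)  # initial guess: column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k]  # best rank-k approximation
        updated = np.where(missing, low_rank, X)  # observed cells never change
        converged = np.max(np.abs(updated - filled)) < tol
        filled = updated
        if converged:
            break
    return filled

# Usage: an exactly rank-1 matrix with one hidden entry (true value 1.0).
X = np.outer(np.arange(1.0, 6.0), [1.0, 2.0, 3.0])
X[0, 0] = np.nan
print(svd_impute(X, k=1)[0, 0])  # approximately 1.0
```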

Sepsis Risk Prediction Results

No baseline model is available for comparison.

Evaluation metric: AUC (area under the ROC curve)
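AUC can be computed without plotting the ROC curve: it equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counting half. A small pure-Python sketch with made-up scores:

```python
def auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney form); labels are 0 or 1.

    O(P * N) in the number of positives and negatives, fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 5 of the 6 positive/negative pairs are ranked correctly, so AUC = 5/6.
print(auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0]))
```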

Septic Shock Prediction Results

Baseline: the septic shock EWS
• Prediction model: logistic regression
• Predicts the onset of septic shock one hour in advance
• Uses invasively gathered data from the MIMIC waveform data

Imputation-enhanced Prediction of Septic Shock
• Impact of various imputation methods at different reference times
• Comparison with the baseline logistic regression model

ROC curves for predicting septic shock 60 minutes before onset

Septic shock prediction 60 minutes before onset for three types of models:

Effect of imputation on logistic regression coefficients for predicting septic shock

• Consistency across different imputation methods
• Inconsistency between the values obtained with and without imputation: the non-imputed model suffers from over-fitting

Conclusion

• Imputing missing data can improve model performance, especially when dealing with larger, noisier, and more incomplete datasets
• Matrix factorization imputation methods like BPCA lead to models with better predictive accuracy than simpler approaches like group means
