training grant (nih/nlm 2t15 lm007092-22) national library ...€¦ · we can use machine learning...

59
Data Science and Health: Representation learning for health with multimodal clinical data Marzyeh Ghassemi Collaborators: Nicole Brimmer, Jen Gong, Nathan Hunt, Maggie Makar, Tristan Naumann, Marco Pimentel, Harini Suresh, Mike Wu, Dr. Robert Hillman, Dr. Leo Anthony Celi, Prof. Finale Doshi-Velez, Prof. John Guttag, Prof. Peter Szolovits Intel Science and Technology Center for Big Data National Library of Medicine Biomedical Informatics Research Training grant (NIH/NLM 2T15 LM007092-22)

Upload: others

Post on 30-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

Data Science and Health:

Representation learning for health with multimodal clinical data

Marzyeh Ghassemi

Collaborators: Nicole Brimmer, Jen Gong, Nathan Hunt, Maggie Makar, Tristan Naumann, Marco Pimentel, Harini Suresh,

Mike Wu, Dr. Robert Hillman, Dr. Leo Anthony Celi, Prof. Finale Doshi-Velez, Prof. John Guttag, Prof. Peter Szolovits

Intel Science and Technology Center for Big DataNational Library of Medicine Biomedical Informatics Research

Training grant (NIH/NLM 2T15 LM007092-22)

Page 2: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

1

Why Health?

•Improvements in health improve lives.

• Same patient different treatments across hospitals, clinicians.

•Improving care requires targeting and evidence.

[1] Smith M, Saunders R, Stuckhardt L, McGinnis JM, Committee on the Learning Health Care System in America, Institute of Medicine. Best Care At Lower Cost: The Path To Continuously Learning Health Care In America. Washington: National Academies Press; 2013.[2] Travers, Justin, et al. "External validity of randomised controlled trials in asthma: to whom do the results of the trials apply?." Thorax 62.3 (2007): 219-223.

Page 3: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

2

Why Health?

•Improvements in health improve lives.

• Same patient different treatments across hospitals, clinicians.

•Improving care requires targeting and evidence.

• Treatments based on RCTs (Randomized Controlled Trials)?

• How many asthmatics are eligible for treatment RCTs?

[1] Smith M, Saunders R, Stuckhardt L, McGinnis JM, Committee on the Learning Health Care System in America, Institute of Medicine. Best Care At Lower Cost: The Path To Continuously Learning Health Care In America. Washington: National Academies Press; 2013.[2] Travers, Justin, et al. "External validity of randomised controlled trials in asthma: to whom do the results of the trials apply?." Thorax 62.3 (2007): 219-223.

Page 4: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

3

Why Health?

•Improvements in health improve lives.

• Same patient different treatments across hospitals, clinicians.

•Improving care requires targeting and evidence.

• Treatments based on RCTs (Randomized Controlled Trials)?… Only 10-20% of treatments.1 Rare and expensive.

• How many asthmatics are eligible for treatment RCTs?

[1] Smith M, Saunders R, Stuckhardt L, McGinnis JM, Committee on the Learning Health Care System in America, Institute of Medicine. Best Care At Lower Cost: The Path To Continuously Learning Health Care In America. Washington: National Academies Press; 2013.[2] Travers, Justin, et al. "External validity of randomised controlled trials in asthma: to whom do the results of the trials apply?." Thorax 62.3 (2007): 219-223.

Page 5: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

4

Why Health?

•Improvements in health improve lives.

• Same patient different treatments across hospitals, clinicians.

•Improving care requires targeting and evidence.

• Treatments based on RCTs (Randomized Controlled Trials)?… Only 10-20% of treatments.1 Rare and expensive.

• How many asthmatics are eligible for treatment RCTs? … Only 6% of asthmatics.2 Structured biases.

[1] Smith M, Saunders R, Stuckhardt L, McGinnis JM, Committee on the Learning Health Care System in America, Institute of Medicine. Best Care At Lower Cost: The Path To Continuously Learning Health Care In America. Washington: National Academies Press; 2013.[2] Travers, Justin, et al. "External validity of randomised controlled trials in asthma: to whom do the results of the trials apply?." Thorax 62.3 (2007): 219-223.

Page 6: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

5

Why Now?

•EHRs (Electronic Health Records) are used by:• Over 80% of US hospitals.1

• Over 60% of Canadian practitioners.2

[1] "Big Data in Health Care". The National Law Review. The Analysis Group, Inc. [2] Chang, Feng, and Nishi Gupta. "Progress in electronic medical record adoption in Canada." Canadian Family Physician 61.12 (2015): 1076-1084

Page 7: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

6

Why Now?

•EHRs (Electronic Health Records) are used by:• Over 80% of US hospitals.1

• Over 60% of Canadian practitioners.2

We can use machine learning to derive insight from health data!

[1] "Big Data in Health Care". The National Law Review. The Analysis Group, Inc. [2] Chang, Feng, and Nishi Gupta. "Progress in electronic medical record adoption in Canada." Canadian Family Physician 61.12 (2015): 1076-1084

Page 8: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

7

Extracting Knowledge is Hard in Health

•Data are not gathered to answer your hypothesis.

•Primary case is to provide care.

•Secondary data are hard to work with.

HeterogenousSamplingData TypeTime Scale

SparseUnmeasuredUnreported

No Follow-up

UncertaintyLabelsBias

Context

.

Page 9: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

8

Health Requires Domain Knowledge

•Doctors and computer scientists have different goals.

+ ≠ Insight

•Learning out-of-the-box without understanding is dangerous.1

“HasAsthma(x) ⇒ LowerRisk(x)”

[1] Caruana, Rich, et al. "Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission." Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015.

“...aggressive care received by asthmatic pneumonia patients (in the training set) was so effective that it lowered their risk of dying from

pneumonia compared to the general population…”

Page 10: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

9

What Makes a Good Representation/Model In Health?

•Representations are useful abstractions of data X that disentangle underlying factors.

•Enables semi-supervised learning; factors explaining P(X) are useful for learning P(Y|X).

•Allows shared factors across many learning tasks.

Physiological State

HR BP RR °F

HR

BP

RR

°F

Page 11: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

10

Choosing the Right Representation For Each Problem

•Time Series Latent States

Multivariate Timeseries AlertGood

Latent Belief States Over Time

Ok

3v1 v2 vk

4 3 ... 1v1 v2 vk

2 0 ... 0v1 v2 vk

0 0 ...

3 4 3 ... 0 0 0 ... 1 2 0 ...

Page 12: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

11

Choosing the Right Representation For Each Problem

•Time Series Latent States

•Text Data Topic Vectors

Patient is very sick – needs ventilation!

Patient is sick and disoriented; will require help to move around.

“Critical” “Resp” “Critical” “Confused”

NoteTopic Vector: 50/50 Topic Vector: 70/30

Note

Multivariate Timeseries AlertGood

Latent Belief States Over Time

Ok

3v1 v2 vk

4 3 ... 1v1 v2 vk

2 0 ... 0v1 v2 vk

0 0 ...

3 4 3 ... 0 0 0 ... 1 2 0 ...

Page 13: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

12

Choosing the Right Representation For Each Problem

•Time Series Latent States

•Text Data Topic Vectors

• Instrumentation Signals Symbols/Kernels

Patient is very sick – needs ventilation!

Patient is sick and disoriented; will require help to move around.

“Critical” “Resp” “Critical” “Confused”

NoteTopic Vector: 50/50 Topic Vector: 70/30

Note

Time-varying Signals Kernels Quasi-periodic Signal Sequence of Symbols

Multivariate Timeseries AlertGood

Latent Belief States Over Time

Ok

3v1 v2 vk

4 3 ... 1v1 v2 vk

2 0 ... 0v1 v2 vk

0 0 ...

3 4 3 ... 0 0 0 ... 1 2 0 ...

Page 14: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

13

Actionable Clinical Insights from Machine Learning

1. Early Prediction of Actionable In-Patient ICU Interventions Ghassemi, M.*, Wu*, Feng, Celi, Szolovits, and Doshi-Velez. (JAMIA 2016); Ghassemi, M*, Wu*, Hughes*, Szolovits, and Doshi-Velez. (AMIA-CRI 2017)

2. Good Representations for Post-Discharge Outcome Prediction Ghassemi, M., Naumann, Doshi, Brimmer, Joshi, Rumshisky, and Szolovits. (KDD 2014); Ghassemi, M.*, Pimentel*, Naumann, Brennan, Clifton, Szolovits, and Feng. (AAAI 2015); Rumshisky, Ghassemi, M., Naumann, Szolovits, Castro, McCoy, and Perlis. (Translational Psychiatry

– Nature 2016)

3. Evidence-based Feedback in Wearable Out-patient Devices Ghassemi, M., Van Stan, Mehta, Zañartu, Cheyne, Hillman, and Guttag. (IEEE TBME 2014); Ghassemi, M., Syed, Mehta, Van Stan, Hillman, and Guttag. (MLHC 2016/JMLR W&C V56)

Voice Cardio … Resp

Page 15: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

14

Actionable Clinical Insights from Machine Learning

1. Early Prediction of Actionable In-Patient ICU Interventions Ghassemi, M.*, Wu*, Feng, Celi, Szolovits, and Doshi-Velez. (JAMIA 2016); Ghassemi, M*, Wu*, Hughes*, Szolovits, and Doshi-Velez. (AMIA-CRI 2017)

2. Good Representations for Post-Discharge Outcome Prediction Ghassemi, M., Naumann, Doshi, Brimmer, Joshi, Rumshisky, and Szolovits. (KDD 2014); Ghassemi, M.*, Pimentel*, Naumann, Brennan, Clifton, Szolovits, and Feng. (AAAI 2015); Rumshisky, Ghassemi, M., Naumann, Szolovits, Castro, McCoy, and Perlis. (Translational Psychiatry

– Nature 2016)

3. Evidence-based Feedback in Wearable Out-patient Devices Ghassemi, M., Van Stan, Mehta, Zañartu, Cheyne, Hillman, and Guttag. (IEEE TBME 2014); Ghassemi, M., Syed, Mehta, Van Stan, Hillman, and Guttag. (MLHC 2016/JMLR W&C V56)

Voice Cardio … Resp

Page 16: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

15

MIMIC III ICU Data

• Learning with real patient data from the Beth Israel Deaconess Medical Center ICU. 1

Signals

Numerical

Narrative

Traditional

Nurse Note

Doc Note

Discharge Note

Doc Note

Path Note

00:00 12:00 24:00 36:00 48:00

Billing CodesDiagnoses

AgeGender

Risk Score

Misspelled Acronym-laden Copy-paste

Irregular SamplingSporadic

Spurious Data Missing Data

Biased

[1] Johnson, Alistair EW, et al. "MIMIC-III, a freely accessible critical care database." Scientific data 3 (2016).

Page 17: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

16

MIMIC III ICU Data

• Learning with real patient data from the Beth Israel Deaconess Medical Center ICU. 1

Signals

Numerical

Narrative

Traditional

Nurse Note

Doc Note

Discharge Note

Doc Note

Path Note

00:00 12:00 24:00 36:00 48:00

Billing CodesDiagnoses

AgeGender

Risk Score

Misspelled Acronym-laden Copy-paste

Irregular SamplingSporadic

Spurious Data Missing Data

Biased

[1] Johnson, Alistair EW, et al. "MIMIC-III, a freely accessible critical care database." Scientific data 3 (2016).

Page 18: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

17

Example: Early Prediction of Vasopressor Interventions

[1] Müllner, Marcus, Bernhard Urbanek, Christof Havel, Heidrun Losert, Gunnar Gamper, and Harald Herkner. "Vasopressors for shock." The Cochrane Library (2004).[2] D’Aragon, Frederick, Emilie P. Belley-Cote, Maureen O. Meade, François Lauzier, Neill KJ Adhikari, Matthias Briel, Manoj Lalu et al. "Blood Pressure Targets For Vasopressor Therapy: A Systematic Review." Shock 43, no. 6 (2015): 530-539.[3] Vincent, Jean-Louis, and Mervyn Singer. "Critical care: advances and future perspectives." The Lancet 376.9749 (2010): 1354-1361.[4] Ospina-Tascón, Gustavo A., Gustavo Luiz Büchele, and Jean-Louis Vincent. "Multicenter, randomized, controlled trials evaluating mortality in intensive care: Doomed to fail?." Critical care medicine 36.4 (2008): 1311-1322.

• Vasopressors are a common drug to raise blood pressure. • All drugs can be harmful, we’d like to avoid when possible.1,2

• Assume that real clinical actions are good learning data.

• Predict upcoming interventions based on evidence.3,4

Page 19: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

18

Define Clinically Actionable Prediction Tasks:

[1] Müllner, Marcus, Bernhard Urbanek, Christof Havel, Heidrun Losert, Gunnar Gamper, and Harald Herkner. "Vasopressors for shock." The Cochrane Library (2004).[2] D’Aragon, Frederick, Emilie P. Belley-Cote, Maureen O. Meade, François Lauzier, Neill KJ Adhikari, Matthias Briel, Manoj Lalu et al. "Blood Pressure Targets For Vasopressor Therapy: A Systematic Review." Shock 43, no. 6 (2015): 530-539.[3] Vincent, Jean-Louis, and Mervyn Singer. "Critical care: advances and future perspectives." The Lancet 376.9749 (2010): 1354-1361.[4] Ospina-Tascón, Gustavo A., Gustavo Luiz Büchele, and Jean-Louis Vincent. "Multicenter, randomized, controlled trials evaluating mortality in intensive care: Doomed to fail?." Critical care medicine 36.4 (2008): 1311-1322.

Tasks:1. Short Term (5-10 hr) Need:

Predicts before a clinician would have given.

2. Imminent (< 4 hr) Need:Predict when a clinician would have given.

3. Weaning (< 4 hr): Predict when a doctor would have stopped.

Page 20: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

19

Define Clinically Actionable Prediction Tasks:

[1] Müllner, Marcus, Bernhard Urbanek, Christof Havel, Heidrun Losert, Gunnar Gamper, and Harald Herkner. "Vasopressors for shock." The Cochrane Library (2004).[2] D’Aragon, Frederick, Emilie P. Belley-Cote, Maureen O. Meade, François Lauzier, Neill KJ Adhikari, Matthias Briel, Manoj Lalu et al. "Blood Pressure Targets For Vasopressor Therapy: A Systematic Review." Shock 43, no. 6 (2015): 530-539.[3] Vincent, Jean-Louis, and Mervyn Singer. "Critical care: advances and future perspectives." The Lancet 376.9749 (2010): 1354-1361.[4] Ospina-Tascón, Gustavo A., Gustavo Luiz Büchele, and Jean-Louis Vincent. "Multicenter, randomized, controlled trials evaluating mortality in intensive care: Doomed to fail?." Critical care medicine 36.4 (2008): 1311-1322.

Tasks:1. Short Term (5-10 hr) Need:

Predicts before a clinician would have given.

2. Imminent (< 4 hr) Need:Predict when a clinician would have given.

3. Weaning (< 4 hr): Predict when a doctor would have stopped.

Page 21: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

20

Define Predictive Task

20

Observe Physiological Signals

Every HourPredict Onset of Drug

Before the Doctor?

Page 22: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

21

Previous Work

• Baseline 1: Prior work1 predicted vasopressor onset in ICU patients with pre-treatment (fluids).• 2 hour gap• 3 demographics and 22 signals• AUC of 0.79

[1] Fialho, A. S., et al. "Disease-based modeling to predict fluid response in intensive care units." Methods Inf Med 52.6 (2013): 494-502.* 2 hour gap, 22 derived/3 static features.

Page 23: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

22

Domain Knowledge: There Should Be Some Shared Underlying Physiological State• Z-score (standardize) and quantize time series data.

• Every xn,t,d is one of ten possible characters, -4:0:4 or NaN.

• Every xn,t is one of 10D possible words.

1 2 T Time (Hours)

Pat

ient

s 1

2

N

Variables

1

D xn,t,d ∈ [-4:0:4, NaN]

HRSpO2Resp

HRSpO2Resp

μHR, σHR

μSpO2, σSpO2

μResp, σResp

0

0

1 1

0 -1 -1 0

0… …

……

Page 24: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

23

Method: Switching State Autoregressive Model Representation (SSAM)

y=6 y=3y=6

2 1 -4 ... 2 2 -3 ... 2 2 -3 ...

y=1 y=1y=1

2 1 -1 ... 1 0 -1 ... 0 0 0 ...

y=3

3 4 -4 ...

• A patient n is a sequence of latent physiological states y.

• A physiological state y is a distribution over physiological words x.

Page 25: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

24

Extracting Latent Belief States from SSAM

• HMM sequence ynt on the signals xn

t

• xnt modeled by ; θ are governed

by ynt.

• Each state 1… k has distinct set of parameters {θd,k}, via K sets of tuples and D classifiers.

• Train θd;k to predict xnt(d)|xn

t-4:t-1.

• Update state sequences ytn given {θd,k}.

Page 26: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

25

Important for Application: Discrete State Space and Per-Variable Model of Missingness

• Use discrete state space.

• Model NaN (missing) as a valid emission.

• Cluster similar underlying states.

• For D variables and K latent states, perform inference iteratively:

1. Optimize parameters θd,k 2. Sample states ynt

0 1 1

y=6

0 1 1

θd=1,k=6, θd=2,k=6, θd=3,k=6

Forward Filtering, Backward Sampling

y=1

2 1 -1 ...

y=3

3 4 -4 ...

Page 27: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

26

Distribution of Values Per-Variable and Latent

2 2 -3 ...

y=1

0 1 -1 ... 2 0 - ... 2 - 0 ... 2 2 -2 ...

PAST

• Train parameters θd;k to predict xnt(d) given xn

t-4:t-1

xt

2 1 - ... 2 0 4 ... 0 1 0 ... 3 - -1 ...

xt-4 xt-3 xt-2 xt-1

CURRENT

3 - 2 ... 1 - - ... 0 - 3 ... 3 1 4 ...

y=K

1 0 -2 ...

2 - 0 ...

…-4 -3 NaN4

θHR;y=1 θBP;y=1

-4 -3 NaN4…

θRR;y=1

-4 -3 NaN4…

Page 28: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

27

Using SSAM Representation for Structured Prediction• SSAM states are learned in an unsupervised setting.

• Evaluate them in a supervised setting, on clinical tasks.

Page 29: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

28

Vasopressor Onset Prediction Beats State Of the Art Results

• Latent representations add predictive power.

• New state-of-the art prediction, 0.88 = thousands of people treated early!

[1] Fialho, A. S., et al. "Disease-based modeling to predict fluid response in intensive care units." Methods Inf Med 52.6 (2013): 494-502.* 2 hour gap, 22 derived/3 static features.

AUCBaseline 1 – Prior Work 0.79

Baseline 2 – Raw Data 0.83

SSAM Representations 0.83

Raw Data + SSAM Rep. 0.88

Page 30: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

29

Regularized Prediction Emphasizes Latent Belief State Representations

• Latent states are consistently significant across a large variable space.

L1 Regularized Logistic Regression WeightsFor Short-term Vasopressor Need

Odd

s R

atio

BMI

First SOFA AcuityScore

SSAM LATENT STATES

Large Variable Space

Page 31: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

30

Post-hoc Interpretability

•Investigate state associated with vasopressor onset?

•Low average values of blood oxygenation and bicarbonate.

•Highest lactate levels of any state.

30

Average Emission Values

Page 32: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

31

Similar Trends In Other Predictive Tasks

[1] D’Aragon, Frederick, Emilie P. Belley-Cote, Maureen O. Meade, François Lauzier, Neill KJ Adhikari, Matthias Briel, Manoj Lalu et al. "Blood Pressure Targets For Vasopressor Therapy: A Systematic Review." Shock 43, no. 6 (2015): 530-539.

Short-Term Need (Gapped AUC)

Imminent Need(Ungapped AUC)

Weaning

Baseline 1 – Prior Work 0.79 - -

Baseline 2 – Raw Data 0.83 0.89 0.67

SSAM Representations 0.83 0.87 0.63

Raw Data + SSAM Rep. 0.88 0.92 0.71

• Our representations are useful abstractions for multiple tasks.

Page 33: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

32

Similar Trends In Other Predictive Tasks

[1] D’Aragon, Frederick, Emilie P. Belley-Cote, Maureen O. Meade, François Lauzier, Neill KJ Adhikari, Matthias Briel, Manoj Lalu et al. "Blood Pressure Targets For Vasopressor Therapy: A Systematic Review." Shock 43, no. 6 (2015): 530-539.

• For the patients with vasopressors, we often predicted an early wean.

• Possible bias: patients can be on interventions longer than necessary.1

Short-Term Need (Gapped AUC)

Imminent Need(Ungapped AUC)

Weaning

Baseline 1 – Prior Work 0.79 - -

Baseline 2 – Raw Data 0.83 0.89 0.67

SSAM Representations 0.83 0.87 0.63

Raw Data + SSAM Rep. 0.88 0.92 0.71

Page 34: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

33

Finding Where We “Could” Wean Early?

• One example of a 62-year-old male patient with a cardiac catheterization.

• More complexity/higher misclassification penalty don’t solve this!

Page 35: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

34

Early Prediction Summary

•Defined three clinically actionable prediction tasks.

•Created a novel representation of multimodal clinical vitals that:•Model missingness.•Learns in unsupervised autoregressive manner.•Representation adds information to all tasks.

•Improved prior state-of-the art prediction = clinicians treat patients early!

•New work* focusing on new model formulations, predicting other tasks with learned states.

34* To be presented in the upcoming AMIA-CRI 2017 Meeting.

Page 36: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

35

Data Science in Health: Covering The Spectrum

1. Early Prediction of Actionable In-Patient ICU Interventions Ghassemi, M.*, Wu*, Feng, Celi, Szolovits, and Doshi-Velez. (JAMIA 2016); Ghassemi, M*, Wu*, Hughes*, Szolovits, and Doshi-Velez. (AMIA-CRI 2017)

2. Good Representations for Post-Discharge Outcome Prediction Ghassemi, M., Naumann, Doshi, Brimmer, Joshi, Rumshisky, and Szolovits. (KDD 2014); Ghassemi, M.*, Pimentel*, Naumann, Brennan, Clifton, Szolovits, and Feng. (AAAI 2015); Rumshisky, Ghassemi, M., Naumann, Szolovits, Castro, McCoy, and Perlis. (Translational Psychiatry

– Nature 2016)

3. Evidence-based Feedback in Wearnble Out-patient Devices Ghassemi, M., Van Stan, Mehta, Zañartu, Cheyne, Hillman, and Guttag. (IEEE TBME 2014); Ghassemi, M., Syed, Mehta, Van Stan, Hillman, and Guttag. (MLHC 2016/JMLR W&C V56)

Voice Cardio … Resp

Page 37: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

36

Part 1: We Want to Represent Patient Acuity Using Clinical Progress Notes

• Acuity (severity of illness) very important - but no universal def.

• Use mortality as a proxy for acuity.1

• Prior state-of-the-art focused on feature engineering in labs/vitals for target populations.2

• But clinicians rely on notes.

[1] Siontis, George CM, Ioanna Tzoulaki, and John PA Ioannidis. "Predicting death: an empirical evaluation of predictive tools for mortality." Archives of internal medicine 171.19 (2011): 1721-1726.[2] Grady, Deborah, and Seth A. Berkowitz. "Why is a good clinical prediction rule so hard to find?." Archives of internal medicine 171.19 (2011): 1701-1702.

Page 38: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

37

Clinical Notes Are Messy...

37

Patient Y, 12:45:00 EST

uneventful day. pt much improved. VS Stable nuero intact no compromise NSR BP stable Aline discontinued in afternoon. pt to transfer to floor awaiting bed. pt

continues with nausea given anziment and started on reglan prn. small emesis in am. pt continues with ice chips. foley draining well adequate output. now replacing

half cc for cc of urine. skin and surgical site unchanged, C/D/I. family (son and husband) at bedside for most of day. Plan: continue with current plan in progress,

tranfer to floor.

MISSPELLED

CONTEXT MATTERS

ACRONYM

Page 39: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

38

Represent Patients as Topic Vectors

• Model patient stays as an aggregated set of notes.

• Model notes as a distribution over topics.

• A “topic” is a distribution over words, that we learn.

• Use Latent Dirichlet Allocation (LDA)1 as an unsupervised way to abstract 473,000 notes from 19,000 patients into “topics”.2

[1] Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent dirichlet allocation." the Journal of machine Learning research 3 (2003): 993-1022[2] T. Griffhs and M. Steyvers. Finding scientific topics.In PNAS, volume 101, pages 5228{5235, 2004

Patient is sick and disoriented; will require help to move around.

“Critical” “Confused”

Topic Vector: 70/30

Page 40: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

39

Correlation Between Average Topic Representation and Mortality

Pro

porti

onal

In-h

ospi

tal M

orta

lity

Topic # Top Ten Words Possible Topic

1 cabg, pain, ct, artery, coronary, valve, post, wires, chest, sp Cardiovascular surgery

Topic # Top Ten Words Possible Topic

15 intubated vent ett secretions propofol abg respiratory resp care sedated Respiratory failure

Page 41: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

40

Topic Representation Improves In-Hospital Mortality Prediction

• First to do forward-facing ICU mortality prediction with notes.

• Latent representations add predictive power.

• Topics enable accurately assess risk from notes.

Page 42: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

41

More Complex Models Following Our Work Haven’t Done Better

Caballero Barajas, Karla L., and Ram Akella. "Dynamically Modeling Patient's Health State from Electronic Medical Records: A Time Series Approach." Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015.Che, Zhengping, et al. "Deep computational phenotyping." Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015. Che, Zhengping, et al. "Recurrent Neural Networks for Multivariate Time Series with Missing Values." arXiv preprint arXiv:1606.01865 (2016).

Author AUC Method Episodes Hours Variables

Ghassemi, 2014 0.84/0.85 LDA 19,308 24/48 53 - notes

Caballero, 2015 0.86 Text processing + medication

15,000 24 ? - notes/meds

Che, 2015 0.8-0.82 Deep Learning (LSTM) 3,940 48 30 - vitals

Che, 2016 0.7/0.85 Deep Learning (GRU) 19,714 12/48 99 – vitals/meds

More Complex ≠ Better

Page 43: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

42

Similar Trends in Extended Mortality Prediction

Page 44: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

43

Part 2: We Want To Add Information About Evolution of Signals• Learn a new latent representation to evaluate multi-dimensional

function similarity (θ).

MTGP models capture movements within and

between signals.

Transform signals into MTGP hyperparameter representation.

Compare patient similarly in the new representation.

Page 45: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

44

Learning Single Task Gaussian Processes (STGP)• Model each signal as a GP task with mean and covariance functions.

• GP’s commonly used to predict at new indices.

• Learn the parameters (θ) of the kernel from data.

Page 46: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

45

Learning Multi-Task Gaussian Processes (MTGP) As Representation

• Use an MTGP representation to relate m inputs through Kt and Kc.

[1] Bonilla, Edwin V., Kian M. Chai, and Christopher Williams. "Multi-task Gaussian process prediction." Advances in neural information processing systems. 2007.[2] Carl Rasmussen’s minimize.m was used for gradient-based optimization of the marginal likelihood.

Movement within a signalMovement between signals

Page 47: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

46

MTGP Representations Improve Signal Forecasting and Outcome Prediction

• MTGPs outperform STGPs in signal reconstruction.

• Automatically estimate cerebrovascular autoregulation.

* Final cohort consisted of 10,202 patients, with 313,461 notes.

Case Study 1: Performance on Signal Forecasting

Case Study 2: Performance on Mortality Prediction

• MTGP hyperparameter representations improve short-term mortality prediction.

Page 48: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

47

Post-Discharge Representation Summary

•Used topics as an interpretable representation of clinical notes.

•In this case, model complexity hasn’t improved results (so far).

•Demonstrated that MTGP kernels can be used as useful multidimensional similarity functions in noisy, sparse data.

•Defined state-of-the-art prediction results on forward-facing ICU and post-discharge mortality.

47

Page 49: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

48

Data Science in Health: Covering The Spectrum

1. Early Prediction of Actionable In-Patient ICU Interventions Ghassemi, M.*, Wu*, Feng, Celi, Szolovits, and Doshi-Velez. (JAMIA 2016); Ghassemi, M*, Wu*, Hughes*, Szolovits, and Doshi-Velez. (AMIA-CRI 2017)

2. Good Representations for Post-Discharge Outcome Prediction Ghassemi, M., Naumann, Doshi, Brimmer, Joshi, Rumshisky, and Szolovits. (KDD 2014); Ghassemi, M.*, Pimentel*, Naumann, Brennan, Clifton, Szolovits, and Feng. (AAAI 2015); Rumshisky, Ghassemi, M., Naumann, Szolovits, Castro, McCoy, and Perlis. (Translational Psychiatry

– Nature 2016)

3. Evidence-based Feedback in Wearable Out-patient Devices Ghassemi, M., Van Stan, Mehta, Zañartu, Cheyne, Hillman, and Guttag. (IEEE TBME 2014); Ghassemi, M., Syed, Mehta, Van Stan, Hillman, and Guttag. (MLHC 2016/JMLR W&C V56)

Voice Cardio … Resp

Page 50: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

49

Want to Understanding Voice Disorders Non-invasively

• Voice disorders affect ~14M Americans.1

• No accepted definition of bad vocal behaviors.2

• In non-phonotraumatic patients, no pathology.

• No objective basis to evaluate the impact of voice therapy.

[1] Roy, Nelson, et al. "Voice disorders in the general population: prevalence, risk factors, and occupational impact." The Laryngoscope 115.11 (2005): 1988-1995.[2] Mehta, Daryush D., et al. "Using ambulatory voice monitoring to investigate common voice disorders: research update." Frontiers in bioengineering and biotechnology 3 (2015).

Page 51: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

50

Result 1: Detecting Pathology in Phonotraumatic Patients

• We are able to correctly detect windows that corresponded to patients with vocal pathology.

• Features being used to in biofeedback studies.1

50[1] Van Stan, Jarrad H., et al. "Integration of Motor Learning Principles Into Real-Time Ambulatory Voice Biofeedback and Example Implementation Via a Clinical Case Study With Vocal Fold Nodules." American Journal of Speech-Language Pathology (2017): 1-11.

Page 52: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

51

Result 2: Behavioral Screening Tool

• Representation can non-invasively detect behaviors.

• In this space, vocal therapy moves subjects toward “normal.”

• Unlike baselines, our symbolic representations consistently:• Separate the Pre-Therapy subjects from Normal subjects• Demonstrate Post-Therapy subjects move to Normal subjects

Voice disorders affect ~14 million working-aged Americans, but MTD patients are notoriously difficult to characterize. We want to

recognize characteristic classes of vocal fold activity in long-term time-series data, but lack appropriately labeled datasets.

Segment accelerometer data from patients and matched controls into 100M+ single glottal pulses (closure of the vocal folds). Cluster

segments into symbols, and use symbolic mismatch to uncover differences pre- and post-treatment.

Vector Acoustic Features (VAF) use statistical features over 5-minute windows, and Mean Acoustic Features (MAF) uses the feature-wise mean. Our symbolic features (SF) consistently separate the Pre-Tx

subjects from Normal, and put Post-Tx subjects with Normal, unlike VAF and MAF baselines.

Clustering of PreTx vs. Control subject/days.Clustering of PostTx vs. Control subject/days.

Page 53: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

52

Future Work

• New Dimensions of Representation Discovery

• Evidence-based Clinical Interventions

• New Opportunities in Wearable Monitoring

Page 54: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

53

Integrating All The ICU Data

53

Signals

Numerical

Narrative

Traditional

Nurse Note

Doc Note

Discharge Note

Doc Note

Path Note

00:00 12:00 24:00 36:00 48:00

Billing CodesDiagnoses

AgeGender

Risk Score

Misspelled Acronym-laden Copy-paste

Irregular SamplingSporadic

Spurious Data Missing Data

Biased

• Profile on BIDMC ICUs; expand to 300+ hospitals from Phillips ICU.

Page 55: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

54

New Dimensions of Representation Discovery

• Time-varying phenotype representation across time-scale/granularity.

• Improved modeling of multi-task inputs and outputs.

[ Xt = 0 ]

[ Xt = 1 , yt = 0 ]

[0 0] [0 1] [0 1]

[ Xt = n , yt = n-1 ]

[ S ]

LSTM layers Feedfwd layers

AU

C, 4

hr

gap

pred

icti

on

ventilation vasopressors

words + yt-1 + tabsX + yt-1 + tabswordswords + X

Page 56: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

55

Evidence-Based of Clinical Interventions

• Large heterogenous patient data approximates counterfactuals.

• Evaluate treatment effect; quantify current bias.… e.g. why more false positives for black patients?1

[1] Manrai, Arjun K., et al. "Genetic misdiagnoses and the potential for health disparities." New England Journal of Medicine 375.7 (2016): 655-665.

Page 57: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

56

Much More Happens Outside the Clinic!

56

Page 58: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

57

New Opportunities in Wearable Monitoring

• Most of life happens outside of the clinical environment.

• Can we detect harmful behaviors in sea of voice/glucose/etc. data?

• Gathering new data sets to estimate the impact of a targeted unconscious behavior while preserving other privacy.

•How you speak? (Vocal fry? Uptalk?)•How you eat? (Amount? Type?)•How you work? (Sitting? Socializing?)

Page 59: Training grant (NIH/NLM 2T15 LM007092-22) National Library ...€¦ · We can use machine learning to derive insight from health data! [1] "Big Data in Health Care". The National

58

Summary of Contributions

Proved We Can Make Earlier Prediction of Treatment•Modeling physiological states and words.•Latent belief states create information gain.•Beats SOA in early treatment prediction.

Demonstrated the Power of Representation in Post-Discharge Outcomes

• Interpretable representative for clinician notes.• MTGPs kernels model movement between/within signals.• Best performance on multiple forward-facing tasks.

Created Non-Invasive Evidence For Bio-Feedback Treatment•Predicting harmful pathologies. •Detecting harmful vocal patterns. •Understanding what moves subjects towards “normal.”

Voice Cardio … Resp