
Exploring Healthcare Strategies by Deep Reinforcement Learning

Lecturer: Yinglong Dai

Organization: Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University

AEIT 2019, 9th International Workshop on Assistive Engineering and Information Technology, Guangzhou University, Guangzhou, China

Outline

• INTRODUCTION

• FRAMEWORK

• BODY MODELING

• TREATMENT EXPLORING

• FUTURE WORK

• CONCLUSION

Deep Learning in Healthcare

• Deep learning can featurize and learn from a variety of data types

[1] Esteva, Andre, et al. "A guide to deep learning in healthcare." Nature medicine 25.1 (2019): 24.

Deep Learning Aided Diagnosis

• Gulshan et al. [2] validated a deep learning algorithm that performed comparably to ophthalmologists in detecting diabetic retinopathy.

• Esteva et al. [3] reported that their convolutional neural networks (CNNs) achieved performance on par with dermatologists.

• Hannun et al. [4] trained a deep CNN that reached cardiologist-level performance in arrhythmia detection.

• Gurovich et al. [5] used deep learning to identify facial phenotypes of genetic disorders.

• ……

[2] Gulshan V, Peng L, Coram M, et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA, 2016, 316(22):2402-2410.

[3] Esteva A, Kuprel B, Novoa R A, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 2017, 542(7639):115-118.

[4] Hannun A Y, Rajpurkar P, Haghpanahi M, et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine, 2019, 25:65-69.

[5] Gurovich, Yaron, et al. "Identifying facial phenotypes of genetic disorders using deep learning." Nature Medicine 25 (2019): 60.


All of these tasks focus only on static data analysis!

From Observation to Intervention

• End-to-end?

[6] Hu, Yang, et al. "Automatic Construction of Chinese Herbal Prescriptions From Tongue Images Using CNNs and Auxiliary Latent Therapy Topics." IEEE Transactions on Cybernetics (2019).

This is still a static data mapping, not a treatment process.

It only fits the labelled data, without any exploration.

Dynamic Healthcare Process

A closed-loop system

• Deliver insulin according to real-time glucose levels

• Three components: a continuous glucose sensor, an insulin pump, and a control algorithm.

[7] Hovorka, Roman. "Closed-loop insulin delivery: from bench to clinical practice." Nature Reviews Endocrinology 7.7 (2011): 385.
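The closed loop described on this slide can be sketched in a few lines. This is only a toy illustration: the proportional controller, the one-line glucose dynamics, and every number (`target`, `gain`, `sensitivity`, `drift`) are hypothetical and have no physiological meaning. They are not taken from [7].

```python
def sensor(glucose):
    """Continuous glucose sensor: here it simply reports the true level."""
    return glucose


def controller(reading, target=110.0, gain=0.05):
    """Control algorithm (hypothetical proportional rule): dose grows with excess glucose."""
    return max(0.0, gain * (reading - target))


def pump_and_body(glucose, dose, sensitivity=10.0, drift=2.0):
    """Insulin pump plus a toy body response: glucose drifts up and insulin pulls it down."""
    return glucose + drift - sensitivity * dose


def run_loop(glucose=200.0, steps=30):
    """Run sensor -> controller -> pump repeatedly and record the glucose trace."""
    trace = [glucose]
    for _ in range(steps):
        reading = sensor(glucose)
        dose = controller(reading)
        glucose = pump_and_body(glucose, dose)
        trace.append(glucose)
    return trace


trace = run_loop()
```

With these made-up constants the trace settles where the drift and the insulin effect balance. The point is only that each dose depends on the latest reading, which is what distinguishes a closed loop from a static mapping.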

Deep Reinforcement Learning (DRL)

• Deep Q-Network (DQN) [8]

[8] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.
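As a reminder of what a DQN is trained on, here is a minimal NumPy sketch of the one-step temporal-difference target. It assumes the Q-values have already been produced by some network; the function names, the plain squared-error loss, and the default `gamma` are illustrative choices, not details taken from the slide.

```python
import numpy as np


def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step TD target: r + gamma * max_a' Q_target(s', a'), cut off at terminal states."""
    max_next = next_q_values.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next


def td_loss(q_values, actions, targets):
    """Mean squared error between Q(s, a) of the actions taken and the TD targets."""
    chosen = q_values[np.arange(len(actions)), actions]
    return float(np.mean((chosen - targets) ** 2))


# Tiny batch of two transitions with two possible actions (made-up numbers).
rewards = np.array([1.0, 0.0])
next_q = np.array([[1.0, 2.0], [3.0, 4.0]])
dones = np.array([0.0, 1.0])
targets = dqn_targets(rewards, next_q, dones, gamma=0.5)
```

The `max` over next-state actions is how the long-term effect of an action enters the value estimate, rather than only its immediate reward.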

Deep Reinforcement Learning in Medicine

• DRL takes into account not only the immediate effect of a treatment but also the long-term benefit to the patient.
  • Learning treatment policies for sepsis [9]
  • Optimization of dynamic treatment recommendation [10]
  • Optimization of critical care pain management with morphine [11]

• Obstacles:
  • Applying exploratory treatment strategies to patients is forbidden.
  • It is unclear how to define the reward.

[9] Raghu, Aniruddh, et al. "Deep reinforcement learning for sepsis treatment." arXiv preprint arXiv:1711.09602 (2017).

[10] Wang, Lu, et al. "Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018.

[11] Lopez-Martinez, Daniel, et al. "Deep Reinforcement Learning for Optimal Critical Care Pain Management with Morphine using Dueling Double-Deep Q Networks." arXiv preprint arXiv:1904.11115 (2019).


Doctor AI [12]

• A generic predictive model that covers observed medical conditions and medication uses.

• A temporal model based on recurrent neural networks (RNNs), developed and applied to longitudinal time-stamped EHR data from 260K patients and 2,128 physicians over 8 years.

[12] Choi, Edward, et al. "Doctor AI: Predicting clinical events via recurrent neural networks." Machine Learning for Healthcare Conference. 2016.

[Figure labels: sequential observed medical conditions, sequential medication uses, latent states]

DeepCare [13]

• RNN (LSTM) with attention mechanism

[13] Pham, Trang, et al. "Predicting healthcare trajectories from medical records: A deep learning approach." Journal of Biomedical Informatics 69 (2017): 218-229.

Deepr [14]

• Convolutional neural network (CNN) architecture

[14] P. Nguyen, T. Tran, N. Wickramasinghe and S. Venkatesh, "Deepr: A Convolutional Net for Medical Records," IEEE Journal of Biomedical and Health Informatics, 21.1, (2017): 22-30.


These frameworks perform sequential prediction, but they do not explore healthcare strategies.

Healthcare Process Abstraction

[Figure: Diagnosis, Treatment, and Body blocks, linked by Observations, Health state, and Interventions]

Can we use DNNs to approximate the functions, h(•), f(•), and g(•)?

A Deep Inference Learning Framework [15]

• Three modules

[15] Dai, Yinglong, and Guojun Wang. "A deep inference learning framework for healthcare." Pattern Recognition Letters (2018).

[Figure labels: f(•), g(•)]

A Deep Reinforcement Learning Framework

• Fuse the diagnosis and treatment planning into one module

[16] Dai, Yinglong, et al. "Using deep neural networks to simulate human body." 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA). IEEE, 2017.

[Figure: DRL Treatment Module g( f(•) ) and Body Simulation Module h(•), connected by Observations (y) and Intervention (x)]
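The loop between the two modules can be sketched as function composition. In the sketch below, reading f as a diagnosis map (observation to health state estimate) and g as a treatment map (state estimate to intervention) is an interpretation of the slides, and the random single-layer networks, their sizes (`N_X`, `N_S`, `N_Y`), and the `tanh` activations are placeholders rather than the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N_X, N_S, N_Y = 4, 3, 6  # hypothetical sizes: intervention, health state, observation

W_h = rng.normal(size=(N_Y, N_X))  # body simulation module h: intervention x -> observation y
W_f = rng.normal(size=(N_S, N_Y))  # f (assumed diagnosis): observation y -> state estimate
W_g = rng.normal(size=(N_X, N_S))  # g (assumed treatment): state estimate -> intervention x


def h(x):
    return np.tanh(W_h @ x)


def f(y):
    return np.tanh(W_f @ y)


def g(s):
    return np.tanh(W_g @ s)


def treatment_module(y):
    """Fused diagnosis and treatment planning: g(f(y))."""
    return g(f(y))


def rollout(x0, steps=5):
    """Alternate body simulation and treatment, recording interventions and observations."""
    xs, ys = [x0], []
    x = x0
    for _ in range(steps):
        y = h(x)                 # the body reacts to the intervention
        x = treatment_module(y)  # the agent answers with the next intervention
        ys.append(y)
        xs.append(x)
    return xs, ys


xs, ys = rollout(np.zeros(N_X))
```

In a DRL setting the weights of the treatment module would be learned from reward, while the body module would be fitted separately to data. Here both are random, so the rollout only demonstrates the data flow.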


Deep Patient

• Miotto et al. [17] proposed to learn deep patient representations from the EHRs

[17] Miotto, Riccardo, et al. "Deep patient: an unsupervised representation to predict the future of patients from the electronic health records." Scientific Reports 6 (2016): 26094.

Patient Sequence Model

• Utomo et al. [18] present a patient sequence model by defining a state space, an action space, and a reward function.

• State: qSOFA variables
  • Systolic ABP, respiratory rate, and Glasgow coma scale

• Action: vasopressor variables
  • Epinephrine, dopamine, and phenylephrine

• Reward
  • 𝑅 = +1000 for a terminal state in which the patient survived
  • 𝑅 = −1000 for a terminal state in which the patient died
  • 𝑅 = −1 for any non-terminal state

[18] Utomo, Chandra Prasetyo, Xue Li, and Weitong Chen. "Treatment recommendation in critical care: A scalable and interpretable approach in partially observable health states." International Conference on Information Systems, 2018.
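The reward scheme listed above is simple enough to write down directly. The function and argument names below are illustrative, and the undiscounted `episode_return` helper is an added convenience for seeing the effect of the per-step penalty; it is not part of [18].

```python
def reward(is_terminal, survived):
    """Reward from the slide: +1000 / -1000 at a terminal state, -1 at any other state."""
    if not is_terminal:
        return -1
    return 1000 if survived else -1000


def episode_return(n_nonterminal_steps, survived):
    """Undiscounted return of an episode (hypothetical helper for illustration)."""
    return n_nonterminal_steps * reward(False, survived) + reward(True, survived)


# An episode with 10 non-terminal steps that ends in survival.
total = episode_return(10, survived=True)
```

The −1 step penalty makes shorter trajectories to survival slightly more valuable than longer ones, while the terminal reward dominates the return.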

A DNN-based Body Simulator Framework

• Fit the dynamic system characteristics of input and output

[Figure: the intervention x enters the regulating network r(x), which drives the health state layer h (made of state units); the decoding network d(h) maps h to the observation y]

[19] Dai, Yinglong, Xiangyong Liu, and Guojun Wang. "A Body Simulator with Delayed Health State Transition." IEEE Ubiquitous Intelligence & Computing (UIC). IEEE, 2018.


This framework can model high-dimensional state spaces and action spaces.

It can also emit high-dimensional observations, which simulates a partially observable MDP.

Regulating Network

• Assume the objective health state is $s_{obj}$ and the current state is $s_{cur}$. The healthcare strategy is then to solve

$x = f^{-1}(\Delta s)$, where $\Delta s = s_{obj} - s_{cur}$.

• However, a solution does not always exist. For example, for a simple linear layer

$\Delta s = Wx + b$, that is, $Wx = \Delta s - b$,

no intervention $x$ solves the system when $\operatorname{rank}(W) < \operatorname{rank}([W, (\Delta s - b)])$.

• Even so, we can find an approximate solution that approaches the objective health state.
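For the linear-layer case, one way to obtain such an approximate solution is ordinary least squares. The sketch below is an illustration under that assumption (the slides do not say which approximation is used), and the matrix `W`, bias `b`, and states are made-up values chosen so that no exact solution exists.

```python
import numpy as np


def approximate_intervention(W, b, s_cur, s_obj):
    """Least-squares x for W x = delta_s - b, plus a rank test for exact solvability."""
    target = (s_obj - s_cur) - b
    x, *_ = np.linalg.lstsq(W, target, rcond=None)
    rank_w = np.linalg.matrix_rank(W)
    rank_aug = np.linalg.matrix_rank(np.column_stack([W, target]))
    exact = bool(rank_w == rank_aug)  # equal ranks mean an exact solution exists
    return x, exact


# Hypothetical layer: 3 state units driven by 2 interventions.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.zeros(3)
s_cur = np.zeros(3)
s_obj = np.array([1.0, 0.0, 0.0])

x, exact = approximate_intervention(W, b, s_cur, s_obj)
residual = W @ x + b - (s_obj - s_cur)  # what remains unreachable
```

Here the augmented matrix has a higher rank than `W`, so the system is inconsistent and `x` only minimizes the squared distance to the requested state change.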

Decoding Network

• Representation space of decoding network

[Figure: network with input layer $x_1, \dots, x_n$, hidden layers 1 to $l$ with units $h_1^{(j)}, \dots, h_{k_j}^{(j)}$ in hidden layer $j$, and output layer $y_1, \dots, y_n$]

$R = \frac{1}{2n} \sum_{i=1}^{n} \| y_i - x_i \|_2^2$

$C = \frac{1}{2m} \sum_{j=1}^{m} \| h_j - l_j \|_2^2$

$J = R + C$
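The loss on this slide combines two squared-error terms, R = (1/2n) Σ‖y_i − x_i‖² and C = (1/2m) Σ‖h_j − l_j‖², with J = R + C. A small NumPy sketch of that computation follows. Reading R as a reconstruction term and C as an alignment term between hidden units `h` and targets `l` is an assumption based on the CADAE name, and the arrays are made-up numbers.

```python
import numpy as np


def combined_loss(x, y, h, l):
    """J = R + C, with R = sum((y - x)^2) / (2n) and C = sum((h - l)^2) / (2m).

    n and m are the lengths of the first axis, so each x_i or h_j may be a scalar or a vector.
    """
    n = x.shape[0]
    m = h.shape[0]
    R = float(np.sum((y - x) ** 2) / (2 * n))
    C = float(np.sum((h - l) ** 2) / (2 * m))
    return R + C, R, C


x = np.array([[0.0, 0.0], [1.0, 1.0]])  # inputs
y = np.array([[1.0, 0.0], [1.0, 3.0]])  # network outputs
h = np.array([1.0, 0.0, 0.0])           # hidden representation
l = np.array([1.0, 0.0, 1.0])           # hypothetical alignment targets

J, R, C = combined_loss(x, y, h, l)
```

Minimizing J pushes the outputs toward the inputs and the hidden units toward their targets at the same time.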

Conceptual alignment deep autoencoder (CADAE)

[20] Dai, Yinglong, and Guojun Wang. "Analyzing tongue images using a conceptual alignment deep autoencoder." IEEE Access 6 (2018): 5962-5972.

Generative Adversarial Networks (GANs)

• Training takes longer to converge

• The images generated by a GAN are hard to control


Basic Task: Intervention Evaluation

• Estimate the state transition probability given an action (MDP)

• For strategy planning, we need to consider the long-term effect of an action.

$P = \begin{pmatrix} (p_{111}, p_{112}) & (p_{121}, p_{122}) \\ (p_{211}, p_{212}) & (p_{221}, p_{222}) \end{pmatrix}$
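To see how transition probabilities turn into a long-term evaluation of an action, here is a small value-iteration sketch on a two-state, two-action MDP. The index order `P[action, state, next_state]`, the probabilities, the rewards, and `gamma` are all hypothetical. The slide does not specify them, nor does it say which index of p plays which role.

```python
import numpy as np

# P[a, s, s2]: probability of moving from state s to state s2 under action a (made-up values).
P = np.array([
    [[0.9, 0.1], [0.5, 0.5]],  # action 0
    [[0.6, 0.4], [0.1, 0.9]],  # action 1
])
R = np.array([0.0, 1.0])  # hypothetical reward for landing in each state (state 1 is "healthy")


def q_values(P, R, gamma=0.9, iters=500):
    """Value iteration: Q[a, s] = sum_s2 P[a, s, s2] * (R[s2] + gamma * max_a2 Q[a2, s2])."""
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = P @ (R + gamma * V)  # shape (n_actions, n_states)
        V = Q.max(axis=0)
    return Q


Q = q_values(P, R)
best_action = Q.argmax(axis=0)  # greedy action for each state
```

Each entry of `Q` accounts for everything that can follow an action, not only the next state, which is the sense in which planning considers the long-term effect.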

Optimal Medication Dosing

• Non-optimal medical treatments can expose patients to unnecessary risks, extend hospital stays, or waste hospital resources.

• Unfractionated heparin (UH)
  • If overdosed, increased risk of bleeding
  • If underdosed, increased risk of clot formation

[21] Nemati, Shamim, Mohammad M. Ghassemi, and Gari D. Clifford. "Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach." Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2016.

[Figure labels: HMM, DQN]

Dynamic Treatment Regime

• Combine the benefits of supervised learning (exploitation) and reinforcement learning (exploration).

[22] Wang, Lu, et al. "Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD). ACM, 2018.

Clinician-in-the-loop framework

• Adjusting IV heparin dose using deep reinforcement learning

[23] Lin, Rongmei, et al. "A Deep Deterministic Policy Gradient Approach to Medication Dosing and Surveillance in the ICU." Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2018.

A Simulation Framework

• Explore effective healthcare strategies by applying DRL to a body simulator.

• Advantages:
  • Quick feedback
  • More exploration

[16] Dai, Yinglong, et al. "Using deep neural networks to simulate human body." 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA). IEEE, 2017.

Discrete Action Space

• The value function architecture of DQN

Continuous Action Space

• The actor-critic architecture of DDPG
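The practical difference between the two settings shows up in how an action is chosen. The sketch below contrasts the two under simplifying assumptions: the Q-values and the `actor` callable are stand-ins for trained networks, and Gaussian exploration noise is used for brevity (the original DDPG paper uses a temporally correlated noise process instead).

```python
import numpy as np

rng = np.random.default_rng(0)


def dqn_act(q_values, epsilon=0.1):
    """Discrete case: epsilon-greedy choice over one Q-value per action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def ddpg_act(actor, state, noise_std=0.1, low=-1.0, high=1.0):
    """Continuous case: the actor outputs the action vector directly, plus exploration noise."""
    action = actor(state)
    action = action + rng.normal(scale=noise_std, size=action.shape)
    return np.clip(action, low, high)


# Hypothetical stand-ins: three discrete actions, and an actor that returns a 20-dimensional action.
discrete_action = dqn_act(np.array([0.1, 0.9, 0.3]), epsilon=0.0)
continuous_action = ddpg_act(lambda s: np.zeros(20), state=np.zeros(9))
```

A DQN needs one output per action, which does not scale to a continuous 20-dimensional intervention. An actor network avoids the `argmax` by producing the action vector itself, with the critic used only during training.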

Experimental setup

• Body simulator
  • Input: random, 20-dimensional continuous action space
  • Latent health state: BC types, 9-dimensional continuous state space
  • Output: tongue images, 32×32×3 pixels

• DRL treatment module
  • Input: tongue images, 32×32×3 pixels
  • Output: random, 20-dimensional continuous action space

Computational Results

• Objective state: [1,0,0,0,0,0,0,0,0]

Computational Results

• Tests for different scale architectures of regulating networks

• It is harder to find an optimal strategy as the hidden layer scale (model complexity) becomes larger.

• It is more likely to find an optimal strategy as the input layer scale (available interventions) becomes larger.


Multimodal data simulation

• An instance will emit multimodal data
  • Image modality
  • Text modality
  • Audio modality
  • Sensor signal modality

Make the DRL algorithm more stable

• Reduce the gap between the diagnostic health state and the latent health state

• DRL algorithms suffer from a seriously unstable training process


Conclusion

• The human body is an extremely complex system; deep reinforcement learning is a promising approach for exploring healthcare strategies.

• A simulation framework can facilitate research on the closed-loop healthcare process and help discover optimal interventions or strategies.

• We believe that deep reinforcement learning will improve efficiency and effectiveness in the healthcare loop.

Thanks for all your attention!
