Exploring Healthcare Strategies by Deep Reinforcement Learning
Lecturer: Yinglong Dai
Organization: Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University
AEIT 2019, 9th International Workshop on Assistive Engineering and Information Technology, Guangzhou University, Guangzhou, China
Outline
• INTRODUCTION
• FRAMEWORK
• BODY MODELING
• TREATMENT EXPLORING
• FUTURE WORK
• CONCLUSION
Deep Learning in Healthcare
• Deep learning can featurize and learn from a variety of data types
[1] Esteva, Andre, et al. "A guide to deep learning in healthcare." Nature medicine 25.1 (2019): 24.
Deep Learning Aided Diagnosis
[2] Gulshan V, Peng L, Coram M, et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in
Retinal Fundus Photographs. JAMA, 2016, 316(22):2402-2410.
[3] Esteva A, Kuprel B, Novoa R A, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature,
2017, 542(7639):115-118.
[4] Hannun A Y, Rajpurkar P, Haghpanahi M, et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms
using a deep neural network. Nature Medicine, 2019, 25:65-69.
[5] Gurovich, Yaron, et al. "Identifying facial phenotypes of genetic disorders using deep learning." Nature Medicine 25 (2019): 60.
……
• Gulshan et al. [2] validated a deep learning algorithm that performed comparably to ophthalmologists.
• Esteva et al. [3] reported that their convolutional neural networks (CNNs) performed on par with dermatologists.
• Hannun et al. [4] trained a deep CNN that reached cardiologist-level performance.
• Gurovich et al. [5] used deep learning to identify facial phenotypes of genetic disorders.
• ……
All the tasks focus only on static data analysis!
From Observation to Intervention
• End-to-end?
[6] Hu, Yang, et al. "Automatic Construction of Chinese Herbal Prescriptions From Tongue Images Using CNNs and Auxiliary Latent
Therapy Topics." IEEE Transactions on Cybernetics (2019).
This is still a static data mapping, not a treatment process.
It only fits the labelled data, without any exploration.
Dynamic Healthcare Process
A closed-loop system
• Deliver insulin according to real-time glucose levels
• Three components:
  • a continuous glucose sensor,
  • an insulin pump,
  • and a control algorithm.
[7] Hovorka, Roman. "Closed-loop insulin delivery: from bench to clinical practice." Nature Reviews Endocrinology 7.7 (2011): 385.
Deep Reinforcement Learning (DRL)
• Deep Q-Network (DQN) [8]
[8] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.
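As a rough illustration of what a DQN learns, the sketch below computes the temporal-difference target r + γ · max Q(s', a') that the Q-network is regressed towards. The function name `dqn_targets` and the toy batch are hypothetical, and the replay buffer and target network of [8] are left out.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """TD targets y = r + gamma * max_a' Q(s', a'); no bootstrapping at terminal steps."""
    rewards = np.asarray(rewards, dtype=float)
    next_q_values = np.asarray(next_q_values, dtype=float)  # shape (batch, n_actions)
    dones = np.asarray(dones, dtype=bool)
    best_next = next_q_values.max(axis=1)  # value of the greedy next action
    return rewards + gamma * best_next * (~dones)

# Toy batch of two transitions with three discrete actions each (made-up numbers).
targets = dqn_targets(
    rewards=[1.0, -1.0],
    next_q_values=[[0.0, 2.0, 1.0], [5.0, 4.0, 3.0]],
    dones=[False, True],
    gamma=0.5,
)
print(targets)
```

The second transition is terminal, so its target is just the reward.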
Deep Reinforcement Learning in Medicine
• DRL takes into account not only the immediate effect of a treatment but also the long-term benefit to the patient.
  • Learning treatment policies for sepsis [9]
  • Optimization of dynamic treatment recommendation [10]
  • Optimization of critical care pain management with morphine [11]
• Obstacles:
  • Trying exploratory treatment strategies on patients is forbidden.
  • How to define the reward.
[9] Raghu, Aniruddh, et al. "Deep reinforcement learning for sepsis treatment." arXiv preprint arXiv:1711.09602 (2017).
[10] Wang, Lu, et al. "Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation." Proceedings
of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018.
[11] Lopez-Martinez, Daniel, et al. "Deep Reinforcement Learning for Optimal Critical Care Pain Management with Morphine using Dueling
Double-Deep Q Networks." arXiv preprint arXiv:1904.11115 (2019).
Outline
• INTRODUCTION
• FRAMEWORK
• BODY MODELING
• TREATMENT EXPLORING
• FUTURE WORK
• CONCLUSION
Doctor AI [12]
• A generic predictive model that covers observed medical conditions and medication uses.
• A temporal model using recurrent neural networks (RNNs), developed and applied to longitudinal time-stamped EHR data from 260K patients and 2,128 physicians over 8 years.
[12] Choi, Edward, et al. "Doctor AI: Predicting clinical events via recurrent neural networks." Machine Learning for Healthcare
Conference. 2016.
Sequential observed medical conditions
Sequential medication uses
Latent states
DeepCare [13]
• RNN (LSTM) with attention mechanism
[13] Pham, Trang, et al. "Predicting healthcare trajectories from medical records: A deep learning approach." Journal of Biomedical
Informatics 69 (2017): 218-229.
Deepr [14]
• Convolutional neural network (CNN) architecture
[14] P. Nguyen, T. Tran, N. Wickramasinghe and S. Venkatesh, "Deepr: A Convolutional Net for Medical Records," IEEE Journal of
Biomedical and Health Informatics, 21.1, (2017): 22-30.
These frameworks perform sequential prediction, but they do not explore healthcare strategies.
Healthcare Process Abstraction
[Figure: healthcare loop linking Body, Observations, Diagnosis, Health state, Treatment, and Interventions]
Can we use DNNs to approximate the functions, h(•), f(•), and g(•)?
A Deep Inference Learning Framework [15]
• Three modules
[15] Dai, Yinglong, and Guojun Wang. "A deep inference learning framework for healthcare." Pattern Recognition Letters (2018).
f(•) g(•)
A Deep Reinforcement Learning Framework
• Fuse the diagnosis and treatment planning into one module
[16] Dai, Yinglong, et al. "Using deep neural networks to simulate human body." 2017 IEEE International Symposium on Parallel and
Distributed Processing with Applications (ISPA). IEEE, 2017.
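[Figure: the DRL treatment module g(f(•)) maps observations to an intervention, and the body simulation module h(•) maps the intervention back to observations; the figure labels the intervention x and the observation y]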
Outline
• INTRODUCTION
• FRAMEWORK
• BODY MODELING
• TREATMENT EXPLORING
• FUTURE WORK
• CONCLUSION
Deep Patient
• Miotto et al. [17] proposed to learn deep patient representations from the EHRs
[17] Miotto, Riccardo, et al. "Deep patient: an unsupervised representation to predict the future of patients from the electronic health
records." Scientific Reports 6 (2016): 26094.
Patient Sequence Model
• Utomo et al. [18] present a patient sequence model by defining a state space, an action space, and a reward function.
• State: qSOFA variables
  • Systolic ABP, respiratory rate, and Glasgow coma scale
• Action: vasopressor variables
  • Epinephrine, dopamine, and phenylephrine
• Reward
  • R = +1000 for a terminal state in which the patient survives
  • R = −1000 for a terminal state in which the patient dies
  • R = −1 for any non-terminal state
[18] Utomo, Chandra Prasetyo, Xue Li, and Weitong Chen. "Treatment recommendation in critical care: A scalable and interpretable
approach in partially observable health states." International Conference on Information Systems, 2018.
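The reward scheme on this slide can be written as a small function. This is only a sketch of the three cases listed above; the names `patient_reward` and `episode_return` are hypothetical, and the return is left undiscounted for simplicity.

```python
def patient_reward(is_terminal: bool, survived: bool = False) -> int:
    """Reward from the slide: +1000 / -1000 at a terminal state, -1 at every other state."""
    if not is_terminal:
        return -1
    return 1000 if survived else -1000

def episode_return(num_nonterminal_states: int, survived: bool) -> int:
    """Undiscounted return of an episode that ends in a single terminal state."""
    step_rewards = [patient_reward(False) for _ in range(num_nonterminal_states)]
    return sum(step_rewards) + patient_reward(True, survived)

print(episode_return(10, survived=True))
print(episode_return(10, survived=False))
```

The per-step penalty of −1 means that, for the same outcome, shorter trajectories score slightly higher.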
A DNN-based Body Simulator Framework
• Fit the dynamic system characteristics of input and output
[Figure: body simulator. The intervention x enters the regulating network r(x), which acts on the health state layer h (made of state units); the decoding network d(h) emits the observation y]
[19] Dai, Yinglong, Xiangyong Liu, and Guojun Wang. "A Body Simulator with Delayed Health State Transition." IEEE Ubiquitous
Intelligence & Computing (UIC). IEEE, 2018.
This framework can model high-dimensional state spaces and action spaces.
It can also emit high-dimensional observations, simulating a partially observable MDP (POMDP).
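A toy version of such a simulator might look like the sketch below. It assumes single linear layers for r(x) and d(h), random untrained weights, and an additive state update h ← h + r(x). None of these choices are taken from [19], and the delayed state transition of that paper is not modelled. The dimensions follow the experimental setup reported later in the talk, and the class name `ToyBodySimulator` is hypothetical.

```python
import numpy as np

class ToyBodySimulator:
    """Intervention x -> regulating network r(x) -> health state h -> decoding network d(h) -> observation y."""

    def __init__(self, action_dim=20, state_dim=9, obs_shape=(32, 32, 3), seed=0):
        rng = np.random.default_rng(seed)
        self.obs_shape = obs_shape
        obs_dim = int(np.prod(obs_shape))
        # Regulating network r(x): a single linear layer (assumption).
        self.W_r = rng.normal(scale=0.1, size=(state_dim, action_dim))
        self.b_r = np.zeros(state_dim)
        # Decoding network d(h): a linear layer followed by a sigmoid (assumption).
        self.W_d = rng.normal(scale=0.1, size=(obs_dim, state_dim))
        self.b_d = np.zeros(obs_dim)
        self.h = np.zeros(state_dim)  # latent health state

    def regulate(self, x):
        return self.W_r @ x + self.b_r

    def decode(self, h):
        logits = self.W_d @ h + self.b_d
        return (1.0 / (1.0 + np.exp(-logits))).reshape(self.obs_shape)

    def step(self, x):
        # Assumed additive update of the latent state; the agent only sees the decoded observation.
        self.h = self.h + self.regulate(x)
        return self.decode(self.h)

sim = ToyBodySimulator()
x = np.random.default_rng(1).uniform(-1.0, 1.0, size=20)  # random intervention
y = sim.step(x)
print(y.shape, sim.h.shape)
```

Because the agent receives only y and never h, the interaction is partially observable by construction.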
Regulating Network
• Assume the objective health state is $s_{obj}$ and the current state is $s_{cur}$; the healthcare strategy is then to solve
  $x = f^{-1}(\Delta s)$, where $\Delta s = s_{obj} - s_{cur}$
• However, this sometimes has no solution. E.g., for a simple linear layer,
  $\Delta s = Wx + b$, i.e. $Wx = \Delta s - b$
• No intervention $x$ solves it when rank(W) < rank([W, (∆s − b)]).
• Even so, we can find an approximate solution that approaches the objective health state.
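For the linear-layer case, one way to obtain such an approximate solution is ordinary least squares. The sketch below is an illustration under that assumption, not the method used in the talk; the function names and the tiny example are hypothetical. It also checks the rank condition for exact solvability of Wx = ∆s − b.

```python
import numpy as np

def has_exact_solution(W, b, delta_s):
    """Wx = delta_s - b is solvable iff rank(W) == rank([W, delta_s - b])."""
    c = (delta_s - b).reshape(-1, 1)
    return np.linalg.matrix_rank(W) == np.linalg.matrix_rank(np.hstack([W, c]))

def approximate_intervention(W, b, s_cur, s_obj):
    """Least-squares x minimising ||W x - (delta_s - b)||_2."""
    delta_s = s_obj - s_cur
    x, *_ = np.linalg.lstsq(W, delta_s - b, rcond=None)
    return x

# Three state dimensions but only one intervention dimension (made-up numbers).
W = np.array([[1.0], [1.0], [0.0]])
b = np.zeros(3)
s_cur = np.zeros(3)
s_obj = np.array([1.0, 0.0, 0.0])

solvable = has_exact_solution(W, b, s_obj - s_cur)
x = approximate_intervention(W, b, s_cur, s_obj)
reached = s_cur + W @ x + b
print(solvable, x, reached)
```

Here the single intervention moves the first two state components together, so the objective cannot be reached exactly and least squares settles for the closest reachable state.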
Decoding Network
• Representation space of decoding network
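[Figure: deep autoencoder with input layer $x_1, \dots, x_n$, hidden layers 1 to $l$ (hidden layer $j$ has units $h_1^{(j)}, \dots, h_{k_j}^{(j)}$), and output layer $y_1, \dots, y_n$]
$$R = \frac{1}{2n} \sum_{i=1}^{n} \| y_i - x_i \|_2^2$$
$$C = \frac{1}{2m} \sum_{j=1}^{m} \| h_j - l_j \|_2^2$$
$$J = R + C$$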
Conceptual alignment deep autoencoder (CADAE)
[20] Dai, Yinglong, and Guojun Wang. "Analyzing tongue images using a conceptual alignment deep autoencoder." IEEE Access 6 (2018):
5962-5972.
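As a numeric illustration, the sketch below evaluates a loss of the form J = R + C, reading R as the mean squared difference between output y and input x, and C as the mean squared difference between hidden units h and targets l, each halved. That reading of the slide's formulas, the interpretation of l as alignment targets, and the function name `cadae_loss` are all assumptions.

```python
import numpy as np

def cadae_loss(x, y, h, l):
    """Return (J, R, C) with R = ||y - x||^2 / (2n), C = ||h - l||^2 / (2m), J = R + C."""
    x, y, h, l = (np.asarray(v, dtype=float) for v in (x, y, h, l))
    n, m = x.size, h.size
    R = np.sum((y - x) ** 2) / (2 * n)  # reconstruction-style term
    C = np.sum((h - l) ** 2) / (2 * m)  # alignment-style term (assumed meaning)
    return R + C, R, C

# Made-up two-unit example.
J, R, C = cadae_loss(x=[1.0, 0.0], y=[0.0, 0.0], h=[1.0, 1.0], l=[1.0, 0.0])
print(J, R, C)
```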
Generative Adversarial Networks (GANs)
• Training takes longer to converge
• The images a GAN generates cannot be controlled
Outline
• INTRODUCTION
• FRAMEWORK
• BODY MODELING
• TREATMENT EXPLORING
• FUTURE WORK
• CONCLUSION
Basic Task: Intervention Evaluation
• Estimate the state transition probability given an action (MDP)
• For strategy planning, we need to consider the long-term effect of an action.
$$P = \begin{pmatrix} (p_{111}, p_{112}) & (p_{121}, p_{122}) \\ (p_{211}, p_{212}) & (p_{221}, p_{222}) \end{pmatrix}$$
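To show why the long-term effect matters, the sketch below runs value iteration on a two-state, two-action MDP and compares the resulting policy with a myopic one that looks only at the immediate reward. The indexing P[s, a, s'] (state, action, next state), the reward table, and every number in it are made up for illustration and are not taken from the talk.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """P[s, a, s'] are transition probabilities, R[s, a] are expected immediate rewards."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)  # shape (n_states, n_actions)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=1)

# State 0 = sick, state 1 = healthy; action 0 = mild, action 1 = aggressive treatment.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # from sick: mild mostly stays sick, aggressive mostly recovers
    [[0.1, 0.9], [0.1, 0.9]],  # from healthy: both actions mostly stay healthy
])
R = np.array([
    [0.0, -1.0],  # aggressive treatment has an immediate cost (side effects)
    [1.0, 0.0],
])

V, policy = value_iteration(P, R)
myopic = R.argmax(axis=1)
print(policy, myopic, V)
```

In the sick state the myopic rule picks the mild action because it has no immediate cost, while planning over the long term prefers the aggressive action that leads to recovery.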
Optimal Medication Dosing
• Non-optimal medical treatments can expose patients to unnecessary risks, extend hospital stays, or waste hospital resources
• Unfractionated Heparin (UH)
  • If over-dosed, increased risk of bleeding;
  • If under-dosed, increased risk of clot formation.
[21] Nemati, Shamim, Mohammad M. Ghassemi, and Gari D. Clifford. "Optimal medication dosing from suboptimal clinical examples: A
deep reinforcement learning approach." Annual International Conference of the IEEE Engineering in Medicine and Biology Society
(EMBC). IEEE, 2016.
[Figure labels: HMM, DQN]
Dynamic Treatment Regime
• Combine the benefits of supervised learning (exploitation) and reinforcement learning (exploration).
[22] Wang, Lu, et al. "Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation."
Proceedings of the 24th International Conference on Knowledge Discovery & Data Mining (KDD). ACM, 2018.
Clinician-in-the-loop framework
• Adjusting IV heparin dose using deep reinforcement learning
[23] Lin, Rongmei, et al. "A Deep Deterministic Policy Gradient Approach to Medication Dosing and Surveillance in the ICU." Annual
International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2018.
A Simulation Framework
• Explore effective healthcare strategies by using DRL on a body simulator.
• Advantages:
  • Quick feedback
  • More exploration
[16] Dai, Yinglong, et al. "Using deep neural networks to simulate human body." 2017 IEEE International Symposium on Parallel and
Distributed Processing with Applications (ISPA). IEEE, 2017.
Discrete Action Space
• The value function architecture of DQN
Continuous Action Space
• The actor-critic architecture of DDPG
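The difference between the two architectures shows up in their input and output shapes. The sketch below uses random linear maps in place of trained networks: a DQN-style value function returns one Q-value per discrete action, while a DDPG-style actor returns a continuous action vector that a critic scores together with the observation. The sizes `N_ACTIONS`, `OBS_DIM`, and `ACTION_DIM`, the tanh squashing, and all function names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM = 32 * 32 * 3  # flattened observation
N_ACTIONS = 4          # size of a hypothetical discrete action set
ACTION_DIM = 20        # dimension of a continuous action vector

# DQN-style value function: observation -> one Q-value per discrete action.
W_q = rng.normal(scale=0.01, size=(N_ACTIONS, OBS_DIM))

def dqn_act(obs):
    q_values = W_q @ obs
    return int(np.argmax(q_values))  # greedy discrete action

# DDPG-style actor: observation -> continuous action in [-1, 1]^ACTION_DIM.
W_actor = rng.normal(scale=0.01, size=(ACTION_DIM, OBS_DIM))
# DDPG-style critic: (observation, action) -> scalar Q-value.
w_critic = rng.normal(scale=0.01, size=OBS_DIM + ACTION_DIM)

def ddpg_act(obs):
    return np.tanh(W_actor @ obs)

def ddpg_q(obs, action):
    return float(w_critic @ np.concatenate([obs, action]))

obs = rng.uniform(0.0, 1.0, size=OBS_DIM)
discrete_action = dqn_act(obs)
continuous_action = ddpg_act(obs)
q_value = ddpg_q(obs, continuous_action)
print(discrete_action, continuous_action.shape, q_value)
```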
Experimental setup
• Body simulator
  • Input: Random, 20-dimensional continuous action space
  • Latent health state: BC types, 9-dimensional continuous state space
  • Output: Tongue images, 32×32×3 pixels
Experimental setup
• DRL treatment module
  • Input: Tongue images, 32×32×3 pixels
  • Output: Random, 20-dimensional continuous action space
Computational Results
• Objective state:
  • [1,0,0,0,0,0,0,0,0]
Computational Results
• Tests for different scale architectures of regulating networks
Harder to find an optimal strategy as the hidden layer scale becomes larger (model complexity)
More likely to find an optimal strategy as the input layer scale becomes larger (available interventions)
Outline
• INTRODUCTION
• FRAMEWORK
• BODY MODELING
• TREATMENT EXPLORING
• FUTURE WORK
• CONCLUSION
Multimodal data simulation
• An instance will emit multimodal data
  • Image modality
  • Text modality
  • Audio modality
  • Sensor signal modality
Make the DRL algorithm more stable
• Reduce the gap between the diagnostic health state and the latent health state
• DRL algorithms suffer from seriously unstable training
Outline
• INTRODUCTION
• FRAMEWORK
• BODY MODELING
• TREATMENT EXPLORING
• FUTURE WORK
• CONCLUSION
Conclusion
• The human body is an extremely complex system, and deep reinforcement learning is a promising approach for exploring healthcare strategies.
• A simulation framework can facilitate research on the closed-loop healthcare process to discover optimal interventions or strategies.
• We believe that deep reinforcement learning will improve efficiency and effectiveness in the healthcare loop.
Thanks for all your attention!