Exploring Healthcare Strategies by Deep Reinforcement Learning Lecturer: Yinglong Dai Organization: Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University AEIT 2019, 9th International Workshop on Assistive Engineering and Information Technology, Guangzhou University, Guangzhou, China


Post on 21-May-2020


Page 1: Exploring Healthcare Strategies by Deep Reinforcement …[9] Raghu, Aniruddh, et al. "Deep reinforcement learning for sepsis treatment." arXiv preprint arXiv:1711.09602 (2017). [10]

Exploring Healthcare Strategies by Deep Reinforcement Learning

Lecturer: Yinglong Dai

Organization: Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University

AEIT 2019, 9th International Workshop on Assistive Engineering and Information Technology, Guangzhou University, Guangzhou, China


Outline

• INTRODUCTION

• FRAMEWORK

• BODY MODELING

• TREATMENT EXPLORING

• FUTURE WORK

• CONCLUSION



Deep Learning in Healthcare

• Deep learning can featurize and learn from a variety of data types

[1] Esteva, Andre, et al. "A guide to deep learning in healthcare." Nature medicine 25.1 (2019): 24.


Deep Learning Aided Diagnosis

• Gulshan et al. [2] validated a deep learning algorithm that detects diabetic retinopathy with performance comparable to ophthalmologists.

• Esteva et al. [3] reported that their convolutional neural networks (CNNs) classified skin cancer on par with dermatologists.

• Hannun et al. [4] trained a deep CNN that reached cardiologist-level arrhythmia detection.

• Gurovich et al. [5] used deep learning to identify facial phenotypes of genetic disorders.

• ……

[2] Gulshan V, Peng L, Coram M, et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA, 2016, 316(22):2402-2410.

[3] Esteva A, Kuprel B, Novoa R A, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 2017, 542(7639):115-118.

[4] Hannun A Y, Rajpurkar P, Haghpanahi M, et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine, 2019, 25:65-69.

[5] Gurovich, Yaron, et al. "Identifying facial phenotypes of genetic disorders using deep learning." Nature Medicine 25 (2019): 60.


All the tasks focus only on static data analysis!


From Observation to Intervention

• End-to-end?

[6] Hu, Yang, et al. "Automatic Construction of Chinese Herbal Prescriptions From Tongue Images Using CNNs and Auxiliary Latent Therapy Topics." IEEE Transactions on Cybernetics (2019).

This is still a static data mapping, not a treatment process.

It only fits the labelled data, without exploration.


Dynamic Healthcare Process


A closed-loop system

• Deliver insulin according to real-time glucose levels

• Three components:
  • a continuous glucose sensor,
  • an insulin pump,
  • a control algorithm.

[7] Hovorka, Roman. "Closed-loop insulin delivery: from bench to clinical practice." Nature Reviews Endocrinology 7.7 (2011): 385.


Deep Reinforcement Learning (DRL)

• Deep Q-Network (DQN) [8]

[8] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.
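
As a rough illustration of the idea behind DQN, the sketch below computes the one-step temporal-difference target r + γ max Q(s', a') for a small batch of transitions. Plain NumPy arrays stand in for the outputs of a target network; the function name `td_targets`, the discount `gamma`, and the toy numbers are assumptions for illustration and do not come from the talk.

```python
import numpy as np

def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step Q-learning targets: r + gamma * max_a' Q_target(s', a').

    rewards:       shape (batch,)
    next_q_values: shape (batch, n_actions), stand-in for target-network outputs
    dones:         shape (batch,), 1.0 where the episode ended at s'
    """
    max_next_q = next_q_values.max(axis=1)
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * max_next_q

# Hypothetical batch of two transitions with two discrete actions each.
rewards = np.array([0.0, 1.0])
next_q = np.array([[0.5, 2.0],
                   [3.0, 1.0]])
dones = np.array([0.0, 1.0])
targets = td_targets(rewards, next_q, dones, gamma=0.9)
```

In a full DQN the network is trained to move Q(s, a) toward these targets, typically with experience replay and a periodically updated target network.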


Deep Reinforcement Learning in Medicine

• Takes into account not only the immediate effect of treatment, but also the long-term benefit to the patient.
  • Learning treatment policies for sepsis [9]
  • Optimization of dynamic treatment recommendation [10]
  • Optimization of critical care pain management with morphine [11]

• Obstacles:
  • Trying exploratory treatment strategies on patients is forbidden.
  • How to define the reward.

[9] Raghu, Aniruddh, et al. "Deep reinforcement learning for sepsis treatment." arXiv preprint arXiv:1711.09602 (2017).

[10] Wang, Lu, et al. "Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018.

[11] Lopez-Martinez, Daniel, et al. "Deep Reinforcement Learning for Optimal Critical Care Pain Management with Morphine using Dueling Double-Deep Q Networks." arXiv preprint arXiv:1904.11115 (2019).


Outline

• INTRODUCTION

• FRAMEWORK

• BODY MODELING

• TREATMENT EXPLORING

• FUTURE WORK

• CONCLUSION


Doctor AI [12]

• A generic predictive model that covers observed medical conditions and medication uses.

• A temporal model using recurrent neural networks (RNNs), developed and applied to longitudinal time-stamped EHR data from 260K patients and 2,128 physicians over 8 years.

[12] Choi, Edward, et al. "Doctor AI: Predicting clinical events via recurrent neural networks." Machine Learning for Healthcare Conference. 2016.

Figure labels: sequential observed medical conditions, sequential medication uses, latent states.


DeepCare [13]

• RNN (LSTM) with attention mechanism

[13] Pham, Trang, et al. "Predicting healthcare trajectories from medical records: A deep learning approach." Journal of Biomedical Informatics 69 (2017): 218-229.


Deepr [14]

• Convolutional neural network (CNN) architecture

[14] P. Nguyen, T. Tran, N. Wickramasinghe and S. Venkatesh, "Deepr: A Convolutional Net for Medical Records," IEEE Journal of Biomedical and Health Informatics, 21.1, (2017): 22-30.


These frameworks perform sequential prediction, but they do not explore healthcare strategies.


Healthcare Process Abstraction

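Figure: the healthcare loop. The body emits observations; diagnosis maps the observations to a health state; treatment maps the health state to interventions, which act on the body.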

Can we use DNNs to approximate the functions, h(•), f(•), and g(•)?


A Deep Inference Learning Framework [15]

• Three modules

[15] Dai, Yinglong, and Guojun Wang. "A deep inference learning framework for healthcare." Pattern Recognition Letters (2018).

f(•) g(•)


A Deep Reinforcement Learning Framework

• Fuse the diagnosis and treatment planning into one module

[16] Dai, Yinglong, et al. "Using deep neural networks to simulate human body." 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA). IEEE, 2017.

Figure: the DRL treatment module g(f(•)) maps observations y to an intervention x, and the body simulation module h(•) maps the intervention x back to observations y.
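
The closed loop between the two modules can be sketched as a short interaction loop. This is a toy stand-in under stated assumptions: `body_simulator` plays the role of h(•) and `treatment_policy` the role of g(f(•)). Both are hypothetical hand-written rules chosen only so that the loop runs, not the neural networks described in [16].

```python
import numpy as np

# Assumed "healthy" target state; purely illustrative.
target_state = np.zeros(3)

def body_simulator(state, intervention):
    """Toy h(.): the intervention shifts the latent state, and the
    observation is simply the new state (fully observed for simplicity)."""
    new_state = state + intervention
    observation = new_state.copy()
    return new_state, observation

def treatment_policy(observation):
    """Toy g(f(.)): a proportional rule that moves halfway toward the target."""
    return 0.5 * (target_state - observation)

state = np.array([1.0, -2.0, 0.5])
observation = state.copy()
for _ in range(10):
    intervention = treatment_policy(observation)                   # x
    state, observation = body_simulator(state, intervention)       # y

final_distance = float(np.linalg.norm(state - target_state))
```

In the framework on this slide, the hand-written policy would be replaced by a DRL agent that learns from the simulator's feedback.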


Outline

• INTRODUCTION

• FRAMEWORK

• BODY MODELING

• TREATMENT EXPLORING

• FUTURE WORK

• CONCLUSION


Deep Patient

• Miotto et al. [17] proposed to learn deep patient representations from the EHRs

[17] Miotto, Riccardo, et al. "Deep patient: an unsupervised representation to predict the future of patients from the electronic health records." Scientific Reports 6 (2016): 26094.


Patient Sequence Model

• Utomo et al. [18] present a patient sequence model by defining a state space, an action space, and a reward function.
  • State: qSOFA variables
    • Systolic ABP, respiratory rate, and Glasgow Coma Scale
  • Action: vasopressor variables
    • Epinephrine, dopamine, and phenylephrine
  • Reward
    • R = +1000 for a terminal state in which the patient survives
    • R = −1000 for a terminal state in which the patient dies
    • R = −1 for any non-terminal state

[18] Utomo, Chandra Prasetyo, Xue Li, and Weitong Chen. "Treatment recommendation in critical care: A scalable and interpretable approach in partially observable health states." International Conference on Information Systems, 2018.
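
The reward scheme listed above is simple enough to write down directly. The sketch below is one possible encoding; the function name `reward` and the boolean flags are assumptions for illustration, not the interface used in [18].

```python
def reward(is_terminal: bool, survived: bool = False) -> int:
    """Reward from the slide: +1000 / -1000 at a terminal state, -1 otherwise."""
    if not is_terminal:
        return -1
    return 1000 if survived else -1000

# Hypothetical episode: two non-terminal steps, then a terminal state with survival.
episode = [(False, False), (False, False), (True, True)]
total_return = sum(reward(terminal, survived) for terminal, survived in episode)
```

The per-step penalty of −1 means that, between two episodes with the same outcome, the shorter one earns a higher return.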


A DNN-based Body Simulator Framework

• Fit the dynamic system characteristics of input and output

Figure: the intervention x feeds the regulating network r(x), which drives the state units of the health state layer h; the decoding network d(h) maps the health state to the observation y.

[19] Dai, Yinglong, Xiangyong Liu, and Guojun Wang. "A Body Simulator with Delayed Health State Transition." IEEE Ubiquitous Intelligence & Computing (UIC). IEEE, 2018.


This framework can model high-dimensional state spaces and action spaces.

It can also emit high-dimensional observations that simulate a partially observable MDP.
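
To make the data flow concrete (intervention x, regulating network r(x), health state h, decoding network d(h), observation y), here is a minimal NumPy sketch. Everything in it is an assumption for illustration: each network is a single random, untrained layer, the regulating network's output is treated as a change added to the current state, and the class name `ToyBodySimulator` is hypothetical. The dimensions (20, 9, 32×32×3) follow the experimental setup reported later in the talk.

```python
import numpy as np

class ToyBodySimulator:
    """Illustrative stand-in for a DNN-based body simulator (not the model in [19])."""

    def __init__(self, action_dim=20, state_dim=9, obs_shape=(32, 32, 3), seed=0):
        rng = np.random.default_rng(seed)
        self.obs_shape = obs_shape
        obs_dim = int(np.prod(obs_shape))
        # Regulating network r(x): one random linear layer (assumption).
        self.W_r = rng.normal(scale=0.1, size=(state_dim, action_dim))
        self.b_r = np.zeros(state_dim)
        # Decoding network d(h): one random layer followed by a sigmoid (assumption).
        self.W_d = rng.normal(scale=0.1, size=(obs_dim, state_dim))
        self.b_d = np.zeros(obs_dim)
        # Latent health state h, hidden from the agent.
        self.h = np.zeros(state_dim)

    def step(self, x):
        # Assumed transition: r(x) is added to the current latent state.
        self.h = self.h + self.W_r @ x + self.b_r
        # The agent only sees the decoded, image-shaped observation y.
        y = 1.0 / (1.0 + np.exp(-(self.W_d @ self.h + self.b_d)))
        return y.reshape(self.obs_shape)

sim = ToyBodySimulator()
y = sim.step(np.ones(20))
```

Because the agent receives only y and never h, the interaction behaves like a partially observable decision process from the agent's point of view.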


Regulating Network

• Assume the objective health state is $s_{obj}$ and the current state is $s_{cur}$; then the healthcare strategy is to solve

$$x = f^{-1}(\Delta s), \quad \text{where } \Delta s = s_{obj} - s_{cur}$$

• However, sometimes there is no solution. E.g., for a simple linear layer,

$$\Delta s = Wx + b$$

$$Wx = \Delta s - b$$

• No intervention x solves this system when rank(W) < rank([W, (∆s − b)]).

• Nevertheless, we can find an approximate solution that approaches the objective health state.
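
For the linear case, an exact solution of Wx = ∆s − b exists only when rank(W) equals rank([W, ∆s − b]); otherwise a least-squares solution gives the closest reachable state change. The sketch below checks the rank condition and computes the least-squares intervention with NumPy. The matrix W, the bias b, and the states are made-up numbers for illustration.

```python
import numpy as np

# Hypothetical linear regulating layer: 3 state dimensions, 2 interventions.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.zeros(3)
s_cur = np.array([0.0, 0.0, 0.0])
s_obj = np.array([1.0, 1.0, 0.0])
delta_s = s_obj - s_cur

rhs = delta_s - b
rank_W = np.linalg.matrix_rank(W)
rank_aug = np.linalg.matrix_rank(np.column_stack([W, rhs]))
has_exact_solution = bool(rank_W == rank_aug)

# Least-squares intervention: minimizes ||W x - (delta_s - b)||.
x, *_ = np.linalg.lstsq(W, rhs, rcond=None)
residual = float(np.linalg.norm(W @ x - rhs))
```

Here the augmented matrix has a higher rank than W, so no exact intervention exists, and the nonzero residual measures how far the best achievable state change is from ∆s.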


Decoding Network

• Representation space of decoding network

Figure: a deep autoencoder with input layer $x_1, \dots, x_n$, hidden layers 1 to $l$ (hidden layer $j$ has units $h_1^{(j)}, \dots, h_{k_j}^{(j)}$), and output layer $y_1, \dots, y_n$.

$$R = \frac{1}{2n} \sum_{i=1}^{n} \| y_i - x_i \|_2^2$$

$$C = \frac{1}{2m} \sum_{j=1}^{m} \| h_j - l_j \|_2^2$$

$$J = R + C$$

Conceptual alignment deep autoencoder (CADAE)

[20] Dai, Yinglong, and Guojun Wang. "Analyzing tongue images using a conceptual alignment deep autoencoder." IEEE Access 6 (2018): 5962-5972.
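
Reading the objective on this slide as a reconstruction term R between outputs y and inputs x plus an alignment term C between hidden units h and labels l, combined as J = R + C, the computation could look like the sketch below. The unweighted sum, the function name `cadae_style_loss`, and the toy vectors are assumptions for illustration; the exact formulation is given in [20].

```python
import numpy as np

def cadae_style_loss(x, y, h, labels):
    """Return (J, R, C) with R = sum((y - x)^2) / (2n) and C = sum((h - l)^2) / (2m)."""
    n = x.shape[0]
    m = h.shape[0]
    R = np.sum((y - x) ** 2) / (2 * n)       # reconstruction term
    C = np.sum((h - labels) ** 2) / (2 * m)  # alignment term
    return R + C, R, C

# Hypothetical two-element input, output, hidden code, and label vectors.
x = np.array([1.0, 0.0])
y = np.array([0.0, 0.0])
h = np.array([0.5, 0.5])
labels = np.array([1.0, 0.5])
J, R, C = cadae_style_loss(x, y, h, labels)
```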


Generative Adversarial Networks (GANs)

• Training takes longer to converge

• The images generated by a GAN are hard to control


Outline

• INTRODUCTION

• FRAMEWORK

• BODY MODELING

• TREATMENT EXPLORING

• FUTURE WORK

• CONCLUSION


Basic Task: Intervention Evaluation

• Estimate the state transition probability given an action (MDP)

• For strategy planning, we need to consider the long-term effect of an action.

$$P = \begin{pmatrix} (p_{111}, p_{112}) & (p_{121}, p_{122}) \\ (p_{211}, p_{212}) & (p_{221}, p_{222}) \end{pmatrix}$$
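
To show how transition probabilities feed into long-term planning, the sketch below runs value iteration on a two-state, two-action MDP. It assumes that each probability is indexed by current state, next state, and action, which is one plausible reading of the matrix on this slide. The probabilities, rewards, and discount factor are made-up numbers, and the array is stored action-first for convenience.

```python
import numpy as np

# P[k, i, j]: assumed probability of moving from state i to state j under action k.
# State 0 = "unhealthy", state 1 = "healthy" (hypothetical labels).
P = np.array([
    [[0.9, 0.1],
     [0.5, 0.5]],   # action 0
    [[0.6, 0.4],
     [0.1, 0.9]],   # action 1
])
reward = np.array([0.0, 1.0])  # hypothetical reward for arriving in each state
gamma = 0.9                    # discount factor weighting long-term effects

V = np.zeros(2)
for _ in range(500):
    # Q[k, i] = sum_j P[k, i, j] * (reward[j] + gamma * V[j])
    Q = P @ (reward + gamma * V)
    V = Q.max(axis=0)

policy = Q.argmax(axis=0)  # best action for each state
```

The discounted value V captures the long-term effect of an action beyond its immediate transition, which is the quantity a planning strategy needs.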


Optimal Medication Dosing

• Non-optimal medical treatments can expose patients to unnecessary risks, extend hospital stays, or waste hospital resources.

• Unfractionated Heparin (UH)
  • If over-dosed, increased risk of bleeding;
  • If under-dosed, increased risk of clot formation.

[21] Nemati, Shamim, Mohammad M. Ghassemi, and Gari D. Clifford. "Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach." Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2016.

Figure labels: HMM, DQN


Dynamic Treatment Regime

• Combine the benefits of supervised learning (exploitation) and reinforcement learning (exploration).

[22] Wang, Lu, et al. "Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation." Proceedings of the 24th International Conference on Knowledge Discovery & Data Mining (KDD). ACM, 2018.


Clinician-in-the-loop framework

• Adjusting IV heparin dose using deep reinforcement learning

[23] Lin, Rongmei, et al. "A Deep Deterministic Policy Gradient Approach to Medication Dosing and Surveillance in the ICU." Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2018.


A Simulation Framework

• Explore effective healthcare strategies by using DRL on a body simulator.

• Advantages:
  • Quick feedback
  • More exploration

[16] Dai, Yinglong, et al. "Using deep neural networks to simulate human body." 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA). IEEE, 2017.


Discrete Action Space

• The value function architecture of DQN


Continuous Action Space

• The actor-critic architecture of DDPG
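
The practical difference between the two architectures is how an action is produced. A DQN-style value function outputs one Q-value per discrete action and picks the largest, while a DDPG-style actor outputs a continuous action vector that a critic then scores. The sketch below shows only these input and output shapes, using random untrained linear maps; all names and dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, n_actions, action_dim = 4, 3, 2

# DQN-style: a value function maps a state to one Q-value per discrete action.
W_q = rng.normal(size=(n_actions, obs_dim))

def dqn_act(state):
    q_values = W_q @ state
    return int(np.argmax(q_values))

# DDPG-style: the actor maps a state to a continuous action in [-1, 1],
# and the critic scores a (state, action) pair with a single value.
W_actor = rng.normal(size=(action_dim, obs_dim))
w_critic = rng.normal(size=obs_dim + action_dim)

def actor(state):
    return np.tanh(W_actor @ state)

def critic(state, action):
    return float(w_critic @ np.concatenate([state, action]))

state = rng.normal(size=obs_dim)
discrete_action = dqn_act(state)          # an index in {0, 1, 2}
continuous_action = actor(state)          # a vector of length action_dim
q_estimate = critic(state, continuous_action)
```

Taking an argmax over actions is only feasible when the action set is small and discrete, which is why a continuous intervention space calls for the actor-critic form.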


Experimental setup

• Body simulator
  • Input: Random, 20-dimensional continuous action space
  • Latent health state: BC types, 9-dimensional continuous state space
  • Output: Tongue images, 32×32×3 pixels


Experimental setup

• DRL treatment module
  • Input: Tongue images, 32×32×3 pixels
  • Output: Random, 20-dimensional continuous action space


Computational Results

• Objective state:
  • [1,0,0,0,0,0,0,0,0]


Computational Results

• Tests for different scale architectures of regulating networks

Harder to find an optimal strategy as the hidden layer scale becomes larger (model complexity)

More likely to find an optimal strategy as the input layer scale becomes larger (available interventions)


Outline

• INTRODUCTION

• FRAMEWORK

• BODY SIMULATOR

• TREATMENT EXPLORING

• FUTURE WORK

• CONCLUSION


Multimodal data simulation

• An instance will emit multimodal data
  • Image modality
  • Text modality
  • Audio modality
  • Sensor signal modality


Make the DRL algorithm more stable

• Reduce the gap between the diagnostic health state and the latent health state

• DRL algorithms suffer from a serious problem: an unstable training process


Outline

• INTRODUCTION

• FRAMEWORK

• BODY SIMULATOR

• TREATMENT EXPLORING

• FUTURE WORK

• CONCLUSION


Conclusion

• The human body is an extremely complex system, and deep reinforcement learning is a promising approach for exploring healthcare strategies.

• A simulation framework can facilitate research on the closed-loop healthcare process to discover optimal interventions or strategies.

• We believe that deep reinforcement learning will improve efficiency and effectiveness in the healthcare loop.


Thanks for all your attention!