
Predictive Analytics in Health Monitoring

by

Alireza Manashty

Master of Science, Shahrood University of Technology, Iran, 2012
Bachelor of Science, Razi University, Iran, 2010

A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy (Ph.D.)

In the Graduate Academic Unit of Computer Science

Supervisor(s): Janet Light-Thompson, Ph.D., Dept. of Computer Science
Examining Board: Suprio Ray, Ph.D., Faculty of Computer Science

Huajie Zhang, Ph.D., Faculty of Computer Science
Mary Ann Campbell, Ph.D., Dept. of Psychology

External Examiner: Evangelos E. Milios, Ph.D., Faculty of Computer Science, Dalhousie University

This dissertation is accepted by the

Dean of Graduate Studies

THE UNIVERSITY OF NEW BRUNSWICK

February, 2019

©Alireza Manashty, 2019

Abstract

Predictive analytics in healthcare can prevent patients' emergency health conditions and reduce costs in the long term. Accurate and timely anomaly predictions focusing on recent events can save lives. Nevertheless, for such accurate predictions, machine learning algorithms require processing long-term historical big data, which is infeasible on wearable devices due to their memory constraints and low computing power. Current techniques either ignore a large amount of historical data or convert temporal sequences to pattern sequences, eliminating properties valuable for prediction such as time and recency. In addition, missing values in data collection can impair the prediction. Hence, the motivation of this research is to efficiently model historical data with missing values in a precise form of multivariate temporal sequences to detect and forecast emergency events.

The proposed model is named the life model (LM). LM creates a new concise sequence that represents the history and the future as an intensity temporal sequence (ITS) tensor. LM maps arbitrary-length multivariate discrete time-series data to another concise sequence, called a multivariate interval sequence (MIS). ITS and MIS retain the original data properties such as time, recency, and scale, without being highly susceptible to missing values. Since long short-term memory (LSTM) recurrent neural networks have proved to be effective models for sequence data, the LM algorithms and their properties enable ITS and MIS tensors to train LSTMs and other machine learning techniques efficiently in order to predict in real time, even in the absence of some values.

LM is tested to predict and forecast emergency events such as the mortality of a patient from the MIMIC III intensive care unit dataset. Based on patients' diagnosis and procedure codes over a span of 11 years, the model achieved 84.2% and 99.6% accuracy on 34k and 10k patient records, respectively. In addition, the LM model is tested to predict the approximate time of certain human activities, with granularity ranging from seconds up to years. When tested on the URFD fall dataset, the experimental results show that, compared to a previous study using a complex LSTM network, LM achieves the same 100% accuracy in fall prediction using 80× fewer weight parameters and less computing power. LM is observed to forecast a human fall up to 14 seconds in advance, with 86.96% accuracy with all available data and 85.56% accuracy with 50% missing values.

Finally, a new LM-powered predictive health analytics and real-time monitoring schema (PHARMS) is developed, which uses deep learning for predictive analysis in a medical internet of things environment using wearable devices.

Dedication

To the love of my life, Zahra. To our blossom, Pania. For all the days and

nights that I could not be with them.

To Professor Janet Light, my dearest supervisor, who always supported me

with her wisdom, experience, diligence, and patience.


Acknowledgements

The author would like to thank Microsoft Research for providing Microsoft

Azure cloud services for this research as part of the Azure for Research grant

program (2016-2018).


Table of Contents

Abstract
Dedication
Acknowledgments
Table of Contents
List of Tables
List of Figures
Abbreviations

1 Introduction
1.1 Research Challenges in Modeling Medical History
1.2 Research Question
1.3 Motivation and Main Contributions
1.4 Thesis Road Map

2 Background
2.1 Time Modeling
2.2 Temporal Data Modeling
2.3 Detection, Prediction, and Forecasting
2.4 Temporal Sequence Modeling
2.5 Deep Learning
2.5.1 Recurrent Neural Networks (RNN)
2.5.2 Long Short-Term Memory (LSTM)
2.6 Summary

3 Health Monitoring Systems
3.1 Predictive Health Monitoring
3.2 Data Fusion
3.3 Ambient-Assisted Living (AAL)
3.3.1 Context Awareness
3.3.2 Knowledge Sharing
3.3.3 Real-time Decision Making
3.3.4 Efficient Service Delivery
3.3.5 Comprehensive Monitoring System
3.4 Existing Frameworks
3.4.1 AAL-based Frameworks
3.4.2 Cloud Prediction Platforms
3.5 Roadblocks
3.5.1 Policies, Privacy, and Trust
3.5.2 Security
3.5.3 Scalability
3.6 Research Trends in Internet of Everything (IoE) Knowledge Sharing Platforms
3.7 Summary

4 Health Data Representation for Predictive Analytics
4.1 Related Works in Health Data Representation
4.2 Data Representation Taxonomy
4.3 Current Techniques
4.4 Summary

5 Life Model
5.1 Introduction
5.2 Life Model Definitions
5.2.1 Life Model for Time-series
5.2.2 Life Model for Multivariate State Sequences
5.3 LM Properties
5.3.1 Unit of Time
5.3.2 Compression Ratio δ
5.4 Prediction and Forecasting using Life Model
5.5 Evaluation and Loss Metrics
5.6 Applications
5.7 Summary

6 Life Model Case Studies
6.1 Introduction
6.2 Test Metrics
6.3 Test Datasets
6.4 Mortality Models
6.4.1 Mortality Forecasting
6.4.2 Mortality Detection
6.4.3 Diagnosis and Procedures Forecasting
6.4.4 Discussion
6.5 Human Fall Prediction and Forecasting
6.5.1 Introduction
6.5.2 Hardware Considerations
6.5.3 Models
6.5.3.1 Binary Prediction
6.5.4 Fall Forecasting
6.5.5 Fall Forecasting with Missing Values
6.6 Comparison with Recent Temporal Patterns (RTPs)
6.6.1 Simulated Data
6.6.2 Prediction Model
6.6.3 Results and Comparison
6.6.4 Discussion
6.7 Human Activity Forecasting
6.7.1 Forecasting Model
6.7.2 Results
6.7.3 Discussion
6.8 Summary

7 Predictive Health Analytics and Real-time Monitoring Schema (PHARMS)
7.1 Introduction
7.2 Schema
7.3 Health Event Aggregation Lab (HEAL)
7.3.1 Aggregators
7.3.2 Predictors
7.4 Case Studies
7.4.1 Remote Dialysis
7.4.2 Mortality Prediction API
7.4.3 Fall Forecasting Mobile App
7.5 Summary

8 Conclusion and Future Work

Bibliography

Vita

List of Tables

6.1 Mortality forecasting results using different metrics, modeled as LM period index as outcome.
6.2 Accuracy, area under receiver operating characteristic (AuROC), and Brier score for LM versus fixed-size period mappings for mortality prediction.
6.3 Comparison between LM and previous work on the dataset.
6.4 Performance of LM and fixed-size periods for fall prediction.
6.5 Fall forecast results for up to 14 seconds with various metrics and levels of missing values.
6.6 Accuracy (average recall) results for 10,000 patients using different techniques.
6.7 Accuracy (average recall) results for 100,000 patients using different techniques.
6.8 Accuracy and loss for LM versus fixed-size period mappings for activity recognition.
6.9 Comparison summary among LM and other techniques.

List of Figures

1.1 Sequence length per sample for a variety of sensory data for specific time periods.
1.2 How fixed-length representations (b) of variable-length temporal records (a) can create a meaningful input for different learning algorithms in order to provide a better prediction.
1.3 An example of how deep learning and LM-powered PHARMS can create a minimally invasive, intelligent remote monitoring and prediction platform using regular cameras only.
1.4 Remote dialysis assessment case study.
2.1 Comparing multivariate temporal health data and time-series techniques for forecasting.
2.2 Trend and value abstractions for creatinine values over time.
2.3 Several possible architectures of a recurrent neural network (RNN).
2.4 It is often too late to detect an emergency event.
2.5 Forecasting based on mapping from history.
2.6 RNN diagram.
2.7 LSTM diagram.
3.1 Joint directors of laboratories (JDL) model levels.
3.2 How remote monitoring systems work in an ambient-assisted living (AAL) environment.
3.3 How predicting future trends and anomalies requires training data from past events.
3.4 Predicting an anomaly with the help of an intelligent detection and prediction system.
3.5 AAL Spaces and AAL Platforms interaction.
3.6 Fleet management system demo utilizing the Microsoft internet of things (IoT) suite.
4.1 Hand-engineering and combining different techniques to model health data, by Forkan et al.
4.2 Recent temporal pattern (RTP) with a minimum gap, by Iyad et al.
4.3 Number of patients that had at least one admission in the last year.
4.4 An illustration of what an actual health dataset looks like.
4.5 First approach for data modeling: fill in the missing values with zeros.
4.6 Second approach for data modeling: use none but the most recent data.
4.7 Third approach for data modeling: remove the gaps (the missing data) to create short sequences.
5.1 How LM models the data.
5.2 An example of LM mapping.
5.3 Relative position of temporal states.
5.4 The effect of different values of time unit and δ on fill rate and n.
5.5 LM mapping diagram for history and future.
5.6 The heatmap for mean squared error (MSE) versus tolerance error (TE).
6.1 How forecasting data is prepared.
6.2 Training and testing plots for mortality prediction on the medical information mart for intensive care (MIMIC) III dataset.
6.3 The boxplot for mean tolerance error (MTE).
6.4 Training and validation set accuracy and loss function plots of activity prediction.
7.1 PHARMS, health event aggregation lab (HEAL), and the 3-tier LM engine architectures.
7.2 HEAL architecture.
7.3 An overview of the HEAL framework.
7.4 Proposed aggregator model for HEAL.
7.5 Proposed predictor model for the HEAL platform.
7.6 HEAL core framework, an implementation of the HEAL architecture.
7.7 Four stages of the remote dialysis assessment study using the HEAL framework.

List of Abbreviations

AAL ambient-assisted living
ACM association for computing machinery
AIaaS artificial intelligence as a service
API application programming interface
AuROC area under receiver operating characteristic
BSN body sensor network
CDSS clinical decision support system
CEP complex event processing
CN2 CN2 algorithm
CNN convolutional neural network
CNTK Microsoft cognitive toolkit
CoCaMAAL cloud-oriented context-aware middleware in ambient assisted living
DDSS diagnosis decision support system
DFF deep feed-forward neural network
ECG electrocardiography
EEG electroencephalography
EHR electronic health record
EMG electromyography
EMS emergency medical services
FN false negatives
FP false positives
GBM gradient boosting machine
GMM generalized method of moments
GPU graphics processing unit
HEAL health event aggregation lab
HMM hidden Markov model
HTTPS hypertext transfer protocol secure
Hz hertz
ICA independent component analysis
ICD international classification of diseases
ICU intensive care unit
INR international normalized ratio
IoE internet of everything
IoMT internet of medical things
IoT internet of things
ITS intensity temporal sequence
JDL joint directors of laboratories
LM life model
LMts life model for time-series
LR linear regression
LSTM long short-term memory
MIMIC medical information mart for intensive care
MIS multivariate interval sequence
MLP multilayer perceptron
MSE mean squared error
MSS multivariate state sequences
MTAS multivariate temporal abstraction sequence
MTE mean tolerance error
MTS multivariate temporal sequence
MVC model-view-controller
NB naive Bayes
NBHRF New Brunswick health research conference
NFC near-field communication
NHS National Health Service
OSGi open services gateway initiative
PaaS platform as a service
PCA principal component analysis
PHARMS predictive health analytics and real-time monitoring schema
PI pattern injector
PIPEDA personal information protection and electronic documents act
PIR pattern injection rate
PR patient record
RF random forests
RFID radio-frequency identification
RGB-D red-green-blue-depth
RNN recurrent neural network
ROC receiver operating characteristic
RTP recent temporal pattern
SaaS software as a service
SDA stack of denoising autoencoders
Seq2Seq sequence to sequence
SSL secure socket layer
SVM support vector machines
TE tolerance error
TN true negatives
ToF time-of-flight
TP true positives
UI user-interface
URFD University of Rzeszow fall dataset
VM virtual machine

Chapter 1

Introduction

Predictive analytics in healthcare can prevent patients from having emergency health conditions, save lives, and reduce the cost of healthcare in the long term. The USA budget for healthcare in 2017 was just over a trillion dollars [1]. A 2012 study [2] showed that 61% of acute hospital patients experience discharge delay, which causes delays for other patients, raises costs, and increases patient admission complications due to a lack of emergency symptom monitoring. In July 2017, a Canadian cohort study [3] showed that the mortality risk for patients experiencing emergency surgery delay is 4.9%, compared with 3.2% for those without delay. Hence, predictive analytics plays an important role in improving healthcare processes. Recently, researchers have developed tools to predict hospital readmission rates [4] and mortality risks in hospitals, particularly in intensive care units (ICUs), and to assign severity scores to patients [5, 6]. The next step in this trend is disease diagnosis and anomaly prediction, by which the hospital information system can automatically identify a patient's diagnosis code and forecast a disease quickly and accurately in real time for an emergency medical situation.

With the emerging internet of medical things (IoMT), modeling long historical temporal health records for a patient with missing data is a major challenge for predictive analytics. IoMT is a network of medical internet of things (IoT) devices connected to the healthcare ecosystem. Recent studies use deep learning and data abstraction techniques to model health data in such an environment [7, 8, 9]. However, it is difficult to train a model to predict anomalies based on temporally sparse data. Specifically, representing more than a few seconds of an individual's medical history in a short, concise sequence is the keystone challenge for training deep learning algorithms. Moreover, despite the missing data, the model should be robust and preserve the concept of time and recency for a variety of samples, which is critical in an IoMT environment.

1.1 Research Challenges in Modeling Medical History

To accurately predict imminent health anomalies or events from real-

time medical history, it is necessary to properly model the long sequences

of an individual’s health and activity records. A temporal sequence is an

array of time-stamped records. For instance, family physician visits can be a temporal sequence.

Figure 1.1: Sequence length per sample for a variety of sensory data for specific time periods.

If the interval between each record is a fixed value (e.g.,

every hour), the array is a time-series. An example of time-series data is

the recorded vital signs of a patient in a hospital bed. The problem with

using time-series modeling and activity recognition techniques for modeling

long periods of time is the length of the data and the presence of missing

data. Fig. 1.1 shows how long the sequence length of a single sample can be.

For example, accelerometer sensor data for 3 days consists of approximately

13 million time-stamped records. The first step is usually discretizing the

real-time (continuous) data in order to create fewer time steps for easier pro-

cessing. In addition to discretization errors [10] in temporal data abstraction,

discrete value sequences obtained from historical medical data may require

missing value imputation first. Moreover, each data interval (short-term vs. long-term) generates a sequence length similar to that of any other interval in the history. This similar sequence length causes one or both of the following

problems:

• The resulting discretized sequences grow linearly for as long as the medical history is present. For example, if a person's medical history for a day is recorded at 50 hertz (Hz), 4.32 million records are produced, which exceeds the input dimension of many machine learning algorithms. Figure 1.1 illustrates this problem.

• The above sequences are not the same for different patients with different available histories (length and quality of data). Patients do not wear sensors 24/7, and even if they try to, such devices are unavailable during charging. The resulting variation in sequence lengths (a few seconds compared to hours or days of data) makes it even harder (if not impossible) to optimize a model for prediction.
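To make the scale concrete, the record counts quoted above can be reproduced with a few lines of arithmetic (a minimal sketch; the 50 Hz rate is the one assumed in the examples and in Fig. 1.1):

```python
def n_records(rate_hz: float, days: float) -> int:
    """Time-stamped records produced by one sensor stream."""
    return round(rate_hz * days * 24 * 3600)

# One day of a 50 Hz signal already exceeds the input width of most learners:
print(n_records(50, 1))  # 4320000 (the 4.32 million records mentioned above)
# Three days is roughly the 13 million records cited for accelerometer data:
print(n_records(50, 3))  # 12960000
```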

Techniques are available [8] to create an abstract version of history by ex-

tracting patterns in data, which may ignore the missing values; however,

they are unable to produce an arbitrary length of history. A fixed concise

representation of history has many computational advantages. First, most

learning algorithms require fixed-size input. Even autoencoders, which can

create a condensed representation of data, require a fixed input length in the

first place. Furthermore, if a normalized representation implicitly handles

missing values, it can resolve a major challenge in sequence learning and

thus health prediction.

The challenges addressed in this thesis are summarized as follows:

Modeling long temporal sequences of sparse health data for

prediction

Modeling long-term sparse temporal data and training a machine learn-

ing model to properly benefit from critical dependencies, and distin-

guish that information from irrelevant noise, is an open problem. For

example, the hourly averages of 12 variables a day for 10 years, results

in an input sequence of more than one million records per patient. Fit-

ting decades of medical history, lifestyle, and activities into a concise

5

sequence—as to optimize machine learning—is a challenge.

Predicting patient mortality and diagnosis

There are many diagnostic classes for automated classification using

machine learning. In one of the largest datasets available (medical in-

formation mart for intensive care (MIMIC) III [11]), for around 40,000

patients in ICU , there are more than 15,000 unique international clas-

sification of diseases (ICD)-9 diagnosis codes defined by physicians.

At first glance, we are facing a classification problem with 15,000

classes with fewer than 3 samples per class. For mortality prediction,

there are also more parameters to consider as most of the patient’s

data is based on hospital records—often only during the final admis-

sion process. With the help of medical IoT and real-time monitoring,

prediction can be extended to the patient’s day-to-day life rather than

only to the hospital visits.

Real-time health predictive analytics

An intelligent and practical system that can provide smart real-time

predictive health anomaly decision support for physicians and patients

is not yet available in the literature. Such a system should be able to

receive data from many IoT edge sensors, provide predictive analytics,

and send feedback in a timely manner.
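The figures quoted in the first two challenges are easy to verify (a sketch reproducing the text's own numbers; the 2.7 average assumes the roughly 40,000 MIMIC III ICU patients mentioned above):

```python
# Challenge 1: hourly averages of 12 variables kept for 10 years, per patient.
records = 12 * 24 * 365 * 10
print(records)  # 1051200 -> over a million records per patient

# Challenge 2: ~40,000 ICU patients spread over >15,000 ICD-9 codes.
print(round(40_000 / 15_000, 1))  # 2.7 -> fewer than 3 samples per class
```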


1.2 Research Question

In this research we seek to answer the following question:

“How can temporal sequences be modeled to improve the analytic process in

real-time prediction?”

We divide the research problem further into three questions:

1. How can the multivariate sparse temporal data be modeled from an

individual’s lifetime medical records for the learning algorithms? More

specifically, how can the data be modeled so that both long-term and short-term (and even real-time) data can be fed into the same model, enabling it to predict events as accurately as possible?

2. Which learning algorithm best fits the above model?

3. What architecture/framework is best suited for the above purposes?

1.3 Motivation and Main Contributions

The main objective of this research is to address the challenges in the develop-

ment of a system that can provide predictive analytics for health monitoring.

Anomaly detection is not adequate for many scenarios, as it may already be

too late to detect an emergency event. For example, in detection, we ask

the question: “Do I have cancer?” or “Has my father fallen today/now?”

whereas in forecasting/prediction, we ask: “Will I get cancer? When?” or

“Is it likely for my father to have an accident (fall) today?” Finding the

answers to the above forecasting questions is more challenging than the

detection problem. The motivation of this research is to model long-term

temporal sequences—usually with missing values—to not only detect, but to

forecast events in either wearable deep learning hardware [12] or cloud-based

services.

In this research, a novel time-mapping model called the life model (LM) is pro-

posed for modeling temporal sequences to achieve a concise sequence of an

individual’s data records. (See Fig. 1.2). The LM provides an n-bit sequence

to represent the data in history or the future, named either an intensity tem-

poral sequence (ITS) or multivariate interval sequence (MIS) tensor1, based

on the type of input (explained in Chapter 5). LM algorithms and properties

enable these tensors to train machine learning models efficiently, especially

long short-term memory (LSTM) recurrent neural networks (RNNs).
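As a rough intuition for such a time mapping (the actual LM, with its ITS and MIS tensors, is defined formally in Chapter 5), the sketch below buckets an arbitrary-length, possibly sparse event history into a fixed number of look-back windows whose widths grow geometrically, so recent data keeps fine resolution while older data is compressed. The function name, the averaging rule, and the growth factor `delta` (loosely echoing the compression ratio δ of Section 5.3.2) are illustrative assumptions, not the thesis's algorithm.

```python
def toy_history_map(events, now, n_periods=8, delta=2.0):
    """Toy illustration only: bucket time-stamped (t, value) events into
    n_periods look-back windows whose widths grow by a factor of delta.
    The output length is fixed no matter how long or sparse the history
    is, and empty windows (missing data) simply stay at 0.0."""
    # Build window boundaries walking back from `now`.
    bounds, start, width = [], now, 1.0
    for _ in range(n_periods):
        bounds.append((start - width, start))  # window covers [start-width, start)
        start -= width
        width *= delta
    sums = [0.0] * n_periods
    counts = [0] * n_periods
    for t, v in events:                        # average the values in each window
        for i, (lo, hi) in enumerate(bounds):
            if lo <= t < hi:
                sums[i] += v
                counts[i] += 1
                break
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

# Histories of very different lengths map to the same fixed-length vector:
sparse = [(9.5, 1.0)]                          # a single recent reading
dense = [(t / 10, 1.0) for t in range(100)]    # 100 readings over 10 time units
assert len(toy_history_map(sparse, now=10)) == len(toy_history_map(dense, now=10)) == 8
```

The point of such a representation is only that a fixed-length, recency-weighted input lets standard fixed-input learners (including LSTMs run over the period axis) consume arbitrarily long histories.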

The development and testing of the novel models, algorithms, processes, and architectures listed below to address the above challenges are the main contributions of this research:

1. A novel modeling of health records, activities, and future predictions

2. Temporal abstraction techniques for modeling long-term sparse multi-

variate temporal data for optimized learning

3. An architecture/framework for real-time health analytics

1In this thesis, ITS and MIS are vectors of tensors, and tensors can safely be assumed to be multidimensional arrays in this document.

Figure 1.2: How fixed-length representations (b) of variable-length temporal records (a) can create a meaningful input for different learning algorithms in order to provide a better prediction.

The proposed LM-powered predictive health analytics and real-time moni-

toring schema (PHARMS) promises to provide a solution to improve pre-

dictive health analytics via IoT edge devices and wearables. It enables

real-time minimally-invasive intelligent activity monitoring and predictive

analysis based on various deep learning techniques. It is also the testbed for

evaluating the LM in a cloud environment, using real-world and simulated

data.

Testing with different scenarios shows how smart health using real-time moni-

toring and predictive analysis can improve healthcare synergistically. Figure

1.3 shows how a remote patient monitoring system can use the LM-enabled

PHARMS to detect and predict anomalies to recover from an emergency

condition (here, it predicts a ‘fall’). The cloud-based backend provides advanced intelligence to notify the caregivers in real time.

Figure 1.3: An example of how deep learning and LM-powered PHARMS can create a minimally invasive, intelligent remote monitoring and prediction platform using regular cameras only.

Figure 1.4 shows

another example of how a remote dialysis assessment system can benefit

from PHARMS to help renal patients avoid early/late visits to hospitals us-

ing a self-assessment device at home. Combined with real-time monitoring and IoT, accidents such as falls, heart attacks, and seizures can be prevented with health anomaly prediction. Warning users of the complications of a drug, or providing early predictions of a disease, are among the many other applications of PHARMS.


Figure 1.4: Remote dialysis assessment case study.

1.4 Thesis Road Map

In Chapter 1, an overview of the challenges and research questions was dis-

cussed. Chapter 2 covers the background on time modeling, data abstrac-

tion, and deep learning, which is required to understand the rest of the thesis.

Chapter 3 reviews cloud-based health monitoring systems that are facilitat-

ing predictive analytics. Chapter 4 reviews related works in more depth and

covers the theoretical background for temporal modeling. Chapters 5 and 6

describe the proposed LM and its applications, including evaluation for var-

ious predictive test cases. Chapter 7 covers the proposed PHARMS schema

and Chapter 8 concludes the thesis.


Chapter 2

Background

Unlike a time-series with fixed intervals, health data is often collected spo-

radically. For instance, the patient visits a doctor and a medical record is

added; then a few months later there is another record, and then maybe no

records are added for a few years. Moreover, wearable devices are not always

worn and IoT edge devices are not always monitoring patients. In emergency

conditions, for patients in hospitals and ICUs , more tests are performed and

more data is available. However, even in hospitals, years of family and med-

ical history are summarized in a paragraph or two, making it challenging to

integrate with the rest of the data. In disease prediction and health mon-

itoring we are interested in temporal sequence data. Time-series modeling

techniques are not applicable to sparse medical temporal data sequences;

therefore, other prediction techniques should be used. Fig. 2.1 compares

time-series forecasting with multivariate temporal health data modeling.


Figure 2.1: Comparing multivariate temporal health data and time-series techniques for forecasting.

2.1 Time Modeling

Time in temporal sequences is either modeled implicitly, as in time-series, or

explicitly using either a time point or a period. Time-series consist of time points at fixed intervals and usually have only a few dimensions. Such characteristics are not suitable for modeling discrete health data. Unless the patient is connected to ICU bed sensors, or is wearing or otherwise connected to sensors in real-time, health data is usually recorded at varying intervals or as needed. This type of data is not recorded at fixed intervals and contains

many missing values. Family doctor visits are great examples of this type of

data.
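The irregular sampling described above can be made concrete. A small sketch, with invented visit dates, shows that record-to-record gaps in sporadic health data vary by orders of magnitude, unlike a fixed-interval time-series:

```python
from datetime import date

# Hypothetical family-doctor visits: (date, systolic blood pressure).
visits = [
    (date(2010, 3, 1), 118),
    (date(2010, 3, 15), 121),   # follow-up two weeks later
    (date(2013, 7, 2), 135),    # then no records for over three years
    (date(2014, 1, 9), 140),
]

# Gap lengths in days between consecutive records.
gaps = [(b[0] - a[0]).days for a, b in zip(visits, visits[1:])]
print(gaps)  # [14, 1205, 191]
```

A fixed-interval model would have to fill the 1205-day gap with imputed values, which is exactly the problem the temporal abstraction techniques below avoid.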


2.2 Temporal Data Modeling

Iyad et al. [8] show that regular time-series techniques are not suitable for

multivariate temporal medical records, as these records are usually collected

at different intervals and contain large gaps. Time-series techniques usually

require equally spaced time intervals. When such data is available, for exam-

ple in the MIMIC II real-time ICU signal dataset [11], we could predict the

values using a linear regression (LR) analysis, as done over a 120-minute period for heart rate and blood pressure in a recent study [13]. This type of data is not usually available unless the patient is present in the ICU and monitored

continuously in real-time.

To train a predictive model based on historical records, a sequence of tempo-

ral patterns is required. For example, in cardiovascular disease, the choles-

terol plaques formed inside the veins are more likely to build up in a decade,

rather than just overnight. The trend towards this plaque build-up could be

predicted by observing the cholesterol levels in a series of sporadic checkups

of the patient. These time point sequences, however, leave us with some gaps

in time, which could be as long as a year or a decade. Thus, instead of using

time-series technique to model such sequences, data abstraction techniques

can be used for modeling long-term data as a sequence of similar patterns.

For example, to create temporal sequences, Iyad et al. [8] proposed to initially

create temporal states of the form (variable, value) denoted as (F, V ) where

variable F is a temporal variable, such as “Blood Pressure” or “Cholesterol”

and value V is an abstracted value from a range of value abstractions Σ = {Very Low, Low, . . . , Very High} (Figure 2.2).

Figure 2.2: Trend and value abstractions for creatinine values over time. Courtesy of [8].

Time points are converted into time intervals and temporal patterns of size

k are created and named as k-patterns. Each pattern is a series of temporal

states plus a matrix R representing the relationship between two state in-

tervals. For example, (“Creatinine”, “High”) BEFORE (“Blood Pressure”,

“Low”) is a 2-pattern. A full example can be found in [8].

The authors also introduce recent temporal pattern (RTP) which limits the

pattern mining to only a recent gap, as recent data are supposed to have

more relevant information. However, the results show little or no difference

between RTPs and temporal patterns. So, it can be concluded that using

recent data does not help in prediction significantly.

The problem with temporal mining is that finding k-patterns is computationally expensive: for each new pattern, the sequences in all the samples must be processed before a larger pattern can be created. All patterns start with 1-patterns, then 2-patterns are created

based on 1-patterns and so on. Unfortunately, the data from [8] and [14] are

not available for comparison due to intellectual property rights. Even the

details of categorizing 602 ICD-9 diagnosis codes into eight categories using

a medical expert in [8] could not be replicated. Next, we explain these tem-

poral abstractions further, as they are used as the basis of one of our algorithms.

Temporal Abstraction Temporal abstractions are the result of applying

a series of abstraction techniques to multivariate temporal intervals. There

are two types of temporal abstractions: trend abstraction and value abstrac-

tion [15, 8]. Each abstraction has a variable (F) and a value (V) and is shown

as the tuple (F, V). For trend abstractions:

V ∈ {“Decreasing”, “Steady”, “Increasing”}

and for value abstractions:

V ∈ {“Very Low”, “Low”, “Normal”, “High”, “Very High”}.

For example, if creatinine values for a patient are normal at time points A

and B, and high at time points C and D, an example for creatinine value

abstraction in time interval [A, B], would be: (“Creatinine”, “Normal”, A,

B). And similarly (“Creatinine”, “High”, C, D) for the time interval [C, D].

A state interval (E) is then defined for an interval, denoted by a 4-tuple

(F, V, s, e) where s and e are the start time and end time of the state


interval. Finally, multivariate state sequences (MSS) are defined as a series

of state intervals (E) for multiple variables in time:

Z = 〈E1, E2, . . . , El〉; Ei.s ≤ Ei+1.s, ∀i ∈ {1, . . . , l − 1} (2.1)

An example of a MSS is:

〈(“Creatinine”, “Normal”, 14, 18), (“Glucose”, “High”, 16, 21)〉
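The abstractions above can be sketched directly in code. The cut-off values below are invented for illustration; [8] derives per-variable thresholds:

```python
def abstract_value(v, low=60.0, high=110.0):
    """Toy value abstraction for a creatinine-like reading (invented cut-offs)."""
    if v < low:
        return "Low"
    if v > high:
        return "High"
    return "Normal"

def to_state_intervals(variable, readings):
    """Merge consecutive time points with the same abstracted value
    into state intervals of the form (F, V, s, e)."""
    intervals = []
    for t, v in readings:
        label = abstract_value(v)
        if intervals and intervals[-1][1] == label:
            f, val, s, _ = intervals[-1]
            intervals[-1] = (f, val, s, t)   # extend the current interval
        else:
            intervals.append((variable, label, t, t))
    return intervals

readings = [(14, 90.0), (16, 95.0), (18, 88.0), (20, 130.0), (21, 140.0)]
print(to_state_intervals("Creatinine", readings))
# [('Creatinine', 'Normal', 14, 18), ('Creatinine', 'High', 20, 21)]
```

Concatenating the state intervals of several variables, sorted by start time, yields an MSS as defined in Equation 2.1.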

Temporal patterns are then defined as subsets of an MSS. For instance, 〈(“Creatinine”, “Normal”), (“Glucose”, “High”)〉 is a temporal

pattern containing two temporal abstractions. These patterns are useful be-

cause they can create a high-level abstraction of otherwise uninterpretable

numerical values. However, unlike an MSS, extracting temporal patterns for a dataset requires processing all samples from a particular class, often multiple times, using a computationally complex recursive algorithm, unless mining is limited to recent data only [8].
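The per-candidate cost comes from matching every candidate pattern against every MSS. A minimal matcher for patterns built only from the BEFORE relation might look as follows (a greedy sketch; the full algorithm in [8] handles the other interval relations and requires backtracking):

```python
def matches(pattern, mss):
    """Check whether a k-pattern (a list of (F, V) states, each required to
    occur strictly BEFORE the next) is contained in an MSS given as a list of
    (F, V, s, e) state intervals sorted by start time. Greedy left-to-right
    scan: a sketch, not a complete matcher."""
    last_end = float("-inf")
    for f, v in pattern:
        for fi, vi, s, e in mss:
            if fi == f and vi == v and s > last_end:   # starts after last match
                last_end = e
                break
        else:
            return False
    return True

mss = [("Creatinine", "High", 2, 5), ("BloodPressure", "Low", 7, 9)]
assert matches([("Creatinine", "High"), ("BloodPressure", "Low")], mss)
assert not matches([("BloodPressure", "Low"), ("Creatinine", "High")], mss)
```

Every candidate k-pattern must be checked this way against every sample in the class, which is why the pattern-growth loop over 1-patterns, 2-patterns, and so on becomes expensive quickly.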

Although the end temporal patterns are interpretable and can be used to

find similar patterns in a new example, they are not suitable to train other

machine learning algorithms, such as the state-of-the-art deep learning mod-

els.


2.3 Detection, Prediction, and Forecasting

Here we consider modeling the process of predicting health anomalies and

disease diagnosis from past activity and health records. Medical records of

a patient, including any past diagnoses, along with a health profile, such as

age, gender, and race, constitute the prior information denoted as Φ. The

objective is to predict the probability distribution of anomalies Υ, given the

past activities Ω, regarding the patient’s profile Φ :

p(Υ|Ω,Φ) (2.2)

Not all learning algorithms can estimate this model. In a real-time predic-

tion and monitoring environment, we model activities Ω and anomalies Υ as

tensors in time. Thus, an LSTM network would be the most suitable model

to learn the dependencies needed to predict anomalies. RNNs can be used in many configurations. They are capable of sequence-to-sequence mapping, which enables

them to be used for prediction, given a history (Figure 2.3). This figure

shows several possible architectures of an RNN . Input sequences/cells are in

red, hidden layers are in green and blue rectangles are the output sequence

or units. Detection is not enough in many scenarios, as it may already be too late to react to an emergency event, as illustrated in Figure 2.4. In health

anomaly prediction, we are interested in a many to many architecture shown

in Figure 2.5.
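Equation 2.2 can be instantiated in a toy form: a logistic score for p(Υ | Ω, Φ) given a recent activity window and a patient profile. The weights and features below are invented purely for illustration; in this thesis, the role of this function is played by an LSTM network.

```python
import math

def p_anomaly(activity_window, profile, w_act=0.8, w_age=0.03, bias=-4.0):
    """Toy estimate of p(anomaly | activities, profile): a logistic function
    of the mean activity signal and the patient's age. Weights are
    illustrative only, not learned from data."""
    mean_act = sum(activity_window) / len(activity_window)
    z = w_act * mean_act + w_age * profile["age"] + bias
    return 1.0 / (1.0 + math.exp(-z))

p = p_anomaly([1.0, 2.0, 3.0], {"age": 70})
assert 0.0 < p < 1.0  # a proper probability, conditioned on Omega and Phi
```

The many-to-many LSTM in Figure 2.5 generalizes this scalar score to a whole predicted sequence of future anomaly probabilities.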

The terms detection, prediction, and forecasting are sometimes used


interchangeably. More specifically, detection and prediction are used to

determine a time point or event which occurs in the immediate future, or which has already occurred (e.g., fall detection). On the other hand, prediction is also used with the meaning of forecasting an event in the future (e.g., predicting

earthquakes or forecasting weather). In this thesis, the meaning of the

word prediction is context-specific (e.g., fall detection is compared with fall

prediction (forecasting)).

Figure 2.3: Several possible architectures of an RNN. Input sequences/cells are in red, hidden layers are in green, and blue rectangles are the output sequence or units. Image courtesy of Andrej Karpathy [16].

2.4 Temporal Sequence Modeling

Two popular sequence classification methods are either Markovian models or

RNNs. The problem with Markovian models, such as the hidden Markov model (HMM), is that they assume each state depends only on the previous

state. In long-term health data prediction, we believe this might not be true.

Figure 2.4: It is often too late to detect an emergency event. Even if an emergency is detected, the patient may suffer severe damage before the emergency team arrives. By forecasting and prediction rather than simple detection, early intervention can reduce such damage.

Figure 2.5: The goal is to predict future temporal sequences from historical sequences using a machine learning algorithm.

Certain lifestyle factors and past diagnoses may affect a patient's current diagnosis, for example, a history of certain drug consumption or surgery.

Thus, first order Markovian chains do not seem suitable for this type of

classification as they ignore long-term correlations. One solution might be

using higher order Markovian chains [17]. However, they are known to be

complex and computationally expensive as the order increases (e.g., using

orders higher than two). Therefore, RNNs can be a good alternative. RNNs are proven to be Turing complete [18] and thus seem able to handle this

task given enough resources. However, the regular RNN cells are shown to

be inefficient in remembering long dependencies. LSTM [19] cells instead

perform better in remembering history.
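The limitation of a first-order chain can be demonstrated with a synthetic sequence in which the next symbol depends on the symbol two steps back. First-order transition counts are then uninformative, while a second-order view fully determines the next symbol:

```python
from collections import Counter

# Synthetic trigrams in which the third symbol copies the first: a
# second-order dependency that a first-order chain cannot represent.
samples = [("A", "X", "A")] * 50 + [("B", "X", "B")] * 50

# First-order view: distribution of the next symbol given only "X".
first_order = Counter(nxt for _, cur, nxt in samples if cur == "X")

# Second-order view: distribution given the two previous symbols ("A", "X").
second_order = Counter(nxt for prv, cur, nxt in samples
                       if (prv, cur) == ("A", "X"))

print(first_order)   # 50/50 between 'A' and 'B': no predictive power
print(second_order)  # only 'A': fully determined
```

In long-term health data, the relevant "previous symbol" may lie years in the past, which is exactly the dependency an LSTM's internal memory is designed to retain.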

2.5 Deep Learning

Deep neural networks became popular as the required data and computa-

tion power (specifically graphics processing units (GPUs)) became available.

Compared to hand-engineering features for different machine learning prob-

lems, deep learning methods can capture the non-linearity and the relation

and importance of each feature via training.

2.5.1 Recurrent Neural Networks (RNN)

Deep neural networks can approximate any function (a mapping from X to Y) [20] without requiring the input variables to be independent and identically distributed (i.i.d.) [21]. Among several popular deep learning architectures,


the RNN (Figure 2.6) is selected for our research as it is suitable for sequential inputs (such as those in speech recognition, machine translation, and natural language processing) and is the most suitable model for sequence-to-sequence classification [22].

2.5.2 Long Short-Term Memory (LSTM)

The vanilla RNN cells suffer from a vanishing gradient problem, in which

the backpropagation signal vanishes before reaching the beginning cells and

thus long-term dependencies are not learned efficiently [23]. RNNs with LSTM cells [19] address the problem by adding an internal memory to each cell

(Figure 2.7). They still prove to be robust in most scenarios even after other

variations were proposed [19]. Hence, as a start in this research, we use

LSTM variations as the base model for our proposed solution.

2.6 Summary

In this chapter, we covered the necessary background on time and temporal sequence modeling, the differences between detection, prediction, and forecasting, and how deep learning sequence models, specifically the LSTM, can be used for sequence-to-sequence modeling. The next chapter covers background and a literature review of health monitoring architectures.

Figure 2.6: (Top) An RNN diagram. A series of neural networks, A, looks at some input xt and outputs a value ht. The loop indicates feedback from each node from the output of previous nodes. (Bottom) The unrolled representation of the RNN, which is usually used in implementations. Images courtesy of Christopher Olah [24].

Figure 2.7: LSTM adds memory to each cell using four interacting layers in the repeating module. Image courtesy of Christopher Olah [24].


Chapter 3

Health Monitoring Systems

Healthcare monitoring is a major part of the internet of everything (IoE),

which aims to connect not only physical devices, but people and processes

as well [25]. In this chapter, the focus is on outlining the technical challenges

and discussing the possible solutions. Privacy in healthcare is also discussed

briefly; however, healthcare privacy depends mainly on government legislation and corporate policies and thus requires a separate in-depth review.

Therefore, context awareness and knowledge sharing will be discussed here

as the main technological challenges towards an interconnected IoE health-

care platform.

Due to the growing elderly population, research in healthcare monitoring us-

ing ambient-assisted living (AAL) technology is crucial to provide improved

care while at the same time containing healthcare costs. Although the number of health monitoring sensors is increasing as part of the IoE growth, there


are no robust systems to connect different sensors and systems to facilitate

knowledge sharing to empower health anomaly detection and prediction ca-

pabilities. These systems cannot use the data and knowledge of other similar

systems due to interoperability issues. Storing the information is also a chal-

lenge due to a high volume of sensor data generated by every sensor in the

IoE environment. However, state-of-the-art cloud platforms provide services

to solution developers to leverage the previously processed similar data and

the corresponding detected symptoms. Cloud-based platforms such as health

event aggregation lab (HEAL) (developed here) and cloud-oriented context-

aware middleware in ambient assisted living (CoCaMAAL) can provide ser-

vices for input sensors, IoE devices and processes, and context providers all

at the same time. The goal of these systems is to bridge the gap between cur-

rent symptoms and diagnosis trend data in order to accurately and quickly

predict health anomalies.

In this chapter, some of the state-of-the-art approaches to create a frame-

work that can act as a middleware between processed raw data and trends

and predicting knowledge are discussed. These systems are not only useful

for the data provider itself, but also for other systems that might lack the

necessary historical knowledge required to successfully detect and predict the

unforeseen anomalies.

A proposed HEAL model that seeks to act as a bridge between different

platforms is described in detail. This platform provides web services not

only for sensors and third-parties, but also tools for developers to leverage


previously processed similar data and the corresponding detected symptoms.

The proposed architecture is based on cloud and provides services for input

sensors, IoE devices, processes and people, and context providers. RESTful

services for developers of other systems are provided as well. A prototype of

the model is implemented and tested on a Microsoft Azure cloud platform

(the details are presented in section 7.3).
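The RESTful services mentioned above can be sketched at the level of handler functions. The endpoint names and payload fields below are invented for illustration only and are not the prototype's actual API:

```python
import json

# Hypothetical HEAL-style knowledge-sharing endpoints. The event store is
# an in-memory list here; the prototype uses cloud storage.
EVENTS = []

def post_event(body):
    """POST /events: a sensor or third-party system submits an observation."""
    EVENTS.append(json.loads(body))
    return 201, json.dumps({"stored": len(EVENTS)})

def get_events():
    """GET /events: another monitoring system retrieves the shared events."""
    return 200, json.dumps(EVENTS)

status, _ = post_event('{"sensor": "hr", "value": 150, "t": 12}')
assert status == 201
status, payload = get_events()
assert json.loads(payload)[0]["value"] == 150
```

The point of the design is that the consumer of `get_events` need not be the system that produced the data, which is what enables knowledge sharing between otherwise disconnected monitoring systems.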

3.1 Predictive Health Monitoring

Population ageing, the phenomenon by which older people become a pro-

portionally larger share of the total population, is occurring throughout the

world. World-wide, the share of older people (aged 60 years or older) in-

creased from 9 per cent in 1994 to 12 per cent in 2014 and is expected to

reach 21 per cent by 2050 [26]. Due to technological advancements, older

people also live longer. This ageing population will create many challenges

for healthcare systems, such as an increase in disease, healthcare costs, and a shortage of caregivers. Thus, systems and processes are needed that will help manage the healthcare demands of this population. One such solution, known as ambient intelligent systems, may provide the answer to such

challenges. Ambient intelligent systems render their service in a sensitive

and responsive way and are unobtrusively integrated into our daily environ-

ment [27, 28]. Similarly, AAL has become a popular topic of research in

recent years. AAL tools such as medication management tools and medication reminders allow older adults to take control of their health conditions

[29, 30]. Usually, an AAL system consists of smart sensors, user apps, actua-

tors, wireless networks, wearable devices, and software services that provide

real-time data that can show the physical and medical condition of the pa-

tient [31]. However, as higher level insights from the data are required to

positively affect the life of the patients, an AAL system alone cannot provide

the necessary prediction and intelligent insights for such interventions.

The IoE, which consists not only of sensors but of people and processes as well, can create a bigger picture of the daily data being recorded by AAL systems. In AAL, most of the data are collected at a low level from sensors, video cameras, and so on. The resulting data to be processed is then stored

in a data lake with various types and formats. Processing and aggregation

of such data is a major challenge, especially when analyzing large streams

of physiological data in real-time, such as electroencephalography (EEG) and

electrocardiography (ECG). An efficient system depends on improved hard-

ware and software support [32]. Cloud computing and IoE devices are two

endpoint technologies that can support the above challenge of remote health-

care and data processing.

IoE can address the problems of interconnectivity between patients, physicians, and the ambient devices helping the care receiver. AAL devices (such

as laptops, smartphones, on board computers, medical sensors, medical belts

and wristbands, household appliances, intelligent buildings, wireless sensor

networks, ambient devices, and radio-frequency identification (RFID) tagged


objects) are identifiable, readable, recognizable, addressable and even con-

trollable via the IoE [33]. The enormous amount of information produced by

them, if processed and aggregated, can help in solving long-term problems

and can accurately predict emergencies. Of course, there are some challenges

when dealing with a large amount of heterogeneous patient data.

Each patient’s physiological data varies with different activities, age, and

from one individual to another. In order to process such data and to aggre-

gate it efficiently with other available data sources, a very large memory space

and high computing power are required. A comprehensive system requires a

complete knowledge repository and it must remain context sensitive to sat-

isfy different behavior profiles based on an individual’s specialized needs. But

performing such a massive task on a centralized model and location is failure

prone and slow [34]. However, cloud based and distributed frameworks are

more easily scalable and accessible from anywhere especially when combined

with IoE devices.

Several systems and middleware are proposed to address AAL data aggrega-

tion, processing, detection and even prediction [34, 14, 35, 36, 37, 38]. Most

of these systems are only tested in limited simulated areas and the data and

techniques are not actually used and leveraged by the elderly in the way

they require. Furthermore, their proposed solutions offer totally different

architectures for storing, processing, aggregating, and decision making. The

problem identified in all of the above systems is the absence of a single plat-

form that could act as a middleware for such systems to provide services that


all developers and healthcare systems can use to share trends, detection and

prediction knowledge among them.

Data fusion and integration is the first step towards gaining valuable knowl-

edge from multiple sources of data (i.e., sensors).

3.2 Data Fusion

Data fusion techniques are the methods and algorithms used to aggregate

the data from two or more sensors. Also called multisensor or sensor data fusion, these techniques differ when dealing with low-level versus high-level sensor data. Low-level data fusion often deals with the raw input

of sensors and the techniques used to process and cleanse the imperfect in-

put data. Higher level data fusion techniques are often needed to retrieve

meaningful information from input sensors. Fig. 3.1 shows the basic joint

directors of laboratories (JDL) model for sensor fusion that addresses the

different sensor levels. This model was originally used for threat detection.

When dealing with raw sensor data, the process always starts at level zero.

There are many processing steps that should be applied to the raw sensor

data at each level.

Depending on the input sensor data quality, sensor fusion algorithms should

be able to deal with imperfect, correlated, inconsistent, and/or disparate data

[25]. At higher levels of data fusion, when objects and high-level information

are acquired, data cleaning algorithms, such as duplicate removal, are widely

29

Figure 3.1: JDL model levels

used. At the highest levels of sensor fusion, events are detected and extracted

from the fused sensor data. For example, in a system that detects a heart

attack, the input sensor data are binary bits from different wired and wireless

devices such as ECG , EEG , oxygen sensor, heart rate monitor, and probably

pixels from a 2D or 3D time-of-flight (ToF) video camera. At the higher

levels, the system is expected to detect anomalies from each device. At the

highest levels, events that can only be detected by fusing multiple sensor

data are detected and reported as the output of the system.
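The level structure can be sketched as a pipeline: level-0 cleaning of raw readings, per-sensor anomaly flags, and a highest-level event that fires only when multiple sensors agree. The sensor names and thresholds below are invented for illustration:

```python
def clean(readings):
    """Level 0: drop obviously invalid raw values (e.g., sensor glitches)."""
    return [r for r in readings if r is not None and 0 < r < 300]

def sensor_anomaly(readings, threshold):
    """Per-sensor detection: the mean of the cleaned readings exceeds a
    threshold. A stand-in for a real per-device detector."""
    values = clean(readings)
    return bool(values) and sum(values) / len(values) > threshold

def fused_event(hr_readings, spo2_readings):
    """Highest level: raise a cardiac-event alert only when both the
    heart-rate channel is high AND the oxygen-saturation channel is low."""
    hr_high = sensor_anomaly(hr_readings, threshold=140)
    spo2_normal = sensor_anomaly(spo2_readings, threshold=90)
    return hr_high and not spo2_normal

assert fused_event([150, 155, None, 160], [85, 84, 86])       # both abnormal
assert not fused_event([150, 155, 160], [97, 98, 97])         # SpO2 is fine
```

Requiring agreement between channels is what lets the fused output suppress single-sensor glitches that would otherwise trigger false alarms.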

When dealing with IoE sensors, most of the time multisensor data fusion

is required and applied to the input sensors. Then the higher level data

fusion is applied to the events reported in the previous steps. Finally, events,

usually along with location, define the current context in which a device or

person is. Context awareness is the key in autonomous control and AAL.


Figure 3.2: How remote monitoring systems work in an AAL environment.

3.3 Ambient-Assisted Living (AAL)

AAL technologies provide a complete set of services ranging from input sen-

sors and context awareness to output actuators and third parties; all to

support an individual’s daily life. AAL systems can specifically assist people

who need special monitoring and care, e.g., patients with Alzheimer's disease (see Fig.

3.2). These systems can monitor a patient’s daily activities and report any

anomalies to caretakers or, in case of an emergency, directly notify the emer-

gency medical services (EMS). Although these systems can be effective in

detecting and monitoring, they are usually not intelligent enough to predict

events based on historical data. Thus, they can currently be considered as

practical solutions for in-home patient monitoring and event detection; but

there are still many challenges for event prediction.


3.3.1 Context Awareness

An intelligent system’s capability to aid a person is maximized when it is

context aware, i.e., information about the location and surroundings of the

person being monitored is available. Knowing where the person is and the

activities he/she is engaged with, through a variety of sensors placed in dif-

ferent locations, brings in these context data. In a home environment, for

example, whether a person is brushing his teeth, washing his hands or simply

looking at the mirror cannot be distinguished by simply using the location

of the person. Using ID tags (such as near-field communication (NFC) or

RFID) for context identification, and complex video processing (e.g., using

red-green-blue-depth (RGB-D) cameras) are required. All these help context

aware systems to provide a better living environment by providing intelligent

support while monitoring.

Adopting a context-aware environment is often challenging for users. Having so many sensors around, especially ones always worn by the user (e.g., accelerometer sensors for fall detection), is not welcomed by many users. Thus, non-invasive approaches are naturally more acceptable. Locating a user at home using floor sensors is less invasive, whereas carrying a belt or smartphone 24/7 can be quite challenging for the adoption of context-aware systems.


3.3.2 Knowledge Sharing

Exchanging detection and prediction knowledge between monitoring systems

is vital especially in dealing with rare anomaly events. Training data is the

key to prediction and detection of events. An unknown event cannot be

detected or predicted with a system which has no historical data about the

sources or exposure to the event itself. In order for a system to predict an

event, it must have prior information about the event.

Often, it is quite unlikely that a new system has information about a rare

anomaly for a person, e.g., a heart attack. Nevertheless, this data can be

made available from the captured data in another monitoring system. Up

to this point, we could not find any comprehensive system that can act as

a link between two or more real-time health monitoring systems in order

to share historical data. This knowledge sharing is valuable, as it can save

lives. Especially in the spread of epidemic diseases, if there is no real-time

knowledge exchange mechanism for sharing the symptoms of a new type

of disease, the number of casualties may increase and disease containment

would be slower. Solving this problem requires a new model and computing

environment that can always be accessible for other monitoring systems.

Cloud computing platform as a service (PaaS) can be used to solve this problem: its scalable and distributed design supports both data sharing and computing. Moreover, most prediction algorithms and techniques are now widely available in the cloud environment for further integration with other systems, making the cloud ecosystem suitable for this


task.

3.3.3 Real-time Decision Making

Accurate real-time decision making also requires dedicated computing power

and historical knowledge. Wearable devices usually do not possess these

capabilities and hence a central processing system can help with complex

prediction and classification computations. In addition to complex process-

ing, an always up-to-date knowledge base may be critical for time-critical

situations, e.g., a fast-spreading epidemic disease and multiple data center

failures. Thus, a cloud-based data warehouse, real-time data mining, and de-

cision making computing power can be critical even for the wearable sensor

devices, people, and processes in an IoE environment.

To achieve a reliable prediction capability, some previously seen anomalies

and events are usually required, as shown in Fig. 3.3. The training data for

accurate future prediction may not actually be present in the current system

(e.g., a wide spread disease in another country with possible symptoms in a

new country). Thus, real-time data integration and historical data analysis

are necessary parts of anomaly detection and prediction for real-time decision

making.


Figure 3.3: How predicting future trends and anomalies requires training data from past events.

3.3.4 Efficient Service Delivery

Most in-home care systems, such as Microsoft Health [39], IBM Watson

Healthcare [40], and CareLink Advantage [41], only report events and emergen-

cies to specific family members and/or directly to the emergency units. This

might result in either missing an emergency situation (due to unavailability of

the caretaker) or overcrowding the emergency units with false alarms. Thus,

intelligence plays an important role in IoE environments where every sen-

sor, person, and operational process matters. Hence, such systems can make

current remote monitoring systems smarter by providing proactive detection

and prediction services (as illustrated in Fig. 3.4).

3.3.5 Comprehensive Monitoring System

Although many projects and systems have been proposed and implemented in

different research centers and industries, most of them only work with specific equipment and in controlled scenarios.

Figure 3.4: Predicting an anomaly with the help of an intelligent detection and prediction system.

Not only do researchers have difficulty accessing non-sensitive knowledge from such systems, but consumers

also suffer from a lack of affordable home-care solutions. If there existed

some comprehensive monitoring system standards, a competitive market for

wearable devices and monitoring hardware could help lower the prices and

increase the shared knowledge. Just as technologies like ZigBee have helped home-monitoring technologies grow, a standard comprehensive monitoring platform could help join homogeneous sensors in a controlled IoE scenario.


3.4 Existing Frameworks

There are several network-based and cloud-based frameworks for AAL scenarios.

OpenAAL [42], universAAL [43], CoCaMAAL [34], and cloud prediction

platforms are some of the frameworks that have been developed recently to

address the challenges explained earlier. HEAL is the framework developed

from this research, detailed in section 7.3.

3.4.1 AAL-based Frameworks

OpenAAL and its descendant universAAL have been implemented and tested

in some real-world scenarios. OpenAAL was a project supported by the Eu-

ropean union which became part of universAAL in 2010. UniversAAL was a

four-year project supported by the European union which is now continued

by ReAAL [44] to implement the project in real environments. The out-

come is that the universAAL platform currently being piloted in 9 counties

with 6000+ users [44]. UniversAAL is context-aware, especially on location,

and provides a network platform based on open services gateway initiative

(OSGi). Nodes are called AAL Spaces and can communicate with each other

as shown in Fig. 3.5. There is also a Native Android version available for

further development. This platform can be considered one of the most significant projects in the AAL movement, especially in Europe. It can serve as a solid infrastructure or middleware layer, yet it does not provide a cloud-based platform for setting up AAL Spaces that can automatically communicate with


Figure 3.5: AAL Spaces and AAL Platforms interaction.

AAL nodes. Although it can be deployed on the cloud, there are many possible challenges regarding its setup.

CoCaMAAL is another cloud-based platform proposed by Forkan et al. [34].

The proposed platform is quite detailed and its authors have considered a

variety of services, sensor interactions, and ontology modelling. The platform

suggests the concept of context providers as high-level data providers. However, only some of the deployed services are cloud-based, and CoCaMAAL has only been tested with simulated data. It also lacks the notion of predictors for the prediction and detection of anomalies. Forkan et al. later proposed an anomaly prediction schema for AAL [14], but it lacks the generalization required to be used as part of a platform.


3.4.2 Cloud Prediction Platforms

Because of the rapid advancement of cloud platforms, cloud-service providers

are now providing machine learning and prediction as part of their PaaS

services. Microsoft Azure Machine Learning [45] provides a platform capable

of predictive analytics for data scientists. Most of the machine learning

algorithms are implemented and available as drag and drop nodes in its

online studio. At the time of this writing, Azure ML is the newest among these platforms, and it provides an excellent user interface for the customization of

prediction algorithms. It supports Python and R scripts, which can be used to manipulate the data and call several already-implemented data mining functions. It also supports deploying web services for each experiment

directly from its studio.

Apache MLLib [46] and Google Prediction [47] are also available to provide

prediction functionalities on cloud with implemented libraries and scalable

performance. These platforms can be used in conjunction with a health event

aggregation platform to provide data mining and prediction anomaly services

for an IoE environment. More detailed information on data analysis in cloud

can be found in the book by Talia et al. [48].

Microsoft also provides a complete package for IoE with the Azure IoT Suite [49]. Combining Microsoft Azure's cloud services with Power BI's reporting and analysis capabilities, the Microsoft IoT Suite delivers everything from real-time sensor data ingestion and event processing to predictive analysis and online reporting.


Starting with a fleet management demo (illustrated in Fig. 3.6), Microsoft

shows how the current health status of a truck driver can be seen live in an

app. Sensors send the information to the IoT suite; the sensor data goes

through different Azure cloud services, including Event Hubs and Stream

Analytics. Finally, the required event information reaches Power BI, which

enables rich data visualization, especially on Bing Maps. This suite and demo can be beneficial in developing scalable cloud-based applications that cover an IoE monitoring platform from A to Z.

Despite the availability of these frameworks, there is no framework or

platform designed and implemented to address real-time health predictive

analytics. The cognitive application programming interfaces (APIs) are not

designed specifically for IoMT devices, and their forecasting is limited to univariate time-series analysis. The challenge of designing an architecture

and model for multivariate temporal forecasting is addressed in this research.

3.5 Roadblocks

The advancement of technology and models is slower in healthcare compared

to consumer services. Below are some of the bottlenecks that slow down

advancements in an AI-based digital transformation in healthcare.


Figure 3.6: Fleet Management system demo utilizing Microsoft IoT suite.

3.5.1 Policies, Privacy, and Trust

Government policies are quite strict when dealing with privacy and informa-

tion exchange. The Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada, for example, creates rules for how private-sector

organizations may collect, use, and disclose personal information. The law

gives individuals the right to access their personal information and governs how businesses share information for commercial activities. Although such legislation can protect personal information, it can limit access to nec-

essary healthcare sensor data that is required to provide further analysis on

patients’ data.

Even if companies can access personal health information, protecting the in-


formation can become another issue. Security measures should be taken to

protect the information from external intrusions. Thus, security in every sec-

ondary site in which the personal information can be accessed is as important

as the primary information site. Both policies and security measures aim to

build trust in users' minds. However, there are still concerns about how personal information will be applied. In particular, whether the information retrieved will be used in favor of or against the individual is still a concern.

For example, insurance companies are interested in setting premiums based on the user's current health and predicted future health status. On the other hand, similar predictions can help patients prevent diseases.

3.5.2 Security

Securing the data from unauthorized users should be a top priority from the

IoT devices to the cloud server and further into the front-end. Unauthorized

access to personal medical data has severe security consequences, both for

the company and the user. Thus, data transmission should be secured and

users should be authenticated and authorized.

To ensure data privacy, network messages between IoE services and devices

should never travel unencrypted. Depending on the type of the service, spe-

cific message and transport security algorithms are available. Secure Sockets Layer (SSL) can be used to secure most common REST API communications via Hypertext Transfer Protocol Secure (HTTPS). However, the IoE devices re-

quire more powerful processors and should be able to update the encryption


algorithms as they become obsolete. As this is not possible on the device side in most cases, message security can easily become obsolete due to the lack of

upgradability in most IoT sensor devices.

Devices and users accessing a centralized IoE server should be authenticated

and authorized. Security tokens are widely used to authenticate each request to the server. Bearer tokens authenticate each request and expire after a specific time, after which a full re-authentication is required. After authentica-

tion, a role-based authorization enables several levels of access to the system.

Authorization in an IoE system enables devices and people to interact with

a single system, accessing different layers of secured information.
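As a sketch of the token flow described above, the following Python snippet issues HMAC-signed bearer tokens with an expiry and checks them against a simple role hierarchy. The token format, role names, and TTL are illustrative assumptions, not part of any specific IoE platform.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

# Illustrative sketch only: a minimal bearer-token scheme with expiry and
# role-based authorization; names and format are our assumptions.

SECRET = secrets.token_bytes(32)   # server-side signing key
TOKEN_TTL = 3600                   # tokens expire after one hour

def issue_token(subject, role):
    """Issue a signed bearer token that expires after TOKEN_TTL seconds."""
    payload = base64.urlsafe_b64encode(json.dumps(
        {"sub": subject, "role": role, "exp": int(time.time()) + TOKEN_TTL}
    ).encode())
    sig = base64.urlsafe_b64encode(hmac.new(SECRET, payload, hashlib.sha256).digest())
    return (payload + b"." + sig).decode()

def authenticate(token):
    """Verify the signature and expiry of each request's token."""
    payload, sig = token.encode().split(b".")
    expected = base64.urlsafe_b64encode(hmac.new(SECRET, payload, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("invalid signature")
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["exp"] < time.time():
        raise PermissionError("token expired")  # forces re-authentication
    return claims

def authorize(claims, required_role):
    """Role-based authorization: each role unlocks a layer of access."""
    levels = {"device": 0, "caregiver": 1, "clinician": 2}
    return levels[claims["role"]] >= levels[required_role]
```

In this sketch, a sensor authenticated as a "device" can submit readings but is denied the "clinician" layer, mirroring the layered access described above.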

3.5.3 Scalability

Regarding scalability, when the need arises for higher processing power, stor-

age or network bandwidth, dedicated servers are not easy to upgrade. Es-

pecially for real-time services, it is critical for a system to be able to scale

up without interruption. Cloud services are usually built for scalability: the performance of the system can be increased without extensive planning for data migration or shutting down services during the

process. Thus, due to the changing nature of real-time event aggregation,

a cloud platform with scalability capabilities is required for IoE and in this

case, HEAL.

IoE devices and processes require a 24/7 available backend. One of the main

benefits of cloud-based platforms is the already enabled redundancy (also


available geo-redundancy) and high reliability. In case of a primary system

failure, the backup system automatically receives and processes the requests.

In a large-scale system, this can be critical, as even seconds of failure can cost millions of messages. Therefore, the importance of reliability and

availability of the backend servers should be considered in healthcare IoE

applications.

3.6 Research Trends in IoE Knowledge Sharing Platforms

The platforms discussed in this chapter are the state of the art in IoE cloud

computing and have not yet been adopted and used in practice. Testing such

platforms in real scenarios requires a variety of sensors and processes already in place. Current research and evaluation are mostly limited to simulated

scenarios using data at rest. Thus, future research that can test different

case studies using these platforms in real-time and using streaming health

data can determine their strengths and weaknesses. Future models can be

then designed to overcome the possible flaws.

Interconnecting different systems of sensors in IoE may infringe some policies or lead to conflicts of interest among the people and processes involved from different organizations. Research on the effects of these policies on the performance and scalability of IoE cloud platforms can reveal the limitations of these systems in practice. Also, suggestions to change policies can facilitate the


operation of these systems.

3.7 Summary

In this chapter, challenges towards designing healthcare knowledge sharing

platforms, such as context awareness, knowledge sharing, real-time decision

making, efficient service delivery, and the need for a comprehensive monitor-

ing system are discussed. Some of the efforts to address these challenges in

a framework are then introduced, such as OpenAAL, universAAL, CoCaMAAL, and the state-of-the-art cloud prediction platforms. Then, to address

these challenges in our work, the HEAL framework is proposed which tries to

act as a bridge between different monitoring systems. In all these platforms,

there are still some possible concerns regarding policies, privacy, security,

and scalability which should always be considered in designing and develop-

ing these systems. Finally, it is expected that future research trends cover

some of the mentioned challenges by developing and testing IoE knowledge

exchange frameworks in real-world scenarios.


Chapter 4

Health Data Representation for

Predictive Analytics

In this chapter we review the related technical studies regarding health data

representation and several approaches to data modeling for predictive ana-

lytics.

4.1 Related Works in Health Data Represen-

tation

Among researchers in health prediction frameworks, Forkan et al. [34] pro-

posed a cloud-based middleware for AAL called CoCaMAAL. They tested

their concept with some performance tests (response time and arrival rate).

Later, in another work [14], they proposed a context-aware approach for


Figure 4.1: Hand-engineering and combining different techniques to model health data by Forkan et al. [14]. Image courtesy of [14].

long-term behavioral change detection and abnormality prediction in AAL,

in which they assumed a linear trend model and used Holt's linear trend method along with an HMM to forecast anomalies. They used partially

and fully synthetic data for testing. For this work, they hand-engineered a

solution for their data and combined four different models as shown in Fig.

4.1. Their prediction method ignores the non-linearity between the many parameters involved in real-world scenarios. We believe that deep neural networks can efficiently model this non-linearity for diagnosis prediction. Furthermore, they can capture all the properties without the need to hand-engineer the features required for prediction and forecasting.

The Deep Patient study [7] by Miotto et al. was a 2016 endeavor in dis-

ease diagnosis prediction using unsupervised deep learning. They used an


electronic health records (EHRs) dataset of 700,000 patients and achieved

an abstract representation of patient records using a stack of denoising autoencoders (SDA). They compared their method with principal component analysis (PCA), K-means, Gaussian mixture models (GMM), and in-

dependent component analysis (ICA) by evaluating their disease prediction

preprocessing technique on 76,214 patients over the course of a year. The results showed improvements in accuracy (from 0.88 to 0.93) and in the area under the receiver operating characteristic curve (AuROC, from 0.69 to 0.77) over the second-best performing reported technique, ICA. Nevertheless, this

work is limited in the number of diseases diagnosed (i.e. 78), reports poor

classification results for some diagnosis codes, and provides no model for

time, ignoring recency in data processing.

LSTM networks perform well even with minimal prior knowledge about a do-

main [50, 51]. Lipton et al. [9] published a work in March 2017 that claimed to be the first on learning to diagnose with LSTM RNNs. They tested

LSTM varieties against a baseline classifier using hand-engineered features and a multilayer perceptron (MLP). Although their data included 429 diagnostic labels, they chose to predict only 128. Their proposed LSTM model performs better than both the baseline and a similar MLP. LSTM proved to

be suitable for this task, requiring no manual feature engineering. In their

next published paper addressing the missing value issue [52], the authors

compared imputation with adding zeros along with signaling the LSTM network when data is missing. However, they again discarded the time from the


Figure 4.2: RTP with a minimum gap by Iyad et al. [8]. Image courtesy of [8].

input data.

Iyad et al. [8] suggested RTPs for diagnosis prediction based on recent chains

of abstract temporal patterns (Fig. 4.2). Using a database with 602 differ-

ent ICD9 codes, they diagnosed only eight groups of diseases formed from

those codes using support vector machines (SVM). Then they focused on op-

timizing their pattern mining algorithm as it seemed rather inefficient. The

advantage of their method is the expressivity of the patterns. However, it

discards the long-term dependencies which can contain important informa-

tion. Furthermore, RTPs do not show significant improvement, and their

dataset of 13,558 health records is not generally available. The proposed

ITS method here shows some improvement over this method.

A 2018 study by Theodoridis et al. [53] on fall detection using 3D accelerometer data used LSTM to classify records of 70 subjects. The authors handled

variable input sequence lengths by keeping only the last second of the data.


Also, they used multilayer deep RNNs with an estimated 320k weight parameters. We compare our model with this study and show how LM can be trained with a smaller network even while considering all historical data.

Many studies limit the range of their dataset because of computational limita-

tions. For example, in a 2016 study on the association between entropy measures and mortality [54], the authors were not able to use all 24 hours of data due to such limitations. LM enables extensive and thorough studies over

any length of data.

Singh et al. [55] proposed an activity recognition model on Tim van Kasteren's dataset, detecting activities using LSTM. Despite using a different technique, this work does not address the problem of forecasting activities before they occur; detecting anomalies and falls after the fact is often too late.

While IoT for health or IoMT has been the focus of studies in recent years

[56, 57, 58] and despite many solutions for recording data using wearable

devices and anomaly detection [59], the notion of long-term forecasting of

anomalies is still not addressed.

In chapter 5 we will explain the proposed LM representation of multivariate

historical data, whose properties include handling variable-length long sequences, tolerance to missing values, prioritizing recent data, and modeling time implicitly in the sequence.


4.2 Data Representation Taxonomy

For modeling the process of predicting health anomalies from past activity

and health records, the ideal representation mapping model should keep the

following properties of data:

Completeness. Considers most of the relevant data available (both historical and recent).

Recency. Recent data is more important than historical data.

Consistent Time Representation. Preserves time during processing. LSTM understands sequences implicitly, but not necessarily the time representation.

Sparsity Handling. Health data is discrete, with many missing records. Fig. 4.3 shows how infrequently terminal cancer patients visited hospital in the year before passing away.

Scalability. Handles long sequences of data.

Fig. 4.4 displays what discrete health data with missing values looks like. Each row indicates a possible MSS of a subject (intervals in red). First, such multivariate data covers a long period of time. Second, the periods are different for each patient (variable length). Finally, missing values are dominant and shown as blank spaces.


Figure 4.3: Number of patients that had at least one admission in the last 12, 6, 3, and 1 month(s) of life, by cause of death, England 2004-2008. Terminal cancer patients visited hospital only a few times in the year before passing away. Source: Linked Mortality File, Office for National Statistics, annual mortality file and National Health Service (NHS) information center (Hospital Episode Statistics) [60]. Image courtesy of P Lyons and J Verne [61].

4.3 Current Techniques

There are generally three main approaches towards data of this nature.

1. Filling Missing Data

The first approach is to fill in the missing data with zeros or imputation (Fig. 4.5). One of the recent studies used this approach for health data prediction with LSTM: Lipton et al. [52] filled the missing values with zeros and hinted to the LSTM about each missing value with a set bit (1).

The problem with this approach is that it does not address the scale

of the data spanning many years. The variable length of data is not

addressed explicitly, as the records are made to have the same length

by zero-padding. The study also indicates no significant performance


boost using this approach.

2. Discarding Historical Data

More studies simply ignore historical data by keeping nothing but the most recent data [53, 8, 62], as shown in Fig. 4.6. This approach may sound intuitive, but it discards potentially significant histor-

ical data (patient profile and history). In addition, in forecasting, and

in cases when the immediate recent data is not available, the approach

cannot be used robustly. This approach handles only long sequences, but does not handle missing values or variable-length data.

3. Discarding Missing Data (Removing Gaps)

The final approach that is taken when handling long sequences with

missing values is to simply remove the gap between the data points to

create a concise sequence (Fig. 4.7). Chan et al. [63] and Pam et al. [64] used this approach. It does not result in a very long sequence; however, the short sequences will not have the same length.

This can be handled with a smaller padding compared with the other

two approaches. The main concern in this approach is the representa-

tion of time. For example, this approach does not differentiate between

two records containing three events each, occurring every other year and

every other day, respectively. LSTM understands sequences, not time.

Simply providing a field containing the date to LSTM would not solve

this problem.
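The trade-offs of these approaches can be sketched in a few lines of Python. The first helper zero-fills and flags missing values in the spirit of Lipton et al. [52] (the first approach); the second shows how removing gaps (the third approach) makes a daily and a yearly history indistinguishable. The readings and helper names are invented for illustration.

```python
def fill_and_flag(record, max_len):
    """First approach: zero-fill missing values, flag them with a set
    bit (1), and zero-pad every record to a common length."""
    values = [0.0 if v is None else v for v in record]
    flags = [1 if v is None else 0 for v in record]
    pad = max_len - len(record)                 # variable length -> padding
    return values + [0.0] * pad, flags + [0] * pad

records = [[7.1, None, 6.8], [None, 5.9]]       # e.g., glucose readings
encoded = [fill_and_flag(r, 3) for r in records]
# encoded[0] == ([7.1, 0.0, 6.8], [0, 1, 0])

def remove_gaps(events):
    """Third approach: keep the ordered values, discarding timestamps."""
    return [value for _, value in events]

DAY = 86_400                                    # seconds per day
daily = [(0, "high"), (DAY, "high"), (2 * DAY, "low")]
yearly = [(0, "high"), (365 * DAY, "high"), (730 * DAY, "low")]
assert remove_gaps(daily) == remove_gaps(yearly)   # the time scale is lost
```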


Figure 4.4: An illustration of what an actual health dataset looks like. MSS for different subjects have missing values, variable length, and span over many years. NOTE: For illustration only. Not from an actual health dataset. Base illustration image courtesy of [65].

Figure 4.5: The first approach for data modeling is to fill in the missing values with zeros. NOTE: For illustration only. Not from an actual health dataset. Base illustration image courtesy of [65].


Figure 4.6: The second approach for data modeling is to use nothing but the most recent data. NOTE: For illustration only. Not from an actual health dataset. Base illustration image courtesy of [65].

Figure 4.7: The third approach for data modeling is to remove the gaps (the missing data) to create short sequences. NOTE: For illustration only. Not from an actual health dataset. Base illustration image courtesy of [65].


4.4 Summary

In this chapter we reviewed the recent works in health data representation

and predictive analytics and the challenges that are still present in the liter-

ature. The next chapter introduces the proposed LM .


Chapter 5

Life Model

5.1 Introduction

Section 1.1 described the major challenges in multivariate health data repre-

sentation, and section 1.2 identified the research questions. In this chapter, the proposed model and algorithm for modeling temporal sequences are provided.

The idea behind the LM was developed when looking for a way to feed all the

information available during an individual’s lifetime to a model. The ultimate

goal in this research is to train a model to predict future events based on all

of the information available in spite of having missing values and noise in the

data sets. For example, comparing two individuals, one exercising every day

for the past 10 years and another skipping exercise during the same period, a

system should be trained to predict higher fitness for the former individual.

However, recent events such as an accident in the gym or a mental issue could


change the prediction outcome drastically. Moreover, anomalies (such as a

stroke) may start showing symptoms just a few hours before they occur. The

question here is, how to model the data in such a way that both long-term

and short-term (and even real-time data) could be fed into the same model

for it to be able to predict events as accurately as possible?

The two challenges in realizing such a system are data acquisition and machine learning. Data is already being collected in huge volumes from individuals every day, and this trend will continue in the future. However, a model that enables learning from years of historical data is still

dimensions. Among many features, it is often easier to impose a limit for the

time dimension, as it is a known feature and even the collected data could

determine this limitation. Thus, all the available data should be modeled

using a limited number of temporal data elements. For higher compatibility

with binary computer systems and wearable devices, we assume a limit of k = 2^n elements can be provided as a hyperparameter to the system, indicating the maximum number of elements that can be stored in the temporal dimension. An example is n = 5, and thus k = 32 elements, to store the time

dimension. Now we need to find a way to use this limited number of elements

to place an emphasis on the most recent data.

The proposed model is defined as a novel way to model time, with more focus on recent events and lower weight on events far in the past

(without completely eliminating those long-term clues). To emphasize recent data, more intervals cover the recent period and fewer intervals represent historical data. Starting from the most recent element, LM defines

such periods by doubling the interval covered by each element. For exam-

ple, if the most recent element is covering one second of real-time data, the

previous element would cover two seconds and so on. The data available in

each period can be transformed and reduced by an arbitrary function so that

each period contains the same data dimensionality. After only 32 doublings, the 32nd interval will cover 2^31 = 2,147,483,648 seconds, which is just over 68 years.

This simply means that we can put the most recent events and the most

historical events in a single array of 32 elements. This also makes it easier

to create similar 32-element arrays of different individuals' lives for better comparison and enhanced training by machine learning models.
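As a quick numeric sketch of this claim (variable names are our own), 32 doubling intervals starting at one second reach just over 68 years:

```python
# Element i of the 32-element array covers 2**i seconds of history,
# doubling from 1 second (most recent) to 2**31 seconds (oldest).
k = 32
spans = [2 ** i for i in range(k)]

SECONDS_PER_YEAR = 365.25 * 24 * 3600
oldest_years = spans[-1] / SECONDS_PER_YEAR   # 2**31 seconds ~ 68.1 years
total_history = sum(spans)                    # 2**32 - 1 seconds in total
```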

There are two main types of temporal sequences to be modeled: multivariate

temporal sequence (MTS) and MSS . MTS can be explained as multivariate

time-series data with missing values and irregular intervals, mostly in the

form of a series of time-stamped records. MSS is already explained in detail

in section 2.2. The proposed model for mapping MTS to MIS is called the life model for time-series (LMts), and the model for mapping MSS to ITS is called LM for MSS (or simply LM).


Figure 5.1: How LM models the data by mapping it into periods to retain most properties of the data and address the challenges of long multivariate data. NOTE: For illustration only. Not from an actual health dataset. Base illustration image courtesy of [65].

5.2 Life Model Definitions

5.2.1 Life Model for Time-series

The following proposed model produces a sequence of vectors with fixed

time intervals which is suitable for training hardware-constrained unrolled

LSTMs. Sequences are extracted from an MTS¹ so that it offers a concrete

representation of time, i.e., values are represented in a sequence of exponen-

tial intervals.

LM can handle the two major challenges in health prediction, which are long

variable-length sequences and missing values, as shown in Figure 5.1.

¹ Pronounced EmTeeEs


Having (Xt) defined as a vector at time t in the MTS X, the proposed

mapping model is defined as follows:

Definition 5.2.1. Let MTS X⃗, covering time ∆T, and MIS⃗ be two temporal data vectors. The LM mapping function for a temporal sequence, denoted Λ_TS, for a given δ ∈ R+ (a compression factor set to 1.0 by default), is defined over a period P, with n ∈ Z+ and k = 2^n chosen to satisfy ∆T < 2^(δk), as:

    Λ_TS(n, δ) : R^(l×s) → R^(k×s),   l, k, s ∈ N
    ≡ MTS ↦ MIS ≡ ⟨X_t1, X_t2, ..., X_tl⟩ ↦ ⟨V_0, V_1, ..., V_i, ..., V_(k−1)⟩    (5.1)

where V_i is a vector of size s = |X_t|, mapping the period p_i ⊂ P with length ∆t_i ≤ ∆T, defined as either:

    p_i = [ −(2^(δ(k−i)) − 1),  −(2^(δ(k−(i+1))) − 1) )    (5.2)

    ∆t_i = 2^(δ(k−(i+1)))    (5.3)

for modeling history over the period P = [ −(2^(δk) − 1), 0 ); or:

    p_i = [ 2^(δi) − 1,  2^(δ(i+1)) − 1 )    (5.4)

    ∆t_i = 2^(δi)    (5.5)

for modeling future predictions over the period P = [ 0, 2^(δk) − 1 ). The time 0 is considered part of the future.

Lemma 5.2.1. Given ∆T and δ, n can be chosen as follows:

    n = ⌈ log2( log2(∆T) / δ ) ⌉    (5.6)

Proof. From the definition of LM, given ∆T and δ, we have k = 2^n and ∆T < 2^(δk). Therefore:

    δk > log2(∆T)    (5.7)

    ∴ k > log2(∆T) / δ    (5.8)

    ∴ k = ⌈ log2(∆T) / δ ⌉    (5.9)

    ∴ n = ⌈ log2( log2(∆T) / δ ) ⌉    (5.10)

Lemma 5.2.2. Given different circumstances, the parameters of LM can be calculated directly using the following:

    δ = ⌈ log2(∆T) / k ⌉    (5.11)

    k = ⌈ log2(∆T) / δ ⌉    (5.12)

    ∆T = ⌈ 2^(δk) ⌉    (5.13)

Proof. Similar to Lemma 5.2.1.
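Lemmas 5.2.1 and 5.2.2 translate directly into code. The sketch below (function names are our own) chooses n and k for roughly 68 years of history with δ = 1:

```python
import math

def choose_n(dT, delta=1.0):
    """Lemma 5.2.1: n = ceil(log2(log2(dT) / delta))."""
    return math.ceil(math.log2(math.log2(dT) / delta))

def choose_k(dT, delta=1.0):
    """Lemma 5.2.2: k = ceil(log2(dT) / delta)."""
    return math.ceil(math.log2(dT) / delta)

SECONDS_68_YEARS = 68 * 365 * 24 * 3600          # ~2.14e9 seconds
n = choose_n(SECONDS_68_YEARS)                    # n = 5, so k = 2**n = 32
assert SECONDS_68_YEARS < 2 ** (1.0 * 2 ** n)    # dT < 2**(delta * k) holds
```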

V_i^f is the aggregated value of all X_t^f in the period p_i, and s is the number of temporal variables (e.g., "Accelerometer X"). For s = 5, the V_i vector can look like the following:

    V_i = ( 0.21, −0.45, 0.93, 0.71, −0.23 )^T    (5.14)

where V_i^4 = 0.71 indicates that the fourth temporal variable (e.g., "Accelerometer Z") had an average of 0.71 during the period p_i. The aggregation function can be the average, or any other defined mapping function.

This representation creates a concise sequence with more elements represent-

ing recent history. For instance, for an individual's life, an LM mapping with n = 5 and k = 32 time steps represents 68 years of life, as shown in Fig. 5.5. The most recent 12 elements of the sequence represent 2^12 seconds, or just over

an hour of history, while the last 20 elements represent a week. Thus, event


recency is incorporated into this concise representation. Future predictions

are also modeled similarly; however, with more focus on the near future.

Depending on the architecture of the many-to-many RNN , the length of the

prediction sequences can either match the history sequences or be different.

Even if the history and future sequence lengths match, the time representa-

tion can be different. For example, the future prediction period P in Fig. 5.5

is 8 years, using the same MIS length with a different δ = 0.9 compression.

Constraining k to be 2^n creates a fixed number of inputs for most situations. Most sequences require a k between 16 and 32; thus, with n = 5, most such sequences fall within the k = 32 sequence length.

Theorem 5.2.1. ∀ n ∈ Z+ and δ ∈ R+: ∪_{i=0}^{2^n − 1} p_i = P.

Proof. The proof follows directly from the definitions of p_i and P.

Theorem 5.2.1 shows that the periods p_i in Λ(n, δ) map the entire period P to the MIS, leaving no gap in time.
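As a quick numeric check of the history case (using the period definition of Eq. 5.2 with δ = 1 and n = 2, so k = 4), the periods tile P with no gaps or overlaps:

```python
# Periods p_i = [-(2**(k-i) - 1), -(2**(k-i-1) - 1)) for delta = 1, k = 4.
n, k = 2, 4
periods = [(-(2 ** (k - i) - 1), -(2 ** (k - i - 1) - 1)) for i in range(k)]
# periods == [(-15, -7), (-7, -3), (-3, -1), (-1, 0)]

# Every integer time step of P = [-(2**k - 1), 0) is covered exactly once.
covered = sorted(t for lo, hi in periods for t in range(lo, hi))
assert covered == list(range(-(2 ** k - 1), 0))
```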

Mapping and Reducing algorithm

An MIS tensor mapped from an MTS can then be reduced using an aggregation function a defined as:

    a : R^(m×s) → R^s,   m, s ∈ N

The algorithm to map an MTS to an MIS is shown in Algorithm 1. The

algorithm finds the minimum value of n automatically based on the size of


X. If the complexity of the mapping function a is linear (O(m)), there are k·s mapping calculations (O(log(m)·s)) plus a loop over the input samples (O(m)) for finding the points in each period. Thus, the mapping and reducing complexity can be as low as O(m + log(m)·s). When s and k are small integers (for example, s = 100 and k = 32), the complexity can be considered linear (O(m)).

Algorithm 1: LM mapping of an MTS

  Data: MTS X = ⟨X_t1, X_t2, ..., X_tl⟩; δ; s = |X_t|; a : R^(m×s) → R^s, m, s ∈ N
  Result: MIS V = ⟨V_0, V_1, ..., V_i, ..., V_(k−1)⟩
   1: Find the parameters ∆T and k (assuming X is normalized to end at time 0):
   2:   ∆T = −X_t1.start
   3:   Using Lemma 5.2.1, find n and k so that ∆T < 2^(δk).
   4: Start mapping:
   5:   points = []
   6:   for i = 0; i < k; i++ do
   7:     Create p_i
   8:     points = all X_j found in p_i
   9:     V_i = a(points)
  10:   end
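As an illustration, the following Python sketch implements Algorithm 1 for the history case, using an element-wise mean as the aggregation function a. The sample data and function names are our own assumptions:

```python
import math

def lm_map(samples, delta=1.0, agg=None):
    """Map an MTS (a list of (t, vector) pairs with t <= 0, i.e. the
    series is normalized to end at time 0) to an MIS of k vectors,
    reducing the samples of each period with the aggregation agg."""
    s = len(samples[0][1])                       # number of temporal variables
    dT = -samples[0][0] + 1                      # span covered by the series
    n = math.ceil(math.log2(math.log2(dT) / delta))   # Lemma 5.2.1
    k = 2 ** n
    if agg is None:                              # default reducer: mean
        agg = lambda rows: [sum(col) / len(col) for col in zip(*rows)]
    mis = []
    for i in range(k):                           # history periods, oldest first
        lo = -(2 ** (delta * (k - i)) - 1)
        hi = -(2 ** (delta * (k - i - 1)) - 1)
        rows = [x for t, x in samples if lo <= t < hi]
        mis.append(agg(rows) if rows else [0.0] * s)   # missing -> zeros
    return mis

# Two temporal variables, three sparse samples over ~100 seconds of history.
samples = [(-100, [1.0, 0.0]), (-10, [2.0, 1.0]), (-1, [4.0, 3.0])]
mis = lm_map(samples)    # k = 8 periods; recent samples land near the end
```

Note how the sparse samples concentrate in the most recent elements of the output, while empty historical periods are simply zero vectors.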

5.2.2 Life Model for Multivariate State Sequences

Similar to the previous model, the following proposed model produces a

sequence of matrices with fixed intervals, which is suitable for training RNNs. The difference is in the form of the input (MSS vs. MTS) and the output (ITS tensors vs. MIS vectors). Even though our sequence is mainly extracted from an MSS, it offers a concrete representation of time, i.e., values are represented in a sequence of exponential intervals. For example, a surgery that occurred

six months ago, followed by a four-month gap and two months of hospital

stay, is still present as a recent record in our ITS tensor, whereas in an MSS

vector, it has lost its recency and is probably ignored when extracting the RTPs proposed by Iyad et al. [8].

The proposed mapping model for an MSS (introduced in equation 2.1) is

defined as:

Definition 5.2.2. Let MSS Z⃗, covering time ∆T, and LM⃗ be two vectors. The LM mapping function, denoted Λ, for a given δ ∈ R+ (a compression factor set to 1.0 by default), is defined over a period P with n ∈ Z+ and k = 2^n chosen to satisfy ∆T < 2^(δk), as:

    Λ(n, δ) : MSS ↦ ITS ≡ ⟨E_1, E_2, ..., E_l⟩ ↦ ⟨S_0, S_1, ..., S_i, ..., S_(k−1)⟩    (5.15)

where S_i is a matrix mapping the period p_i ⊂ P with length ∆t_i ≤ ∆T, defined similarly to the period definitions in Eq. 5.2 and Eq. 5.4.

Si is a matrix of size |F |× |V |, where F is the set of temporal variables (e.g.,

a temporal variable can be “Glucose”) and V is the set of abstraction values

(e.g, “High”).

$S_i^{f,v}$ is the total units of time in which the $f$th temporal variable equals the $v$th abstraction value in period $p_i$. For $|F| = 4$ and $|V| = 5$, the $S_i$ matrix can


look like the following:

$$S_i = \begin{bmatrix} 0 & 0 & 2 & 4 & 2 \\ 0 & 4 & 1 & 3 & 0 \\ 0 & 0 & 8 & 0 & 0 \\ 0 & 0 & 6 & 2 & 0 \end{bmatrix} \tag{5.16}$$

(rows indexed by $F$, columns by $V$)

where $S_i^{2,4} = 3$ indicates that the second temporal variable (e.g., Creatinine) had a “High” abstraction value for three units of time (e.g., seconds) during the period $p_i$. Another example of mapping from an MSS to an ITS is shown in

Figure 5.2. In this example, there are three variables (C, G, and B) and five

states (from very low to very high). For each Si, the values are calculated as

the number of times each value is present in the period. For example, in $S_7$, there is only one value, which is C = High. Therefore, $S_7^{0,3} = 1$ and the rest of

the matrix is zero. Please note that in the diagram, there is no data available

for S0, S1, S2, and the rest of S3. In this example, missing information is

filled with zeros. Missing information may indicate anything from an individual’s health status to simply a lack of data. This policy can be

changed based on the design and architecture of the learning model.

To normalize the values, the following transformation is applied to each $S_i$ matrix:

$$S_i^{Normalized} = \frac{S_i}{\Delta t_i} \tag{5.17}$$

which yields values in [0, 1] and is more suitable for machine learning algorithms.

Figure 5.2: An example of Λn mapping from an MSS Z of a dataset (original diagram from [8]) to an ITS. ∆T = 24 days for this instance. Thus, n = 3 → k = 8 is chosen to cover the entire P. Some Si ∈ ITS = ⟨S0, S1, . . . , S7⟩ are calculated and shown (before normalization).

Mapping algorithm

An ITS tensor can be mapped from either an MSS or a continuous time-series directly. Using a naive abstraction technique, the latter can be done in $O(|W| \cdot |F|)$, where W is the data acquisition window and F is the set of temporal

variables. Another interesting option is to generate each Si using the RNN

encoder/decoder proposed in [51]. Here, we only consider the mapping from

a previously calculated MSS to an ITS tensor using the LM Λn.

For each period $p_i$, we first find two lists that contain the state intervals $E_l$ (covered in Section 2.2) that either cover the period ($Over_i$) or start/end inside it ($In_i$) (shown in Figure 5.3):

1. Find the temporal states that start or end in each period $p_i$, denoted by the vector $In_i^v$:

$$In_i^v = \left\{ E_l \in MSS \;\middle|\; \left( p_i.start \le E_l^s < p_i.end \;\lor\; p_i.start \le E_l^e < p_i.end \right),\; E_l^v = v \right\} \tag{5.18}$$

2. Find the temporal states that cover period $p_i$, denoted by the vector $Over_i^v$:

$$Over_i^v = \left\{ E_l \in MSS \;\middle|\; E_l^s \le p_i.start \;\land\; E_l^e \ge p_i.end,\; E_l^v = v \right\}, \quad \text{where } 0 \le v < |V| \tag{5.19}$$

As the temporal states are assumed to have no overlap, $|Over_i^v| \in \{0, 1\}$. Then the value of each element of the matrix $S_i$ is calculated as follows:

$$S_i^{f,v} = \sum_{E_l \in In_i^v} \left( \min(E_l^e,\, p_i.end) - \max(E_l^s,\, p_i.start) \right) + \min\left(1,\, |Over_i^v|\right) \cdot \Delta t_i \tag{5.20}$$

The algorithm to map an MSS to an ITS is shown in Algorithm 2. In lines

1 and 2, the algorithm finds the minimum value of n automatically based on

the size of the MSS Z. Then in lines 3 to 20, for each period, the portion of

the temporal interval that falls into each period is added to the corresponding

element in the matrix. The four relative positions of each temporal state

Figure 5.3: Relative position of temporal states $E_l$ intersecting with $p_i$. $E_1$ starts and ends inside $p_i$; $E_2$ and $E_3$ either only end or start in $p_i$; and $E_4$ neither starts nor ends in $p_i$.

El in Z intersecting with each pi is illustrated in Figure 5.3. Line 6 finds

all Ejs intersecting with the current pi, which are Ejs neither starting after

nor ending before that pi. Line 7 separates state intervals covering the entire

current period pi from those which either start or end inside pi. If they

cover the entire period, the length of the current period is added to the

corresponding element of the matrix. Lines 10 to 20 cover the rest of the possibilities. If an interval fits inside a period or just ends in a period, that interval will no longer contribute to another element, and thus it is removed from the list to improve efficiency (lines 13 and 20).


Algorithm 2: LM mapping of an MSS

Data: MSS Z = ⟨E1, E2, . . . , El⟩, δ, |F|, |V|, where Ej = (f, v, s, e)

Result: ITS L = ⟨S0, S1, . . . , Si, . . . , Sk−1⟩

/* Finding parameters ∆T, δ, and k (assuming Z is normalized to end in 0) */
1  ∆T = −E1.start
2  Using Lemma 5.2.1, find n, k so that ∆T < 2δ(k).
   // Start mapping
3  for i = 0; i < k; i++ do
4      Create Si as a zero matrix of size |F| × |V|
5      Create pi
       // Find and map all Ej intersecting pi (Ejs neither starting after, nor ending before pi)
6      forall Ej where NOT (Ej.start ≥ pi.end or Ej.end ≤ pi.start) do
7          if Ej.start ≤ pi.start and Ej.end ≥ pi.end then
               /* Ej is covering pi */
8              Si[Ej.f, Ej.v] += pi.length
9          else
10             if Ej starts AND ends in pi // Ej is inside pi only
11             then
12                 Si[Ej.f, Ej.v] += Ej.length
13                 Z.remove(Ej)
14             else if Ej only starts in pi // Ej may cover other pis
15             then
16                 Si[Ej.f, Ej.v] += (pi.end − Ej.start)
17             else if Ej only ends in pi // Ej coverage ends here
18             then
19                 Si[Ej.f, Ej.v] += (Ej.end − pi.start)
20                 Z.remove(Ej)
21             end
22         end
23     end
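A compact Python sketch of this mapping may help; it is an illustrative reading, not the dissertation's C# implementation. It assumes state intervals are `(f, v, start, end)` tuples with integer variable/value indices and times normalized to end at 0, reads the lemma's condition ∆T < 2δ(k) as ∆T < δ·2^k with k = 2^n (this matches the n = 3 → k = 8 choice for ∆T = 24 in Figure 5.2), and assumes period p_i spans δ·2^(k−1−i) time units with p_{k−1} the most recent (Eqs. 5.2 and 5.4 are stated elsewhere). The four relative-position cases of lines 7 to 19 collapse into a single overlap computation, and the list-pruning of lines 13 and 20 is omitted for clarity:

```python
def lm_map_mss(Z, delta, num_f, num_v):
    """Sketch of Algorithm 2: map an MSS (list of (f, v, start, end) state
    intervals, times normalized so the record ends at 0) to an ITS, a list
    of k matrices of size |F| x |V| (before the Eq. 5.17 normalization).
    Period layout is an assumption: p_i spans delta * 2**(k-1-i) units,
    with p_{k-1} the most recent period."""
    # Lines 1-2: Delta T and the smallest n with Delta T < delta * 2**k, k = 2**n
    dT = -min(s for (_, _, s, _) in Z)
    n = 0
    while dT >= delta * 2 ** (2 ** n):
        n += 1
    k = 2 ** n
    # Period boundaries, oldest (p_0) to newest (p_{k-1}), ending at time 0
    bounds = [-delta * (2 ** (k - i) - 1) for i in range(k)] + [0.0]
    its = []
    for i in range(k):                                  # line 3
        S = [[0.0] * num_v for _ in range(num_f)]       # line 4
        p_start, p_end = bounds[i], bounds[i + 1]       # line 5: create p_i
        for (f, v, s, e) in Z:                          # line 6: E_j intersecting p_i
            if s >= p_end or e <= p_start:
                continue
            # lines 7-19: all four cases reduce to the overlap of E_j with p_i
            S[f][v] += min(e, p_end) - max(s, p_start)
        its.append(S)
    return its
```

For a record spanning 24 time units with δ = 1 this picks n = 3, k = 8, as in the Figure 5.2 instance.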


5.3 LM Properties

5.3.1 Unit of time

Time unit is usually a property of the data. However, if changed or converted

to other units, it can change the P mapped by LM . For example, changing the

time unit from second to minute can expand the P by 60 times. Increasing

the sampling time unit, however, may increase the error rate as demonstrated

by Tim et al. [66]. Fig. 5.4b shows how different time units can change the

size of the sequence (indicated by the value of n) for different periods of time

(x-axis). Furthermore, depending on the choice of time unit and the length

of the time period, the sequences in MIS/ITS can either be filled completely or be left partially empty. The fill-ratio $i/k$ is defined as the fraction of MIS/ITS elements that intersect with P. This ratio is shown in Fig. 5.4a.

5.3.2 Compression Ratio δ

The compression ratio δ changes the P covered by the LM mapping. If δ is doubled, the covered P is expanded accordingly (and similarly P is compressed for δ < 1 values). Figs. 5.4c and 5.4d show the effect of choosing different values of δ on the size and fill-rate of the sequence for different periods P. As can be seen, the fill-ratio can be increased if a different value for δ is chosen. Thus, depending on the required prediction time unit, an efficient δ can be chosen to either maximize the fill-ratio or minimize the length of the sequence (k).
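These trade-offs can be explored with a small helper. Two assumptions are baked in, both consistent with the n = 3 → k = 8 choice for ∆T = 24 in Figure 5.2 but not stated verbatim in this chunk: the lemma's condition ∆T < 2δ(k) is read as ∆T < δ·2^k with k = 2^n, and period p_j is assumed to have length δ·2^(k−1−j) (recent periods shortest):

```python
def choose_n_k(delta_T, delta=1.0):
    """Smallest (n, k) with k = 2**n such that delta_T < delta * 2**k.
    The condition 'Delta T < 2 delta (k)' of Lemma 5.2.1 is read as
    delta * 2**k; this reading is an assumption."""
    n = 0
    while delta_T >= delta * 2 ** (2 ** n):
        n += 1
    return n, 2 ** n

def fill_ratio(delta_T, delta=1.0):
    """Fraction i/k of sequence elements that intersect the history
    P = [-delta_T, 0), assuming period p_j starts at -delta*(2**(k-j) - 1)."""
    n, k = choose_n_k(delta_T, delta)
    # element j intersects P iff its end boundary lies above -delta_T
    used = sum(1 for j in range(k) if delta * (2 ** (k - 1 - j) - 1) < delta_T)
    return used / k
```

Under these assumptions, 100 years measured in seconds (about 3.15e9 units) yields n = 5, k = 32, matching the life-long sequence sizes quoted in the Figure 5.4 caption.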


[Figure 5.4 appears here: four panels. (a) Sequence fill-rate for different time units and periods; (b) n for different time units and periods; (c) Sequence fill-rate for different δ values and time periods; (d) n for different δ values and time periods.]

Figure 5.4: The effect of different values of the time unit ((a), (b)) and δ ((c), (d)) in covering different periods of time on ((a), (c)) the sequence fill-rate and ((b), (d)) the n chosen by the LM algorithm. (a) Depending on the choice of time unit and the length of the time period, the percentage of elements filled (fill-rate = i/k) can vary (1.0 indicates full sequence utilization, and values towards zero mean only a single element of the sequence is used). (b) Life-long periods (100 years) can be represented by sequences of size k = 32, 64, or 128 (n = 5, 6, or 7). Sequence size k = 128 is only required if 100 years or more is being represented with the ns (nanosecond) time unit. (c), (d) The effect of choosing different δ values in covering different periods of time on (c) the MIS/ITS sequence fill-rate and (d) the n chosen by the LM algorithm.



Figure 5.5: LM mapping with n = 5. (Left) Sixty-eight years of an individual’s history is represented by an MIS/ITS sequence of size k = 2^5 = 32, which is suitable for training a wearable LSTM deep network. (Right) Prediction is represented using the same sequence length, with a different time-mapping compression parameter of δ = 0.9, representing 8 years in the future by focusing more on the near future.

5.4 Prediction and Forecasting using Life Model

For prediction and forecasting using LM , there are three scenarios:

1. Binary Prediction (Classification): In this scenario, the LM is

used to model the past, and the immediate future will be predicted.

This type of prediction can also be referred to as detection. Examples

are mortality or fall prediction. The format of the data would be:

$k \times d \mapsto 1$, where d is the dimension of the multivariate input. For an MSS, d may have more than one dimension. For an MTS, d is a scalar.

For this type of prediction, any binary classification method could be

used, and metrics such as Brier score can be used.

74

2. Binary Forecasting (regression or LM ): The input for this scenario has the same format as binary prediction; however, the output

is now a forecast, not just a detection. In this case, we can model the

future in one of the following two ways:

(a) Period index (regression)

The future time at which the binary event occurs can be represented using a single number. If we are looking at binary mortality forecasting, the forecast

of a person’s mortality in n months from now on can be expressed

as either:

i. n, if the output is a linear representation of the time unit (e.g., month)

ii. i, the index of the LM period within which n falls

For instance, a mortality prediction in 12 months can be expressed linearly as 12, or as the index 3 if we create LM periods with k = 4 and the month time unit (the periods would be ⟨[0, 1), [1, 3), [3, 7), [7, 15)⟩ months).

(b) MIS (sequence to sequence (Seq2Seq))

In this model, the future would be modeled as a binary sequence

using LM periods. The element within whose period the event occurred becomes 1. A sequence with all 0s would indicate a negative

class. Modeling 16 months using LM , and a positive occurrence


at month 9 would be modeled as < 0, 0, 0, 1 >. At this point, we

believe this would not give us any advantage over 2(a)ii for binary

forecasting, especially considering the complexity of the model and the custom learning and evaluation metrics required for it (which is not explored here).

3. Multivariate Forecasting: Multivariate forecasting can be done us-

ing time-series or other techniques via LM . When the input data is

not suitable for time-series forecasting (explained in the previous chap-

ters), LM can be used. In this case, the only model would be a Seq2Seq

model with both input and output modeled using LM . For example, a diagnosis forecasting model with the shape $k \times d \mapsto k' \times d'$.
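The period-index encoding of scenario 2(a)ii above can be sketched in a few lines. The period layout is an assumption: period j is taken to span [2^j − 1, 2^(j+1) − 1) time units, under which month 12 with k = 4 maps to index 3, as in the mortality example in the text:

```python
def lm_period_index(t, k):
    """Map a future time t (in time units, t >= 0) to the index of the LM
    period containing it, assuming period j spans [2**j - 1, 2**(j+1) - 1).
    Returns -1 if t falls beyond the last period."""
    for j in range(k):
        if 2 ** j - 1 <= t < 2 ** (j + 1) - 1:
            return j
    return -1
```

With k = 4 and the month time unit, months 0, 2, and 12 map to indices 0, 1, and 3, respectively.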

5.5 Evaluation and Loss Metrics

Loss functions used for training neural networks are critical for proper training. Common loss functions do not account for the element-wise exponential increase in value. The proposed metrics calculate the actual errors in the time units used by LM .

LM periods add a skewness to time modeling due to exponentiation, and thus for regression using LM period indices (2(a)ii), regular metrics such as mean squared error (MSE) are not ideal. Thus, for binary forecasting using the period index (regression method) via LM periods, we introduce a new loss function and metric called tolerance error (TE), defined as follows:


Definition 5.5.1.

$$TE = \left| \frac{2^y - 2^p}{2^t} \right| \tag{5.21}$$

where $y$ = true value, $p$ = predicted value, and $t \in \{1, 2, \ldots, k\}$ is a tolerance parameter defined for the problem.

TE calculates the exact difference between two values in terms of time units. For example, an error of 1 index unit has a different meaning when the difference is between 1 and 2 versus 4 and 5. The latter is 16 units of time in difference, compared to only 2 units of time for the former. That is why common metrics

are not accurate for tensors mapped with LM .

The denominator determines a tolerance that is acceptable for each problem. For instance, in mortality prediction, we may or may not be interested in whether the system is able to forecast mortality accurately to within a few seconds or even days. This tolerance is captured in the metric’s custom design.
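A direct implementation of Eq. 5.21 is a single line; the examples below mirror the 1-versus-2 and 4-versus-5 comparison discussed above:

```python
def tolerance_error(y, p, t):
    """Tolerance error (Eq. 5.21): the error between true index y and
    predicted index p, measured in actual time units and scaled by an
    acceptable tolerance of 2**t time units."""
    return abs(2 ** y - 2 ** p) / 2 ** t
```

With t = 1, the (4, 5) pair yields an error of 8.0 while the (1, 2) pair yields only 1.0, even though both are off by one index.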

TE is suitable only for binary forecasting using the period index framed as

a regression problem. For sequence to sequence models using LM , a new

metric is required.

The following metric, called mean tolerance error (MTE), is suitable for Seq2Seq analysis and is defined as follows:

Definition 5.5.2.

$$MTE = \sqrt{ \frac{1}{n \times F \times K} \sum_{i=1}^{F} \sum_{j=1}^{K} \left( \frac{|Y_{ij} - P_{ij}| \cdot 2^j}{2^t} \right)^2 } \tag{5.22}$$

where $Y$ and $P$ are the true and predicted values, respectively, and $t \in \{1, 2, \ldots, k\}$ is a tolerance parameter defined for the problem. $F$ and $K$ are the number of variables and the MIS length, respectively, and $n$ is the number of samples.

MTE can be used as a metric; however, to be used as a loss function as well,

it needs to be a differentiable function. Thus, given Y and P are tensors, we

rewrite MTE in a tensor multiplication form as follows:

$$MTE(Y, P) = \sqrt{ \sum \left( |Y - P| \ast M \right)^2 } \tag{5.23}$$

where $M$ is a constant matrix defined as:

$$M_{ji} = \frac{2^{i-t}}{\sqrt{F \times K}}, \quad \text{for } i \in \{0, \ldots, K-1\} \text{ and } j \in \{0, \ldots, F-1\} \tag{5.24}$$

MTE in tensor notation is now defined using differentiable functions from the TensorFlow kernel so that it can be used as a loss function.
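A NumPy sketch of the tensor form (Eqs. 5.23 and 5.24) follows; averaging over the n samples is added here as an assumption so that, for batched input, the result matches Eq. 5.22. In a Keras/TensorFlow setting the same expression can be rewritten with backend ops to serve as a differentiable loss:

```python
import numpy as np

def mte(Y, P, t):
    """Mean tolerance error in tensor form (Eqs. 5.23 and 5.24).
    Y, P: arrays of shape (n, F, K); t: tolerance parameter."""
    n, F, K = Y.shape
    i = np.arange(K)
    # Eq. 5.24: M depends only on the time index i, broadcast over variables
    M = 2.0 ** (i - t) / np.sqrt(F * K)
    # Eq. 5.23 with 1/n averaging over samples (assumption, to match Eq. 5.22)
    return np.sqrt(np.sum(((np.abs(Y - P) * M) ** 2) / n))
```

The element-wise product with M weights errors at later (exponentially longer-reaching) indices more heavily, which is exactly the skew that plain MSE misses.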

5.6 Applications

LM opens the door to many predictive analytics areas, healthcare in par-

ticular, by addressing the challenge of mapping long-term periods to concise

representations.

1. Healthcare


[Figure 5.6 appears here: four heatmaps over all (actual y, predicted p) index pairs. (a) MSE heatmap for k = 8; (b) TE heatmap for k = 8 and t = 4; (c) MSE heatmap for k = 32; (d) TE heatmap for k = 32 and t = 23 (1 month).]

Figure 5.6: The heatmaps for MSE versus TE for k = 8 ((a), (b)) and k = 32 ((c), (d)). Both axes are index values up to k, for either the true value y (y-axis) or the predicted value p (x-axis). MSE cannot differentiate the scale of the difference at larger index values, while TE produces a larger error if the difference between the predicted and true values has a different exponential meaning. For example, if the true value is 4 ((a), (b)), the predicted index 2 is much closer to the actual value than the index 6. TE considers this, whereas MSE is unable to differentiate the two.


(a) Disease Diagnosis

(b) Anomaly Prediction

(c) Cancer Care

(d) Post-surgery Monitoring

(e) Self Care

2. Lifestyle

(a) Life-style Choices

(b) Career Planning

(c) Investment Options

(d) Weight Management

(e) Parenting

3. Management

(a) Hiring Decisions

(b) Career Predictions

(c) Team Management

(d) Risk Management

(e) Project Planning


5.7 Summary

The proposed LM opens the door to many predictive analytics areas, health-

care in particular, by addressing the challenge of mapping long-term periods

to concise representations. For example, in healthcare, LM could be used

for disease diagnosis, anomaly prediction, cancer care, post-surgery monitoring,

and in-home care. It could also be applied in lifestyle planning (e.g. career

planning, investment, fitness, weight management, and parenting) and in

management (e.g. team and risk management, project planning, and hiring

decisions).

The main advantages of LM can be summarized in the following:

• Enabling the modeling of history and future in a concise sequence

• Decreasing the input size and the parameters of deep RNNs

• Scalability (maps long sequences)

• Creating fixed-size (or short variable-size) sequences

• Tolerance of missing values

• Maintaining time representation

• Emphasizing recent data

• Customizability

• Potential to become a standard for temporal modeling


Chapter 6

Life Model Case Studies

6.1 Introduction

In this chapter, the experimental results for the following case studies using LM , along with comparisons with previous study results, are provided:

1. Fall Prediction and Forecasting

2. Mortality Prediction and Forecasting

3. Diagnosis and Procedures Forecasting

4. Diagnosis Prediction using simulated data

5. Activity Forecasting


6.2 Test Metrics

To evaluate the proposed solution, several metrics are used for different sub-

systems. Based on the case studies, predictive performance, with a focus on comparison with recent techniques (where possible), is considered.

Common metrics from the machine learning literature used here are as follows:

1. Classification Metrics

Most machine learning algorithms and data structure techniques are evaluated and compared using the precision and recall definitions

based on true positives (TP), true negatives (TN), false positives (FP),

and false negatives (FN). These metrics may have different names and

labels in different domains. In medicine for example, Type I and Type

II errors stand for FP and FN , respectively.

The two most commonly used metrics to compare binary prediction

models are AuROC and Brier score. Brier score is a score function

that measures the accuracy of probabilistic predictions. The lower the

Brier score, the more accurate and calibrated the predictions are. In

the common form, it is the mean squared error of the prediction.

These are used to compare LM algorithms and LSTM-based experimental results with other techniques. Calibration and receiver operating characteristic (ROC) plots are also provided for such models.

2. Big O analysis and Responsiveness


Algorithm complexities are already provided for key algorithms in the

proposed solution. To test the deep learning convergence and mapping

processing speeds, we conduct empirical tests. Speed is the measure of

how fast each component of the system can finish its specific task.

3. Regression and Forecasting

For forecasting using the proposed methods described earlier in section 5.4, in addition to regular regression metrics like MSE , the proposed LM metrics, TE and MTE, are also used.
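Of the metrics listed above, the Brier score (used under classification metrics) is the simplest to state in code: in its common form it is the mean squared error between predicted probabilities and binary outcomes. A minimal sketch:

```python
def brier_score(y_true, y_prob):
    """Brier score: mean squared error between predicted probabilities
    y_prob and binary outcomes y_true (lower is better and indicates
    more accurate, better-calibrated predictions)."""
    assert len(y_true) == len(y_prob)
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

A perfectly confident, perfectly correct predictor scores 0.0; a maximally wrong one scores 1.0.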

6.3 Test Datasets

Finding suitable medical and health data for research is always challenging.

There are three reasons for this difficulty:

1. Privacy issues.

2. Finding patients with desired diseases.

3. Long-term and robust monitoring for all subjects in the study.

Although electronic medical health records are now available for many of

the patients, these data sets are highly confidential, and do not contain real-

time health data. Thus, for novel techniques and approaches, most of the

proposed methods use either simulated data [14, 34], or datasets that are

not usually accessible by other researchers [34, 8, 7, 67]. Some datasets are


available that are too specific, e.g., for a specific group of diseases (patients

with diabetes only [68]). For example, the Tim Van Kastern’s public datasets

[69] provides activity recognition data of three homes. However, the data is

about activities only, and does not contain any health sensory data.

In this research, we used the following sources of data:

1. MIMIC III: MIMIC III [11, 70] is a large, freely-available database

comprising de-identified health-related data associated with over 45,000

patients who stayed in critical care units of the Beth Israel Deaconess

Medical Center between 2001 and 2012. The database includes in-

formation such as demographics, vital sign measurements made at

the bedside, laboratory test results, procedures, medications, caregiver

notes, imaging reports, and mortality (both in and out of hospital).

It specifically contains 68k ICD9 diagnosis codes, 80k lab codes, 116k

medication codes (from RxNorm), and the data contains variable length

records with missing values. The database is around 80GB in size with-

out the waveform data. MIMIC supports a diverse range of analytic

studies spanning epidemiology, clinical decision-rule improvement, and

electronic tool development. It is notable for three factors:

• It is freely available to researchers worldwide

• It encompasses a diverse and very large population of ICU patients

• It contains high temporal resolution data including lab results, electronic documentation, and bedside monitor trends and waveforms.

2. University of Rzeszow fall dataset (URFD): The proposed LM

model has also been tested and verified on a dataset called URFD

[71, 72]. The dataset contains 70 cases (30 falls and 40 normal daily

activities). We used the raw accelerometer data provided, which includes the 3D acceleration vector a_x(t), a_y(t), and a_z(t) with timestamps in milliseconds. The norm of the acceleration vector (magnitude) was also provided. The data had been recorded at 30 Hz and each sample is around 5-16 seconds long.

3. Tim Van Kastern’s: Tim Van Kastern’s public datasets [69], with data from one home over 28 days, are used for activity forecasting. Each activity had a

start and end date.

4. Simulated Data

For preliminary testing and comparing algorithms, a temporal health

data simulator is implemented which is able to generate temporal ab-

stractions for any number of patients, temporal variables, and diagno-

sis/anomaly classes. The simulator starts with creating a normal pa-

tient and then injects disease patterns for a specific class by a specific

ratio. For example, for a patient with high blood pressure, the simu-

lator can replace 15% of blood pressure abstraction values to “High”

and 5% to “Very High”. Often each patient may have more than one

disease. For instance, a patient may have high blood pressure and high



Figure 6.1: The forecasting data in the MIMIC III dataset is prepared by shifting the final admission of each patient into the future and forecasting based on the previous admissions.

glucose symptoms simultaneously. Then each patient is labeled with one or more diagnoses for testing.
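The injection step can be sketched as follows; the function name and the shuffle-then-overwrite strategy are illustrative assumptions, not the simulator's actual implementation:

```python
import random

def inject_pattern(abstractions, replacements, rng):
    """Overwrite a fraction of a patient's abstraction values with
    disease-specific values, e.g. 15% 'High' and 5% 'Very High' blood
    pressure.  abstractions: list of values for one temporal variable;
    replacements: list of (value, ratio) pairs; rng: random.Random.
    Returns a new list; the input is left untouched."""
    out = list(abstractions)
    indices = list(range(len(out)))
    rng.shuffle(indices)          # randomize which samples get overwritten
    pos = 0
    for value, ratio in replacements:
        count = int(len(out) * ratio)
        for idx in indices[pos:pos + count]:
            out[idx] = value
        pos += count              # disjoint slices give exact counts
    return out
```

For a 100-sample blood-pressure series, `[("High", 0.15), ("Very High", 0.05)]` overwrites exactly 15 and 5 samples, respectively.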

6.4 Mortality Models

To forecast mortality in the future: of the over 34,000 valid cases in the MIMIC III dataset, only a little over 5,200 patients had two or more admissions. Here, a mortality forecasting model is created on this subset, and a mortality prediction/detection model is trained on all valid records.

6.4.1 Mortality Forecasting

LM mapping enables forecasting for temporal sequences. The future is mod-

eled using LM mappings and the index of the binary target in the mapping


Table 6.1: Mortality forecasting results using different metrics, with the LM period index as the outcome.

Input Mapping           MSE    TE      Recall (Accuracy)
LM                      0.03   0.0447  98.37%
Regular (gap removal)   0.04   0.0591  98.17%

array is used for prediction as the output of the model. The input itself is either mapped using LM or is generated by removing all the gaps and zero-padding the array from the left. The accuracy (recall) metric calculates how many exact predictions are made by the model, ignoring any close calls.

MSE is calculated as a numeric difference between the actual and the pre-

dicted output. The model is trained using MSE as the loss function over 100

epochs on 5000 records with 80% for train and 20% for the test. For mor-

tality forecasting, the last admission is removed, the mortality flag is moved

into the future via a shift in time, and then a model is created to predict the

LM period index in which the mortality flag is present (shown in Figure 6.1).

For negative flags (no mortality), the model is trained to generate -1 as the

output. The patient records are then modeled using an LSTM model. The

trained model achieved an MSE of 0.03 with 98.37% recall, defined as an exact match between the predicted and actual mortality period. The

results of this method are shown in Table 6.1.


6.4.2 Mortality Detection

To evaluate the proposed LM for prediction, MIMIC III dataset is used to

predict patient mortality given the admission data. The goal is to predict

whether a patient has expired following their admissions, framed as binary sequence classification.

Data From the total patients in the MIMIC III database, those patients aged 15 and up, who are not organ donors, and who have chart data are selected. The

result is a total of 34,755 patients. Of these, 369 had invalid discharge and death dates and could not be used for this experiment. From

the remaining 34,419 patients, we used subsets of the first 1,000, 10,000, and 34,000 patient records to test the proposed method. For each patient,

a list of admissions, and for each admission, the assigned ICD9 diagnosis

codes and procedure codes were extracted. The patients had a total of 6,400

distinct assigned diagnosis codes and 1,971 distinct procedure codes, for a

total of 8,371 possible flags per admission per patient. Because the number

of diagnoses per patient per visit is 10, an example of the data for a patient

looks like the following list:

1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, ..., 0s, ..., 1, 0, 0, 1, 0, 1, ...0s

This list indicates a hospital visit in which ones (1s) in the list indicate the

presence of a specific diagnosis in that admission, followed by many zeros

89

which indicate the absence of other admissions and ending with ones and

zeros indicating the procedures that were performed on the patient. Each

record is accompanied by a date, which is used by LM to assign it to the

correct element.
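The per-admission flag vector described above can be sketched as follows. The helper and the tiny code vocabularies are hypothetical stand-ins for the real 6,400 diagnosis and 1,971 procedure code sets (8,371 flags in total).

```python
# Illustrative sketch: encode one hospital admission as a binary flag
# vector with diagnosis flags first and procedure flags after them.
def encode_admission(diagnoses, procedures, diag_index, proc_index):
    n_diag = len(diag_index)
    vec = [0] * (n_diag + len(proc_index))
    for code in diagnoses:                    # set diagnosis flags
        vec[diag_index[code]] = 1
    for code in procedures:                   # procedure flags come after
        vec[n_diag + proc_index[code]] = 1
    return vec

# toy vocabularies standing in for the full ICD9 code sets
diag_index = {"428.0": 0, "401.9": 1, "250.00": 2}
proc_index = {"39.61": 0, "36.15": 1}
v = encode_admission(["428.0", "250.00"], ["36.15"], diag_index, proc_index)
assert v == [1, 0, 1, 0, 1]                   # diagnoses first, then procedures
```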

LM is used to create a single fixed-size tensor for each patient, regardless of the number of admissions. An optimized implementation of LM written in C# is used to map each patient's data to a normalized MIS of length 32. Each MIS element consists of a one-hot-code embedding of all ICD9 codes (8,371 codes). To compare the effectiveness of LM mapping, a baseline MIS which uses fixed-size intervals is also generated. The periods in the LM tensor are mapped using the LM algorithm, and the periods in the fixed-size intervals have the size (1/k)∆T = (1/32)∆T.
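As a rough illustration of the difference between the two mappings: the fixed-size baseline assigns an event to one of k equal bins, while LM emphasizes recency. The exact LM algorithm is defined earlier in the thesis; the log-scaled scheme below is only an assumption standing in for that recency emphasis.

```python
import math

# Contrast between the two period assignments (sketch only). fixed_bin is
# the fixed-size baseline with (1/32)dT per period; exponential_bin is a
# hypothetical log-scaled scheme: recent event ages (small t) are spread
# over more, finer bins than under the fixed-size mapping.
def fixed_bin(t, total_span, k=32):
    """t is the event age in [0, total_span); bin 0 holds the most recent."""
    return min(int(t / (total_span / k)), k - 1)

def exponential_bin(t, total_span, k=32, n=5):
    frac = t / total_span
    return min(int(math.log2(1 + frac * (2 ** n - 1)) / n * k), k - 1)

# an event 1 time unit old lands in bin 0 of 32 under fixed-size mapping,
# but the log-scaled mapping already distinguishes it across several bins
assert fixed_bin(1, 100) == 0 and exponential_bin(1, 100) == 2
assert fixed_bin(99.9, 100) == 31 and exponential_bin(99.9, 100) == 31
```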

Model The learning model used is the same LSTM network, with an input size of 8,371 × 32, followed by a recurrent dropout layer of 0.2 and another dropout layer of 0.2, with a final 32 × 1 dense layer as the output layer with sigmoid activation. The dropout layers help prevent overfitting by ignoring a random subset of weights during training. The model is trained using the Adam optimizer with binary cross-entropy as the loss function to enable binary output. For testing, the Keras 2.0 wrapper on top of the Tensorflow 1.4 toolkit is used. Tensorflow is a popular open-source library for dataflow programming, mostly used for deep learning training. Models are trained on an Azure NC24 virtual machine with 4 Nvidia Tesla K80 GPUs and 24 Intel Xeon cores. This enabled us to train four models concurrently, each using a single GPU via a Python 3.6 script. The code of this project is available at http://www.github.com/manashty/lifemodel.

Table 6.2: Accuracy, AuROC, and Brier score for LM versus fixed-size period mappings for mortality prediction.

                    LifeModel                  Fixed-size
Samples      Accuracy  AuROC   Brier    Accuracy  AuROC   Brier
1,000        100%      100%    0.000    96%       95%     0.033
10,000       99.6%     99.5%   0.0027   98.8%     98.6%   0.011
34,000       84.2%     83.4%   0.122    80.3%     80.0%   0.138

Results For each of the two mapping methods, i.e., LM and fixed-size, a model was trained on 1,000, 10,000, and 34,000 samples. The final results are shown in Table 6.2. For 1,000 and 34,000 samples, LM helped the same model outperform the fixed-size mapping; for 10,000 samples, the difference is negligible. Also, the AuROC and Brier scores indicate that the model trained with LM is a better classifier in terms of precision-recall and calibration.
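The two reported metrics can be computed directly from their standard definitions. This is a plain-Python sketch, not the evaluation code used in the thesis.

```python
# Brier score: mean squared difference between predicted probability and
# the binary outcome (lower is better, 0 is perfectly calibrated).
def brier_score(y_true, y_prob):
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# AuROC: probability that a randomly chosen positive is ranked above a
# randomly chosen negative (ties count half).
def auroc(y_true, y_prob):
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
assert auroc(y_true, y_prob) == 0.75
assert abs(brier_score(y_true, y_prob) - 0.158125) < 1e-12
```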

Fig. 6.2 shows the accuracy, loss, ROC curve, and calibration diagrams for the best results for 1,000, 10,000, and 34,000 samples.

Performance Evaluation The results in this research are based on baseline LSTM architectures that are not very deep, which makes them suitable for wearable devices with limited resources. The results are expected to improve using deeper architectures with enough training data. Such models may take longer to train and optimize; however, they are more likely to be


[Figure 6.2 panels: accuracy and loss training curves for LM vs. fixed-size (FS) mapping at 34k, 10k, and 1k samples; ROC curves (AuROC, LM/FS: 34k 0.83/0.80, 10k 1.00/0.99, 1k 1.00/0.95); calibration plots with Brier scores (LM/FS: 34k 0.122/0.138, 10k 0.003/0.011, 1k 0.001/0.033).]

Figure 6.2: Training and testing plots for mortality prediction on the MIMIC III dataset for (a) 34,000 (34k), (b) 10,000 (10k), and (c) 1,000 (1k) patients. For each sample size, the accuracy, loss, ROC curve (with AuROC; higher is better), and the model calibration with the Brier score for each method (lower is better) are displayed. Results are for LM mapping with n=5 and k=32 versus a 32-element fixed-size (FS) MIS.


suitable for client-server environments. The proposed system architecture aims to be real-time: although training takes hours or days to complete, once trained, the models can evaluate records within milliseconds, as the weights are transferred to the chips. The LM mapping is also real-time, processing more than 30 samples per second in a single thread for this large dataset. The implementation uses dictionary hash maps, which have O(1) lookup complexity and can be transferred to hardware.

6.4.3 Diagnosis and Procedures Forecasting

Diagnosis forecasting is defined as predicting the exact diagnosis and procedure codes in the future as defined by LM. To forecast the MTS into the future, the final admission of each patient is shifted to the future and the second-to-last admission is placed at time zero, modeling a sequence-to-sequence problem using LM. With more than 250,000 (32 × 8,371) input variables and the same number of output variables, it may be ambitious to train a model with only 5,000 samples. However, to assess the effectiveness of the proposed MTE metric, this metric is compared with MSE for training effectiveness.

The proposed MTE metric is used both as the loss function and as the metric to evaluate the model, comparing its performance with MSE as the loss function. MSE and MTE are both calculated as a numeric difference between the actual and the predicted output. This model is again trained using both MSE and MTE as the loss function over 100 epochs on 5,000 records (with an 80%/20% train/test split).


Training the LSTM model using the proposed MTE loss function results in a lower MTE, which is a more meaningful metric for LM. A t-test on the differences of the means indicates that there is a significant difference between the two results (MTE is significantly lower; p-value < 2.2e−16) (Figure 6.3).

6.4.4 Discussion

From the loss and accuracy plots in Figure 6.2 for the 10k and 34k samples, it can be observed that LM converges faster than the fixed-size mapping, although the fixed-size mapping eventually converges to the same point. This is a critical factor for real-time and large-scale problems. For example, each epoch of training for 34,000 patients takes about 2.5 hours using a modern GPU in the virtual machine (VM) used. When redeploying a model, or when large-scale IoT data and millions of users are considered, a model that converges faster has an edge over similar models with late convergence. This is especially visible as the number of patients increases. LM can clearly capture the temporal relations at different time-points faster than the fixed-size mapping. This may be because recent data is repeated multiple times as recency is incorporated into LM.



Figure 6.3: (a) Boxplot of the MTE of the training and testing error (mean) of a sequence-to-sequence model trained with either MTE or MSE over 100 epochs. (b) Training MTE over epochs.


6.5 Human Fall Prediction and Forecasting

6.5.1 Introduction

Recognizing internal activities of the human body based on biologically gen-

erated time series data is at the core of technologies used in wearable re-

habilitation devices [73] and health support systems [74]. Some commercial

examples include fitness trackers and fall detection devices. Wearable ac-

tivity recognition systems are composed of sensors, such as accelerometers,

gyroscopes or magnetic field/chemical sensors [75] and a processor used to

analyze the generated signals. Real-time and accurate interpretation of the

recorded physiological data from these devices can be considerably helpful in

the prevention and treatment of a number of diseases [76]. For instance, pa-

tients with diabetes, obesity or heart disease are often required to be closely

monitored and follow a specific exercise set as part of their treatments [77].

Similarly, patients with mental pathologies such as epilepsy can be monitored

to detect abnormal activities and therefore prevent negative consequences

[78].

However, most current commercial products only offer relatively simple metrics, such as step count or heart rate, and lack the complexity and computing power for many time series forecasting problems of interest in real time. The

emergence of deep learning methodologies capable of learning multiple lay-

ers of feature hierarchies and temporal dependencies in time series problems

and increased processing capabilities in wearable technologies lay the groundwork for performing more detailed data analysis on-node and in real time [79]. The ability to perform more complex analysis, such as human activity classification, on the wearable device could potentially filter the data streamed from the device to the host and save data-link bandwidth. This saving is most visible in cases where the classification task must be performed continuously on the patient, such as seizure detection for epileptic patients. However,

due to the high computational power and memory bandwidth required by

deep learning algorithms, full realization of such systems on wearable and

embedded medical devices is still challenging.

6.5.2 Hardware Considerations

Along with the useful aspects of the proposed model in improving the ac-

curacy of an RNN network, the reduction provided by the MIS input could

also be beneficial to hardware realizations. There are some variations on the

LSTM architecture. Consider the following model [80]:

h^f_{n+1} = σ(W_f^T · x_n + b_f)

h^i_{n+1} = σ(W_i^T · x_n + b_i)

h^o_{n+1} = σ(W_o^T · x_n + b_o)

h^c_{n+1} = tanh(W_c^T · x_n + b_c)

c_{n+1} = h^f_n ⊙ c_n + h^c_n ⊙ h^i_n

h_{n+1} = h^o_n ⊙ tanh(c_n)


where x_n = [h_n, u_n] and u_n, h_n and c_n are the input, output and cell state vectors, respectively, at discrete time index n. The operator ⊙ denotes the Hadamard element-by-element product. The variables h^f_n, h^i_n, h^o_n represent the forgetting, input and output gating vectors. Finally, W_f, W_i, W_o, W_c and b_f, b_i, b_o, b_c are the weights and biases for the different layers, respectively. In this structure, the number of weights embedded in the layers is defined as follows:

4 × (HN + IWS) × HN (6.1)

where HN is the number of hidden neurons and IWS is the input window size. Therefore, a reduction of X in the input size due to the MIS block could potentially remove 4 × X × HN weights and improve the performance of the hardware in three directions:

Memory Consumption: Considering N bits for storing each weight in memory, the MIS block can reduce the amount of memory storage used in the hardware by 4 × X × HN × N bits. As one of the main bottlenecks for the hardware realization of RNNs and convolutional neural networks (CNNs) is the bandwidth required to fetch weights in each operation, this significant reduction could also be effective here.

Power Consumption: By reducing the number of weights by a factor of 4 × X × HN, the number of calculations required to produce an output is also reduced. This saves power by almost the same factor, as there is a direct relationship between the operating frequency of the system and its power consumption. It should be noted that power consumption is counted as the first hardware design priority in wearable and portable applications.

Latency: This is the most obvious impact on the hardware design: fewer calculations translate to lower latency and a shorter response time of the system.

It should be noted that this reduction is even more significant when the number of neurons and/or LSTM layers increases.
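Eq. 6.1 and the resulting saving can be checked with a small worked instance. The numbers below are illustrative, not taken from the thesis experiments.

```python
# Worked instance of Eq. 6.1: number of LSTM weights for HN hidden
# neurons and an input window of size IWS, and the saving obtained when
# the MIS block shrinks the input by X elements.
def lstm_weights(hn, iws):
    return 4 * (hn + iws) * hn             # four gates, each (HN + IWS) x HN

def weight_saving(hn, iws, x):
    return lstm_weights(hn, iws) - lstm_weights(hn, iws - x)   # = 4*X*HN

assert lstm_weights(32, 32) == 8192
assert weight_saving(32, 32, 8) == 4 * 8 * 32   # 1,024 fewer weights
```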

6.5.3 Models

Below are the possible models that can be used for prediction and forecasting.

6.5.3.1 Binary Prediction

LM mapping is used on the URFD fall records with an LSTM network as a binary sequence classifier for fall/non-fall classification. The results are compared with fixed-size periods: a 32-window mapping with each window 1/32 the length of each sample. A two-layer neural network with 32 LSTM cells as the input layer and one dense layer as the output (32×1) is used. The variable-length input sequences (5-16 seconds, 30 Hz) cannot be used out of the box by other classifiers. However, after LM mapping, the resulting fixed-size sequence is used to train other classifiers for comparison purposes. The results


Table 6.3: Comparison between LM and previous work on the dataset.

Method/Feature                 Timespan  Seq. Size  LSTM layers  Network Params  Accuracy  Params/Timespan ratio
Theodoridis et al. [53], 2018  1s        30         1x2x1        320k            1.0       320,000
LM (Proposed)                  16s       32         1x1          4.6k            1.0       2

are shown in Table 6.4. The machine learning models used in this experiment are from Orange, an open-source machine learning toolkit. The machine learning algorithms used are commonly utilized by data scientists. As can be seen, the LSTM model could handle this sequence better than the other models. Compared to the previous work by Theodoridis et al. [53], we achieved the same 100% accuracy with an LSTM using only the raw accelerometer data, without the extra feature previously proposed by the authors. However, using the proposed LMts time mapping, we could consider all data from the different samples (up to 16× more data, compared to one second in their study), with nearly the same sequence length (32 vs. 30 from [53]). Also, this is achieved with a single-layer LSTM network with 80× fewer network weights compared to their work, making it a suitable solution for wearable devices. It is not possible to conclude that covering all 16 seconds of historical data was beneficial for classification, as both models achieved the maximum accuracy. However, the proposed method resulted in a fully calibrated model, with a Brier score of 0.0 and an AuROC of 1.0. Thus, we have used other scenarios for further evaluation of the algorithm. Table 6.3 compares the proposed LMts feature mapping with the previous study. The number of network parameters in the table is calculated based on Eq. 6.1.


Table 6.4: Performance of LM and fixed-size periods for fall prediction.

Design      Classifier           Accuracy  Precision  Recall  AuROC
Fixed Size  Naive Bayes (NB)     0.843     0.939      0.775   0.904
            LR*                  0.900     0.923      0.900   0.958
            SVM                  0.843     0.872      0.850   0.958
            Random forests (RF)  0.857     0.895      0.850   0.921
            CN2 algorithm (CN2)  0.729     0.769      0.750   0.838
            LSTM                 0.981     0.981      0.980   0.980
LM          NB                   0.886     0.900      0.900   0.967
            LR                   0.871     0.878      0.900   0.958
            SVM*                 0.914     0.905      0.950   0.958
            RF                   0.871     0.860      0.925   0.954
            CN2                  0.843     0.809      0.950   0.879
            LSTM                 1.00      1.00       1.00    1.00

6.5.4 Fall Forecasting

To forecast falls in the future, a new dataset is created based on the URFD fall dataset. It is created by shifting the temporal data to the right, one second at a time, to generate new test cases in which the fall moment occurs in the future, with all future data discarded except the fall time stamp. Using this method, for all fall records ranging from 2-15 seconds, a little over 500 test cases are generated. The fall time stamp is then modeled using LMts, and the period index in which the fall occurs is given as the target of the prediction model. The history is also modeled using LMts. If no fall has occurred, the model output should be -1. Table 6.5 shows the results for the trained model using an LSTM network. As can be seen, the model trained with LMts using TE as the loss function achieves higher accuracy, despite missing values.
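The shift-based augmentation described above can be sketched as follows. The helper is hypothetical and reduces each generated test case to a (history length, seconds until fall) pair; the real cases of course retain the shifted sensor data.

```python
# Sketch: shift a fall record right one second at a time so the fall
# lands in the future; data after the pivot is discarded except the fall
# time stamp (here summarized as seconds_until_fall).
def shifted_cases(fall_time, max_horizon=14):
    cases = []
    for shift in range(1, max_horizon + 1):
        history = fall_time - shift          # seconds of data kept before pivot
        if history < 0:
            break
        cases.append((history, shift))       # fall occurs `shift` s in future
    return cases

assert shifted_cases(fall_time=5) == [(4, 1), (3, 2), (2, 3), (1, 4), (0, 5)]
```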


Table 6.5: Fall forecast results for up to 14 seconds with various metrics and levels of missing values.

Input mapping    Metric  All data  10% missing data  50% missing data
LM               MSE     0.45      0.20              0.27
                 Recall  86.96%    83.15%            85.56%
                 TE      0.01360   0.01673           0.02780
Fixed Intervals  MSE     0.59      0.13              0.28
                 Recall  85.87%    80.22%            81.11%
                 TE      0.01143   0.01114           0.02346

6.5.5 Fall Forecasting with Missing Values

In order to evaluate the robustness of LMts in the presence of missing values, two additional datasets, with 10% and 50% of the data randomly removed, are also considered for testing and comparison. The results are in Table 6.5 and indicate that, despite the missing values, LMts was able to successfully forecast falls up to 14 seconds before they occur.
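The missing-value conditions can be reproduced with a simple random-blanking step. This sketch is an assumption about how such datasets are typically generated; the thesis does not specify its exact procedure here.

```python
import random

# Randomly blank a fraction of a sequence's values to simulate the
# 10% and 50% missing-data conditions used in the robustness test.
def drop_values(seq, fraction, seed=0):
    rng = random.Random(seed)                # fixed seed for reproducibility
    return [None if rng.random() < fraction else v for v in seq]

dropped = drop_values(list(range(1000)), 0.5)
missing = sum(v is None for v in dropped)
assert len(dropped) == 1000 and 400 < missing < 600   # roughly half removed
```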

6.6 Comparison with Recent Temporal Patterns (RTPs)

To compare the effectiveness of the proposed ITS with RTP, an experiment1 is conducted with 10,000 and 100,000 simulated patient records using several machine learning algorithms.

1 This study was presented at IDEAS 2017, UK, and published in the Association for Computing Machinery (ACM) proceedings.


6.6.1 Simulated Data

To generate the data, the data simulator initially generates a temporal record of a patient p ∈ P with m different random time points and values, where P is the set of patients and m is a simulator hyper-parameter with a default value of 100. Then segments of similar time points are merged using a sliding window to create intervals of the form (s, e)_i | 0 ≤ s, e ≤ N for temporal variables F and all random abstraction values V_f.

Defining the multivariate temporal abstraction sequence (MTAS) vector for patient p in the form of:

MTAS_p = ⟨(f1, v1, s1, e1), (f2, v2, s2, e2), ..., (fn, vn, sn, en)⟩ (6.2)

where n ≤ N, fn ∈ F is a temporal variable, F is the collection of all temporal variables in the simulation (|F| = 10 by default), and

vn ∈ V | V = {"Very Low", "Low", "Normal", "High", "Very High"} (6.3)

is an abstract value, helps us define the initial state of each patient p in the form of an MTAS_p with vi = rand(i) | i ∈ [1, n].

After creating the random patients, each patient record (PR) is mapped to a diagnosis d. This mapping can be shown as follows and is defined as the PR:

PR_p : (MTAS_p → d_p) (6.4)

where di ∈ D is one of the |D| different diagnoses in the simulation (|D| is 10 by default). The pattern injector (PI) is responsible for changing the temporal abstractions of each PR. Each di has two diagnosis signatures: primary (d1_i) and secondary (d2_i). Each diagnosis signature is a temporal pattern of the form (F, V). For example, ("Blood Pressure", "Very High") is a diagnosis signature. The simulator randomly assigns two temporal patterns to each diagnosis di. Then, using two global variables called pattern injection rates (PIR), written PIR_j, j ∈ {1, 2}, and defined randomly in [0, 1) (by default 0.15 and 0.05, respectively), the PI replaces the temporal pattern (F, V) part of 100 · PIR_j · |MTAS_p| temporal abstractions with dj_p, where j ∈ {1, 2}. This replacement changes up to 20% of the patient records based on the patient-specific diagnosis.
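The pattern-injection step can be sketched as follows. The helper name and the toy abstraction values are hypothetical; only the overall behavior (replace a PIR fraction of abstractions with the signature, keeping each interval) follows the description above.

```python
import random

# Sketch of the pattern injector (PI): replace the (F, V) part of
# 100*PIR percent of a patient's temporal abstractions with the diagnosis
# signature, keeping each abstraction's interval (s, e) intact.
def inject_pattern(mtas, signature, pir, seed=0):
    rng = random.Random(seed)
    n_replace = int(pir * len(mtas))
    for idx in rng.sample(range(len(mtas)), n_replace):
        f, v, s, e = mtas[idx]
        mtas[idx] = (signature[0], signature[1], s, e)
    return mtas

mtas = [("HR", "Normal", i, i + 1) for i in range(100)]
out = inject_pattern(mtas, ("Blood Pressure", "Very High"), pir=0.15)
assert sum(x[:2] == ("Blood Pressure", "Very High") for x in out) == 15
```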

6.6.2 Prediction Model

Presenting each PR as a sequence makes it necessary to choose a suitable classification algorithm. Thus, we use LSTM cells with the Microsoft Cognitive Toolkit (CNTK) [81] to benchmark the proposed feature extractions. As the input data is expected to be sparse (i.e., with a lot of zeros), rectifier units are used in both the regular feedforward and RNN cells [82]. Next, the classification results for classifying the temporal sequences for diagnosis d_p are presented.

6.6.3 Results and Comparison

Simulated data is generated for |F| = 10, |V| = 5, |P| = 10,000 and 100,000, and K = 400. The RTP method is implemented based on [8] with a gap value of 5% of the time interval length N, which is 500 units of time. Also, patterns up to order 4 are collected (4-patterns). The average classification recalls are shown in Table 6.6. The first three algorithms are used from the H2O [83] machine learning package. RF is used with the default 50 trees; the gradient boosting machine (GBM) is used with 50 trees as well, and a deep feed-forward neural network (DFF) with rectifier activations and two hidden layers of size 200 (200x200) is used. GBM and RF are among the most popular and top-performing machine learning models thanks to their ensemble algorithms, which build strong models from weak learners. For LSTM, CNTK 2.0rc is used with LSTM cells of size 200, a dropout layer, and a fully connected layer (200) with a softmax activation function for the class outputs. For RTP, as the patterns cannot be reshaped as sequential inputs, LSTM could not be used. For the others, the ITS features are presented as a flat feature list.


Table 6.6: Accuracy (average recall) results for 10,000 patients using different techniques.

Method  Metric    RF      GBM     DFF     LSTM
RTP     Train     76.04%  92.02%  99.99%  N/A
        Test      69.85%  85.65%  74.25%  N/A
        Time (s)  67.0    184.6   47.3    N/A
ITS     Train     77.92%  99.78%  99.99%  99.99%
        Test      80.51%  88.60%  79.13%  89.4%
        Time (s)  100.4   214.2   56.4    80.1

Table 6.7: Accuracy (average recall) results for 100,000 patients using different techniques.

Method  Metric    RF      GBM     DFF     LSTM
RTP     Train     71.14%  76.16%  92.76%  N/A
        Test      65.22%  54.35%  58.19%  N/A
        Time (s)  514     819     316     N/A
ITS     Train     68.52%  82.24%  90.79%  95.14%
        Test      69.09%  69.56%  65.07%  78.81%
        Time (s)  737.3   1628.2  648     1208.1


6.6.4 Discussion

The RTP model trains faster than ITS because it considers only the last 5% of the time frame. However, its preparation time was twice as long as that of ITS. Also, it appears unable to capture some of the long-term dependencies in the data. Generally, ITS is slightly better at classification, as it considers the long-term relationships. When using deep learning on 10,000 instances of data, DFF and GBM seem to overfit very easily. Although the recognition rate is almost 100% in training, they only achieve around 80% accuracy in testing, which is slightly less than random forests. GBM and LSTM have the same problem, although they have the highest test average recall.

To see whether a larger amount of data can help with the overfitting problem, the same tests are performed using 100,000 instances. Results are shown in Table 6.7. As can be seen, this does not seem to solve the overfitting problem for GBM and DFF. However, the LSTM network performs slightly better with the sequence data given as ITS. After reviewing the confusion matrix of the LSTM model's training, it was observed that the model is trained efficiently on the training data but does not perform well on the test data. This is due to the overfitting problem in deep learning techniques, despite the use of dropout to counter it. Although DFF is prone to overfitting, the deep LSTM model seems to have been trained more efficiently than all the other models: the drop in the test results is more tolerable than for DFF. Also, it performs better than all other algorithms


when trained with ITS features.

The system used for testing the algorithms runs Windows 10 Education on a Dell OptiPlex 9020 PC with 16 GB of RAM, an Intel Core i7-4790 CPU @ 3.6 GHz with 4 cores and 8 hardware threads, and a Samsung 850 Evo 500 GB solid-state disk. H2O 3.10.4 is used for RF, GBM, and DFF, and CNTK 2.0rc via the Python 3.5.2 interface is used for the RNN with LSTM cells. The running times of the different models were comparable; however, RF was faster than the other models.

6.7 Human Activity Forecasting

6.7.1 Forecasting Model

To test LM for forecasting, the Tim et al. [66] activity recognition database is used. However, instead of activity recognition using raw sensors, the 28-day list of activities of an individual is used to train a model to predict future activities based on past activities. We generated 258 total records from the 28 days of data by sliding the pivot point of the temporal data to create different past and future activity lists. The data was tested using LM mapping versus a fixed-size (32) sequence represented by MIS, using an encoder-decoder LSTM to predict a sequence of activities in the future. The network has a dense hidden layer of size 256, with two LSTM encoder and decoder layers of size 32 as the input and output of the network. The system was tested in the same environment as above.
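The record-generation step above can be sketched as follows. The helper name and the stride are assumptions (the thesis derives 258 records from 28 days but does not state the stride here); the sketch only shows how sliding a pivot splits timestamped activities into past and future lists.

```python
# Slide a pivot through a span of timestamped activities; each pivot
# position yields one (past, future) record pair.
def pivot_records(events, start, end, stride):
    """events: list of (time, activity) pairs sorted by time."""
    records = []
    t = start + stride
    while t < end:
        past = [e for e in events if e[0] < t]
        future = [e for e in events if e[0] >= t]
        records.append((past, future))
        t += stride
    return records

events = [(h, f"act{h}") for h in range(10)]
recs = pivot_records(events, start=0, end=10, stride=2)
assert len(recs) == 4
assert len(recs[0][0]) == 2 and len(recs[0][1]) == 8
```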


Table 6.8: Accuracy and loss for LM versus fixed-size period mappings for activity recognition.

Periods  Sequence Size  Train Accuracy  Loss  Test Accuracy
Fixed    32             99.5            0.01  92.5
LM       32             80              0.1   90.2

6.7.2 Results

The results are in Table 6.8. As can be seen, the model trained with LM had difficulty converging; however, both models achieved similar test results when trained for the same number of epochs.

6.7.3 Discussion

It appears that the network has some difficulties with the LM input, because the activity input is in fact seasonal: the activities repeat daily, and LM simply ignores this. This phenomenon is common in time-series analysis. In fact, most time-series forecasting techniques (such as ARIMA) first remove the following two components from the data:

1. Seasonality

2. Trend

Then forecasting is performed, and in the final stage the removed seasonality and linear trend are added back to the model. The removal and addition of seasonality is not trivial in this case, as ours is multivariate temporal data with missing values, and a linear trend or seasonality may be difficult



Figure 6.4: Training and validation set accuracy and loss function plots of activity prediction on the Tim et al. [66] activity dataset. (a) Results using the fixed-size window mapping and (b) results using the 32-element LM MIS with n=5 and k=32.


to remove. Thus, this can be investigated further in a future study to see whether or not removing seasonality improves the performance of LM. Another test could use a different loss function. Also, the values are not encoded using one-hot encoding, which may be another contributing issue.
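The remove-forecast-restore idea mentioned above can be illustrated on a toy univariate series. This is a simplified sketch (helper name is hypothetical): a periodic profile is estimated by averaging values at the same phase, subtracted before forecasting, and added back afterwards.

```python
# Estimate a periodic (e.g., daily) profile by averaging values at the
# same phase of the period, then subtract it from the series.
def seasonal_profile(series, period):
    return [sum(series[i::period]) / len(series[i::period])
            for i in range(period)]

series = [10, 20, 30, 11, 21, 31, 9, 19, 29]     # strong period-3 season
profile = seasonal_profile(series, period=3)
deseasonalized = [v - profile[i % 3] for i, v in enumerate(series)]
assert profile == [10.0, 20.0, 30.0]
assert max(abs(v) for v in deseasonalized) <= 1  # season largely removed
```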

6.8 Summary

In this chapter, we showed that LM can be used for mortality, fall, activity, and disease prediction and forecasting. Several experiments were conducted, and results at different stages of the research were presented as posters and papers in local and international conferences and journals. The full list of publications can be found at the end of this document (Vitae). Table 6.9 compares the proposed LM feature mapping with other techniques.

LM provides all the requirements for training sequential models efficiently. In our experiments, LM clearly captures the temporal relations at different time-points faster than the fixed-size mapping. This may be because recent data is repeated multiple times as recency is incorporated into LM. RTP [8] represents the data as temporal patterns. This technique, however, is not able to generate a fixed-length sequence from the extracted patterns, and as a recursive algorithm, it does not scale well. LM solves this problem by accepting sparse abstracted data as input and providing fixed-length output suitable for RNNs. The proposed technique also scales well; however, similar to time-series techniques, it cannot provide a human-readable pattern as expressive as RTP.

Table 6.9: Comparison summary among LM and other techniques.

Method/Feature               Scalable  Sparse  Fixed  RNNs  Expressiveness
Time-series                  Yes       No      No     Yes   Normal
RTP (Batal et al. 2016 [8])  No        Yes     No     No    High
LM (Proposed)                Yes       Yes     Yes    Yes   Normal

The proposed solution, LM, provides a concise sequence to represent the history or future using the novel MIS tensors. The LM algorithms and properties enable MIS tensors to train LSTM networks efficiently in order to predict anomalies and diagnoses from long historical records, even in the absence of some values.

The main reasons why LM can provide a solution for many different predic-

tive problems are as follows:

1. LM or fixed-sized intervals enable a machine learning model to receive

all the information available for training and testing

2. LM or fixed-sized intervals embed the notion of time in the sequence

order of the input data (fixed-sized intervals perform better for seasonal

data)

3. LM emphasizes recent data over historical data.

4. The solutions also allow the system to model and predict the future, making it a powerful multivariate forecasting engine for many temporal problems.

The next chapter of this research discusses deploying the proposed LM model in a health monitoring test-bed environment to provide patients and physicians with predictive health analytics support.


Chapter 7

Predictive Health Analytics

and Real-time Monitoring

Schema (PHARMS)

7.1 Introduction

To enable the various applications of LM, an architecture, a framework, and an implementation of both are necessary. For example, in the scenarios shown in Figures 1.3 and 1.4, a cloud-based software as a service (SaaS) implementation of the proposed solution is required. In this chapter, we describe the newly developed PHARMS architecture and the architecture of its two main engines: HEAL [84, 85, 86] and LM. Because the implementations of these models are technology-dependent, they are not considered a major contribution of this thesis and are included mainly as guidelines and proofs of concept.

Depending on the application, PHARMS can improve many solutions. In

this research, we are interested in considering it as one of the following two:

1. Diagnosis decision support system (DDSS)

A diagnosis decision support system (DDSS) is a clinical decision sup-

port system (CDSS) used by a clinician to get help with the diagnosis.

In general, CDSS is a system that provides physicians and health pro-

fessionals with clinical decision support. The decision support never

replaces a physician's diagnosis and decision; it simply provides hints

and reference material for further investigation by the physician or

user. CDSSs are classified as either knowledge-based or non-knowledge-based systems. Knowledge-based systems use a set of IF-THEN rules in a database, generally derived from medical textbooks, whereas non-knowledge-based systems extract patterns from past medical records using machine learning techniques. DxPlain [87, 88] is an example of a knowledge-based DDSS. Research on non-knowledge-based systems is still in progress, as many aspects of medical data mining are still being studied (e.g., medical free-text processing tools like Apache cTAKES [89]).

The proposed PHARMS is a model for a non-knowledge-based DDSS .

2. A prediction system


As shown in Figures 1.3 and 1.4, PHARMS can be similar in its foundations to ambient assisted living (AAL) technologies that monitor and assist people in their

everyday tasks. Many projects, especially in Europe, are focused on

improving AAL systems [90, 42, 91] so that the patients can be mon-

itored at home rather than in a hospital. Preventative goals are also

easier to achieve in such environments, e.g., making sure the patient

is taking his/her medicines as per the prescription. Here, PHARMS

focuses on anomaly prediction based on automated monitoring.

7.2 Schema

The proposed architecture for PHARMS, its engines, and its interactions with other services and users are shown in Figure 7.1. The two main engines

required for this system are HEAL and LM . HEAL defines the processes

involved in the interactions between PHARMS and the environment; whereas

LM defines the mapping between historical inputs and predictive outputs.

PHARMS interacts with physicians and end-users via the HEAL Insights layer, and the input streams of data are received and processed by the HEAL Acquisition layer. As a whole, the proposed system provides all the components required, along with the novel data modeling technique provided by the LM engine.

Systems implementing PHARMS are required to provide the following:

1. Clearly-defined models for history and future of each task


2. Services to retrieve/send real-time data from/to users’ edge devices

3. Predictive insights for end-users in real-time

4. Trained machine learning models to enable requirement 3.

7.3 Health Event Aggregation Lab (HEAL)

HEAL is a four-layer architecture which enables PHARMS to interact with

different distributed services in order to retrieve, process, and analyze data.

Furthermore, it provides service endpoints for both sensors and monitoring

systems to predict health anomalies accurately and quickly in real-time. The

HEAL Core is the module responsible for inter-layer communication and the

interaction between layers and other components.

Different layers of the HEAL architecture, shown in Figure 7.2, are as follows:

1. Analytics:

This layer is responsible for most of the core intelligence of the HEAL

framework. Any detection, prediction, regression, or data analysis will

occur in this layer. The results will add more knowledge to the cur-

rent information, preparing required data for decision support in the

Insights layer.

2. Insights


[Figure 7.1 depicts the cloud-hosted PHARMS stack: a real-time stream hub (Azure Event Hub, Apache Storm), stream analytics (Azure Stream Analytics, Apache Spark), event storage (SQL/NoSQL/Blob), a machine learning/prediction engine (Azure Machine Learning, PredictionIO, Microsoft R Server, Apache Mahout, Google Prediction, BigML, IBM Watson), the Life Model (LM) engine with its three tiers (Tier 1 Processing: MSS, raw data; Tier 2 Mapping: ITS history and ITS future; Tier 3 Training: train LSTM, deploy model), and the HEAL layers (Acquisition, Transformation, Analytics, Insights) around the HEAL Core, serving monitoring systems, context providers, analysts, and system users.]

Figure 7.1: PHARMS, HEAL, and the 3-tier LM engine architectures. Boxes shown in orange are the SaaS and PaaS components that can be used to implement PHARMS as a SaaS receiving input from IoT edge devices and providing insights and feedback to users.

Figure 7.2: HEAL Architecture

The top layer of HEAL is responsible for extracting meaningful information using the data provided by the Analytics layer. The Insights layer then decides how that information should be leveraged, showing alerts

to assist doctors or performing actions such as calling the emergency

center.

3. Transformation

The Transformation layer is responsible for stream processing, data summarization, event handling, and saving the processed events into the database or forwarding them to the Analytics layer. This layer can utilize any techniques and algorithms, such as complex event processing (CEP), feature extraction, and dimensionality reduction.

4. Acquisition


The Acquisition layer is responsible for acquiring data from the input devices, sensors, and third parties. It handles connections, protocols, devices, and streams. It can save the data directly to the database or forward it to the Transformation layer. The input is

mostly from monitoring systems.
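The flow through the four layers can be sketched as a toy pipeline; the class and method names and the threshold rule are illustrative assumptions, not the actual HEAL Core API:

```python
class HealCore:
    """Minimal sketch of HEAL's layered message flow (illustrative only)."""

    def __init__(self):
        self.storage = []

    # Acquisition: receive raw readings from devices and sensors
    def acquire(self, reading):
        self.storage.append(reading)          # persist the raw data
        return self.transform(reading)

    # Transformation: summarize/clean the event stream
    def transform(self, reading):
        event = {"sensor": reading["sensor"], "value": float(reading["value"])}
        return self.analyze(event)

    # Analytics: detect or predict anomalies
    def analyze(self, event):
        event["anomaly"] = event["value"] > 120  # toy threshold rule
        return self.insight(event)

    # Insights: decide how the analysis should be acted upon
    def insight(self, event):
        return "ALERT physician" if event["anomaly"] else "OK"

core = HealCore()
print(core.acquire({"sensor": "systolic_bp", "value": 135}))  # ALERT physician
print(core.acquire({"sensor": "systolic_bp", "value": 110}))  # OK
```

In a real deployment each layer would be a distributed service rather than a method call, but the direction of the flow (Acquisition → Transformation → Analytics → Insights, with the core coordinating and storing) is the same.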

Monitoring system The monitoring system consists of various sensing devices that collect raw data and send it to the upper layer. Sensors can include EEG, ECG, electromyography (EMG), accelerometer, fall detector, magnetometer, gyroscope, motion sensor, blood pressure device, blood sugar sensor, etc. These sensors together form a body sensor network (BSN) [92]. Each sensor operates on low power and can transmit its data wirelessly to an upper layer in the cloud. The setup of the sensing devices varies per individual.

Sensors can easily be added or removed from the system without affecting

the overall performance.

To summarize, HEAL can be used as the gateway to enable real-time anomaly

prediction for many users (Fig. 7.3). An implementation of this concept requires two components in the Analytics layer: aggregators and predictors.

7.3.1 Aggregators

Aggregators are the bridge between the real time streams of data from the

monitoring systems or high-level streams of data from other parts of the system, including context providers and predictors. As shown in Fig. 7.4, these components retrieve the data stream and apply event processing language statements provided by the system user to create a different abstraction of the data, making it cleaner, more readable, or better prepared for aggregation. At this level, many different formats and data rates are handled. The interpolator component interpolates missing data to increase the data rate so that the data stream can easily be aggregated with other data streams. In the final step, the user has another opportunity to define more specific data aggregation statements for the final output of the component.

Figure 7.3: An overview of the HEAL framework.
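The interpolator step can be illustrated with a minimal sketch that upsamples a low-rate stream onto a shared time grid so it can be fused with faster streams; the sensors, rates, and values are hypothetical:

```python
import numpy as np

def upsample(times, values, grid):
    """Linearly interpolate a low-rate stream onto a shared time grid
    so it can be aggregated with higher-rate streams."""
    return np.interp(grid, times, values)

# heart rate sampled every 5 s, temperature only every 15 s
grid = np.arange(0, 30, 5)                               # common 5 s grid
hr = upsample([0, 5, 10, 15, 20, 25], [70, 72, 71, 75, 74, 73], grid)
temp = upsample([0, 15], [36.5, 36.8], grid)             # 2 samples -> 6 points
fused = np.column_stack([hr, temp])                      # aligned for aggregation
print(fused.shape)  # (6, 2)
```

Linear interpolation is only one choice; the aggregator could equally plug in hold-last-value or model-based imputation, which is why the interpolator is a separate, replaceable component.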

Complex Event Processing (CEP) In this layer of the system, all the incoming real-time high-level signals are passed through a complex event processing engine such as Esper or NEsper¹ to detect anomalies in the high-level data.

Figure 7.4: Proposed aggregator model for HEAL.

¹Esper's home page and documentation are at http://www.espertech.com/esper. Esper and NEsper are open-source software available under the GNU General Public License version 2 (GPL v2).

7.3.2 Predictors

The predictor is another novel component proposed for this model (Fig. 7.5). In these distributed cloud-based components, data from a specific duration of time or a sequence is provided to the predictor as input. The predictor stores the data in its data warehouse (which is managed by the predictor itself) and then, using the prediction engine specified for its purpose, creates a prediction model to interpolate or extrapolate the data. The system can then query the predictor for future data, prediction errors, or possible trends. Having separate distributed predictors helps third parties and system analysts share different prediction engines and maintain dedicated data warehouses for their data. Powerful current prediction engines include the Google Cloud Machine Learning Engine² and PredictionIO³. Having a predictor instead of a hard-coded machine learning model improves interoperability.

²https://cloud.google.com/ml-engine/
³https://predictionio.apache.org/

Figure 7.5: Proposed predictor model for the HEAL platform.

Cloud-based historical warehouse All the events, data, and information about the anomalies are saved in the data warehouse for future use. This data is necessary for predictors to forecast future trends and anomalies and for setting the thresholds for the various vital signs of a person.

High-level query services These services provide access endpoints for the analytic systems via REST⁴ and SPARQL⁵. Such web services provide the interface for other components and systems to request data.

⁴https://en.wikipedia.org/wiki/Representational_state_transfer
⁵https://www.w3.org/TR/rdf-sparql-query/


7.4 Case Studies

7.4.1 Remote Dialysis

This study uses the HEAL architecture and a robust algorithm to help organize people waiting for dialysis. The architecture for the automated remote dialysis prediction (Figure 1.4) is shown in Figure 7.7.⁶ The goal of this

study is to determine the feasibility of an automated remote patient self-

assessment tool to prevent unnecessary trips or late dialysis.

With the self-assessment tools becoming available to patients, the proposed

cloud framework (HEAL) could retrieve data, analyze and predict the date a

patient requires dialysis, reducing costs, unnecessary trips, and renal failure.

A self-assessment device capable of recording all the required samples was presented for the Qualcomm Tricorder X Prize.⁷ The HEAL framework, combined with the self-assessment device, could be used to determine when a dialysis patient needs to visit the hospital for dialysis. Using PHARMS, a patient with a renal disorder will be able to receive proper insight on when to visit the hospital for timely dialysis.

⁶The proposed method was presented as a poster at the New Brunswick Health Research Conference (NBHRF) 2016, NB, Canada.

⁷The Qualcomm Tricorder X Prize was a $10 million global competition to stimulate innovation and integration of precision diagnostic technologies, helping consumers make their own reliable health diagnoses anywhere, anytime. The winners of the 5-year competition were announced in Q1 2017.


Figure 7.6: HEAL core framework, an implementation of the HEAL architecture. (Right) The structure of the implemented folders of the framework matches the architecture. (Left) HEAL user interface used to process the dialysis study data.

Implementation

The HEAL framework (Figure 7.6) is an implementation of the HEAL architecture in C#, published as open-source software on GitHub [86]. For most

components, Microsoft Azure services can be used (e.g., IoT Hubs, Stream

Analytics, Machine Learning, Storage, and Cognitive Services).

The subsystems of the proposed framework are implemented as prototypes

and tested with experimental data on a cloud platform. Because of the variety of state-of-the-art services in the Microsoft Azure cloud and its support for both open-source and Microsoft technologies, we chose Microsoft Azure for the cloud implementation of HEAL. Also, Microsoft Azure Machine Learning is one of the leading prediction service providers and supports parameterized solutions and the R language through Microsoft R Server. The system is tested with a running application on a Raspberry Pi 2, sending real-time signals to the Microsoft Azure Event Hub every 100 ms. Event Hub is a real-time event ingestor service that provides event and telemetry ingress to the

cloud at massive scales (millions of events per second), low latency, and high

availability [93]. Each event hub partition can handle 1 MB ingress and 2 MB

egress per second. Using default 16 partitions, our instance of Event Hub

can handle 16,384 messages of size 1 KB per second. The events are then

consumed by an instance of Azure Stream Analytics, which is a real-time

stream computation service providing scalable CEP . It also helps developers

to integrate real-time streams of data with historic records. Combined with

Event Hubs, Stream Analytics is capable of handling high event throughput

of up to 1 GB/second [94]. The real-time system test indicated immediate transfer of information from the Raspberry Pi 2 to Stream Analytics. The final analysis results and detected anomalies are then pushed to a JavaScript web client using SignalR in about one second.
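The per-second message figure above follows from a quick back-of-the-envelope calculation over the stated partition limits:

```python
# Back-of-the-envelope check of the Event Hub ingress figures above.
partitions = 16            # default partition count used in this setup
ingress_mb_per_s = 1       # per-partition ingress limit (MB/s)
msg_size_kb = 1            # message size used in the test

messages_per_s = partitions * ingress_mb_per_s * 1024 // msg_size_kb
print(messages_per_s)  # 16384
```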

Due to the novelty of the scenario and the required self-assessment devices, the proposed method is tested using simulated data on 120,000 samples described

below. Each patient self-assessment sample contains 11 different parameters

including creatinine, international normalized ratio (INR), blood pressure,

and kidney failure history; each normalized to [0, 1]. Dialysis patients are

then classified into 3 primary groups using the k-nearest-neighbors algorithm:

1. Past due (+24 hours past dialysis)


Figure 7.7: Four stages of the remote dialysis assessment study using HEALframework.

2. Requiring dialysis now (± 24 hours)

3. Requiring dialysis later (in +24 hours)
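A minimal sketch of the k-nearest-neighbors triage described above, using synthetic stand-ins for the 11 normalized self-assessment parameters; the labeling rule and function name are toy assumptions for illustration:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Plain k-nearest-neighbors majority vote (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()

rng = np.random.default_rng(0)
# synthetic stand-in for the 11 normalized parameters
# (creatinine, INR, blood pressure, kidney-failure history, ...)
X = rng.random((300, 11))
# toy labels: 0 = past due, 1 = dialysis now, 2 = dialysis later,
# driven here by the first feature as a stand-in for severity
y = np.digitize(X[:, 0], [0.33, 0.66])

x_new = rng.random(11)          # one incoming self-assessment
group = knn_predict(X, y, x_new)
print(group)  # one of 0, 1, 2
```

In the actual study the training samples are the simulated patient inquiries, and the three classes map onto the triage groups listed above.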

For this study, 120,000 overlapping noisy data samples from 1000 patients (120 inquiries each) are simulated for baseline data evaluation. Validation using 10-fold cross-validation shows an overall 95.3% accuracy (average recall) with only a 1.3% false-negative (FN) rate.

of 32 milliseconds including round-trip time to/from Microsoft Azure cloud

servers.

Further study using real patient data and physician supervision is the next

step in this study.

7.4.2 Mortality Prediction API

The mortality prediction model is deployed to the Microsoft Azure cloud using the PHARMS schema to predict mortality based on admission diagnosis and procedure codes. The web server is set up using Flask, and the model is trained


using 34,000 patients in Keras with a TensorFlow backend. The API can be accessed at: https://pharms.azurewebsites.net/api/v0.1/predict_mortality
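A minimal Flask sketch of such an endpoint is shown below; the route matches the deployed API, while the payload format, stub model, and function names are assumptions for illustration:

```python
# Minimal sketch of a PHARMS-style mortality-prediction endpoint.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_mortality(codes):
    # placeholder for the Keras/TensorFlow model trained on the
    # 34,000 MIMIC-III patients; here a stub returns a dummy value
    return 0.5

@app.route("/api/v0.1/predict_mortality", methods=["POST"])
def mortality():
    payload = request.get_json(force=True)
    codes = payload.get("codes", [])  # admission diagnosis/procedure codes
    return jsonify({"mortality_probability": predict_mortality(codes)})

# run locally with: flask --app <this_module> run
```

A real deployment would load the trained model once at startup and map the incoming codes into the LM representation before calling it.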

7.4.3 Fall Forecasting Mobile App

The proposed architecture and patterns are used by Foumani [95] in his B.Sc. Honours thesis to develop a fall forecasting mobile app on the iOS platform. The prediction model is hosted on the cloud as AI as a service (AIaaS) while the phone displays the probability of falling up to 200 ms in advance. The next step is for the mobile application to be ported to an Apple Watch and for the model to forecast falls as far in the future as possible.

7.5 Summary

PHARMS enables various applications of LM via an architecture, a frame-

work, and an implementation of those. With artificial intelligence as a service

(AIaaS) just around the corner, PHARMS can be used to facilitate sequence-to-sequence temporal problems.


Chapter 8

Conclusion and Future Work

In real-time IoT predictive analytics, modeling a lifetime of an individual’s

medical history in a short, concise sequence is a challenge. The model should be robust and preserve the concept of time for a variety of examples despite missing values, especially in an IoT system, in which real-time prediction depends on both recent data and historical records. The proposed LM

opens the door to many predictive analytics areas, particularly in health-

care, by addressing the challenge of mapping long-term periods to concise

representations.

The proposed solution, LM , provides a concise sequence to represent the

history or future, using the novel ITS and MIS tensors. LM algorithms and

properties enable ITS/MIS tensors to train LSTM networks efficiently in

order to predict anomalies and diagnosis from long historical records, even

in the absence of some values.


LM provides all the requirements for training sequential models efficiently.

In the experiments, LM clearly captures the temporal relations at different time-points faster than the fixed-sized mapping. This may be because recent data is repeated multiple times as recency is incorporated into LM.

When a model is redeployed, or when large-scale IoT data and millions of users are considered, a model that converges faster has an edge over similar models with late convergence. This is shown especially as the number of patients increases.

LM is used to predict and forecast mortality for up to 34,000 patients from the MIMIC-III dataset based on their diagnosis and procedure codes. The results show improvement in the model trained on LM-mapped data compared to fixed-sized intervals. Human fall forecasting is also accomplished for the first time in this thesis. Furthermore, the new LM-powered PHARMS enables the design and implementation of predictive health analytics systems.

PHARMS uses deep learning for real-time minimally-invasive intelligent ac-

tivity monitoring and predictive analysis in a medical IoT scheme. The

models, algorithms, techniques, and the architectures proposed here are the

main contributions of this research.

A future step would be to make temporal sequence forecasting methods explainable, so that a physician and a model can work synergistically to effectively enhance healthcare. It is becoming more important to make decisions transparent, understandable, and explainable in health systems, due to rising legal

and privacy aspects [96].

The next steps in this research include deploying the proposed method in a test-bed environment to provide patients and physicians with predictive health analytics feedback. Furthermore, diagnosis and fall forecasting for

vulnerable individuals are the next scenarios to be considered in LM appli-

cations.


Bibliography

[1] US Government, “Healthcare Budget US 2017.” [Online]. Available:

https://www.cbo.gov/topics/health-care

[2] M. U. Majeed, D. T. Williams, R. Pollock, F. Amir, M. Liam,

K. S. Foong, and C. J. Whitaker, “Delay in discharge and its

impact on unnecessary hospital bed occupancy,” BMC Health Services

Research, vol. 12, no. 1, p. 410, nov 2012. [Online]. Available:

https://doi.org/10.1186/1472-6963-12-410

[3] D. I. McIsaac, K. Abdulla, H. Yang, S. Sundaresan, P. Doering, S. G.

Vaswani, K. Thavorn, and A. J. Forster, “Association of delay of urgent

or emergency surgery with mortality and use of health care resources: a

propensity scorematched observational cohort study,” Canadian Medical

Association Journal, vol. 189, no. 27, pp. E905–E912, jul 2017. [Online].

Available: http://www.cmaj.ca/content/189/27/E905.abstract

[4] KenSci, “KenSci: Predictive Risk Management Platform for Healthcare

Powered by Machine Learning.” [Online]. Available: http://kensci.com/


[5] A. E. W. Johnson and A. A. Kramer, “Mortality prediction and acuity

assessment in critical care,” Ph.D. dissertation, University of Oxford,

2014.

[6] J. E. Zimmerman, A. A. Kramer, D. S. Mcnair, and F. M. Malila,

“Acute Physiology and Chronic Health Evaluation (APACHE) IV: Hos-

pital mortality assessment for today’s critically ill patients*,” vol. 34,

no. 5, pp. 1297–1310, 2006.

[7] R. Miotto, L. Li, B. A. Kidd, and J. T. Dudley, “Deep

Patient: An Unsupervised Representation to Predict the Future

of Patients from the Electronic Health Records,” Scientific

Reports, vol. 6, p. 26094, May 2016. [Online]. Available: http://dx.doi.org/10.1038/srep26094

[8] I. Batal, G. F. Cooper, D. Fradkin, J. Harrison, F. Moerchen,

and M. Hauskrecht, “An efficient pattern mining approach for event

detection in multivariate temporal data,” Knowledge and Information

Systems, vol. 46, no. 1, pp. 115–150, 2016. [Online]. Available:

http://people.cs.pitt.edu/~milos/anomaly/

[9] Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzell, “Learning to

Diagnose with LSTM Recurrent Neural Networks,” Iclr, pp. 1–18,

2016. [Online]. Available: http://arxiv.org/abs/1511.03677


[10] T. L. M. V. Kasteren, G. Englebienne, and B. J. A. Krose,

“Human Activity Recognition from Wireless Sensor Network Data

: Benchmark and Software,” Activity Recognition in Pervasive

Intelligent Environments, vol. 4, pp. 165–186, 2011. [Online]. Available: http://dx.doi.org/10.2991/978-94-91216-05-3_8

[11] A. E. W. Johnson, T. J. Pollard, L. Shen, L.-w. H. Lehman,

M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. Anthony Celi, and

R. G. Mark, “MIMIC-III, a freely accessible critical care database,”

Scientific Data, vol. 3, p. 160035, May 2016. [Online]. Available: http://dx.doi.org/10.1038/sdata.2016.35

[12] H. Soleimani, W. Nicola, C. Clopath, and E. M. Drakakis, “A

High GOPs/Slice Time Series Classifier for Portable and Embedded

Biomedical Applications,” arXiv preprint arXiv:1802.10458, 2018., feb

2018. [Online]. Available: http://arxiv.org/abs/1802.10458

[13] A. Rahim, M. Forkan, I. Khalil, and M. Atiquzzaman, “ViSiBiD

: A learning model for early discovery and real-time prediction

of severe clinical events using vital signs as big data,” Computer

Networks, vol. 113, pp. 244–257, 2017. [Online]. Available: http:

//dx.doi.org/10.1016/j.comnet.2016.12.019

[14] A. R. M. Forkan, I. Khalil, Z. Tari, S. Foufou, and A. Bouras, “A

context-aware approach for long-term behavioural change detection


and abnormality prediction in ambient assisted living,” Pattern

Recognition, vol. 48, no. 3, pp. 628–641, 2014. [Online]. Available:

http://dx.doi.org/10.1016/j.patcog.2014.07.007

[15] Y. Shahar, “A framework for knowledge-based temporal abstraction,”

Artificial Intelligence, vol. 90, no. 1-2, pp. 79–133, 1997.

[16] A. Karpathy, “The Unreasonable Effectiveness of Recurrent Neural

Networks.” [Online]. Available: http://karpathy.github.io/2015/05/21/

rnn-effectiveness/

[17] X. Xi, “Further applications of higher-order Markov chains and

developments in regime-switching models,” Ph.D. dissertation, The

University of Western Ontario, 2012. [Online]. Available: http:

//ir.lib.uwo.ca/etd/678/

[18] H. T. Siegelmann and E. D. Sontag, “Turing computability with neural

nets,” Applied Mathematics Letters, vol. 4, no. 6, pp. 77–80, 1991.

[19] K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and

J. Schmidhuber, “LSTM: A Search Space Odyssey,” 2016.

[20] H. Siegelmann and E. Sontag, “On the computational power

of neural nets,” Comput. Complexity, vol. 117, pp. 285–306,

1992. [Online]. Available: http://binds.cs.umass.edu/papers/1992

Siegelmann COLT.pdf


[21] Z. C. Lipton, J. Berkowitz, and C. Elkan, “A Critical Review of

Recurrent Neural Networks for Sequence Learning,” pp. 1–38, 2015.

[Online]. Available: http://arxiv.org/abs/1506.00019

[22] I. Goodfellow, Y. Bengio, and A. Courville, “Deep Learning,” 2016.

[Online]. Available: http://www.deeplearningbook.org/

[23] S. Hochreiter, “The Vanishing Gradient Problem During Learning

Recurrent Neural Nets and Problem Solutions,” International

Journal of Uncertainty, Fuzziness and Knowledge-Based Systems,

vol. 06, no. 02, pp. 107–116, 1998. [Online]. Available: http:

//www.worldscientific.com/doi/abs/10.1142/S0218488598000094

[24] C. Olah, “Understanding LSTMs,” 2015. [Online]. Available: http:

//colah.github.io/posts/2015-08-Understanding-LSTMs/

[25] B. Khaleghi, A. Khamis, F. O. Karray, and S. N. Razavi,

“Multisensor data fusion: A review of the state-of-the-art,” Information

Fusion, vol. 14, no. 1, pp. 28–44, 2013. [Online]. Available:

http://dx.doi.org/10.1016/j.inffus.2011.08.001

[26] United Nations Population Division, “The world population situation

in 1970,” New York, pp. vi, 78, 1971.

[27] E. Aarts, R. Harwig, and M. Schuurmans, “Ambient Intelligence,” in The Invisible Future, pp. 235–240, 2001.


[28] E. Aarts, “Ambient Intelligence: A Multimedia Perspective,” pp. 12–14,

2004.

[29] I. Qudah, P. Leijdekkers, and V. Gay, “Using mobile phones to improve

medication compliance and awareness for cardiac patients,” Proceedings

of the 3rd International Conference on PErvasive Technologies Related

to Assistive Environments - PETRA ’10, p. 1, 2010. [Online]. Available:

http://portal.acm.org/citation.cfm?doid=1839294.1839337

[30] K. a. Siek, D. U. Khan, S. E. Ross, L. M. Haverhals, J. Meyers, and

S. R. Cali, “Designing a personal health application for older adults

to manage medications: A comprehensive case study,” in Journal of

Medical Systems, vol. 35, no. 5, 2011, pp. 1099–1121.

[31] F. Sufi, I. Khalil, and Z. Tari, “A cardiod based technique to identify

Cardiovascular Diseases using mobile phones and body sensors,” in 2010

Annual International Conference of the IEEE Engineering in Medicine

and Biology Society, EMBC’10, 2010, pp. 5500–5503.

[32] P. Remagnino and G. L. Foresti, “Ambient intelligence: A new multi-

disciplinary paradigm,” IEEE Transactions on Systems, Man, and Cy-

bernetics Part A:Systems and Humans., vol. 35, no. 1, pp. 1–6, 2005.

[33] J. Cubo, A. Nieto, and E. Pimentel, “A cloud-based internet of things

platform for ambient assisted living,” Sensors (Switzerland), vol. 14,

no. 8, pp. 14 070–14 105, 2014.


[34] A. Forkan, I. Khalil, and Z. Tari, “CoCaMAAL: A cloud-

oriented context-aware middleware in ambient assisted living,” Future

Generation Computer Systems, vol. 35, pp. 114–127, 2014. [Online].

Available: http://dx.doi.org/10.1016/j.future.2013.07.009

[35] A. Copetti, J. C. B. Leite, O. Loques, and M. F. Neves, “A decision-

making mechanism for context inference in pervasive healthcare

environments,” Decision Support Systems, vol. 55, no. 2, pp. 528–537,

2013. [Online]. Available: http://dx.doi.org/10.1016/j.dss.2012.10.010

[36] K. Wongpatikaseree, A. O. Lim, M. Ikeda, and Y. Tan, “High Performance Activity Recognition Framework for Ambient Assisted Living in the Home Network Environment,” IEICE Transactions on Communications, vol. E97.B, no. 9, pp. 1766–1778, Sep. 2014.

[37] Y. Xu, P. Wolf, N. Stojanovic, and H.-J. Happel, “Semantic-

based Complex Event Processing in the AAL Domain Semantic-

based Event Processing in AAL,” 9th International Semantic

Web Conference (ISWC2010), 2010. [Online]. Available: http:

//data.semanticweb.org/conference/iswc/2010/paper/463


[38] A. Zafeiropoulos, N. Konstantinou, S. Arkoulis, D. E. Spanos, and

N. Mitrou, “A semantic-based architecture for sensor data fusion,” Pro-

ceedings - The 2nd International Conference on Mobile Ubiquitous Com-

puting, Systems, Services and Technologies, UBICOMM 2008, pp. 116–

121, 2008.

[39] Microsoft, “Microsoft Health.”

[40] IBM Inc., “IBM Watson Healthcare.” [Online]. Available: http:

//www.ibm.com/smarterplanet/us/en/ibmwatson/health/

[41] Northern Communications Services, “CareLink.” [Online]. Available:

https://carelinkadvantage.ca/

[42] P. Wolf, A. Schmidt, J. P. Otte, M. Klein, S. Rollwage, B. Konig-Ries,

T. Dettborn, and A. Gabdulkhakova, “openAAL - The Open Source

Middleware for Ambient Assisted Living (AAL),” AALIANCE confer-

ence, no. March, pp. 1–5, 2010.

[43] S. Hanke, C. Mayer, O. Hoeftberger, H. Boos, R. Wichert, M.-R. Tazari,

P. Wolf, and F. Furfari, “universAAL An Open and Consolidated AAL

Platform,” R. Wichert and B. Eberhardt, Eds. Berlin, Heidelberg:

Springer Berlin Heidelberg, 2011, ch. universAAL, pp. 127–140.

[Online]. Available: http://dx.doi.org/10.1007/978-3-642-18167-2 10

[44] M. R. Tazari, “ReAAL,” 2013. [Online]. Available: http://www.

cip-reaal.eu/home/


[45] Microsoft Corporation, “Microsoft Azure Machine Learning.”

[Online]. Available: https://azure.microsoft.com/en-us/services/

machine-learning/

[46] Apache, “Apache Spark MLlib.” [Online]. Available: http://spark.

apache.org/mllib/

[47] Google Inc., “Google Prediction API.” [Online]. Available: https:

//cloud.google.com/prediction/

[48] D. Talia, P. Trunfio, and F. Marozzo, Data Analysis in the Cloud.

Elsevier, 2016. [Online]. Available: http://www.sciencedirect.com/

science/article/pii/B9780128028810000068

[49] Microsoft Corporation, “Microsoft IoT Demo.” [Online]. Available:

http://www.microsoftazureiotsuite.com/demos/remotemonitoring

[50] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to Sequence

Learning with Neural Networks,” pp. 1–9, 2014. [Online]. Available:

http://arxiv.org/abs/1409.3215

[51] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares,

H. Schwenk, and Y. Bengio, “Learning Phrase Representations using

RNN Encoder-Decoder for Statistical Machine Translation,” 2014.

[Online]. Available: http://arxiv.org/abs/1406.1078

[52] Z. C. Lipton, D. C. Kale, and R. Wetzel, “Directly Modeling Missing

Data in Sequences with RNNs: Improved Classification of Clinical


Time Series,” Machine Learning for Healthcare, no. 2016, pp. 1–17,

2016. [Online]. Available: http://arxiv.org/abs/1606.04130

[53] T. Theodoridis, V. Solachidis, N. Vretos, and P. Daras, “Human fall

detection from acceleration measurements using a recurrent neural net-

work,” in IFMBE Proceedings, 2018, vol. 66, pp. 145–149.

[54] C. Mayer, M. Bachler, A. Holzinger, P. K. Stein, and S. Wassertheurer,

“The effect of threshold values and weighting factors on the association

between entropy measures and mortality after myocardial infarction in

the Cardiac Arrhythmia suppression trial (CAST),” Entropy, vol. 18,

no. 4, 2016.

[55] D. Singh, E. Merdivan, I. Psychoula, J. Kropf, S. Hanke, M. Geist, and

A. Holzinger, “Human Activity Recognition using Recurrent Neural

Networks,” pp. 1–8, 2018. [Online]. Available: http://arxiv.org/abs/1804.07144

[56] R. S. H. Istepanian, S. Hu, N. Y. Philip, and A. Sungoor, “The potential of Internet of m-health Things m-IoT for non-invasive glucose level sensing,” in 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, aug 2011, pp. 5264–5266. [Online]. Available: http://ieeexplore.ieee.org/document/6091302/

[57] L. Catarinucci, D. de Donno, L. Mainetti, L. Palano, L. Patrono, M. L. Stefanizzi, and L. Tarricone, “An IoT-Aware Architecture for Smart Healthcare Systems,” IEEE Internet of Things Journal, vol. 2, no. 6, pp. 515–526, dec 2015. [Online]. Available: http://ieeexplore.ieee.org/document/7070665/

[58] S. Amendola, R. Lodato, S. Manzari, C. Occhiuzzi, and G. Marrocco, “RFID Technology for IoT-Based Personal Healthcare in Smart Spaces,” IEEE Internet of Things Journal, vol. 1, no. 2, pp. 144–152, apr 2014. [Online]. Available: http://ieeexplore.ieee.org/document/6780609/

[59] S. M. R. Islam, D. Kwak, M. H. Kabir, M. Hossain, and K.-S. Kwak, “The Internet of Things for Health Care: A Comprehensive Survey,” IEEE Access, vol. 3, pp. 678–708, 2015. [Online]. Available: http://ieeexplore.ieee.org/document/7113786/

[60] NHS, “NHS Data,” 2018. [Online]. Available: digital.nhs.uk/data-services/hospital-episode-statistics/data-dictionary

[61] P. Lyons and J. Verne, “Hospital admissions in the last year of life and death in hospital,” 2016. [Online]. Available: slideplayer.com/slide/10216536/

[62] A. Forkan, I. Khalil, A. Ibaida, and Z. Tari, “BDCaM: Big Data for Context-aware Monitoring - A Personalized Knowledge Discovery Framework for Assisted Healthcare,” IEEE Transactions on Cloud Computing, vol. PP, no. 99, pp. 1–1, 2015. [Online]. Available: http://ieeexplore.ieee.org/document/7117389/

[63] E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, “Doctor AI: Predicting Clinical Events via Recurrent Neural Networks,” JMLR Workshop and Conference Proceedings, vol. 56, pp. 301–318, aug 2016. [Online]. Available: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5341604/

[64] T. Pham, T. Tran, D. Phung, and S. Venkatesh, “DeepCare: A deep dynamic memory model for predictive medicine,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9652 LNAI, pp. 30–41, 2016.

[65] F. Li, M. Li, P. Guan, S. Ma, and L. Cui, “Mapping publication trends and identifying hot spots of research on Internet health information seeking behavior: a quantitative and co-word biclustering analysis,” Journal of Medical Internet Research, vol. 17, no. 3, p. e81, mar 2015. [Online]. Available: http://www.jmir.org/2015/3/e81/

[66] T. L. M. van Kasteren, G. Englebienne, and B. J. A. Kröse, “Transferring knowledge of activity recognition across sensor networks,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6030 LNCS, pp. 283–300, 2010.

[67] A. Avati, K. Jung, S. Harman, L. Downing, A. Ng, and N. H. Shah, “Improving Palliative Care with Deep Learning,” in IEEE International Conference on Bioinformatics and Biomedicine 2017, nov 2017. [Online]. Available: http://arxiv.org/abs/1711.06402

[68] K. Bache and M. Lichman, “UCI Machine Learning Repository,” 2013. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html

[69] T. van Kasteren, “Activity Recognition for Health Monitoring Elderly using Temporal Probabilistic Models,” Ph.D. dissertation, University of Amsterdam, 2011.

[70] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.

[71] M. Kepski and B. Kwolek, “Fall Detection on Embedded Platform Using Kinect and Wireless Accelerometer,” in Miesenberger K., Karshmer A., Penaz P., Zagler W. (eds) Computers Helping People with Special Needs, 2012, pp. 407–414. [Online]. Available: https://doi.org/10.1007/978-3-642-31534-3_60

[72] B. Kwolek and M. Kepski, “Human fall detection on embedded platform using depth maps and wireless accelerometer,” Computer Methods and Programs in Biomedicine, vol. 117, no. 3, pp. 489–501, 2014.

[73] S. Patel, H. Park, P. Bonato, L. Chan, and M. Rodgers, “A review of wearable sensors and systems with application in rehabilitation,” Journal of NeuroEngineering and Rehabilitation, vol. 9, no. 1, p. 21, 2012.

[74] S. Mazilu, U. Blanke, M. Hardegger, G. Tröster, E. Gazit, and J. M. Hausdorff, “GaitAssist: a daily-life support and training system for Parkinson’s disease patients with freezing of gait,” in Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2014, pp. 2531–2540.

[75] A. Bulling, U. Blanke, and B. Schiele, “A tutorial on human activity recognition using body-worn inertial sensors,” ACM Computing Surveys (CSUR), vol. 46, no. 3, p. 33, 2014.

[76] O. D. Lara and M. A. Labrador, “A Survey on Human Activity Recognition using Wearable Sensors,” IEEE Communications Surveys and Tutorials, vol. 15, no. 3, pp. 1192–1209, 2013.

[77] Y. Jia, “Diatetic and exercise therapy against diabetes mellitus,” in Second International Conference on Intelligent Networks and Intelligent Systems. IEEE, 2009, pp. 693–696.

[78] J. Yin, Q. Yang, and J. J. Pan, “Sensor-based abnormal human-activity detection,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 8, pp. 1082–1090, 2008.

[79] D. Ravì, C. Wong, B. Lo, and G.-Z. Yang, “A deep learning approach to on-node sensor data analytics for mobile or wearable devices,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 56–64, 2017.

[80] A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” Neural Networks, vol. 18, no. 5, pp. 602–610, 2005.

[81] D. Yu, A. Eversole, M. Seltzer, K. Yao, Z. Huang, B. Guenter, O. Kuchaiev, Y. Zhang, F. Seide, H. Wang, J. Droppo, G. Zweig, C. Rossbach, J. Currey, J. Gao, A. May, B. Peng, A. Stolcke, and M. Slaney, “An Introduction to Computational Networks and the Computational Network Toolkit,” Tech. Rep. MSR-TR-2014-112, 2015.

[82] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” AISTATS ’11: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, vol. 15, pp. 315–323, 2011.

[83] The H2O.ai Team, “H2O.” [Online]. Available: https://www.h2o.ai/

[84] A. Manashty and J. Light, “Cloud Platforms for IoE Healthcare Context Awareness and Knowledge Sharing,” in Beyond the Internet of Things: Everything Interconnected, J. M. Batalla, G. Mastorakis, C. X. Mavromoustakis, and E. Pallis, Eds. Springer, 2017, ch. 12. [Online]. Available: http://www.springer.com/gp/book/9783319507569

[85] A. Manashty, J. Light, and U. Yadav, “Healthcare event aggregation lab (HEAL), a knowledge sharing platform for anomaly detection and prediction,” in 2015 17th International Conference on E-Health Networking, Application and Services, HealthCom 2015. Boston, MA: IEEE, 2016, pp. 648–652.

[86] A. Manashty, “Health Event Aggregation Lab (HEAL) Simulator,” 2017. [Online]. Available: https://github.com/manashty/AzureHealthDataSimulator/tree/master/HEALCoreSimlulation/HEAL

[87] The Massachusetts General Hospital Laboratory of Computer Science, “DXplain.” [Online]. Available: http://www.mghlcs.org/projects/dxplain

[88] G. O. Barnett, K. T. Famiglietti, R. J. Kim, E. P. Hoffer, and M. J. Feldman, “DXplain on the Internet,” Proceedings of the AMIA Annual Symposium, pp. 607–611, 1998.

[89] Apache, “Apache cTAKES.” [Online]. Available: http://ctakes.apache.org/

[90] P. Wolf, A. Schmidt, and M. Klein, “SOPRANO - An extensible, open AAL platform for elderly people based on semantical contracts,” 3rd Workshop on Artificial Intelligence Techniques for Ambient Intelligence (AITAmI’08), 18th European Conference on Artificial Intelligence (ECAI’08), pp. 1–5, 2008. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.140.4722&rep=rep1&type=pdf

[91] G. van den Broek, F. Cavallo, L. Odetti, and C. Wehrmann, “AALIANCE Ambient Assisted Living Roadmap,” in Ambient Intelligence and Smart Environments, vol. 6, 2010, p. 110.

[92] G. Fortino, R. Giannantonio, R. Gravina, P. Kuryloski, and R. Jafari, “Enabling effective programming and flexible management of efficient body sensor network applications,” IEEE Transactions on Human-Machine Systems, vol. 43, no. 1, pp. 115–133, 2013.

[93] Microsoft Corporation, “Event Hub.” [Online]. Available: http://azure.microsoft.com/en-us/services/event-hubs/

[94] ——, “Exploring Microservices in Docker and Microsoft Azure,” 2017. [Online]. Available: https://www.microsoftvirtualacademy.com/en-us/training-courses/exploring-microservices-in-docker-and-microsoft-azure-11796

[95] M. K. Foumani, “A cloud-based mobile human fall forecasting system using recurrent neural networks,” Honors thesis, University of New Brunswick, 2018. [Online]. Available: https://manashty.files.wordpress.com/2018/08/honors_thesis.pdf

[96] A. Holzinger, C. Biemann, C. S. Pattichis, and D. B. Kell, “What do we need to build explainable AI systems for the medical domain?” dec 2017. [Online]. Available: http://arxiv.org/abs/1712.09923

Vita

Candidate’s full name: Alireza Manashty

Universities attended:

Ph.D. in Computer Science, 2014 (started), University of New Brunswick

M.Sc. in CS: Artificial Intelligence, 2010–2012, Shahrood University of Technology

B.Sc. in CS: Software Engineering, 2006–2010, Razi University

Publications, Presentations, and Honors since 2014

Peer-reviewed Journal Publications

1. Alireza Manashty, Janet Light, and Hamid Soleimani, “A Concise Temporal Data Representation Model for Prediction in Biomedical Wearable Devices”, IEEE Internet of Things Journal, https://doi.org/10.1109/JIOT.2018.2863039, Aug 3rd, 2018. (IEEE IoT Journal, Impact Factor 5.86)

2. Alireza Manashty, Janet Light, “Life Model: A novel representation of life-long temporal sequences in health predictive analytics”, Future Generation Computer Systems (FGCS), Elsevier, Volume 92, 2019, Pages 141–156, ISSN 0167-739X, https://doi.org/10.1016/j.future.2018.09.033 (http://www.sciencedirect.com/science/article/pii/S0167739X17326523) (submitted Dec 2017, accepted September 12, 2018, published online October 1st, 2018) (Impact Factor 4.6)

Peer-reviewed Conference Publications

1. Alireza Manashty and Janet Light Thompson. 2017. “A New Temporal Abstraction for Health Diagnosis Prediction using Deep Recurrent Networks”. In Proceedings of IDEAS ’17, Bristol, England, July 2017 (IDEAS ’17), 6 pages, https://doi.org/10.1145/3105831.3105858 (ACM)

2. Alireza Manashty, Janet Light, and Umang Yadav, “Healthcare Event Aggregation Lab (HEAL), a knowledge sharing platform for anomaly detection and prediction”, in proceedings of the 17th International Conference on E-health Networking, Application & Services (IEEE HealthCom 2015), 14-17 October 2015, Boston, Massachusetts, United States, pp. 648–652. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7454584

Book Chapter

1. Alireza Manashty and Janet Light, “Cloud Platforms for IoE Healthcare Context Awareness and Knowledge Sharing”, Beyond the Internet of Things: Everything Interconnected, Springer-Verlag, 2017. http://www.springer.com/gp/book/9783319507569

Conference Presentations, Posters, and Invited Talks

1. Alireza Manashty, “Predictive Analytics in Health Monitoring”, NBIF R3 Innovation in Aging Conference, Fredericton, NB, Canada, April 12, 2018

2. Alireza Manashty, “Long-term patient mortality forecasting using deep learning”, UNB Graduate Research Conference, March 2018, Fredericton, NB, Canada

3. Alireza Manashty, Janet Light, “PHARMS: Predictive Analytics in Health Monitoring using Deep Learning Can Save Lives”, NBHRF 2017, 9th annual New Brunswick Health Research Conference, Moncton, NB, Canada, Nov 2nd & 3rd 2017.

4. Alireza Manashty, Janet Light, “Automated Remote Dialysis Date Prediction using a Novel Cloud Architecture”, NBHRF 2016, 8th annual New Brunswick Health Research Conference, Saint John, NB, Canada, Nov 2nd & 3rd 2016.

5. Alireza Manashty and Janet Light, “Towards a Context Aware Knowledge Based Framework for Behavioral Anomaly Detection and Prediction”, 12th Annual Computer Science Research Exposition 2015, University of New Brunswick, Fredericton, New Brunswick, Canada, April 10th 2015 (Honorable mention award)

6. Alireza Manashty, “Connecting Health Monitoring Systems to Detect Health Anomalies”, Fred Talks 2016, February 25th, Fredericton, NB, Canada

7. Alireza Manashty and Janet Light, “Towards a New JDL model for Big Data Analytics in Multi-sensor Data Fusion for Smart Healthcare Monitoring”, Science Atlantic Mathematics, Statistics and Computer Science Conference 2014, October 3-5 2014, University of New Brunswick, Saint John, NB, Canada (Abstract Presentation)

Honors, Awards, and Grants

1. Rising Star New Brunswick Researcher of the Month Award, New Brunswick Health Research Foundation, October 2018, New Brunswick, Canada

2. Microsoft Most Valuable Professional (MVP) Award in Microsoft Azure (April 2017)

3. 2nd Place at 4th annual RBC UNBeatable Ideas, Nov. 2017

4. Maecenas Graduate Scholarship, $5,000, 2018-2019

5. Honorable Mention for Poster Presentation at 12th Annual Research Exposition at Computer Science Department, University of New Brunswick, Saint John, April 2015.

6. Microsoft Azure for Research Award in Cloud Computing, $16,000 (2017-2018)

7. Microsoft Azure for Research Award in Cloud Computing, $20,000 (2016-2017)

8. MITACS Accelerate PhD Fellowship Intern ($30,000), IPSNP Computing Inc., Saint John, NB, Canada