
EPILEPSY PREDICTION USING DEEP LEARNING IN HPC

Ronaldo [email protected]

Mauro Damo, Senior Data Scientist / Advisory Consultant, Dell EMC, [email protected]

Srinivasan Sivaramakrishnan, Data Scientist / Consultant, Dell EMC, [email protected]

Wei [email protected]

Knowledge Sharing Article © 2018 Dell Inc. or its subsidiaries.

2018 Dell EMC Proven Professional Knowledge Sharing 2

Table of Contents

1. Abstract
2. Key Findings and Benefits of this Work
3. Overview of Subject Domain
4. Seizures
5. Data Discovery
6. Overview of the Methodology
7. Data Preprocessing
8. Exploratory Analytics
8.1. Exploring Montages and Signals
8.2. Text Mining of Physician Notes
8.3. Social Network Analysis of Key Terms
8.4. Similarity Analysis of Key Terms and Documents
9. Use Case Data Processing and Engineering
9.1 The HPC Environment Used on this Work
9.1.1 CentOS Project
9.1.2 Hadoop Ecosystem
9.1.3 BigDL
9.1.4 Spark
10. The HPC Architecture
10.1 Data Set for the Training Model
10.2 Convolutional Neural Network
10.3 Results of the Classification Model
11. Conclusion and Future Work
12. References

Disclaimer: The views, processes or methodologies published in this article are those of the authors. They do not necessarily reflect Dell EMC’s views, processes or methodologies.


1. Abstract

Deep Learning is one of the most active areas of Machine Learning and Pattern Recognition research. The technology has advanced steadily in recent years and is being applied to many challenging human problems. One such challenge is the autonomous diagnosis of critical diseases that currently require the attention of specialized physicians. This work pursues three key aspects of machine-assisted epilepsy and seizure diagnosis: the medical domain, Deep Belief Networks, and a High Performance Computing (HPC) environment.

Epilepsy is a chronic disorder, the hallmark of which is recurrent, unprovoked seizures. A person is diagnosed with epilepsy if they have two or more unprovoked seizures. The seizures in epilepsy may be related to either a brain injury or a family tendency, but often the cause is completely unknown. According to the Epilepsy Foundation, at least 65 million people in the world currently have epilepsy. In the US alone, 470,000 children have epilepsy and 1 in 26 people will develop epilepsy at some point in their lifetime, which underscores the seriousness of the problem.

Electroencephalography (EEG) monitoring is a noninvasive method to measure the electrical activity of the brain. Because it is noninvasive, the method relies on electrodes attached to the patient's scalp. Diagnosing epilepsy from EEG data is complex: the exams take at least one week to complete and require analysis by a trained, specialized professional. Preliminary research indicates that this recording method also introduces noise into brain wave monitoring. Some automated diagnosis methods have been introduced in recent years, several of which apply neural networks to EEG in order to support physicians. Our hypothesis is that a deep learning technique can learn when a seizure is happening from data representations.

EEG recordings, which are commonly used to investigate neural oscillations, are often stored as EDF (European Data Format) files. For this paper, we use EEG files from more than 200 patients containing almost 1 TB of raw EEG data. Signal decomposition methods such as time series decomposition, Fast Fourier Transforms (FFT) and spectral analysis can be applied to extract information from each signal. In addition, text mining and parsing can extract meaningful information from the physician notes of each patient, which serves as an input to the epilepsy prediction model.

This paper also leverages BigDL, a distributed deep learning library that runs on Apache Spark, to process the data. BigDL brings Spark's existing data scalability, data replication and data availability to deep learning algorithms. BigDL is also a powerful, recently developed tool for scaling deep learning: here it applies a multilayer Convolutional Neural Network trained with synchronous mini-batch SGD (Stochastic Gradient Descent), executed on a multi-node Spark cluster within a High Performance Computing (HPC) environment.

Advanced epilepsy detection and prediction can radically change the diagnosis and discovery of neural disorders. This paper explores how automating the detection and prediction of seizures, based on key indicators and patterns from our analysis, can change how physicians diagnose and understand seizures and epilepsies. In this work, we achieved close to 95% accuracy in predicting the probabilities of the top 5 seizure classes.

https://www.epilepsy.com/node/2000007

https://www.epilepsy.com/information/professionals/co-existing-disorders/head-trauma-post-traumatic-epilepsy

https://en.wikipedia.org/wiki/Stochastic_gradient_descent


2. Key Findings and Benefits of this Work

The subject domain of this work focuses on improving epilepsy and seizure diagnostics, which would help millions of people who suffer from neural diseases. Briefly, this work intends to:

1. Create visualizations that show the various hidden properties of EEG signals to support physicians in their decision making
2. Extract statistical summaries and inferences from EEG signals for quick summarization
3. Leverage Convolutional Neural Networks to extract key features that would classify and predict epilepsy events
4. Save time, money and resources through the early detection and prediction of seizures and epilepsy
5. Develop a new approach to seizure classification in a high performance environment
6. Review the literature on emerging technologies such as Deep Learning that have the potential to add new dimensions to data science

3. Overview of Subject Domain

Epilepsy is a chronic disorder, the hallmark of which is recurrent, unprovoked seizures. A person is diagnosed with epilepsy if they have two unprovoked seizures (or one unprovoked seizure with the likelihood of more). The seizures in epilepsy may be related to a brain injury or a family tendency, but often the cause is completely unknown. [4]

Diagnosing epilepsy from electroencephalogram (EEG) data is complex: the exams take at least one week to complete and require analysis by a trained, specialized professional. Some automated diagnosis methods have been introduced in recent years, several of which apply neural networks to EEG in order to reduce the expert effort involved. [9]

EEG is the recording of electrical activity along the scalp. The flow of current due to firing of neurons in the brain results in a voltage fluctuation that is measured as EEG. The measurement of the brain’s response to a stimulus is called event-related potential (ERP). The stimulus can be sensory, motor, or cognitive in nature.

Neural oscillations are observed throughout the central nervous system. These are generated by large groups of neurons and can be characterized by the frequency, amplitude, and phase of the oscillations. EEG recordings are commonly used to investigate neural oscillations. Neurons can generate action potentials or spikes in a rhythmic pattern. Some neurons have the tendency to fire at particular frequencies and are called resonators. Spiking patterns that are the result of bursting are considered fundamental for information coding. In many neurological disorders, the cause is excessive neural oscillation. In seizures, excessive synchronization has been observed. Oscillatory activity can also be used in BCIs to control external devices. [11]

The human brain produces several major types of brain waves, classified by their frequency ranges, which run from low to high frequency. These are known as alpha (α), theta (θ), beta (β), delta (δ), gamma (γ), and mu (μ). [6]

The EEG has the ability to monitor human brain activity noninvasively, with a precision of milliseconds. This is necessary for understanding the foundations of cognitive functions. The EEG reflects thousands of ongoing brain processes.



4. Seizures

A main objective of this work is to provide a classification method for the seizures in the available dataset. Seizures that can occur anywhere in the brain are termed generalized seizures, and those confined to a particular location in the brain are called partial seizures.

Seizures are divided into groups [3] depending on:

- where they start in the brain (onset)
- whether or not a person's awareness is affected
- whether or not seizures involve other symptoms, such as movement

Depending on where they start, seizures are described as being focal onset, generalized onset or unknown onset. The classification is performed with a deep learning method based on a Convolutional Neural Network (CNN). The neural network is trained to distinguish the seizure classes described below.

There is also a term-based annotation file for each EEG file, which describes the occurrence or nonoccurrence of seizures. There are about 10 seizure classes; their manifestations and explanations are shown in Figure 1.

Epilepsy is defined as two or more unprovoked seizures. It is one of the most common disorders of the nervous system and affects people of all ages, ethnic backgrounds, and races. Depending on its localization, it can be temporal, frontal, parietal or occipital. [5]

Figure 1: Seizure Types

5. Data Discovery

The data used for this paper is from the Temple University EEG Corpus [19]. In general, clinical EEGs use a variety of channel configurations. There are also two types of EEGs, Averaged Reference (AR) and Linked Ears Reference (LE), depending on the method used for recording. For the purpose of this paper, over 2100 EEG files with readings based on the Averaged Reference configuration have been processed. We have also used physician notes from over 823 EEG sessions to perform text mining. EEG files vary in size depending on the patient and the duration of the recorded EEG session.

Seizure types referenced in Figure 1:

| Code | Description |
|------|-------------|
| FNSZ | Focal seizures which cannot be specified by type |
| GNSZ | Generalized seizures which cannot be further classified into one of the groups below |
| SPSZ | Partial seizures during consciousness; type specified by clinical signs only |
| CPSZ | Partial seizures during unconsciousness; type specified by clinical signs only |
| ABSZ | Absence discharges observed on EEG; patient loses consciousness for a few seconds (Petit Mal) |
| TNSZ | Stiffening of body during seizure (EEG effects disappear) |
| CNSZ | Jerking/shivering of body during seizure |
| TCSZ | At first stiffening and then jerking of body (Grand Mal) |
| ATSZ | Sudden loss of muscle tone |
| MYSZ | Myoclonic jerks of limbs |

| Name | Manifestation | Location |
|------|---------------|----------|
| Focal Non-Specific Seizure | Electrographic | Hemispheric/Focal |
| Generalized Non-Specific Seizure | Electrographic | Generalized |
| Simple Partial Seizure | Electroclinical | All |
| Complex Partial Seizure | Electroclinical | All |
| Absence Seizure | Electroclinical | Generalized |
| Tonic Seizure | Electroclinical | All |
| Clonic Seizure | Electroclinical | All |
| Tonic Clonic Seizure | Electroclinical | All |
| Atonic Seizure | Clinical | N/A |
| Myoclonic Seizure | Clinical | N/A |


In each EEG session, electrodes are placed on the subject's head, as illustrated in Figure 2. Each electrode measures a potential, which is then used to calculate the potential difference between any two electrodes. Some regions of the brain record higher activity and some record lower activity, which is shown by the varying color intensity derived from the electrode readings.

Figure 2: EEG Procedure Illustrative Example

For each file, the EEG recordings were taken based on the potentials observed at the electrodes, or channels. There are about 22 different montage channels, each measuring a potential difference between two electrodes, as shown in Figure 3. Overall, about 19 different electrodes are needed to calculate the montages and thereby detect seizures and epilepsies.

Figure 3: EEG Montages
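The montage calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the preprocessing code used in this work: the electrode names, the electrode pairs and the voltage values are assumptions chosen only for the example.

```python
# Illustrative sketch: a montage channel is the sample-by-sample potential
# difference between two electrodes. The electrode names, pairs and voltages
# below are assumed example values, not taken from the paper's data.


def montage_channel(signals, first_electrode, second_electrode):
    """Return first_electrode - second_electrode for every sample."""
    first = signals[first_electrode]
    second = signals[second_electrode]
    if len(first) != len(second):
        raise ValueError("electrode signals must have the same length")
    return [a - b for a, b in zip(first, second)]


# Hypothetical readings (microvolts) for three electrodes, four samples each.
electrode_signals = {
    "FP1": [12.0, 15.0, 11.0, 9.0],
    "F7": [10.0, 11.0, 14.0, 9.0],
    "T3": [7.0, 8.0, 10.0, 12.0],
}

# Hypothetical montage definition: each entry pairs two electrodes.
montage_pairs = [("FP1", "F7"), ("F7", "T3")]

montages = {
    f"{a}-{b}": montage_channel(electrode_signals, a, b)
    for a, b in montage_pairs
}
```

With 22 such pairs defined over the 19 electrodes, the same dictionary comprehension would yield one time series per montage channel.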

Each patient's EEG session folder also contains the physician notes corresponding to events in that EEG session. One session's notes are shown in Figure 4. The notes have rich content covering clinical history, impression, correlation and many other key factors apart from the activity diagnosis. Most notes include the age, beats per minute and gender of the patient.


Figure 4: EEG Session Physician Notes

6. Overview of the Methodology

This section describes the data mining process and the algorithms used to reach the expected outcomes of this paper. The approach is based on the DEPP (Descriptive, Explorative, Predictive and Prescriptive) methodology. The steps below explain the analytic development life cycle of the paper.

1. Utilize exploratory signal processing methods such as Time Series Analysis, Spectral Analysis, etc., to derive preliminary insights from the preprocessed EEG recordings across 19 electrodes
2. Use Text Mining to process unstructured data from physician notes that can help in prediction
3. Leverage techniques such as Clustering, Correlation Analysis, Social Network Analysis, Similarity Analysis, etc., to find key indicators of Seizure and Epilepsy through Root Cause Analysis
4. Introduce Deep Learning techniques such as a Convolutional Neural Network with multiple hidden layers for feature extraction, which helps to classify Seizures
5. Leverage a Big Data solution architecture stack of R (RStudio), Python, Tableau, Spark, Hive and BigDL to increase compute capacity and reduce the total time of the DEPP approach in HPC

7. Data Preprocessing

The EEG data is first preprocessed using scripts that convert the EEG recordings from .edf files to .csv files, producing the corresponding data for each EEG file. Figure 5 shows the R code that was written to convert the raw EEG data into a consumable flat file format.


Figure 5: EEG Signal Preprocessing Script

Figure 6 shows, after preprocessing, the readings of some of the electrodes together with the time and the session in which they were recorded. Only the 19 electrodes that are used in the montages mentioned above are considered for exploration and prediction of seizures. The remaining artifacts are discarded.

Figure 6: Preprocessed EEG Input Data

In addition to the EEG signal file mentioned above, there is also a header file which gives more information on the patient ID, the duration of the recording, the timestamp of the file and the list of electrodes involved in the capture. For example, in Figure 7 the EEG duration of this file is 1466 seconds, or about 24.4 minutes. Most files have a sampling rate of 250 Hz.


Figure 7: EEG Signal Header
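To make the header fields concrete, the sketch below reads the fixed 256-byte header of an EDF file and derives the duration and sample count quoted above. It is an assumption-laden illustration, not the R script from Figure 5: the field widths follow the published EDF layout, while `parse_edf_header`, `build_sample_header`, the signal labels and the synthetic header contents are hypothetical names and values introduced only for the example.

```python
# Illustrative sketch of reading the fixed 256-byte header of an EDF file and
# deriving the recording duration. Field widths follow the published EDF
# layout; the sample header built below is synthetic, not a corpus file.

FIXED_FIELDS = [  # (name, width in bytes), all stored as space-padded ASCII
    ("version", 8), ("patient_id", 80), ("recording_id", 80),
    ("start_date", 8), ("start_time", 8), ("header_bytes", 8),
    ("reserved", 44), ("n_records", 8), ("record_seconds", 8),
    ("n_signals", 4),
]


def parse_edf_header(raw):
    """Parse the fixed header fields plus the signal labels from EDF bytes."""
    header, offset = {}, 0
    for name, width in FIXED_FIELDS:
        header[name] = raw[offset:offset + width].decode("ascii").strip()
        offset += width
    n_signals = int(header["n_signals"])
    # Signal labels follow the fixed header: 16 bytes per signal.
    header["labels"] = [
        raw[offset + 16 * i:offset + 16 * (i + 1)].decode("ascii").strip()
        for i in range(n_signals)
    ]
    header["duration_seconds"] = (
        int(header["n_records"]) * float(header["record_seconds"])
    )
    return header


def build_sample_header(labels, n_records, record_seconds):
    """Build a synthetic header with hypothetical values for demonstration."""
    values = {
        "version": "0", "patient_id": "X", "recording_id": "X",
        "start_date": "01.01.18", "start_time": "00.00.00",
        "header_bytes": str(256 + 256 * len(labels)), "reserved": "",
        "n_records": str(n_records), "record_seconds": str(record_seconds),
        "n_signals": str(len(labels)),
    }
    fixed = "".join(values[name].ljust(width) for name, width in FIXED_FIELDS)
    return (fixed + "".join(label.ljust(16) for label in labels)).encode("ascii")


sample = build_sample_header(["EEG FP1-REF", "EEG FP2-REF"], 1466, 1)
info = parse_edf_header(sample)
minutes = info["duration_seconds"] / 60  # 1466 s is about 24.4 minutes
samples_per_channel = int(info["duration_seconds"] * 250)  # at 250 Hz
```

At 250 Hz, a 1466-second recording holds 366,500 samples per channel, which gives a sense of how quickly the flat files grow.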

The records of all patients for each session are then iteratively preprocessed to create a single large data frame containing close to 140 GB of data from close to 200 patients and over 2100 EEG files across the captured EEG sessions.

8. Exploratory Analytics

8.1. Exploring Montages and Signals

Each EEG file can be visualized to see the different readings of the electrodes. This is very important because we have to measure the difference in potential between two electrodes to calculate the corresponding montage value. An example showing the potential value of the electrodes over time is shown in Figure 8. Each recording is a time series with its own characteristics.

Figure 8: Visualization of EEG Signals from each Electrode

Figure 9 depicts an example from one of the EEG files and shows how the value of each montage changes over time, enabling us to see the seizure and non-seizure portions of the signal. In this case, “Bckg” refers to background noise and “tcsz” refers to Tonic Clonic Seizure. Four montages (montage 0, 1, 2 and 3) are shown, and each montage has its own time series pattern.


Figure 9: EEG Montage Time Series with Seizure Labels

Figure 10 shows a visualization of the first 0.20 seconds of an EEG session recording. It can be viewed as a heat map of the voltage readings from all 19 electrodes over time. Higher potential values are shown in darker colors. At the top, the heat map also shows a dendrogram, which indicates how the different electrodes are clustered based on their nearest neighbors.

Figure 10: Heat Map of Signal Recording from single EEG session



Figure 11 shows the periodogram power spectral density estimate for four EEG files from one session of a patient. Each spectral density has its own features, but the frequency bandwidths are very close. The x axis represents the normalized frequency and the y axis represents the spectral density in decibels, derived from the voltage values of the electrodes. In all four plots, the maximum spectral density (the peak) occurs at a normalized frequency of approximately 0.30. The period is therefore 1/0.3, or about 3.33 samples per cycle. Since a sample is collected every 0.0025 seconds, the dominant period is 3.33 × 0.0025, or about 0.008 seconds. A similar technique can be applied to montages to identify and understand seizure-dominant periodicities, as shown in Figure 12.

Figure 11: Periodogram of Single Electrode from an EEG Session

Figure 12: Periodogram of Single Montage from an EEG Session
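The peak-to-period arithmetic above can be reproduced with a small periodogram sketch. This is not the analysis code behind Figures 11 and 12: it uses a plain discrete Fourier transform from the standard library (a production version would normally use an FFT routine), and the input is a synthetic sine wave at an assumed normalized frequency of 0.3 cycles per sample. Only the 0.0025-second sampling interval is taken from the text.

```python
import cmath
import math


def periodogram(signal):
    """Return (normalized frequencies, power) for bins 1..N/2, skipping DC.

    Frequencies are expressed in cycles per sample.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        freqs.append(k / n)
        power.append(abs(coefficient) ** 2 / n)
    return freqs, power


sampling_interval = 0.0025  # seconds between samples, as stated in the text
n_samples = 200

# Synthetic stand-in for an electrode signal: 0.3 cycles per sample (assumed).
signal = [math.sin(2 * math.pi * 0.3 * t) for t in range(n_samples)]

freqs, power = periodogram(signal)
peak_frequency = freqs[power.index(max(power))]   # normalized frequency
samples_per_cycle = 1 / peak_frequency            # about 3.33 samples
period_seconds = samples_per_cycle * sampling_interval  # about 0.008 s
```

Plotting `10 * log10(power)` against `freqs` would give the decibel scale used in the figures; that step is omitted here to keep the sketch free of plotting dependencies.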


8.2. Text Mining of Physician Notes

The next step in the exploration of EEG is to perform text mining of the physician notes for all EEG sessions of all patients. We used text mining techniques from the “tm” package in R to analyze and visualize the word content of over 823 processed physician notes. Many common words and prepositions are removed as part of data cleansing. Figure 13 shows the terms and their frequencies of occurrence across all documents (notes) after processing the unstructured text data.

Figure 13: Bar chart of Word Frequencies

From the bar chart in Figure 13, we see that words like “left” and “activity” are the most frequent, each occurring more than 2000 times, and tend to appear in almost every document. Other interesting frequent words are “hemisphere”, “focal”, “temporal”, etc., which point to some key topics in the notes.
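The term counting behind such a bar chart can be sketched as follows. The analysis in this paper was done with the “tm” package in R; the Python version below is only an equivalent illustration, and the stop-word list and the three example notes are invented for the sketch.

```python
import re
from collections import Counter

# Assumed, heavily abbreviated stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "was", "with", "to"}


def term_frequencies(notes):
    """Count how often each non-stop-word term occurs across all notes."""
    counts = Counter()
    for note in notes:
        tokens = re.findall(r"[a-z]+", note.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts


# Hypothetical physician notes, written only for this example.
notes = [
    "Sharp activity in the left temporal region.",
    "Focal slowing of the left hemisphere with rhythmic activity.",
    "Normal background activity.",
]

frequencies = term_frequencies(notes)
top_terms = frequencies.most_common(2)
```

Sorting the full counter by count gives the bar heights of a frequency chart, and the same counts can size the words of a word cloud.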

Figure 14: Word cloud of Key Words based on Frequencies


Similar to the bar chart, Figure 14 shows a word cloud of the top 100 terms across all documents of each EEG session, sized by frequency. Words shown in a larger font occur more frequently across all documents, whereas words in a smaller font occur relatively rarely. The word cloud is also color coded based on the frequency bin range that each word group falls into. The purpose of the word cloud is to understand the core topics of the notes.

Figure 15 shows the cluster plot based on the distance between words, measured through their frequencies of occurrence across all documents. The plot shows four clusters, each a group of words bundled together and projected onto a 2D space. Each cluster has its own properties and behaviors. For example, Cluster 4 on the extreme right has the largest spread and collects very frequent words like “patient”, “seizures”, “activity”, etc., which are common in every document. Their high occurrence explains why these words are grouped together in Cluster 4. Cluster 3, on the other hand, groups less common words and has a smaller spread than Cluster 4.

Figure 15: Cluster Plot showing group formation among Key Words

Figure 16 shows the degree of relationship between different words through correlation. Values closer to 1 indicate a strong positive correlation, which implies that the frequencies of occurrence of those words vary together. The left column of Figure 16 shows the top positive correlations between the word “seizure” and all other key terms based on frequencies. Similarly, the right column shows the top correlations between the word “seizures” and all other key terms. The darker the color, the stronger the correlation. We consider “seizure” and “seizures” separately because the latter refers to epileptic conditions, as epilepsy is defined as two or more unprovoked seizures. Words like “activity”, “starts”, “bed”, “temporal”, “rhythmic”, etc., are relevant to “seizure” in this context as they tend to co-occur with it. Similarly, words like “hemisensory”, “recurrent”, “partial”, “aware”, etc., are very relevant to the word “seizures”.


Figure 16: Correlation Table showing key terms with “Seizure” and “Seizures”
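A correlation table of this kind can be built by correlating per-document term counts. The sketch below assumes Pearson correlation, which the text does not name explicitly, and uses invented counts for three terms over five hypothetical documents.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical term counts per document (five documents), for illustration.
term_counts = {
    "seizure": [3, 0, 2, 5, 1],
    "rhythmic": [2, 0, 1, 4, 1],
    "normal": [0, 4, 1, 0, 3],
}

target = "seizure"
correlations = {
    term: pearson(term_counts[target], counts)
    for term, counts in term_counts.items()
    if term != target
}

# Terms most strongly associated with the target come first.
ranked = sorted(correlations.items(), key=lambda item: item[1], reverse=True)
```

Running the same ranking once with “seizure” and once with “seizures” as the target reproduces the two-column layout described for Figure 16.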

Figure 17 shows the importance of words based on a concept called Inverse Document Frequency (IDF), which measures the importance of a word based on how rarely it occurs across documents.

Figure 17: Tree Map showing Importance of words based on rarity

This visualization is important because if we consider only frequent terms, many words (for example “patient”, “looks”, “seems”) that are present in almost all documents get a high weight even though they are of little importance, as they do not convey any substantive information. Using IDF, we weigh down frequent terms and scale up rare words. Consequently, the IDF of a rare term is very high, whereas the IDF of a frequent term is very low. The tree map shows the key words sized by IDF score: the bigger the rectangle, the greater the importance of that word. Words like “hemisphere”, “bursts”, “delta”, “frontal”, “periodic” are very relevant and important for understanding the causes and for gaining preliminary insights from the physician notes.
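The effect of IDF weighting can be seen in a short sketch. The exact IDF variant used for Figure 17 is not stated, so the common form IDF(t) = log(N / df(t)) is assumed here, where N is the number of documents and df(t) is the number of documents containing term t. The four tokenized documents are hypothetical.

```python
import math


def inverse_document_frequency(documents):
    """Return log(N / df) for every term, with df counted per document."""
    n_docs = len(documents)
    document_frequency = {}
    for tokens in documents:
        for term in set(tokens):  # count each term once per document
            document_frequency[term] = document_frequency.get(term, 0) + 1
    return {
        term: math.log(n_docs / df)
        for term, df in document_frequency.items()
    }


# Hypothetical tokenized notes, for illustration only.
documents = [
    ["patient", "delta", "bursts"],
    ["patient", "frontal"],
    ["patient", "delta"],
    ["patient", "periodic"],
]

idf = inverse_document_frequency(documents)
```

In this toy corpus “patient” appears in every document and receives an IDF of zero, while “bursts”, which appears once, receives the highest score.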

8.3. Social Network Analysis of Key Terms

Social network analysis explores the relationships between all the key terms based on graph theory. Each key (high frequency) term is represented as a node; after processing over 823 notes there are 78 nodes in total, as shown in Figure 18. A connection between two nodes is called an edge. Here the edges are color coded based on how frequently the terms occur together. The thicker the edge between two nodes, the stronger the relationship between them based on co-occurrence. Some relationships between two words are very strong and some are very weak. In the network below, there is a weak association between words like “generalized” and “focal”, indicated by thinner edges.

Figure 18: Social network showing relationship amongst key terms
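A co-occurrence network of this kind can be derived directly from the notes. In the sketch below, an edge weight is the number of documents in which two key terms appear together; this weighting, the key-term set and the example documents are assumptions made for illustration, not details confirmed by the text.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents, key_terms):
    """Count, for each pair of key terms, the documents containing both."""
    edges = Counter()
    for tokens in documents:
        present = sorted(set(tokens) & key_terms)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


# Hypothetical key terms (nodes) and tokenized notes.
key_terms = {"seizure", "focal", "generalized", "temporal"}
documents = [
    ["focal", "seizure", "temporal", "left"],
    ["focal", "seizure", "temporal"],
    ["generalized", "seizure"],
    ["focal", "generalized"],
]

edges = cooccurrence_edges(documents, key_terms)
```

Here the pair (“focal”, “generalized”) has weight 1 while (“focal”, “seizure”) has weight 2, so the first pair would be drawn with the thinner edge.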

In Figure 19, we can see two sub-graphs extracted from the above graph. The left sub-graph is focused on “seizures” and the right one on “seizure”. They were obtained through a technique called Community Detection, which identifies specific communities of interest. In this case, the two communities are “seizures” and “seizure”, seen in the two graphs below. The “seizures” sub-graph clearly has a different set of nodes than the “seizure” sub-graph. The left graph has 45 nodes and 990 edges in its community, whereas the right graph has 33 nodes and 528 edges. Both edge counts equal n(n-1)/2 for their node counts (45 × 44 / 2 = 990 and 33 × 32 / 2 = 528), so every pair of terms within each community is connected, and the two node counts sum to the 78 nodes of the full graph. As explained before, thicker edges indicate a stronger association based on co-occurrence of words across all documents, and thinner edges indicate a weaker association between nodes.


Figure 19: Induced Sub-graph showing “Seizures” and “Seizure” Communities

Figure 20: Bubble chart showing influential nodes on “Seizures” and “Seizure” Communities

Figure 20 shows the bubble chart of the top words for the “Seizures” and “Seizure” communities, sized by their eigenvector centrality score. The centrality score is based on the concept that a connection to a high-scoring node is valued more than a connection to a low-scoring node. It is another way of measuring the influence of each node in the community. In the chart above, the bigger the bubble, the greater the influence of that word in its community. Words like “epileptiform”, “theta”, “sharp”, etc., are very relevant and provide some preliminary insights into indicators of seizures.
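Eigenvector centrality can be approximated with power iteration, as in the sketch below. The graph, its edge weights and the function name are hypothetical, and the tool actually used to compute the scores in Figure 20 is not stated in the text. Each update adds a node's own previous score to the weighted sum of its neighbors' scores, which is equivalent to iterating on the adjacency matrix plus the identity and avoids oscillation on bipartite graphs without changing the ranking.

```python
import math


def eigenvector_centrality(adjacency, iterations=200, tolerance=1e-10):
    """Approximate eigenvector centrality of a weighted, undirected graph."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # A node scores highly when its neighbors score highly.
        updated = {
            node: scores[node] + sum(
                weight * scores[neighbor]
                for neighbor, weight in neighbors.items()
            )
            for node, neighbors in adjacency.items()
        }
        norm = math.sqrt(sum(value * value for value in updated.values()))
        updated = {node: value / norm for node, value in updated.items()}
        change = max(abs(updated[node] - scores[node]) for node in adjacency)
        scores = updated
        if change < tolerance:
            break
    return scores


# Hypothetical co-occurrence weights between four terms (symmetric).
adjacency = {
    "epileptiform": {"sharp": 3, "theta": 2, "seizures": 4},
    "sharp": {"epileptiform": 3, "seizures": 2},
    "theta": {"epileptiform": 2},
    "seizures": {"epileptiform": 4, "sharp": 2},
}

scores = eigenvector_centrality(adjacency)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph “epileptiform” ranks first because it is strongly tied to every other term, while “theta”, connected to a single node, ranks last.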

8.4. Similarity Analysis of Key Terms and Documents

Figure 21 shows the distance between pairs of documents (notes), based on the occurrence of over 200 highly frequent words, in the form of a similarity (or distance) matrix. Using this matrix, the similarity between any two physician notes can be easily studied, which in turn helps physicians and nurses look for similar symptoms or assists in medication choices. Even though no two patients share exactly the same complications, this technique gives physicians more options and helps them look for additional diagnoses that were initially missed. The darker the color, the higher the similarity. For example, the matrix shows that document 4 is very similar to documents 2 and 3. This makes sense because the notes of 2, 3 and 4 all belong to the same patient, even though they were captured in different EEG sessions. We can also quickly see which documents are dissimilar to each other. Only 50 of the 823 documents are shown in the visual below to keep the visualization readable.

Figure 21: Similarity matrix of Documents color coded by intensity of similarity between them
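A document similarity matrix can be computed from term-count vectors. The text does not say which similarity or distance measure was used, so the sketch below assumes cosine similarity; the three document vectors are invented, with each column standing for the count of one hypothetical key term.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise similarity between all documents (symmetric matrix)."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical term counts: rows are documents, columns are key terms.
document_vectors = [
    [2, 0, 1, 0],
    [4, 0, 2, 0],  # same term proportions as the first document
    [0, 3, 0, 1],  # shares no terms with the first two documents
]

matrix = similarity_matrix(document_vectors)
```

The first two documents use the same terms in the same proportions and are maximally similar, which mirrors how notes from the same patient cluster together in the matrix; transposing the input so that rows are terms gives a term-to-term matrix instead.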

Similar to the visual above, Figure 22 shows the similarities between key terms based on their frequencies of occurrence across all 823 documents. The similarity matrix shows how close the words are to each other. Some rows are fully dark, which indicates a highly common term used across all documents. For example, words like “description”, “ekg”, “electrodes” are common and frequent in all documents and hence have a higher similarity score with other words. Sometimes it is also beneficial to look at the dissimilarity between two words (shown by lighter intensity) to identify specific diagnoses or patterns common across documents. Figure 22 shows that the word “epilepsy” is dissimilar to all other highly frequent words. This makes sense because not every physician note mentions epilepsy as frequently as it mentions the word “description”.

Figure 22: Similarity matrix of Key Words color coded by intensity of similarity between them

9. Use Case Data Processing and Engineering

This section describes the data processing on the HPC cluster and the classification process. It also details the environment and the Big Data ecosystem used for this paper.

9.1 The HPC Environment Used on this Work

The environment used is a Hadoop Ecosystem with Hortonworks distribution package version 2.6.3.0-235 that contains a generic application software stack as shown in Figure 23.


Figure 23: The HPC Environment

Some third-party components were also used: Jupyter Notebook as the interface for developing the code, Python 2.7.5 for programming, and BigDL, a Deep Learning framework for running learning algorithms on a Spark cluster. All of them are open source components.

The stack used to process the data for this Deep Learning use case was a subset of the components above, chosen according to need. The use case involves structured batch data processing and computationally complex analytics.

Below is the stack used to process the entire EEG data frame:

Linux CentOS version 7, 64 bits

Hadoop version 1.7.3.2.6.3.0-235
  o HDFS
  o MapReduce

Yarn

Oozie

Zookeeper

Tez

Spark

Hive

Python

Jupyter Notebook

BigDL

All of these components are important because each has its own benefits.

9.1.1 CentOS Project

The CentOS Linux distribution is a stable, predictable, manageable and reproducible platform derived from the sources of Red Hat Enterprise Linux (RHEL) [20]. The CentOS Project is looking to expand on that by creating the resources other communities need to come together and build on the CentOS Linux platform. The process starts with a clear governance model, increased transparency and access, with the aim of publishing a roadmap that includes variants of the core CentOS Linux.


Since March 2004, CentOS Linux has been a community-supported distribution derived from sources freely provided to the public by Red Hat. As such, CentOS Linux aims to be functionally compatible with RHEL. Packages are changed mainly to remove upstream vendor branding and artwork. CentOS Linux is no-cost and free to redistribute.

CentOS Linux is developed by a small but growing team of core developers. In turn the core developers are supported by an active user community including system administrators, network administrators, managers, core Linux contributors, and Linux enthusiasts from around the world.

Over the coming year, the CentOS Project will expand its mission to establish CentOS Linux as a leading community platform for emerging open source technologies coming from other projects such as OpenStack. These technologies will be at the center of multiple variations of CentOS, as individual downloads or accessed from a custom installer. Read more about the variants and Special Interest Groups that produce them [20].

9.1.2 Hadoop Ecosystem

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing [21]. Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these modules [21]:

Hadoop Common: The common utilities that support the other Hadoop modules

Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data

Hadoop YARN: A framework for job scheduling and cluster resource management

Hadoop MapReduce: A YARN-based system for parallel processing of large data sets
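To make the MapReduce programming model concrete, the following is a minimal single-process sketch in plain Python. It is not Hadoop code: it only mimics the map, shuffle and reduce phases to count records per class label. The function names and the sample labels are hypothetical and chosen for illustration.

```python
from collections import defaultdict

def map_phase(records):
    """Emit one (key, 1) pair per record; here the key is the record's label."""
    for label in records:
        yield (label, 1)

def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values of each key to obtain a count per label."""
    return {key: sum(values) for key, values in grouped.items()}

def count_labels(records):
    return reduce_phase(shuffle_phase(map_phase(records)))

# Hypothetical labels, for illustration only.
labels = ["seizure", "non-seizure", "seizure", "non-seizure", "non-seizure"]
counts = count_labels(labels)
```

On a real cluster the map and reduce functions run on different machines and the shuffle moves data over the network; the structure of the computation is the same.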

Other Hadoop-related projects at Apache include:

Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner

Avro: A data serialization system

Cassandra: A scalable multi-master database with no single points of failure

Chukwa: A data collection system for managing large distributed systems

HBase: A scalable, distributed database that supports structured data storage for large tables

Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying

Mahout: A scalable machine learning and data mining library

Pig: A high-level data-flow language and execution framework for parallel computation

Spark: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation

Tez: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive, Pig and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop MapReduce as the underlying execution engine

ZooKeeper: A high-performance coordination service for distributed applications. It orchestrates all Hadoop components and their services [21]

9.1.3 BigDL

BigDL is a distributed deep learning library for Spark that can run directly on top of existing Spark or Apache Hadoop clusters. You can write deep learning applications as Scala or Python programs [18]. It is modeled after Torch and provides comprehensive support for deep learning, including numeric computing (via Tensor) and high-level neural networks. In addition, you can load pre-trained Caffe or Torch models into the Spark framework, and then use the BigDL library to run inference applications on your data.

BigDL can efficiently scale out to perform data analytics at “big data scale” by using Spark as well as efficient implementations of synchronous stochastic gradient descent (SGD) and all-reduce communications in Spark.
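The idea behind synchronous SGD with an all-reduce step can be illustrated with a toy simulation in plain Python. This is a sketch under simplifying assumptions and not BigDL's implementation: the workers are simulated in a single process, each worker uses its whole shard as its mini-batch, the model is a single weight w in y = w * x, and all names and data are hypothetical.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one worker's shard."""
    n = len(shard)
    return sum(2.0 * (w * x - y) * x for x, y in shard) / n

def all_reduce_mean(values):
    """Stand-in for all-reduce: every worker ends up with the mean of all values."""
    return sum(values) / len(values)

def synchronous_sgd(shards, w=0.0, lr=0.05, steps=200):
    for _ in range(steps):
        # 1. Each worker computes a gradient on its own shard with the same weights.
        grads = [local_gradient(w, shard) for shard in shards]
        # 2. The gradients are combined across workers.
        g = all_reduce_mean(grads)
        # 3. Every worker applies the identical update, so the weights stay in sync.
        w -= lr * g
    return w

# Hypothetical data generated from y = 3x, split across two simulated workers.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w_final = synchronous_sgd(shards)
```

Because every worker waits for the combined gradient before updating, all replicas hold the same weights after each step, which is what distinguishes the synchronous variant from asynchronous schemes.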

To achieve extremely high performance, BigDL uses Intel Math Kernel Library (Intel MKL) and multithreaded programming in each Spark task. Designed and optimized for Intel Xeon processors, BigDL and Intel MKL provide extremely high performance [18].

9.1.4 Spark

Spark is a lightning-fast distributed data processing framework developed by the AMPLab at the University of California, Berkeley. Spark can run in stand-alone mode, or in cluster mode on YARN on top of Hadoop or under the Apache Mesos cluster manager (Figure 24). Spark can process data from a variety of sources, including HDFS, Apache Cassandra and Apache Hive. Its high performance comes from its ability to process data in memory via memory-persistent RDDs or DataFrames instead of saving intermediate data to hard disks as the traditional Hadoop MapReduce architecture does.

Figure 24: SPARK Architecture

10. The HPC Architecture

The cluster used to process the use case has 6 virtual machines (VMs): 2 masters and 4 slaves. Every distributed service on the cluster uses this configuration: the machines with IPs 10.236.130.131 and 10.236.130.132 are the master nodes, and the machines with IPs 10.236.130.133, 10.236.130.134, 10.236.130.135 and 10.236.130.136 are the slave nodes (Figure 25).


Figure 25: Cluster Architecture

The server specification for the Name Node and Secondary Name Node VMs is:

8 vCores, Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz (on the Secondary Name Node the CPU speed is 2.60 GHz)

Virtual RAM is 62 GB

Total Virtual HDD capacity is 1.3 TB

The data nodes have more storage capacity than the Name Node and Secondary Name Node. The specification below is for the data nodes:

8 vCores, Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz (in the case of the Secondary Name Node the CPU speed is 2.60 GHz)

Virtual RAM is 62 GB

Total Virtual HDD capacity is 9.2 TB

The total persistent storage of the Hadoop cluster is approximately 35.8 terabytes. Because HDFS is configured with a replication factor of 2, every block is stored twice, so the net storage available for our use cases is 17.9 terabytes.
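The relation between raw and net capacity is a single division, sketched below in Python with the figures quoted above. The function name is hypothetical, and the sketch ignores any space reserved for the operating system or non-HDFS use.

```python
def net_storage_tb(raw_tb, replication_factor):
    """Usable capacity when every block is stored replication_factor times."""
    return raw_tb / float(replication_factor)

usable = net_storage_tb(35.8, 2)
```

A higher replication factor improves fault tolerance at the cost of usable space; with a factor of 3 the same cluster would offer roughly a third of its raw capacity.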

10.1 Data Set for the Training Model

The raw data contains 336.7 million records, of which 19.7 million are seizure events. To obtain a balanced dataset, we also selected 19.7 million records from non-seizure events, so that 50% of the sample has some kind of seizure event and 50% has none. The class distribution of the dataset available from the EEG files is shown in Figure 26.
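One common way to obtain such a 50/50 split is random undersampling of the majority class. The sketch below shows the idea in plain Python on a toy list. It is an assumption for illustration: the article does not state how the non-seizure records were chosen, and the function name, the record layout and the seed are hypothetical.

```python
import random

def balance_by_undersampling(records, is_seizure, seed=0):
    """Keep every seizure record and an equally sized random sample of the rest."""
    seizure = [r for r in records if is_seizure(r)]
    non_seizure = [r for r in records if not is_seizure(r)]
    if len(non_seizure) < len(seizure):
        raise ValueError("not enough non-seizure records to match the seizure count")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = rng.sample(non_seizure, len(seizure))
    balanced = seizure + sampled
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy records: (record_id, label) with label 1 = seizure, 0 = non-seizure.
toy = [(i, 1 if i % 10 == 0 else 0) for i in range(100)]
balanced = balance_by_undersampling(toy, is_seizure=lambda r: r[1] == 1)
```

At the scale described here the same selection would be expressed as a distributed query rather than an in-memory list, but the logic is identical.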

(Figure 25 labels: 10.236.130.131, Name Node; 10.236.130.132, Secondary Node; 10.236.130.13[3-6], Data Nodes)


Figure 26: Dataset Class Distribution

We included all the seizure events and added the same number of non-seizure events. The resulting dataset has 39,262,882 records. Each record is a 22 × 22 matrix: 22 cycles of EEG signal across the 22 channels. We chose this square matrix to ease the computation in the neural network.
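The 22 × 22 record layout can be pictured as cutting a multi-channel recording into square windows. The following Python sketch makes two assumptions that the article does not confirm: that the 22 cycles are consecutive time steps, and that windows do not overlap. The function name and the toy recording are hypothetical.

```python
def to_square_windows(samples, size=22):
    """Split a recording into non-overlapping size x size matrices.

    samples: list of time steps, each a list of `size` channel values.
    Trailing time steps that do not fill a whole window are dropped.
    """
    for row in samples:
        if len(row) != size:
            raise ValueError("every time step must have %d channel values" % size)
    n_windows = len(samples) // size
    return [samples[i * size:(i + 1) * size] for i in range(n_windows)]

# Hypothetical recording: 50 time steps of 22 channels, filled with zeros.
recording = [[0.0] * 22 for _ in range(50)]
windows = to_square_windows(recording)
```

Each resulting window has the same shape as the single-channel 22 × 22 input that the convolutional network expects.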

The full dataset, with all sessions from all patients, occupies 168.8 GB. We preprocessed it to reduce its size and balance the sample, so that half of the records are non-seizure events and half are seizure events. The files are stored in HDFS, and two applications transform the data: Hive processes the data, and the result is then loaded through a “HiveContext” inside a Spark session to leverage Spark. Figure 27 shows the data flow as a block diagram.

Figure 27: Data Flow Preparation

10.2 Convolutional Neural Network

Our Convolutional Neural Network (CNN) follows a generic approach. The algorithm we built to create the Spark files for processing in BigDL is shown in Figure 28. As a reference, we used a CNN architecture similar to the one in Schirrmeister et al. [15].



Figure 28: Processing Spark Files in BigDL

The following snippet shows the CNN as defined in the BigDL framework:

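from bigdl.nn.layer import *

def build_model(class_num):
    model = Sequential()
    # Each record is a 22 x 22 matrix, treated as a one-channel image.
    model.add(Reshape([1, 22, 22]))
    # First convolution: 1 input plane, 6 output planes, 5 x 5 kernel.
    model.add(SpatialConvolution(n_input_plane=1, n_output_plane=6,
                                 kernel_w=5, kernel_h=5,
                                 stride_w=1, stride_h=1,
                                 pad_w=0, pad_h=0))
    model.add(ReLU())
    model.add(SpatialMaxPooling(2, 2, 2, 2))
    model.add(ReLU())
    # Second convolution: 6 input planes, 12 output planes, 5 x 5 kernel.
    model.add(SpatialConvolution(6, 12, 5, 5))
    model.add(SpatialMaxPooling(2, 2, 2, 2))
    # Flatten the 12 planes of 2 x 2 values for the fully connected layers.
    model.add(Reshape([12 * 2 * 2]))
    model.add(Linear(12 * 2 * 2, 100))
    model.add(ReLU())
    model.add(Linear(100, class_num))
    model.add(Dropout())
    model.add(LogSoftMax())
    return model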
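The value 12 * 2 * 2 used in the Reshape and Linear layers follows from the layer sizes. The sketch below traces the spatial size through the network in plain Python, assuming no padding and that the pooling layers round down (floor mode); the helper name out_size is hypothetical.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Output width/height of a convolution or floor-mode pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

size = 22                                   # Reshape([1, 22, 22])
size = out_size(size, kernel=5)             # first 5x5 convolution
size = out_size(size, kernel=2, stride=2)   # first 2x2 max pooling
size = out_size(size, kernel=5)             # second 5x5 convolution
size = out_size(size, kernel=2, stride=2)   # second 2x2 max pooling
flattened = 12 * size * size                # 12 output planes, flattened
```

The spatial size goes 22, 18, 9, 5, 2, so the flattened vector has 12 * 2 * 2 = 48 values, matching the input size of the first Linear layer.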

Figure 29 shows the snapshot of the submitted job and its status of completion. The job scheduling is managed by YARN (Yet Another Resource Negotiator).


Figure 29: Snapshot of the submitted job

10.3 Results of the Classification Model

The preprocessed dataset has 39.3 million records. In the BigDL environment, the process took 365.06 seconds to run on the cluster. We reached a Top 1 accuracy of 50% and a Top 5 accuracy of 95.8%. We split the dataset into a training set (75%) and a test set (25%). We trained for 5 epochs with mini-batches of 100 records, which means that the data passed through the neural network 5 times, with the weights of the network updated after every mini-batch. With one hundred records per mini-batch, the total number of mini-batches is 393,000. The outcome on the test dataset is only good when we consider the top 5 class probabilities within which a seizure might occur, as shown in the second row of Figure 30.

Type of Metric    Correct    Count      Accuracy
Top1 Accuracy     223,090    446,096    50.0%
Top5 Accuracy     427,505    446,096    95.8%

Figure 30: Model Accuracy
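The mini-batch count and the accuracy percentages reported above can be reproduced with a few lines of arithmetic. The Python sketch below uses only the figures quoted in this section; the function names are hypothetical.

```python
import math

def batches_per_pass(n_records, batch_size):
    """Number of mini-batches needed to see n_records once."""
    return int(math.ceil(n_records / float(batch_size)))

def accuracy(correct, total):
    """Share of correct predictions."""
    return correct / float(total)

n_batches = batches_per_pass(39300000, 100)
top1 = accuracy(223090, 446096)
top5 = accuracy(427505, 446096)
```

Rounded to one decimal place, top1 and top5 give the 50.0% and 95.8% shown in Figure 30.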

Top N accuracy is a widely used metric: the share of predictions for which the correct class is among the N classes with the highest predicted probability. The softmax layer at the end of the neural network produces, for each input, a probability for every class. In our example, each event can be classified into any of 9 classes, and a probability is assigned to each seizure class.

Top 1 Accuracy: This means that the model has 50% accuracy when it has to select only 1 seizure class with highest probability out of 9 possible seizure classes. Only the seizure class with the highest probability will be designated as the outcome of the prediction.

Top 5 Accuracy: This means that the model has 95.8% accuracy when the correct seizure class only needs to be among the 5 classes with the highest probabilities, out of all existing classes. Figure 31 shows an example case with the predicted probabilities of the top 5 seizure classes for a single event. Only the five classes with the highest probabilities are considered for this accuracy calculation; the remaining four class probabilities are ignored, as seen in the illustrative example below.


Figure 31: Top 5 Class Probabilities Example
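The Top N definition can also be written out as a short computation. The following Python sketch is a generic illustration, not BigDL's implementation; the function name and the probability values are hypothetical, with class indices 0 to 8 standing in for the 9 classes.

```python
def top_k_accuracy(probabilities, true_labels, k):
    """Fraction of events whose true class is among the k most probable classes."""
    hits = 0
    for probs, truth in zip(probabilities, true_labels):
        # Class indices sorted from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / float(len(true_labels))

# Hypothetical softmax outputs for three events over 9 classes (indices 0 to 8).
probs = [
    [0.30, 0.25, 0.15, 0.10, 0.08, 0.05, 0.04, 0.02, 0.01],
    [0.05, 0.40, 0.20, 0.10, 0.10, 0.05, 0.05, 0.03, 0.02],
    [0.02, 0.03, 0.05, 0.10, 0.10, 0.20, 0.30, 0.15, 0.05],
]
truth = [1, 1, 0]
top1 = top_k_accuracy(probs, truth, k=1)
top5 = top_k_accuracy(probs, truth, k=5)
```

In this toy case only the second event is right at Top 1, while the first event also counts as a hit at Top 5 because its true class has the second-highest probability.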

Figure 32 shows a snapshot of the output in our HPC environment using BigDL. Overall, Top 5 accuracy seems to be a good way to measure accuracy for this predictive model: compared with Top 1, it lowers the misclassification rate by more than 45 percentage points (from 50.0% to 4.2%), which increases the confidence in seizure prediction.

Figure 32: Snapshot of Classification Results

BigDL offers only a few ways to measure model performance. It covers only Top 1 and Top 5 class accuracy, which is a limitation of the tool. It does not provide other methods such as a confusion matrix of the classes, analysis of neuron activation, and other important features used in deep learning.
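A confusion matrix can be computed outside the framework once the true and predicted labels have been collected. The sketch below is a minimal plain-Python version, assuming integer class indices; the function name and the toy labels are hypothetical and do not come from the experiment.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """matrix[t][p] counts events of true class t that were predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

# Hypothetical labels for three classes, for illustration only.
truth = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, predicted, n_classes=3)
```

The diagonal holds the correct predictions per class, so the off-diagonal cells show which classes are confused with each other, information that a single Top 1 or Top 5 figure cannot convey.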

In addition, we used TensorBoard. This open source tool helps the data science team understand how the model behaves during training and test evaluation.

Figure 33: Snapshot of Tensorboard

The plots below show the behavior of the loss function, bias gradients and weight gradients of the neural network. The first plot relates to the loss function and shows how much the model improves during the training phase. The loss converges at about 1.5 million records; beyond that number of training records, model accuracy does not increase much.



Figure 34: Loss and Throughput Functions

Figure 35 shows how the Top 1 and Top 5 accuracy increase or decrease during the training phase.

Figure 35: Accuracy (Top 1 and Top 5)

The following two groups of eight charts show the evolution and histograms of the weights of the Linear and Convolution layers of the Convolutional Neural Network. In particular, they show the evolution of the weights, the biases, and their respective gradients as the number of training cases increases.

Linear Layer number 3d48176e

Figure 36: Linear Layer Example


Spatial Convolution number 2e29d3af

Figure 37: Spatial Convolutions Example

11. Conclusion and Future Work

Based on these results, the model can be further optimized over time to learn the specific type of seizure class. Nonetheless, with very limited fine-tuning and reinforced learning, this algorithm could already be used in research to predict whether a patient has a seizure event or a non-seizure event. In other words, we can tweak the architecture of this neural network to use all the features to predict a binary outcome, which follows a Bernoulli distribution, instead of a multinomial distribution. In the future, we can leverage deep learning frameworks other than BigDL to improve the performance of the code. We can also add more nodes to the cluster to enhance the compute capacity and scalability of the model. In addition, this framework could be reused to predict cases related to epilepsy, provided that we can improve the accuracy of the model.

12. References

1. DADGAR-KIANI, E. et al - Applying Machine Learning for Human Seizure Prediction – Stanford University (2016)

2. DAUBE, J. R. and RUBIN, D.I - Clinical Neurophysiology, 3rd Edition - Oxford University Press (2009)

3. Epilepsy Foundation – Types of Seizures: https://www.epilepsy.com/learn/types-seizures accessed online on January 22, 2018

4. Epilepsy Foundation – What is Epilepsy: https://www.epilepsy.com/learn/about-epilepsy-basics/what-epilepsy accessed online on January 23, 2018

5. Epilepsy Ontario: Seizures Classification: http://epilepsyontario.org/about-epilepsy/types-of-seizures/ accessed online on January 23, 2018

6. Epilepsy Society – Seizures: https://www.epilepsysociety.org.uk/seizure-types#.WmfRMainE2w

7. KAMEL, N. & MALIK, A. S. – EEG / ERP Analysis – Methods and Applications – CRC Press (2015)

8. KEMP, B et al - A simple format for exchange of digitized polygraphic recordings - Electroencephalography and clinical Neurophysiology, 82, 391-393 - Elsevier Scientific Publishers Ireland (1992)

10. LI, J. et al - Feature Learning from Incomplete EEG with Denoising Autoencoder - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China (2014)



11. KOEPSELL, K. et al – Exploring the Function of Neural Oscillations in Early Sensory Systems – Frontiers in Neuroscience (2010)

12. MURUGESAN, M. and SUKANESH, R. - Automated Detection of Brain Tumor in EEG Signals Using Artificial Neural Networks - International Conference on Advances in Computing, Control, and Telecommunication Technologies (2009)

13. NIGAM, V.P. & GRAUPE, D. – A Neural-Network-Based Detection of Epilepsy - Neurological Research, A Journal of Progress in Neurosurgery, Neurology and Neurosciences – Vol 26 (2004)

14. PEREIRA, S. et al – Brain Tumor Segmentation Using Convolutional Neural Networks in MRI Images – IEEE Transaction on Medical Imaging, Vol. 35, No. 5 (2016)

15. SCHIRRMEISTER, R. T. et al - Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG – University of Freiburg (2017)

16. STOBER, S. et al - Deep Feature Learning for EEG Recordings - The Brain and Mind Institute - University of Western Ontario, Canada (2016).

17. WALKER, I. - Deep Convolutional Neural Networks for Brain Computer Interface using Motor Imagery – Ms.C. Dissertation – Imperial College, London (2015).

18. Big DL : https://software.intel.com/en-us/articles/bigdl-distributed-deep-learning-on-apache-spark

19. Temple University Corpus Info: https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml

20. CentOS : https://www.centos.org/about/

21. Apache Hadoop : http://hadoop.apache.org/index.html

Dell EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying and distribution of any Dell EMC software described in this publication requires an applicable software license.

Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.
