Identification of ischemic heart disease via machine learning analysis on magnetocardiograms



<ul><li><p>Computers in Biology and Medicine 38 (2008)</p><p>Identification of ischemic heart disease via machine learning analysis on magnetocardiograms</p><p>Tanawut Tantimongcolwat<sup>a</sup>, Thanakorn Naenna<sup>b</sup>, Chartchalerm Isarankura-Na-Ayudhya<sup>a</sup>, Mark J. Embrechts<sup>c</sup>, Virapong Prachayasittikul<sup>a,*</sup></p><p><sup>a</sup>Department of Clinical Microbiology, Faculty of Medical Technology, Mahidol University, 2 Prannok Road, Bangkok-noi, Bangkok 10700, Thailand</p><p><sup>b</sup>Department of Industrial Engineering, Faculty of Engineering, Mahidol University, Nakhonpathom 73170, Thailand</p><p><sup>c</sup>Department of Decision Sciences and Engineering Systems, Rensselaer Polytechnic Institute, Troy, NY, USA</p><p>Received 10 May 2006; accepted 24 April 2008</p><p>Abstract</p><p>Ischemic heart disease (IHD) is the leading cause of death worldwide. Early detection of IHD may effectively prevent severe outcomes and reduce the mortality rate. Recently, magnetocardiography (MCG) has been developed for the detection of heart malfunction. Although MCG is capable of monitoring the abnormal patterns of the magnetic field emitted by a physiologically defective heart, data interpretation is time-consuming and requires highly trained professionals. Hence, we propose an automatic method for the interpretation of IHD patterns in MCG recordings using machine learning approaches. Two types of machine learning techniques, namely the back-propagation neural network (BNN) and the direct kernel self-organizing map (DK-SOM), were applied to explore the IHD patterns recorded by MCG. Data sets were obtained by sequential measurement of the magnetic field emitted by the cardiac muscle of 125 individuals. The data were divided into a training set of 74 cases and a testing set of 51 cases. Predictive performance was obtained for both machine learning approaches.
The BNN exhibited a sensitivity of 89.7%, a specificity of 54.5% and an accuracy of 74.5%, while the DK-SOM provided relatively higher prediction performance, with a sensitivity, specificity and accuracy of 86.2%, 72.7% and 80.4%, respectively. This finding suggests a high potential for applying machine learning approaches to the high-throughput detection of IHD from MCG data. © 2008 Elsevier Ltd. All rights reserved.</p><p>Keywords: Ischemia; Magnetocardiography; Machine learning; Back-propagation neural network; Self-organizing map</p><p>1. Introduction</p><p>Heart diseases are the leading cause of death worldwide. Heart malfunction usually arises from ischemia due to inadequate blood supply to the cardiac muscles. Long-term deficiency of oxygen and nutrients in the heart musculature leads to damage of cardiac tissue, which ultimately culminates in heart attack and death. Early diagnosis of heart ischemia is critical to help reduce the mortality rate.</p><p>Electrocardiography (ECG) is traditionally used to monitor ischemic heart disease (IHD). It demonstrates the electrophysiological activity of the heart via electrodes attached to the torso surface. However, in some circumstances, the ECG may be normal in patients at rest during angina. Various advanced methods have been developed to improve the diagnostic performance for cardiac abnormalities, such as myocardial perfusion scintigraphy and cardiac catheterization. Nonetheless, these techniques are invasive, time-consuming and require technical experts for operation. Therefore, a non-invasive and contact-free technique, namely magnetocardiography (MCG), has been developed.</p><p>* Corresponding author. Tel.: +662 418 0227; fax: +662 412 4110. E-mail address: (V. Prachayasittikul).</p><p>0010-4825/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.compbiomed.2008.04.009</p><p>MCG is a sensitive technique that monitors the magnetic field emitted by cardiac tissues.
It has been considered a promising alternative to traditional ECG, since a superconducting quantum interference device (SQUID) is used as a contact-free probe [1–4]. This provides high reproducibility, less signal fluctuation and better spatial resolution than ECG [5–7]. MCG has proven to be an efficient tool for many clinical applications. It is useful for monitoring fetal heart activity and shows superior efficiency to ECG in determining myocardial infarction</p></li><li><p>[5,8–11]. Using MCG, coronary artery disease can potentially be detected in patients presenting with acute chest pain [12]. Three-dimensional localization of myocardial lesions can also be visualized by magnetic mapping of MCG [13–15]. Notwithstanding the aforementioned advantages, interpretation of MCG remains a challenge, since it requires highly experienced personnel and is time-consuming.</p><p>Machine learning is a promising technique that explores previously unknown regularities and trends in diverse data sets by learning from sets of data and extrapolating the new-found knowledge to testing data [16–18]. This technique has widely been adopted for biomedical applications. Lymphocytic leukemia has successfully been identified using automatic morphological analysis via machine learning [19,20]. Normal, sickle or otherwise abnormal erythrocytes can be classified by the combination of image feature analysis and machine learning procedures [21]. Furthermore, successful recognition of splice junction sites in human DNA sequences has been achieved via self-organizing map (SOM), BNN and support vector machine (SVM) learning methods [18].</p><p>Recently, machine learning has been applied as a tool for the detection of cardiac diseases. Classification of arrhythmia from ECG data sets has successfully been performed by the OneR, J48 and Naïve Bayes learning algorithms [22,23].
Analysis of myocardial infarction can be achieved by SOM, BNN and robust Bayesian classifiers [24,25]. As mentioned before, MCG provides better discriminating power between healthy and diseased cases. Therefore, preliminary attempts to apply machine learning with various algorithms to MCG data have been made by our group [26].</p><p>Herein, two machine learning methods, namely the back-propagation neural network (BNN) and the direct kernel self-organizing map (DK-SOM), are demonstrated as representatives of supervised and unsupervised learning techniques. Supervised learning involves learning by example patterns, while the unsupervised approach learns by discovering and adapting to structured features in the input patterns [27]. Therefore, we aim to demonstrate improved usage of the DK-SOM along with application of the BNN as tools for concurrent identification of myocardial ischemia from MCG.</p><p>2. Materials and methods</p><p>2.1. Subjects and data acquisition</p><p>MCG patterns were taken from 125 individuals. Among these, 55 patterns were from patients who had been clinically evaluated as having myocardial ischemia. The remaining 70 individuals were healthy and showed no evidence of abnormal cardiac symptoms. The data from both patient and healthy subjects were randomly divided into a training set (74 cases) and a testing set (51 cases).</p><p>MCG data were acquired from 36 locations above the torso by making four sequential measurements in mutually adjacent positions. At each position, the cardiac magnetic field was recorded by nine sensors for 90 s using a sampling rate of 1000 Hz, resulting in 36 individual time series. For the diagnosis of ischemia, the MCG signal was recorded under the default filter setting of 0.05–100 Hz. Post-processing was performed by applying a digital low-pass filter at 20 Hz [28,29]. For automatic classification, we used data from a time window between the J-point and the T-peak (JT interval) of the cardiac cycle [30].
The JT interval was subdivided into 32 evenly spaced points, resulting in a set of 1152 descriptors from the 36 MCG signals for each individual case. The descriptors from all 125 cases (training and testing) were used as input for IHD prediction (Fig. 1).</p><p>2.2. Overview of machine learning approaches</p><p>The BNN, also called the multilayer perceptron, is a supervised learning method in which example input patterns are fed together with target output patterns to train the learning network. A general BNN hierarchical network is illustrated in Fig. 2. It comprises one input layer, one or more hidden layers, and one output layer. Each layer contains processing units, namely neurons or nodes [31], where the data computations and transformations take place. Connections among the nodes of the layers are made for further processing and transfer of the computed data. The strength of the signal flow via each connection is expressed by a numerical value known as a learning weight, wherein the path of knowledge flow is discovered. The learning process starts with the projection of each training pattern onto the neurons of the input layer. The data are then transferred to the neurons of the hidden layer under the control of the initial learning weights, which are randomly generated and seeded to the connections. Each training pattern is propagated in a forward manner (layer by layer) until an output pattern is computed. Afterwards, the computed output is compared to the desired, or target, output and an error value is determined. As the name back-propagation neural network implies, the errors are fed in a backward manner (layer by layer) to adjust the synaptic weights of the learning network. The process is repeated several times for each pattern in the training set until the total output error converges to a minimum or until a preset limit on the training iterations is reached [27].</p><p>The DK-SOM is a traditional SOM applied to kernel-transformed data.
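The forward and backward passes of such a network can be sketched in a few lines of numpy; the layer sizes, learning rate, and toy data below are illustrative assumptions only, not the WEKA configuration used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy stand-in for the MCG descriptors: 10 cases, 4 descriptors, binary label.
X = rng.random((10, 4))
t = (X.sum(axis=1) > 2.0).astype(float).reshape(-1, 1)

# One hidden layer; sizes and learning rate are illustrative, not optimized.
W1 = rng.normal(scale=0.5, size=(4, 6))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(6, 1))   # hidden -> output weights
lr = 0.5

for epoch in range(2000):
    # Forward pass, layer by layer.
    h = sigmoid(X @ W1)
    y = sigmoid(h @ W2)
    # Output error, propagated backward to adjust the weights.
    err = y - t
    delta_out = err * y * (1.0 - y)
    delta_hid = (delta_out @ W2.T) * h * (1.0 - h)
    W2 -= lr * h.T @ delta_out
    W1 -= lr * X.T @ delta_hid

rmse = float(np.sqrt(np.mean((y - t) ** 2)))
```

After training, `rmse` plays the same role as the RMSE criterion used in the parameter optimization described later.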
Prior to processing through the SOM, the input data are subjected to kernel transformation. Kernel transformation belongs to a class of pattern recognition algorithms that search for similarity among the features of the descriptors. In this study, the nonlinear characteristics of the MCG are captured and held within the kernel before being submitted to the SOM. The SOM is an unsupervised machine learning method developed by Kohonen [32–34]. It is often applied for novelty detection and automated clustering of unknown data patterns based on an iterative competitive learning method. Units within a competitive model may be organized into one or more functional layers, in which the units of a single layer are grouped into disjoint clusters. During the learning process, each unit within a cluster inhibits all the others in the same cluster to attain the winning position. The unit receiving the largest input achieves its maximum output while all the others are driven to zero. Learning is accomplished only on the winner and the neighboring neurons, whose weight vectors are adjusted toward the input pattern vector. No learning takes place among the losers [27].</p></li><li><p>Fig. 1. MCG waveform showing the JT interval, which was subdivided into 32 data points to serve as input for IHD prediction.</p><p>Fig. 2. Hierarchical network of the BNN [19].</p><p>After training is complete, the result of the competitive learning is presented to the output layer, which is normally arranged as either a hexagonal or a rectangular lattice. Usually, the hexagonal lattice is preferred because it offers better visualization.
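The kernel transformation step described above (mapping each pattern to its similarities with the other patterns before SOM training) can be sketched as follows; the choice of a Gaussian (RBF) kernel and its width are assumptions for illustration, since the paper does not specify the kernel here.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Map each pattern to its similarity with every other pattern.

    Entry K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)), so row i is the
    kernel-transformed representation of pattern i that would be fed to the SOM.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

# Nearby patterns receive high similarity; distant ones receive ~0.
X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0]])
K = rbf_kernel_matrix(X)
```

Each row of `K` then replaces the corresponding raw descriptor vector as SOM input.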
The input data that hold the same properties are clustered together on nearby neurons, among which the mutual distances are low. Frequently, the distances are represented by a color spectrum on the U-Matrix, a method for visualizing the cluster structure of the SOM. The map is therefore shaded to indicate the clustering tendency. Areas having low distance values in the U-Matrix form clusters, while those with high distance values indicate cluster borders [18].</p><p>2.3. BNN calculations</p><p>Supervised learning neural network calculations were performed with the back-propagation implementation of WEKA 3.4.3 [4]. The data sets were initially normalized to the range of 0 to 1 according to the following equation [35]:</p><p>x<sub>norm</sub> = (x<sub>i</sub> − x<sub>min</sub>)/(x<sub>max</sub> − x<sub>min</sub>) (1)</p><p>where x<sub>norm</sub>, x<sub>i</sub>, x<sub>min</sub> and x<sub>max</sub> refer to the normalized value, the value of each instance, the minimum value and the maximum value of the data set, respectively.</p><p>To achieve the highest prediction performance and reduce the computational time, redundant and multi-collinear descriptors were discarded with UFS version 1.8. UFS is a computer program implementing the variable reduction algorithm known as Unsupervised Forward Selection (UFS) [36]. Briefly, UFS starts with the selection of the two descriptors that exhibit the highest degree of pairwise correlation. Descriptors with a squared multiple correlation coefficient (R<sup>2</sup><sub>max</sub>) less than the predefined value are selected, while the rest are removed. As a result, a reduced set of 278 descriptors was selected from the total of 1152 descriptors. The optimal number of selected descriptors was then determined by empirical trial-and-error learning on each data set. The set of descriptors exhibiting the lowest predictive error was chosen as optimal [35].
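Eq. (1) is standard min-max normalization; a minimal sketch applied to one descriptor column:

```python
import numpy as np

def min_max_normalize(x):
    """Scale a descriptor column to the range [0, 1] as in Eq. (1)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

# 2 -> 0.0 (the minimum), 4 -> 0.5, 6 -> 1.0 (the maximum).
scaled = min_max_normalize([2.0, 4.0, 6.0])
```

In practice each of the descriptor columns is scaled independently before training.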
As a rule of thumb, the optimal conditions for the BNN calculation were further determined by trial-and-error adjustment of various parameters, including the number of nodes in the hidden layer, the learning epoch, and the learning rate and momentum constants. Each optimization calculation was performed for 10 runs, adjusting the initial learning weights (random seed) from 0 to 9. The average root mean square error (RMSE) of the 10 runs was subsequently used as the measure of predictive error. The optimal number of nodes in the hidden layer was determined by varying the number of hidden nodes from 1 to 30 and observing the number that gave the lowest RMSE. Next, the best training time was determined by varying the training time, or learning epoch, at the optimal number of hidden nodes. As before, the training time that gave the lowest RMSE was chosen. Finally, the learning rate and momentum were adjusted simultaneously, and the pair giving the lowest predictive error was chosen as optimal.</p><p>2.4. DK-SOM calculations</p><p>The DK-SOM calculations were carried out with the Analyze/StripMiner software, which was developed in-house [37]. In order to improve the prediction performance, kernel transformation was applied to the data set during the preprocessing step. This procedure was referred to</p></li><li><p>Fig. 3. Schematic illustration of the DK-SOM operation.</p><p>as a direct kernel method. The nonlinear property of the input data was captured and hidden in the kernel. Next, the traditional SOM calculation was applied to the kernel-transformed data. The training procedure was divided into two phases, an ordering phase and a fine-tuning phase.
The learning weights were obtained by competitive learning, while the winning neuron and its neighbors were updated according to the following equation:</p><p>w<sub>m</sub><sup>new</sup> = (1 − α) w<sub>m</sub><sup>old</sup> + α x<sub>m</sub> (2)</p><p>where x<sub>m</sub> is a pattern...</p></li></ul>
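The update rule in Eq. (2) can be sketched as follows; the symbol α is read here as the SOM learning rate, and the one-dimensional map, neighborhood radius, and α value are illustrative assumptions, not the Analyze/StripMiner implementation.

```python
import numpy as np

def som_update(weights, x, winner, alpha=0.3, radius=1):
    """Move the winner and its grid neighbors toward pattern x.

    Applies Eq. (2), w_new = (1 - alpha) * w_old + alpha * x, only to
    neurons within `radius` of the winner on a 1-D map; losers are untouched.
    """
    for m in range(len(weights)):
        if abs(m - winner) <= radius:
            weights[m] = (1.0 - alpha) * weights[m] + alpha * x
    return weights

# Four map neurons with 2-D weight vectors; pattern x pulls the winner
# (index 1) and its neighbors toward it, leaving distant neurons unchanged.
w = np.array([[0.0, 0.0], [0.2, 0.2], [1.0, 1.0], [5.0, 5.0]])
x = np.array([0.2, 0.3])
winner = int(np.argmin(np.linalg.norm(w - x, axis=1)))
w = som_update(w, x, winner)
```

Iterating this step over many input patterns, while shrinking `alpha` and `radius`, corresponds to the ordering and fine-tuning phases described above.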