JACC: CARDIOVASCULAR IMAGING, VOL. -, NO. -, 2017
© 2017 BY THE AMERICAN COLLEGE OF CARDIOLOGY FOUNDATION
PUBLISHED BY ELSEVIER
ISSN 1936-878X/$36.00
https://doi.org/10.1016/j.jcmg.2017.07.024
Prognostic Value of Combined Clinical and Myocardial Perfusion Imaging Data Using Machine Learning
Julian Betancur, PHD,a Yuka Otaki, MD,a Manish Motwani, MB, CHB, PHD,a Mathews B. Fish, MD,b Mark Lemley, CNMT,b Damini Dey, PHD,a Heidi Gransar, MS,a Balaji Tamarappoo, MD, PHD,a Guido Germano, PHD,a Tali Sharir, MD,c Daniel S. Berman, MD,a Piotr J. Slomka, PHDa
ABSTRACT
OBJECTIVES This study evaluated the added predictive value of combining clinical information and myocardial
perfusion single-photon emission computed tomography (SPECT) imaging (MPI) data using machine learning (ML) to
predict major adverse cardiac events (MACE).
BACKGROUND Traditionally, prognostication by MPI has relied on visual or quantitative analysis of images
without objective consideration of the clinical data. ML permits a large number of variables to be considered in
combination and at a level of complexity beyond the human clinical reader.
METHODS A total of 2,619 consecutive patients (48% men; 62 ± 13 years of age) who underwent exercise (38%) or
pharmacological stress (62%) with high-speed SPECT MPI were monitored for MACE. Twenty-eight clinical variables, 17
stress test variables, and 25 imaging variables (including total perfusion deficit [TPD]) were recorded. Areas under
the receiver-operating characteristic curve (AUC) for MACE prediction were compared among: 1) ML with all available
data (ML-combined); 2) ML with only imaging data (ML-imaging); 3) 5-point scale visual diagnosis (physician [MD]
diagnosis); and 4) automated quantitative imaging analysis (stress TPD and ischemic TPD). ML involved automated
variable selection by information gain ranking, model building with a boosted ensemble algorithm, and 10-fold stratified
cross validation.
RESULTS During follow-up (3.2 ± 0.6 years), 239 patients (9.1%) had MACE. MACE prediction was significantly higher for
ML-combined than ML-imaging (AUC: 0.81 vs. 0.78; p < 0.01). ML-combined also had higher predictive accuracy compared with
MD diagnosis, automated stress TPD, and automated ischemic TPD (AUC: 0.81 vs. 0.65 vs. 0.73 vs. 0.71, respectively; p < 0.01
for all). Risk reclassification for ML-combined compared with visual MD diagnosis was 26% (p < 0.001).
CONCLUSIONS ML combined with both clinical and imaging data variables was found to have high predictive
accuracy for 3-year risk of MACE and was superior to existing visual or automated perfusion assessments. ML could
allow integration of clinical and imaging data for personalized MACE risk computations in patients undergoing SPECT
MPI. (J Am Coll Cardiol Img 2017;-:-–-) © 2017 by the American College of Cardiology Foundation.
From the aDepartments of Imaging, Medicine, and Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, California; bOregon Heart and Vascular Institute, Sacred Heart Medical Center, Springfield, Oregon; and the cDepartment of Nuclear Cardiology, Assuta Medical Centers, Tel Aviv, Israel. This research was supported in part by grant R01HL089765 from the National Heart, Lung, and Blood Institute/National Institutes of Health (PI: Piotr Slomka). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Drs. Betancur and Otaki contributed equally to this work. Drs. Berman, Germano, and Slomka have received royalties from Cedars-Sinai Medical Center. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.
Manuscript received March 27, 2017; revised manuscript received July 5, 2017, accepted July 5, 2017.
ABBREVIATIONS AND ACRONYMS
CAD = coronary artery disease
CT = computed tomography
MACE = major adverse cardiac events
MD = physician
ML = machine learning
MPI = myocardial perfusion imaging
SPECT = single-photon emission computed tomography
TID = transient ischemic dilation
TPD = total perfusion deficit
Traditionally, the prognostic value of myocardial perfusion single-photon emission computed tomography (SPECT) imaging (MPI) has been studied with semiquantitative visual and quantitative analysis of image data (1–3). A number of previous studies have shown that clinical demographics, functional parameters, and hemodynamic and stress results all affect the evaluation of MPI (4–7). This integration of clinical information and imaging data into a final impression is currently performed subjectively by physicians when they assess the MPI test, often in a nonstandardized manner.
Machine learning (ML) is a field of computer science that uses computer algorithms to identify patterns in large multivariable datasets and can be used to predict outcomes. In recent years, ML has been used for prediction and decision-making in a multitude of disciplines, including internet search engines, customized advertising, natural language processing, finance trending, and robotics (8–10). For MPI, a large number of parameters, including clinical variables, stress test results, and imaging data variables, could be considered by ML for outcome prediction. We evaluated the benefits of combining all of these variables using an ML algorithm to predict major adverse cardiac events (MACE) (8). ML prediction using combined data was also compared with physician (MD) diagnosis (based on a visual read with awareness of clinical data) and with automated perfusion quantification indexes (stress and ischemic total perfusion deficit [TPD]).
METHODS
STUDY POPULATION. A total of 2,689 consecutive patients who were referred for clinically indicated exercise or pharmacological stress MPI at Sacred Heart Medical Center between January 2010 and December 2011 were included. The study was approved by the institutional review board, including a waiver for informed consent. After excluding 70 patients with early revascularization within 90 days, 2,619 patients remained for further analysis.
CLINICAL DATA. Clinical data were derived from patients’ medical records and included age, sex, and risk factors. Recorded risk factors were hypertension, diabetes mellitus, dyslipidemia, smoking (defined as current smoking or cessation within 3 months of testing), and family history of premature clinical coronary artery disease (CAD). The presence and type of chest pain and shortness of breath were assessed by the stress testing MD.
MPI AND STRESS PROTOCOLS. Resting and/or stress 1-day technetium-99m sestamibi imaging was performed using a high-efficiency, solid-state SPECT scanner (D-SPECT, Spectrum-Dynamics, Haifa, Israel) (11). Weight-adjusted doses of 353 ± 151 MBq (9.5 ± 4.1 mCi) for rest and 1,252 ± 196 MBq (34 ± 5.3 mCi) for stress (recommended by vendor) were used (12), equivalent to a total average effective dose of 10.7 mSv based on the latest International Commission on Radiological Protection 103 estimates (13). Patients underwent symptom-limited Bruce protocol exercise testing (38%) or pharmacological stress (62%; regadenoson 0.4 mg) with injection at peak stress. Resting image acquisition was performed supine with 6- to 10-min acquisition time, based on patient body mass index. Upright and supine stress imaging (4 to 6 min) began 15 to 30 min after stress.
Transaxial images were generated from list mode data by maximum likelihood expectation maximization reconstruction (11). No attenuation or scatter correction was applied. Images were automatically re-oriented into short-axis, and vertical and horizontal long-axis slices with Quantitative Perfusion SPECT (QPS)/Quantitative Gated SPECT (QGS) software (Cedars-Sinai Medical Center, Los Angeles, California).
VISUAL PERFUSION ANALYSIS. The visual analysis was done by multiple MDs who were aware of patient clinical information and quantitative assessment at the time of the study. Reader scan interpretation (MD diagnosis) was scored as 0 = normal, 1 = equivocal, 2 = probably abnormal, 3 = abnormal, or 4 = definitely abnormal. A 3-step scale probability of CAD was also reported (0 = low, 1 = intermediate, 2 = high).
AUTOMATED QUANTIFICATION. All image datasets were de-identified and transferred to Cedars-Sinai Medical Center, and quality control was checked by a single experienced core laboratory technologist without knowledge of clinical data. Automatically generated myocardial contours by QPS/QGS software were evaluated, and when necessary, contours were adjusted to correspond to the myocardium. Upright and supine images were quantified as previously described (14). We used automatic TPD, a quantitative perfusion variable that reflects a combination of defect extent and severity, and produces stress, rest, and ischemic (stress − rest) TPD values. Ejection fraction, and systolic and diastolic volumes at stress and rest were quantified separately for each acquisition using standard QGS software with 8 frames per cardiac cycle. Transient ischemic dilation (TID) was computed as previously described (15). Counts in the left ventricle were obtained by planar projections of the left ventricular region defined during the first step of data reconstruction (16).
OUTCOME AND FOLLOW-UP DATA COLLECTION. The endpoint was MACE, which consisted of all-cause mortality, nonfatal myocardial infarction, unstable angina, or late coronary revascularization (percutaneous coronary intervention or coronary artery bypass grafting). All-cause mortality was determined from the Social Security Death Index and combined with MACE obtained from the hospital electronic medical records, including all clinics, as well as cardiology group and hospital visits. Nonfatal myocardial infarction was defined based on the criteria of hospital admission for chest pain, elevated cardiac enzyme levels, and typical changes on the electrocardiogram (17). The first event in each patient was used as the outcome. Patients with early revascularization ≤90 days after MPI were excluded.
FIGURE 1 Machine Learning Pathway
[Flow diagram. Data: 2,619 cases with imaging, stress test, and clinical data. Stratified 10-fold cross validation (90% for training, 10% holdout for testing), repeated for each of the 10 folds: variable selection by information gain ratio ranking, then model building with LogitBoost. Derive MACE probability scores for the entire population from the 10 models. Estimate overall prediction by combining all probability scores.]
The overall population is divided into 10 equally sized groups (1, 2, ..., 10) with approximately the same incidence of major adverse cardiac events (MACE) (stratified). Of the 10 groups, 1 (10%) is retained as the test set (holdout set), and the others (90%) are used as the training set. To estimate the machine learning (ML) performance for all the data, the cross-validation procedure loops 10 times over these groups, each time performing variable selection and model building with a different training set, and then testing this model on the unseen test set. Therefore, each data point is used once for testing and 9 times for training, and the result is 10 experimental LogitBoost models trained on 90% fractions. Once finished, the estimates of MACE probability for each of the 10 holdout sets derived by the corresponding 10 models are concatenated to provide an overall expected estimate of ML performance with unseen (holdout) data.
MACHINE LEARNING. Figure 1 illustrates the ML pathway, which involved automated variable selection by information gain ratio ranking and model building with a boosted ensemble algorithm, both embedded in a stratified 10-fold cross validation procedure, as reported in our previous work (8). ML techniques were implemented in the open-source Waikato Environment for Knowledge Analysis (WEKA) platform 3.8.0 (University of Waikato, Hamilton, New Zealand) (18).
VARIABLE SELECTION. Twenty-five imaging data variables, 17 stress test variables, and 28 clinical variables were available for variable selection by the information gain ratio (18). Information gain ratio offers a measure of the effectiveness of a variable in classifying the training data. Only variables that resulted in an information gain ratio >0 were subsequently used in model building (Figure 2B).
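As an illustration of the ranking criterion, the sketch below computes an information gain ratio for a discrete variable against a binary outcome. It is a minimal Python example on made-up toy data, not the WEKA implementation used in the study; the function names `entropy` and `gain_ratio` are hypothetical, and continuous variables are assumed to have been discretized beforehand.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, outcomes):
    """Information gain ratio of a discrete variable for a discrete outcome.

    gain  = H(outcome) - sum over v of p(v) * H(outcome | variable = v)
    ratio = gain / H(variable), taken as 0 when the variable is constant
    """
    n = len(values)
    groups = {}
    for v, y in zip(values, outcomes):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(outcomes) - conditional
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): 8 patients, outcome 1 = event.
outcomes = [1, 1, 0, 0, 1, 1, 0, 0]
informative = [1, 1, 0, 0, 1, 1, 0, 0]    # mirrors the outcome exactly
uninformative = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of the outcome

print(gain_ratio(informative, outcomes))
print(gain_ratio(uninformative, outcomes))
```

Under the selection rule described above, the first toy variable (ratio greater than 0) would be kept and the second (ratio of 0) would be dropped.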
MODEL BUILDING. Predictive classifiers for MACE scoring were developed by an ensemble (“boosting”) LogitBoost algorithm. The principle behind ML ensemble boosting is to combine the predictions of simple classifiers with weak performance to create a single strong classifier (19). These weak predictions are then combined in an ensemble (weighted majority voting) to derive an overall classifier, the ML score.
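The boosting idea can be sketched with a two-class LogitBoost loop in its additive logistic regression form, using one-split regression stumps as the weak learners. This is a simplified Python illustration on toy data; the stump learner, the number of rounds, and the clipping of the working responses are assumptions made for the example, and the WEKA configuration used in the study may differ.

```python
import math


def fit_stump(X, z, w):
    """Weighted least-squares regression stump: best single (feature, threshold) split."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue

            def wmean(idx):
                return sum(w[i] * z[i] for i in idx) / sum(w[i] for i in idx)

            ml, mr = wmean(left), wmean(right)
            sse = (sum(w[i] * (z[i] - ml) ** 2 for i in left)
                   + sum(w[i] * (z[i] - mr) ** 2 for i in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]


def stump_value(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr


def logitboost(X, y, rounds=20):
    """Fit `rounds` stumps; each round reweights cases by p * (1 - p)."""
    F = [0.0] * len(X)  # additive score, half the log-odds
    stumps = []
    for _ in range(rounds):
        p = [1.0 / (1.0 + math.exp(-2.0 * f)) for f in F]
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]
        # Working response, clipped for numerical stability (an assumption).
        z = [max(-4.0, min(4.0, (yi - pi) / wi)) for yi, pi, wi in zip(y, p, w)]
        stump = fit_stump(X, z, w)
        stumps.append(stump)
        F = [f + 0.5 * stump_value(stump, row) for f, row in zip(F, X)]
    return stumps


def predict_proba(stumps, row):
    """Predicted event probability: logistic transform of the summed stump outputs."""
    F = sum(0.5 * stump_value(s, row) for s in stumps)
    return 1.0 / (1.0 + math.exp(-2.0 * F))


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 0], [2.0, 1], [3.0, 0], [4.0, 1], [6.0, 0], [7.0, 1], [8.0, 0], [9.0, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = logitboost(X, y)
print(predict_proba(model, [2.5, 0]), predict_proba(model, [7.5, 1]))
```

Each stump on its own is a weak predictor; the summed outputs, passed through the logistic function, play the role of the ML score.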
FIGURE 2 Variable Selection
[Bar charts of the 70 candidate variables. (A) Information gain ratio. (B) AUC. Solid bars: information gain ratio > 0. Open bars: information gain ratio = 0.]
(A) Twenty-five imaging data (gray bars: 22 selected), 17 stress test (pink bars: 8 selected) and 28 clinical (green bars: 17 selected) variables ranked by their mean (95% confidence interval [CI]) information gain ratio within 10-fold cross-validation. (B) Same variables ranked by their individual area under the receiver-operating characteristic curve (AUC) [95% CI] for MACE prediction. Variables selected by information gain ratio are shown as solid bars. Nonselected variables are shown by open bars. BP = blood pressure; beats/min = beats per minute; CABG = coronary artery bypass graft; ECG = electrocardiography; EDV = end-diastolic volume; EF = ejection fraction; ESV = end-systolic volume; LV = left ventricular; MET = metabolic equivalent; PCI = percutaneous coronary intervention; TAVR = transcatheter aortic valve replacement; TPD = total perfusion deficit; other abbreviations as in Figure 1.
CROSS VALIDATION. The performance and general error estimation of the entire ML process (variable selection and LogitBoost) were assessed using stratified 10-fold cross validation (Figure 1), which is currently the preferred validation technique in machine learning (18). The main advantages of this technique, compared with the conventional split-sample approach, are: 1) it reduces the variance in prediction error; 2) it maximizes the use of data for both training and validation, without overfitting or overlap between the training and test data; and 3) it guards against testing hypotheses suggested by arbitrarily split data (20).
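The stratified 10-fold procedure can be sketched as follows. This is a minimal Python illustration rather than the WEKA routine used in the study: `stratified_folds` and `cross_validated_scores` are hypothetical names, and `fit` and `predict` are placeholder callables. In the setting described here, `fit` would wrap both variable selection and model building, so that both are repeated on every training set and never see the held-out fold.

```python
import random


def stratified_folds(outcomes, k=10, seed=0):
    """Assign each case to one of k folds with about the same event rate per fold."""
    rng = random.Random(seed)
    fold_of = [None] * len(outcomes)
    for label in set(outcomes):
        idx = [i for i, y in enumerate(outcomes) if y == label]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            fold_of[i] = position % k  # deal cases of this class round-robin
    return fold_of


def cross_validated_scores(X, y, fit, predict, k=10, seed=0):
    """Return one held-out prediction per case, concatenated over the k folds."""
    fold_of = stratified_folds(y, k, seed)
    scores = [None] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if fold_of[i] != fold]
        test = [i for i in range(len(y)) if fold_of[i] == fold]
        # Everything data-driven (selection and fitting) happens inside fit().
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = predict(model, X[i])
    return scores


# Same class sizes as the study (239 events, 2,380 non-events); features omitted.
y = [1] * 239 + [0] * 2380
folds = stratified_folds(y)
events_per_fold = [sum(1 for f, o in zip(folds, y) if f == k and o == 1)
                   for k in range(10)]
print(events_per_fold)
```

With these class sizes every fold receives 23 or 24 events, so each held-out set has close to the overall 9.1% event rate.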
TABLE 1 Patient Characteristics

                             All Patients    MACE+        MACE−          p Value
                             (N = 2,619)     (n = 239)    (n = 2,380)
Age, yrs                     62 ± 13         70 ± 12      62 ± 12        <0.0001
Men                          1,247 (48)      128 (54)     1,119 (47)     0.054
Body mass index, kg/m2       31 ± 8          30 ± 9       32 ± 8         <0.01
CAD risk factors
Diabetes                     691 (26)        100 (42)     591 (25)       <0.001
Hypercholesterolemia         1,491 (57)      141 (59)     1,350 (57)     0.5
Hypertension                 1,692 (65)      181 (76)     1,511 (63)     <0.001
Family history of CAD        1,006 (38)      66 (28)      940 (40)       <0.001
Smoker                       662 (25)        65 (27)      597 (25)       0.474
Typical angina               301 (11)        38 (16)      263 (11)       <0.05
History of CAD
Previous MI                  130 (5)         31 (13)      99 (4)         <0.001
Previous PCI                 231 (9)         52 (22)      179 (8)        <0.001
Previous CABG                172 (7)         36 (15)      136 (6)        <0.001

Values are mean ± SD or n (%).
CABG = coronary artery bypass graft; CAD = coronary artery disease; MACE = major adverse cardiac event; MI = myocardial infarction; PCI = percutaneous coronary intervention.

TABLE 2 Pharmacologic and Exercise Stress Test Results

Pharmacologic stress (n = 1,614)             MACE+ (n = 217)   MACE− (n = 1,397)   p Value
Resting heart rate, beats/min                75 ± 14           73 ± 13             <0.05
Peak heart rate at stress, beats/min         95 ± 19           103 ± 20            <0.0001
Resting SBP, mm Hg                           132 ± 22          132 ± 20            0.577
Resting DBP, mm Hg                           73 ± 12           77 ± 12             <0.001
Peak SBP, mm Hg                              131 ± 27          143 ± 27            <0.0001
Peak DBP, mm Hg                              70 ± 12           76 ± 13             <0.0001

Exercise stress (n = 1,005)                  MACE+ (n = 22)    MACE− (n = 983)     p Value
Resting heart rate, beats/min                81 ± 13           76 ± 13             0.072
Peak heart rate at stress, beats/min         142 ± 13          148 ± 13            <0.05
Resting SBP, mm Hg                           128 ± 19          126 ± 17            0.647
Resting DBP, mm Hg                           74 ± 9            79 ± 10             <0.05
Peak SBP, mm Hg                              179 ± 27          181 ± 25            0.703
Peak DBP, mm Hg                              84 ± 10           83 ± 12             0.700
Ischemic ST change during exercise stress    7 (32)            175 (18)            0.091

Values are mean ± SD or n (%).
DBP = diastolic blood pressure; SBP = systolic blood pressure; other abbreviation as in Table 1.

STATISTICAL ANALYSIS. Using receiver-operating characteristic analysis and pairwise comparisons according to DeLong et al. (21), the predictive accuracy for MACE was compared among: 1) ML with all available data (ML-combined); 2) ML with only imaging data (ML-imaging); 3) a 5-point scale visual diagnosis (MD diagnosis); and 4) automated quantitative imaging analysis (stress TPD and ischemic TPD). Brier score and Pearson correlation were computed between predicted and observed MACE (22). For all analyses, MACE-free patients were censored to their follow-up date. To define the low-risk limit for MACE prediction by ML-combined, we used clinical diagnosis = 0, which is considered as definitely normal scans, as a well-established, low-risk limit. Then, low-risk cutoffs for ML-combined and TPD were calculated for approximately the same population percentile as for the MD diagnosis = 0 (87th percentile). Subsequently, improvement in risk classification using ML-combined compared with the MD diagnosis was assessed with a 5-category reclassification. Statistical calculations were performed using R software version 3.3.1 (R Foundation, Vienna, Austria) and PredictABEL package (R Foundation) for the reclassification.
RESULTS
STUDY POPULATION AND OUTCOME. Table 1 shows the baseline clinical characteristics of the studied population. When the first event per patient was considered, there were 239 (9.1%) 3-year MACE, with 150 (5.7%) all-cause deaths, 11 (0.4%) nonfatal MIs, 24 (0.9%) unstable anginas, and 54 (2.1%) late target revascularizations. The observed annual MACE rate was 3%.
HEMODYNAMIC AND MPI RESULTS. Table 2 shows hemodynamic and stress results separately for pharmacological stress and for exercise stress. The frequency of exercise stress was lower among patients with MACE compared with those without MACE (9% with MACE vs. 41% without MACE; p < 0.0001). Table 3 shows quantitative and visual MPI results. For the quantitative evaluation of perfusion and function, 9.8% of myocardial contours were corrected by the core laboratory technologist.
VARIABLE SELECTION. Figure 2A shows the average information gain ratio within 10-fold cross validation. On average, 22 imaging data, 8 stress test, and 17 clinical variables were selected. All perfusion and functional variables from MPI had an information gain ratio >0, including left ventricular counts and injected dose. The top 9 selected variables were all imaging data variables.
MACE PREDICTION BY INDIVIDUAL VARIABLES. Figure 2B shows the area under the receiver-operating characteristic curve (AUC) for the prediction of MACE by each individual variable. Stress TPD, stress heart rate, ischemic TPD, stress systolic blood pressure, resting TPD, and age were the best individual predictors. Compared with the information gain ratio in Figure 2A, there were some variables for which individual AUCs were predictive, yet they did not offer incremental information gain for predicting MACE (white bars). Furthermore, the variables with highest AUCs did not always have the highest information gain ratio.
TABLE 3 Perfusion and Functional Results

                                                  MACE+ (n = 239)   MACE− (n = 2,380)   p Value
MD-diagnosis: normal                              142 (59)          2,138 (90)          <0.001
MD-diagnosis: abnormal or definitely abnormal     89 (37)           217 (9)             <0.001
Stress TPD, %                                     9 ± 11            3 ± 5               <0.0001
Ischemic TPD, %                                   4 ± 4             2 ± 3               <0.0001
Resting TPD, %                                    5 ± 9             1 ± 3               <0.0001
Stress EDV, ml                                    112 ± 57          91 ± 36             <0.0001
Stress ESV, ml                                    96 ± 57           73 ± 33             <0.0001
Stress EF, %                                      46 ± 9            49 ± 3              <0.0001
Rest EDV, ml                                      105 ± 52          89 ± 34             <0.0001
Rest ESV, ml                                      89 ± 52           71 ± 31             <0.0001
Rest EF, %                                        46 ± 8            49 ± 3              <0.0001
Transient ischemic dilation                       1.09 ± 0.16       1.03 ± 0.14         <0.0001

Values are n (%) or mean ± SD.
EDV = end-diastolic volume; EF = ejection fraction; ESV = end-systolic volume; MD = physician; TPD = total perfusion deficit; other abbreviation as in Table 1.

FIGURE 3 ROC Curves for Prediction of 3-Year MACE (239 of 2,619 Events)
[ROC curves (sensitivity vs. specificity) with AUC (bars) and 95% CI (whiskers): ML-combined 0.81 (*, **); ML-imaging 0.78; stress TPD 0.73; ischemic TPD 0.72.]
ML combining all variables using variable selection and LogitBoost algorithm (ML-combined) had a significantly higher AUC for MACE prediction than ML combining imaging data variables only (ML-imaging), and standard image analysis. *p < 0.01; **p < 0.001, in AUC comparison by DeLong test. ROC = receiver-operating characteristic; other abbreviations as in Figures 1 and 2.

MACE PREDICTION BY COMBINED VARIABLES. MACE prediction was significantly higher for ML-combined than ML-imaging (AUC: 0.81, 95% confidence interval [CI]: 0.78 to 0.83 vs. AUC: 0.78, 95% CI: 0.75 to 0.81; p < 0.01). ML-combined also had a higher AUC compared with the AUCs of automated stress TPD and automated ischemic TPD (Figure 3), and compared with the AUCs for probability of CAD (0.64; 95% CI: 0.61 to 0.66) or MD diagnosis (0.65; 95% CI: 0.62 to 0.68), as reported by the MD (all p < 0.001). When stress test variables were added to image variables for ML integration, AUC did not change significantly (AUC: 0.79, 95% CI: 0.76 to 0.82 vs. AUC: 0.78, 95% CI: 0.75 to 0.81; p = 0.4).
The Brier score for ML-combined prediction of MACE was 0.07, which indicated good calibration between ML scores (estimated predicted risk) and observed 3-year risk. The plot of observed MACE versus predicted MACE over percentiles of ML-combined risk is shown in Figure 4. High correlation of ML-combined predicted MACE versus observed MACE was found (r = 0.97; p < 0.0001).
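As a point of reference for the Brier score, the Python sketch below shows the definition (the mean squared difference between predicted probability and observed outcome) and evaluates it for an uninformative model that assigns every patient the overall event rate of 239 in 2,619. The function name `brier_score` is hypothetical, and the per-patient ML scores from the study are not available here, so only this constant-prediction baseline is computed.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)


# Outcome vector with the study's event counts: 239 events, 2,380 non-events.
observed = [1] * 239 + [0] * 2380
prevalence = sum(observed) / len(observed)

# Baseline: every patient is assigned the overall event rate.
baseline = brier_score([prevalence] * len(observed), observed)
print(round(prevalence, 3), round(baseline, 3))
```

For a constant prediction equal to the prevalence, the Brier score reduces to prevalence × (1 − prevalence), about 0.083 here, which gives a yardstick against which the reported 0.07 can be read; lower values are better.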
RISK RE-CATEGORIZATION. To allow categorical comparison, a low-risk, ML-combined score (<0.15) was determined as the cutoff that defined the same percentile as visual MD diagnosis = 0 (87th percentile). This percentile also approximately corresponded to the stress TPD threshold of <5% (14). For patients within the 95th to 100th percentile of the ML-combined score, 19% (25 of 131) of patients had a normal MD diagnosis and 10% (13 of 131) had stress TPD of <5% (Figure 5). Finally, a 5-category risk reclassification was 26% for ML-combined scores compared with a 5-category MD diagnosis (p < 0.001) (Table 4), with a 30.5% improvement in identification of patients with MACE and a 5% decrease in identification of MACE-free patients (all p < 0.001).
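The reclassification figures can be checked against the cell counts in Table 4. The Python sketch below applies the usual category-based net reclassification formula: among patients with MACE, the proportion moved to a higher category minus the proportion moved to a lower one, plus the reverse difference among MACE-free patients. The counts are copied from Table 4; the function name `moved` is hypothetical, and it is an assumption that the PredictABEL calculation in the study follows this same formula.

```python
# Rows: MD diagnosis (normal, equivocal, probably abnormal, abnormal,
# definitely abnormal). Columns: ML risk category (low, equivocal, mild,
# moderate, severe). Counts copied from Table 4.
events = [
    [99, 19, 9, 7, 8],
    [1, 0, 1, 0, 2],
    [2, 0, 0, 1, 1],
    [11, 5, 8, 7, 55],
    [1, 1, 0, 1, 0],
]
non_events = [
    [1959, 95, 35, 16, 33],
    [5, 1, 0, 2, 3],
    [8, 0, 0, 3, 3],
    [69, 29, 21, 23, 67],
    [3, 0, 1, 1, 3],
]


def moved(table):
    """Return (up-risked, de-risked, total) counts for a 5 x 5 cross-table."""
    up = sum(table[i][j] for i in range(5) for j in range(5) if j > i)
    down = sum(table[i][j] for i in range(5) for j in range(5) if j < i)
    return up, down, sum(map(sum, table))


up_e, down_e, n_e = moved(events)
up_n, down_n, n_n = moved(non_events)

nri_events = (up_e - down_e) / n_e      # net correct up-risking of events
nri_non_events = (down_n - up_n) / n_n  # net correct de-risking of non-events
nri = nri_events + nri_non_events

print(round(100 * nri_events, 1), round(100 * nri_non_events, 1), round(100 * nri, 1))
```

With the Table 4 counts this gives about 30.5% for patients with MACE, about −5.0% for MACE-free patients, and about 25.5% overall, in line with the reported components and the rounded 26%.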
DISCUSSION
FIGURE 4 Observed Versus Predicted 3-Year Risk of MACE
[Plot. x-axis: percentile of ML score (in steps of 5, up to 100). y-axes: observed proportion of events (%) and predicted ML score.]
Observed proportion of events (pink bars) and predicted ML score (green points) grouped by every fifth percentile of risk. Abbreviations as in Figure 1.

FIGURE 5 Frequency of Normal Clinical Diagnosis and Low Perfusion Scores by Predicted ML Risk Percentile
[Bar chart. x-axis: percentile of ML score (<25, 25-49, 50-74, 75-94, ≥95). y-axis: frequency (%). Series: normal clinical diagnosis and stress TPD <5%.]
The frequency of patients with normal clinical diagnosis and low automated perfusion score (TPD <5%) across percentiles of the ML score. Abbreviations as in Figures 1 and 2.

We developed and validated a highly accurate, personalized method for post-MPI risk computation that used ML. This approach allowed the combination of all available clinical, stress test, and automatically derived imaging data variables without a priori assumptions about the influence or weighting of individual factors, or how they may interact. The method was used to evaluate the added value of clinical and stress test information for the prediction of MACE after MPI. The observed 3% annual MACE rate was similar to that in previous studies that assessed the prognostic value of SPECT MPI (4). The only human input required for the derivation of the ML-combined MACE risk score was the collation of clinical data from health records (conceivably a task fulfilled by advanced text mining in the future) and the adjustment of contours by the technologists in a minority (<10%) of the cases. Figure 6 illustrates how the proposed ML model would allow prediction of the risk of MACE for an individual unknown case by automatically integrating the clinical data with the imaging data.
The performance of the ML-combined score was superior to image risk metrics that are traditionally used to study prognostic outcomes after MPI (1–7). The AUC estimate, derived in a rigorous manner with test and training data separated within 10-fold cross validation (preventing overfitting), was substantially higher than that for ML-imaging, as well as visual or automated MPI assessment. Furthermore, risk reclassification analysis demonstrated that the ML-combined risk allowed better classification of high-risk patients than visual clinical diagnosis. Risk reclassification revealed that the ML-combined score could increase the risk score for >30% of patients with MACE, but also increased the risk score for 5% of MACE-free patients. At the same time, we found that 19% of the patients in the highest ML-combined risk category (top 5%), with a MACE incidence of 38%, were still read as normal scans with an MD diagnosis = 0. These observations highlight the difficulty in finding the appropriate thresholds for the multicategory risk scores. The low-risk threshold in this study was derived for the same population percentile as “normal” visual scans, and subsequent higher risk thresholds were defined at 5% increments of increasing ML risk score. Furthermore, we found that automatically derived stress and/or ischemic TPD had better predictive value for MACE than a clinical diagnosis, which was in line with our previous reports (9,23), but has not been previously reported in prognostic studies.
To our knowledge, this was the first study thatapplied ML to predict MACE in patients who
FIGURE 6 Illustrat
PatientStre
MPerf
QGS ¼ quantitative g
single-photon emiss
TABLE 4 Risk Reclassification by ML Versus MD Diagnosis
MD Diagnosis
ML-Boosting Risk Category
TotalLow<0.15
Equivocal0.15–0.2
Mild0.2–0.25
Moderate0.25–0.3
Severe$0.3
MACE (n ¼ 239)
Normal 99 19* 9* 7* 8* 142
Equivocal 1† 0 1* 0* 2* 4
Probably abnormal 2† 0† 0 1* 1* 4
Abnormal 11† 5† 8† 7 55* 86
Definitely abnormal 1† 1† 0† 1† 0 3
Total 114 25 18 16 66 239
No MACE (n ¼ 2,380)
Normal 1,959 95* 35* 16* 33* 2,138
Equivocal 5† 1 0* 2* 3* 11
Probably abnormal 8† 0† 0 3* 3* 14
Abnormal 69† 29† 21† 23 67* 209
Definitely abnormal 3† 0† 1† 1† 3 8
Total 2,044 125 57 45 109 2,380
Reclassification 26%
*Up-risking by machine learning (ML). †De-risking by ML.
Abbreviations as in Tables 1 and 3.
Betancur et al. J A C C : C A R D I O V A S C U L A R I M A G I N G , V O L . - , N O . - , 2 0 1 7
Machine Learning for Automated MACE Prediction - 2 0 1 7 :- –-
8
underwent MPI. Recently, our group assessed thefeasibility and accuracy of ML to predict 5-yearall-cause mortality in 10,030 patients who under-went coronary computed tomography (CT) angiog-raphy (8). In this analysis, ML exhibited a higher AUCcompared with the Framingham risk score or visualCT severity scores alone (8). Automated processing ofCT images was not used. In contrast, the presentstudy capitalized on established automated process-ing software tools that were validated in nuclear
cardiology to provide multiple imaging data variables with limited manual interaction. The intent was to demonstrate the feasibility of moving closer to completely automated, computer-powered imaging analysis and risk assessment. A potential next step will be to develop tools that are also capable of automatically extracting clinical variables, for example, by text mining electronic health records.
The ML approach provides a computational integration of all available information that is not feasible for subjective analysis by the reporting physician. As part of the clinical decision-making, physicians take into account clinical and stress testing data; however, this is done subjectively without a systematic way of integrating information. Furthermore, although including these variables as part of the MPI report is recommended by guidelines, integration of these findings in the report is not yet part of standardized reporting guidelines (24,25). Intuitive patient-specific weighting of all individual clinical and imaging factors for assessing risk could not be expected to be precise, or consistent among different medical centers, whether performed by the interpreting physician or the physician managing the patient.
Although the average patient radiation dose (10.7 mSv) used in this study was higher than that specified in current guideline recommendations (26), the data were collected before the latest guidelines were adopted, using the same-day rest-first protocol optimized for the acquisition speed rather than for
PERSPECTIVES
COMPETENCY IN MEDICAL KNOWLEDGE: Combining
clinical and imaging information by an ML algorithm exhibited
significantly better MACE prediction than using only imaging
information or performing visual and automated perfusion
assessment alone in SPECT MPI.
TRANSLATIONAL OUTLOOK: Adding clinical information to
imaging data by ML will aid comprehensive MPI assessment to
improve clinical patient management.
the radiation dose. Furthermore, a weight-based protocol was used, and most of the patients were obese (body mass index ≥30 kg/m²). It is likely that at least a 50% lower effective radiation dose could be achieved with longer acquisition times without any effect on image quality, as previously studied (16). Further dose reductions could be achieved with stress-first and/or stress-only protocols.
IMPLICATIONS. The ability to optimally assess risk in individual patients remains a major challenge in cardiology. With MPI, visual image analysis itself is subjective, and the overall risk assessment that incorporates clinical, stress test, and imaging results is highly variable, based on physician knowledge and experience, and limited by the complexity of appropriately assigning weight to individual factors. The presented ML score provides an automated, precise, and objective risk estimate that combines imaging, clinical, and stress testing variables. The same optimal method for risk computation would be readily available to all imaging centers, including less experienced centers. The practical implementation will depend on the ability to interface the MPI reporting workstation with electronic patient records, to access the clinical variables. Such a tool could perhaps be interfaced with large registry data (e.g., the ImageGuide registry of the American Society of Nuclear Cardiology [25]), which could collect clinical variables similar to those used in this study. The implementation will depend on the availability of the interface to the electronic health records.
STUDY LIMITATIONS. This was a single-center study, and further multicenter and external validation of the derived risk score will be required. Future work should include the definition of the optimal ML threshold, to validate prospective practical clinical implementation. The sample size was modest and follow-up was only 3 years; however, all results were significant. Although training data were always separated from test data within the 10-fold cross validation, it is not yet known how well such an ML score can extrapolate among different centers, patient populations, and follow-up time. Although we included key perfusion and function imaging variables in this study, the list was not exhaustive. The derived ML score was generic and could be applied to both pharmacological and exercise stress protocols, because the ML technique uses the information about the type of test internally. However, further evaluation of ML risk stratification for MACE prediction in specific subpopulations, for example, in patients with suspected disease, patients with early revascularization, or patients undergoing adenosine protocols, may be appropriate in multicenter studies. Risk reclassification metrics have limitations such as dependence on the choice of cutoff values of the continuous probability risk score. It is likely that more appropriate threshold selection in future studies may optimize the reclassification patterns for specific clinical risks. Alternatively, the MACE risk score without any categories could also be used clinically to indicate the probability of events for a given patient. Finally, we selected a LogitBoost approach for automatic ML variables integration, as in our previous work (8), but the LogitBoost approach we used is only one of many possible ML approaches to combine multiple variables for prediction. It is possible that different approaches such as deep learning may provide a better risk score derivation. However, a larger multicenter data set is required to evaluate possible advantages of other ML approaches.
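The separation of training and test data within 10-fold cross validation, mentioned above, can be sketched as follows. This is a minimal illustration and not the study's implementation: the stratification by outcome, the function names `stratified_kfold` and `out_of_fold_scores`, and the `fit`/`predict` callbacks are assumptions introduced here, and any learner (LogitBoost or otherwise) would be plugged in through those callbacks.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once.

    Assumption: folds are stratified so that every fold holds a similar
    share of events (label 1) and non-events (label 0).
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin into folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def out_of_fold_scores(features, labels, fit, predict, k=10, seed=0):
    """Score every sample with a model that never saw it during training."""
    scores = [None] * len(labels)
    for train, test in stratified_kfold(labels, k, seed):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in test:
            scores[i] = predict(model, features[i])
    return scores


if __name__ == "__main__":
    # Toy data with the study's event counts; the "model" is a placeholder
    # that just returns the training-set event rate.
    labels = [1] * 239 + [0] * 2380
    features = list(range(len(labels)))
    scores = out_of_fold_scores(
        features, labels,
        fit=lambda X, y: sum(y) / len(y),
        predict=lambda model, x: model,
    )
    print(len(scores), min(scores), max(scores))
```

The out-of-fold scores can then be pooled to estimate a single AUC. This protects the estimate from optimistic bias within one data set, but, as the limitations above note, it does not establish how the score transfers to other centers.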
CONCLUSIONS
ML combining both clinical and imaging data variables was found to have high predictive accuracy for the 3-year risk of MACE, and was superior to existing visual or automated perfusion assessments in isolation. This computational method could allow integrating the clinical data with imaging results for the optimal evaluation of MACE risk in patients undergoing MPI.
ADDRESS FOR CORRESPONDENCE: Dr. Piotr J. Slomka, Artificial Intelligence in Medicine Program, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Suite A047N, Los Angeles, California 90048. E-mail: [email protected].
REFERENCES
1. Gimelli A, Rossi G, Landi P, et al. Stress/rest myocardial perfusion abnormalities by gated SPECT: still the best predictor of cardiac events in stable ischemic heart disease. J Nucl Med 2009;50:546–53.
2. Hachamovitch R, Kang X, Amanullah AM, et al. Prognostic implications of myocardial perfusion single-photon emission computed tomography in the elderly. Circulation 2009;120:2197–206.
3. Shaw LJ, Berman DS, Maron DJ, et al. Optimal medical therapy with or without percutaneous coronary intervention to reduce ischemic burden: results from the Clinical Outcomes Utilizing Revascularization and Aggressive Drug Evaluation (COURAGE) trial nuclear substudy. Circulation 2008;117:1283–91.
4. Shaw LJ, Iskandrian AE. Prognostic value of gated myocardial perfusion SPECT. J Nucl Cardiol 2004;11:171–85.
5. Kang X, Berman DS, Lewin HC, et al. Incremental prognostic value of myocardial perfusion single photon emission computed tomography in patients with diabetes mellitus. Am Heart J 1999;138:1025–32.
6. Hachamovitch R, Berman DS, Kiat H, et al. Exercise myocardial perfusion SPECT in patients without known coronary artery disease: incremental prognostic value and use in risk stratification. Circulation 1996;93:905–14.
7. Sharir T, Germano G, Kang X, et al. Prediction of myocardial infarction versus cardiac death by gated myocardial perfusion SPECT: risk stratification by the amount of stress-induced ischemia and the poststress ejection fraction. J Nucl Med 2001;42:831–7.
8. Motwani M, Dey D, Berman DS, et al. Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis. Eur Heart J 2017;38:500–7.
9. Arsanjani R, Dey D, Khachatryan T, et al. Prediction of revascularization after myocardial perfusion SPECT by machine learning in a large population. J Nucl Cardiol 2015;22:877–84.
10. Betancur J, Rubeaux M, Fuchs T, et al. Automatic valve plane localization in myocardial perfusion SPECT/CT by machine learning: anatomical and clinical validation. J Nucl Med 2017;58:961–7.
11. Gambhir SS, Berman DS, Ziffer J, et al. A novel high-sensitivity rapid-acquisition single-photon cardiac imaging camera. J Nucl Med 2009;50:635–43.
12. Sharir T, Slomka PJ, Hayes SW, et al. Multicenter trial of high-speed versus conventional single-photon emission computed tomography imaging: quantitative results of myocardial perfusion and left ventricular function. J Am Coll Cardiol 2010;55:1965–74.
13. Andersson M, Johansson L, Minarik D, Leide-Svegborn S, Mattsson S. Effective dose to adult patients from 338 radiopharmaceuticals estimated using ICRP biokinetic data, ICRP/ICRU computational reference phantoms and ICRP 2007 tissue weighting factors. EJNMMI Physics 2014;1:9.
14. Nakazato R, Tamarappoo BK, Kang X, et al. Quantitative upright–supine high-speed SPECT myocardial perfusion imaging for detection of coronary artery disease: correlation with invasive coronary angiography. J Nucl Med 2010;51:1724–31.
15. Xu Y, Arsanjani R, Clond M, et al. Transient ischemic dilation for coronary artery disease in quantitative analysis of same-day sestamibi myocardial perfusion SPECT. J Nucl Cardiol 2012;19:465–73.
16. Nakazato R, Berman DS, Hayes SW, et al. Myocardial perfusion imaging with a solid-state camera: simulation of a very low dose imaging protocol. J Nucl Med 2013;54:373–9.
17. Thygesen K, Alpert JS, White HD. Universal definition of myocardial infarction. Circulation 2007;116:2634–53.
18. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. SIGKDD Explor Newsl 2009;11:10–8.
19. Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann Statist 2000;28:337–407.
20. Kanamori T, Takenouchi T, Eguchi S, Murata N. Robust loss functions for boosting. Neural Comput 2007;19:2183–244.
21. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–45.
22. Brier GW. Verification of forecast expressed in terms of probability. Monthly Weather Rev 1950;78:1–3.
23. Arsanjani R, Xu Y, Dey D, et al. Improved accuracy of myocardial perfusion SPECT for detection of coronary artery disease by machine learning in a large population. J Nucl Cardiol 2013;20:553–62.
24. Tragardh E, Hesse B, Knuuti J, et al. Reporting nuclear cardiology: a joint position paper by the European Association of Nuclear Medicine (EANM) and the European Association of Cardiovascular Imaging (EACVI). Eur Heart J Cardiovasc Imaging 2015;16:272–9.
25. Tilkemeier PL, Mahmarian JJ, Wolinsky DG, Denton EA. ImageGuide Update. J Nucl Cardiol 2015;22:994–7.
26. Henzlova MJ, Duvall WL, Einstein AJ, Travin MI, Verberne HJ. ASNC imaging guidelines for SPECT nuclear cardiology procedures: stress, protocols, and tracers. J Nucl Cardiol 2016;23:606–39.
KEY WORDS machine learning, major adverse cardiac events, SPECT myocardial imaging