DOI: 10.1161/CIRCGENETICS.113.000384
Identification of Genetic Markers for Treatment Success in Heart Failure
Patients: Insight from Cardiac Resynchronization Therapy
Running title: Schmitz et al.; Genetics in HF treatment success
Boris Schmitz, PhD1,2; Renata DeMaria, MD3; Dimitris Gatsios, BSc4; Theodora
Chrysanthakopoulou, BSc, MSc5; Maurizio Landolina, MD6; Maurizio Gasparini, MD7;
Jonica Campolo, MSc3; Marina Parolini, BStat3; Antonio Sanzo, MD6; Paola Galimberti, MD7;
Michele Bianchi, MD8; Malte Lenders, PhD2; Eva Brand, MD, PhD2; Oberdan Parodi, MD3;
Maurizio Lunati, MD8; Stefan-Martin Brand, MD, PhD1
1Institute of Sports Medicine, Molecular Genetics of Cardiovascular Disease, 2Internal Medicine D, Department of Nephrology, Hypertension and Rheumatology, University Hospital Münster, Münster, Germany; 3CNR Institute of Clinical Physiology, Cardiothoracic and Vascular Department, Niguarda
Ca’ Granda Hospital, Milan, Italy; 4University of Ioannina, Ioannina University Campus; 5Neuron Energy Solutions G.P., Science & Technology Park of Epirus, Ioannina, Greece; 6Department of
Cardiology, Fondazione IRCCS Policlinico San Matteo, Pavia; 7Department of Cardiology, Humanitas Research Hospital IRCCS, Rozzano-Milan; 8Cardiothoracic and Vascular Department,
Niguarda Ca’ Granda Hospital, Milan, Italy
Correspondence:
Dr. rer. nat. Boris Schmitz
University Hospital Münster
Institute of Sports Medicine
Molecular Genetics of Cardiovascular Disease
Horstmarer Landweg 39
48149 Münster, Germany
Tel: +49/251/83-52996
Fax: +49/251/83-35387
E-mail: [email protected]
Journal Subject codes: [11] Other heart failure, [33] Other diagnostic testing, [27] Other treatment
Downloaded from http://circgenetics.ahajournals.org/ by guest on May 17, 2018
Abstract:
Background – Cardiac resynchronization therapy (CRT) can improve ventricular size, shape and
mass and reduce mitral regurgitation by reverse remodelling of the failing ventricle. About 30%
of patients do not respond to this therapy for unknown reasons. In this study, we aimed to
identify and classify CRT responders using genetic variants and clinical
parameters.
Methods and Results – Out of 1,421 CRT patients, 207 subjects were consecutively selected and
CRT responders and non-responders were matched for their baseline parameters before CRT.
Treatment success of CRT was defined as a decrease in left ventricular end systolic volume
(LVESV) >15% at follow-up echocardiography compared to LVESV at baseline. All other
changes classified the patient as a CRT non-responder. A genetic association study was performed,
which identified 4 genetic variants associated with the CRT responder phenotype at the
allelic (p<0.035) and genotypic (p<0.031) level: rs3766031 (ATP1B1), rs5443 (GNB3), rs5522
(NR3C2) and rs7325635 (TNFSF11). Machine learning algorithms were used for the
classification of CRT patients into responder and non-responder status, including combinations
of the identified genetic variants and clinical parameters.
Conclusions – We demonstrated that rule induction algorithms can successfully be applied for
the classification of heart failure patients into CRT responder and non-responder status using
clinical and genetic parameters. Our analysis included information on alleles and genotypes of 4
genetic loci, rs3766031 (ATP1B1), rs5443 (GNB3), rs5522 (NR3C2) and rs7325635 (TNFSF11),
pathophysiologically associated with remodelling of the failing ventricle.
Key words: heart failure, cardiovascular disease, risk factor, resynchronization, reverse remodeling, data mining, machine learning
Introduction
The concept of individually optimized therapy, often referred to as personalized medicine, is
rapidly advancing in the field of modern health care,1 in particular for common diseases.
Personalized medicine is expected to improve the treatment of cardiovascular disease (CVD),
including prognosis of treatment outcomes.2 As a novel integrative approach, personalized
medicine in treatment of CVD will have to collect and selectively evaluate a patient’s unique
clinical and anthropometric parameters as well as information on genetic predisposition. It is
well known that CVD is a highly heritable trait,3 with individual combinations of multiple
genetic variants accounting for different CVD phenotypes4 in combination with classic risk
factors. Classic risk factors alone explain a large proportion (>50%) of CVD risk, while an
estimated 15% to 20% of myocardial infarction (MI) patients have none of the traditional risk
factors.5,6 Increased knowledge of the molecular mechanisms involved as well as insight into the
additive and interactive effects of multiple genetic variants and environmental factors have been
postulated as the foundation for novel therapeutic strategies.7 Even at the current state of
knowledge, genetic information, if interpreted correctly, allows clinicians to stratify individuals at intermediate
risk by generating clinically useful treatment recommendations.7
We recently developed a data mining approach including rule-based machine
learning algorithms for the classification of CVD patients and the extraction of potential risk
predictors including genetic variants.8 In the current study, we applied this methodology on
top of a genetic association study to extract potential combinations of genetic variants and
clinical parameters as markers for treatment success in patients with chronic systolic heart failure
(HF) treated by cardiac resynchronization therapy (CRT).
CRT combines right atrial and ventricular pacing with pacing of the left ventricular (LV)
free wall by a third lead, introduced through the coronary sinus in the great cardiac vein to
resynchronize contraction between and within ventricles. CRT has been shown to ameliorate
ventricular size, shape and pump function, and reduce mitral regurgitation by reverse
remodelling (RR) of dilated failing ventricles and to improve survival in patients with moderate
to severe HF and intraventricular conduction delay.9 However, it is estimated that over one third
of patients do not respond to this therapy.10
Many criteria to define a positive response to CRT have been used with a wide variability
between studies.11,12 Proposed measures include (1) primary clinical end points such as mortality
due to progressive pump dysfunction or CV events and cardiac transplantation; (2) secondary
clinical end points such as re-hospitalization for worsening HF, and (3) subjective or objective
changes in functional capacity expressed as improved New York Heart Association (NYHA)
class or the increase in the distance walked in 6 minutes, respectively, 3 to 6 months after CRT
implantation. Echocardiographic criteria include changes observed 3 to 6 months after the
procedure in left ventricular ejection fraction (LVEF) or left ventricular end-systolic (LVESV) or
end-diastolic (LVEDV) volume, using different cut-off values. RR has been shown to start early
after CRT, to peak between 6 and 12 months and to be sustained in the long term, up to 5 years,
with only little further improvement.13,14
Agreement between clinical and echocardiographic criteria has been shown to be modest
at best.11 In general, the rate of response is higher with clinical criteria than with
remodelling markers, but clinical measures of response are poorly
correlated with long-term prognosis. Conversely, death from CV causes or progressive pump
failure has been shown to depend on RR, and changes in LVESV are acknowledged as a
reliable surrogate end point. RR and CV mortality appear to correlate in the medium term and
the relationship is sustained up to 5 years.15-17
Predictors of CRT success have been extensively investigated and include female gender,
non-ischemic etiology of HF, symptom severity, myocardial scar burden, QRS morphology and
duration and technical factors such as LV lead placement or proportion of time paced.10,18-21
Whether genetic variants associated with CVD may be differentially associated with CRT
success has been hitherto poorly investigated.
Our approach aimed at more specifically classifying CRT responders by inclusion of
predictive genetic markers within a study of CRT patients recently published by our group.22
Methods
Study design and patient selection
The CRT study for identification of predictive genetic markers was a retrospective multicenter
case-control study conducted at 3 Italian centers.22 The study was approved by the institutional
ethics committees of the participating centers and patients gave written informed
consent to participate. The study included HF patients who had undergone CRT to correct
mechanical dyssynchrony represented by a sequence abnormality in atrio-ventricular, or inter- or
intra-ventricular contraction according to guideline indications: any etiology of HF, NYHA
classification II - IV, a QRS duration on surface electrocardiogram
and LV end-diastolic diameter >55 mm.23 Assessment of scar burden was performed prior to
patient selection for the procedure; patients with extensive scar burden were excluded from CRT.
Further study entry criteria were stable positioning of the left lead at the lateral or postero-lateral
wall level and proportion of time paced >97%. In patients with atrial fibrillation, atrioventricular
(AV) node ablation was performed to achieve this percent pacing target and AV delay was
optimized under echocardiographic guidance immediately post implant.
Out of 1,421 patients (18% deceased) implanted with CRT since 2002, the study enrolled
207 consenting subjects who had undergone the procedure at least 6 to 12 months earlier, had a
valid echocardiographic study to define the remodelling status at 6 to 12 months (median 9
months), and were consecutively reviewed in the electrophysiology outpatient clinic for routine
follow-up between March and December 2009 (figure 1).
Definition of treatment success
CRT treatment success, designated as reverse remodelling (RR+), was defined as a significant
decrease in LVESV >15% (i.e. a reduction in LV size) at follow-up compared to LVESV at
baseline determined by echocardiography. All other changes classified the patient as a CRT
non-responder (RR-). For each RR+ patient, an RR- subject was enrolled, matched by gender, age,
NYHA functional class, HF etiology and baseline LVEF.
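The response definition above can be written as a small rule. The following is a minimal sketch, assuming LVESV values in ml; the function name and arguments are illustrative and not taken from the study.

```python
def classify_response(lvesv_baseline_ml: float, lvesv_followup_ml: float,
                      threshold: float = 0.15) -> str:
    """Label a patient RR+ if LVESV fell by more than `threshold` (as a fraction
    of baseline) at follow-up; every other change is labelled RR-."""
    if lvesv_baseline_ml <= 0:
        raise ValueError("baseline LVESV must be positive")
    relative_decrease = (lvesv_baseline_ml - lvesv_followup_ml) / lvesv_baseline_ml
    return "RR+" if relative_decrease > threshold else "RR-"
```

An unchanged or increased LVESV yields a non-positive relative decrease, so the patient falls into the RR- group, in line with "all other changes" in the definition.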
Echocardiography
LVESV was measured by transthoracic echocardiography examinations at rest using
conventional methods with commercially available ultrasound devices (Sonos 7500 and IE33,
Philips Medical Systems, Andover, USA; Sequoia C256 Acuson, Siemens, Mountain View,
USA; Famiglia Mylab25, Esaote, Genoa, Italy; Vivid System 7, GE/Vingmed, Milwaukee, USA)
equipped with a 2.5 - 3.5 MHz-phased-array sector scan probe. Parameters were obtained by 2-
and 4-chamber view using the biplane discs' summation method (Simpson's rule).24
Genotyping
Patients’ blood was sampled during a follow-up outpatient visit. Genomic DNA was extracted at
the University Hospital of Münster. Genotyping was performed, blinded to patients’ remodelling
status, using TaqMan SNP genotyping assays on the real-time PCR System ABI7900 (Life
Technologies Corporation, Carlsbad, USA) in a 384 well format. For detailed PCR conditions
see supplemental information. Replicate samples and samples without template were used as
controls. Genotyping call rates were >95%. Hardy-Weinberg equilibrium was tested by
calculating the expected genotype frequencies from the allele frequencies. Deviation from the
observed genotype frequencies was determined by chi-square test. Genotype distributions of the
6 analyzed genes were compatible with Hardy-Weinberg equilibrium, except for rs5723 within
SCNN1G.
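The Hardy-Weinberg check described above (expected genotype frequencies derived from allele frequencies, compared with observed counts by chi-square test) can be sketched as follows. This is an illustrative implementation for a biallelic variant, assuming one degree of freedom; the function name and arguments are hypothetical and the study's own software may differ.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic variant.

    Arguments are observed genotype counts (AA, AB, BB).
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A small p-value indicates that the observed genotype distribution deviates from the Hardy-Weinberg expectation.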
Selection of genes and genetic variants
With respect to the selection of appropriate genetic variants, we conducted a literature search
including different combinations of the terms “genetic variant”, “single nucleotide
polymorphism”, “cardiovascular disease” and “vascular remodel(l)ing”
(http://www.ncbi.nlm.nih.gov/pubmed; last date of access 28.02.2010). The main focus of the
search was on variants for which functional data were available. The results were evaluated for
appropriate quality and reproducibility of the reports. Due to the small sample
size of our study group, genetic variants with a reported minor allele frequency <10% in
Caucasian populations were not included. Data on gene regulation from our own lab were also
taken into account.25 The final set of genetic variants tested included the common GNB3
(guanine nucleotide-binding beta polypepti
26 enhanced activity of
atrial inward rectifier potassium currents27 and increased response to vasoactive hormones.28
ATP1B1 encodes the Na+/K+-ATPase β1-subunit, an oligomeric membrane-bound protein
essential for maintenance of the myocardial resting membrane potential.29 Total Na+/K+-ATPase
concentration has been reported to be decreased by 40% in endomyocardial biopsies from
patients with compromised cardiac function.30 The ATP1B1 locus has repeatedly been associated
with CVD.31 TNFSF11 encodes the osteoprotegerin ligand (OPGL; receptor activator of nuclear
factor κ-B ligand, RANKL). Enhanced myocardial expression of the OPG/RANKL/RANK axis
has been reported to contribute to LV remodelling32 while circulating OPG levels have been
suggested as independent predictors for CV mortality.33,34 The analysis also included NR3C2
rs5522, which has been shown previously to be associated with successful CRT.23 In addition,
genetic variants of the epithelial sodium channel (ENaC) alpha/gamma (SCNN1A [rs3759324],
SCNN1G [rs5723]) have been tested since ENaC has been suggested as a mediator of aldosterone
in the vascular endothelium.35
Statistical analysis
Variables are presented as number (frequency percent) or median [interquartile range]. Chi-
square test for categorical variables and Student’s t-test or Mann-Whitney test for continuous
variables were used to compare the baseline characteristics of both groups. Relative allele and
genotype frequencies were compared by chi-square test (Fisher’s exact test, where appropriate).
Recessive/dominant associations were tested by comparing allele and genotype frequencies
between RR- and RR+ groups using contingency table and chi-square or Fisher’s exact test.
Given the group sample sizes of 80 RR+ and 76 RR- patients, the power to detect differences in
allele frequencies of 0.16 for an allele of 34% frequency exceeded 80%. P-values <0.05 were
considered statistically significant. To correct for multiple comparisons, we used the Benjamini
and Yekutieli36 false discovery rate method,
using the formula $p = \alpha / \sum_{i=1}^{N} (1/i)$, where $\alpha = 0.05$, $i$ ranges from 1 to $N$ and $N$ represents the number
of comparisons including clinical and genetic variables (N=20). The associations between RR+
and genetic variants were assessed by multivariable logistic regression, after adjustment for
clinically-relevant potential confounders. The incremental predictive performance for RR+ of the
predicted probability risk was determined by C statistic for 1) clinical variables and 2) the
combination of clinical variables and genetic variants. The areas under the Receiver Operating
Characteristic (ROC) curve (AUC) with their 95% confidence interval were determined and
compared by the method of DeLong et al.37 The Statistical Package for the Social Sciences
(SPSS) v 17 was used.
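The multiple-comparison threshold described in this section can be illustrated with a short calculation. The sketch below assumes that the formula denotes the significance level divided by the sum of 1/i for i = 1 to N, as in the Benjamini and Yekutieli correction; the function name is illustrative.

```python
def benjamini_yekutieli_threshold(n_comparisons: int, alpha: float = 0.05) -> float:
    """Critical p-value alpha / sum_{i=1..N}(1/i) for N comparisons."""
    if n_comparisons < 1:
        raise ValueError("at least one comparison is required")
    harmonic_sum = sum(1.0 / i for i in range(1, n_comparisons + 1))
    return alpha / harmonic_sum
```

Under this assumption, calling `benjamini_yekutieli_threshold(20)` with the 20 comparisons stated above gives a critical p-value of about 0.0139.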
Data mining and machine learning
Patients were grouped in the two categories RR+ and RR-. After data adjustment for
simultaneous analysis of heterogeneous datasets, 5 independent classifiers (supplemental table 1)
including either patients’ clinical (n = 207; RR+ = 107, RR- = 100; supplemental table 2) or
genetic information (n = 156) or a combination of both (n = 156) were subjected to a set
of 15 machine learning algorithms (supplemental methods; supplemental table 3). For each
classifier, we used the 10-fold cross-validation approach to evaluate the general accuracy of the
algorithm. Data were randomly partitioned into ten separate sets and each algorithm was
provided with nine of the sets as training data, while the remaining set was used as test cases.
The process was repeated ten times using the different possible test sets. The resulting accuracies
were averaged. For the Decision Table and Voting Feature Intervals algorithms, the leave-one-out
cross-validation method was used. For this method, the dataset containing N observations is
split into two subsets: one containing N-1 observations, which is used as the training set, and one
containing 1 observation, which is used for validation. The process is repeated in all possible
ways until all observations have been used for validation. Random Forest, C4.5, PART, Decision
Table, Bayes Network and Multilayer Perceptron, which proved to be the most reliable (i.e. not
overtrained) and accurate algorithms after the initial testing, were further analyzed; they were
applied several times, with different values for the parameters, to identify the most efficient
configuration in terms of specificity, sensitivity and accuracy for the detection of RR+ and RR-
individuals (supplemental tables 4 - 8).
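The two validation schemes described above can be sketched in a few lines. This is a simplified illustration, not the toolkit used in the study: the helper names, the `fit`/`predict` callables and the fixed random seed are assumptions made for the example.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, test indices)


def k_fold_splits(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Split]:
    """Randomly partition the indices into k folds; each fold is the test set once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def leave_one_out_splits(n_samples: int) -> Iterator[Split]:
    """Leave-one-out is the special case k = N: N-1 training cases, 1 validation case."""
    return k_fold_splits(n_samples, k=n_samples)


def mean_accuracy(fit: Callable, predict: Callable,
                  X: Sequence, y: Sequence, splits) -> float:
    """Train on each training set, score on the held-out set, average the accuracies."""
    accuracies = []
    for train, test in splits:
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)
```

With 156 patients and k = 10, the folds contain 15 or 16 patients each, so every patient is used exactly once as a test case.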
PART is a blend of C4.5 (ref. 38) and RIPPER (ref. 39). Both methods adopt a two-stage approach: a
set of rules is produced and subsequently refined by omission (C4.5) or adjustment (RIPPER).
Like C4.5, PART generates rules from decision trees, and like RIPPER it uses the ‘divide and conquer’ rule
learning method, while inferring rules by repetitive generation of partial decision
trees. Initially, a rule is produced, then the covered instances are removed and PART continues
building rules recursively for the residual instances until none is left. As the name suggests,
PART generates partial decision trees with branches to undefined sub-trees instead of fully
explored trees, integrating building and pruning stages to identify a stable sub-tree that cannot be
further cut down. When this sub-tree has been created, tree building ceases and a single rule is
produced. For missing values, PART adopts the approach of C4.5: in case an instance cannot be
assigned deterministically to a branch because of a missing attribute value, it is assigned to each
of the branches with a weight proportional to the number of training instances going down that
branch, normalized by the total number of training instances with known values at the node.
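The treatment of missing values described above amounts to splitting an instance into weighted fractions. The following sketch illustrates the arithmetic only; the function name and the genotype labels used as branch names in the usage note are hypothetical and not taken from the study.

```python
from typing import Dict, Hashable, Optional


def distribute_instance(value: Optional[Hashable],
                        branch_counts: Dict[Hashable, int],
                        weight: float = 1.0) -> Dict[Hashable, float]:
    """Assign an instance to the branches of a node.

    `branch_counts` maps each branch to the number of training instances
    with a known value that go down that branch. A known `value` sends the
    whole weight down one branch; a missing value (None) is spread over all
    branches in proportion to those counts.
    """
    if value is not None:
        if value not in branch_counts:
            raise KeyError(f"unknown branch: {value!r}")
        return {value: weight}
    total_known = sum(branch_counts.values())
    if total_known == 0:
        raise ValueError("no training instances with known values at this node")
    return {branch: weight * count / total_known
            for branch, count in branch_counts.items()}
```

For example, with hypothetical branch counts `{"CC": 60, "CT": 30, "TT": 10}`, an instance with a missing genotype would be passed down the three branches with weights 0.6, 0.3 and 0.1.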
Results
Study population
The characteristics of the HF patient study population before CRT are shown in table 1. All
patients suffered from severe pump dysfunction and advanced symptoms. The RR status with
CRT at follow-up was determined after a median of 9 [7-12] months. No significant differences
existed in the clinical variables used for matching (age, atrial fibrillation, NYHA class, LVEF
and LVESV) between the patients analyzed and those not included in the study (figure 1). The RR-
and RR+ groups included 76 and 80 patients, respectively. Consistent with clinical matching,
baseline parameters and medication were similar between groups, except for a slightly higher
prevalence of type 2 diabetes mellitus (p=0.057). Significant differences, resulting from the
defined remodelling phenotypes, were found between RR+ and RR- subjects for volume
(p<0.001) and function (p<0.001) changes (figure 2). In RR+ patients, LVEDV decreased by 22
ml [-37 to -16 ml] and LVEF improved by 11% [6 to 16%] to a clinically relevant extent,
whereas changes in LV volume (ΔLVEDV 2 ml [-4 to +10 ml]) and LVEF (ΔLVEF 2.5% [-2 to
+5%]) were slight in RR- patients.
Genetic association study
Information on genetic variants was available for 156 CRT study participants. Out of 6
previously established genetic variants that had been associated with CVD phenotypes, 4 were
associated with the RR+ phenotype (table 2) at the allelic (p<0.035) and genotypic (p<0.031)
level: rs3766031 (ATP1B1), rs5443 (GNB3), rs5522 (NR3C2) and rs7325635 (TNFSF11).
Identified associations remained significant after correction for multiple testing by the Benjamini
and Yekutieli false discovery rate method36 for rs3766031 (ATP1B1), rs5443 (GNB3)
and rs5522 (NR3C2).
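An allelic association test of the kind reported here compares allele counts between the RR+ and RR- groups in a 2x2 contingency table. The sketch below uses the Pearson chi-square statistic without continuity correction and is only an illustration; the function names are hypothetical, and Fisher's exact test would be preferable for small cell counts.

```python
import math
from typing import Tuple


def allele_counts(n_major_hom: int, n_het: int, n_minor_hom: int) -> Tuple[int, int]:
    """Convert genotype counts into (major allele count, minor allele count)."""
    return 2 * n_major_hom + n_het, 2 * n_minor_hom + n_het


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square for the table [[a, b], [c, d]].

    Rows are the two patient groups, columns the two alleles.
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

The genotype counts of each group would first be passed through `allele_counts`, and the resulting four allele counts then form the table given to `chi_square_2x2`.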
By multivariable logistic regression analysis after adjustment for age, gender, LVEF,
atrial fibrillation, NYHA class, type 2 diabetes mellitus, baseline LVEDV and etiology of HF,
GNB3, ATP1B1 and NR3C2 remained independently associated with RR+ (table 3), whereas
TNFSF11 was of borderline significance (p=0.051). Minor allele carriage appeared to be
significantly associated with CRT success for both GNB3 rs5443 (OR 3.155 [95% CI 1.434 –
6.941], p=0.004) and ATP1B1 rs3766031 (OR 2.853 [95% CI 1.149 – 7.084], p=0.024). By
contrast, minor allele carriers of NR3C2 rs5522 showed a lower chance of RR+ (OR 0.320 [95%
CI 0.120 – 0.851], p=0.022) than major allele carriers. Female gender (OR 3.855 [95% CI 1.010
– 14.721], p=0.048), type 2 diabetes mellitus (OR 0.227 [95% CI 0.078 – 0.660], p<0.006) and
valvular heart disease (OR 0.109 [95% CI 0.018 – 0.675], p<0.017) were also independently
associated with the RR phenotype. The concordance index, a measure of model fit, was 76.6%.
The C statistic (figure 3) documented the incremental predictive value of the model combining
clinical and genetic information (AUC 0.794 [95% CI 0.720 – 0.855]) vs. the clinical model
(AUC 0.678 [95% CI 0.597 – 0.751]), p=0.002.
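As a reading aid for the odds ratios reported above, the sketch below backs out the log-odds coefficient and its standard error from each published interval. It assumes that the intervals are Wald-type, that is, symmetric on the log scale; the text does not state this, so it is an assumption. Under that assumption the point estimate should be close to the geometric mean of the two confidence limits. The function name is hypothetical.

```python
import math


def log_scale_summary(odds_ratio, ci_low, ci_high, z=1.959964):
    """Recover beta = ln(OR), its standard error, and sqrt(ci_low * ci_high).

    Assumes a Wald 95% interval: exp(beta - z * SE) to exp(beta + z * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    geometric_mean = math.sqrt(ci_low * ci_high)
    return beta, se, geometric_mean


# Odds ratios and 95% CIs as reported in the text for the three variants.
reported = {
    "GNB3 rs5443": (3.155, 1.434, 6.941),
    "ATP1B1 rs3766031": (2.853, 1.149, 7.084),
    "NR3C2 rs5522": (0.320, 0.120, 0.851),
}

for name, (or_value, low, high) in reported.items():
    beta, se, gm = log_scale_summary(or_value, low, high)
    print(f"{name}: ln(OR)={beta:.3f}, SE={se:.3f}, sqrt(low*high)={gm:.3f}")
```

For all three variants the geometric mean of the limits agrees with the reported odds ratio to within rounding, which is consistent with the Wald assumption.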
Data mining and machine learning
When comparing specificity, sensitivity and accuracy of the different algorithms applied within
each classifier, we observed that some algorithms performed generally better than others (table
4). Approximation of 100% accuracy (based on the 10-fold cross-validation method) as detected
for K Nearest Neighbors, Non Nested Generalised Exemplars and Random Tree in some
classifiers indicated artificial overtraining of the applied method. Within the classifiers “Clinical
& Genotypes” and “Clinical & Alleles” the rule-based methods C4.5 and PART performed well,
exceeding 82.5% accuracy. Since rule-based methods produce lower complexity classification
results with higher transparency, which may be used to generate expert consensus in a modified
Delphi method40, we identified the PART algorithm41 as appropriate for the generation of
efficient and interpretable rules (table 5) in this series.
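The performance measures compared above (sensitivity, specificity and accuracy under 10-fold cross-validation) can be sketched as follows. This is an illustration under stated assumptions, not the software used in the study: the function names are hypothetical, RR+ is treated as the positive class, and the folds are taken in order without shuffling or stratification.

```python
def classification_metrics(y_true, y_pred, positive="RR+"):
    """Compute sensitivity, specificity and accuracy from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


def kfold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx); each sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Toy labels (illustration only): one RR+ patient is misclassified as RR-.
metrics = classification_metrics(
    ["RR+", "RR+", "RR-", "RR-"],
    ["RR+", "RR-", "RR-", "RR-"],
)
print(metrics)
print([len(test) for _, test in kfold_indices(156, k=10)])
```

In a cross-validated evaluation, a model would be fitted on each `train_idx` and scored on the matching `test_idx`, and the predictions from all folds would be pooled before calling `classification_metrics`.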
Discussion
In the current study, we demonstrated that machine learning algorithms can successfully be
applied for the classification of HF patients treated with CRT into responders and non-
responders using clinical and genetic parameters to model prediction of RR. Our analysis
included information on alleles and genotypes newly associated with the CRT responder
phenotype.
Markers and determinants of CRT response
Predicting whether a patient will benefit from CRT has long been an issue of interest and
surrogate end points of response at mid-term follow-up have been used repeatedly.12,13,18 The
correlation between primary clinical measures of response, such as cardiac death, and
symptomatic improvement has been observed to be poor, whereas RR after CRT strongly
correlates with clinical outcome.16-18 Consistently, as a marker of CRT success, we used
echocardiographic RR after a median follow-up of 9 months, a time interval coincident with
peak changes in trials with repeated echocardiographic assessments.13
We selected a well-balanced data set of RR+ and RR- patients matched for known
clinical parameters that have been associated with a different incidence of RR after CRT such as
ischemic etiology of HF, lower LVEF, atrial fibrillation, shorter QRS duration and female
gender.18-21 As extensive myocardial scarring and procedural factors, including LV lead position
and percent pacing, are important technical determinants of CRT success,17,20,42,43 a limited scar
burden and technical success were prerequisites for enrolment in the study. Furthermore,
post-implant AV delay optimization, which also affects response,44 was routinely performed.
However, we observed significant differences in patients’ outcome, potentially based on
unknown interactions of clinical parameters such as type 2 diabetes mellitus45 and undetected
genetic predispositions.
The data-mining approach
Machine learning algorithms have already been used to model the pathobiology of complex
CVD such as IHD,46 based on the combination of classic risk factors and genotype information.
This approach has mainly been used in large population data sets to identify subpopulations of
individuals at increased risk for the analyzed trait.46,47 A genetic profile in a disease model may
be superior to a single measurement of risk factors if the included functional variants lead to a
life-time exposure to the affected condition.48 Following the assumption that CV risk factors
have diverse and interdependent effects in individuals with a plurality of unknown parameters
and variables, we applied 15 different machine learning algorithms to datasets of HF patients
treated with CRT to discriminate RR- from RR+ individuals and included combinations of
phenotypic risk factors and genetic information. We identified the PART algorithm as
appropriate for the generation of efficient and interpretable rules in this series.
Rule deduction and patient classification using PART
Using PART, rules of lower complexity with a maximum of ten variables were generated, which
could be applied to a sufficient number of patients with adequate accuracy and transparency. The
method of rule induction generates a set of “if(combined)-then” rules that can be used to
discover interesting patterns in the data set (knowledge extraction) or, as a classification rule, to
predict the outcome of subjects. PART generated rules with up to 100% accuracy using each of
the five classifiers “Clinical”, “Genotypes”, “Alleles”, “Clinical & Genotypes” and “Clinical &
Alleles”. Although these rules were generated for computational classification of CRT patients
and may be too complex for an individual straightforward analysis, an interpretation of some
patients correctly (93.75% accuracy) as RR+, which translates into the finding that younger
female patients respond well to CRT, consistent with common clinical observations. Absence of
type 2 diabetes mellitus was not a high-accuracy classifier in our model, even when combined
with other clinical parameters (<91.7% accuracy). In combination with the allele information on
rs5443, the rule [diabetes = No AND rs5443 = T AND LVEDV > 197] exceeded 96% accuracy,
pointing towards a protective role of the GNB3 rs5443 T allele in this setting. Female gender and
the minor T allele of GNB3 rs5443 were also associated with CRT success by multivariable
logistic regression analysis.
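To make the if-then rule quoted above concrete, the sketch below applies [diabetes = No AND rs5443 = T AND LVEDV > 197] to patient records and reports how many patients the rule covers and how often it is right. Several points are assumptions for illustration: the field names, the encoding of rs5443 as a pair of alleles, the reading of the rule as predicting RR+, and the five patient records, which are hypothetical and not study data. LVEDV is taken in ml, as in Table 1.

```python
def rule_predicts_rr_plus(patient):
    """Rule from the text: diabetes = No AND rs5443 = T AND LVEDV > 197."""
    return (
        patient["diabetes"] == "No"
        and "T" in patient["rs5443_alleles"]  # assumed encoding: pair of alleles
        and patient["lvedv_ml"] > 197
    )


def rule_coverage_and_accuracy(patients):
    """Return (number of patients covered, fraction of those who are RR+)."""
    covered = [p for p in patients if rule_predicts_rr_plus(p)]
    if not covered:
        return 0, float("nan")
    correct = sum(1 for p in covered if p["outcome"] == "RR+")
    return len(covered), correct / len(covered)


# Hypothetical records, for illustration only.
hypothetical_patients = [
    {"diabetes": "No", "rs5443_alleles": ("C", "T"), "lvedv_ml": 230, "outcome": "RR+"},
    {"diabetes": "No", "rs5443_alleles": ("T", "T"), "lvedv_ml": 250, "outcome": "RR+"},
    {"diabetes": "No", "rs5443_alleles": ("C", "T"), "lvedv_ml": 210, "outcome": "RR-"},
    {"diabetes": "Yes", "rs5443_alleles": ("C", "T"), "lvedv_ml": 300, "outcome": "RR-"},
    {"diabetes": "No", "rs5443_alleles": ("C", "C"), "lvedv_ml": 220, "outcome": "RR+"},
]

n_covered, accuracy = rule_coverage_and_accuracy(hypothetical_patients)
print(f"covered={n_covered}, rule accuracy={accuracy:.3f}")
```

The accuracy of a single rule is computed only over the patients it covers, so a rule with very high accuracy can still apply to a small share of the cohort; coverage and accuracy are best read together.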
Study limitations
The population studied in this investigation, although phenotypically well characterized, was
retrospectively enrolled; consequently, the timing of follow-up echocardiography to define RR was
not fixed but ranged from 6 to 12 months. Variability in timing of echocardiographic assessment
is widely accepted in clinical trials of CRT, where a range of 45 days around the scheduled
follow-up is generally used, and is probably unavoidable in “real world designs” such as in our
study. However, although RR is known to occur even later than the first year,13 longer follow-up
is also likely to include intercurrent events unrelated to pump failure that may halt or invert an
established favorable remodelling. Therefore, the median distance of 9 months observed in our
series represents an appropriate time point. The study was relatively small and potentially not
adequately powered to detect all genotype/phenotype interactions, and not all genetic variants
potentially associated with RR status after CRT were included in the analysis. All our
patients were Caucasian, so the genetic findings may not generalize to other ancestries. As the
dataset was relatively small, the results obtained by multivariable logistic regression analysis
may be of limited accuracy. The current study should therefore be considered as a pilot study
that could be the basis for a larger and prospective study.
Although the sample was balanced across many clinical confounders, additional
parameters may be missing in the investigation. In particular, the groups were not matched for
QRS morphology, an important predictor of CRT response, alone and in conjunction with a QRS
17,18,49 However, less than 10% of our patients had neither LBB nor a QRS
The current models
may present some features of so-called model overtraining. This effect is mainly marked by
accuracy values approximating 100% and results from data overfitting. Testing sensitivity and
specificity in an additional and independent data set will be needed to prove broad practicability
of the model. The model might perform less accurately when used on a data set containing specific
records that were not included in the original data set.
Conclusion
Our data mining approach has identified combinations of different factors including genetic
variants with impact on HF treatment outcomes, pointing to so far unknown underlying
biological mechanisms. These findings underscore that an effective and efficient model for HF
has to be based on a multi-parameter model, including numerous known potential modifiers, to
meet the needs for the high complexity of the disease.
As any treatment of disease has certain risks and costs, there will always be treatment risk
thresholds.50 Current clinical decision-making in HF patients is based on well-established
conventional measures and treatment is recommended if the individual risk is acceptable, even if
treatment success is not fully predictable. Our study on CRT response in HF patients may help to
guide appropriate therapy and improve clinical outcomes, at least in otherwise uncertain cases
since it provides additional individual risk information.
Funding Sources: This study was supported by the European Union, FP7-ICT-2007-2, project
number 224635, “VPH2-Virtual Pathological Heart of the Virtual Physiological Human”. EB is
supported by a Heisenberg professorship from the Deutsche Forschungsgemeinschaft (Br1589/8-
2).
Conflict of Interest Disclosures: None
References:
1. Chan IS, Ginsburg GS. Personalized medicine: progress and promise. Annu Rev Genomics Hum Genet. 2011;12:217-244.
2. Thanassoulis G, Vasan RS. Genetic cardiovascular risk prediction: will we get there? Circulation. 2010;122:2323-2334.
3. Marenberg ME, Risch N, Berkman LF, Floderus B, de Faire U. Genetic susceptibility to death from coronary heart disease in a study of twins. N Engl J Med. 1994;330:1041-1046.
4. Brand-Herrmann SM. Where do we go for atherothrombotic disease genetics? Stroke. 2008;39:1070-1075.
5. Yusuf S, Hawken S, Ounpuu S, Dans T, Avezum A, Lanas F, et al. INTERHEART Study Investigators. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. Lancet. 2004;364:937-952.
6. Khot UN, Khot MB, Bajzer CT, Sapp SK, Ohman EM, Brener SJ, et al. Prevalence of conventional risk factors in patients with coronary heart disease. JAMA. 2003;290:898-904.
7. Humphries SE, Drenos F, Ken-Dror G, Talmud PJ. Coronary heart disease risk prediction in the era of genome-wide association studies: current status and what the future holds. Circulation. 2010;121:2235-2248.
8. Gatsios D, Garofalakis J, Chrysanthakopoulou T, Tripoliti E, De Maria R, Franzosi MG, et al. Knowledge extraction in a population suffering from heart failure. ITAB. 2010;1-6.
9. Holzmeister J, Leclercq C. Implantable cardioverter defibrillators and cardiac resynchronisation therapy. Lancet. 2011;378:722-730.
10. Birnie DH, Tang ASL. The problem of non-response to cardiac resynchronization therapy. Curr Opin Cardiol. 2006;21:20-26.
11. Fornwalt BK, Sprague WW, BeDell P, Suever JD, Gerritse B, Merlino JD, et al. Agreement is poor among current criteria used to define response to cardiac resynchronization therapy. Circulation. 2010;121:1985-1991.
12. St John Sutton MG, Plappert T, Abraham WT, Smith AL, DeLurgio DB, Leon AR, et al, Multicenter In-Sync Randomized Clinical Evaluation (MIRACLE) Study Group. Effect of cardiac resynchronization therapy on left ventricular size and function in chronic heart failure. Circulation. 2003;107:1985-1990.
13. Ghio S, Freemantle N, Scelsi L, Serio A, Magrini G, Pasotti M, et al. Long-term left ventricular reverse remodeling with cardiac resynchronization therapy: results from the CARE-HF trial. Eur J Heart Fail. 2009;11:480-488.
14. Yu CM, Bleeker GB, Fung JW, Schalij MJ, Zhang Q, van der Wall EE, et al. Left ventricular reverse remodeling but not clinical improvement predicts long-term survival after cardiac resynchronization therapy. Circulation. 2005;112:1580-1586.
15. Ypenburg C, van Bommel RJ, Borleffs CJ, Bleeker GB, Boersma E, Schalij MJ, et al. Long-term prognosis after cardiac resynchronization therapy is related to the extent of left ventricular reverse remodeling at midterm follow-up. J Am Coll Cardiol. 2009;53:483-490.
16. Foley PW, Chalil S, Khadjooi K, Irwin N, Smith RE, Leyva F. Left ventricular reverse remodeling, long-term clinical outcome, and mode of death after cardiac resynchronization therapy. Eur J Heart Fail. 2011;13:43-51.
17. Yu CM, Hayes DL. Cardiac resynchronization therapy: state of the art 2013. Eur Heart J. 2013;34:1396-1403.
18. van Bommel RJ, Bax JJ, Abraham WT, Chung ES, Pires LA, Tavazzi L, et al. Characteristics of heart failure patients associated with good and poor response to cardiac resynchronization therapy: a PROSPECT (Predictors of Response to CRT) sub-analysis. Eur Heart J. 2009;30:2470-2477.
19. Wikstrom G, Blomström-Lundqvist C, Andren B, Lönnerholm S, Blomström P, Freemantle N, et al. The effects of aetiology on outcome in patients treated with cardiac resynchronization therapy in the CARE-HF trial. Eur Heart J. 2009;30:782-788.
20. Adelstein EC, Tanaka H, Soman P, Miske G, Haberman SC, Saba SF, et al. Impact of scar burden by single-photon emission computed tomography myocardial perfusion imaging on patient outcomes following cardiac resynchronization therapy. Eur Heart J. 2011;32:93-103.
21. Linde C, Abraham WT, Gold MR, Daubert C; REVERSE Study Group. Cardiac resynchronization therapy in asymptomatic or mildly symptomatic heart failure patients in relation to etiology: results from the REVERSE (REsynchronization reVErses remodelling in Systolic Left vEntricular Dysfunction) study. J Am Coll Cardiol. 2010;56:1826-1831.
22. De Maria R, Landolina M, Gasparini M, Schmitz B, Campolo J, Parolini M, et al. Genetic variants of the renin-angiotensin-aldosterone system and reverse remodeling after cardiac resynchronization therapy. J Card Fail. 2012;18:762-768.
23. Dickstein K, Vardas PE, Auricchio A, Daubert JC, Linde C, McMurray J, et al. 2010 Focused update of ESC Guidelines on device therapy in heart failure. Eur Heart J. 2010;31:2677-2687.
24. Rudski LG, Lai WW, Afilalo J, Hua L, Handschumacher MD, Chandrasekaran K, et al. Guidelines for the echocardiographic assessment of the right heart in adults: a report from the American Society of Echocardiography endorsed by the European Association of Echocardiography, a registered branch of the European Society of Cardiology, and the Canadian Society of Echocardiography. J Am Soc Echocardiogr. 2010;23:685-713.
25. Schmitz B, Nedele J, Guske K, Maase M, Lenders M, Schelleckes M, et al. Soluble Adenylyl Cyclase in Vascular Endothelium: Gene Expression Control of Epithelial Sodium Channel-Na+/K+-ATPase- Hypertension. 2014;[Epub ahead of print].
26. Siffert W, Rosskopf D, Siffert G, Busch S, Moritz A, Erbel R, et al. Association of a human G-protein beta3 subunit variant with hypertension. Nat Genet. 1998;18:45-48.
27. Dobrev D, Wettwer E, Himmel HM, Kortner A, Kuhlisch E, Schuler S, et al. G-Protein beta(3)-subunit 825T allele is associated with enhanced human atrial inward rectifier potassium currents. Circulation. 2000;102:692-697.
28. Wenzel RR, Siffert W, Bruck H, Philipp T, Schäfers RF. Enhanced vasoconstriction to endothelin-1, angiotensin II and noradrenaline in carriers of the GNB3 825T allele in the skin microcirculation. Pharmacogenetics. 2002;12:489-495.
29. Smith JG, Avery CL, Evans DS, Nalls MA, Meng YA, Smith EN, et al. Impact of ancestry and common genetic variants on QT interval in African Americans. Circ Cardiovasc Genet. 2012;5:647-655.
30. Schwinger RH, Bundgaard H, Müller-Ehmsen J, Kjeldsen K. The Na, K-ATPase in the failing human heart. Cardiovasc Res. 2003;57:913-920.
31. Newton-Cheh C, Eijgelsheim M, Rice KM, de Bakker PI, Yin X, Estrada K, et al. Common variants at ten loci influence QT interval duration in the QTGEN Study. Nat Genet. 2009;41:399-406.
32. Ueland T, Yndestad A, Øie E, Florholmen G, Halvorsen B, Frøland SS, et al. Dysregulated osteoprotegerin/RANK ligand/RANK axis in clinical and experimental heart failure. Circulation. 2005;111:2461-2468.
33. Røysland R, Masson S, Omland T, Milani V, Bjerre M, Flyvbjerg A, et al. Prognostic value of osteoprotegerin in chronic heart failure: The GISSI-HF trial. Am Heart J. 2010;160:286-293.
34. Ueland T, Dahl CP, Kjekshus J, Hulthe J, Böhm M, Mach F, et al. Osteoprotegerin predicts progression of chronic heart failure: results from CORONA. Circ Heart Fail. 2011;4:145-152.
35. Kusche-Vihrog K, Callies C, Fels J, Oberleithner H. The epithelial sodium channel (ENaC): Mediator of the aldosterone response in the vascular endothelium? Steroids. 2010;75:544-549.
36. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Stat. 2001;29:1165-1188.
37. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837-845.
38. Quinlan RJ. C4.5: programs for machine learning. San Francisco, CA: Morgan Kaufmann; 1993.
39. Cohen W. Fast effective rule induction. In Morgan Kaufmann. 1995;115-123.
40. Murphy MK, Black NA, Lamping DL, McKee CM, Sanderson CF, Askham J, et al. Consensus development methods, and their use in clinical guideline development. Health Technol Assess. 1998;2:1-88.
41. Frank E, Witten IH. Generating Accurate Rule Sets Without Global Optimization. Machine Learning: Proceedings of the Fifteenth International Conference, Morgan Kaufmann Publishers, San Francisco. 1998;144-151.
42. Derval N, Steendijk P, Gula LJ, Deplagne A, Laborderie J, Sacher F, et al. Optimizing hemodynamics in heart failure patients by systematic screening of left ventricular pacing sites: the lateral left ventricular wall and the coronary sinus are rarely the best sites. J Am Coll Cardiol. 2010;55:566-575.
43. Mullens W, Grimm RA, Verga T, Dresing T, Starling RC, Wilkoff BL, et al. Insights from a cardiac resynchronization optimization clinic as part of a heart failure disease management program. J Am Coll Cardiol. 2009;53:765-773.
44. Bertini M, Delgado V, Bax JJ, Van de Veire NR. Why, how and when do we need to optimize the setting of cardiac resynchronization therapy? Europace. 2009;Suppl5:v46-57.
45. Höke U, Thijssen J, van Bommel RJ, van Erven L, van der Velde ET, Holman ER, et al. Influence of diabetes on left ventricular systolic and diastolic function and on long-term outcome after cardiac resynchronization therapy. Diabetes Care. 2013;36:985-991.
46. Stengård JH, Dyson G, Frikke-Schmidt R, Tybjærg-Hansen A, Nordestgaard BG, Sing CF. Context-dependent associations between variation in risk of ischemic heart disease and variation in the 5’ promoter region of the Apolipoprotein E gene in Danish women. Circ Cardiovasc Interv. 2010;3:22-30.
47. Austin PC, Tu JV, Ho JE, Levy D, Lee DS. Using methods from the data-mining and machine-learning literature for disease classification and prediction: a case study examining classification of heart failure subtypes. J Clin Epidemiol. 2013;66:398-407.
48. Kathiresan S, Melander O, Anevski D, Guiducci C, Burtt NP, Roos C, et al. Polymorphisms associated with cholesterol and risk of cardiovascular events. N Engl J Med. 2008;358:1240-1249.
49. Stavrakis S, Lazzara R, Thadani U. The benefit of cardiac resynchronization therapy and QRS duration: a meta-analysis. J Cardiovasc Electrophysiol. 2012;23:163-168.
50. Hlatky MA, Greenland P, Arnett DK, Ballantyne CM, Criqui MH, Elkind MS, et al. Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association. Circulation. 2009;119:2408-2416.
Table 1: Baseline characteristics of the study population
Values are expressed as n (frequency percent) or median [interquartile range]. P-values for categorical variables were calculated by chi-square or Fisher’s exact test, p-values for non-categorical variables were calculated by Student’s t- or Mann-Whitney test. IHD, ischaemic heart disease; IDC, idiopathic dilated cardiomyopathy; VALV, valvular defect; LVEF, left ventricular ejection fraction; LVEDV, left ventricular end diastolic volume; LVESV, left ventricular end systolic volume; MI, myocardial infarction; RAS, renin-angiotensin system; RR, reverse remodelling.
Characteristic All (n=156) RR+ (n=80) RR- (n=76) P-value
Anthropometry
Gender (male) 136 (87%) 67 (84%) 69 (91%) 0.234
Age (years) 62 [56-70] 64 [57-71] 61 [56-70] 0.681
Type 2 diabetes mellitus 27 (17%) 9 (11%) 18 (24%) 0.057
History of hypertension 43 (27%) 21 (28%) 22 (31%) 0.717
Previous MI 63 (41%) 30 (39%) 33 (44%) 0.514
Atrial fibrillation 25 (16%) 11 (14%) 14 (18%) 0.514
Aetiology 0.157
IHD 79 (51%) 39 (49%) 40 (53%) -
IDC 66 (42%) 38 (47%) 28 (37%) -
VALV 11 (7%) 3 (4%) 8 (10%) -
Medications
Beta-blockers 126 (82%) 63 (82%) 63 (83%) 1.000
RAS inhibitors 149 (96%) 74 (94%) 75 (99%) 0.210
Aldosterone antagonists 97 (64%) 46 (61%) 51 (67%) 0.500
Echocardiography
NYHA class II (vs III-IV) 45 (29%) 22 (28%) 23 (30%) 0.727
LVEF (%) 27 [22-30] 27 [22-30] 27 [23-30] 0.665
LVEDV (ml) 227 [190-310] 230 [200-330] 227 [174-295] 0.253
LVESV (ml) 170 [135-231] 178 [140-240] 164 [121-222] 0.253
QRS duration (msec) 160 [140-180] 169 [150-188] 160 [140-180] 0.163
Follow-up (months) 9 [7-12] 10 [7-12] 9 [7-12] 0.879
Table 2: Genotype and allele frequencies
Gene SNP Minor allele Alleles/genotype RR+ (n=80) RR- (n=76) P-value (allele) Genotype comparison P-value (genotype)
SCNN1A rs3759324 C T/C 125/33 129/21 0.134 CT+CC vs. TT 0.123
CC 2 (2%) 1 (1%)
TT 48 (61%) 55 (73%)
CT 29 (37%) 19 (25%)
SCNN1G rs5723 G C/G 158/2 138/12 0.005 GG+CG vs. CC 0.057
CC 79 (99%) 69 (92%)
GG 1 (1%) 6 (8%)
CG 0 (0%) 0 (0%)
ATP1B1 rs3766031 T C/T 131/29 138/12 0.011 TT+CT vs. CC 0.005
CC 52 (65%) 64 (85%)
TT 1 (1%) 1 (1%)
CT 27 (34%) 10 (13%)
GNB3 rs5443 T C/T 93/67 110/38 0.004 TT+CT vs. CC 0.006
CC 26 (33%) 41 (55%)
TT 13 (16%) 5 (7%)
CT 41 (51%) 28 (38%)
TNFSF11 rs7325635 A G/A 108/52 83/67 0.035 AA+AG vs. GG 0.031
GG 36 (45%) 21 (28%)
AA 8 (10%) 13 (17%)
AG 36 (45%) 41 (55%)
NR3C2 rs5522 C T/C 150/10 126/24 0.006 CT+CC vs. TT 0.014
TT 71 (89%) 54 (72%)
CC 1 (1%) 3 (4%)
CT 8 (10%) 18 (24%)
Values are expressed as n (frequency percent). P-values for categorical variables were calculated by chi-square or Fisher’s exact test. SCNN1A, epithelial sodium channel alpha subunit; SCNN1G, epithelial sodium channel gamma subunit; ATP1B1, sodium/potassium-transporting ATPase subunit beta-1; GNB3, guanine nucleotide binding protein (G protein), beta polypeptide 3; TNFSF11, tumor necrosis factor (ligand) superfamily, member 11 (RANKL); NR3C2, mineralocorticoid receptor. Underlined p-values mark associations which remained significant after correction for multiple testing (clinical and genetic variants comparisons combined) by the Benjamini and Yekutieli false discovery rate method (p ).
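The allele-level comparison reported in Table 2 can be illustrated with a short sketch. The helper below is hypothetical and is not the authors' analysis code; it computes a chi-square test with one degree of freedom for a 2x2 table of allele counts. The Yates continuity correction is an assumption, since the legend does not state whether a correction was applied or whether Fisher's exact test was used for a given SNP.

```python
import math

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test (1 df) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). The Yates correction is optional and is
    an assumption here; the source does not say whether it was used.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2.0, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * diff ** 2 / denom
    # For 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# GNB3 rs5443 allele counts from Table 2: C/T = 93/67 (RR+) and 110/38 (RR-).
stat, p = chi_square_2x2(93, 67, 110, 38)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

The printed p-value can be compared with the allele p-value tabulated for GNB3; the same call works for any other row by substituting its allele counts.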
Table 3: Multivariate logistic regression analysis of genotypes associated with RR+
Variable P-value Odds ratio 95% confidence interval
Age 0.950 0.999 0.957-1.042
Female gender 0.048 3.855 1.010-14.721
LVEF 0.402 1.037 0.952-1.130
LVEDV 0.109 1.005 0.999-1.011
Atrial fibrillation 0.781 1.182 0.365-3.832
NYHA class II vs. III-IV 0.698 1.176 0.518-2.673
Type 2 diabetes mellitus 0.006 0.227 0.078-0.660
Ischemic aetiology (reference) 0.043
Idiopathic dilated cardiomyopathy 0.898 0.945 0.395-2.259
Valvular heart disease 0.017 0.109 0.018-0.675
GNB3 (TT+CT vs. CC) 0.004 3.155 1.434-6.941
ATP1B1 (TT+CT vs. CC) 0.024 2.853 1.149-7.084
TNFSF11 (AA+AG vs. GG) 0.051 0.436 0.189-1.005
NR3C2 (CC+CT vs. TT) 0.022 0.320 0.120-0.851
LVEF, left ventricular ejection fraction; LVEDV, left ventricular end diastolic volume; ATP1B1, Sodium/potassium-transporting ATPase subunit beta-1; GNB3, guanine nucleotide binding protein (G protein), beta polypeptide 3; TNFSF11, tumor necrosis factor (ligand) superfamily, member 11 (RANKL); NR3C2, mineralocorticoid receptor.
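How the three columns of Table 3 relate to one another can be sketched as follows. This is a minimal illustration, assuming the reported intervals are Wald intervals that are symmetric on the log-odds scale (the source does not state the interval type); `wald_from_ci` is a hypothetical helper name.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-odds coefficient, its standard error and a
    two-sided Wald p-value from an odds ratio and its 95% interval."""
    beta = math.log(odds_ratio)
    # Width of the interval on the log scale is 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value

# GNB3 (TT+CT vs. CC) row of Table 3: OR 3.155, 95% CI 1.434-6.941.
beta, se, p = wald_from_ci(3.155, 1.434, 6.941)
print(f"beta = {beta:.3f}, SE = {se:.3f}, p = {p:.4f}")
```

Under that assumption the recovered p-value should be close to the tabulated one; small differences are expected from rounding of the published odds ratio and interval limits.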
Table 4: Specificity, sensitivity and accuracy* of the applied machine learning algorithms
*Accuracy results are based on the 10-fold cross-validation approach except for Decision Table and Voting Feature Intervals in which the “Leave One Out” method was used.
Dataset “Clinical” “Genotypes” “Alleles” “Clinical & Genotypes” “Clinical & Alleles”
Method specificity sensitivity accuracy specificity sensitivity accuracy specificity sensitivity accuracy specificity sensitivity accuracy specificity sensitivity accuracy
Bayes Network 49.00% 70.09% 59.90% 75.00% 53.95% 64.74% 75.00% 55.26% 65.38% 68.75% 76.32% 72.44% 73.75% 67.11% 70.51%
Naive Bayes 62.00% 58.88% 60.39% 75.00% 55.26% 65.38% 75.00% 57.89% 66.67% 76.25% 76.32% 76.28% 77.50% 75.00% 76.28%
Multilayer Perceptron 85.00% 87.85% 86.47% 58.75% 93.42% 75.64% 51.25% 96.05% 73.08% 98.75% 100.00% 99.36% 98.75% 98.68% 98.72%
RBF Network 51.00% 67.29% 59.42% 67.50% 65.79% 66.67% 63.75% 72.37% 67.95% 77.50% 69.74% 73.72% 77.50% 69.74% 73.72%
K Nearest Neighbors 98.00% 97.20% 97.58% 62.50% 89.47% 75.64% 62.50% 89.47% 75.64% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
HyperPipes 100.00% 5.61% 51.21% 100.00% 0.00% 51.28% 100.00% 0.00% 51.28% 100.00% 5.26% 53.85% 100.00% 5.26% 53.85%
Voting Feature Intervals 74.00% 50.47% 61.84% 73.75% 56.58% 65.38% 75.00% 56.58% 66.03% 73.75% 76.32% 75.00% 77.50% 69.74% 73.72%
Decision Table 34.00% 84.11% 59.90% 57.50% 75.00% 66.03% 61.25% 68.42% 64.74% 67.50% 65.79% 66.67% 68.75% 67.11% 67.95%
Decision Table Naive Bayes Combination 65.00% 63.55% 64.25% 52.50% 85.53% 68.59% 53.75% 81.58% 67.31% 77.50% 77.63% 77.56% 76.25% 73.68% 75.00%
RIPPER 44.00% 73.83% 59.42% 62.50% 65.79% 64.10% 68.75% 63.16% 66.03% 73.75% 47.37% 60.90% 67.50% 53.95% 60.90%
Non Nested Generalised Exemplars 100.00% 98.13% 99.03% 63.75% 75.00% 69.23% 63.75% 75.00% 69.23% 100.00% 98.68% 99.36% 100.00% 100.00% 100.00%
PART 69.00% 90.65% 80.19% 53.75% 89.47% 71.15% 50.00% 93.42% 71.15% 87.50% 81.58% 84.62% 83.75% 97.37% 90.38%
C4.5 66.00% 89.72% 78.26% 60.00% 84.21% 71.79% 61.25% 78.95% 69.87% 87.50% 85.53% 86.54% 77.50% 88.16% 82.69%
Random Forest 99.00% 100.00% 99.52% 57.50% 94.74% 75.64% 57.50% 94.74% 75.64% 98.75% 100.00% 99.36% 100.00% 98.68% 99.36%
Random Tree 100.00% 100.00% 100.00% 60.00% 92.11% 75.64% 60.00% 92.11% 75.64% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
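The three measures in Table 4 are linked by the class sizes, which the following sketch makes explicit. It is an illustration only: the function name is hypothetical, the class sizes (100 RR- and 107 RR+ patients) are taken from Supplemental table 2, and pairing specificity with the RR- class is an assumption.

```python
def accuracy_from_rates(specificity, sensitivity, n_negative, n_positive):
    """Overall accuracy implied by class-wise rates and class sizes."""
    true_negatives = specificity * n_negative
    true_positives = sensitivity * n_positive
    return (true_negatives + true_positives) / (n_negative + n_positive)

# Bayes Network on the "Clinical" data set (Table 4): specificity 49.00%,
# sensitivity 70.09%. Class sizes come from Supplemental table 2; which
# class counts as "negative" is an assumption of this sketch.
acc = accuracy_from_rates(0.49, 0.7009, n_negative=100, n_positive=107)
print(f"accuracy = {acc:.2%}")
```

Because accuracy is a size-weighted average of the two rates, a method such as HyperPipes can reach roughly 50% accuracy while detecting almost none of one class.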
Table 5: Rules for CRT patient classification generated by the PART algorithm
Rule Class Patients Correct Wrong Accuracy
Based on clinical parameters
LVEF <= 30 AND aetiology = IHD RR- 10 8 2 80.00%
aetiology = VALV AND Age > 45 RR- 10 8 2 80.00%
NYHA = 3 AND aetiology = IHD AND LVESV <= 220 RR- 21 18 3 85.71%
aetiology = IDC AND sex = F AND LVESV <= 156 RR- 7 6 1 85.71%
aetiology = IDC AND diabetes = Yes RR- 8 7 1 87.50%
NYHA = 2 AND chronic_AF = Yes RR- 9 8 1 88.89%
sex = M AND aetiology = IDC AND NYHA = 3 RR- 10 9 1 90.00%
aetiology = IHD AND diabetes = No AND NYHA = 3 AND LVESV > 125 AND LVESV <= 220 AND LVEF > 25 AND LVEDV <= 268 RR+ 12 11 1 91.67%
sex = F AND Age <= 63 RR+ 16 15 1 93.75%
diabetes = Yes AND NYHA = 3 AND aetiology = IHD AND LVEDV <= 230 RR- 9 9 0 100.00%
aetiology = IHD AND sex = M AND NYHA = 2 AND sustained_VA = Yes AND LVESV <= 150 RR- 5 5 0 100.00%
Based on alleles
rs5522 = T AND rs3766031 = T AND rs7325635 = G RR+ 21 17 4 80.95%
rs5723 = G RR- 7 6 1 85.71%
rs5723 = C AND rs5443 = T AND rs7325635 = G RR+ 24 23 1 95.83%
Based on genotypes
rs3766031 = CT AND rs5522 = TT RR+ 31 25 6 80.65%
rs5723 = CC AND rs5443 = TT RR+ 11 9 2 81.82%
rs5723 = CC AND rs5443 = TT AND rs5522 = TT RR+ 10 9 1 90.00%
rs5723 = CC AND rs7325635 = GG AND rs5522 = TT RR+ 10 9 1 90.00%
rs3766031 = CT AND rs5522 = TT AND rs7325635 = GG RR+ 13 12 1 92.31%
rs5723 = GG RR- 6 6 0 100.00%
Based on clinical parameters and alleles
diabetes = No AND aetiology = IDC AND rs7325635 = G AND sex = M AND rs5443 = C AND LVEDV > 262 RR- 5 4 1 80.00%
LVESV <= 266 AND sex = M AND aetiology = IHD AND rs5443 = T RR- 7 6 1 85.71%
rs5723 = C AND rs5522 = C AND LVEF <= 34 AND NYHA = 3 RR- 17 15 2 88.24%
rs5723 = C AND diabetes = Yes AND sex = M AND LVEF > 15 RR- 12 11 1 91.67%
rs5723 = C AND diabetes = No AND rs7325635 = G AND aetiology = IDC RR+ 15 14 1 93.33%
diabetes = No AND rs5443 = T AND LVEDV > 197 RR+ 26 25 1 96.15%
Based on clinical parameters and genotypes
LVEF <= 31 AND aetiology = IDC AND diabetes = No AND rs7325635 = GA RR- 10 8 2 80.00%
rs5723 = CC AND rs5443 = CT AND diabetes = No AND LVEDV > 190 RR+ 37 30 7 81.08%
rs5723 = CC AND NYHA = 3 AND rs5522 = CT RR- 16 13 3 81.25%
rs5723 = CC AND rs3766031 = CC AND rs5522 = TT AND sex = M AND diabetes = Yes RR- 10 9 1 90.00%
rs7325635 = AA AND NYHA = 3 RR- 4 4 0 100.00%
NYHA = 2 AND rs5522 = TT RR- 8 8 0 100.00%
The table presents selected rules based on different classifiers using the PART algorithm. Only rules with accuracy
LVEF, left ventricular ejection fraction; LVEDV, left ventricular end diastolic volume; LVESV, left ventricular end systolic volume; RR, reverse remodelling, VA, ventricular arrhythmias, AF, atrial fibrillation.
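Rules of the kind listed in Table 5 can be applied to an individual patient as sketched below. This is an illustration under stated assumptions: only three genotype-based rules are copied from the table, the dictionary keys are hypothetical field names, and evaluating the rules in a fixed order with the first match deciding the class is an assumption, because the table shows selected rules rather than the complete ordered rule list.

```python
# A few genotype-based rules copied from Table 5. Each entry pairs a
# condition on a patient record with the class it predicts. The order
# chosen here is arbitrary (an assumption of this sketch).
RULES = [
    (lambda p: p["rs5723"] == "GG", "RR-"),
    (lambda p: p["rs3766031"] == "CT" and p["rs5522"] == "TT"
               and p["rs7325635"] == "GG", "RR+"),
    (lambda p: p["rs5723"] == "CC" and p["rs5443"] == "TT", "RR+"),
]

def classify(patient):
    """Return the class of the first matching rule, or None if none fires."""
    for condition, label in RULES:
        if condition(patient):
            return label
    return None

# Hypothetical patient record, not study data.
patient = {"rs5723": "CC", "rs5443": "TT", "rs3766031": "CC",
           "rs5522": "TT", "rs7325635": "AG"}
print(classify(patient))
```

A record that satisfies none of the listed conditions returns `None`, which reflects that each rule in the table covers only a small subset of patients.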
Figure Legends
Figure 1: Flow chart of the CRT study analysis
Figure 2: Median changes in LVEDV and LVEF for RR+ and RR- groups. Changes in LVEDV
and LVEF were compared between baseline and follow-up and presented in a box plot diagram.
Significant differences, resulting from the defined remodelling phenotypes, were found between
RR+ and RR- for volume (p<0.001) and function changes (p<0.001).
Figure 3: Receiver Operating Characteristic (ROC) curves for the patients’ clinical data alone and for
clinical and genetic data combined. Clinical data alone and clinical data combined with genetic information resulted in
two significantly different ROC curves (p=0.002). The C statistic documented the incremental
predictive value of the model combining clinical data with genetic information.
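The C statistic referred to in the legend of Figure 3 equals the area under the ROC curve and can be computed directly from predicted scores. The sketch below uses hypothetical scores, not study data, and the function name is illustrative.

```python
def c_statistic(scores_positive, scores_negative):
    """Probability that a randomly chosen positive case scores higher
    than a randomly chosen negative case (ties count one half)."""
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical predicted probabilities of reverse remodelling.
rr_plus = [0.9, 0.8, 0.4]
rr_minus = [0.7, 0.3, 0.2]
print(round(c_statistic(rr_plus, rr_minus), 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so an increase in this value when genetic information is added is what is meant by incremental predictive value.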
Identification of Genetic Markers for Treatment Success in Heart Failure Patients: Insight from Cardiac Resynchronization Therapy
Boris Schmitz, Renata DeMaria, Dimitris Gatsios, Theodora Chrysanthakopoulou, Maurizio Landolina, Maurizio Gasparini, Jonica Campolo, Marina Parolini, Antonio Sanzo, Paola Galimberti, Michele Bianchi, Malte Lenders, Eva Brand, Oberdan Parodi, Maurizio Lunati and Stefan-Martin Brand
Circ Cardiovasc Genet. published online September 10, 2014;
Circulation: Cardiovascular Genetics is published by the American Heart Association, 7272 Greenville Avenue, Dallas, TX 75231
Copyright © 2014 American Heart Association, Inc. All rights reserved. Print ISSN: 1942-325X. Online ISSN: 1942-3268
The online version of this article, along with updated information and services, is located on the World Wide Web at: http://circgenetics.ahajournals.org/content/early/2014/09/08/CIRCGENETICS.113.000384
Data Supplement (unedited) at: http://circgenetics.ahajournals.org/content/suppl/2014/09/10/CIRCGENETICS.113.000384.DC1
SUPPLEMENTAL MATERIAL
Supplemental Methods
Genotyping PCR conditions
TaqMan SNP genotyping assays were performed on the real-time PCR System ABI7900 (Life
Technologies Corporation, Carlsbad, USA) in a 384 well format (2.5 μl TaqMan Genotyping
Master Mix [2x], 0.125 μl TaqMan SNP Genotyping Assay [40x], 2.375 μl DNase free water
and 2 ng DNA). Real-time PCR conditions were as follows: initial denaturation at 95°C for 10
min; 40 cycles of 95°C for 15 sec and 60°C for 1 min.
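The per-well volumes above can be checked and scaled to a full plate with a few lines of arithmetic. This is a sketch under stated assumptions: the 2 ng of DNA is taken to add no volume to the reaction, and the 10% pipetting overage is a hypothetical value that is not given in the protocol.

```python
# Per-well volumes in microlitres, as listed in the genotyping protocol.
PER_WELL_UL = {
    "TaqMan Genotyping Master Mix (2x)": 2.5,
    "TaqMan SNP Genotyping Assay (40x)": 0.125,
    "DNase-free water": 2.375,
}
# Stock concentrations in "x" for the two concentrated reagents.
STOCK_FOLD = {
    "TaqMan Genotyping Master Mix (2x)": 2.0,
    "TaqMan SNP Genotyping Assay (40x)": 40.0,
}

def reaction_summary(n_wells, overage=0.10):
    """Return the per-well total volume, the final reagent concentrations
    (in 'x') and pooled volumes for n_wells plus a pipetting overage.
    The overage default is hypothetical, and DNA volume is ignored."""
    total = sum(PER_WELL_UL.values())
    final_x = {name: fold * PER_WELL_UL[name] / total
               for name, fold in STOCK_FOLD.items()}
    pooled = {name: vol * n_wells * (1.0 + overage)
              for name, vol in PER_WELL_UL.items()}
    return total, final_x, pooled

total, final_x, pooled = reaction_summary(n_wells=384)
print(total, final_x)
for name, volume in pooled.items():
    print(f"{name}: {volume:.1f} ul")
```

With the listed volumes both concentrated reagents are diluted to their 1x working concentration in the assembled reaction, which is a quick consistency check on the recipe.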
Machine learning algorithms used in the study
Within each classifier (the five data sets listed in Supplemental table 1), 15 different machine learning algorithms were applied. We used
Random Forest,1 Decision Tables,2 Bayesian Network,3,4 Naive Bayes,3,4 Multilayer
Perceptron,3,4 RBF Network,3 K Nearest Neighbors,4,5 HyperPipes,6 Voting Feature Intervals,7
Decision Table Naive Bayes Combination,8 Repeated Incremental Pruning to Produce Error
Reduction (RIPPER),9 Non Nested Generalised Exemplars (NNGE),10 PART,11 Decision Tree
Induction (C4.5)12 and Random Tree.6 The different methods were evaluated for their
specificity, sensitivity and accuracy for the detection of RR+ and RR- individuals.
Supplemental References
1. Breiman L. Random Forests. Machine Learning. 2001;45:5-32.
2. Kohavi R. The Power of Decision Tables. 8th European Conference on Machine
Learning. 1995;174-189.
3. Bishop C. Pattern Recognition and Machine Learning. 1st ed. New York: Springer; 2006.
4. Tan PN, Steinbach M, Kumar V. Introduction to Data Mining. Pearson Education Inc;
2006.
5. Aha J, Kibler D, Albert M. Instance-Based Learning Algorithms. Machine Learning.
1991;6:37-66.
6. Witten I, Frank E. Data mining: practical machine learning tools and techniques with Java
implementations. Morgan Kaufmann, 2000.
7. Demiroz G, Guvenir H. Classification by Voting Feature Intervals. Lecture Notes In
Computer Science. 1997;1224:85-92.
8. Hall M, Frank E. Combining Naive Bayes and Decision Tables. Association for the
Advancement of Artificial Intelligence 2008.
9. Cohen W. Fast effective rule induction. In Morgan Kaufmann. 1995;115-123.
10. Martin B. Instance-Based Learning: Nearest Neighbor With Generalization. Thesis to the
Department of Computer Science, University of Waikato, Hamilton, New Zealand; 1995.
11. Frank E, Witten IH. Generating Accurate Rule Sets Without Global Optimization.
1998;144-151.
12. Quinlan RJ. C4.5: programs for machine learning. San Francisco, CA: Morgan Kaufmann;
1993.
Supplemental table 1: List of independent classifiers used for the classification of heart failure patients into CRT responders and non-responders
“Clinical” “Genotypes” “Alleles” “Clinical & Genotypes” “Clinical & Alleles”
sex rs5443 (CC/TT/CT) rs5443 (C/T) sex sex
age rs3766031 (CC/TT/CT) rs3766031 (C/T) age age
aetiology of heart failure rs5723 (CC/GG/CG) rs5723 (C/G) aetiology of HF aetiology of HF
LVEF (Left Ventricular Ejection Fraction) rs5522 (TT/CC/CT) rs5522 (C/T) LVEF LVEF
LVESV (LV End Systolic Volume) rs7325635 (GG/AA/AG) rs7325635 (A/G) LVESV LVESV
chronic AF (Atrial Fibrillation) chronic AF chronic AF
NYHA classification NYHA classification NYHA classification
LVEDV (LV End Diastolic Volume) LVEDV LVEDV
diabetes diabetes diabetes
sustained VA (Ventricular Arrhythmias) sustained VA sustained VA
rs5443 (CC/TT/CT) rs5443 (C/T)
rs3766031 (CC/TT/CT) rs3766031 (C/T)
rs5723 (CC/GG/CG) rs5723 (C/G)
rs5522 (TT/CC/CT) rs5522 (C/T)
rs7325635 (GG/AA/AG) rs7325635 (A/G)
NYHA, New York Heart Association.
Supplemental table 2: Clinical baseline parameters of patients available for data mining
Characteristic All (n=207) RR+ (n=107) RR- (n=100) P-value
Gender (male) 174 (84%) 85 (79%) 89 (89%) 0.086
Age (years) 63 [57-70] 64 [57-69] 63 [56-70] 0.740
Type 2 diabetes mellitus 33 (16%) 12 (11%) 21 (21%) 0.060
Atrial fibrillation 35 (17%) 15 (14%) 20 (20%) 0.271
NYHA class 0.550
class II 64 (31%) 31 (29%) 33 (33%) -
class III-IV 143 (69%) 76 (71%) 67 (67%) -
Aetiology 0.169
IHD 98 (47%) 48 (45%) 50 (50%) -
IDC 94 (45%) 54 (51%) 40 (40%) -
VALV 15 (7%) 5 (5%) 10 (10%) -
Medication
Beta-blockers 158 (81%) 80 (80%) 78 (82%) 0.719
RAS inhibitors 186 (95%) 94 (94%) 92 (97%) 0.499
Echocardiography
LVEF (%) 26 [22-30] 27 [22-30] 26 [22-30] 0.979
LVEDV (ml) 224 [182-286] 225 [194-293] 223 [172-284] 0.429
LVESV (ml) 170 [131-222] 170 [135-232] 163 [121-220] 0.279
Follow-up (months) 9 [7-12] 9 [7-12] 10 [7-13] 0.246
Values are expressed as n (frequency percent) or median [interquartile range]. P-values for categorical variables
were calculated by Chi-square or Fisher’s exact test, p-values for non-categorical variables were calculated by
Student’s t- or Mann-Whitney test. IHD, ischaemic heart disease; IDC, idiopathic dilated cardiomyopathy;
VALV, valvular defect; LVEF, left ventricular ejection fraction; LVEDV, left ventricular end diastolic volume;
LVESV, left ventricular end systolic volume; MI, myocardial infarction; RAS, renin-angiotensin system; RR,
reverse remodelling.
Supplemental table 3: Parameter settings for data mining algorithms used in the study
Algorithm Parameter settings
Bayesian Network# Search method: K2 algorithm
Maximum number of parents of a node: 1
Naive Bayes* none applied
Multilayer Perceptron
Hidden Layers: (number of attributes + number of classes)/2
Learning Rate: 0.3
Bias: 0.2
Normalization: From -1 to 1
All nominal attributes were converted into binary numeric attributes. An attribute with k values was transformed into k binary attributes if the class was nominal (using the one-attribute-per-value approach)
Epochs: 500
RBF Network
Minimum Standard Deviation: 0.1
The number of clusters generated by K means: 2
Ridge value for the logistic or linear regression: 1.00E-08
K Nearest Neighbors
No Distance Weighting
Search Algorithm: Linear Search
Distance Function: Euclidean Distance
HyperPipes Bias: 0.6
Voting Feature Intervals Weight feature intervals by confidence
Cross Validation: Leave One Out
Decision Table
Evaluation of attribute combinations using: Accuracy
Search method used to find good attribute combinations: Best First; Direction: Forward; Maximum size of the lookup cache: 1; Number of backtracks: 5
Cross Validation: Leave One Out
Decision Table Naive Bayes Combination
Measure used to evaluate the performance of attribute combinations: Accuracy
Evaluation of attribute combinations using forward selection (naive Bayes)/backward elimination (decision table)
Number of folds used for pruning: 3
RIPPER
Minimum total weight of the instances in a rule: 2
Number of optimization runs: 2
Number of attempts for generalization: 5
Non Nested Generalised Exemplars
Number of folders for mutual information: 2
Confidence factor for pruning: 0.25
PART
Minimum number of instances per rule: 2
Number of folds used for pruning: 3
Confidence factor for pruning: 0.25
C4.5
Minimum number of instances per rule: 2
Number of folds used for pruning: 3
Maximum depth of the trees: Unlimited
Random Forest
Number of attributes to be used in random selection: Unlimited
Number of trees to be generated: 10
Maximum depth of the trees: Unlimited
Random Tree Number of attributes to be used in random selection: log_2(number of attributes) + 1
Minimum number of instances per rule: 1
Modeling of continuous variables: #discretization by minimization heuristic; *assuming a Gaussian distribution.
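Two of the settings in Supplemental table 3 are formulas rather than fixed values, and the sketch below evaluates them for the data sets of Supplemental table 1. It rests on several assumptions: the attribute counts are the raw counts read off that table (10 clinical variables, 5 SNPs, 15 combined) and ignore the conversion of nominal attributes into binary ones, and rounding down to an integer is assumed because the table does not state a rounding rule. Both function names are hypothetical.

```python
import math

def hidden_units(n_attributes, n_classes):
    """Hidden-layer size from (attributes + classes) / 2, rounded down
    (the rounding rule is an assumption)."""
    return (n_attributes + n_classes) // 2

def random_tree_features(n_attributes):
    """Attributes sampled per split: log_2(number of attributes) + 1,
    truncated to an integer (truncation is an assumption)."""
    return int(math.log2(n_attributes) + 1)

# Attribute counts read off Supplemental table 1; two classes (RR+, RR-).
for name, n_attr in [("Clinical", 10), ("Genotypes", 5),
                     ("Clinical & Genotypes", 15)]:
    print(name, hidden_units(n_attr, 2), random_tree_features(n_attr))
```

The point of the sketch is that both settings scale with the size of the data set, so the same parameter row yields different effective values for the clinical, genetic and combined classifiers.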
Supplemental table 4: Results of Random Forest, C4.5, PART, Decision Table, Bayes Network and Multilayer Perceptron using different parameter values in the “Clinical” data set
Dataset “Clinical”
Method specificity sensitivity accuracy
Random Forest (2 Trees) 97.00% 91.59% 94.20%
Random Forest (10 Trees) 99.00% 100.00% 99.52%
Random Forest (20 Trees) 99.00% 100.00% 99.52%
Random Forest (30 Trees) 100.00% 100.00% 100.00%
Random Forest (40 Trees) 100.00% 100.00% 100.00%
Random Forest (50 Trees) 100.00% 100.00% 100.00%
C4.5 (min number of instances/leaf: 2) 58.00% 76.64% 67.63%
C4.5 (min number of instances/leaf: 5) 58.00% 74.77% 66.67%
C4.5 (min number of instances/leaf: 10) 27.00% 89.72% 59.42%
C4.5 (min number of instances/leaf: 15) 27.00% 89.72% 59.42%
C4.5 (min number of instances/leaf: 20) 89.00% 20.56% 53.62%
PART (min number of instances/rule: 2) 71.00% 77.57% 74.40%
PART (min number of instances/rule: 5) 54.00% 76.64% 65.70%
PART (min number of instances/rule: 10) 68.00% 46.73% 57.00%
PART (min number of instances/rule: 15) 63.00% 56.07% 59.42%
PART (min number of instances/rule: 20) 0.00% 100.00% 51.69%
Decision Table (search method: BestFirst) 34.00% 84.11% 59.90%
Decision Table (search method: GreedyStepwise) 34.00% 84.11% 59.90%
Decision Table (search method: LinearForwardSelection) 34.00% 84.11% 59.90%
Decision Table (search method: RankSearch) 45.00% 76.64% 61.35%
Decision Table (search method: ScatterSearchV1) 45.00% 76.64% 61.35%
Decision Table (search method: SubsetSizeForwardSelection) 34.00% 84.11% 59.90%
Bayes Network (method for searching network structures: ICSSearchAlgorithm) 0.00% 100.00% 51.69%
Bayes Network (method for searching network structures: Naive Bayes) 49.00% 70.09% 59.90%
Bayes Network (method for searching network structures: gHillClimber) 49.00% 70.09% 59.90%
Bayes Network (method for searching network structures: gK2) 49.00% 70.09% 59.90%
Bayes Network (method for searching network structures: gRepeatedHillClimber) 49.00% 70.09% 59.90%
Bayes Network (method for searching network structures: gSimulatedAnnealing) 60.00% 71.03% 65.70%
Bayes Network (method for searching network structures: gTabuSearch) 49.00% 70.09% 59.90%
Bayes Network (method for searching network structures: lHillClimber) 0.00% 100.00% 51.69%
Bayes Network (method for searching network structures: lK2) 49.00% 70.09% 59.90%
Bayes Network (method for searching network structures: lLAGDHillClimber) 0.00% 100.00% 51.69%
Bayes Network (method for searching network structures: lRepeatedHillClimber) 0.00% 100.00% 51.69%
Bayes Network (method for searching network structures: lSimulatedAnnealing) 0.00% 100.00% 51.69%
Bayes Network (method for searching network structures: lTabuSearch) 0.00% 100.00% 51.69%
Bayes Network (method for searching network structures: lTAN) 45.00% 71.03% 58.45%
Multilayer Perceptron (1 hidden layer neurons = [number of attributes + number of classes]/2) 85.00% 87.85% 86.47%
Multilayer Perceptron (1 hidden layer 2 neurons) 39.00% 94.39% 67.63%
Multilayer Perceptron (1 hidden layer neurons = number of attributes) 82.00% 96.26% 89.37%
Multilayer Perceptron (1 hidden layer neurons = number of attributes + number of classes) 86.00% 92.52% 89.37%
Supplemental table 5: Results of Random Forest, C4.5, PART, Decision Table, Bayes Network and Multilayer Perceptron using different parameter values in the “Alleles” data set
Dataset “Alleles”
Method specificity sensitivity accuracy
Random Forest (2 Trees) 66.25% 75.00% 70.51%
Random Forest (10 Trees) 57.50% 94.74% 75.64%
Random Forest (20 Trees) 56.25% 96.05% 75.64%
Random Forest (30 Trees) 56.25% 96.05% 75.64%
Random Forest (40 Trees) 56.25% 96.05% 75.64%
Random Forest (50 Trees) 56.25% 96.05% 75.64%
C4.5 (min number of instances/leaf: 2) 43.75% 88.16% 65.38%
C4.5 (min number of instances/leaf: 5) 55.00% 85.53% 69.87%
C4.5 (min number of instances/leaf: 10) 51.25% 86.84% 68.59%
C4.5 (min number of instances/leaf: 15) 68.75% 64.47% 66.67%
C4.5 (min number of instances/leaf: 20) 68.75% 64.47% 66.67%
PART (min number of instances/rule: 2) 71.25% 57.89% 64.74%
PART (min number of instances/rule: 5) 82.50% 30.26% 57.05%
PART (min number of instances/rule: 10) 67.50% 53.95% 60.90%
PART (min number of instances/rule: 15) 67.50% 53.95% 60.90%
PART (min number of instances/rule: 20) 67.50% 53.95% 60.90%
Decision Table (search method: BestFirst) 61.25% 68.42% 64.74%
Decision Table (search method: GreedyStepwise) 61.25% 68.42% 64.74%
Decision Table (search method: LinearForwardSelection) 61.25% 68.42% 64.74%
Decision Table (search method: RankSearch) 68.75% 61.84% 65.38%
Decision Table (search method: ScatterSearchV1) 76.25% 51.32% 64.10%
Decision Table (search method: SubsetSizeForwardSelection) 58.75% 72.37% 65.38%
Bayes Network (method for searching network structures: ICSSearchAlgorithm) 51.25% 86.84% 68.59%
Bayes Network (method for searching network structures: Naive Bayes) 75.00% 55.26% 65.38%
Bayes Network (method for searching network structures: gHillClimber) 75.00% 53.95% 64.74%
Bayes Network (method for searching network structures: gK2) 75.00% 55.26% 65.38%
Bayes Network (method for searching network structures: gRepeatedHillClimber) 75.00% 53.95% 64.74%
Bayes Network (method for searching network structures: gSimulatedAnnealing) 60.00% 85.53% 72.44%
Bayes Network (method for searching network structures: gTabuSearch) 75.00% 53.95% 64.74%
Bayes Network (method for searching network structures: lHillClimber) 77.50% 50.00% 64.10%
Bayes Network (method for searching network structures: lK2) 75.00% 55.26% 65.38%
Bayes Network (method for searching network structures: lLAGDHillClimber) 68.75% 63.16% 66.03%
Bayes Network (method for searching network structures: lRepeatedHillClimber) 77.50% 50.00% 64.10%
Bayes Network (method for searching network structures: lSimulatedAnnealing) 57.50% 86.84% 71.79%
Bayes Network (method for searching network structures: lTabuSearch) 78.75% 50.00% 64.74%
Bayes Network (method for searching network structures: lTAN) 63.75% 71.05% 67.31%
Multilayer Perceptron (1 hidden layer neurons = [number of attributes + number of classes]/2) 51.25% 96.05% 73.08%
Multilayer Perceptron (1 hidden layer 2 neurons) 51.25% 93.42% 71.79%
Multilayer Perceptron (1 hidden layer neurons = number of attributes) 57.50% 94.74% 75.64%
Multilayer Perceptron (1 hidden layer neurons = number of attributes + number of classes) 57.50% 94.74% 75.64%
Supplemental table 6: Results of Random Forest, C4.5, PART, Decision Table, Bayes Network and Multilayer Perceptron using different parameter values in the “Genotypes” data set
Dataset “Genotypes”
Method specificity sensitivity accuracy
Random Forest (2 Trees) 67.50% 73.68% 70.51%
Random Forest (10 Trees) 57.50% 94.74% 75.64%
Random Forest (20 Trees) 56.25% 96.05% 75.64%
Random Forest (30 Trees) 57.50% 94.74% 75.64%
Random Forest (40 Trees) 57.50% 94.74% 75.64%
Random Forest (50 Trees) 57.50% 94.74% 75.64%
C4.5 (min number of instances/leaf: 2) 55.00% 90.79% 72.44%
C4.5 (min number of instances/leaf: 5) 55.00% 86.84% 70.51%
C4.5 (min number of instances/leaf: 10) 67.50% 65.79% 66.67%
C4.5 (min number of instances/leaf: 15) 67.50% 65.79% 66.67%
C4.5 (min number of instances/leaf: 20) 67.50% 65.79% 66.67%
PART (min number of instances/rule: 2) 63.75% 69.74% 66.67%
PART (min number of instances/rule: 5) 50.00% 84.21% 66.67%
PART (min number of instances/rule: 10) 61.25% 68.42% 64.74%
PART (min number of instances/rule: 15) 61.25% 68.42% 64.74%
PART (min number of instances/rule: 20) 67.50% 55.26% 61.54%
Decision Table (search method: BestFirst) 57.50% 75.00% 66.03%
Decision Table (search method: GreedyStepwise) 67.50% 56.58% 62.18%
Decision Table (search method: LinearForwardSelection) 57.50% 75.00% 66.03%
Decision Table (search method: RankSearch) 35.00% 82.89% 58.33%
Decision Table (search method: ScatterSearchV1) 57.50% 75.00% 66.03%
Decision Table (search method: SubsetSizeForwardSelection) 57.50% 77.63% 67.31%
Bayes Network (method for searching network structures: ICSSearchAlgorithm) 70.00% 67.11% 68.59%
Bayes Network (method for searching network structures: Naive Bayes) 75.00% 53.95% 64.74%
Bayes Network (method for searching network structures: gHillClimber) 75.00% 53.95% 64.74%
Bayes Network (method for searching network structures: gK2) 75.00% 53.95% 64.74%
Bayes Network (method for searching network structures: gRepeatedHillClimber) 75.00% 53.95% 64.74%
Bayes Network (method for searching network structures: gSimulatedAnnealing) 68.75% 78.95% 73.72%
Bayes Network (method for searching network structures: gTabuSearch) 75.00% 53.95% 64.74%
Bayes Network (method for searching network structures: lHillClimber) 73.75% 52.63% 63.46%
Bayes Network (method for searching network structures: lK2) 75.00% 53.95% 64.74%
Bayes Network (method for searching network structures: lLAGDHillClimber) 73.75% 52.63% 63.46%
Bayes Network (method for searching network structures: lRepeatedHillClimber) 73.75% 52.63% 63.46%
Bayes Network (method for searching network structures: lSimulatedAnnealing) 72.50% 61.84% 67.31%
Bayes Network (method for searching network structures: lTabuSearch) 73.75% 52.63% 63.46%
Bayes Network (method for searching network structures: lTAN) 78.75% 56.58% 67.95%
Multilayer Perceptron (1 hidden layer neurons = [number of attributes + number of classes]/2) 58.75% 93.42% 75.64%
Multilayer Perceptron (1 hidden layer 2 neurons) 47.50% 97.37% 71.79%
Multilayer Perceptron (1 hidden layer neurons = number of attributes) 56.25% 96.05% 75.64%
Multilayer Perceptron (1 hidden layer neurons = number of attributes + number of classes) 58.75% 93.42% 75.64%
8
Supplemental table 7: Results of Random Forest, C4.5, PART, Decision Table, Bayes Network and Multilayer Perceptron using different parameter values in the “Clinical & Alleles” data set
Dataset “Clinical & Alleles”
Method specificity sensitivity accuracy
Random Forest (2 Trees) 98.75% 80.26% 89.74%
Random Forest (10 Trees) 100.00% 98.68% 99.36%
Random Forest (20 Trees) 100.00% 98.68% 99.36%
Random Forest (30 Trees) 100.00% 98.68% 99.36%
Random Forest (40 Trees) 100.00% 100.00% 100.00%
Random Forest (50 Trees) 100.00% 100.00% 100.00%
C4.5 (min number of instances/leaf: 2) 73.75% 84.21% 78.85%
C4.5 (min number of instances/leaf: 5) 55.00% 88.16% 71.15%
C4.5 (min number of instances/leaf: 10) 88.75% 27.63% 58.97%
C4.5 (min number of instances/leaf: 15) 87.50% 36.84% 62.82%
C4.5 (min number of instances/leaf: 20) 87.50% 36.84% 62.82%
PART (min number of instances/rule: 2) 67.50% 75.00% 71.15%
PART (min number of instances/rule: 5) 70.00% 64.47% 67.31%
PART (min number of instances/rule: 10) 78.75% 46.05% 62.82%
PART (min number of instances/rule: 15) 82.50% 35.53% 59.62%
PART (min number of instances/rule: 20) 82.50% 35.53% 59.62%
Decision Table (search method: BestFirst) 68.75% 67.11% 67.95%
Decision Table (search method: GreedyStepwise) 68.75% 67.11% 67.95%
Decision Table (search method: LinearForwardSelection) 68.75% 67.11% 67.95%
Decision Table (search method: RankSearch) 68.75% 61.84% 65.38%
Decision Table (search method: ScatterSearchV1) 61.25% 68.42% 64.74%
Decision Table (search method: SubsetSizeForwardSelection) 68.75% 67.11% 67.95%
Bayes Network (method for searching network structures: ICSSearchAlgorithm) 68.75% 85.53% 76.92%
Bayes Network (method for searching network structures: Naive Bayes) 73.75% 67.11% 70.51%
Bayes Network (method for searching network structures: gHillClimber) 73.75% 71.05% 72.44%
Bayes Network (method for searching network structures: gK2) 73.75% 67.11% 70.51%
Bayes Network (method for searching network structures: gRepeatedHillClimber) 73.75% 71.05% 72.44%
Bayes Network (method for searching network structures: gSimulatedAnnealing) 91.25% 93.42% 92.31%
Bayes Network (method for searching network structures: gTabuSearch) 73.75% 71.05% 72.44%
Bayes Network (method for searching network structures: lHillClimber) 77.50% 60.53% 69.23%
Bayes Network (method for searching network structures: lK2) 73.75% 67.11% 70.51%
Bayes Network (method for searching network structures: lLAGDHillClimber) 77.50% 60.53% 69.23%
Bayes Network (method for searching network structures: lRepeatedHillClimber) 77.50% 60.53% 69.23%
Bayes Network (method for searching network structures: lSimulatedAnnealing) 75.00% 80.26% 77.56%
Bayes Network (method for searching network structures: lTabuSearch) 75.00% 67.11% 71.15%
Bayes Network (method for searching network structures: lTAN) 76.25% 73.68% 75.00%
Multilayer Perceptron (1 hidden layer neurons = [number of attributes + number of classes]/2) 98.75% 98.68% 98.72%
Multilayer Perceptron (1 hidden layer 2 neurons) 81.25% 82.89% 82.05%
Multilayer Perceptron (1 hidden layer neurons = number of attributes) 98.75% 98.68% 98.72%
Multilayer Perceptron (1 hidden layer neurons = number of attributes + number of classes) 98.75% 100.00% 99.36%
Supplemental table 8: Results of Random Forest, C4.5, PART, Decision Table, Bayes Network and Multilayer Perceptron using different parameter values in the “Clinical & Genotypes” data set
Dataset “Clinical & Genotypes”
Method specificity sensitivity accuracy
Random Forest (2 Trees) 97.50% 81.58% 89.74%
Random Forest (10 Trees) 98.75% 100.00% 99.36%
Random Forest (20 Trees) 100.00% 98.68% 99.36%
Random Forest (30 Trees) 100.00% 100.00% 100.00%
Random Forest (40 Trees) 100.00% 100.00% 100.00%
Random Forest (50 Trees) 100.00% 100.00% 100.00%
C4.5 (min number of instances/leaf: 2) 75.00% 84.21% 79.49%
C4.5 (min number of instances/leaf: 5) 60.00% 85.53% 72.44%
C4.5 (min number of instances/leaf: 10) 50.00% 89.47% 69.23%
C4.5 (min number of instances/leaf: 15) 88.75% 27.63% 58.97%
C4.5 (min number of instances/leaf: 20) 88.75% 27.63% 58.97%
PART (min number of instances/rule: 2) 66.25% 82.89% 74.36%
PART (min number of instances/rule: 5) 92.50% 23.68% 58.97%
PART (min number of instances/rule: 10) 100.00% 0.00% 51.28%
PART (min number of instances/rule: 15) 82.50% 35.53% 59.62%
PART (min number of instances/rule: 20) 82.50% 35.53% 59.62%
Decision Table (search method: BestFirst) 67.50% 65.79% 66.67%
Decision Table (search method: GreedyStepwise) 67.50% 65.79% 66.67%
Decision Table (search method: LinearForwardSelection) 67.50% 65.79% 66.67%
Decision Table (search method: RankSearch) 80.00% 44.74% 62.82%
Decision Table (search method: ScatterSearchV1) 57.50% 75.00% 66.03%
Decision Table (search method: SubsetSizeForwardSelection) 66.25% 65.79% 66.03%
Bayes Network (method for searching network structures: ICSSearchAlgorithm) 76.25% 73.68% 75.00%
Bayes Network (method for searching network structures: Naive Bayes) 68.75% 76.32% 72.44%
Bayes Network (method for searching network structures: gHillClimber) 70.00% 76.32% 73.08%
Bayes Network (method for searching network structures: gK2) 68.75% 76.32% 72.44%
Bayes Network (method for searching network structures: gRepeatedHillClimber) 70.00% 76.32% 73.08%
Bayes Network (method for searching network structures: gSimulatedAnnealing) 86.25% 88.16% 87.18%
Bayes Network (method for searching network structures: gTabuSearch) 70.00% 76.32% 73.08%
Bayes Network (method for searching network structures: lHillClimber) 75.00% 52.63% 64.10%
Bayes Network (method for searching network structures: lK2) 68.75% 76.32% 72.44%
Bayes Network (method for searching network structures: lLAGDHillClimber) 75.00% 52.63% 64.10%
Bayes Network (method for searching network structures: lRepeatedHillClimber) 75.00% 52.63% 64.10%
Bayes Network (method for searching network structures: lSimulatedAnnealing) 72.50% 82.89% 77.56%
Bayes Network (method for searching network structures: lTabuSearch) 68.75% 71.05% 69.87%
Bayes Network (method for searching network structures: lTAN) 75.00% 71.05% 73.08%
Multilayer Perceptron (1 hidden layer neurons = [number of attributes + number of classes]/2) 98.75% 100.00% 99.36%
Multilayer Perceptron (1 hidden layer 2 neurons) 82.50% 86.84% 84.62%
Multilayer Perceptron (1 hidden layer neurons = number of attributes) 100.00% 100.00% 100.00%
Multilayer Perceptron (1 hidden layer neurons = number of attributes + number of classes) 97.50% 100.00% 98.72%
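As a reading aid for the three columns, the following minimal sketch shows how accuracy can be recomputed from specificity and sensitivity as a class-size-weighted mean. The class sizes used here (80 negative and 76 positive instances) are an assumption inferred from the granularity of the reported percentages; they are not stated in these tables. The function name is hypothetical.

```python
def accuracy_from_rates(specificity, sensitivity, n_negative, n_positive):
    """Accuracy as the class-size-weighted mean of specificity and sensitivity."""
    true_negatives = specificity * n_negative   # correctly classified negatives
    true_positives = sensitivity * n_positive   # correctly classified positives
    return (true_negatives + true_positives) / (n_negative + n_positive)


# Assumed class sizes, inferred from the percentages (not given in the tables).
N_NEGATIVE = 80
N_POSITIVE = 76

# Row "Random Forest (2 Trees)" of the "Clinical & Genotypes" data set:
# specificity 97.50%, sensitivity 81.58%, reported accuracy 89.74%.
acc = accuracy_from_rates(0.9750, 0.8158, N_NEGATIVE, N_POSITIVE)
print(f"{acc:.2%}")
```

Under these assumed class sizes, the recomputed value agrees with the reported accuracy for this row to two decimal places; the same check can be applied to any other row.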