1
SYNTHETIC PEPTIDES ANTIMICROBIAL ACTIVITY 1
PREDICTION USING DECISION TREE MODEL 2
3
RUNNING TITLE 4
Synthetic peptides antimicrobial activity prediction using decision tree model 5
6
JOURNAL SECTION 7
Applied Microbiology 8
9
AUTHORS NAMES 10
Felipe Lira1, Pedro S. Perez2, José A. Baranauskas2, Sérgio R. Nozawa # 1 11
12
AUTHOR ADDRESS 13
1 Laboratório de Expressão Gênica, Universidade Nilton Lins. Av. Prof. Nilton Lins, 3259. 14
Postal Code: 69058-030 Manaus, Amazonas, Brazil. Tel: + 55 92 3643 2110 15
17
2 Departamento de Computação e Matemática, Faculdade de Filosofia, Ciências e Letras de 18
Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, Brazil. 19
20
21
Copyright © 2013, American Society for Microbiology. All Rights Reserved.Appl. Environ. Microbiol. doi:10.1128/AEM.02804-12 AEM Accepts, published online ahead of print on 1 March 2013
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
2
ABSTRACT 22
Antimicrobial resistance becomes a persistent problem at public health treatment. However, 23
recent attempts to find effective substitutes to combat microorganism infections have directed 24
efforts to identify natural antimicrobial peptides able to overcome the resistance to commercial 25
antibiotics. The aim of this study describes the development of synthetic peptides with 26
antimicrobial activity, created in silico by site-directed mutations modeling using wild-type 27
peptides as scaffold for these mutations. Using fragments of antimicrobial peptides there were 28
modeled using molecular modeling computational tools. To analyze these peptides there was 29
created a Decision Tree model, which indicated the action range of peptides on the types of 30
microorganisms on which they can exercise biological activity. The Decision Tree model was 31
trained using physicochemistry properties from known antimicrobial peptides available at 32
Antimicrobial Peptide Database (APD). The two most promising peptides were synthesized and 33
the antimicrobial assays showed inhibitory activity against gram-positive and gram-negative 34
bacteria. Colossomin-C and Colossomin-D were the most inhibitory peptides at 5μg/mL against 35
Staphylococcus aureus and Escherichia coli. The methods described in this work and the results 36
obtained are useful for the identification and development of new compounds with antimicrobial 37
activity through the use of computational tools. 38
39
KEYWORDS: 40
Antimicrobial activity, Synthetic peptide, Activity prediction, Peptide modeling. 41
42
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
3
INTRODUCTION 43
The increase in microbial resistance to commercial antibiotics and the need for protection 44
against pathogenic agents, have led to the development of rapid and efficient defense 45
mechanisms. In this context, antimicrobial peptides represent a primitive defense mechanism that 46
is present in all organisms, from invertebrates through to higher organisms, including humans(6). 47
In recent decades, various species of antimicrobial peptides have been isolated, and presented 48
a wide spectrum of activity against gram-positive and gram-negative bacteria(22) The 49
production of these peptides in higher organisms plays an important role in the adaptive defense 50
system and the regulation of various biological systems(8). This production is beneficial, as it 51
occurs at low metabolic cost; the peptides are easily stored in large quantities, and are rapidly 52
made available to neutralize infections caused by microorganisms. 53
Due to their importance for the organism's immune system, antimicrobial peptides have 54
become the object of much interest as a source of inspiration for the development of new drugs, 55
based on changes to already known molecules(11). The manipulation of these structures is a 56
promising source for the creation of new antimicrobial peptides capable of blocking or inhibiting 57
the growth of bacteria, fungi, parasites, tumor cells, and even encapsulated viruses like HIV(7, 58
8). Different methods are used to develop these artificial peptides, in particular, the synthesis of 59
analogous peptides, which differentiate from natural peptides into one or more positions of the 60
amino acid chain by substitution, deletion or insertion of residues (2). This enables the crucial 61
residues for the performance of antimicrobial activity to be determined, and the desired effects to 62
be modulated, with the purpose of making the analogous peptides more effective than the 63
parental peptide (14, 19). However, these processes are costly, and time-consuming to execute. 64
Various pharmaceutical companies have therefore encouraged the use of Bioinformatics, to 65
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
4
investigate bioactive peptides in the search for new drugs using computer tools, complemented 66
by genome, transcriptome and proteome studies(16). 67
Based on the accumulation of information on the mechanism of action of antimicrobial 68
peptides, various databases have been created, with detailed information about these 69
peptides(12). The diversity of forms and characteristics makes it difficult to develop methods 70
capable of predicting the antimicrobial activity of peptides based on the similarity of their 71
sequences alone. Therefore, there is a need for computational tools capable of minimizing the 72
costs of predicting the antibacterial activity of peptides, and planning peptides that are more 73
effective against pathogens. To this end, methods like the Quantitative Structure-Activity 74
Relationship (QSAR), Structure Activity Relationship (SAR), and Decision Trees (DT) were 75
developed, to look for similar sequences and predict the activity based on numerical data relating 76
to structure and antimicrobial activity(12). 77
Based on studies to discover new therapeutic agents through peptide modelling using known 78
antimicrobial peptides as backbone, this study describes the induction of a decision tree model to 79
predict the antimicrobial activity of synthetic peptides created through substitutions of amino 80
acids residues in the parental peptide, obtained from the cDNA library of Colossoma 81
macropomum (Tambaqui), an Amazonian Neotropical teleostean with high commercial value 82
representing an economic relevant fish species from the Amazon basin(1). 83
84
METHODOLOGY 85
Identification of potential antimicrobial peptides 86
The identification of coding-sequences genes for the antimicrobial peptide activity was 87
performed through the construction of the cDNA library of Colossoma macropomum, using the 88
SMART cDNA Library Construction Kit (Clontech, USA), and sequencing more than 300 89
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
5
clones. A BlastX search was performed in the GenBank database on a local server 90
(www.ncbi.nlm.nih.gov) in which a cDNA encoding a potential antimicrobial peptide with 19 91
residues was found. This peptide was initially named "Colossomin". 92
. 93
Synthetic peptides modelling 94
Based on the sequence of 19 amino acids of the "Colossomin" peptide, there were created five 95
of analogous peptides using a diagram proposed by Bordo & Argos to guide the substitutions of 96
amino acid residues, increasing and/or maintaining the antimicrobial activity demonstrated by 97
the parental peptide(3). The substitutions were based on the circumstances of net charge, total 98
hydrophobic ratio (%), positive charge distribution to arrange the hydrophobic residues on the 99
same surface, amphiphilic character and the protein-binding potential - Boman index (10). The 100
sequences obtained after the residues substitutions were submitted to the predictive tool 101
available at Antimicrobial Peptide Database v2.34 (APD2)(20) to verify their antimicrobial 102
potential (15). 103
Analogs peptides synthesis 104
To verify if the antimicrobial activity of the analogs peptides could maintain, or overcome, 105
the effects observed for the parental peptide, analogs peptides were constructed using the Fmoc 106
solid-phase peptide synthesis strategy(4). The C-terminal amino acid of the native peptide was 107
maintained in some analogs and named as Colossomin-C and Colossomin-D. 108
Decision Tree Experimental Setup 109
Induction of decision trees is a machine learning approach that has been applied to several 110
tasks. Decision trees (DT) are well-suited for large, real-world tasks as they scale well and can 111
represent complex concepts by constructing simple yet robust logic-based classifiers amenable to 112
direct expert interpretation(15). Top-down inductions of decision trees algorithms generally 113
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
6
choose a feature that partitions the training data according to some evaluation functions(17). The 114
partitions are then recursively split until some stopping criterion is reached. After that, the 115
decision tree is pruned in order to avoid over fitting(9). In our experiments, we used the 116
algorithm J48 from Weka(21), a library of several machine learning algorithms. J48 is a Java 117
implementation of the well-known C4.5 algorithm(17). 118
The training data is composed of 60 antimicrobial, each described by 53 molecular descriptors. 119
These descriptors include structure, net charge, hydrophobic residues, Boman index, among 120
others, and were obtained using the program package Marvin Beans 121
(www.chemaxon.com/download/marvin). Peptides were labelled into four classes, according to 122
their microbial activity level: NONE, LOW, MEDIUM and HIGH as follows. The class label for 123
a specific peptide is defined as none if no activity is found in any of the cell types, low if the 124
activity occurs in only one organism, medium if the activity occurs in exactly two organisms and 125
high if it occurs in three or more organisms. According to this procedure, the distribution of 126
classes as none, low, medium and high in the training data was 3 (5%), 17 (28%), 20 (33%) and 127
20 (33%), respectively. 128
In order to select the most predictive attributes and find the best configuration of parameters 129
for J48, we used Windowing(18), which is a technique that begins to learn with only a fraction 130
(window) of the examples in the dataset. A classifier is induced using the initial window, and it 131
is tested using the examples not present in the window. A fraction of the examples outside the 132
window, and that were misclassified, is added to the window. A new classifier is induced and 133
tested, and the process is repeated until there are no misclassifications. Windowing can be 134
repeated many times (trials), starting with a different initial window each time. We used the 135
windowing provided by C4.5. After applying the technique with different configurations of C4.5, 136
and windowing itself, we chose the tree with the best test error. We then built a new dataset, 137
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
7
composed of only those attributes found in the unpruned version of the best tree: NetCharge, 138
Hydrogen, Oxygen, Isoeletric Point, LogP of Non-ionic species, ASA_P, Balaban Index, 139
Dreiding Energy and Minimal Projection Radius. Using Weka, we ran J48 over the dataset with 140
the filtered attributes. The default parameters for the inducer were used, except for the parameter 141
M, which determines the minimum number of examples that a leaf must contain. As M value, we 142
used three, with which the best tree found before was set. 143
Spectrum of activity prediction using decision trees 144
After the detection of the peptides’ antimicrobial potential using the Antimicrobial Peptide 145
Database, they were classified by the decision tree, and activity spectrum inferred as none, low, 146
medium or high. This was done considering the criteria adopted to determine the peptide activity 147
according to the types of organisms on which it will act, inhibiting or extinguishing their growth. 148
Antimicrobial tests 149
The microbiological assays were carried using Staphylococcus aureus (Gram +)and 150
Escherichia coli (Gram -). Inoculated Petri dishes were submitted to disc agar-diffusion method 151
with 10µL of each synthetic peptide diluted with water to 5µg/ml. Four paper disks of 5mm in 152
diameter were equally disposed at all Petri dishes with solid LB (Luria Bertani) culture medium 153
and impregnated with diluted peptides. The Petri dishes were incubated for 20 hours at 37ºC to 154
determine the formation of growth inhibition disks. 155
Statistical analysis 156
The data analyses were carried measuring the diameter of the inhibition disks formed by each 157
synthetic peptide using the Wilcoxon-Mann-Whitney test at R program (http://www.r-158
project.org/) to compare theirs antimicrobial activities. The antimicrobial peptide of each peptide 159
was analyzed against different bacteria, and their specific activity to detect the most efficient 160
antimicrobial peptide was compared. 161
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
8
RESULTS 162
Synthetic peptides modelling 163
After computational modelling and prior analysis of the antimicrobial potential, five peptides 164
were designed using the parental peptide as scaffold and only two were selected to Fmoc solid 165
phase synthesis and microbiological tests (Table 1). 166
Decision Tree Experimental Setup 167
The decision tree induced is composed of nine decision nodes containing the eight attributes 168
(NetCharge, Hydrogen, Oxygen, Isoeletric Point, LogP of Non-ionic species, ASA_P, Balaban 169
Index and Dreiding Energy) and ten leaves indicating the level of activity of the synthetic 170
peptides (high, medium, low or no activity) (Figure 1). 171
The decision tree model was validated using leave-one-out. The values of accuracy, area under 172
the ROC curve, true positive rate, true negative rate, precision and F-measure are shown in Table 173
2. Each line of the table shows the measures for each of the four possible class values, and the 174
last line shows the weighted average of the measures, representing the overall values for the 175
classification task. 176
Antimicrobial tests 177
After an incubation period, an inhibition area was visualized surrounding the paper disks 178
containing Colossomin-C and Colossomin-D at Staphylococcus aureus and Escherichia coli 179
cultures. The average lengths of inhibition zones formed by Colossomim-C and Colossomim-D 180
at bacteria cultures are represented at Table 3. The parental peptide did not form inhibition disks 181
at any bacteria culture after the incubation. 182
Statistical analysis 183
After statistical analyses the synthetic peptide Colossomin-C demonstrated significant 184
difference (p-value = 0.0147) when compared with the inhibition disks showed at Escherichia 185
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
9
coli and Staphylococcus aureus plates by Colossomin-D. In both cases there was not possible to 186
reject the null hypothesis for any significant level above 1,5%. 187
188
CONCLUSIONS 189
The results of this study show that the use of decision trees to evaluate the antimicrobial 190
activity of synthetic peptides enables the creation of more effective models for use in the 191
development of new drugs using known peptides as scaffold for designing new compounds and 192
reducing costly and time-consuming research. As demonstrated at this study and shown in 193
previous works (5, 13), the development of algorithms for decision tree models is an efficient 194
tool for predicting the antimicrobial activity and construction of peptides for various therapeutic 195
uses. 196
The inhibitory activity shown in the Colossomin-C and Colossomin-D assays, submitted to in 197
vitro tests with Staphylococcus aureus and Escherichia coli, provides the bases for a series of 198
possible studies on the importance of these antimicrobial peptides and their mechanism of action. 199
It also increase the possibility of use these synthetic antimicrobial peptides as important 200
biotechnological products for treating multidrug resistant pathogens. The antimicrobial activity 201
demonstrated by Colossomin-C and Colossomin-D against Staphylococcus aureus shows that 202
these peptides are more efficient in inhibiting the growth of Gram-positive bacteria than that of 203
Gram-negative bacteria Escherichia coli. 204
The results indicate that the induction of the amphiphilic behaviour and the positive charge are 205
responsible for destabilization of the membrane surface, and may induct pores by the partial or 206
total insertion of the hydrophobic portion(18). Based on the methodology used in this study, and 207
the possibility of predicting the antimicrobial activity of synthetic peptides created by site-208
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
10
targeted mutations, it is affirmed that this methodological pipeline has great value in the 209
discovery and development of new natural antibiotics using known peptides. 210
On the other hand, this workflow may highlight the efficiency of two synthetic peptides as 211
promising antimicrobial agents for treating Gram-positive infections, deserving more attention 212
on these new methodologies for the discovery of new drugs. 213
214
ACKNOWLEDGMENTS 215
We thank to Dr. Adriano Monteiro de Castro Pimenta and Dr. Thiago Verano-Braga for help 216
with the peptide synthesis at Laboratorio de Venenos e Toxinas Animais – LVTA, UFMG. We 217
thank CNPq, CAPES, and FAPEAM for financial support. 218
219
220
REFERENCES 221
1. Araújo-Lima, C., and M. Goulding. 1998. Os frutos do Tambaqui: Ecologia, 222
conservação e cultivo na Amazônia. Volumen 4 de Estudos do Mamirauá, Tefé - 223
Amazonas. 224
2. Boman, H., D. Wade, I. Boman, B. Wahlin, and R. Merrifield. 1989. Antibacterial and 225
antimalarial properties of peptides that are cecropin-melittin hybrids. FEBS Letters 226
259:103–106. 227
3. Bordo, D., and P. Argos. 1991. Suggestions for safe residue substitutions in site-directed 228
mutagenesis. Journal of Molecular Biology 217:721–729. 229
4. Chan, W., and P. D. White. 2000. Fmoc solid phase peptide synthesis: a practical 230
approachilustrada,. Oxford University Press. 231
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
11
5. Dathe, M., and T. Wieprecht. 1999. Structural features of helical antimicrobial peptides: 232
their potential to modulate activity on model membranes and biological cells. Biochimica 233
et biophysica acta 1462:71–87. 234
6. Deslouches, B., S. M. Phadke, V. Lazarevic, M. Cascio, K. Islam, R. C. Montelaro, 235
and T. A. Mietzner. 2005. De novo generation of cationic antimicrobial peptides: 236
influence of length and tryptophan substitution on antimicrobial activity. Antimicrobial 237
agents and chemotherapy. Am Soc Microbiol 49:316–322. 238
7. Gordon, Y. J., E. G. Romanowski, and A. M. McDermott. 2005. A review of 239
antimicrobial peptides and their therapeutic potential as anti-infective drugs. Current eye 240
research. Informa UK Ltd UK 30:505–515. 241
8. Hancock, R. E. W., and M. G. Scott. 2000. The role of antimicrobial peptides in animal 242
defenses. Proceedings of the National Academy of Sciences of the United States of 243
America 97:8856–61. 244
9. Kingsford, C., and S. L. Salzberg. 2008. “What are decision trees?” Nature 245
Biotechnology 26:1011–1013. 246
10. Kourie, J. I., and a a Shorthouse. 2000. Properties of cytotoxic peptide-formed ion 247
channels. American journal of physiology. Cell physiology 278:C1063–87. 248
11. Landon, C., F. Barbault, M. Legrain, L. Menin, M. Guenneugues, V. Schott, F. 249
Vovelle, and J. L. Dimarcq. 2004. Lead optimization of antifungal peptides with 3D 250
NMR structures analysis. Protein Science. Wiley Online Library 13:703–713. 251
12. Lata, S., B. K. Sharma, and G. P. S. Raghava. 2007. Analysis and prediction of 252
antibacterial peptides. BMC bioinformatics 8:263. 253
13. Lee, S. Y., S. Kim, S. S. Kim, S. J. Cha, Y. K. Kwon, B. R. Moon, and B. J. Lee. 2004. 254
Application of Decision Tree for the Classification of Antimicrobial Peptide. Genomics & 255
Informatics 2:121–125. 256
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
12
14. Marshall, S. H., and G. Arenas. 2003. Antimicrobial peptides: A natural alternative to 257
chemical antibiotics and a potential for applied biotechnology. Electronic Journal of 258
Biotechnology 6. 259
15. Patrzykat, A., J. Gallant, J. Seo, J. Pytyck, and S. E. Douglas. 2003. Novel 260
antimicrobial peptides derived from flatfish genes. Antimicrobial Agents and 261
Chemotherapy 47:2464–2470. 262
16. Quinlan, J. R. 1993. C4.5: programs for machine learning. Morgan Kaufmann Publishers. 263
17. REZENDE, S. O. 2003. SISTEMAS INTELIGENTES: FUNDAMENTOS E 264
APLICAÇÕES. Manole. 265
18. Shai, Y. 2002. Mode of action of membrane active antimicrobial peptides. Peptide 266
Science. Wiley Subscription Services, Inc., A Wiley Company 66:236–248. 267
19. Tossi, A., Tarantino Chiara, and Romeo Domenico. 1997. Design of synthetic 268
antimicrobial peptides based on sequence analogy and amphipathicity. European Journal 269
of Biochemestry 250:549–558. 270
20. Wang, G., X. Li, and Z. Wang. 2009. APD2: the updated antimicrobial peptide database 271
and its application in peptide design. Nucleic acids research 37:D933–7. 272
21. Witten, I. H., and E. Frank. 2005. Data mining: practical machine learning tools and 273
techniques. Morgan Kaufman. 274
22. Yin, Z., W. He, W. Chen, J. Yan, J. Yang, S. Chan, and J. He. 2006. Cloning, 275
expression and antimicrobial activity of an antimicrobial peptide, epinecidin-1, from the 276
orange-spotted grouper, Epinephelus coioides. Aquaculture 253:204–211. 277
278
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
Table 1 - Peptides selected for Fmoc solid phase synthesis and microbiological tests after verification of yours antimicrobial activity through the
APD2 (http://aps.unmc.edu/AP/main.php). Residues single underlined are hydrophobic and those double underlined are both hydrophobic and
located on the same peptide surface.
Peptides Amino acid sequence Molecular formula ∆AA (%) MW (Da) Charge HR (%) BI (Kcal/mol) AP
Parental peptide C - V I V V L M A Q P G E C F L G L I F H - N C99H156N22O23S2 - 2086.59 0 68 -1.73 +
Colossomin-C C - L I I I L M K K P G E C F L S L I Y H - N C107H175N23O24S2 37 2231.84 +2 57 -1.09 +
Colossomin-D C - L I V V L M K K P G E C F L S L I Y H - N C105H171N23O24S2 26 2203.78 +2 57 -1.00 +
∆AA (%) – Percent of amino acid substitutions; MW – Molecular Weight; Charge – Peptide charge; HR (%) – Hydrophobic residues; BI –
Boman Index; AP – Antimicrobial prediction: (+) antimicrobial activity, (–) no antimicrobial activity.
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
Table 3 – The average length of inhibition zone formed by Colossomim-B and
Colossomim-C at bacteria cultures.
Inhibition disk diameter (cm)
Bacteria Parental peptide Colossomin-B Colossomin-C
Staphylococcus aureus 0.0 2.9 ± 0.1 2.25 ± 0.2
Escherichia coli 0.0 1.37 ± 0.08 0.625 ± 0.04
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from
Table 2. Performance measurements for the classification of peptide activity. Here,
AUC stands for area under the ROC curve; TP stands for true positive rate; and TN
represents the true negative rate.
Class value Accuracy AUC TP TN Precision F-measure
None 0.97 1.00 0.97 0.60 0.75
Low 0.81 0.65 0.81 0.58 0.61
Medium 0.82 0.70 0.85 0.70 0.70
High 0.87 0.70 0.95 0.88 0.78
Total 0.70 0.84 0.70 0.88 0.72 0.70
on October 5, 2018 by guest
http://aem.asm
.org/D
ownloaded from