Comprehensive genome analyses of Sellimonas intestinalis, a potential 1
biomarker of homeostasis gut recovery 2
Marina Muñoza,b, Enzo Guerrero-Araya1,2, Catalina Cortés-Tapia1,2, Ángela 3
Plaza-Garrido1,2, Trevor D. Lawley3 and Daniel Paredes-Sabja1,2 * 4
a Microbiota-Host Interactions and Clostridia Research Group, Departamento de 5
Ciencias Biológicas, Facultad de Ciencias de la Vida, Universidad Andrés Bello, 6
Santiago, Chile 7
b Millennium Nucleus in the Biology of Intestinal Microbiota, Santiago, Chile 8
c Host–Microbiota Interactions Laboratory, Wellcome Trust Sanger Institute, Wellcome 9
Genome Campus, Hinxton, United Kingdom 10
11
* Corresponding author: 12
Dr. Daniel Paredes-Sabja, Microbiota-Host Interactions and Clostridia Research Group, 13
Facultad de Ciencias de la Vida, Universidad Andrés Bello, República 330, Santiago, 14
Chile. Tel: 02-770-3955; e-mail: [email protected]. 15
16
Running title: Sellimonas intestinalis comparative genomics 17
18
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
Comprehensive genome analyses of Sellimonas intestinalis, a potential 19
biomarker of homeostasis gut recovery 20
Sellimonas intestinalis is a Gram positive and anaerobic bacterial species 21 previously considered as uncultivable. Although little is known about this 22 Lachnospiraceae family member, its increased abundance has been reported in 23 patients who recovered intestinal homeostasis after dysbiosis events. In this 24 context, the aim of this work was taken advantage of a culturomics protocol that 25 allowed the recovery species extremely oxygen-sensitive from faecal samples, 26 which led to the establishment of an S. intestinalis isolate. Whole genome 27 sequencing and taxonomic allocation confirmation were the base to develop 28 comparative analyses including 11 public genomes closely related. 29 Phylogeographic analysis revealed the existence of three lineages (linage-I 30 including isolates from Chile and France, linage-II from South Korea and Finland, 31 and linage-III from China and one isolate from USA). Pangenome analysis on the 32 established dataset revealed that although S. intestinalis seems to have a highly 33 conserved genome (with 50.1% of its coding potential being part of the 34 coregenome), some recombination signals were evidenced. The identification of 35 cluster of orthologous groups revealed a high number of genes involved in 36 metabolism, including amino acid and carbohydrate transport as well as energy 37 production and conversion, which matches with the metabolic profile previously 38 reported for healthy microbiota. Additionally, virulence factors and antimicrobial 39 resistance genes were found (mainly in linage-III), which could favour their 40 survival during antibiotic-induced dysbiosis. These findings provide the basis of 41 knowledge about this species with potential as a bioindicator of intestinal 42 homeostasis recovery and contribute to advance in the characterization of gut 43 microbiota members with beneficial potential. 44
Keywords: Sellimonas intestinalis; phylogenomic; gut homeostasis; extremely 45 oxygen-sensitive species 46
47
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
Introduction 48
Gut microbiota plays important roles for human and other mammalian species, 49
including: i) the maintenance of the structural integrity of the intestinal epithelial 50
barrier;1 ii) the protection against the proliferation and colonization of enteropathogens;2 51
iii) metabolites production or conversion of substances for the host;3 and iv) the 52
stimulation of normal immune system functionality.4 All these functions are determined 53
by diversity and abundance of microbial taxa that have been associated with host status 54
(e.g. heath/disease, age, geographical origin among other comparison approaches). For 55
this reason, the scientific community has been focusing its efforts on deciphering the 56
composition of the microbial communities that inhabit this ecosystem. 57
Classical techniques to detect and study microorganisms involve in vitro culture, 58
however, it is well known that most species inhabiting human gut cannot be cultured 59
under standard conditions.5 To overcome this limitation, culture-independent DNA-60
based techniques, mainly based on next-generation sequencing (NGS), have been 61
widely used to decipher almost all species at the intestinal level, that is the case of 62
targeted NGS (tNGS) which has become the most popular scheme to depicting 63
microbiota composition, thanks to the use of high-resolution markers to identify the 64
taxonomic units (bacteria as well as eukaryotes and viruses), their variation among 65
individuals or populations, and to infer phylogenetic relationships among the dominant 66
taxa.6 This approach has been complemented with shotgun metagenomics technology, 67
which also leads to describe microbiota composition, but in addition it allows to 68
assembly whole genomes of the dominant taxa and to know the total content of nucleic 69
acids present in the studied environment, which in the case of the gut, could provide 70
informative markers of specific health/disease promoting factors.7 71
Studies based on culture independent NGS have shown that Ruminococcaceae and 72
Lachnospiraceae are the most abundant Clostridial families at gastrointestinal tract of 73
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
human and other mammals.8, 9 Although species diversity and the role of these two 74
families are being studied, changes in their relative abundance have been observed in 75
dysbiosis, being positively associated with healthy groups.8 In particular, the 76
Lachnospiraceae family has gained interest during the last years due to the ecological 77
adaptations exhibited by some of its species, associated with their capability to produce 78
short-chain fatty acid (SCFA) during glucose fermentation.10 This capability attributed 79
to commensal gut bacteria in healthy individuals11 has led to propose some 80
Lachnospiraceae species as potentially beneficial gut microbiota member; however, 81
few species of this family have been comprehensively studied. 82
One of the Lachnospiraceae species recently identified and poorly studied is Sellimonas 83
intestinalis, a Gram positive and obligately anaerobic bacteria,12 initially considered as 84
part of the gut microbiota fraction that remains uncultivated for its nature extremely 85
oxygen-sensitive ‘EOS’.13 This limitation could have been the cause of the limited 86
number of studies where S. intestinalis has been detected, being almost all aimed to 87
deciphering the microbiome composition from a shotgun metagenomics approach.13, 14 88
In these studies, an increased relative abundance of S. intestinalis was detected in 89
patients which recovered their intestinal homeostasis after suffering dysbiosis caused by 90
chemotherapy treatment against colorectal cancer15 or therapeutic splenectomy of 91
patients with liver cirrhosis.16 These findings suggest the potential of S. intestinalis as a 92
biomarker candidate of gut homeostasis recovery. Conversely, punctual transversal 93
studies have detected an incremented relative abundance of S. intestinalis in individuals 94
with altered gut microbiota associated with chronic kidney disease17 and systemic-onset 95
juvenile idiopathic arthritis.18 However, there are no studies aimed at clarifying the role 96
of S. intestinalis within the intestinal microbiome. 97
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
A pivotal step to clarify the implication of S. intestinalis in host’s gut homeostasis is to 98
know their genomic organization that allow to identify the genetic bases of their 99
ecological role. However, due to in vitro culture limitations, only eight draft genomes 100
have been obtained to November of 2019 101
(https://www.ncbi.nlm.nih.gov/genome/genomes/41970), that were assembled from 102
shotgun metagenomics data. These genomes have been reported mostly from Eastern 103
countries (China and South Korea), with a single genome reported from America 104
(USA).19 105
For this reason, in this study we have employed a culturomics approach directed to 106
isolate oxygen-sensitive intestinal microbiota species that allowed recovering S. 107
intestinalis. Subsequently, a comprehensive whole genome analysis of this species was 108
carried out to identify its genomic architecture, intra-taxa diversity, genetic population 109
structure, potential metabolic profiles codifying for its genome and the presence of 110
clinically important loci, as Virulence Factor markers (VFm) and antimicrobial 111
resistance genes (AMRg), which could play a detrimental role in the colonization and 112
relative abundance of this species in the complex intestinal environment. This approach 113
represents an initial step to define the genomic bases that could support the role of this 114
species in the intestinal microbiome and their potential as a biomarker of homeostasis 115
gut recovery. 116
Results 117
Isolate establishment and biological source 118
A Gram-positive bacterial isolate with coccoid morphology (Supplementary Figure 1) 119
was stablished under the conditions to recovery of microorganisms extremely sensitive 120
to oxygen at gastrointestinal level standardized by our research group. The biological 121
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
source of this isolate was a stool sample from a 23-year-old woman that despite being 122
healthy at the time of sample collection, has a diagnosis of idiopathic rheumatoid 123
arthritis, for this reason she was under treatment with Prednisone, a synthetic 124
corticosteroid with glucocorticoid modulation, which support its anti-inflammatory 125
effect and has proven as effective and safe for treatment of patients suffering this 126
pathology.20 The individual consumes in addition Chlorella (a microalgae containing 127
omega-3 fatty acids and carotenoids with antioxidant effect that have been proposed as 128
a potential source of renewable nutrition),21 vitamin E with selenium and Korean 129
ginseng. The individual did not use any antimicrobial treatment during the six months 130
prior to the sample collection. 131
Assembly genome and taxonomic placement 132
The assembly genome obtained showed a length of 3,096,198 base pairs (bp), 133
constituted by 32 contigs with an N50 of 3 and a length of 439,526 bp. The extraction 134
and subsequent comparison of 16S-rRNA sequence revealed that the analyzed genome 135
potentially belongs to one of the following genera: Ruminococcus, Drancourtella or 136
Sellimonas (Supplementary Table 1). The search of reads of this species in ENA 137
database, allowed to find the report for one isolate, that was assembly under the same 138
conditions of the genome analyzed in this study. The analysis of 2,902 genomes of 139
Ruminococcaceae and Lachnospiraceae genomes used for the revision of Clostridiales 140
order study allowed to identify which the analyzed genome and the ENA report makes 141
part of a node well supported which included other 9 genomes of Sellimonas intestinalis 142
(Supplementary Figure 2). These 11 genomes were then considered as the S. intestinalis 143
node. Interestingly, two incongruencies in taxonomic allocation of publicly available 144
genomes were detected, being previously deposited as Ruminococcus sp. DSM-100440 145
and Drancourtella massiliensis GD1, and consistently clustered with the genome set 146
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
under study (16S-rRNA phylogenetic reconstruction (Fig. 1A) and ANI analysis (Fig. 147
1B)), that hereafter will be treated as part of S. intestinalis node. This well supported 148
node was pruned join the closet node (with 7 genomes), that included mostly 149
Drancourtella genomes, and for that was identified as Drancourtella node. Within this 150
node were also found incongruences in taxonomic allocation, being included two 151
Ruminococcus and one Pseudoflavonifractor genomes (Supplementary Figure 2). Three 152
additional representative genomes clustering in related nodes were included as 153
outgroups (Lachnosclostridium sp. An181, Eubacterium sp. P3177 and 154
Lachnosclostridium sp. An118). Under these parameters a set of 21 assemblies were 155
included in the data set for subsequent analysis. 156
The phylogenetic reconstruction based on 16S-rRNA alignment for the 21 selected 157
genomes showed that the 11 genomes previously assigned to S. intestinalis node remain 158
clustered together (Fig. 1A). These findings were compared with the ANI percentage 159
identity which was higher than 95% for all these 11 S. intestinalis genomes (Fig. 1B), 160
which led to verify that under the traditional criteria to identify microbial species from 161
whole genome data (16S-rRNA and ANI), all 11 assemblies correspond to S. 162
instestinalis (Fig. 1A and 1B). The information on the genomes included in S. 163
intestinalis is described in the Supplementary Table 2. 164
Intra-species diversity and genetic population structure 165
A preliminary BLAST comparison of 11 S. intestinalis selected assemblies revealed a 166
high level of genome conservation; however, some genome regions were differentially 167
present in groups of isolates. The map comparing the complete genomes delimited by 168
these populations is described in the Supplementary Figure 3. As next step, the 169
pangenome analysis of S. instestinalis dataset showed a codifying potential of 4,627 170
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
genes (Supplementary Table 3), which are almost equally distributed between core 171
genes (n=2,318; 50.1%) and accessory genes (n=2,309; 49.9%). 172
A phylogeographic analysis, based on a Bayesian evolutionary approach, was 173
conducted from coregenome alignment (with 22,453 positions of length) of the 11 174
sequences selected assemblies. The phylogenetic three topology revealed that S. 175
intestinalis could have diversified into at least three major linages with a possible 176
relation by geographical origin (Fig. 2A). The first linage (Linage-I) included isolates 177
from Chile and France, while the second linage (Linage-II) isolates from South Korea 178
and Finland, and the third linage (Linage-III) isolates majority from China and only one 179
from the USA. This population genetic structure was confirmed by the phylogenetic 180
network topology that showed that although there are recombination signatures 181
(indicated by reticulation events observed), the three linages detected by 182
phylogeographic analysis are divergent among their, which supports the hypothesis of 183
the existence of three main populations within this species (Fig. 2B). 184
Cluster of orthologous groups 185
To explore the coding potential of the genome set under analysis, firstly a COGs was 186
developed for both global data set (Fig. 3A) and individual isolates according to the 187
linages to them belong (Fig. 3B). The results showed that this species directs much of 188
the coding potential to essential biological processes such as transcription, translation 189
and replication. However, it can be observed that an important part of their genes could 190
be involved in metabolism, including amino acid and carbohydrate transport as well as 191
energy production and conversion (Fig. 3A). Differential profiles were detected in the 192
identified populations, finding that the linage-I and linage-II clusters have more genes 193
involved in metabolic processes, while linage-III isolates revealed profiles with more 194
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
genes involved in the cell cycle, intracellular trafficking, secretion, and vesicular 195
transport (Fig. 3B). 196
Virulence factors and antimicrobial resistance genes 197
Considering that about half of the genes coding for this species are part of the accessory 198
genome, we inspect the genes differentially transported by the lineages detected (Fig. 199
4A). This analysis allowed to identify that the clustering in three populations is 200
maintained in the phylogenetic reconstruction based on the accessory genome, as was 201
found in coregenome phylogenetic analysis (Fig. 2A). 202
VFm and AMRg are important loci for survival of bacterial species because could 203
modulate changes in their abundance under different biological contexts and the 204
subsequent transmission dynamics between hosts (Fig. 4B). Although the exhaustive 205
search from both assemblies and reads revealed that the isolate from Chile (linage-I) not 206
carry known VFm nor AMRg, the extended search of this loci from assemblies included 207
in the comparative dataset reveled that the other genome clustering in the same linage-I 208
from France carry rpoB2 marker, associated with resistance to rifampin resistance. 209
rpoB2 was found in all other 9 evaluated genomes. tet(M) marker (associated with 210
tetracycline resistance) was the only one additional marker found in linage-II, being 211
transported by the isolate from Finland. Interestingly, linage-III exhibited the greatest 212
amount of AMRg, being found from 2 to 5 (in the case of AF14-9AC from China) 213
genes per genome. Among the genes with higher frequency were found: tet elements 214
(tet32 and tet(O)), present in five and four genomes, respectively, and cfr(C)_2 215
(conferring linezolid resistance) present in two genomes. In addition, (AGly)Aac6-216
Aph2, associated with aminoglycoside drug class resistance, ermB conferring 217
macrolide-lincosamide-streptogramin antibiotic resistance, and lnuA associated to 218
lincosamide resistance, all of these present in a single genome each one. 219
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
Discussion 220
Recent studies based on amplicon-based sequencing and shotgun metagenomics have 221
contributed to the description of diversity and abundance of gut microbial communities 222
22 and it has even been possible to propose associations with host states 23 and make 223
inferences about the possible functions of specific members of this complex ecological 224
network.24 However, genomic characterization of gut microbiota members represents a 225
challenge to deciphering the genetic bases supporting the biological function of 226
microbial species inhabiting gut, being an essential initial step their recover by in vitro 227
culture, that have an increased complexity for EOS species.25 For this reason, this work 228
describes the isolation and genomic features of S. intestinalis, a understudied 229
Lachnospiraceae species recovered during a culturomics approach directed to recover 230
EOS species within microbiome environment. 231
During a genomic characterization study is essential a precise taxonomic allocation of 232
target genomes and those included in the comparative dataset to avoid mistakes in the 233
biological inferences. In this study, inconsistencies in taxonomic classification were 234
detected at different levels: i) in the allocation of species to families with little 235
phylogenetic relationship, as is the case of Clostridium difficile that had been included 236
within the Clostridiaceae family, but after detailed analysis of the Phylogenetic 237
relationships were classified within the Peptostreptococcaceae family,26 or ii) in the 238
taxonomic assignment of individuals, as revealed even before this work for S. 239
intestinalis, which in other works had previously been detected as Ruminococcus, but 240
later of the sequencing of its complete genome, it was correctly assigned.13 These types 241
of findings reveal limitations in the traditional analysis schemes of complete genome 242
data and make clear the need for further studies that lead to clarify the classification of 243
under-studied anaerobic families. 244
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
The study of genetic population structure represents an important tool to determine the 245
population sizes, dispersal potential and evolutionary rates lead over geographical scales 246
during characterization of a microbial species.27 For the case of S. intestinalis, the 247
limited number of isolates analyzed represents a limitation, for example, some 248
remarkable profiles were identified, as is the case of the Sin-II and Sin-III lineages in 249
which the isolates that comprise them show a slight degree of divergence (Fig. 2A and 250
2B), so that the increased number of individuals could lead to the identification of 251
independent populations. These findings were supported by pangenome results that 252
revealed that despite the limited number of genes of core genome (n=2,318, 50.1%), 253
this could be a first indicator of the high intra-taxa diversity of this species. This type of 254
finding has been detected in species such as Pseudomonas aeruginosa,28 a species of 255
interest in health that exhibits a high frequency of gene loss and gain. The pangenome 256
data also allowed the evaluation of phylogenetic relationships from the coregenome 257
(Fig. 2A), which led to the detection of three potential linages, which were subsequently 258
ratified through the construction of phylogenetic networks (Fig. 2B), which showed that 259
despite the potential recombination events, supported by crosslinking in the networks, 260
these populations could be highly divergent among them. Interestingly, possible 261
common geographical origins were identified among the observed populations, with the 262
exception of the USA isolate that clustering with isolates from China (Sin-III linage), 263
that could be attributed to migration of human population, as has been identified for 264
other pathogens.29 265
The effect of specific members over gut microbiome composition have been attributed 266
mainly to their metabolic profiling in which some subproducts can stimulate specific 267
process in the complex gut environment.30 For that reason, the metabolic profiling of S. 268
intestinalis from whole genome data using COGs analysis was determinant to 269
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
deciphering that the genes required for the survival of the bacteria (Fig. 3A), the genes 270
associated with amino acid and carbohydrates transport and metabolism were highly 271
frequent in this species. The individual analyzes grouped by the detected populations, 272
showed that the isolates of oriental origin (linage-III), codify a greater number of genes 273
of AMRg than the other geographical origins. These results are of importance, because 274
these types of genes are determinants for the realization of metabolic processes that lead 275
to the production of short-chain fatty acid (SCFA) during glucose fermentation,10 276
mainly butyric acid,31 which It has been proposed as one of the characteristics of the 277
candidate members for healthy microbiota.11 In the specific case of the patient from 278
whom the isolate studied in this work was obtained despite the immunosuppression 279
caused as a possible consequence of the anti-inflamatory effect of Prednisone (the 280
corticosteroid consumed as treatment of rheumatoid arthritis)20, the healthy lifestyle 281
habits and the consumption of substances with potential restorative effect of the gut 282
microbiota as Chlorella (the microalgae consumed by the donor individual because has 283
shown potential as antioxidant and treatment for different health conditions)21, they 284
could contribute to the restored effect of the Microbiota, and the presence of S. 285
intestinallis could then be complying with the hypothesis as a biomarker of the recovery 286
of intestinal homeostasis. 287
The isolation and characterization of microbiota members contribute to deciphering the 288
genome bases of their effect in the gut microbial ecology,32 as well as to detect 289
members that potentially play a role as reservoirs of antibiotic resistance.33 In the 290
particular case of S. intestinalis, several genes associated with antibiotic resistance, such 291
as rpoB2 in most of the analyzed isolates, and mobile genetic elements, such as the 292
family of tet AMRg (Fig. 4). These findings could represent the basis for the survival of 293
this species at the intestinal level, despite the adverse conditions that this niche naturally 294
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
represents. Additionally, it could explain its role as a biomarker, after the presentation 295
and restoration of homeostasis after dysbiosis generated by different causes. The 296
relevance of these findings should be subsequently confirmed through the application of 297
phenotypic tests that lead to identify the minimum inhibitory concentrations at which 298
the proliferation of this species is inhibited in vitro. 299
Considering the differential presence of genes biological and clinically relevant among 300
the three S. intestinalis linages found in this work, is necessary to develop future studies 301
conducting to develop a molecular typing method to quickly identify isolates and which 302
contributes to clarifying the phylogenetic relationships and evolutionary history of this 303
species. Additionally, taking into account that this study was aimed at analyzing data 304
from complete genomes of S. intestinalis and no phenotypic tests were performed, it is 305
necessary to carry out further studies that lead to identify the impact of the expression of 306
these VFm and AMRg and their potential role in modulation of the relative abundance 307
of this species under different biotic contexts. Despite this limitation, the identification 308
of these markers could support the hypothesis that some members of the microbiota 309
could fulfill resistance reservoir-function from which bacterial pathogens can acquire 310
resistance is the human gut microbiota,13 generating interest at the health level. This 311
approach represents the first step conducing to the genomic bases that support S. 312
intestinalis survival under conditions of dysbiosis and subsequent proliferation after the 313
homeostasis reestablishment that could play an important role in maintaining the 314
optimal conditions for host development. 315
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
Materials and Methods 316
Sample collection 317
A culturomics approach was applied to stool samples from adult Chilean individuals, 318
within the framework of the project Millennium Nucleus in the Biology of Intestinal 319
Microbiota, aimed at detecting and characterizing the microorganisms that make up the 320
intestinal microbiota of healthy individuals in Latin America, developed by our research 321
team. This project was approved by Comité de Bioética de la Facultad de Ciencias de 322
Vida, Universidad Andrés Bello, through the act 013-2017. All patients enrolled in this 323
study agreed to participate and signed an informed consent form. 324
Bacterial isolate recovery 325
This approach involved the optimization of a protocol for Extremely Oxygen-Sensitive 326
(EOS) intestinal bacteria isolation as follow: stool samples (collected in sterile 327
containers without preservation media) were processed within the first 72 hours. Next, 328
the samples were mechanically homogenized and divided into two fractions that were 329
treated independently. The first (approximately 50%), was washed with 100% ethanol 330
to reach 70% (w/v) and incubated for 4 hours in anaerobiosis. The biological material 331
was then precipitated by centrifugation, to discard the ethanol, and then washed twice 332
with sterile molecular grade water. The second fraction of the sample was processed 333
without washing. The two fractions were weighed and then independently resuspended 334
in sterile 1X PBS (1mL per 100mg of feces), to be then serially diluted (from 10-1 to 10-335
5 for the sample washed with ethanol and from 10-1 to 10-8 for the sample processed 336
directly). Each dilution of the two treatments was seeded in duplicate on the complex 337
and broad-range YCFA medium,34 in two formats: traditional or supplemented with 338
taurocholate (Winckler) (0.1% v/v). Finally, they were incubated for 72-96 hours at 37 339
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
°C under anaerobic conditions. The manipulation and incubation of samples were 340
conducted in an anaerobic chamber Bactron EZ2 (ShellLab). 341
The colony forming units (CFU) obtained were streaked on YCFA plates, and after 24-342
48 hours of incubation under the conditions described, and then their quality and 343
morphology were evaluated by classical microbiological techniques (macro and 344
microscopic observation). The verified colonies were propagated in liquid YCFA 345
medium to increase their biomass to establish the isolates, using the same incubation 346
conditions. This isolate was named 6K002. 347
DNA extraction and whole genome sequencing (WGS) 348
The biomass recovered from isolate incubation in broth medium was subjected to DNA 349
extraction using the commercial kit Wizard® Genomic DNA Purification Kit (Promega 350
Corporation, Madison, WI, USA), following the manufacturer's recommendations. The 351
DNA sequencing was carried out by Wellcome Trust Sanger Institute on an Illumina 352
HiSeq 2000 platform, with a read length of 100 bp, under the “developing and 353
implementing an institute-wide data sharing policy following conditions”.35 354
Genome assembly and quality control verification 355
The reads obtained from WGS were De novo assembled using Unicycler v0.4.8, an 356
assembly pipeline for bacterial genomes defined as a SPAdes-optimiser (Spades 357
v3.13.1) which generates the best possible assembly,36 using parameters by default. The 358
quality of the genome assembly was evaluated using the GenomeQC_Filter_v1-5 script, 359
37 which considers as parameters the maximum number of contigs per genome (fixed to 360
400) and a maximum size of each genome (considering 8 Mbp) and then extracts the 361
small subunit 16S rRNA gene sequences (16S-rRNA). 362
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
Taxonomic placement and data retrieval 363
Initially, the 16S-rRNA sequence previously extracted was used to search sequence 364
similarity against the data available in public datasets using the BLASTn algorithm 365
[39], results that were subsequently verified by 16S-rRNA sequence alignment using 366
SILVA Incremental Aligner (SINA) service.38 367
Next, a dataset with 2,902 Ruminococcaceae and Lachnospiraceae genome assemblies, 368
publicly available in PATRIC,39, 40 ENA41 and NCBI42 databases and which passed the 369
assembly quality test previously described, were analyzed to identify the genomes most 370
related with the analyzed assembly. This dataset makes part of a parallel work of our 371
research team directed to evaluate the phylogenetic relationships of Clostridiales order. 372
Parallelly, a search of reads for ‘Sellimonas’ genus was conducted in the European 373
Nucleotide Archive (https://www.ebi.ac.uk/ena/data/search?query=Sellimonas), with 374
the aim of recovering the greatest number of genomes for analysis. The obtained reads 375
were subject to the genome assembly and quality control verification methodology 376
describe in the previous section. 377
The complete genome dataset was the base to select the node closely related to the 378
analyzed genomes, throughout phylogenetic reconstruction based on 16S-rRNA 379
sequence under the parameters described in the corresponded section. The set of 380
assemblies selected were subjected to a step of delimiting species using average identity 381
of nucleotides (ANI),43 using pyANI 0.2.10, a Python3 module and script that provides 382
support for calculating average nucleotide identity (ANI) and related measures for 383
whole genome comparisons, and rendering relevant graphical summary output 384
(https://github.com/widdowquinn/pyani).44 pyANI analyses was developed using blast 385
and other settings by default. Scores of ANI higher than 95.0%, were used to verify that 386
the genomes belong to the same species. 387
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
A graphical map of the genome assemblies identified as belonging to the same species 388
of studied genome, was built in the CGview server,45 where a comparison was made in 389
pairs to identify the differences between the genomes, using the tool based on the 390
BLAST algorithm, included inside the server. 391
Annotation and pangenome analysis 392
An automated annotation pipeline was applied to the complete set of evaluated 393
genomes. This pipeline is based on Prokka v1.13,46 as follows: Infernal v1.1.247 was run 394
to predict RNA structures, followed by an analysis in Prodigal v2.6.348 to predict 395
proteins. Aragorn v1.2.3849 was used to predict tRNAs and tmRNAs, and Rnammer50 396
was used to predict ribosomal RNAs. All predicted genes were then annotated 397
throughout databases search following this order: genus specific databases were 398
generated by retrieving the annotation from RefSeq.51 The protein sequences were then 399
merged using CD-hit version 4.8.152 to produce a non-redundant blast protein database. 400
Next UniprotKB/SwissProt53 was searched, considering kingdom specific databases for 401
Bacteria. The complete set of genomes evaluated was submitted to the aforementioned 402
annotation pipeline. 403
As a next step, the pangenome was determined using the Roary tool version 3.11.2,54 404
taking as definition of coregenome a percentage identity of 95% using Protein-Protein 405
BLAST 2.9.0+ and the presence in 99% of the analyzed genomes. 406
Phylogeographic analyses 407
The phylogenetic relationships among Ruminococcaceae and Lachnospiraceae 408
assemblies was evaluated to identify the data most closely related with the studied 409
genome. For that, the 16S-rRNA sequences extracted during the quality control 410
verification step were aligned using MAFFT v7.40755 using parameters by default and 411
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
then an approximately-maximum-likelihood phylogenetic tree was built in FastTree 412
double precision version 2.1.1056 with settings by default. The robustness of the nodes 413
was evaluated using the Bootstrap method (BT, with 1,000 replicates). 414
After the definition of dataset to analyze, the phylogenetic relationships among isolates 415
were evaluated using a Bayesian evolutionary approach based on Markov Chain Monte 416
Carlo (MCMC) implemented in Beast v1.10.457 from the pangenome alignment (with a 417
length of 22,453 nucleotides) of the 11 sequences selected assemblies. GTR substitution 418
model was chosen as the best model in jModelTest v0.1.1,58 an uncorrelated relaxed 419
clock model and the skyline population model, were considered initial parameters. 420
Twenty independent MCMC was carried out, each with a chain length of 100,000,000 421
states and resampling every 10,000 states. Log files were summarized with Tree 422
Annotator v2.4.8 [44] using 10% burning. The effective sample size (ESS) were >200 423
for all parameters; convergence and mixing were assessed using trace plot in Tracer 424
v1.7.1.59 Tree files generated were summarized with Tree Annotator v2.4.860 using 10% 425
burning, with maximum clade credibility and node heights at the heights of common 426
ancestors. A node dating step was conducted using isolate metadata (date of isolate and 427
geographic origin). The graphic visualization of all phylogenetic trees was obtained in 428
the web tool Interactive Tree of Life V3 (http://itol.embl.de).61 Additionally, 429
phylogenetic networks were conducted with the aim to detect recombination signatures 430
in the analyzed population. This analyses were carried out in SplitsTree5,62 using 431
neighbor-net method. 432
Codifying potential of S. intestinalis genome 433
The annotation outputs were additionally used to identify Clusters of Orthologous 434
Groups (COG) using eggNOG-mapper v2 under default settings, a tool for fast 435
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
functional annotations of sequence collections.63 The COG categories were 436
subsequently were represented in a histogram. 437
Virulence factor markers (VFm) and antimicrobial resistance genes (AMRg) were 438
identified from whole genome assembles using Abricate 0.8.4 439
(https://github.com/tseemann/abricate), making BLAST against the sequences 440
previously reported in the following databases: CARD (1,749 sequences, last update: 441
Jul 8, 2017),64 Resfinder (1,749 sequences, last update: Jul 8, 2017),65 NCBI (1,749 442
sequences, last update: Jul 8, 2017),66 ARG-ANNOT (1,749 sequences, last update: Jul 443
8, 2017),67 VFDB (1,749 sequences, last update: Jul 8, 2017)68 and PlasmidFinder 444
(1,749 sequences, last update: Jul 8, 2017)69. Minimum DNA identity of 75% was used 445
as detection thresholds. As a confirmation step about VFm and AMR presence, Ariba 446
(Antimicrobial Resistance Identification By Assembly) version 2.070 was run from reads 447
of the studied isolate. 448
Acknowledgements 449
The authors thank Wellcome Trust Sanger Institute, in particular to the core library and 450
sequencing teams for whole genome sequencing of Sellimonas intestinalis GK002 and 451
pathogen informatics team for the use of several automated pipelines during the 452
processing and analysis of the whole genome sequence data. 453
Financial support 454
This work was supported by: i) EULac project ‘Genomic Epidemiology of Clostridium 455
difficile in Latin America’ (T020076); ii) Fondo Nacional de Ciencia y Tecnología de 456
Chile (FONDECYT Grant 1191601); iii) Fondo de Fomento al Desarrollo Científico y 457
Tecnológico (FONDEF) ID18|10230 to M.P-G and D.P-S and iv) Millennium Science 458
Initiative of the Ministry of Economy, Development and Tourism to D.P-S. 459
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
References 460
1. Barko PC, McMichael MA, Swanson KS, Williams DA. The Gastrointestinal 461 Microbiome: A Review. J Vet Intern Med 2018; 32:9-25. 462 2. Rinninella E, Raoul P, Cintoni M, Franceschi F, Miggiano GAD, Gasbarrini A, 463 et al. What is the Healthy Gut Microbiota Composition? A Changing Ecosystem across 464 Age, Environment, Diet, and Diseases. Microorganisms 2019; 7. 465 3. Blaut M. Ecology and physiology of the intestinal tract. Curr Top Microbiol 466 Immunol 2013; 358:247-72. 467 4. Eberl G. The microbiota, a necessary element of immunity. C R Biol 2018; 468 341:281-3. 469 5. Carabeo-Perez A, Guerra-Rivera G, Ramos-Leal M, Jimenez-Hernandez J. 470 Metagenomic approaches: effective tools for monitoring the structure and functionality 471 of microbiomes in anaerobic digestion systems. Appl Microbiol Biotechnol 2019; 472 103:9379-90. 473 6. Fraher MH, O'Toole PW, Quigley EM. Techniques used to characterize the gut 474 microbiota: a guide for the clinician. Nat Rev Gastroenterol Hepatol 2012; 9:312-22. 475 7. Mitchell SL, Simner PJ. Next-Generation Sequencing in Clinical Microbiology: 476 Are We There Yet? Clin Lab Med 2019; 39:405-18. 477 8. Lozupone CA, Stombaugh J, Gonzalez A, Ackermann G, Wendel D, Vazquez-478 Baeza Y, et al. Meta-analyses of studies of the human microbiota. Genome Res 2013; 479 23:1704-14. 480 9. Suchodolski JS. Intestinal microbiota of dogs and cats: a bigger world than we 481 thought. Vet Clin North Am Small Anim Pract 2011; 41:261-72. 482 10. Meehan CJ, Beiko RG. A phylogenomic view of ecological specialization in the 483 Lachnospiraceae, a family of digestive tract-associated bacteria. Genome Biol Evol 484 2014; 6:703-13. 485 11. Brestoff JR, Artis D. Commensal bacteria at the interface of host metabolism 486 and the immune system. Nat Immunol 2013; 14:676-84. 487 12. Seo B, Yoo JE, Lee YM, Ko G. Sellimonas intestinalis gen. nov., sp. nov., 488 isolated from human faeces. Int J Syst Evol Microbiol 2016; 66:951-6. 489 13. Versluis D, de JBGT, Zoetendal EG, Passel M, Smidt H. High throughput 490 cultivation-based screening on porous aluminum oxide chips allows targeted isolation 491 of antibiotic resistant human gut bacteria. PLoS One 2019; 14:e0210970. 492 14. Sun Y, Chen Q, Lin P, Xu R, He D, Ji W, et al. Characteristics of Gut 493 Microbiota in Patients With Rheumatoid Arthritis in Shanghai, China. Front Cell Infect 494 Microbiol 2019; 9:369. 495 15. Kong C, Gao R, Yan X, Huang L, He J, Li H, et al. Alterations in intestinal 496 microbiota of colorectal cancer patients receiving radical surgery combined with 497 adjuvant CapeOx therapy. Sci China Life Sci 2019; 62:1178-93. 498 16. Liu Y, Li J, Jin Y, Zhao L, Zhao F, Feng J, et al. Splenectomy Leads to 499 Amelioration of Altered Gut Microbiota and Metabolome in Liver Cirrhosis Patients. 500 Front Microbiol 2018; 9:963. 501 17. Lun H, Yang W, Zhao S, Jiang M, Xu M, Liu F, et al. Altered gut microbiota 502 and microbial biomarkers associated with chronic kidney disease. Microbiologyopen 503 2019; 8:e00678. 504 18. Dong YQ, Wang W, Li J, Ma MS, Zhong LQ, Wei QJ, et al. Characterization of 505 microbiota in systemic-onset juvenile idiopathic arthritis with different disease 506 severities. World J Clin Cases 2019; 7:2734-45. 507
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
19. Poyet M, Groussin M, Gibbons SM, Avila-Pacheco J, Jiang X, Kearney SM, et 508 al. A library of human gut bacterial isolates paired with longitudinal multiomics data 509 enables mechanistic microbiome research. Nat Med 2019; 25:1442-52. 510 20. Krasselt M, Baerwald C. Efficacy and safety of modified-release prednisone in 511 patients with rheumatoid arthritis. Drug Des Devel Ther 2016; 10:1047-58. 512 21. Barkia I, Saari N, Manning SR. Microalgae for High-Value Products Towards 513 Human Health and Nutrition. Mar Drugs 2019; 17. 514 22. Laudadio I, Fulci V, Palone F, Stronati L, Cucchiara S, Carissimi C. 515 Quantitative Assessment of Shotgun Metagenomics and 16S rDNA Amplicon 516 Sequencing in the Study of Human Gut Microbiome. OMICS 2018; 22:248-54. 517 23. Lynch SV, Pedersen O. The Human Intestinal Microbiome in Health and 518 Disease. N Engl J Med 2016; 375:2369-79. 519 24. Gillings MR, Paulsen IT, Tetu SG. Ecology and Evolution of the Human 520 Microbiota: Fire, Farming and Antibiotics. Genes (Basel) 2015; 6:841-57. 521 25. Lagier JC, Dubourg G, Million M, Cadoret F, Bilen M, Fenollar F, et al. 522 Culturing the human microbiota and culturomics. Nat Rev Microbiol 2018; 16:540-50. 523 26. Galperin MY, Brover V, Tolstoy I, Yutin N. Phylogenomic analysis of the 524 family Peptostreptococcaceae (Clostridium cluster XI) and proposal for reclassification 525 of Clostridium litorale (Fendrich et al. 1991) and Eubacterium acidaminophilum (Zindel 526 et al. 1989) as Peptoclostridium litorale gen. nov. comb. nov. and Peptoclostridium 527 acidaminophilum comb. nov. Int J Syst Evol Microbiol 2016; 66:5506-13. 528 27. Gerstein AC, Jean-Sebastien M. Small is the new big: assessing the population 529 structure of microorganisms. Mol Ecol 2011; 20:4385-7. 530 28. Kung VL, Ozer EA, Hauser AR. The accessory genome of Pseudomonas 531 aeruginosa. Microbiol Mol Biol Rev 2010; 74:621-41. 532 29. Motayo BO, Oluwasemowo OO, Olusola BA, Opayele AV, Faneye AO. 533 Phylogeography and evolutionary analysis of African Rotavirus a genotype G12 reveals 534 district genetic diversification within lineage III. Heliyon 2019; 5:e02680. 535 30. Lin L, Zhang J. Role of intestinal microbiota and metabolites on gut homeostasis 536 and human diseases. BMC Immunol 2017; 18:2. 537 31. Fu X, Liu Z, Zhu C, Mou H, Kong Q. Nondigestible carbohydrates, butyrate, 538 and butyrate-producing bacteria. Crit Rev Food Sci Nutr 2019; 59:S130-S52. 539 32. Suzuki TA, Worobey M. Geographical variation of human gut microbial 540 composition. Biol Lett 2014; 10:20131037. 541 33. van Schaik W. The human gut resistome. Philos Trans R Soc Lond B Biol Sci 542 2015; 370:20140087. 543 34. Browne HP, Forster SC, Anonye BO, Kumar N, Neville BA, Stares MD, et al. 544 Culturing of 'unculturable' human microbiota reveals novel taxa and extensive 545 sporulation. Nature 2016; 533:543-6. 546 35. Dyke SO, Hubbard TJ. Developing and implementing an institute-wide data 547 sharing policy. Genome Med 2011; 3:60. 548 36. Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: Resolving bacterial 549 genome assemblies from short and long sequencing reads. PLoS Comput Biol 2017; 550 13:e1005595. 551 37. Gualtero SM, Abril LA, Camelo N, Sanchez SD, Davila FA, Arias G, et al. 552 [Characteristics of Clostridium difficile infection in a high complexity hospital and 553 report of the circulation of the NAP1/027 hypervirulent strain in Colombia]. Biomedica 554 : revista del Instituto Nacional de Salud 2017; 37:466-72. 555
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
38. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA 556 ribosomal RNA gene database project: improved data processing and web-based tools. 557 Nucleic acids research 2013; 41:D590-6. 558 39. Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, et al. 559 PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic acids 560 research 2014; 42:D581-91. 561 40. Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, et al. 562 Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis 563 Resource Center. PATRIC 3.5.4. Search criteria: Genomes/Clostridium 2017. 564 41. Silvester N, Alako B, Amid C, Cerdeno-Tarraga A, Clarke L, Cleland I, et al. 565 The European Nucleotide Archive in 2017. Nucleic acids research 2018; 46:D36-D40. 566 42. Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL, Schuler GD, et al. 567 Database resources of the National Center for Biotechnology Information. Nucleic acids 568 research 2000; 28:10-4. 569 43. Figueras MJ, Beaz-Hidalgo R, Hossain MJ, Liles MR. Taxonomic affiliation of 570 new genomes should be verified using average nucleotide identity and multilocus 571 phylogenetic analysis. Genome announcements 2014; 2. 572 44. Richter M, Rossello-Mora R. Shifting the genomic gold standard for the 573 prokaryotic species definition. Proc Natl Acad Sci U S A 2009; 106:19126-31. 574 45. Grant JR, Stothard P. The CGView Server: a comparative genomics tool for 575 circular genomes. Nucleic acids research 2008; 36:W181-4. 576 46. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014; 577 30:2068-9. 578 47. Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. 579 Bioinformatics 2013; 29:2933-5. 580 48. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: 581 prokaryotic gene recognition and translation initiation site identification. BMC 582 bioinformatics 2010; 11:119. 583 49. Laslett D, Canback B. ARAGORN, a program to detect tRNA genes and 584 tmRNA genes in nucleotide sequences. Nucleic acids research 2004; 32:11-6. 585 50. Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW. 586 RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic acids 587 research 2007; 35:3100-8. 588 51. Pruitt KD, Tatusova T, Brown GR, Maglott DR. NCBI Reference Sequences 589 (RefSeq): current status, new features and genome annotation policy. Nucleic acids 590 research 2012; 40:D130-5. 591 52. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-592 generation sequencing data. Bioinformatics 2012; 28:3150-2. 593 53. UniProt C. The universal protein resource (UniProt). Nucleic acids research 594 2008; 36:D190-5. 595 54. Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MT, et al. Roary: 596 rapid large-scale prokaryote pan genome analysis. Bioinformatics 2015; 31:3691-3. 597 55. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 598 7: improvements in performance and usability. Molecular biology and evolution 2013; 599 30:772-80. 600 56. Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution 601 trees with profiles instead of a distance matrix. Molecular biology and evolution 2009; 602 26:1641-50. 603
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
57. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. 604 Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus 605 Evol 2018; 4:vey016. 606 58. Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: more models, new 607 heuristics and parallel computing. Nat Methods 2012; 9:772. 608 59. Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior 609 Summarization in Bayesian Phylogenetics Using Tracer 1.7. Syst Biol 2018; 67:901-4. 610 60. Bouckaert R, Heled J, Kuhnert D, Vaughan T, Wu CH, Xie D, et al. BEAST 2: a 611 software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2014; 612 10:e1003537. 613 61. Letunic I, Bork P. Interactive tree of life (iTOL) v3: an online tool for the 614 display and annotation of phylogenetic and other trees. Nucleic acids research 2016; 615 44:W242-5. 616 62. Huson DH, Bryant D. Application of phylogenetic networks in evolutionary 617 studies. Molecular biology and evolution 2006; 23:254-67. 618 63. Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, et al. 619 eggNOG 4.5: a hierarchical orthology framework with improved functional annotations 620 for eukaryotic, prokaryotic and viral sequences. Nucleic acids research 2016; 44:D286-621 93. 622 64. Jia B, Raphenya AR, Alcock B, Waglechner N, Guo P, Tsang KK, et al. CARD 623 2017: expansion and model-centric curation of the comprehensive antibiotic resistance 624 database. Nucleic acids research 2017; 45:D566-D73. 625 65. Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, et al. 626 Identification of acquired antimicrobial resistance genes. J Antimicrob Chemother 627 2012; 67:2640-4. 628 66. Feldgarden M, Brover V, Haft DH, Prasad AB, Slotta DJ, Tolstoy I, et al. Using 629 the NCBI AMRFinder Tool to Determine Antimicrobial Resistance Genotype-630 Phenotype Correlations Within a Collection of NARMS Isolates. bioRxiv 2019:550707. 631 67. Gupta SK, Padmanabhan BR, Diene SM, Lopez-Rojas R, Kempf M, Landraud 632 L, et al. ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes 633 in bacterial genomes. Antimicrob Agents Chemother 2014; 58:212-20. 634 68. Chen L, Zheng D, Liu B, Yang J, Jin Q. VFDB 2016: hierarchical and refined 635 dataset for big data analysis--10 years on. Nucleic acids research 2016; 44:D694-7. 636 69. Carattoli A, Zankari E, Garcia-Fernandez A, Voldby Larsen M, Lund O, Villa L, 637 et al. In silico detection and typing of plasmids using PlasmidFinder and plasmid 638 multilocus sequence typing. Antimicrob Agents Chemother 2014; 58:3895-903. 639 70. Hunt M, Mather AE, Sanchez-Buso L, Page AJ, Parkhill J, Keane JA, et al. 640 ARIBA: rapid antimicrobial resistance genotyping directly from sequencing reads. 641 Microb Genom 2017; 3:e000131. 642
643
644
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted April 16, 2020. . https://doi.org/10.1101/2020.04.14.041921doi: bioRxiv preprint
Figure legends 645
Figure 1. Taxonomic allocation analyses of studied genome using a phylogenomic 646
approach. A) Phylogenetic reconstruction based on 16S-rRNA alignment for the 21 647
selected genomes. Sequences were aligned using MAFFT55 and then an approximately-648
maximum-likelihood phylogenetic tree was built in FastTree double precision version 649
2.1.10.56 Interactive Tree of Life V3 (http://itol.embl.de) was used for the graphic 650
visualization.61 Red dots represent Bootstrap ≥ 90.0. B) Average Nucleotide Identity 651
(ANI) analysis for the selected dataset. Two genomes with ANI results higher than 95% 652
belong to the same microbial species. The analysis was developed using pyANI 653
(https://github.com/widdowquinn/pyani). 654
655
1009590
75
50
ANI percentage identity
25
0
B.A.
Sellimonas intestinalis MHI-CRG-PUJ-666
Ruminococcaceae bacterium MGYG-HGUT-02290
Sellimonas intestinalis BR72
Drancourtella sp. An210
Pseudoflavonifractor sp. BSD2780061688
Sellimonas intestinalis BIOML-A1
Ruminococcus sp. DSM-100440
Drancourtella sp. An57
Sellimonas intestinalis AF37-2AT
Eubacterium sp. Marseille-P3177
Sellimonas intestinalis AF37-1
Sellimonas intestinalis AF07-16
Drancourtella sp. An177
Sellimona intestinalis AM38-2BH
Sellimonas intestinalis AM14-42
Drancourtella massiliensis GD1
Sellimonas intestinalis AF14-9AC
Ruminococcus sp. OM05-10BH
Lachnoclostridium sp. An181
Lachnoclostridium sp. An118
Drancourtella sp. An12
Tree scale: 0.01
Lachnoclostridium sp. An181
Eubacterium sp. P3177
Lachnosclostridium sp. An118
Drancourtella sp. An210
Drancourtella sp. An177
Drancourtella sp. An57
Drancourtella sp. An12
Pseudoflavonifractor sp. BSD2780061688
Ruminococcus sp. OM05-10BH
Ruminococcaceae bacterium MGYG-HGUT-02290
Sellimonas intestinalis AF07-16
Sellimonas intestinalis 6K002
Ruminococcus sp. DSM-100440
Drancourtella massiliensis GD1
Sellimonas intestinalis AM38-2BH
Sellimonas intestinalis AF37-1
Sellimonas intestinalis AM14-42
Sellimonas intestinalis AF14-9AC
Sellimonas intestinalis BR72
Sellimonas intestinalis AF37-2AT
Sellimonas intestinalis BIOML-A1
Figure 2. Phylogeographic analysis and phylogenetic networks used to predict the 656
genetic population structure of Sellimonas intestinalis. A) Bayesian evolutionary 657
analysis based on Markov Chain Monte Carlo (MCMC) implemented in BEAST-260 658
carried out from the core genome alignment (with 22,453 positions of length) of the 11 659
sequences selected assemblies. GTR substitution model was chosen as the best model in 660
jModelTest v0.1.1.58 B) phylogenetic network using neighbor-net method conducted in 661
SplitsTree5.62 662
663
664
0 0 0 0 0 0 0.0010.001 0.001
GD1
BR72
DSM-100440
AF07-16
BIOML-A1
AM14-42
AF14-9AC
AM38-2BH
6K002
AF37-2AT, AF37-1
Linage-III
Linage-II
Linage-I
AM14-42 (China)
GD1 (Fran
ce)
AF1
4-9A
C (C
hina
)
AF07-16 (C
hina)
AF37
-1 (C
hina
)
BR72 (South Korea)
DSM-100440 (Finland)
6K002 (Chile)
BIOML-A1 (United states)
AF37-2AT (China)
AM38-2BH
(China)
Tree scale: 100000000A.
B.
Figure 3. Cluster of orthologous groups (COGs) for A) global data set and B) 665
individual isolates. eggNOG-mapper v2 was used as a tool for fast functional 666
annotations of sequence collections.63 667
668
1845
3060
1939
495
706
1758
164
1421
854
1758
2256
2786
950
971
592
318
296
1086
0 500 1000 1500 2000 2500 3000 3500
J: Translation, ribosomal structure and biogenesis
K: Transcription
L: Replication, recombination and repair
D: Cell cycle control, cell division, chromosome partitioning
O: Posttranslational modification, protein turnover, chaperones
M: Cell wall/membrane/envelope biogenesis
N: Cell motility
P: Inorganic ion transport and metabolism
T: Signal transduction mechanisms
C: Energy production and conversion
G: Carbohydrate transport and metabolism
E: Amino acid transport and metabolism
F: Nucleotide transport and metabolism
H: Coenzyme transport and metabolism
I: Lipid transport and metabolism
U: Intracellular trafficking, secretion, and vesicular transport
Q: Secondary metabolites biosynthesis, transport and catabolism
V: Defense mechanisms
GD1
6K002
BR72
DSM-100440
AM14-42
AF07-16
BIOML-A1
AM38-2BH
AF14-9AC
AF37-1
AF37-2AT
164 168 167 168 166 168 168 168 167 172 169
258 273 281 282 280 289 288 266 272 290 281
151 165 186 189 178 183 181 160 173 191 182
50 43 44 44 45 45 45 43 45 46 45
59 64 68 69 63 64 64 63 64 64 64
155 158 158 156 161 164 164 154 155 167 166
13 10 15 15 16 16 16 16 16 16 15
126 129 134 129 128 132 132 130 127 127 127
75 73 77 77 78 82 82 78 77 78 77
159 171 161 160 159 159 159 157 159 157 157
211 203 203 202 206 207 207 206 203 204 204
257 250 262 263 252 253 252 253 238 253 253
88 83 86 87 87 88 88 85 86 86 86
87 89 87 91 90 87 87 88 87 89 89
53 56 50 52 53 55 55 55 55 54 54
23 25 30 27 29 31 31 24 30 35 33
21 26 29 29 28 27 27 28 24 29 28
97 93 97 96 100 105 105 97 95 101 100
B.A.
1653434.7.fna
1653434.9.fna
1653434.60.fna
1653434.5.fna
1671366.3.fna
1632013.5.fna
1653434.6.fna
SRR9222402.fna
1653434.4.fna
1653434.8.fna
ERR2703811.fna
Tree scale: 0.1
1653434.7.fna
1653434.9.fna
1653434.60.fna
1653434.5.fna
1671366.3.fna
1632013.5.fna
1653434.6.fna
SRR9222402.fna
1653434.4.fna
1653434.8.fna
ERR2703811.fna
Tree scale: 0.1
Figure 4. Virulence factors and antimicrobial resistance genes detected in 669
Sellimonas intestinalis genomes. A) phylogenetic reconstruction from accessory 670
genome alignment. B) frequency of markers found in each assembly; and C) presence-671
absence matrix which describe the markers detected in each genome. Abricate 0.8.4 672
(https://github.com/tseemann/abricate) was used to make BLAST against the sequences 673
previously reported in the following databases: CARD,64 Resfinder,65 NCBI,66 ARG-674
ANNOT,67 VFDB68 and PlasmidFinder.69 675
676 677
1 2 3 4 5
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
1 0 0 0 0 0 1 0
1 0 0 0 0 1 0 1
1 0 1 0 0 1 0 0
1 0 0 0 0 0 1 0
1 0 0 0 0 1 0 1
1 0 0 1 1 1 0 1
1 1 0 0 0 0 0 1
1 1 0 0 0 0 0 1
lnuA
tet(3
2)
tet(M
)
tet(O
)
Number of markers
rpoB
2
cfr(
C)_
2
(AG
ly)A
ac6-
Aph2
Erm
B
1671366.3.fna
ERR2703811.fna
1632013.5.fna
1653434.60.fna
1653434.9.fna
1653434.8.fna
1653434.7.fna
1653434.4.fna
1653434.5.fna
1653434.6.fna
SRR9222402.fna
Tree scale: 0.1
B. C.A.
Isolate Country
GD1 France
6K002 Chile
BR72 South Korea
DSM-100440 Finland
AM14-42 China
AF07-16 China
BIOML-A1 United states
AM38-2BH China
AF14-9AC China
AF37-1 China
AF37-2AT China
Supplementary material legends 678
Supplementary Figure 1. Microscopic morphology of Sellimonas intestinalis isolate. 679
The results indicate a Gram-positive bacterial with coccoid morphology. 680
Supplementary Figure 2. Phylogenetic reconstruction of 2,902 genomes of 681
Lachnospiraceae and Ruminococcacea genomes based on 16S-rRNA alignment which 682
allowed to define a node well supported which included the studied assembly and other 683
9 genomes. 684
Supplementary Figure 3. Graphical map of the 11 genome assemblies identified as 685
belonging to the same species of studied genome, built in the CGview server.45 686
Supplementary Table 1. Comparison of 16S-rRNA sequence using BLAST which 687
revealed that the analyzed genome belongs to one of the following genera: 688
Ruminococcus, Drancourtella or Sellimonas. 689
Supplementary Table 2. Information of S. intestinalis genomes included in 690
comparative analyses. 691
Supplementary Table 3. Pangenome analysis of S. instestinalis dataset showed a 692
codifying potential of 4,627 genes. 693
694
695