research article open access genomic encyclopedia ......research article open access genomic...

19
RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov 1,2 , Chen Yang 1,3 , Xiaoqing Li 1 , Irina A Rodionova 1 , Yanbing Wang 4 , Anna Y Obraztsova 4,7 , Olga P Zagnitko 5 , Ross Overbeek 5 , Margaret F Romine 6 , Samantha Reed 6 , James K Fredrickson 6 , Kenneth H Nealson 4,7 , Andrei L Osterman 1,5* Abstract Background: Carbohydrates are a primary source of carbon and energy for many bacteria. Accurate projection of known carbohydrate catabolic pathways across diverse bacteria with complete genomes constitutes a substantial challenge due to frequent variations in components of these pathways. To address a practically and fundamentally important challenge of reconstruction of carbohydrate utilization machinery in any microorganism directly from its genomic sequence, we combined a subsystems-based comparative genomic approach with experimental validation of selected bioinformatic predictions by a combination of biochemical, genetic and physiological experiments. Results: We applied this integrated approach to systematically map carbohydrate utilization pathways in 19 genomes from the Shewanella genus. The obtained genomic encyclopedia of sugar utilization includes ~170 protein families (mostly metabolic enzymes, transporters and transcriptional regulators) spanning 17 distinct pathways with a mosaic distribution across Shewanella species providing insights into their ecophysiology and adaptive evolution. Phenotypic assays revealed a remarkable consistency between predicted and observed phenotype, an ability to utilize an individual sugar as a sole source of carbon and energy, over the entire matrix of tested strains and sugars. Comparison of the reconstructed catabolic pathways with E. coli identified multiple differences that are manifested at various levels, from the presence or absence of certain sugar catabolic pathways, nonorthologous gene replace- ments and alternative biochemical routes to a different organization of transcription regulatory networks. Conclusions: The reconstructed sugar catabolome in Shewanella spp includes 62 novel isofunctional families of enzymes, transporters, and regulators. In addition to improving our knowledge of genomics and functional organization of carbohydrate utilization in Shewanella, this study led to a substantial expansion of our current version of the Genomic Encyclopedia of Carbohydrate Utilization. A systematic and iterative application of this approach to multiple taxonomic groups of bacteria will further enhance it, creating a knowledge base adequate for the efficient analysis of any newly sequenced genome as well as of the emerging metagenomic data. Background Carbohydrates comprise a key natural source of carbon and energy for a variety of heterotrophic microbes. The diversity of carbohydrates (polymers, oligo- and mono- saccharides) in various ecosystems as well as the diver- sity of microbial lifestyles is reflected in substantial var- iations of sugar catabolic machinery even between phylogenetically closely-related microorganisms. An abil- ity to confidently reconstruct this machinery directly from genomic sequence is crucial to an understanding of microbial ecophysiology, evolution, adaptation, and even interactions between microorganisms and their animal, insect, or plant hosts. However, despite many studies, the bulk of our knowledge about sugar utiliza- tion pathways is limited to a handful of model bacteria such as E. coli. An accurate projection of this knowledge across the growing variety of divergent bacteria with completely sequenced genomes constitutes a substantial * Correspondence: [email protected] 1 Burnham Institute for Medical Research, La Jolla, California 92037, USA Full list of author information is available at the end of the article Rodionov et al. BMC Genomics 2010, 11:494 http://www.biomedcentral.com/1471-2164/11/494 © 2010 Rodionov et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Upload: others

Post on 03-Sep-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

RESEARCH ARTICLE Open Access

Genomic encyclopedia of sugar utilizationpathways in the Shewanella genusDmitry A Rodionov1,2, Chen Yang1,3, Xiaoqing Li1, Irina A Rodionova1, Yanbing Wang4, Anna Y Obraztsova4,7,Olga P Zagnitko5, Ross Overbeek5, Margaret F Romine6, Samantha Reed6, James K Fredrickson6,Kenneth H Nealson4,7, Andrei L Osterman1,5*

Abstract

Background: Carbohydrates are a primary source of carbon and energy for many bacteria. Accurate projection ofknown carbohydrate catabolic pathways across diverse bacteria with complete genomes constitutes a substantialchallenge due to frequent variations in components of these pathways. To address a practically and fundamentallyimportant challenge of reconstruction of carbohydrate utilization machinery in any microorganism directly from itsgenomic sequence, we combined a subsystems-based comparative genomic approach with experimentalvalidation of selected bioinformatic predictions by a combination of biochemical, genetic and physiologicalexperiments.

Results: We applied this integrated approach to systematically map carbohydrate utilization pathways in19 genomes from the Shewanella genus. The obtained genomic encyclopedia of sugar utilization includes~170 protein families (mostly metabolic enzymes, transporters and transcriptional regulators) spanning 17 distinctpathways with a mosaic distribution across Shewanella species providing insights into their ecophysiology andadaptive evolution. Phenotypic assays revealed a remarkable consistency between predicted and observedphenotype, an ability to utilize an individual sugar as a sole source of carbon and energy, over the entire matrix oftested strains and sugars.Comparison of the reconstructed catabolic pathways with E. coli identified multiple differences that are manifestedat various levels, from the presence or absence of certain sugar catabolic pathways, nonorthologous gene replace-ments and alternative biochemical routes to a different organization of transcription regulatory networks.

Conclusions: The reconstructed sugar catabolome in Shewanella spp includes 62 novel isofunctional families ofenzymes, transporters, and regulators. In addition to improving our knowledge of genomics and functionalorganization of carbohydrate utilization in Shewanella, this study led to a substantial expansion of our currentversion of the Genomic Encyclopedia of Carbohydrate Utilization. A systematic and iterative application of thisapproach to multiple taxonomic groups of bacteria will further enhance it, creating a knowledge base adequate forthe efficient analysis of any newly sequenced genome as well as of the emerging metagenomic data.

BackgroundCarbohydrates comprise a key natural source of carbonand energy for a variety of heterotrophic microbes. Thediversity of carbohydrates (polymers, oligo- and mono-saccharides) in various ecosystems as well as the diver-sity of microbial lifestyles is reflected in substantial var-iations of sugar catabolic machinery even between

phylogenetically closely-related microorganisms. An abil-ity to confidently reconstruct this machinery directlyfrom genomic sequence is crucial to an understandingof microbial ecophysiology, evolution, adaptation, andeven interactions between microorganisms and theiranimal, insect, or plant hosts. However, despite manystudies, the bulk of our knowledge about sugar utiliza-tion pathways is limited to a handful of model bacteriasuch as E. coli. An accurate projection of this knowledgeacross the growing variety of divergent bacteria withcompletely sequenced genomes constitutes a substantial

* Correspondence: [email protected] Institute for Medical Research, La Jolla, California 92037, USAFull list of author information is available at the end of the article

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

© 2010 Rodionov et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the CreativeCommons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, andreproduction in any medium, provided the original work is properly cited.

Page 2: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

challenge. This is largely due to the aforementioned var-iations including numerous cases of non-orthologousgene replacements, families of paralogous proteins withvarying substrate specificity as well as alternative andpresently unknown biochemical routes. Due to thiscomplexity, genomic annotations of carbohydrate utili-zation genes derived solely from sequence similarityanalysis are often imprecise and incomplete, especiallyfor relatively distant and poorly studied bacteria.Shewanellaceae are a versatile family of Gram-negative

g-proteobacteria that have adapted for survival in highlyvaried aquatic and sedimentary environments that exhibitextremes in salinity, temperature, redox chemistry, andhydrostatic pressure [1]. Although Shewanella canrespire with a diverse array of electron acceptors thatinclude organic compounds and metal oxides [2], theywere thought to possess a relatively narrow capacity forutilizing electron donors, preferring simple carbon com-pounds such as formate, lactate, and acetate. The recentresults from comparative and functional genomic analysis[3], physiological experiments [4] and microarray analysis[5] demonstrated that S. oneidensis MR-1 can use a widearray of complex carbon compounds including nucleo-tides, amino acids, mono- and polysaccharides.A subsystems-based approach to genome analysis

allows us to substantially improve the accuracy ofgenomic annotations and to predict functions of pre-viously unknown gene families [6,7]. This approachincludes a combination of comparative genomic tech-niques such as the analysis of conserved operons andregulons, with pathway reconstruction across a largevariety of genomes. Recently, we used a subsystems-based approach to predict and experimentally verifynovel pathways of N-acetylglucosamine and lactateutilization in Shewanella [8,9]. In the current study wehave expanded this analysis towards genomic recon-struction of the entire repertoire of carbohydrate utili-zation pathways in a group of 19 species from thegenus Shewanella with completely sequencedgenomes.

ResultsReconstruction of carbohydrate utilization machinery inShewanella sppTo address the challenge of genomic reconstruction ofthe carbohydrate utilization machinery we took advan-tage of several features characteristic of many sugar uti-lization pathways, such as:Uniform functional organizationA typical sugar utilization pathway includes a transportsystem for sugar uptake and a set of intracellularenzymes that perform biochemical transformations.Although the latter processes vary from sugar to sugar(and, to a degree, from organism to organism), most of

them are performed by a rather narrow spectrum ofenzymatic activities such as kinases, isomerases, oxidore-ductases, hydrolases, and aldolases. Sugar kinases orphosphoenolpyruvate:sugar phosphotransferase systems(PTS) catalyze an essential phosphorylation step (as allintermediates of central carbon pathways to the point ofpyruvate are anchored by one or two phosphates). Wealso included in this analysis associated transcriptionalregulators of sugar utilization pathways that mediatespecific induction of a catabolic pathway. For utilizationof various natural polysaccharides, many pathways areequipped by upstream hydrolytic enzymes producingoligo-, di- or mono-saccharides that are then trans-ported into the cell and further metabolized. These andother upstream and auxiliary (e.g. involved in sensingand chemotaxis) components of the carbohydrate utiliza-tion machinery were typically excluded from our analy-sis, except for those that share genomic context withother sugar utilization components.Ubiquitousprotein familiesDespite their functional diversity, many characterizedcomponents of sugar utilization pathways occur in alimited number of protein families containing multipleparalogs with varying substrate specificities. Representa-tives of these families can be recognized by homology-based genomic searches, thus providing a source of genecandidates for pathway reconstruction. On the otherhand, homology-based methods, taken alone, often failto reliably identify the exact substrate specificity. Thiscan be accomplished by additional reasoning based onthe analysis of functional and genomic context (func-tional coupling) [10].Strong functional couplingBacterial genes involved in sugar catabolism are oftenorganized in compact operons and/or regulated by com-mitted transcription factors. Therefore, genome contextanalysis techniques [6] based on the identification ofconserved chromosomal clusters (operons) and sharedregulatory sites (regulons) are particularly efficient foraccurate functional assignment of previously uncharac-terized genes of sugar utilization pathways.Based on the above considerations we developed the

bioinformatic workflow that was applied for the analysisof 19 complete Shewanella genomes (Fig. 1). The keystarting point of this workflow was a collection ofmanually curated subsystems in the SEED genomicdatabase [7] capturing a substantial fraction of knownsugar utilization pathways projected across many bacter-ial species. A compilation of ~480 groups of isofunc-tional homologs from this collection (see additional datafile 1) was used as a source of queries for homology-based scanning of 19 Shewanella genomes integrated inthe SEED database. The underlying assumption was thatany sugar utilization pathway, even extremely divergent

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 2 of 19

Page 3: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

or fully unknown, should contain at least one proteinrecognizably homologous to some of the querysequences from this collection. Therefore, all of therevealed homologs would be considered potential

“kernels” of sugar utilization pathways, even though aspecific function of some of them might be unclear (anddistinct from the query). During further analysis manyof the initially identified gene candidates whose func-tional roles were deemed unrelated to sugar utilization(e.g. involved in biosynthetic pathways) or remaineddubious (lacking any evidence for functional assignment)were rejected.By applying this workflow we tentatively defined the

first draft of the Genomic Encyclopedia of Sugar Utiliza-tion (or, shortly, sugar catabolome) in 19 Shewanellaspp as comprised of 170 isofunctional protein families(FIGfams) [11]. The complete list and some statistics ofthe analyzed genomes are provided in Table 1. Thedetailed results of this analysis are captured in the SEEDsubsystem ‘Sugar catabolome in Shewanella species’available online http://theseed.uchicago.edu/FIG/subsys.cgi and summarized in additional files 2 and 3. Func-tional annotation of the identified protein families wasconducted using simultaneous genome context analyses,and homology searches combined with the metabolicreconstruction and identification of missing steps in theputative catabolic pathway.Specific functional assignments have been proposed for

157 FIGfams comprising 17 peripheral sugar utilizationpathways and the key components of the central carbonmetabolism (CCM) pathways (Table 2, Fig. 2). A total of21 FIGfams involved in CCM upstream of pyruvate areconserved in all analyzed Shewanella spp. They comprisecomplete canonical pentose phosphate (PP) and Entner-Doudoroff (ED) pathways. However, an essential enzymeof glycolysis, 6-phosphofructokinase (Pfk), is missing inall of these species suggesting that sugar utilization inShewanella can proceed only through the ED or PP path-ways. This is in agreement with previous metabolicreconstruction reports [1,3] and biochemical studies ofCCM enzymes in cell extracts of certain Shewanellastrains [12]. All other canonical enzymes of glycolysis/gluconeogenesis are present in all analyzed Shewanellagenomes pointing to the presence of intact gluconeoge-netic route. The peripheral pathways are comprised of136 FIGfams showing a mosaic distribution among 19compared species. The total number of proteins thatcomprise the peripheral sugar utilization machinery ofindividual species varies broadly, from 18 proteins com-prising 3 pathways in S. sediminis up to 74 proteins com-prising 10 pathways in S. baltica OS223.13 FIGfams have not been assigned a specific function

in any sugar utilization pathway due to the lack of anysuggestive genomic context (see additional data files 2and 3). Among them there are three genes encodingputative sugar kinases of unknown specificity, and genesfrom a hypothetical sugar utilization gene cluster (formore details see additional data file 4).

Figure 1 Workflow for genomic reconstruction of carbohydrateutilization machinery in Shewanella spp.

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 3 of 19

Page 4: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

Table 1 Genomic properties and isolation site characteristics of analyzed Shewanella species

Genome Alias Total genes1 Sugar genes2 Sugarpathways3

Isolation site characteristics

S.oneidensis MR-1 MR1 4318 56 4 Lake Oneida, NY, USA (sediment)

S.putrefaciens CN-32 CN32 3972 64 4 Albuquerque, NM, USA (subsurface)

S.putrefaciens W3-18-1 W3181 4044 64 4 Washington Coast, Pacific Ocean (sediment)

S.sp. ANA-3 ANA3 4111 87 7 Eel Pond, Woods Hole, MA, USA (brackish water)

S.sp. MR-4 MR4 3924 86 7 Black Sea (sea water - 5 m)

S.sp. MR-7 MR7 4006 92 8 Black Sea (sea water - 60 m)

S.baltica OS155 Sbal 4307 77 7 Baltic Sea (sea water - 90 m)

S.baltica OS185 OS185 4323 84 8 Baltic Sea (sea water - 120 m)

S.baltica OS195 OS195 4499 80 7 Baltic Sea (sea water - 140 m)

S.baltica OS223 OS223 4250 110 10 Baltic Sea (sea water - 120 m)

S.denitrificans OS217 Sden 3754 58 4 Baltic Sea (sea water - 120 m)

S.loihica PV-4 PV4 3859 59 5 Loihi Seamount, Pacific Ocean (hydrothermal vent)

S.amazonensis SB2B Sama 3645 89 6 Amazon River Delta, Brazil (sediment)

S.frigidimarina NCIMB 400 Sfri 4029 70 8 North Sea, coast of Aberdeen, UK (sea water)

S.piezotolerans WP-2 Spie 4933 68 6 West Pacific site (sediment - 1914 m)

S.pealeana ANG-SQ1 Spea 4241 71 6 Woods Hole harbor, MA, USA (squid gland)

S.halifaxens HAW-EB4 Shal 4278 59 5 Halifax Harbor, Nova Scotia, CA (sediment - 215 m)

S.sediminis HAW-EB3 Ssed 4497 47 3 Halifax Harbor, Nova Scotia, CA (sediment - 215 m)

S.woodyi ATCC 51908 Swoo 4880 84 7 Strait of Gibraltar, Mediterranean Sea (370 m)1 protein coding genes2 number of genes involved in the reconstructed sugar utilization pathways.3 number of the reconstructed sugar utilization pathways.

Table 2 Reconstructed sugar utilization pathways and newly assigned genes in Shewanella spp

Sugar utilization pathway(subsystem)

Number ofgenomes

Number ofgenes

Newly assigned genes

Carbohydrates Abbrev. regulation transport enzymes auxiliary

Central carbonmetabolism

CCM 19 21 - - - -

N-acetylglucosamine,chitin

Nag 18 15 nagR nagP, ompNag nagBII, nagK, mcpNag,nagX

Glycerate Grt 17 3 - grtP - -

b-glucosides, cellobiose Bgl 9 8 bglR bglT, glcPBgl ompBgl glkII -

Sucrose Scr 8 5 scrRII scrTII, ompScr - -

Maltodextrins Mal 14 13 malR malT, glcPMal ompMal

(2)- -

Arabinose, arabinosides Ara 6 18 araRII araUVWZ, araT,ompAra

araM, araY araX

Galactose, galactosides Gal 8 10 - galPII, ompGal - -

Gluconate Gnt 4 3 - - - -

N-acetylgalactosamine Aga 4 8 - agaP, ompAga agaAII, agaK,agaS

agaO

Mannosides Man 4 14 manRI,manRII

manPI, manPII,ompMan

manI, manK -

Trehalose Tre 3 6 treRII treT, ompTre treP -

Xylitol Xlt 2 6 xltR xltABC - -

Ribose Rbs 2 6 - - - -

Sialic acids Nan 1 10 - nanP, ompNan - -

Alginate Alg 1 6 algR algT - -

Mannitol Mtl 1 5 mtlRII mtlP mtlZII mtlX

Unassigned - 19 13 - - - -

Total number: 17 19 170 11 34 12 5

Predicted novel functional assignments verified by targeted experiments are marked by bold type and underlined. The details of all functional assignmentssummarized in this table are provided in additional files 2 and 3.

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 4 of 19

Page 5: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

Using the genome context analysis combined withmetabolic reconstruction we inferred specific functionalassignments for 62 families of isofunctional homologswhose functions were previously unknown or definedonly at the level of general class (Table 2, see also addi-tional file 5). The novel functional assignments, includ-ing 34 components of transport systems, 11transcriptional regulators (and corresponding DNAmotifs), 12 metabolic enzymes, and 5 auxiliary proteins,are supported by the overall internal consistency of theentire reconstruction where all of the pathways are com-plete (no gaps or missing genes), and all of the anno-tated genes are associated with complete pathways (no

genes out of context). Selected functional assignmentsmostly for novel sugar transporters from those pathwaysthat are present in most of the analyzed Shewanella spe-cies were tested by targeted biochemical and geneticexperiments (for details see next section and additionalfile 6). An ultimate validation of the entire reconstruc-tion was attained by the experimental testing of growthphenotypes (an ability of a given strain to grow on aspecific sugar substrate as a sole source of carbon andenergy) predicted by the presence or absence of respec-tive reconstructed pathways. The pathway presence wasdefined based on the presence of all components of thepathway in a particular genome. This phenotype

Figure 2 Reconstructed pathways of carbohydrate utilization in Shewanella pan-genome. Abbreviations for carbohydrates are listed inTable 2. Functional roles implicated in the same catabolic pathway are shown by matching background colors. Novel functional roles predictedin this work are in red boxes. Enzymatic and transport routes are shown by solid and dotted lines, respectively. Transcriptional factors predictedto control sugar utilization pathways are shown in ovals of matching colors. Reactions and enzymes of the central carbohydrate metabolism areshown in magenta. Note that none of the individual Shewanella species contain all (or even most) of the shown pathways.

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 5 of 19

Page 6: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

profiling was performed using a “matrix” approachwhere 14 representative Shewanella species were pro-filed against a panel of 18 diagnostic sugar substrates(Table 3).

Peripheral sugar utilization pathwaysIn this section we describe the key novel aspects of eightreconstructed sugar utilization pathways that were foundto be relatively widely distributed in Shewanella genus.The “core” subset includes pathways for utilization ofD-glucose, N-acetylglucosamine, D-glycerate, b-gluco-sides (present in 14 or more species). The “intermediate”subset includes the sucrose, maltodextrins, D-galactose,and L-arabinose catabolic pathways (conserved in 6 ormore species). We also provide the results of experi-mental validation of selected novel functional roles infive sugar catabolic pathways. The details of the othernine reconstructed pathways for utilization of alginate,gluconate, mannitol, mannosides, N-acetylgalactosamine,ribose, sialic acids, trehalose, and xylitol that were foundonly in one to four Shewanella strains ("rare” pathways)are described in the additional file 4. We also report theresults of growth phenotype characterization for most ofthe respective sugar substrates.D-glucose(Glc) is utilized by bacteria using either (i) hexose per-mease and glucokinase, or (ii) PTSGlc system. Theinability to ferment glucose as a carbon source underaerobic conditions was originally attributed to the She-wanella genus, whereas later studies have identified sev-eral species that were able to utilize glucose such as S.baltica and S. frigidimarina [13-15]. In this study we

have tentatively identified respective genes, glucosetransporters glcPBgl and glcPMal and glucokinase glkII,that are conserved in most Shewanella genomes, wherethey are clustered on the chromosome with the genesfrom b-glucoside (Bgl) and maltodextrin (Mal) utiliza-tion pathways, respectively (Fig. 3A). These catabolicpathways include several secreted glucosidases (e.g.BglAII, CgA, SusB) generating extracytoplasmic glucosethat can be utilized via associated glucose transportersand the glucokinase (see below). The predicted glucosetransporters belong to the glucose-galactose permease(GGP) family, and are most similar to the glucose andgalactose transporter GluP from Brucella abortus [16]and fucose permease FucP from E. coli. The predictedglucokinase GlkII belongs to the ROK family of sugarkinases [17], and is similar to fructokinase Mak, D-allosekinase AlsK, and N-acetylglucosamine kinase NagK fromE. coli, and glucokinase GlcK from B. subtilis. Orthologsof both glcPMal/glcPBgl and glkII are absent from E. coliand other Enterobacteria. The GlcPBgl and GlcPMal

transporters in Shewanella were found only in the con-text of the Bgl and Mal pathways, whereas GlkII seemsto play a general housekeeping role being conserved inall analyzed Shewanella genomes (see additional datafiles 2 and 3). In 9 of 19 genomes glkII was also foundwithin the Bgl utilization gene cluster, and in 6 of thesecases it is a second copy of the gene in the genome.To confirm the functional assignment of GlcP, we

constructed the ΔglcPMal knockout mutant in Shewa-nella sp. ANA-3, and tested it for glucose-dependentgrowth. In contrast to the wild-type ANA-3, theΔglcPMal strain was unable to grow on D-glucose as a

Table 3 Consistency between the predicted and experimentally determined growth phenotypes of Shewanella spp

Strain Glc Nag Grt Bgl Scr Mal Ara Gal Gnt Aga Tre Mtl

MR1 +/n* +/p +/p -/n -/n +/n* -/n -/n -/n -/n -/n -/n

CN32 -/n +/p +/p -/n -/n -/n +/p -/n -/n -/n -/n -/n

W3181 -/n +/p +/p -/n -/n -/n +/nd -/n -/nd -/nd -/n -/nd

OS155 +/p +/p +/p +/p +/p +/p -/n -/n +/p -/n -/n -/n

OS185 +/p +/p +/p +/p +/p +/p -/nd +/p +/nd -/nd -/n -/nd

OS195 +/p +/p +/p +/p +/p +/p -/nd -/n +/nd -/nd -/n -/nd

OS223 +/p +/p +/p +/p +/p +/p -/nd +/p +/nd -/nd +/p -/nd

MR7 +/p +/p +/p -/n +/p +/p +/p -/n -/n +/p -/n -/n

MR4 +/p +/p +/p -/n +/p +/p +/p -/n -/n +/p -/n -/n

ANA3 +/p +/p +/p -/n +/p +/p +/p -/n -/n +/p -/n -/n

Sden +/w +/p -/n +/p -/n +/p -/n -/n -/n -/n -/n -/n

Sfri +/p -/n +/p +/p +/p +/p -/n -/n -/n -/n +/p +/n*

PV4 +/w +/p +/p -/n -/n +/w -/n +/w -/n -/n -/n -/n

Sama +/p +/p -/n +/p -/n +/p -/n -/n -/n +/p -/n -/n

Aliases for analyzed Shewanella strains are described in Table 1. Each cell in the table combines the data on the predicted and experimentally determined sugarutilization phenotype. The ability of Shewanella species to grow on a panel of sugar substrates was predicted based on the presence (+) or absence (-) of therespective reconstructed pathways in their genomes. The experimental results of growth experiments from this study (see additional data files 7 and 8) are: ‘p’,positive growth; ‘w’, weak growth; ‘n’, no growth. Inconsistencies between the predicted and experimentally determined growth phenotypes are in bold andmarked by asterisk.

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 6 of 19

Page 7: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

Figure 3 Genomic context and regulons of genes involved in sugar utilization in representative Shewanella genomes. Panel A containsmetabolic pathways discussed in the main text as the “core” and intermediate” subsets. Panel B contains “rare’ pathways described in theadditional file 4. Genes are colored by their functional classification: sugar transport systems, yellow; cytoplasmic enzymes catalyzing biochemicaltransformations, blue; transcriptional regulators, black; extracytoplasmic sugar hydrolytic enzymes and auxiliary proteins functionally linked to thepathway, gray. Genes with novel functional roles predicted in this work are outlined in red. For each sugar-specific transcription factor, predictedDNA binding sites are shown by black dots and a consensus DNA-binding motif is shown as a sequence logo.

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 7 of 19

Page 8: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

sole carbon and energy source, confirming the glucosetransporter functional assignment (see additional datafile 6 for original experimental data). A representative ofthe predicted glucokinase subfamily (GlkII) from S. bal-tica OS155 was experimentally characterized as a partof our analysis of the Bgl pathway (see below). To assessthe functionality of the predicted glucose catabolic path-way we performed phenotypic characterization of She-wanella for growth on D-glucose as a sole carbon andenergy source (see additional data files 7 and 8 for theoriginal physiological growth data). Among 14 Shewa-nella strains tested only S. oneidensis and two S. putrefa-ciens strains were unable to grow on glucose. Theseresults are consistent with the distribution of glcP trans-porters in these Shewanella genomes (Table 3, see alsoadditional file 2). The inability of S. oneidensis MR-1 togrow on glucose is most likely due to a frameshift in theglcPMal gene[18].The second (PTS-driven) route of glucose utilization is

not expected to support the growth of bacteria (such asShewanella spp) with the incomplete glycolytic pathwaydue to the inability of the ED or PP pathways to gener-ate enough phosphoenolpyruvate to compensate for itsconsumption in the phosphotransferase reaction. Thisassertion is supported by the observed impaired growthof the E. coli pfk mutant on glucose using PTSGlc [19].Therefore, the actual physiological role of Shewanellagenes orthologous to the components of the E. coliPTSGlc system (ptsHI-crr and ptsG) remains a mystery.It would be tempting to speculate that this PTS systemis used by Shewanella for glucose consumption in thepresence of other substrates contributing to nonglycoly-tic phosphoenolpyruvate generation. However, ourattempts to grow MR-1 on the mixture of lactate andglucose did not confirm this conjecture (data notshown).N-acetylglucosamine(Nag) and chitin catabolic pathway (Fig. 2, see alsoadditional file 9) conserved in most Shewanella sppcontains several novel functional roles (Table 2). Pre-viously we have experimentally confirmed the predictedenzymatic activities of novel Nag kinase (NagK) andglucosamine-6-phosphate deaminase (NagBII) andreconstituted the entire three-step biochemical conver-sion of Nag to fructose-6-phosphate in vitro [9]. Wehave also confirmed that all of the tested Shewanellastrains can grow on Nag as a single source of carbonand energy with the exception of S. frigidimarina,which does not contain these genes (Table 3). Here wepresent additional experimental results supporting func-tional assignments of the predicted transporter NagP inS. oneidensis MR-1. NagP belongs to the GGP family oftransporters and has a limited similarity to the glucosepermeases GlcPBgl and GlcPMal. We have constructed

the S. oneidensis ΔnagP targeted deletion mutant anddemonstrated the loss of its ability to grow on Nag (seeadditional data file 6).The Nag catabolic genes shows their wide distribution

in the Alteromonadales lineage, suggesting the presenceof the Nag pathway in the common ancestor of thislineage. The absence of Nag pathway in S. frigidimarinais attributed to a loss of the entire chromosomal genecluster. The observed wide distribution of Nag utiliza-tion pathway in Shewanella could be related to theirability to utilize chitin, a highly abundant constituent ofaquatic invertebral exoskeleton.D-glycerate(Grt) utilization pathway in Shewanella species involvesD-glycerate kinase GarK, transcriptional regulator SdaR,and a novel D-glycerate permease termed here GrtP(Fig. 2, see also additional file 9). In E. coli, GarK isinvolved in the glucarate/galactarate catabolic pathway,generating 2-phosphoglycerate as product, and SdaR is acommon transcriptional regulator of this pathway [20].Most of the glucarate/galactarate utilization genes areabsent in Shewanella species. The only exception is agarK gene ortholog, which was found in 17 Shewanellastrains in a conserved operon with a hypothetical geneencoding the predicted D-glycerate transporter GrtP(see additional data files 2 and 3). The grtP-garK operonis clustered on the chromosome with an sdaR geneortholog. A comparative genomic reconstruction of theSdaR regulon in g-proteobacteria allowed us to predict acandidate SdaR-binding site located upstream of thegrtP-garK operon in Shewanella genomes (Fig. 3A).The predicted D-glycerate uptake transporter GrtP

belongs to the gluconate permease family and has ortho-logs in Pseudomonadales and Vibrionales but not inE. coli or other Enterobacteria. The predicted functionof GrtP was confirmed by the loss of ability of theS. oneidensis ΔgrtP strain to grow on D-glycerate as asole carbon and energy source (see additional data file6). Of the 14 Shewanella strains, all but 2 (S. denitrifi-cans and S. amazonensis) could grow on D-glycerate(see additional data files 7 and 8), which is fully consis-tent with the distribution of the grtP-garK operonamong the analyzed Shewanella genomes (Table 3). Thenatural source of D-glycerate in aquatic environments isyet to be elucidated.b-glucoside’Bgl’ utilization pathway revealed in 9 Shewanella speciesinvolves b-glucanase LamA, two b-glucosidases BglAI

and BglAII, two novel b-glucoside transporters BglT andOmpBgl, a novel LacI-type transcriptional regulator BglR,glucose permease GlcPBgl and ROK-type glucokinaseGlkII (Fig. 2, see additional data files 2, 3 and 9). The pre-viously described Bgl catabolic pathways use PTS-typetransport systems and 6-phospho-b-glucosidases (as in

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 8 of 19

Page 9: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

B. subtilis and E. coli) or ABC-type transport systems andb-glucosidases (as in Streptomyces spp and Archaea) [21].The predicted b-glucoside uptake transporter BglT is amember of the glycoside-pentoside-hexuronide (GPH)transporter family, and has orthologs in other Alteromo-nadales species. Homologs of BglT in Enterobacteria (e.g.YicJ and YagG from E. coli that show 40-44% sequencesimilarity to BglT) are predicted xyloside transportersregulated by the xylose activator XylR [22]. OmpBgl

belongs to the TonB-dependent outer membrane trans-porter (TBDT) family, which includes proteins involvedin high-affinity binding and energy-dependent uptake ofvarious substrates (including oligosaccharides) into theperiplasm [23]. The predicted Bgl-specific transporterOmpBgl is a nonorthologous replacement of the Bgl-spe-cific outer membrane porin BglH from E. coli. Compara-tive genomic reconstruction of the BglR regulon allowedus to predict candidate BglR-binding sites locatedupstream of the divergently transcribed bglAI-bglT-bglRand glcPBgl-bglAIIoperons, and upstream of the ompBgl

gene (Fig. 3A). Thus the predicted novel transporters arepositionally clustered and co-regulated with other Bglcatabolic genes.The reconstructed Bgl pathway in Shewanella involves

three glucosidases, two of which, LamA and BglAII, arepredicted to be secreted outside of the cell and to theperiplasm, respectively, whereas BglAI is likely acytoplasmic enzyme. We propose that b-glucoside-con-taining glucans are first degraded by extracellular endo-b-1,3-glucanase LamA, the resulting oligo-b-glucosidesare transported to the periplasm by OmpBgl, and subse-quently utilized by BglAII to produce D-glucose andshorter b-glucosides (e.g., cellobiose, gentibiose). Thelatter products are taken up by the predicted BglTtransporter into the cytoplasm where they are finallyhydrolyzed by the BglAI enzyme. D-glucose is taken upby the predicted glucose transporter GlcPBgl and phos-phorylated by the GlkII glucokinase (Fig. 2).The newly predicted b-glucoside BglT transporter was

experimentally confirmed by genetic complementationin E. coli ΔbglF mutant (a knockout of cellobiose PTSsystem). This strain, when transformed by an expressionvector containing an operon bglAI- bglT cloned from S.baltica OS155 gained the ability to grow on minimalmedium with cellobiose as the sole carbon and energysource. In the same conditions, no growth was observedfor this strain transformed by vector only or by the plas-mids containing single genes, bglAI or bglT (see addi-tional data file 6).The predicted glucokinase activity of the S. baltica

OS155 glkII gene product was confirmed by an in vitroenzyme assay. Recombinant purified GlkII exhibited abroad substrate specificity at 37°C with the highestactivity with D-glucose (22 μmol/mg/min), and an

appreciable activity with some other tested hexosesincluding D-mannose, D-fructose, D-glucosamine, andD-mannosamine (see additional data file 6).The results of the phenotype profiling of 14 Shewa-

nella strains showed that only half of them (four S. bal-tica species, S. amazonensis, S. frigidimarina, andS. denitrificans) were able to grow on cellobiose as asole carbon and energy source (see additional data files7 and 3). This result is consistent with the distributionof the respective genes in the analyzed Shewanella gen-omes (Table 3).The Bgl catabolic genes are sparsely distributed among

Shewanella species (Fig. 4) and many other species fromthe Alteromonadales lineage. Therefore, it might be anancestral pathway predating the speciation of Alteromo-nadales that could have been independently lost severaltimes within the Shewanella genus. The species-specificloss of Bgl pathway by many Shewanella could be linkedto a particular ecophysiology of their habitat. For exam-ple, S. pealeana which colonizes the accessory nidamen-tal gland of the squid may not have access to Bglsubstrate and thus lost the respective gene cluster fromits chromosome [15].Sucrose(Scr) utilization pathway in 8 Shewanella speciesinvolves sucrose phosphorylase ScrP, fructokinase ScrK,two novel sucrose transporters (ScrTII and OmpScr), anda novel LacI-type transcriptional regulator ScrRII (Fig. 2;see also see additional data files 2, 3 and 9). This path-way variant is quite different from the canonical path-way known in Enterobacteria, which includes thesucrose-specific PTS-driven uptake and phosphorylationfollowed by sucrose-6-phosphate hydrolase [24]. Analternative variant of the Scr pathway previouslydescribed in Bifidobacterium spp includes sucrose per-mease ScrT from the MelB melibiose transporter familyand sucrose phosphorylase [25]. The major disctinctionin Shewanella is a nonorthologous replacement of ScrTby a predicted transporter ScrTII of the GGP family ofsugar transporters. The novel sucrose regulator ScrRII inShewanella is another nonorthologous replacement ofthe previously characterized ScrR repressor from otherbacteria. A genomic reconstruction of the ScrRII regulonallowed us to predict its candidate binding sitesupstream of the divergently transcribed scrP and scrTII

genes, as well as upstream of the ompScr-scrK operon(Fig. 3A). The predicted Scr-specific TBDT OmpScr isfunctionally equivalent to the Scr-specific outer mem-brane porin ScrY from Enterobacteria, and is presum-ably involved in the uptake of sucrose into theperiplasm[26].The ScrTII transporter was validated by complementa-

tion in E. coli K-12, which lacks the Scr pathway and isunable to utilize sucrose (see additional data file 6). Two

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 9 of 19

Page 10: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

divergently transcribed genes from S. frigidimarina,scrTII and scrP, were expressed in E. coli under the con-trol of an endogenous promoter in their common inter-genic region. As a negative control, we used an emptyvector as well as single scrTII or scrP genes expressed inthe same strain. The cell growth was monitored in aminimal medium with sucrose as the only carbon andenergy source. We have found that only the cells carry-ing both scrTII and scrP were able to grow on sucroseproviding an experimental verification of the recon-structed Scr utilization pathway.The growth phenotype profiling of 14 Shewanella spe-

cies demonstrated that eight of them (Shewanella sp.ANA-3, MR-4, MR-7, four S. baltica strains, S. frigidi-marina) are able to grow on sucrose (see additional datafiles 7 and 8). These results are in agreement with thepresence of the respective genes in these 8 Shewanellagenomes (Table 3).The Scr catabolic genes are present in a large group of

Shewanella species (Fig. 4) and in some other speciesfrom the Alteromonadales lineage. The most parsimo-nious scenario suggests that the Scr pathway was pre-sent in a common ancestor of the Shewanella genus and

that it has been independently lost several times in theevolutionary history of Shewanella. Distribution of theScr pathway among the Shewanella species providesanother possible link to their ecophysiology, the avail-ability of plant-derived sucrose in their respective ecolo-gic niches.Maltodextrin(Mal) utilization gene locus identified in 14 Shewanellaspecies contains the amy, cga, susA, susB, and malZgenes encoding orthologs of previously characterizedstarch and maltodextrin hydrolytic enzymes, as well asgenes encoding novel transporters termed MalT,GlcPMal and OmpMal, and a novel LacI-type transcrip-tional regulator MalR (see additional data file 2). Thereconstructed Mal catabolic pathway in Shewanella (Fig.2) uses a predicted novel sodium:solute symporter MalT(a homolog of the melibiose permease MelB of E. coli)instead of the maltose ABC transporter MalEFGK pre-viously described in E. coli [27]. The outer membranemaltose/maltodextrin uptake in Shewanella is presum-ably mediated by a novel TBDT OmpMal, the functionalequivalent of E. coli maltoporin LamB[26]. The novelglucose transporter GlcPMal is a close paralog (67%

Figure 4 Distribution of 17 sugar utilization pathways encoded in 19 Shewanella genomes. Shewanella species abbreviations are asdescribed in Table 1. Asterisks indicate cases when a pathway is impaired by the presence of an insertion sequence element or a frameshiftmutation and, thus, deemed nonfunctional.

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 10 of 19

Page 11: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

identity) of the predicted glucose transporter GlcPBgl

from the Bgl utilization pathway in Shewanella. We pro-pose that D-glucose is produced in the periplasm by theaction of maltodextrin a-glucosidases and transportedinto the cytoplasm by GlcPMal. A comparative genomicreconstruction of the MalR regulon allowed us to pre-dict candidate MalR-binding sites located upstream ofthe divergently transcribed ompMal and malZ-glcPMal

operons, as well as upstream of the susB and malT-cgaoperons (Fig. 3A).The growth phenotype characterization of 14 Shewa-

nella species demonstrated that 11 of them (all exceptMR-1 and two S. putrefaciens strains) are able to growon maltodextrin as a sole carbon and energy source (seeadditional data files 7 and 8). These phenotypic resultsare in agreement with the genomic reconstruction ofthe Mal pathway (Table 3). The inability of S. oneidensisMR-1 to grow on maltodextrin is attributed to signifi-cant genetic perturbations within the Mal utilizationgene loci (e.g. glcPMal and susA are pseudogenes withframeshifts, whereas ompMal is interrupted by an inser-tion element [18]).The Mal catabolic genes are widely distributed among

Shewanella species (Fig. 4) and several other speciesfrom the Alteromonadales lineage. Such phyletic patternsuggests the presence of an ancestral pathway in thecommon ancestor of the Shewanella genus and indepen-dent species-specific pathway losses. Interestingly, whilethe Mal pathway in S. oneidensis MR-1 demonstratesgenetic signs of decay (see above), two closely related S.putrefaciens strains have completely lost the Mal path-way genes. The loss of the Mal catabolic genes in someShewanella spp. is most likely caused by a significantshift in available nutrient composition in their habitat,e.g. the absence of plant materials as a source ofmaltodexrins.L-arabinose(Ara) and arabinoside utilization gene locus identified insix Shewanella species contains more than 20 genes.Nearly half of these genes are similar to previously char-acterized Ara catabolic genes whereas others are novel(see additional data files 2 and 9). The reconstructedAra catabolic pathway in Shewanella involves the fol-lowing predicted functional roles: arabinose ABC trans-porter AraUVWZ, two arabinoside transporters AraTand OmpAra, a novel GntR-type transcriptional regulatorAraRII, L-arabinose mutarotase AraM, and Ara 1-dehy-drogenase AraY (Table 2, Fig. 2).Two alternative routes of Ara utilization are known in

bacteria. The major one present in E. coli and B. subtilisdepends on L-arabinose isomerase AraA, L-ribulokinaseAraB, and L-ribulose-phosphate epimerase AraD,whereas the alternative Ara pathway characterized inAzospirillum brasiliense proceeds through L-arabinose

dehydrogenase, arabinolactonase, and L-arabonate dehy-dratase [28]. In addition to the araBAD genes encodingthe conventional Ara utilization pathway through L-ribulose, the Ara metabolic locus in Shewanella containstwo genes from the alternative Ara pathway, namely L-arabonate dehydratase araC and arabinolactonase araL,although other genes from the alternative pathway aremissing (see additional data file 9). Based on genomecontext and distant homology analysis we havepredicted the gene araY, which is similar to D-xylose 1-dehydrogenase from Caulobacter crescentus (45%similarity), to be the missing L-arabinose 1-dehydrogen-ase. A member of aldose 1-epimerase family encoded inthe Ara gene cluster was tentatively assigned the func-tional role Ara mutarotase (AraM), which interconvertsalpha and beta anomers of L-arabinose.The predicted ABC-type arabinose uptake transporter

system AraUVWZ is similar to the hypothetical sugartransporter YtfQRST from E. coli (58% similarity) and tothe ribose transport systems in E. coli and B. subtilis.The predicted TBDT OmpAra and the GPH-familytransporter AraT are presumably involved in the uptakeof arabinosides through the outer and inner membrane,respectively. Comparative genomic reconstruction of theAraRII regulon allowed us to predict its candidate bind-ing sites located in the likely regulatory regions of mostoperons in the ara locus (Fig. 3A). Thus, the predictednovel transporters are both positionally clustered andco-regulated with the other Ara catabolic genes (seeadditional data files 2 and 3).The functionality of the predicted arabinose utilization

pathway is supported by the growth phenotype profileof Shewanella spp on L-arabinose (see additional datafiles 7 and 8), which correlates perfectly with the pre-sence/absence of the Ara catabolic genes (Table 3).The Ara catabolic genes are present in a large group

of Shewanella species (Fig. 4), among which MR-1 andthree S. baltica strains have lost the complete ara genecluster. Outside of the Shewanella genus the most simi-lar ara catabolic gene clusters were found in two otherg-proteobacteria, the polysaccharide-degrading marinebacterium Saccharophagus degradans and the plant cellwall-degrading soil bacterium Cellvibrio japonicus. Ara-binose is an important component of plant cell wall.Therefore, the distribution of the Ara catabolic pathwayamong Shewanella species may also reflect differences inthe availability of plant-derived arabinose in theirenvironments.D-galactose(Gal) metabolism genes encoding galactokinase GalK,mutarotase GalM, UDP-glucose epimerase GalE, andUTP-glucose-1-phosphate uridylyltransferase GalU areconserved in all Shewanella genomes (see additionaldata file 2). Utilization of extracellular galactose or

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 11 of 19

Page 12: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

galactosides requires specific transport systems andsugar hydrolases. The galKM locus in S. halifaxensis, S.loihica, S. pealeana, S. piezotolerans, and S. sediminisinvolves two additional genes, galPII and lacZ, that arepresumably involved in galactose and lactose catabolism.The predicted galactose permease GalPII belongs to theSSF superfamily and is similar to the myo-inositol andglucose transporters from mammals (34% identity). Theb-galactosidase LacZ has a candidate signal peptide clea-vage site and is presumably a secreted enzyme. We pro-pose that b-galactosides are degraded in the periplasmby LacZ, and the resulting D-galactose residues aretaken up by GalPII transporter (Fig. 2). The ompGal-lacZ-galTK-galPII-galE operon in two S. baltica strains(OS185 and OS223) encodes an additional novel trans-porter from TBDT family, named OmpGal, which islikely involved in lactose uptake through the outermembrane.The galTKPII operon identified in S. woodyi is pre-

sumably involved in D-galactose monosaccharide utiliza-tion, because of the absence of candidate galactosidehydrolase genes in this genome. An ortholog of thetranscriptional regulator gene galR from E. coli is clus-tered with the galTKPII operon in S. woodyi and withthe ompGal-lacZ-galTK-galPII-galE operon in two S. bal-tica strains, and candidate GalR-binding sites were iden-tified in their corresponding upstream regions (Fig. 3).The growth phenotype characterization of 14 Shewa-

nella species demonstrated that only three strains (S. loi-hica PV-4, and two S.batica strains) are able to grow onD-galactose (see additional data files 7 and 8). This pat-tern is consistent with the distribution of the predictedgalactose permease gene galPII in the analyzed Shewa-nella genomes (Table 3).A mosaic distribution of the galPII genes in two

groups of the Shewanella spp., and the presence oftheir orthologs in several other marine bacteria fromthe Alteromonadales lineage suggest multiple geneloss events in the evolutionary history of the Gal cata-bolic pathway in the Shewanella genus. The Gal utili-zation genes could have been lost in those Shewanellaspecies that do not share the same ecological nichewith some of the marine animals that are thoughtto provide a natural source of galactose and b-galactosides.

Novel sugar utilization pathway variants in ShewanellaThe reconstructed peripheral pathways in Shewanellaspp contain 62 variations distinguishing them fromthose previously described in model species, thus pro-viding a vivid illustration of the aforementioned intrinsicvariability of the sugar utilization machinery (see addi-tional data file 5). Most common are numerous cases ofnonorthologous gene replacements (corresponding to

novel FIGfams), when a functional role is encoded by agene that is not orthologous (and, in many cases, nothomologous) to any of the previously known genes ofthe same function [29]. In our analysis, such deviationsas well as a few cases of alternative biochemical routeswere initially recognized as inconsistencies or gaps(missing genes) in reconstructed pathways. Such gapswere filled-in by the most likely gene candidatesrevealed by genome context analysis (e.g. those func-tionally coupled with canonical genes of the respectivepathways via operons and/or regulons). This analysisalso allowed us to extend some of the pathways by add-ing components that would not be commonly perceivedas genuine gaps, such as transcriptional regulators andtransporters.Althogh this study is restricted to Shewanella spp.,

most of the newly assigned protein families containmultiple representatives outside the Shewanella genus,thus contributing to the reconstruction of respectivepathways in a variety of species. It is worth noting thatthe addition of 62 novel FIGfams led to an appreciable(> 12%) expansion of our original collection of 480FIGfams associated with sugar utilization subsystems(Fig. 1). Similar analysis applied to new groups of spe-cies may reveal additional FIGfams and their combina-tions involved in the carbohydrate utilizationmachinery, thus iteratively expanding the entire collec-tion. Of no less importance is the fact that nearly allof the newly assigned genes were originally identifiedas distant homologs of previously characterized geneswith distinct but related functions. For example, anovel Aga kinase was identified as a distant homologof other known sugar kinases; the novel D-glyceratetransporter - as a homolog of E. coli D-gluconate per-mease; the novel BglR repressor - as a homolog ofmany sugar-specific regulators from the LacI family.Although this observation obviously reflects intrinsiclimitations of homology-based predictions, it also deli-vers a more important message that our knowledge ofprotein families involved in the carbohydrate utiliza-tion is close to saturation at the level of general classfunctions recognized by homology-based methods.Otherwise, at least some of the gap-filling gene candi-dates predicted by genome context analysis would haveshared no homology with previously known compo-nents from the collection. Not a single example of thatkind was observed in this study. Moreover, not a singlegap has remained in any of the 17 reconstructed sugarutilization pathways clearly supporting the aboveinterpretation.Characteristic features of the Shewanella sugar catabo-

lome contrasted with E. coli and other Enterobacteriaare described below.

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 12 of 19

Page 13: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

Carbohydrate uptakestrategies appear to be strikingly different between She-wanella and Enterobacteria. Overall, 34 gene familiesencoding novel versions of sugar uptake systems wereidentified in Shewanella spp. Most notably, in contrastto an extensive repertoire of 21 PTS systems activelyused by E. coli for the uptake of various sugars [30],most Shewanella species contain only one PTS systemof unclear physiological role. As already discussed, thelack of Pfk genes blocks the conventional glycolyticroute in Shewanella making the ED pathway the onlyfeasible way of hexose utilization. The inability of thelatter pathway to sustain energy requirements of PTSsystems (due to phosphoenolpyruvate regeneration bal-ance) provides a rationale for their absence in Shewa-nella spp. Thus, the uptake of Glc, Man, Scr, Tre, Nag,and Aga in E. coli is mediated by dedicated PTS sys-tems, whereas in Shewanella species these sugars aretransported by predicted novel permeases of the GGPfamily, GlcP, ManP, ScrTII, TreT, NagP, AgaPII, respec-tively. A combination of inner membrane transportersfrom the GGP and several other sugar permease familieswith committed outer membrane transporters of theTBDT family appears to be the predominant strategy ofsugar uptake in Shewanella (Fig. 2). Committed TBDTouter membrane transporters were mapped for 10 ofthe 17 reconstructed sugar utilization pathways in She-wanella species based on their occurrence withinrespective operons and regulons (Table 2). The pre-dicted Shewanella TBDT transporters are functionallyequivalent to sugar-specific outer membrane porins ofEnterobacteria (e.g. BglH, ScrY, LamB, NanC). However,in contrast to porins mediating transport across theouter membrane by passive diffusion, TBDT transpor-ters employ a unique energizing mechanism utilizingthe TonB complex and the proton-motive force of thecytoplasmic membrane. It is tempting to speculate thata relative abundance of TBDT transporters in Shewa-nella and some other environmental Proteobacteriareflects their rather limited access to carbohydrates ascompared to Enterobacteria.Transcriptional regulationis another highly variable aspect of the sugar utilizationmachinery. Indeed, 11 of the 17 transcriptional regula-tors tentatively associated with Shewanella sugar utiliza-tion pathways are nonorthologous to their counterpartspreviously characterized in E. coli and other species(Table 2). Moreover, 9 of these transcription factorswere recruited from structurally unrelated proteinfamilies, and they are predicted to bind to completelydistinct DNA motifs. For example, the transcriptionalrepressor NagR of the Nag utilization pathway in She-wanella belongs to the LacI family, whereas a function-ally equivalent NagC regulator of E. coli belongs to the

ROK family. Therefore, a functional repertoire of co-regulated genes appears to be generally better conservedbetween species than the respective transcription factorsand their DNA binding motifs [31]. Members of theLacI family are most abundant among the regulators ofsugar catabolism in Shewanella. They control 10 of the17 reconstructed pathways, whereas the remaining path-ways are regulated by the members of GntR and DeoRfamilies (Fig. 3). The majority of genes from sugar cata-bolic pathways were identified as candidate members ofrespective sugar catabolic regulons in Shewanella gen-omes (see additional data file 3).Sugar catabolic enzymesare far less variable between species than associatedtransporters or regulators. In case of Shewanella, themost notable variations are associated with sugar phos-phorylation, which is at least partially due to a func-tional replacement of the uptake-coupledphosphorylation (characterstic of PTS) by a combinationof permease and kinase or, less frequently, phopshory-lase (for disaccharides). Thus, in the aforementionedNag and Aga pathways, novel sugar kinases NagK andAgaK combined with respective permeases (NagP andAgaP) are employed in Shewanella spp rather thancanonical PTS systems. In a novel variant of Scr utiliza-tion pathway predicted and experimentally validated inS. frigidimarina (see additional data file 6), the hydroly-sis of sucrose upon ScrT-mediated uptake and phos-phorylation of its glucose moiety is performed by thesucrose phosphorylase (ScrP). This single-enzyme trans-formation is a functional replacement of two consecu-tive enzymatic reactions, phosphotransferase by PTS(ScrA) followed by sucrose-6-phosphate hydrolase(ScrB), in a canonical version of the pathway describedin Enterobacteria [24]. Nonorthologous replacementsappear to be quite common for sugar kinases that areknown to occur in a number of distinct protein families.For example, the glucokinase GlkII of Shewanellabelongs to the ROK family, whereas its functional coun-terpart in E. coli and other Enterobacteria belongs tothe bacterial glucokinase family.

DiscussionMost heterotrophic bacteria are capable of utilizing atleast some carbohydrates as a source of carbon andenergy via a matching repertoire of sugar catabolic path-ways. Such pathways that include specialized transpor-ters and intracellular enzymes catalyzing biochemicaltransformations of a particular sugar into one of thecommon CCM intermediates often form operons con-trolled by committed transcription factors. Despite ouradvanced understanding of sugar catabolic pathways ina few model bacteria, their genomics-based projectionacross a variety of species from other taxonomic groups

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 13 of 19

Page 14: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

is quite challenging due to the intrinsic variability of thecarbohydrate utilization machinery. To address thischallenge we have established a subsystems-based com-parative genomic approach (Fig. 1), and assessed its uti-lity by building a genomic encyclopedia of carbohydratecatabolism (sugar catabolome) in the Shewanella genus.This analysis covered 19 complete genomes of diverseShewanella species isolated from various aquatic habi-tats. The key components of our approach include: (i)using a collection of protein families capturing the cur-rent knowledge of the microbial carbohydrate utilizationmachinery for homology scanning of target genomesand preliminary identification of gene candidates; (ii)analyzing genomic and functional contexts to recognizefunctional equivalents of previously characterized genesand to predict novel genes and pathway variants; (iii)validating selected bioinformatic predictions by a combi-nation of biochemical, genetic and physiologicalexperiments.One of the critical factors that contributed to the suc-

cessful outcome of this analysis was the availability of alarge set of completely sequenced genomes (largely dueto the DOE Joint Genome Institute, http://www.jgi.doe.gov/) representing a substantial phylogenetic, geographicand ecophysiological diversity within an otherwise com-pact taxonomic group – Shewanella genus [1]. This wasparticularly instrumental for mapping and prediction ofregulons as well as for reliably establishing orthologyrelationships within protein families containing multipleparalogs. In addition to improving our knowledge ofgenomics, functional organization and evolution of thesugar catabolome, this study confirmed the efficiency ofthe established approach, which is scalable and applic-able to other groups of microorganisms. A systematicapplication of this approach to a growing number ofdivergent bacterial lineages would allow us to rapidlyexpand the current genomic knowledgebase and estab-lish the capability of highly accurate automated annota-tion and assertion of sugar catabolic pathways in allbacteria with completely sequenced genomes. Moreover,a fine-grain functional annotation of all protein families(including those with multiple branches and paralogs)that comprise carbohydrate utilization machinery incomplete genomes will enable accurate recognition ofthe corresponding functions and pathway variants inmetagenomic samples.An overall internal consistency of the reconstructed

catabolic machinery confirmed the efficiency of thebioinformatic workflow used in this study. The currentversion of the genomic encyclopedia of carbohydrateutilization pathways of the Shewanella genus has neitherobvious gaps such as “missing genes” (required by func-tional context but not identified in the genomes), norinconsistencies such as “disconnected genes” (those that

were functionally assigned but lack any functional con-text). The only notable exception is an already men-tioned putative PTS system whose function remainsobscure despite its high similarity with E. coli glucosePTS. Additional support of the developed genomicreconstruction was provided by direct experimental test-ing of selected bioinformatic predictions using both invitro assays of purified recombinant enzymes and in vivophenotype assessment of gene knockouts in Shewanellaand genetic complementation in E. coli. Despite theoverall successful outcome and utility of these experi-ments, the scale of this reconstruction makes a tradi-tional targeted verification of individual conjecturesrather impractical. On the other hand, a predicted pat-tern of presence/absence of certain sugar utilizationpathways within a representative group of species maybe seamlessly translated to a predicted testable pheno-type, the ability/inability of these species to metabolizerespective sugars.We assessed the panel of 14 Shewanella species for

the aerobic growth on a series of 18 sugars used as asingle source of carbon and energy (see additional datafiles 7 and 8). The results of the extended growth profil-ing in this study revealed a remarkable consistencybetween predicted and observed phenotypes for theentire matrix of tested strains and sugars despite somestrain-to-strain variations in the growth rate and effi-ciency (Table 3). In addition, the absence of the D-xylose and D-fructose catabolic pathways in the entireShewanella group was also confirmed by their inabilityto grow on either of these sugars as a carbon and energysource. The only two observed inconsistencies, theinability of S. oneidensis to grow on maltodextrin and ofS. frigidimarina to grow on mannitol, can be rationa-lized by genetic rearrangements. Among the genes inthe mal locus of S. oneidensis disrupted by insertion ele-ments and frameshifts is an already mentioned transpor-ter GlcPMal. This observation also provided anexplanation of the intrinsic inability of S. oneidensis togrow on Glc, in contrast to many other Shewanella spp,including ANA-3, where the role of this transporter inGlc utilization was experimentally confirmed. It is worthmentioning that our phenotype predictions are largelyconsistent with the results of earlier growth studies pub-lished for a limited set of strains and sugar substrates[13,15,32-35].Analysis of the distribution of sugar utilization path-

ways within the Shewanella genus and a comparison withE. coli and other bacteria with different life-styles andnatural environments yields interesting observations andconjectures about their adaptive evolution (Fig. 4). Aclear distinction between the core and the rare pathwaysreflects an apparent difference in their evolutionary his-tory. As mentioned above, the former group probably

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 14 of 19

Page 15: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

existed in a common ancestor of Shewanella spp. More-over, some of these pathways appear to remain essentialin most of the diverse Shewanella habitats as they werelost in only a few rare cases. Thus, the Nag pathway ispreserved in 18 and the Grt pathway in 17 of the 19 com-pared species. This is reminiscent of the distribution of arecently described lactate utilization machinery [8],which is present in all but one species consistent with itsproposed role of the main natural supplier of carbon andenergy characteristic of the entire Shewanella genus. Onthe other hand, rare pathways (present in 1-4 species)have more likely originated via lateral gene transferevents, possibly from species sharing the same ecologicalniche. This may be illustrated by the example of the Gntutilization pathway that is present in all 4 sequencedstrains of S. baltica but not in any other Shewanella spp.A conservation of sequence and genomic arrangementwithin the Gnt and Rbs gene loci between some closelyrelated Shewanella strains and distant members of Enter-obacteriaceae, Pasteurellaceae and Vibrionaceae points totheir likely acquisition via a relatively recent lateral genetransfer event (see additional data file 10).

ConclusionsOverall, the results of this study revealed elements ofconservation and variation that appear to be characteris-tic of the sugar utilization machinery. All reconstructedpathways show a strong tendency to be encoded withincompact operons and regulons, and nearly all of thenewly identified individual genes are at least distantlyhomologous (and functionally related) to those pre-viously characterized in other species and included inour original collection of sugar utilization subsystems.The reconstructed sugar catabolome in Shewanella

spp utilizes 170 distinct proteins forming 17 peripheralsugar catabolic pathways and CCM (Table 2, Fig. 1). Anapparent “core” subset of peripheral pathways for theutilization of Nag, Grt, Mal and Glc is conserved in>70% of compared Shewanella genomes. This level ofconservation not only points to their ancestral originbut to an apparent importance of these pathways (andrespective sugars) for the physiology of Shewanella sppin various ecosystems. All other peripheral pathways arepresent in <50% of the compared genomes, and amongthem, 9 pathways are present only in 1-4 species. Thelatter group seemingly originates from the lateral genetransfer and likely reflects adaptation to specific envir-onmental conditions, which is particularly notable in S.frigidimarina and S. pealeana, each containing nonover-lapping sets of 3 “rare” pathways (Fig. 4). Even the mostversatile Shewanella species implement only a fractionof the entire pan-Shewanella sugar catabolome reflectiveof their relatively “lean sugar diet”.

MethodsThe subsystems approach and the SEED platformOur approach to the reconstruction of sugar catabolicpathways in a selected group of genomes was based onfunctional gene annotation and prediction using twoprincipal comparative genomics techniques: (i) homol-ogy-based methods and (ii) genome context analysis.Both these methods are implemented in the SEED geno-mic platform http://theseed.uchicago.edu/FIG/ thatcombines a large and rapidly growing integration of>700 complete annotated genomes (mostly bacterial)with advanced tools for comparative analysis, geneannotation, genome context analysis and functionalreconstruction based on subsystems technology [7]. Newgenomes are automatically annotated by the RAST ser-ver http://rast.nmpdr.org/, a new generation of subsys-tem-based genome annotation tools [36]. Subsystems inthe SEED provide the framework for further improve-ment of these annotations and functional predictions.They are sets of functional roles that capture the currentknowledge of cellular processes and metabolic pathwaysincluding interspecific variation. Each functional role istypically associated with a set of homologous genes thatimplement this role in specific organisms. In addition tohomology-based analysis suggesting at least general classgene functional assignments, genome context analysisprovides evidence of functional coupling between genesof known and unknown functions [10]. The most com-mon type of functional coupling evidence comes fromthe tendency of functionally related genes (e.g., membersof the same pathway) to be clustered on the chromo-some. Other important types of evidence are domainfusion events, conservation of upstream regulatory sites(i.e., reg-ulons) and co-occurrence profiles of genesacross a range of genomes. We used the tools in SEEDand other public servers to compute and analyze alltypes of functional coupling evidence for each genefamily in our analysis.

Reconstruction of regulonsFor identification of a candidate regulatory motif for aparticular sugar catabolic pathway we started from atraining set of potentially co-regulated genes participat-ing in the pathway. Upstream regions of genes from thetraining set and their orthologs from multiple Shewa-nella genomes were used as an input for a DNA motifdetection algorithm. A simple iterative procedure imple-mented in the program SignalX (as described previouslyin [31]) was used for construction of a common tran-scription factor-binding motif in sets of upstream genefragments. Each genome encoding the studied transcrip-tion factor was scanned with the constructed profileusing the GenomeExplorer software [37], and genes

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 15 of 19

Page 16: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

with candidate regulatory sites in the upstream regionswere selected [38]. The threshold for the site search wasdefined as the lowest score observed in the training set.Among new candidate members of a regulon, onlygenes having candidate sites conserved in at least twoother genomes were retained for further analysis. Wealso included new candidate regulon members that arefunctionally related to the reconstructed sugar catabolicpathways. Sequence logos for derived regulatory motifswere drawn using the WebLogo package http://weblogo.berkeley.edu[39]. The details of reconstructed regulonsare captured and displayed in the specialized databaseRegPrecise http://regprecise.lbl.gov[40].

WorkflowReconstruction of metabolic and regulatory pathwaysinvolved in the carbohydrates utilization was performedfor 19 species of the Shewanella genus with completelysequenced genomes uploaded from Genbank and inte-grated in the SEED genomic platform. The overall work-flow is illustrated by Fig. 1. First we performed a surveyof all prokaryotic genes known to be involved or poten-tially involved in utilization of mono- and di-sacchar-ides. A collection of ~480 FIGfams from 35 SEED sugarmetabolic subsystems were classified by their generalfunctional role (i.e. sugar transport, transcriptional regu-lation, biochemical transformation, and upstream/auxili-ary) (see additional data file 1). Each FIGfam comprisesa functionally uniform group of orthologous proteins inrelated organisms. This extensive collection was thenused for homology searches against 19 Shewanella gen-omes resulting in identification of numerous FIGfamspotentially implicated in sugar metabolism. Manualcuration of the identified Shewanella FIGfams using theSEED and other genomic resources and tools (seebelow) rejected many of them as well as identified someadditional candidate FIGfams based on genome contextanalysis. As a result of this iterative process we haveidentified ~170 FIGfams present in at least one Shewa-nella genome and tentatively assigned a role in utiliza-tion of a paticular sugar substrate (see additional datafile 2). The identified FIGfams were used for metabolicreconstruction of Shewanella sugar catabolic pathwaysusing the subsystems-based approach. The refined func-tional annotations were combined in the aggregatedSEED subsystem ‘Sugar catabolome in Shewanella spe-cies’. Besides SEED, we routinely use other bioinformatictools and databases featuring: genomes (Genbank), geneannotations (UniProt, IMG), primary literature(PubMed), reactions and pathways (KEGG, BioCyc),conserved domains and motifs (COG, PFAM, ProDom),distant homology searches and alignments (PsiBlast,FFAS, T-Coffee), genome context and occurence profiles(STRING, Microbes on Line), transcriptional regulation

(RegTransBase, RegulonDB), protein localization predic-tion (TMPRED, SignalP).

Experimental validation of functional predictionsBacterial strains, plasmids, and reagentsE. coli strains DH5a (Invitrogen, Carlsbad, CA) and theknockout mutant ΔbglF from the KEIO collection(a kind gift from Dr. Mori) [41], were used for comple-mentation analyses. E. coli BL21/DE3 (Gibco-BRL, Rock-ville, MD) was used for protein overexpression andpurification. For expression of genes in E. coli, a pBAD-TOPO vector containing the arabinose promoter(Invitrogen, Carlsbad, CA) or a pET-derived vector con-taining the T7 promoter, His6 tag, and tobacco etchvirus-protease cleavage site were used. Enzymes for PCRand DNA manipulations were from New England Bio-labs Inc. (Beverly, MA). Plasmid purification kits werefrom Promega (Madison, WI). PCR purification kits andnickel-nitrilotriacetic acid (Ni-NTA) resin were fromQIAGEN Inc. (Valencia, CA). Oligonucleotides for PCRand sequencing were synthesized by Sigma-Genosys(Woodlands, TX). All other chemicals, includingsucrose, cellobiose, NADH, ATP, phosphenolpyruvate,lactate dehydrogenase, pyruvate kinase, D-glucose,2-deoxy-D-glucose, D-mannose, D-galactose, D-allose,D-fructose, L-sorbose, D-tagatose, D-glucosamine, D-mannosamine, D-galactosamine, N-acetylglucosamine,N-acetylmannosamine, N-acetylgalactosamine were pur-chased from Sigma-Aldrich (St. Louis, MO).PCR amplification and cloningThe 2.9-kb fragment from Shewanella frigidimarinaNCIMB400, which contains two divergently transcribedgenes, scrTII and scrP (Sfri_3989 and Sfri_3990, respec-tively) and the intergenic region, was amplified usingthe primers: 5′-gcgTTATTTAGCGTCTGCGGTCAA-CAAATG (forward-1) and 5′-gagTTATCCGCTGTGTT-TAGCCAGTAAATC (reverse-1). For cloning thefragment containing scrTII only and the intergenicregion, the primers of forward-1 and 5′-CGCTATAC-GATCTACGTAAGTGATCAGCTG were used. Forcloning the fragment containing scrP only and the inter-genic region, the primers of 5′-CTCATTATCCTTCA-TATGGTTAACCATAC and reverse-1 were used.The 2.7-kb fragment from Shewanella baltica OS155,

which contains the bglAIand bglT genes (Sbal_0544 andSbal_0545, repectively), was amplified using the primers:5′-gaggaataataaATGAAAATATCTTTACCAAAG (for-ward-2) and 5′-cacTTAGCTGGCTCTATTTAATTC-CAG (reverse-2). For cloning Sbal_0544 only, theprimers of forward-2 and 5′-gccTTATTTATTATTGT-TATTAGTAAAGTGAGC were used. For cloningSbal_0545 only, the primers of 5′-gaggaataataaATGAT-TAGCATAAAAGAAAAAATAG and reverse-2 wereused. Nucleotides not present in the original sequence

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 16 of 19

Page 17: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

are shown in lowercase. PCR amplification was per-formed using the genomic DNAs of S. frigidimarinaNCIMB400 and S. baltica OS155. The PCR fragmentsas describe above were cloned into the pBAD-TOPOexpression vector.The glkII gene from S. baltica OS155 (Sbal_1134)

was amplified using the primers: 5′-ggcgcacATGTTAC-GAATTGGTATCGATCTTG (forward-3) and 5′-gcaacgtcgacTTAGCGTCCCCACAACCAAGC (reverse-3). Introduced restriction sites (PciI for forward-3 pri-mer and SalI for reverse-3 primer) are shown in bold-face. PCR fragment was cloned into the pET-derivedexpression vector cleaved by NcoI and SalI. Selectedclones were confirmed by DNA sequence analysis.Protein purificationRecombinant protein of glkII (Sbal_1134) from S. bal-tica OS155 was overexpressed as N-terminal fusionwith a His6 tag in E. coli strain BL21/DE3. Cells weregrown on LB media to OD600 = 0.8 at 37°C, inducedby 0.2 mM IPTG, and harvested after 12 h shaking at20°C. Protein purification was performed using rapidNi-NTA agarose minicolumn. Briefly, harvested cellswere resuspended in 20 mM HEPES buffer pH 7containing 100 mM NaCl, 0.03% Brij 35, and 2 mM b-mercaptoethanol supplemented with 2 mM phenyl-methylsulfonyl fluoride and a protease inhibitorcocktail (Sigma-Aldrich). Lysozyme was added to1 mg/mL, and the cells were lyzed by freezing-thawingfollowed by sonication. After centrifugation at 18,000rpm, the Tris-HCl buffer (pH 8) was added to thesupernatant (50 mM, final concentration), and it wasloaded onto a Ni-NTA agarose column (0.2 ml). Afterwashing with the starting buffer containing 1 M NaCland 0.3% Brij-35, bound proteins were eluted with 0.3ml of the starting buffer containing 250 mM imidazole.Protein size, expression level, distribution betweensoluble and insoluble forms, and extent of purificationwere monitored by SDS-PAGE.Complementation analysesIn all complementation analyses, cells were pre-culturedon LB media to exponential growth phase, harvested bycentrifugation, and washed for three times with M9minimal media without any carbon sources. All cultureswere started with the same optical density at 600 nm(OD600 nm = 0.03), and performed at 37°C in triplicatesin 200 μl of the respective media. The cell growth wasmonitored spectrophotometrically at 600 nm using amicroplate reader (ELx808, BioTek Inc., Winnoski,Vermont).Recombinant proteins of S. frigidimarina NCIMB400

ScrTII (Sfri_3989) only, ScrP (Sfri_3990) only, andScrTII-ScrP were expressed in E. coli K-12 strain DH5aunder the control of endogenous promoter in the inter-genic region. The empty pBAD-TOPO vector was

expressed in the same strain and used as a negative con-trol. The complementation analysis was performed onM9 minimal media supplemented with 50 μg/ml of thia-mine and 20 mM of sucrose.Recombinant proteins of S. baltica OS155 BglAI

(Sbal_0544) only, BglT (Sbal_0545) only, and BglAI-BglT were expressed under the control of arabinose pro-moter in E. coli ΔbglF mutant. The empty pBAD-TOPOvector was expressed in the same strain and used as anegative control. The complementation analysis wasperformed on M9 minimal media supplemented with0.15% of L-arabinose and 20 mM of cellobiose.In vitro enzyme assaysGlucokinase activity was assayed by coupling the forma-tion of ADP to the oxidation of NADH to NAD+ viapyruvate kinase and lactate dehydrogenase and moni-tored at 340 nm. Briefly, 0.1-0.2 μg of purified glucoki-nase was added to 200 μL of reaction mixturecontaining 50 mM Tris buffer (pH 7.5), 10 mM MgSO4,1.2 mM ATP, 1.2 mM phosphoenolpyruvate, 0.3 mMNADH, 1.2 U of pyruvate kinase, 1.2 U of lactate dehy-drogenase, and 5 mM D-glucose at 37°C. No activitywas detected in a control experiment, in which an unre-lated gene (SO3505) was expressed in the same vectorand purified in parallel. The substrate specificities wereexamined by using the same assay method and exchan-ging glucose for 5 mM of other potential substrates:2-deoxyglucose, D-mannose, D-galactose, D-allose,D-fructose, L-sorbose, N-acetyl-D-mannosamine,N-acetyl-D-glucosamine, N-acetyl-D-galactosamine,D-galactosamine, D-glucosamine, D-mannosamine. Inthe coupled assays, the change in NADH absorbance wasmonitored at 340 nm using a Beckman DTX-880 multi-mode microplate reader. An NADH extinction coefficientof 6.22 mM-1cm-1 was used for rate calculation.Phenotypic analysis of Shewanella sppTotal 14 Shewanella strains were tested for their abilityto grow on 20 different carbon sources as a sole carbonand energy source. D-/L-lactate mixture was used as acontrol. Growth conditions and other details of two dif-ferent experimental techniques used for growth pheno-type analysis are provided in additional files 7 and 8.In-frame deletion mutagenesisIn-frame deletion mutagenesis of glyT (SO1771) or nagP(SO3503) was performed using previously publishedmethod [42] with minor modifications. Upstream anddownstream fragments flanking the target locus werePCR amplified using S. oneidensis MR-1 genomic DNAand fused via overlap extension PCR. The fusion PCRamplicon was ligated into XcmI-digested pDS3.0. Theresulting recombinant plasmids were used to transformE. coli ß-2155 or WM3063 and subsequently transferredto S. oneidensis strain MR-1 by conjugation. The pri-mary integrants were selected by plating on LB medium

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 17 of 19

Page 18: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

containing 7.5 μg/ml gentamycin. The screening for asecond round of homologous recombination to removethe plasmid sequence was accomplished by counter-selection on LB medium with 5% sucrose. Coloniesgrowing in the presence of sucrose were screened forsensitivity to gentamycin (7.5 μg/ml) and then screenedfor deletion of the gene of interest using PCR. Theresulting PCR amplicon was then used as the templatefor DNA sequencing of the deleted and flanking regionsinvolved in the recombination events (ACGT, Inc.Wheeling, IL). In order to validate that the observedphenotype could be attributed to the targeted deletion,complementing plasmids were constructed by insertingappropriate genes downstream to the lac promoterencoded in pBBR1MCS-5.

Additional material

Additional file 1: Collection of protein families involved incarbohydrate utilization in bacteria.

Additional file 2: Summary on distribution of carbohydrateutlization genes in Shewanella spp.

Additional file 3: Complete distribution of sugar catabolome genesin Shewanella genomes.

Additional file 4: Description of rare sugar utilization pathways inShewanella.

Additional file 5: Predicted novel functional roles for isofunctionalprotein families from Shewanella sugar catabolome.

Additional file 6: Experimental verification of novel enzymes,transporters, and regulators involved in sugar utilization inShewanella. A. Phenotypic characterization of glcPMal (Shewana3_2310)for its involvement in glucose utilization in Shewanella sp. ANA-3; B.Phenotypic characterization of nagP (SO3503) for its involvement in N-acetylglucosamine (Nag) utilization in Shewanella oneidensis MR-1; C.Phenotypic characterization of grtP (SO1771) for its involvement in D-glycerate; D. Complementation of the E. coli cellobiose utilization by thebglA-bglT (Sbal_1133-1132) genes from S. baltica OS155; E. Substratespecificity of Shewanella baltica OS155 GlkII (Sbal_1134) kinase; F. Growthof E. coli DH5a strain (Scr-) containing heterologously expressed sucroseutilization genes scrTII-scrP (Sfri_3989-3990) from S. frigidimarina.

Additional file 7: Growth phenotypes of Shewanella on variouscarbon sources determined by manual assay.

Additional file 8: Growth phenotypes of Shewanella on variouscarbon sources determined by microplate assay using Bioscreen CMBR system.

Additional file 9: Comparison of reconstructed sugar utilizationpathways in Shewanella and Enterobacteria. A. N-acetylglucosamine(Nag) and chitin utilization pathways; B. D-glycerate (Grt) and glucarate/galactarate utilization pathways; C. b-glucoside (Bgl) utilization pathway;D. Sucrose (Scr) utilization pathway; E. L-arabinose (Ara) and arabinosidesutilization pathways.

Additional file 10: The gluconate (A) and ribose (B) utilization geneloci in some closely related Shewanella strains and other g-proteobacteria.

List of abbreviationsPTS: phosphotransferase system; FIGFAMS: isofunctional protein families inthe SEED database; CCN: central carbon metabolism; ED pathway: Entner-Doudoroff pathway; PP pathway: pentose phosphate pathway; TBDT: TonB-dependent outer membrane transporter; Aga: N-acetylgalactosamine; Ara: L-

arabinose; Bgl: b-glucoside (cellobiose); Gal: D-galactose; Glc: D-glucose; Gnt:D-gluconate; Grt: D-glycerate; Mal: maltodextrin; Man: mannosides; Mtl:mannitol; Nag: N-acetylglucosamine; Nan: sialic acids; Rbs: D-ribose; Scr:sucrose; Tre: trehalose; Xlt: xylitol.

Authors’ contributionsDAR, CY, and AO conceived and supervised the research, and wrote themanuscript. DAR performed comparative genomic analysis to infer novelmetabolic pathways and regulons. RO and OPZ performed computationalsimilarity searches, gene annotation and subsystem encoding in the SEEDdatabase. CY, XL, and IAR carried out genetic complementation experiments.MFR and SR produced targeted gene knockout strains in Shewanella. YW,AYO, and SR performed phenotypic characterization of Shewanella strains.JKF and KHN contributed to the development of the manuscript and designof the study. All authors read and approved the final manuscript.

AcknowledgementsWe are grateful to Pavel Novichkov (LBNL) for help with regulon analysisand presentation in RegPrecise database. This research was supported bythe U.S. Department of Energy (DOE) Office of Biological and EnvironmentalResearch under the Genomics:GTL Program via the Shewanella Federationconsortium and the Microbial Genome Program (MGP). Pacific NorthwestNational Laboratory is operated for the DOE by Battelle Memorial Instituteunder Contract DE-AC05-76RLO 1830. D.A.R. was partially supported byNational Science Foundation (DBI-0850546), Russian Foundation for BasicResearch (10-04-01768), Russian Academy of Sciences program ‘Molecularand Cellular Biology’ and Russian President’s grant for young scientists (MK-422.2009.4). C. Y. was supported by National Science Foundation of China(30970035) and One-hundred-Talented-People program of Chinese Academyof Sciences (KSCX2-YW-G-029).

Author details1Burnham Institute for Medical Research, La Jolla, California 92037, USA.2Institute for Information Transmission Problems, Russian Academy ofSciences, Moscow 127994, Russia. 3Key Laboratory of Synthetic Biology,Institute of Plant Physiology and Ecology, Shanghai Institutes for BiologicalSciences, Chinese Academy of Sciences, Shanghai 200032, China.4Department of Earth Sciences, University of Southern California, LosAngeles, California 90089, USA. 5Fellowship for Interpretation of Genomes,Burr Ridge, Illinois 60527, USA. 6Biological Sciences Division, PacificNorthwest National Laboratory, Richland, Washington 99352, USA. 7J. CraigVenter Institute, San Diego, California 92121, USA.

Received: 24 June 2010 Accepted: 13 September 2010Published: 13 September 2010

References1. Fredrickson JK, Romine MF, Beliaev AS, Auchtung JM, Driscoll ME,

Gardner TS, Nealson KH, Osterman AL, Pinchuk G, Reed JL, Rodionov DA,Rodrigues JL, Saffarini DA, Serres MH, Spormann AM, Zhulin IB, Tiedje JM:Towards environmental systems biology of Shewanella. Nat Rev Microbiol2008, 6:592-603.

2. Hau HH, Gralnick JA: Ecology and biotechnology of the genusShewanella. Annu Rev Microbiol 2007, 61:237-258.

3. Serres MH, Riley M: Genomic analysis of carbon source metabolism ofShewanella oneidensis MR-1: Predictions versus experiments. J Bacteriol2006, 188:4601-4609.

4. Pinchuk GE, Ammons C, Culley DE, Li SM, McLean JS, Romine MF,Nealson KH, Fredrickson JK, Beliaev AS: Utilization of DNA as a sole sourceof phosphorus, carbon, and energy by Shewanella spp.: ecological andphysiological implications for dissimilatory metal reduction. Appl EnvironMicrobiol 2008, 74:1198-1208.

5. Driscoll ME, Romine MF, Juhn FS, Serres MH, McCue LA, Beliaev AS,Fredrickson JK, Gardner TS: Identification of diverse carbon utilizationpathways in Shewanella oneidensis MR-1 via expression profiling. GenomeInform 2007, 18:287-298.

6. Osterman A, Overbeek R: Missing genes in metabolic pathways: acomparative genomics approach. Curr Opin Chem Biol 2003, 7:238-251.

7. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, deCrecy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S,Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N,

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 18 of 19

Page 19: RESEARCH ARTICLE Open Access Genomic encyclopedia ......RESEARCH ARTICLE Open Access Genomic encyclopedia of sugar utilization pathways in the Shewanella genus Dmitry A Rodionov1,2,

Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H,Olsen G, Olson R, Osterman A, Portnoy V, Pusch GD, Rodionov DA,Ruckert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O,Vonstein V: The subsystems approach to genome annotation and its usein the project to annotate 1000 genomes. Nucleic Acids Res 2005,33:5691-5702.

8. Pinchuk GE, Rodionov DA, Yang C, Li X, Osterman AL, Dervyn E,Geydebrekht OV, Reed SB, Romine MF, Collart FR, Scott JH, Fredrickson JK,Beliaev AS: Genomic reconstruction of Shewanella oneidensis MR-1metabolism reveals a previously uncharacterized machinery for lactateutilization. Proc Natl Acad Sci USA 2009, 106:2874-2879.

9. Yang C, Rodionov DA, Li X, Laikova ON, Gelfand MS, Zagnitko OP,Romine MF, Obraztsova AY, Nealson KH, Osterman AL: Comparativegenomics and experimental characterization of N-acetylglucosamineutilization pathway of Shewanella oneidensis. J Biol Chem 2006,281:29872-29885.

10. Overbeek R, Fonstein M, D’Souza M, Pusch GD, Maltsev N: The use of geneclusters to infer functional coupling. Proc Natl Acad Sci USA 1999,96:2896-2901.

11. Meyer F, Overbeek R, Rodriguez A: FIGfams: yet another set of proteinfamilies. Nucleic Acids Res 2009, 37:6643-6654.

12. Scott JH, Nealson KH: A biochemical study of the intermediary carbonmetabolism of Shewanella putrefaciens. J Bacteriol 1994, 176:3408-3411.

13. Bowman JP, McCammon SA, Nichols DS, Skerratt JH, Rea SM, Nichols PD,McMeekin TA: Shewanella gelidimarina sp. nov. and Shewanellafrigidimarina sp. nov., novel Antarctic species with the ability to produceeicosapentaenoic acid (20:5 omega 3) and grow anaerobically bydissimilatory Fe(III) reduction. Int J Syst Bacteriol 1997, 47:1040-1047.

14. Fonnesbech Vogel B, Venkateswaran K, Satomi M, Gram L: Identification ofShewanella baltica as the most important H2S-producing species duringiced storage of Danish marine fish. Appl Environ Microbiol 2005,71:6689-6697.

15. Leonardo MR, Moser DP, Barbieri E, Brantner CA, MacGregor BJ, Paster BJ,Stackebrandt E, Nealson KH: Shewanella pealeana sp. nov., a member ofthe microbial community associated with the accessory nidamentalgland of the squid Loligo pealei. Int J Syst Bacteriol 1999, 49:1341-1351.

16. Essenberg RC, Candler C, Nida SK: Brucella abortus strain 2308 putativeglucose and galactose transporter gene: cloning and characterization.Microbiology 1997, 143:1549-1555.

17. Titgemeyer F, Reizer J, Reizer A, Saier MH Jr: Evolutionary relationshipsbetween sugar kinases and transcriptional repressors in bacteria.Microbiology 1994, 140:2349-2354.

18. Romine MF, Carlson TS, Norbeck AD, McCue LA, Lipton MS: Identificationof mobile elements and pseudogenes in the Shewanella oneidensis MR-1genome. Appl Environ Microbiol 2008, 74:3257-3265.

19. Roehl RA, Vinopal RT: Lack of glucose phosphotransferase function inphosphofructokinase mutants of Escherichia coli. J Bacteriol 1976,126:852-860.

20. Monterrubio R, Baldoma L, Obradors N, Aguilar J, Badia J: A commonregulator for the operons encoding the enzymes involved in D-galactarate, D-glucarate, and D-glycerate utilization in Escherichia coli. JBacteriol 2000, 182:2672-2674.

21. Reizer J, Reizer A, Saier MH Jr: The cellobiose permease of Escherichia coliconsists of three proteins and is homologous to the lactose permeaseof Staphylococcus aureus. Res Microbiol 1990, 141:1061-1067.

22. Laikova ON, Mironov AA, Gelfand MS: Computational analysis of thetranscriptional regulation of pentose utilization systems in the gammasubdivision of Proteobacteria. FEMS Microbiol Lett 2001, 205:315-322.

23. Schauer K, Rodionov DA, de Reuse H: New substrates for TonB-dependenttransport: do we only see the ‘tip of the iceberg’? Trends Biochem Sci2008, 33:330-338.

24. Reid SJ, Abratt VR: Sucrose utilisation in bacteria: genetic organisationand regulation. Appl Microbiol Biotechnol 2005, 67:312-321.

25. Trindade MI, Abratt VR, Reid SJ: Induction of sucrose utilization genesfrom Bifidobacterium lactis by sucrose and raffinose. Appl EnvironMicrobiol 2003, 69:24-32.

26. Andersen C, Rak B, Benz R: The gene bglH present in the bgl operon ofEscherichia coli, responsible for uptake and fermentation of beta-glucosides encodes for a carbohydrate-specific outer membrane porin.Mol Microbiol 1999, 31:499-510.

27. Boos W, Shuman H: Maltose/maltodextrin system of Escherichia coli:transport, metabolism, and regulation. Microbiol Mol Biol Rev 1998,62:204-229.

28. Watanabe S, Shimada N, Tajima K, Kodaki T, Makino K: Identification andcharacterization of L-arabonate dehydratase, L-2-keto-3-deoxyarabonatedehydratase, and L-arabinolactonase involved in an alternative pathwayof L-arabinose metabolism. Novel evolutionary insight into sugarmetabolism. J Biol Chem 2006, 281:33521-33536.

29. Omelchenko MV, Galperin MY, Wolf YI, Koonin EV: Non-homologousisofunctional enzymes: a systematic analysis of alternative solutions inenzyme evolution. Biol Direct 2010, 5:31.

30. Tchieu JH, Norris V, Edwards JS, Saier MH Jr: The completephosphotranferase system in Escherichia coli. J Mol Microbiol Biotechnol2001, 3:329-346.

31. Rodionov DA: Comparative genomic reconstruction of transcriptionalregulatory networks in bacteria. Chem Rev 2007, 107:3467-3497.

32. Brettar I, Christen R, Hofle MG: Shewanella denitrificans sp. nov., avigorously denitrifying bacterium isolated from the oxic-anoxic interfaceof the Gotland Deep in the central Baltic Sea. Int J Syst Evol Microbiol2002, 52:2211-2217.

33. Zhao JS, Manno D, Beaulieu C, Paquet L, Hawari J: Shewanella sediminissp. nov., a novel Na+-requiring and hexahydro-1,3,5-trinitro-1,3,5-triazine-degrading bacterium from marine sediment. Int J Syst EvolMicrobiol 2005, 55:1511-1520.

34. Zhao JS, Manno D, Leggiadro C, O’Neil D, Hawari J: Shewanella halifaxensissp. nov., a novel obligately respiratory and denitrifying psychrophile. IntJ Syst Evol Microbiol 2006, 56:205-212.

35. Ziemke F, Hofle MG, Lalucat J, Rossello-Mora R: Reclassification ofShewanella putrefaciens Owen’s genomic group II as Shewanella balticasp. nov. Int J Syst Bacteriol 1998, 48(Pt 1):179-186.

36. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K,Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL,Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD,Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O: The RASTServer: rapid annotations using subsystems technology. BMC Genomics2008, 9:75.

37. Mironov AA, Vinokurova NP, Gel’fand MS: [Software for analyzing bacterialgenomes]. Mol Biol (Mosk) 2000, 34:253-262.

38. Gelfand MS, Koonin EV, Mironov AA: Prediction of transcription regulatorysites in Archaea by a comparative genomic approach. Nucleic Acids Res2000, 28:695-705.

39. Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logogenerator. Genome Res 2004, 14:1188-1190.

40. Novichkov PS, Laikova ON, Novichkova ES, Gelfand MS, Arkin AP, Dubchak I,Rodionov DA: RegPrecise: a database of curated genomic inferences oftranscriptional regulatory interactions in prokaryotes. Nucleic Acids Res2010, 38:D111-118.

41. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA,Tomita M, Wanner BL, Mori H: Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol2006, 2:2006.0008.

42. Wan XF, Verberkmoes NC, McCue LA, Stanek D, Connelly H, Hauser LJ,Wu L, Liu X, Yan T, Leaphart A, Hettich RL, Zhou J, Thompson DK:Transcriptomic and proteomic characterization of the Fur modulon inthe metal-reducing bacterium Shewanella oneidensis. Journal ofBacteriology 2004, 186:8385-8400.

doi:10.1186/1471-2164-11-494Cite this article as: Rodionov et al.: Genomic encyclopedia of sugarutilization pathways in the Shewanella genus. BMC Genomics 201011:494.

Rodionov et al. BMC Genomics 2010, 11:494http://www.biomedcentral.com/1471-2164/11/494

Page 19 of 19