Beyond metagenomics: Integration of complementary approaches for the study of microbial communities

Andrés Cubillos-Ruiz (1,2), Howard Junca (1,2), Sandra Baena (2,3), Ivonne Venegas (2,4) and María Mercedes Zambrano (1,2)

(1) Corpogen Research Center, Carrera 5 No. 66A – 34, Bogotá, Colombia. (2) Colombian Center for Genomics and Bioinformatics of Extreme Environments - Gebix, Carrera 5 No. 66A – 34, Bogotá, Colombia. (3) Department of Biology, Pontificia Universidad Javeriana, POB 56710, Bogotá, Colombia. (4) Department of Microbiology, Pontificia Universidad Javeriana, POB 56710, Bogotá, Colombia.

Abstract

Advances in genomics have had a great impact on the field of microbial ecology. Metagenomics in particular holds great promise for accessing and characterizing microbial communities. However, the high diversity and complexity of microbial communities remain an obstacle to understanding these assemblages with current approaches. The integration of microbial community structure with function, taking into account uncultured microbes in diverse environments, remains particularly challenging. The anticipated increase in available metagenomic data will require high-throughput methods for data management and analysis of these large and complex microbial communities. Integration of complementary technologies such as microarrays, high-throughput sequencing and bioinformatics, together with novel tools and “meta” approaches such as metaproteomics, metatranscriptomics and meta-metabolomics, will be required to understand the role of microbes in different ecological habitats. In spite of the many challenges, the field offers promising perspectives for achieving a more comprehensive view of microbial communities and how microorganisms adapt to and function within their ecosystems.

Introduction

The field of genomics has led to a conceptual shift in the way we

approach biological systems by enabling researchers to go beyond

studies of isolated components and address global functions and

complex ecosystem interactions (Bertin et al., 2008). Recent

technological advances have also paved the way for novel

experimental approaches to the study of microbial communities

that seemed largely implausible less than a decade ago. The rapidly

growing area of metagenomics has applied the tools of genomics to

analyze complex microbial assemblages and has become a powerful

strategy for exploring and characterizing microbial communities in

diverse settings. The appeal behind the metagenomics approach lies

largely in its ability to bypass cultivation and offer a unique

opportunity to directly sample and gain new insights regarding

natural microbial assemblages. Metagenomic explorations therefore

enable examination of complex communities and microorganisms

of difficult access, providing a more comprehensive view of the

populations present that can go from more extensive phylogenetic

descriptions to valuable information regarding metabolic potential

(Xu, 2006).

One of the major challenges in the field of microbial

ecology is to understand how microorganisms in a community

interact with each other and how the community structure is

related to ecosystem function. Research in microbial diversity and

technological advances over the last decades have led to a new

appreciation of the diversity of microbiological life in our planet

and provided tools for accessing a broad spectrum of microbial

communities. The use of culture-independent methods has been

crucial to our understanding and estimates of microbial diversity,

which now greatly surpass original calculations that were limited by

culture-dependent methods. Modern molecular tools have therefore

been fundamental to our growing recognition of the extent of

microbial diversity and the capacity of microorganisms to influence

global ecosystem functioning (Schmidt, 2006). However, much

remains to be learned regarding microorganisms and their roles in

particular environments. Studies aimed at understanding complex

communities require novel and more holistic approaches as well as

integration of methodologies in order to understand the ecology of

populations and factors that control their activities. In this respect

metagenomics, coupled to complementing high-throughput

strategies for studying expression profiles and microbial metabolic

potential, offers a unique opportunity for examining uncultured

microbes and assessing their role in an ecosystem (Turnbaugh and

Gordon, 2008).

Metagenomics holds an undisputed advantage in terms of

accessing and examining complex and difficult-to-study natural

microbial communities. However, the metagenomic approach that

studies the entire DNA content of a community is still limited in

its scope and capacity to derive ecologically meaningful information

regarding the complex interactions that drive and shape

communities. Difficulties inherent to this strategy, from problems

associated with extraction of genomic material to loss of relevant

information regarding the microorganisms and the ecosystem,

necessarily limit the information that can be obtained from a

particular study. Problems related to limited recovery of DNA have

been addressed recently by amplification of the isolated material

using multiple displacement amplification (MDA), a strategy that can also

be applied to single cells (Lasken, 2007). This relies on the isothermal

strand-displacement activity of the proofreading phi29 DNA

polymerase, an enzyme discovered almost

30 years ago that has now been recognized as a powerful means for

obtaining up to micrograms of DNA from minute amounts of

starting material (Binga et al., 2008). This enzyme has been used

for amplification of metagenomic DNA and tested on soil DNA

templates probed against microarrays (Gonzalez et al., 2005; Wu et

al., 2006). Since metagenomics involves direct isolation of DNA

from the environment, information regarding particular phenotypic

traits is lost together with the capacity to carry out additional

analyses regarding the physiology of specific microbes. Depending

on the questions being addressed, simplification of the microbial

community might be a viable alternative in order to facilitate

interpretation of the data and the reconstruction of genomic

information. This could be achieved either through enrichment of

certain populations or by following diverse cultivation strategies

aimed at recovering microorganisms that can be further analyzed in

the lab. The study of isolates or the reconstruction of genomes from

simplified communities could provide relevant information in

terms of understanding the role of microbes within their particular

niche (Steward and Rappe, 2007; Tyson et al., 2004). More

sophisticated approaches, such as cell sorting and microfluidics, have

also been tried (Cardenas and Tiedje, 2008; Warnecke and

Hugenholtz, 2007). Another major drawback of metagenomics is

that gene discovery is carried out at the expense of genomic context

and in the absence of information regarding the organisms

themselves. Deriving useful genomic data thus relies on the capacity

of bioinformatics to reassemble and make sense of the massive

amount of sequence information generated. The taxonomic

classification of metagenomic sequences, which could greatly help

in assessing community composition and dynamics and assignment

of roles to encoded proteins, depends on available information

stored in the databases. Thus our capacity to derive information

from metagenomic samples is also constrained by our current

knowledge regarding gene sequences and proteins, most of which

comes from sequenced genomes (Pignatelli et al., 2008). One of the

most substantial technical improvements is perhaps the recent

introduction of massively parallel sequencing technologies that

generate large amounts of sequence information at reduced costs.

The use of high-throughput approaches will, no doubt, lead to an

increase in the generation of metagenomic data that will in turn

require additional and more sophisticated bioinformatics tools to

manage this information and carry out processes such as assembly,

gene prediction, annotation, and metabolic reconstruction (Steward

and Rappe, 2007).

Metagenomics is therefore at the point where scientific

questions focused on understanding the interaction among

microorganisms and their roles in the environment can start to be

addressed. This will require coupling genotypic and phenotypic

analyses through the implementation of novel, powerful and

innovative tools and the concerted integration of other “omic”

approaches such as proteomics and transcriptomics (see Figure 1).

The formidable plasticity displayed by microorganisms is related to

their metabolic versatility, the interaction of complex regulatory

networks and their capacity to trigger differential responses that

become evident in the expressed metabolic potential. Focusing on

the global analysis of all genes and expression profiles can therefore

reveal information beyond what can be gathered from studies of

individual genes, contributing substantially to our understanding of

the physiology and the strategies involved in microbial adaptation

to changing environmental conditions (Schweder et al., 2008). The

major challenge in the future will be to integrate experimental

approaches and formulate questions aimed at deriving relevant

ecological information, questions that can only be addressed in the

context of intact communities where population requirements and

interactions are at work (Turnbaugh and Gordon, 2008).

Figure 1. “Omics” approach to the study of microbial ecology. Microbial communities are influenced and shaped by both biotic and abiotic factors. The “omic” strategies target different levels of the information flux, starting with the metagenome and increasing in complexity. The integration of these approaches can provide a more comprehensive view of community structure and function in a defined spatial and temporal setting.

Metatranscriptomics

Definition and origins

Metatranscriptomics is the high-throughput detection and analysis,

in sequence diversity and associated functions, of the transcripts

(RNA molecules) extracted from samples where more than one

microbial genome type is present. It is essentially a transcriptomic

study in samples containing multiple cell types, species or

operational taxonomic units (OTUs). The word

“metatranscriptomic” is derived by analogy with “metagenomic”.

In the strict sense of the definition, metatranscriptomics could

include all the work involving direct extraction and detection of

RNA sequences from environmental samples, i.e. those involving

reverse transcription, target amplification, sequencing and analyses

of 16S rRNA gene transcripts (Felske et al., 1996a; Nogales et al.,

2001b; Small et al., 2001a; Weinbauer et al., 2002). However, if

one considers metagenomics mostly as a sequence-based approach

(excluding function-based screenings), metatranscriptomics could

be restricted to analyses that have a broader scope and encompass

total mRNA and/or rRNA transcripts in a sample. This approach is

made possible by massive sequencing efforts and ideally does not

involve cloning procedures or targeted PCR amplifications.

However, the widespread use of 16S rRNA gene amplifications to

characterize microbial communities could be considered as a special

case since this gene is still extremely useful for exploring diversity

and complexity in microbial communities (Tringe and Hugenholtz,

2008). Metatranscriptomics complements the metagenomic

approach by focusing on the expressed subset of genes

(metatranscriptome), thus reducing the complexity of the data to be

analyzed. This allows, for example, detection of sequences

associated with a particular environmental condition that may not

be so readily identified in metagenomic studies and increases the

chance of detecting ecologically relevant active functions. The

discovery of functions being induced in a sample as a response to a

certain environmental condition (exerted pressure) also gives insight

into processes of adaptation and enriches our understanding of

communities previously captured through metagenomic sequence

surveys. Thus, this approach gives a composite view of the

transcriptionally active subset of the genomes present in a

community under the environmental condition sampled. As we will

describe below, metatranscriptomics is now possible thanks to the

recent integration of various developments in different technical

and theoretical fields such as nucleic acids sequencing technologies,

hybridization-based (array) transcriptomics, new molecular biology

applications of well-characterized enzymes, microbial ecology

techniques to improve quantities, stability and detection of RNA

molecules, and the emergence of bacterial phylogenomics and

related bioinformatics tools customized for metagenomic datasets,

among others.

Limitations in analyzing the metatranscriptome

The exploitation of transcriptomics to assess the active subset of

genes in a given environmental microbial community metagenome

is very recent, with reports appearing only in the last five years. A

search carried out in February 2009 for key terms in PubMed, such

as metatranscriptomics and related words, retrieved only 10

citations starting in 2006. While this raw search can miss some

relevant publications on metatranscriptomic studies, it does suggest

that this is a new and emerging field. Reasons for the apparent

delay in reports of research in this field, with respect to research in

the general area of metagenomics, are essentially related to technical

difficulties and previously identified limitations inherent to

performing studies using environmental RNA.

The inherent instability of RNA molecules has been one

of the most limiting factors for the development of

metatranscriptomics. Transcriptional studies had already revealed

the complexity of working with RNA, an unstable molecule of

rapid turnover and short cellular half-life (seconds to minutes)

when compared to the informative and more stable molecules of

DNA. The lability of RNA molecules also contrasts with the

proteome, in which protein half-lives vary with

each protein’s biochemical nature and

localization. The transient nature of a given RNA population will

therefore influence the expression profiles observed, providing at

best a snapshot of what are probably highly dynamic patterns of

expression (Velculescu et al., 1995). Another factor limiting the

capacity for deep sequence-based transcriptomic analyses of

metagenomes is the low quantities of transcripts inherently present

and/or recovered from environmental samples. This is due to the

substantially lower biomass content found in these samples when

compared with a pure bacterial culture (Amann et al., 1995). In

addition, components that contaminate samples and are

co-extracted with the nucleic acids (Griffiths et al., 2000), such as

humic acids in soils, can interfere with additional steps in sample

processing like quantification, enzymatic amplification,

modification or hybridization (Alm et al., 2000; Roh et al., 2006).

These problems, despite being shared with metagenomics, are

particularly critical for the demanding methodological steps

involved in metatranscriptomic studies. However, improvements in

sample recovery and purification over the last years have opened the

way for global analyses that involve detection and identification of

transcripts from environmental samples.
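The effect of the short RNA half-lives mentioned above can be illustrated with a small calculation. The sketch below assumes simple first-order (exponential) decay with no new transcription after sampling, which is a simplification and not a model taken from the studies cited here; the half-lives and the processing delay are hypothetical values chosen only for illustration.

```python
def fraction_remaining(elapsed_s, half_life_s):
    """Fraction of a transcript pool left after elapsed_s seconds,
    assuming first-order decay and no new synthesis (an assumption)."""
    if half_life_s <= 0:
        raise ValueError("half_life_s must be positive")
    return 0.5 ** (elapsed_s / half_life_s)


# Hypothetical transcripts with half-lives of 30 s, 2 min and 10 min,
# and a hypothetical 5 min delay between sampling and RNA stabilization.
half_lives_s = {"transcript_A": 30, "transcript_B": 120, "transcript_C": 600}
delay_s = 300

remaining = {
    name: fraction_remaining(delay_s, t_half)
    for name, t_half in half_lives_s.items()
}

for name, fraction in remaining.items():
    print(f"{name}: {fraction:.4f} of the original pool remains")
```

Under these assumptions the shortest-lived transcript is almost entirely lost while the longest-lived one is largely retained, so the relative abundances recovered no longer reflect those present at the time of sampling.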

From 16S rRNA transcript sequencing to total metatranscriptome pyrosequencing

In many cases, the first approach to characterizing an

environmental microbial community still relies on a description of

the taxonomical composition of the sample, usually based on 16S

rRNA gene amplification and sequencing. In the late 90s, some

reports described the so-called “active fraction” of the microbial

community by extracting RNA, generating cDNA and then

determining the sequence complexity in ribosomal genes (Felske et

al., 1996b; Nogales et al., 2001a). The community composition

differed depending on whether DNA or RNA was used for 16S

rRNA gene amplification, with some phylogenetic groups found

only in one of the two clone libraries from the same sample. In

addition, predominant 16S rRNA types were more evident when

RNA was used as template, a reflection of a dominant

transcriptionally active species that did not necessarily correlate

with the most abundant type detected using DNA (Nogales et al.,

2001b). These studies revealed the discrepancy between observed

predominant species or genome types and the transient expression

profile of particular microbes within a community. This transient

expression is reflected by the amount of rRNA transcripts recovered

and is influenced by the conditions at the time of sampling. These

initial studies struggled with the technical difficulty of extracting

RNA from environmental samples and paved the way for

improvements required for the analyses of transcripts from

environmental samples (Hurt et al., 2001). Superior protocols and

commercial kits thus became available, improving the

reproducibility, quality and quantity of nucleic acids being

extracted from various environmental sources. Despite these

advances, there are still problems inherent to these procedures that

require experimental fine-tuning in order to optimize procedures

for diverse environmental samples.
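The comparison between DNA-derived and RNA-derived 16S rRNA clone libraries described above can be sketched as follows. This is only an illustration: the taxon names and clone counts are invented, and the function names are hypothetical rather than part of any published pipeline.

```python
from collections import Counter


def relative_abundance(clone_assignments):
    """Convert a list of per-clone taxon assignments into proportions."""
    counts = Counter(clone_assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def compare_libraries(dna_clones, rna_clones):
    """Summarize differences between DNA- and RNA-based clone libraries."""
    dna = relative_abundance(dna_clones)
    rna = relative_abundance(rna_clones)
    return {
        "dna_only": sorted(set(dna) - set(rna)),    # present but not detected as active
        "rna_only": sorted(set(rna) - set(dna)),    # detected only among transcripts
        "dominant_dna": max(dna, key=dna.get),      # most abundant genome type
        "dominant_rna": max(rna, key=rna.get),      # most abundant rRNA transcript type
    }


# Hypothetical clone libraries built from the same sample.
dna_clones = ["Acidobacteria"] * 40 + ["Proteobacteria"] * 35 + ["Firmicutes"] * 25
rna_clones = ["Proteobacteria"] * 70 + ["Acidobacteria"] * 20 + ["Actinobacteria"] * 10

result = compare_libraries(dna_clones, rna_clones)
print(result)
```

In this invented example the most abundant group in the DNA library is not the dominant group in the RNA library, and each library contains a group absent from the other, which mirrors the kind of discrepancy reported in the studies cited above.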

The recently developed high-throughput sequencing

technologies have obvious advantages in terms of exploring the

metatranscriptome. Pyrosequencing, which is based on the

detection of the released pyrophosphate, represents a turning point

because it dispenses with cloning and provides a fast and

economical alternative for obtaining large-scale sequence

information. The basic steps involved in the pyrosequencing-based

metatranscriptomic approach are: isolation of environmental RNA

(eRNA); generation of complementary DNA (ecDNA) by random-primed

reverse transcription; and conversion of the ecDNA into

double-stranded fragments (ds ecDNA). These ds ecDNAs are

then ligated to adaptors, emulsified,

and subjected to the 454-sequencing process (Leininger et al.,

2006). These DNAs carry information on the expressed

ribosomal genes (rRNA; taxonomic and community-structure

information) and protein-coding genes (mRNA; metabolic

functions) within a microbial community and thus provide relevant

input for more detailed downstream analyses (protein-based

analyses or microarray design) at an unprecedented depth of

coverage. This approach, which avoids the well-known biases

associated with culturing, primer-probe specificity and sensitivity,

PCR amplification, cloning and screening, was used by Urich et al.

to rapidly and simultaneously characterize both the structure and in

situ function of a soil microbial community (Urich et al., 2008).

The simultaneous analysis of both actively transcribed rRNA and

mRNA sequences obtained by pyrosequencing was thus useful for

taxonomic profiling of the community and assessing actively

transcribed genes and functional information.

In some cases it is desirable to focus on protein-coding genes and

exclude the ribosomal content from the analysis. This focuses the

work on predictions regarding functionality or networking of the

possible metabolic pathway present. It also increases coverage and

can reveal more diversity associated with a specific function. In

microbial transcriptomics and metatranscriptomics, the exclusion of

rRNA molecules is presently done by two methods. One method

involves capturing and removing the ribosomal content by using

probes to target highly conserved regions on the ribosomal

subunits, followed by a selective hybridization and removal of the

rRNA. The other takes advantage of a difference between the 5´ ends

of mRNA and rRNA, which allows a processive 5´-3´ exonuclease to

digest rRNA bearing a 5´ monophosphate. This strategy was used

to analyze the mRNA sequence content by pyrosequencing in

marine surface waters (Frias-Lopez et al., 2008; Gilbert et al.,

2008). Metatranscriptomics studies that use mRNA decrease the

complexity in a meaningful and useful way, offering the advantage

of recovering sequences for putative proteins that otherwise can be

overlooked or underrepresented in metagenomic surveys.

Future perspectives in metatranscriptomics

Nowadays, metatranscriptomic studies consist of deep sequence

surveys of the expressed genes from overwhelmingly complex

metagenomes (Raes and Bork, 2008; Urich et al., 2008). Although

a powerful approach to understanding functionality, this strategy is

still a relatively isolated and transient picture of what can be an

amazingly diverse and largely unknown community. However,

metatranscriptomics offers several advantages over the large-scale

sequence-based metagenomic approach that seeks broad sequence

coverage. By centering the analysis on the functions detected, this

approach reduces the sequence complexity and provides a more

meaningful alternative to the study of heterogeneous communities.

One of the advantages of working with libraries generated from

expressed transcriptional units is the increased chance of finding

protein coding, functional sequences and assigning possible roles to

these proteins within a metabolic context (Dunlap et al., 2006).

Thus metatranscriptomics can facilitate understanding the

variations within an ecosystem and the possible correlations

between environmental variables and function (Gianoulis et al.,

2009). It can also be used to target specific functions of

environmental importance (Gilbert et al., 2009; Shrestha et al.,

2008) and has the potential of identifying genes that could go

undetected in larger metagenomic sequencing datasets. The

construction and analysis of cDNA libraries from diverse

environments has revealed several unique sequences and the

potential to uncover a high degree of novelty within microbial

communities (McGrath et al., 2008). From a more pragmatic point

of view, metatranscriptomics can be useful for describing the

network of activities taking place in an ecosystem in order to

obtain, for example, a specific metabolite.

Several improvements and developments are still required

in order to more fully exploit this approach. One important aspect

for future studies in metatranscriptomics is to define the rates of

environmental RNA turnover (Kuechenmeister et al., 2009). This

will allow us to fine-tune and correct metatranscriptomic

observations, and to assess possible correlations with microbial

diversity, composition and functions, as well as with the

environmental conditions present. An efficient coupling of

metatranscriptomics with other techniques used in environmental

microbiology will also become more prevalent. These will include

other “omic” approaches, high-throughput sequencing and

microarrays, where metatranscriptomics can provide a more

efficient way of feeding microarray probe design to match an

ecosystem’s particular genomic and transcriptional content (Parro

et al., 2007; Small et al., 2001a, b; Urich et al., 2008).

Metatranscriptomics will also be used in conjunction with

complementing strategies, such as stable isotope probing on nucleic

acids, a technique that detects the incorporation of a supplied

isotope into the DNA or RNA of the bacterial species metabolizing

the substrate (Lueders et al., 2004). What will probably be very

important, however, will be to increase the number of studies that

follow the same community across temporal variations in order to

have a more accurate notion of the expression dynamics involved.

The development of additional data mining tools to better interpret

and integrate metatranscriptomics with data derived from

complementing strategies should allow us to relate environmental

factors with community performance and improve our capacity to

detect and predict adaptation and evolution of microbial

communities affected by natural or artificial pressures.

Metaproteomics

Metaproteomics has emerged over the last years as a powerful

strategy that can contribute significantly towards our understanding

of ecosystem functioning in microbial ecology (Wilmes et al.,

2008) (Figure 2). It is evident that this ecological information

cannot be obtained from the study of the genes alone and that

genomics is limited in terms of elucidating critical aspects of

microbial interactions (Graves and Haystead, 2002). In fact, an

important difference with respect to genomic studies is that

proteomics can reflect the dynamics of a system and capture

changes driven by shifts in environmental conditions (Hagenstein

and Sewald, 2006). The fact that proteins, not genes, are directly

responsible for the phenotypes of cells makes proteomics an

excellent tool for approaching functionality and revealing changes

in protein synthesis and folding that result from rapid physiological

responses (Lacerda et al., 2007). These protein expression profiles

reflect specific microbial activities in a given ecosystem and can be

more informative than either identification of functional genes

present or even of their corresponding messenger RNAs (Benndorf

et al., 2007; Wilmes and Bond, 2006). Proteomics is also useful

because it can identify functional genes of importance within a

community and can verify metabolic processes inferred from

metagenomic data. In addition, the generation of de novo peptide

sequences confers specificity in the identification of proteins and

of their phylogenetic origin (Wilmes and Bond, 2006). While

the rapid progress in technologies for both protein separation and

identification, such as chromatography and mass spectrometry, has

triggered exciting developments in the field, metaproteomics will

surely gain more momentum with the advent and incorporation of

additional tools and strategies for exploring microbial communities.

Figure 2. Schematic overview of the metaproteomic approach in microbial ecosystems.

The metaproteomic approach

The term proteomics, which was first used in 1995, can be defined

as the large-scale study of the proteome, or the complete protein

complement, expressed by a genome under different conditions

(Graves and Haystead, 2002). This term is used to represent the

array of proteins that are expressed in a biological compartment

(cell, tissue, organ or organism) at a particular time under a

particular set of conditions (Beranova-Giorgianni, 2003). Because

proteins are key structural and functional molecules, molecular

characterization of proteomes is important for a complete

understanding of biological systems. Therefore proteomic studies,

which involve different disciplines such as molecular biology,

biochemistry and bioinformatics, can provide a more integrated

view of a biological system by detecting modifications of its entire

protein fraction. Although proteomics has been used extensively to

study microorganisms in pure culture, information derived from

these protein profiles may not necessarily reflect processes occurring

in complex microbial communities found in natural settings

(Wilmes and Bond, 2006). Moreover, the focus of research on

microbial ecology goes beyond the individual species to study

whole assemblages and ecosystems. In this respect, the

metaproteomic approach goes further than single microorganisms

to encompass the spectrum of proteins present in a microbial

community, giving a glimpse of its functional potential.

Information generated using this strategy therefore complements

environmental genomic databases and contributes to our

understanding of natural ecosystems.

Experimental approach in metaproteomics

A metaproteomic analysis includes several technically challenging

steps, beginning with the extraction of microbial proteins from the

surrounding matrix and ending with their identification (Maron et

al., 2007b). The protein fraction in any ecosystem involves secreted

and cellular proteins, some of which can be attached to the cell wall

or embedded in membranes (integral proteins). The choice of the

protein extraction technique is crucial due to the complexity of

native microbial communities, the heterogeneity of natural

environments, and the presence of interfering compounds that can

affect the efficiency of extraction (Ogunseitan, 2006). Since the

extraction technique can influence recovery, it is often useful to

define this step on the basis of the protein fraction being targeted

and on the subsequent method of protein analysis (Hecker, 2003).

There are many protocols for this purpose, including differential

centrifugation, resolving soluble proteins in separate gels, and

employing reagents with stronger solubilization power for pellets

enriched with membrane proteins (Molloy et al., 2000).

The most commonly used technique in proteomics to

separate and resolve complex protein mixtures is polyacrylamide gel

electrophoresis (PAGE) either in one (1-DE) or two dimensions

(2-DE). 2-DE first uses isoelectric focusing (IEF) in immobilized pH

gradients followed by separation based on molecular weight using

sodium dodecyl sulfate polyacrylamide gel electrophoresis

(SDS-PAGE) in the second dimension. Despite being widely used for

separation of proteins, 2-DE is time-consuming and labor intensive

and is limited in its capacity to resolve all the proteins in complex

samples or environments (Graves and Haystead, 2002). In

addition, PAGE separation can lead to an underrepresentation of

very large, or very small, proteins as well as of integral membrane

proteins, and may fail to detect low abundance proteins. To bypass

the limitations of protein electrophoresis, alternative ways of

separating proteins have been developed, one of which involves

high performance liquid chromatography (HPLC) (Graves and

Haystead, 2002).

Once proteins have been separated, spots resolved in 2D

gels are digested with a protease, usually trypsin, and subjected to

analysis using mass spectrometry (MS) for protein identification

(Domon and Aebersold, 2006). The peptides must be ionized for

MS and this is achieved usually by either matrix-assisted laser

desorption/ionization (MALDI) or electrospray ionization (ESI)

techniques. New ionization methods include desorption

electrospray ionization (DESI) and the recently developed

surface-assisted laser desorption/ionization (SALDI) method that uses a

non-volatile inorganic matrix of germanium on a silicon surface

(Seino et al., 2007). Ionization is followed by mass analysis in a

mass spectrometer using different analyzers such as the commonly

used quadrupole mass analyzers, time-of-flight (TOF) instruments,

ion trap mass analyzers that trap molecular ions in a 3-D electric

field, and tandem mass spectrometry (MS/MS), which can be used

to acquire sequence information. There are several different mass

analyzers and the choice of equipment will be defined by several

criteria. Triple-quadrupole mass spectrometers, for example, are

most commonly used to obtain amino acid sequences while

quadrupole-TOF (qTOF) is used for amino acid sequencing and

determination of modifications. MALDI-TOF is usually used for

peptide mass fingerprinting, MALDI-QqTOF allows both peptide

mass fingerprinting and amino acid sequencing, and FT-ICR

(Fourier transform ion cyclotron resonance) is useful because it can

achieve higher resolution and accuracy (Graham et al., 2007;

Graves and Haystead, 2002). Detection has been improved thanks

to developments such as MS/MS and TOF/TOF instrumentation

with optimized laser quality or direct analysis in real time (DART)

(Lasaosa, 2008).

The information generated by MS regarding peptide mass

or sequence is then compared against published nucleotide or

protein databases in order to predict and identify proteins (Wilmes

and Bond, 2006). This identification therefore depends on the

information available and relies heavily on bioinformatics tools for

comparison and identification of homologues in databases.
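
The database comparison described above can be illustrated with a minimal sketch of peptide mass fingerprinting. It assumes complete tryptic cleavage (after K or R, except before P), unmodified peptides and a fixed mass tolerance; the function names and the toy database are hypothetical and do not correspond to any specific search engine.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residues of a peptide


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_fingerprint(observed_masses, database, tol_da=0.02):
    """Rank database proteins by the number of observed masses they explain."""
    scores = []
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        hits = sum(
            any(abs(obs - theo) <= tol_da for theo in theoretical)
            for obs in observed_masses
        )
        scores.append((name, hits))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A protein whose sequence is absent from the database cannot score, which is why identification depends so strongly on database coverage.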

Metaproteomics and microbial ecology

The growing number of reports on the characterization of

microbial ecosystems in recent years is indicative of the great

potential behind the metaproteomic approach. In-depth analyses of

metaproteomic expression profiles are fundamental to our

understanding of microbial interactions and of the role played by

certain microorganisms in global nutrient cycles (Schweder et al.,

2008). The first studies on metaproteomics were carried out in

microbial habitats with limited microbial diversity, but nowadays

the range of habitats studied has increased to include complex

microbial communities. To date, metaproteomic analyses have been

conducted on microbial communities found in soils, activated

sludge, wastewaters, acid mine drainage biofilms, marine

ecosystems and even the human gastrointestinal tract (Kan et al.,

2005; Klaassens et al., 2007; Schulze et al., 2005; Sowell et al.,

2009; Tyson et al., 2004; Wilmes et al., 2008).

In a pioneering study aimed at identifying proteins in

dissolved organic matter (DOM) from complex environments such

as lake waters, water extracted from soils and soil particles, Schulze

et al. showed that, despite the limitations of the approach at the

time, specific taxonomic groups could be identified and proteomic

composition varied depending on the ecosystem, and that the

strategy could be useful for assessing the functionality of an

ecosystem (Schulze et al., 2005). More recently, protein

fingerprinting has been used to study natural communities and

evaluate the correlation between community structure and

ecosystem function. In one study, protein fingerprints generated by

standard SDS-PAGE and ribosomal DNA fingerprints were used to

analyze indigenous microbial communities in freshwater samples.

Results showed that variations in the genetic and functional

structure were complex and varied depending on the perturbations

imposed on the community (Maron et al., 2007a). More recent

work using the same strategy to analyze bacterial communities

inoculated into sterile soils differing in their physicochemical

properties showed a correlation between the functional structure of

the community, as assessed by protein fingerprinting, and the

physicochemical characteristics of the soil (Maron et al., 2008).

Both metagenomics, and more recently metaproteomics,

have been applied to the study of a natural biofilm community

dominated by few species that is associated with acid mine drainage

(AMD), an environmental problem that arises largely from

microbial activity. By using shotgun cloning and sequencing of the

DNA retrieved directly from the environment, Tyson et al. were

able to reconstruct almost complete genomes of Leptospirillum

group II and Ferroplasma type II, and to partially recover three

other genomes from this underground, low-complexity AMD

microbial biofilm (Tyson et al., 2004). While this study unveiled

metabolic pathways and insight into survival strategies, community

proteomics carried out on this AMD biofilm provided information

about how these microorganisms function in their natural

environment. The combination of mass spectrometry–based

proteomics and community genomic analysis revealed key

functions and how these were partitioned among community

members (Ram et al., 2005). More recently, community genomic

data sets were used to identify expressed proteins from the

dominant member of an AMD biofilm (Lo et al., 2007). The

results showed genome-wide recombination patterns due to genetic

exchange between closely related bacterial populations that could be

underlying the capacity of these microorganisms to survive in this

very acidic and metal-rich ecosystem. In this study the capacity to

discriminate peptides with slight differences in composition

enabled identification of sequence variants from proteomic data.

Thus coupling proteomic and genomic data conveyed information

both about the genome structure and the activities present in this

community. It also highlighted the importance of using such

strain-resolved community proteomics to complement

culture-independent metagenomic analysis of microbial communities.

The oceans, which cover more than 70% of the Earth’s

surface, constitute the largest natural habitat in the world and as

such are the subject of intense studies in microbial ecology. Marine

microorganisms, which are extremely diverse and play fundamental

roles in global biogeochemical processes, are subjected to

fluctuating environments due to changes in the water conditions

(Thomas et al., 2007). One of the first studies using

metaproteomics on natural aquatic microbial assemblages in the

Chesapeake Bay established the feasibility of the approach and

identified several proteins that corresponded to dominant bacterial

groups (Kan et al., 2005). Marine alphaproteobacteria are

ubiquitous in marine ecosystems and outstanding in their capacity

to persist in oligotrophic waters, an adaptive trait of biological

importance and of great interest in marine microbiology (Sowell et

al., 2008; Thomas et al., 2007). A proteomic approach was used to

identify proteorhodopsin proteins, light-dependent proton pumps

predicted to be important in terms of supplying energy for marine

microbial metabolism, in the alphaproteobacterium SAR11 strain

HTCC1062 (“Pelagibacter ubique”) (Giovannoni et al., 2005). An

accurate mass and time (AMT) tag library was then generated for

quantitative examination of proteomic profiles of this cultured

strain to identify differentially expressed genes and create a

comprehensive library of peptide AMT tags to improve further

proteomic analyses of this microorganism (Sowell et al., 2008).

Subsequent metaproteomic analyses of the communities present in

the north-western Sargasso Sea were carried out to understand the

mechanisms involved in survival in these oligotrophic waters. The

analysis of the metaproteome in surface samples, using capillary

liquid chromatography (LC)-tandem mass spectrometry, identified

peptides that could be mapped to proteins from the SAR11 clade,

followed by Prochlorococcus and Synechococcus, both of which are

dominant marine photosynthetic bacteria (Sowell et al., 2009). The

results indicated that a large number of the identified SAR11

peptides belonged to periplasmic substrate-binding proteins,

consistent with observations that the periplasmic space represents a

large proportion of the volume of the extremely small SAR11 cells.

Other abundant proteins included proteins mediating oxidative

stress and re-folding, as well as nutrient acquisition. These findings

indicate that the metaproteomes of SAR11, Prochlorococcus and

Synechococcus bacteria reflect adaptation to fluctuating

environmental conditions where cells have to survive the damage

imposed by light and oxidative stress while competing for limited

nutrients (Sowell et al., 2009).

Metaproteomics has also been used to

understand the complex relationships among microorganisms

present in wastewater treatment plants (WWTP). The

metaproteome of a laboratory-scale activated sludge system

optimized for enhanced biological phosphorus removal (EBPR) was

first analyzed using 2D PAGE. This work identified highly

expressed proteins, possibly from the dominant and uncultured

Rhodocyclus-type polyphosphate-accumulating organism (PAO),

and established the viability of carrying out proteomics on a

complex community such as this for which cultivation is difficult

(Wilmes and Bond, 2004). Subsequent work compared protein

expression in sludge from two EBPR systems with different levels of

phosphorus removal (Wilmes et al., 2008). This study was able to

identify proteins that were highly expressed by the dominant PAO

and revealed several proteins that could be linked to the metabolic

activities occurring in these EBPR systems. Another interesting

study used metaproteomics to analyze the proteins found in the

extracellular polymeric substances (EPS) in full-scale activated

sludge systems (Park et al., 2008). Extraction of EPS proteins is

technically challenging and was therefore evaluated using three

different cation-associated extraction methods, followed by sample

fractioning and proteomic analysis. While the results showed that

the protein profiles were different for the various extraction

methods, several sewage-derived and bacterial proteins were

identified, some of which were ubiquitous and therefore potentially

useful as biomarkers to monitor operations.

Advanced molecular technologies have also led to

interesting applications in areas such as bioremediation, a biological

process based on the catabolic capability of microorganisms to

degrade and/or eliminate polluting materials from an ecosystem.

Increasing our knowledge of the microbial communities involved in

key physiological processes and understanding the relationship

between microbial diversity and physiological routes involved in

biodegradation processes in polluted environments could enhance

bioremediation processes. With this in mind, a new protein

extraction procedure was developed and applied to a soil

microcosm and a contaminated aquifer (Benndorf et al., 2007).

The analysis of these metaproteomes was consistent with the

bacterial metabolic pathways expected in these ecosystems and

showed the potential of using this approach to identify possible

biomarkers indicative of biodegradation processes. In another

study, proteomics was used to assess the response of a microbial

community after stress by cadmium exposure (Lacerda et al., 2007).

The analysis showed significant changes in the microbial physiology

and the capacity to detect rapid changes within the community,

providing evidence of toxicity and insight into mechanisms of

tolerance.

Challenges and future perspectives

It can be generally argued that the analysis of proteins through

metaproteomics provides extremely useful functional information

regarding microbial communities, more so than metagenomics or

even metatranscriptomics (Stenuit et al., 2008). Despite its evident

appeal and the great methodological and technical advances in

terms of extracting and analyzing proteins directly from

environmental samples, the approach is still hampered by several

limitations. Some of the inherent limitations of the approach

include low protein extraction yields, difficulty in identifying

peptides through database searches due to reduced coverage of

known protein sequences, and ambiguity in interpreting data in the

absence of any corresponding metagenomic information. As a

consequence of the diversity of protein function and structure there

is no single universal extraction method available. This will require

both adjustments to established procedures and improvements in

the efficiency of protein extraction, especially from highly

contaminated samples. Other major challenges involve protein

separation and identification techniques (Maron et al., 2007b) and

bioinformatic capacity for analysis and management of the large

volumes of data generated (Nesatyy and Suter, 2007; Wilke et al.,

2003). Thus improvements in sample preparation, MS techniques

and data capture and analysis will have to be paralleled by advances

in bioinformatics tools designed for both organizing and processing

proteomics and metaproteomics data (Yang and Zhang, 2008).

Another major problem with metaproteomic studies is that

assignment of peptide masses determined by MS relies on known

peptide sequences in databases. Despite the increasing amount of

available microbial peptide sequences, most of the proteins derived

from environmental microorganisms still lack reference sequences

in databases (Schweder et al., 2008). Thus the limited number of

organisms represented in the protein and gene sequence databases

constrains the efficient application of cutting-edge high-throughput

proteomics to environmental samples (Nesatyy and Suter, 2007).

In addition, the high genetic variation in natural populations, as

well as environmental changes that affect the organisms’ responses,

could hamper the interpretation of protein expression levels from

environmental samples. Another critical aspect in the approach is

the reproducibility of the results. The difficulty associated with

efforts at reducing the sources of variability has been made evident

by the discrepancy in results obtained in different laboratories

involved in the analysis of the same protein mixture (Tao, 2008).

One additional and also very important challenge in the field will

always be that of testing and validating the functional information

obtained.

In spite of the many limitations, metaproteomics still

provides a powerful tool to study the functional diversity of

environmental microbial communities. With the capacity to sample

the total protein pool of a given natural population, the

metaproteomics strategy provides a unique opportunity to obtain

functional information regarding natural communities and link this

information to population structure. The identification of peptide

sequences, based on information of sequenced microorganisms and

metagenomes, will improve in the years to come, offering more

precise identification of specific enzymes and putative functions and

helping our understanding of the adaptations and response to

changing conditions. It can be anticipated that environmental

proteomics will prove extremely useful on several fronts. For

example, conserved proteins, once identified, could serve as

markers for specific habitats. Proteins that change upon

environmental perturbation could be used as indicators of stress on

natural populations and ecosystems (Maron et al., 2007b). In

addition to identification of protein biomarkers, metaproteomics

can also be very useful in the field of ecotoxicology by detecting

minor changes in the proteome or metaproteome and quantifying

the effects of stressors on natural populations, communities, and

ecosystems (Nesatyy and Suter, 2007). Environmental proteomics

can also lead to the identification of known or novel biochemical

functions involved in complex biogeochemical processes and can

help to address the role played by the succession of populations

within an ecosystem. As techniques and databases become more

robust, the likelihood will increase of assigning phylogenetic

affiliation and possible catalytic function to proteins from complex

environments (Rodriguez-Valera, 2004). Finally, metaproteomics

can complement other meta-approaches in addressing fundamental

questions in microbial ecology such as the relationship between

community structure and function and how these communities

contribute to ecosystem dynamics and stability.

Metagenomics and metabolomics

Metabolomics in short

Metabolomics, which has been defined as the study of global

metabolite profiles in a biological system under a given set of

conditions, is one of the most recent technologies introduced in the

systems biology approach (Goodacre et al., 2004). This rapidly

expanding area of scientific research faces many technological

challenges in its aim to encompass one of the outermost levels of

the information flux that displays greater complexity than do the

genome, the transcriptome or the proteome. While genomics and

proteomics study macromolecular building blocks (DNA and

proteins, respectively), metabolomics deals with structurally and

physicochemically diverse small-molecule metabolites (typically

<1000 Da) (Han et al., 2008). As a consequence of this complexity,

there is no single method that enables a comprehensive

metabolomic analysis. Despite this limitation many analytical

methods can be applied to examine metabolites from different

chemical classes and have provided invaluable information about

the metabolome of model microorganisms (Mashego et al., 2007).

Metabolomic analysis typically is carried out by mass spectrometry

(MS), usually coupled to a separation methodology such as liquid

chromatography (LC-MS), gas chromatography (GC-MS) or

capillary electrophoresis (CE-MS). The stand-alone nuclear

magnetic resonance (NMR) technique has also been widely used. A

complete review of the methodologies used in metabolomics has

been recently published (Oldiges et al., 2007). The analysis of

metabolites varies depending on the aims of the research and has

been done using three different strategies (Peric-Concha and Long,

2003): i) Metabolite fingerprinting uses spectra obtained either

from NMR or MS analyses to create a fingerprint of the

metabolites that are produced by a biological system; it is not

quantitative and usually does not provide information about

specific metabolites. ii) Metabolite profiling is the semi-quantitative

analysis of a group of specific metabolites (e.g. carbohydrates or

polyketides). iii) Metabolite target analysis is the quantitative

analysis of metabolites and is targeted to a subset of molecules that

participate in a specific aspect of metabolism.
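
As an illustration of the first strategy, metabolite fingerprinting compares whole spectra as patterns rather than identifying individual compounds. The sketch below assumes each spectrum is a list of (m/z, intensity) pairs, bins the peaks at a fixed width and scores two samples by cosine similarity. The bin width and function names are illustrative choices, not an established protocol.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (mz, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(fingerprint_a, fingerprint_b):
    """Cosine of the angle between two binned fingerprints (0 = no shared bins)."""
    dot = sum(v * fingerprint_b.get(k, 0.0) for k, v in fingerprint_a.items())
    norm_a = math.sqrt(sum(v * v for v in fingerprint_a.values()))
    norm_b = math.sqrt(sum(v * v for v in fingerprint_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two samples with similar scores share an overall metabolite pattern, but the score alone says nothing about which metabolites differ, which is the limitation of fingerprinting noted above.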

Metabolome of an ecosystem

One of the aims of the metagenomic approach is to reveal the

microbial gene diversity present in the ecosystem, a step that

constitutes investigation at the lowest level of the genetic

information flux (metagenome) of a microbial community. This

metagenome is more stable when compared with levels of

information that are further downstream, such as RNA and

proteins, since it is the result of evolutionary processes acting on

members in a given population and is not as fluctuating and

transient as the transcriptome, the proteome or the metabolome

(Han et al., 2008). It has now become evident that the fraction of

genes available from culturable microorganisms is minimal in

comparison with the global microbial gene pool present in the

environment. Commensurate with this idea, microbial

communities in natural ecosystems should be expected to harbor a

broad collection of metabolites that are synthesized in response to

environmental cues. Some of these metabolites might not be

present in the current set of culturable microorganisms or they

might not have been detected due to the lack of knowledge

regarding specific signals required under standard laboratory

conditions to elicit their production. There should therefore be a

startling variety of unexplored metabolites produced in natural

environments, many of which might be produced by non-

culturable microorganisms in an environment-dependent manner.

For this reason, the metabolome of a microbial community (meta-

metabolome) is extended to include the complete set of metabolites

formed by the whole community as a result of its interaction with

the biotic and abiotic factors present in a given niche. In the

systems biology approach it has long been known that metabolomic

data represent integrative information. According to the metabolic

control theory (also known as Metabolic Control Analysis, MCA)

(Cascante et al., 2002), small changes in the transcriptome and the

proteome have only minor effects on the overall metabolic fluxes

but have significant effects on the concentration of metabolite

intermediates of the pathway. For instance, the reduction in the

activity of an enzyme can trigger an increase in the concentration of

substrates for that enzyme, so that the overall balance of the pathway can

be maintained. Such responses have been made evident from MCA

studies where the perturbation of the system in response to a

mutation is measured by determining the sensitivity coefficients of

fluxes and metabolite concentrations. These coefficients are

consistently higher for metabolites than for fluxes, demonstrating

that perturbations of the system are more accurately measured

when the metabolome is analyzed (Cascante et al., 2002). This

control of the metabolism is possible because the individual

components of metabolic networks are tightly connected, ensuring

that the flux alters only slightly (Nielsen, 2003). As a consequence,

the measurement of all the metabolites in a system comprises and

amplifies any perturbation of the levels lying upstream (proteome

or transcriptome) (Mendes et al., 1996; Urbanczyk-Wochniak et

al., 2003) and as such is more sensitive to the physiological

responses of complex biological systems than either transcriptomics

or proteomics (Kell, 2006).
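
The sensitivity coefficients referred to above can be written out in the standard notation of Metabolic Control Analysis. The symbols used here (e_i for the activity of enzyme i, J for a steady-state flux, S_k for the concentration of metabolite k, n for the number of enzymes) are chosen for illustration and are not taken from the cited works.

```latex
\begin{align}
  % Flux control coefficient of enzyme i on the steady-state flux J
  C^{J}_{i} &= \frac{\partial \ln J}{\partial \ln e_i} \\
  % Concentration control coefficient of enzyme i on metabolite S_k
  C^{S_k}_{i} &= \frac{\partial \ln S_k}{\partial \ln e_i} \\
  % Summation theorems over all n enzymes of the system
  \sum_{i=1}^{n} C^{J}_{i} &= 1, \qquad \sum_{i=1}^{n} C^{S_k}_{i} = 0
\end{align}
```

Because the flux control coefficients must sum to one, control of the flux is shared among the enzymes, and in a long pathway each individual coefficient tends to be small. The concentration control coefficients only need to sum to zero, so individual values can be large in magnitude. This is one way to read the observation that metabolite levels respond more strongly than fluxes to a given perturbation.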

Metabolites are not merely the end product of gene

expression but rather result from the interaction of the genome

constituents with the environment. Thus investigating the full

extent of the meta-metabolome is not possible by just inspecting

the metabolic potential encoded in the metagenome. So far,

metagenomics studies have inferred habitat-specific metabolic

demands on the basis of the identification of predominant gene

families, but experimental confirmation for complex systems

remains elusive because of the lack of a robust analytical

methodology for deconvoluting all the metabolites present in

complex mixtures (Hollywood et al., 2006). In spite of the

technical challenges, current methodologies for analyzing the

metabolome can contribute to our understanding of microbial

community function and to the discovery of new interesting

bioactive metabolites.

Metagenomics and metabolomics for natural products prospection

Microbial secondary metabolism produces a wealth of small

molecules collectively known as natural products that are used in

natural environments for interspecies competition and

communication. These small molecules have been an important

source of therapeutically useful agents such as antibiotics,

antifungals, immunosuppressive agents and anticancer agents

(Clardy and Walsh, 2004). Nearly all known natural products have

been discovered by growing organisms as isolated species and

analyzing their extracts for small molecules. It is estimated that with

this traditional strategy only 10-20% of the culturable bacterial

natural product repertoire, and only 1-2% of the small molecules

potentially produced by the global microbial population have been

discovered (Baltz, 2006; Watve et al., 2001). Bacterial genome

sequencing efforts have only recently focused on Actinomycetales,

one of the most prolific groups of small-molecule antimicrobial

producers. Examination of the natural product repertoire encoded

in the 26 currently available Actinomycetes genomes revealed that,

on average, there are two or three dozen gene clusters potentially

capable of producing a small molecule. However, only a few of

these molecules have actually been identified for each of these

strains (Ikeda et al., 2003; Omura et al., 2001; Peric-Concha and

Long, 2003). The potential for secondary metabolite production

revealed in these bacterial genomes suggests that the current

strategy of analyzing isolated microbial species is insufficient for

exploiting their metabolic potential. In fact, most secondary

metabolites are not produced constitutively but, quite the contrary,

are encoded by “cryptic” genes that are triggered only in response to

environmental cues (Peric-Concha and Long, 2003). The

biosynthetic pathways of secondary metabolites are highly complex

and can involve gene clusters that can comprise up to 100 kb of

DNA sequence that encodes refined molecular machines known as

polyketide synthases (PKS) and nonribosomal peptide synthetases

(NRPS) (Fischbach et al., 2008). For a complete review of these

genetic elements and their distribution throughout bacterial

lineages, see Donadio et al. (2007).

Recent surveys of diverse environments using

metagenomics and other molecular approaches have increased our

awareness regarding the extent of microbial diversity present in

various ecosystems, diversity that should also harbor a remarkable

variety of novel and yet to be exploited natural products. There is a

discrepancy, however, between the number of identified gene

clusters that potentially encode small molecules and the relatively

small number of these molecules that have been discovered. This

discrepancy results most probably from our outdated view of

microorganisms as isolated entities separated from their natural

environments. Bacterial genomics of model culturable organisms

and metagenomics of uncultured bacterial consortia present in

association with marine sponges and soil communities have

revealed numerous gene clusters of PKS and NRPS for which no

molecules have been identified (Donadio et al., 2007; Ginolhac et

al., 2004; Kim and Fuerst, 2006; Piel et al., 2004; Schirmer et al.,

2005). The probability of these gene clusters being junk DNA in

microbial genomes is very low since the metabolic cost of

maintaining such massive biosynthetic systems is high and the

selective pressure for maintenance must be correspondingly strong

(Fischbach et al., 2008). Thus our inability to detect the

corresponding molecule must be related to our poor understanding

of the underlying regulatory networks and to the lack of knowledge

regarding the environmental signals required to elicit production.

How can we access this extensive reservoir of natural

products? Heterologous expression of metagenomic DNA libraries

in Escherichia coli has allowed detection of biological activities and

provided a proof of principle that transcription and translation of

entire biosynthetic pathways are possible (MacNeil et al., 2001;

Rondon et al., 2000a, b). Nevertheless, this approach is greatly

limited by the fact that most genes may not be expressed in

domesticated hosts since cloned genes from environmental

organisms have to be compatible with the host’s genetic machinery.

In an attempt to overcome this limitation, heterologous expression

has been successfully achieved in additional hosts such as

Pseudomonas, Ralstonia, Streptomyces and related actinomycete

species (Craig et al., 2009; Martinez et al., 2004a, b; Wang et al.,

2000). The advantage of using bacterial hosts with diverse genetic

backgrounds lies in their capacity to supply a variety of promoters

and transcriptional, regulatory and post-translational machineries

that extend the capability to express exogenous DNA. Furthermore,

some of these strains are themselves natural product producers and

therefore might already have the biosynthetic apparatus and

necessary primary precursors to support the synthesis of

heterologous small molecules (Peric-Concha and Long, 2003).

Despite these efforts, the frequency of detecting any given activity

from metagenomic libraries is low and high-throughput screening

of thousands of clones is usually required in order to obtain a small

number with the desired biological activity (Henne et al., 2000;

Rondon et al., 2000a). While functional screens for antibiosis or

enzyme action are commonplace, a broader search for novel

chemical entities in metagenomic libraries, particularly in the

absence of a biological screen, will require comprehensive assays

that directly measure the total chemical complement, or the

metabolome, of the expression host (Peric-Concha and Long,

2003). Carrying out a metabolomics-based screen using a

metagenomic library should theoretically meet two fundamental

conditions: it has to be scalable to process thousands of clones in a

high-throughput manner and it has to be sufficiently sensitive to

detect any change produced in the metabolite profile of the host

cell as a consequence of harboring the environmental DNA. The

implementation of such screens may reveal silent phenotypes (i.e.

functions conferred by the expression of heterologous DNA that do

not display evident biological activity, but that modify the overall

behavior of the metabolome) of metagenomic clones that are able

to overcome the barrier of heterologous gene expression.
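
One simple way to picture such a screen is to compare the metabolite profile of each clone with that of control hosts carrying an empty vector, and to flag features that deviate strongly. The sketch below assumes profiles are dictionaries mapping a feature label to a measured intensity and uses a per-feature z-score with an arbitrary cutoff. The function name, data layout and threshold are hypothetical, and a real screen would also need normalization and multiple-testing control.

```python
from statistics import mean, stdev


def flag_clones(control_profiles, clone_profiles, z_cutoff=3.0):
    """Return {clone_id: [feature, ...]} for features deviating from controls.

    control_profiles: list of {feature: intensity} from empty-vector hosts
                      (at least two replicates are needed for a variance).
    clone_profiles:   {clone_id: {feature: intensity}} for library clones.
    """
    # Baseline mean and standard deviation of every feature in the controls.
    baseline = {}
    for feature in control_profiles[0]:
        values = [profile[feature] for profile in control_profiles]
        baseline[feature] = (mean(values), stdev(values))

    hits = {}
    for clone_id, profile in clone_profiles.items():
        deviating = []
        for feature, (mu, sigma) in baseline.items():
            if sigma == 0:
                continue  # no control variance, so the feature cannot be scored
            z = (profile.get(feature, 0.0) - mu) / sigma
            if abs(z) >= z_cutoff:
                deviating.append(feature)
        if deviating:
            hits[clone_id] = sorted(deviating)
    return hits
```

Because each clone is scored independently against the same baseline, the comparison scales linearly with library size, which addresses the first condition. Sensitivity, the second condition, depends on the analytical platform and on the replicate variance of the controls.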

To efficiently exploit the metabolic potential of microbial

communities, we must abandon the outdated paradigm of isolating

microorganisms or genes from their natural environment and shift

towards an eco-systems biology approach where the ecological role

of the molecules is the principal biological question. In accordance

with this ecology-based approach the combination of

metagenomics, metatranscriptomics and meta-metabolomics is

strongly needed to unveil the function of secondary metabolites in

situ. Here we provide a view of how these three approaches can be

combined in order to study the natural product repertoire of

microbial communities present in a given ecosystem.

First, metagenomics through cloning-independent

sequencing of the metagenome can determine the diversity

(richness and abundance) of its members by using ribosomal DNA

markers and can also provide sequence information of the

collection of genes contained in a population. Discovery of novel

biosynthetic gene clusters is the first goal of this line of work. Based

on the catalytic rules of studied assembly line enzymes it is possible

to combine bioinformatics and knowledge-based predictions to

identify scaffolds corresponding to natural products. Furthermore

predictions regarding the structure and physicochemical properties,

based on the organization of genes encoding enzyme modules, can

assist with the selection and tracking of products in the

environment that may be interesting in the search for novel

bioactivities. For instance, novel bioinformatics packages are able to

screen genes encoding type I PKS in metagenomics shotgun data

(Foerstner et al., 2008). The program package ClustScan can

annotate gene clusters encoding modular biosynthetic enzymes,

including PKS, NRPS, and hybrid (PKS/NRPS) enzymes, and is

also able to predict some chemical structures and make inferences

about domain specificities and function of the predicted small-

molecule products (Starcevic et al., 2008). However, information

based merely on gene clusters is limited and does not yet faithfully

predict end product structures. This can be particularly true for

clusters with multiple tailoring enzymes, hidden biosynthetic genes

or genes for novel small molecules produced by assembly line

enzymes that operate in an unconventional way (Sattely et al.,

2008).

The prediction of the biosynthetic pathways and the

hypothetical structure of secondary metabolites is the first step

towards the identification and understanding of natural products in

the ecosystem. Once a comprehensive list is made of the gene

clusters found in the microbial community, a metatranscriptomic

analysis of the ecosystem can then be carried out to analyze the

expression dynamics of the genes making up the predicted clusters.

This analysis can shed light on how spatial and temporal conditions

influence differential expression of secondary pathways (Raes and

Bork, 2008). Subsequent linking of identified gene clusters and

expression profiles to microbial species within an ecosystem is an

important but difficult task that has nevertheless been achieved by

co-cloning of a phylogenetic marker (Beja et al., 2000). Nowadays,

the use of single-cell isolation and sequencing technologies provides

promising alternatives to this seemingly daunting endeavor (Walker

and Parkhill, 2008). Thus the identification of actively transcribed

gene clusters encoding small molecules uses both metagenomics

and metatranscriptomic approaches and is based on bioinformatic

tools to predict metabolite scaffold structure and reveal information

regarding physicochemical properties. Using these data, the

metabolomics approach can be directed to identify a fraction of

the molecules known to be expressed from gene clusters in a

defined spatial and temporal environmental setting. Additional

information regarding hypothetical chemical properties also

narrows the search space in the overall metabolite profile of the

community. This type of identification will require specialized

extraction protocols for the meta-metabolome and extremely

sensitive analytical tools in order to deconvolute the hundreds of

similar low-concentration metabolites found in such a complex

chemical background. Much hope rests on the application of

ultrahigh-field Fourier transform ion cyclotron resonance mass

spectrometry (FTICR-MS), which has been used to profile over 400

metabolites in a short period of time (Han et al., 2008). The

combination of all of these eco-systems biology approaches will

help us to mine and understand the metabolic potential concealed

in microbial populations (Raes and Bork, 2008).
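
The idea of using predicted chemical properties to narrow the search space in a metabolite profile can be illustrated with a minimal sketch. It assumes that bioinformatic predictions yield an expected mass for each hypothetical scaffold and that observed peaks are matched within a tolerance expressed in parts per million (ppm). The scaffold names, mass values and the 5 ppm tolerance are hypothetical placeholders chosen for illustration, not values taken from the studies cited above.

```python
def ppm_error(observed_mz, predicted_mz):
    """Signed mass error of an observed peak, in parts per million."""
    return (observed_mz - predicted_mz) / predicted_mz * 1e6


def match_peaks(observed_peaks, predicted_masses, tolerance_ppm=5.0):
    """Return (name, observed peak, ppm error) for every observed peak
    that lies within the tolerance of a predicted mass."""
    matches = []
    for name, predicted in predicted_masses.items():
        for observed in observed_peaks:
            error = ppm_error(observed, predicted)
            if abs(error) <= tolerance_ppm:
                matches.append((name, observed, round(error, 2)))
    return matches


# Hypothetical predicted masses for two gene-cluster products (illustration only).
predicted = {"scaffold_A": 500.0000, "scaffold_B": 750.2500}
# Hypothetical peaks from a community metabolite profile.
observed = [500.0010, 612.3300, 750.2600]

hits = match_peaks(observed, predicted)
for name, peak, error in hits:
    print(name, peak, error)
```

In this toy case only the peak near scaffold_A falls within the tolerance; the peak near scaffold_B is about 13 ppm away and is rejected. In practice the appropriate tolerance would depend on the resolution of the instrument.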

Microarrays

Microarrays are a powerful high-throughput technique for the

simultaneous analysis of thousands of target molecules, with

considerable potential for detecting activities and monitoring

the dynamics of microbial communities. Microarrays, which have

been used extensively for analysis of gene expression, are being

adapted for use in environmental samples (Gentry et al., 2006).

They have the advantage of providing rapid information on a great

number of genes and supplying quantification data without having

to clone DNA. There have been spectacular advances in microarray

design and commercial availability, improving the coverage, density

and limit of detection of gene or transcript copies (Bouchie, 2002).

In environmental setups, microarray technology has not been as

extensively used as for genomic or transcriptomic comparisons of

single organisms. This is due to the relatively high amounts of

nucleic acids needed to detect a signal and to the complexity

underlying the design of multiple probes to target and cover an

uncharacterized diversity. Arrays designed for environmental

applications therefore contain probes for detection of well-defined

gene families of known environmental bacterial functions (Iwai et

al., 2008; Taroncher-Oldenburg et al., 2003; Wu et al., 2006). Due

to the difficulty in recovering large amounts of environmental

DNA, these arrays in many cases require PCR amplification of

specific genes prior to hybridization, a step that can introduce

biases. Alternatives to avoid biases associated with PCR

amplification include either extraction from larger amounts of

sample or the amplification of genomic material using the phi29

polymerase (Binga et al., 2008). In the case of protein coding genes,

the use of arrays can substantially increase our capacity to detect

small variants within the context of a particular gene family since all

known possible variations can be targeted simultaneously.

However, the detection of environmental mRNA is particularly

cumbersome due to the low amount of single gene transcripts,

which even for highly expressed genes can be 100 times less

abundant than the rRNAs. Various arrays have

been developed for the study of microbial communities and these

include: 1) phylogenetic arrays based on 16S rRNA, 2) community

arrays with signature genes and 3) functional gene arrays with

information for genes involved in metabolic pathways.

The most extensively used phylogenetic marker in

microbial ecology is undoubtedly the 16S rRNA gene. This is an

ideal marker for community profiling given the large amount of

sequence data, coupled to the intrinsic characteristics of this

molecule. Phylogenetic arrays have only recently begun to be used

to study microbial communities in diverse settings. Early

reports (Loy et al., 2002) were subsequently

extended to include analysis of either DNA or RNA

obtained from the environment (Adamczyk et al., 2003; El

Fantroussi et al., 2003; Gentry et al., 2006). A recently developed

high density 16S rRNA PhyloChip that targets 8741 bacterial and

archaeal taxa has been used to compare coverage with respect to

clone libraries and to inspect diversity in environmental

communities (DeSantis et al., 2007; Yergeau et al., 2007; Yergeau

et al., 2009). Functional arrays that contain genes involved in key

biogeochemical processes, including a comprehensive array called

GeoChip, have also been developed and used for detecting activities

in microbial communities (He et al., 2007; Leigh et al., 2007; Rhee

et al., 2004; Steward et al., 2004; Yergeau et al., 2007).
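
A comparison of coverage between a phylogenetic array and a clone library can be sketched, in its simplest form, as a set comparison of the taxa detected by each method. The function name and the taxon lists below are hypothetical and serve only as an illustration; they do not reproduce the analyses of the cited studies.

```python
def compare_coverage(array_taxa, clone_taxa):
    """Summarize which taxa were detected by both methods or by only one."""
    array_taxa, clone_taxa = set(array_taxa), set(clone_taxa)
    return {
        "shared": array_taxa & clone_taxa,
        "array_only": array_taxa - clone_taxa,
        "clone_only": clone_taxa - array_taxa,
    }


# Hypothetical detections from the same sample (illustration only).
array_detected = ["Proteobacteria", "Firmicutes", "Acidobacteria", "Crenarchaeota"]
clone_detected = ["Proteobacteria", "Firmicutes", "Actinobacteria"]

summary = compare_coverage(array_detected, clone_detected)
for category, taxa in summary.items():
    print(category, sorted(taxa))
```

Taxa reported only by the array may reflect greater sensitivity or cross-hybridization, whereas taxa found only in the clone library may lack a matching probe, so both categories would need independent confirmation.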

Despite the great potential of applying microarray

technology for the specific, quantitative and rapid assessment of

microbial communities, the analysis of environmental samples

presents several challenges. As occurs with other strategies,

microarrays detect the most abundant organisms or molecules

present in a given ecosystem and can therefore have problems

associated with low sensitivity. There are also difficulties related

to the recovery of genetic material due to the low biomass present in the

sample or problems with extraction procedures. In addition, the

results can be difficult to interpret due to the large amount of array

data generated, information which can occasionally also be

misleading due to signals generated by cross-hybridization with

related sequences. Finally, and perhaps most importantly,

microarrays rely on previously gathered information for probe

design and will therefore miss any novel genes found in the

community that are not represented in the array (Gentry et al.,

2006; Wagner et al., 2007). Thus exploratory studies using

microarrays may overlook functions residing in environmental

populations that have not yet been described and which might very

likely represent a large fraction of the community (Pignatelli et al.,

2008).
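
The risk of cross-hybridization with related sequences can be screened for in silico before an array is built. The sketch below is a deliberately simplified illustration: it assumes that the probe and the non-target sequences are written in the same orientation, considers only ungapped alignments, and uses a hypothetical threshold of two mismatches. Real probe design would also take melting temperature and hybridization conditions into account, and all sequences shown are invented.

```python
def count_mismatches(probe, window):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(probe, window))


def min_mismatches(probe, sequence):
    """Smallest mismatch count of the probe against any ungapped,
    probe-length window of the sequence (None if the sequence is too short)."""
    n = len(probe)
    if len(sequence) < n:
        return None
    return min(
        count_mismatches(probe, sequence[i:i + n])
        for i in range(len(sequence) - n + 1)
    )


def flag_cross_hybridization(probe, non_targets, max_mismatches=2):
    """Return non-target sequences similar enough to the probe to be suspect."""
    flagged = {}
    for name, sequence in non_targets.items():
        best = min_mismatches(probe, sequence)
        if best is not None and best <= max_mismatches:
            flagged[name] = best
    return flagged


# Invented sequences for illustration only.
probe = "ACGTACGTAC"
non_targets = {
    "relative_1": "TTACGTACGAACGG",    # contains a near-identical stretch
    "relative_2": "GGGGCCCCTTTTAAAA",  # unrelated sequence
}

flagged = flag_cross_hybridization(probe, non_targets)
print(flagged)
```

A screen of this kind can only be run against sequences that are already known, which is the same limitation noted above for probe design in general.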

Future perspectives

The field of microbial ecology has made substantial progress thanks

to novel molecular and genomic approaches that allow estimations

and explorations of the vast majority of uncultured microorganisms

on our planet. Metagenomics is now facing new challenges

precipitated by ongoing developments and novel tools for research

of complex microbial communities. As evidenced by recent reports,

the focus of these studies has started to shift from mere descriptions

of ecosystems to the generation of more comprehensive and

complex datasets aimed at deriving relevant ecological information.

Technological innovations, the development of more economical,

efficient and high-throughput strategies and modifications to

existing methodologies will most probably continue to flourish in

the near future. This will probably lead to increased access and

application of these technologies, prompting research into a

broader spectrum of environments. We will probably see “meta”

strategies being used successfully for investigating diverse microbial

consortia and addressing the role of uncultured microbes in their

natural settings. Tackling some of the fundamental and interesting

questions driving research in microbial ecology will however require

the integration of diverse fields of study, such as geochemistry,

biochemistry, and genetics, among others, and techniques that

expand on the basic metagenomics strategy and move beyond

towards a more integrative eco-systems biology approach. Thus

multidisciplinary teams and complementation with additional

“meta” approaches, such as metaproteomics, metatranscriptomics and

meta-metabolomics to capture the expressed potential of microbial

populations, will surely lead to a more global and comprehensive

picture of the evolution, complexity and functionality of

environmental microbial communities (Maron et al., 2007b; Raes

and Bork, 2008). The incorporation of additional technologies like

cell sorting and microfluidics, together with advances in isolation

techniques, will prove extremely useful for complementing these

studies using isolates or more simplified communities. Thus

multifaceted approaches will probably become more extensively

used when engaging in comprehensive explorations of in situ

communities. In addition to providing novel genomic and

physiological information, these novel approaches will also prove to

be fundamental for the search and discovery of novel bacterial

functions for biotechnological or clinical applications. Altogether,

the field promises stimulating new developments that will very

likely reshape our vision of microbial interactions and communities

in their natural settings.

Despite these exciting prospects, some of the inherent

difficulties associated with “omic” approaches to study whole

communities, such as efficient isolation of nucleic acids and

proteins from environmental samples, still hamper progress and

thus need to be overcome for the efficient integration of various

disciplines. It is anticipated, however, that the involvement of more

research groups will precipitate innovations and the capacity to

overcome many of these difficulties, paving the way for more in-

depth studies of microbial communities and diversity. One of the

key concerns for the future of any “meta” or “omic” approach is

how to handle and make sense of the vast amount of sequence data

that will be generated from such explorations (Chen and Pachter,

2005). The use of massively parallel sequencing technologies,

coupled to reduced costs, is expected to expand our capacity to

generate data. Therefore, the development of novel and

sophisticated bioinformatics tools will become essential for data

management and analysis of metagenomic data involving assembly,

identification and assignment of functions to expressed proteins

and phylogenetic affiliation to sequence reads. Another aspect of

importance in the field should involve reproducibility of results and

functional experimental validation of sequence-derived

information, an important point that has been largely neglected in

the post-genomic era, given the experimental challenges involved.

The capacity to explore ecosystems at an unprecedented

depth will undoubtedly lead to improvements in our current survey

of microbial diversity. The deeper resolution obtained by the new

sequencing technologies, coupled to explorations using “omic”

approaches, will not only allow us to assess less abundant organisms

and yield clues regarding the prevalence and distribution of

particular groups of organisms, but will also lead to key

information about niche adaptation. One especially interesting

development in recent years has been the unprecedented capacity

of metagenomics to reveal viral diversity. Viruses, which are

abundant and harbor an immense genetic diversity, affect microbial

community dynamics and are therefore an integral part of

microbial ecology. It is expected that in the future the application

of “meta” approaches will broaden our view of this viral diversity

and include analyses regarding their ecological role (Allen and

Wilson, 2008). Thus, as has occurred in the recent past, the

development of new technologies will open the way for more in-

depth and large-scale environmental explorations. The integration

of strategies and methodologies will add new dimensions to the

study of microbial communities, expand our appreciation of

microbial diversity and allow us to answer more sophisticated

questions regarding the role of microorganisms within a

community. These composite explorations will therefore prove to

be pivotal in our search for a more comprehensive understanding of

microbial community dynamics and function.

References

Adamczyk, J., Hesselsoe, M., Iversen, N., Horn, M., Lehner, A., Nielsen, P.H., Schloter, M., Roslev, P., and Wagner, M. (2003). The isotope array, a new tool that employs substrate-mediated labeling of rRNA for determination of microbial community structure and function. Appl. Environ. Microbiol. 69, 6875-6887.

Allen, M.J., and Wilson, W.H. (2008). Aquatic virus diversity accessed through omic techniques: a route map to function. Curr. Opin. Microbiol. 11, 226-232.

Alm, E.W., Zheng, D., and Raskin, L. (2000). The presence of humic substances and DNA in RNA extracts affects hybridization results. Appl. Environ. Microbiol. 66, 4547-4554.

Amann, R.I., Ludwig, W., and Schleifer, K.H. (1995). Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59, 143-169.

Baltz, R.H. (2006). Marcel Faber Roundtable: is our antibiotic pipeline unproductive because of starvation, constipation or lack of inspiration? J. Ind. Microbiol. Biotechnol. 33, 507-513.

Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., Jovanovich, S.B., Gates, C.M., Feldman, R.A., Spudich, J.L., et al. (2000). Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289, 1902-1906.

Benndorf, D., Balcke, G.U., Harms, H., and von Bergen, M. (2007). Functional metaproteome analysis of protein extracts from contaminated soil and groundwater. ISME J. 1, 224-234.

Beranova-Giorgianni, S. (2003). Proteome analysis by two-dimensional gel electrophoresis and mass spectrometry: strengths and limitations. Trends Analyt. Chem. 22, 273-281.

Bertin, P.N., Medigue, C., and Normand, P. (2008). Advances in environmental genomics: towards an integrated view of micro-organisms and ecosystems. Microbiology 154, 347-359.

Binga, E.K., Lasken, R.S., and Neufeld, J.D. (2008). Something from (almost) nothing: the impact of multiple displacement amplification on microbial ecology. ISME J. 2, 233-241.

Bouchie, A. (2002). Shift anticipated in DNA microarray market. Nat. Biotechnol. 20, 8.

Cardenas, E., and Tiedje, J.M. (2008). New tools for discovering and characterizing microbial diversity. Curr. Opin. Biotechnol. 19, 544-549.

Cascante, M., Boros, L.G., Comin-Anduix, B., de Atauri, P., Centelles, J.J., and Lee, P.W. (2002). Metabolic control analysis in drug discovery and disease. Nat. Biotechnol. 20, 243-249.

Chen, K., and Pachter, L. (2005). Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comput. Biol. 1, 106-112.

Clardy, J., and Walsh, C. (2004). Lessons from natural molecules. Nature 432, 829-837.

Craig, J.W., Chang, F.Y., and Brady, S.F. (2009). Natural products from environmental DNA hosted in Ralstonia metallidurans. ACS Chem. Biol. 4, 23-28.

DeSantis, T.Z., Brodie, E.L., Moberg, J.P., Zubieta, I.X., Piceno, Y.M., and Andersen, G.L. (2007). High-density universal 16S rRNA microarray analysis reveals broader diversity than typical clone library when sampling the environment. Microb. Ecol. 53, 371-383.

Domon, B., and Aebersold, R. (2006). Mass spectrometry and protein analysis. Science 312, 212-217.

Donadio, S., Monciardini, P., and Sosio, M. (2007). Polyketide synthases and nonribosomal peptide synthetases: the emerging view from bacterial genomics. Nat. Prod. Rep. 24, 1073-1109.

Dunlap, W.C., Jaspars, M., Hranueli, D., Battershill, C.N., Peric-Concha, N., Zucko, J., Wright, S.H., and Long, P.F. (2006). New methods for medicinal chemistry--universal gene cloning and expression systems for production of marine bioactive metabolites. Curr. Med. Chem. 13, 697-710.

El Fantroussi, S., Urakawa, H., Bernhard, A.E., Kelly, J.J., Noble, P.A., Smidt, H., Yershov, G.M., and Stahl, D.A. (2003). Direct profiling of environmental microbial populations by thermal dissociation analysis of native rRNAs hybridized to oligonucleotide microarrays. Appl. Environ. Microbiol. 69, 2377-2382.

Felske, A., Engelen, B., Nubel, U., and Backhaus, H. (1996a). Direct ribosome isolation from soil to extract bacterial rRNA for community analysis. Appl. Environ. Microbiol. 62, 4162-4167.

Felske, A., Engelen, B., Nubel, U., and Backhaus, H. (1996b). Direct ribosome isolation from soil to extract bacterial rRNA for community analysis. Appl. Environ. Microbiol. 62, 4162-4167.

Fischbach, M.A., Walsh, C.T., and Clardy, J. (2008). The evolution of gene collectives: How natural selection drives chemical innovation. Proc. Natl. Acad. Sci. U S A 105, 4601-4608.

Foerstner, K.U., Doerks, T., Creevey, C.J., Doerks, A., and Bork, P. (2008). A computational screen for type I polyketide synthases in metagenomics shotgun data. PLoS ONE 3, e3515.

Frias-Lopez, J., Shi, Y., Tyson, G.W., Coleman, M.L., Schuster, S.C., Chisholm, S.W., and Delong, E.F. (2008). Microbial community gene expression in ocean surface waters. Proc. Natl. Acad. Sci. U S A 105, 3805-3810.

Gentry, T.J., Wickham, G.S., Schadt, C.W., He, Z., and Zhou, J. (2006). Microarray applications in microbial ecology research. Microb. Ecol. 52, 159-175.

Gianoulis, T.A., Raes, J., Patel, P.V., Bjornson, R., Korbel, J.O., Letunic, I., Yamada, T., Paccanaro, A., Jensen, L.J., Snyder, M., et al. (2009). Quantifying environmental adaptation of metabolic pathways in metagenomics. Proc. Natl. Acad. Sci. U S A 106, 1374-1379.

Gilbert, J.A., Field, D., Huang, Y., Edwards, R., Li, W., Gilna, P., and Joint, I. (2008). Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE 3, e3042.

Gilbert, J.A., Thomas, S., Cooley, N.A., Kulakova, A., Field, D., Booth, T., McGrath, J.W., Quinn, J.P., and Joint, I. (2009). Potential for phosphonoacetate utilization by marine bacteria in temperate coastal waters. Environ. Microbiol. 11, 111-125.

Ginolhac, A., Jarrin, C., Gillet, B., Robe, P., Pujic, P., Tuphile, K., Bertrand, H., Vogel, T.M., Perriere, G., Simonet, P., et al. (2004). Phylogenetic analysis of polyketide synthase I domains from soil metagenomic libraries allows selection of promising clones. Appl. Environ. Microbiol. 70, 5522-5527.

Giovannoni, S.J., Bibbs, L., Cho, J.C., Stapels, M.D., Desiderio, R., Vergin, K.L., Rappe, M.S., Laney, S., Wilhelm, L.J., Tripp, H.J., et al. (2005). Proteorhodopsin in the ubiquitous marine bacterium SAR11. Nature 438, 82-85.

Gonzalez, J.M., Portillo, M.C., and Saiz-Jimenez, C. (2005). Multiple displacement amplification as a pre-polymerase chain reaction (pre-PCR) to process difficult to amplify samples and low copy number sequences from natural environments. Environ. Microbiol. 7, 1024-1028.

Goodacre, R., Vaidyanathan, S., Dunn, W.B., Harrigan, G.G., and Kell, D.B. (2004). Metabolomics by numbers: acquiring and understanding global metabolite data. Trends Biotechnol. 22, 245-252.

Graham, R.L.J., Graham, C., and McMullan, G. (2007). Microbial proteomics: a mass spectrometry primer for biologists. Microb. Cell Fact. 6, 26.

Graves, P.R., and Haystead, T.A. (2002). Molecular biologist's guide to proteomics. Microbiol. Mol. Biol. Rev. 66, 39-63.

Griffiths, R.I., Whiteley, A.S., O'Donnell, A.G., and Bailey, M.J. (2000). Rapid method for coextraction of DNA and RNA from natural environments for analysis of ribosomal DNA- and rRNA-based microbial community composition. Appl. Environ. Microbiol. 66, 5488-5491.

Hagenstein, M.C., and Sewald, N. (2006). Chemical tools for activity-based proteomics. J. Biotechnol. 124, 56-73.

Han, J., Danell, R.M., Patel, J.R., Gumerov, D.R., Scarlett, C.O., Speir, J.P., Parker, C.E., Rusyn, I., Zeisel, S., and Borchers, C.H. (2008). Towards high-throughput metabolomics using ultrahigh-field Fourier transform ion cyclotron resonance mass spectrometry. Metabolomics 4, 128-140.

He, Z., Gentry, T.J., Schadt, C.W., Wu, L., Liebich, J., Chong, S.C., Huang, Z., Wu, W., Gu, B., Jardine, P., et al. (2007). GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J. 1, 67-77.

Hecker, M. (2003). A proteomic view of cell physiology of Bacillus subtilis--bringing the genome sequence to life. Adv. Biochem. Eng. Biotechnol. 83, 57-92.

Henne, A., Schmitz, R.A., Bomeke, M., Gottschalk, G., and Daniel, R. (2000). Screening of environmental DNA libraries for the presence of genes conferring lipolytic activity on Escherichia coli. Appl. Environ. Microbiol. 66, 3113-3116.

Hollywood, K., Brison, D.R., and Goodacre, R. (2006). Metabolomics: current technologies and future trends. Proteomics 6, 4716-4723.

Hurt, R.A., Qiu, X., Wu, L., Roh, Y., Palumbo, A.V., Tiedje, J.M., and Zhou, J. (2001). Simultaneous recovery of RNA and DNA from soils and sediments. Appl. Environ. Microbiol. 67, 4495-4503.

Ikeda, H., Ishikawa, J., Hanamoto, A., Shinose, M., Kikuchi, H., Shiba, T., Sakaki, Y., Hattori, M., and Omura, S. (2003). Complete genome sequence and comparative analysis of the industrial microorganism Streptomyces avermitilis. Nat. Biotechnol. 21, 526-531.

Iwai, S., Kurisu, F., Urakawa, H., Yagi, O., Kasuga, I., and Furumai, H. (2008). Development of an oligonucleotide microarray to detect di- and monooxygenase genes for benzene degradation in soil. FEMS Microbiol. Lett. 285, 111-121.

Kan, J., Hanson, T.E., Ginter, J.M., Wang, K., and Chen, F. (2005). Metaproteomic analysis of Chesapeake Bay microbial communities. Saline Syst. 1, 7.

Kell, D.B. (2006). Systems biology, metabolic modelling and metabolomics in drug discovery and development. Drug Discov. Today 11, 1085-1092.

Kim, T.K., and Fuerst, J.A. (2006). Diversity of polyketide synthase genes from bacteria associated with the marine sponge Pseudoceratina clavata: culture-dependent and culture-independent approaches. Environ. Microbiol. 8, 1460-1470.

Klaassens, E.S., de Vos, W.M., and Vaughan, E.E. (2007). Metaproteomics approach to study the functionality of the microbiota in the human infant gastrointestinal tract. Appl. Environ. Microbiol. 73, 1388-1392.

Kuechenmeister, L.J., Anderson, K.L., Morrison, J.M., and Dunman, P.M. (2009). The use of molecular beacons to directly measure bacterial mRNA abundances and transcript degradation. J. Microbiol. Methods 76, 146-151.

Lacerda, C.M., Choe, L.H., and Reardon, K.F. (2007). Metaproteomic analysis of a bacterial community response to cadmium exposure. J. Proteome Res. 6, 1145-1152.

Lasaosa, M. (2008). Two-dimensional reverse-phase liquid chromatography coupled to MALDI TOF/TOF mass spectrometry: an approach to shotgun proteome analysis. (University of Saarland).

Lasken, R.S. (2007). Single-cell genomic sequencing using Multiple Displacement Amplification. Curr. Opin. Microbiol. 10, 510-516.

Leigh, M.B., Pellizari, V.H., Uhlik, O., Sutka, R., Rodrigues, J., Ostrom, N.E., Zhou, J., and Tiedje, J.M. (2007). Biphenyl-utilizing bacteria and their functional genes in a pine root zone contaminated with polychlorinated biphenyls (PCBs). ISME J. 1, 134-148.

Leininger, S., Urich, T., Schloter, M., Schwark, L., Qi, J., Nicol, G.W., Prosser, J.I., Schuster, S.C., and Schleper, C. (2006). Archaea predominate among ammonia-oxidizing prokaryotes in soils. Nature 442, 806-809.

Lo, I., Denef, V.J., Verberkmoes, N.C., Shah, M.B., Goltsman, D., DiBartolo, G., Tyson, G.W., Allen, E.E., Ram, R.J., Detter, J.C., et al. (2007). Strain-resolved community proteomics reveals recombining genomes of acidophilic bacteria. Nature 446, 537-541.

Loy, A., Lehner, A., Lee, N., Adamczyk, J., Meier, H., Ernst, J., Schleifer, K.H., and Wagner, M. (2002). Oligonucleotide microarray for 16S rRNA gene-based detection of all recognized lineages of sulfate-reducing prokaryotes in the environment. Appl. Environ. Microbiol. 68, 5064-5081.

Lueders, T., Manefield, M., and Friedrich, M.W. (2004). Enhanced sensitivity of DNA- and rRNA-based stable isotope probing by fractionation and quantitative analysis of isopycnic centrifugation gradients. Environ. Microbiol. 6, 73-78.

MacNeil, I.A., Tiong, C.L., Minor, C., August, P.R., Grossman, T.H., Loiacono, K.A., Lynch, B.A., Phillips, T., Narula, S., Sundaramoorthi, R., et al. (2001). Expression and isolation of antimicrobial small molecules from soil DNA libraries. J. Mol. Microbiol. Biotechnol. 3, 301-308.

Maron, P.A., Maitre, M., Mercier, A., Henri Lejon, D.P., Nowak, V., and Ranjard, L. (2008). Protein and DNA fingerprinting of a soil bacterial community inoculated into three different sterile soils. Res. Microbiol. 159, 231-236.

Maron, P.A., Mougel, C., Siblot, S., Abbas, H., Lemanceau, P., and Ranjard, L. (2007a). Protein extraction and fingerprinting optimization of bacterial communities in natural environment. Microb. Ecol. 53, 426-434.

Maron, P.A., Ranjard, L., Mougel, C., and Lemanceau, P. (2007b). Metaproteomics: a new approach for studying functional microbial ecology. Microb. Ecol. 53, 486-493.

Martinez, A., Kolvek, S.J., Yip, C.L., Hopke, J., Brown, K.A., MacNeil, I.A., and Osburne, M.S. (2004a). Genetically modified bacterial strains and novel bacterial artificial chromosome shuttle vectors for constructing environmental libraries and detecting heterologous natural products in multiple expression hosts. Appl. Environ. Microbiol. 70, 2452-2463.

Martinez, A., Kolvek, S.J., Yip, C.L., Hopke, J., Brown, K.A., MacNeil, I.A., and Osburne, M.S. (2004b). Genetically modified bacterial strains and novel bacterial artificial chromosome shuttle vectors for constructing environmental libraries and detecting heterologous natural products in multiple expression hosts. Appl. Environ. Microbiol. 70, 2452-2463.

Mashego, M.R., Rumbold, K., De Mey, M., Vandamme, E., Soetaert, W., and Heijnen, J.J. (2007). Microbial metabolomics: past, present and future methodologies. Biotechnol. Lett. 29, 1-16.

McGrath, K.C., Thomas-Hall, S.R., Cheng, C.T., Leo, L., Alexa, A., Schmidt, S., and Schenk, P.M. (2008). Isolation and analysis of mRNA from environmental microbial communities. J. Microbiol. Methods 75, 172-176.

Mendes, P., Kell, D.B., and Westerhoff, H.V. (1996). Why and when channelling can decrease pool size at constant net flux in a simple dynamic channel. Biochim. Biophys. Acta 1289, 175-186.

Molloy, M.P., Herbert, B.R., Slade, M.B., Rabilloud, T., Nouwens, A.S., Williams, K.L., and Gooley, A.A. (2000). Proteomic analysis of the Escherichia coli outer membrane. Eur. J. Biochem. 267, 2871-2881.

Nesatyy, V.J., and Suter, M.J. (2007). Proteomics for the analysis of environmental stress responses in organisms. Environ. Sci. Technol. 41, 6891-6900.

Nielsen, J. (2003). It is all about metabolic fluxes. J. Bacteriol. 185, 7031-7035.

Nogales, B., Moore, E.R., Llobet-Brossa, E., Rossello-Mora, R., Amann, R., and Timmis, K.N. (2001a). Combined use of 16S ribosomal DNA and 16S rRNA to study the bacterial community of polychlorinated biphenyl-polluted soil. Appl. Environ. Microbiol. 67, 1874-1884.

Nogales, B., Moore, E.R., Llobet-Brossa, E., Rossello-Mora, R., Amann, R., and Timmis, K.N. (2001b). Combined use of 16S ribosomal DNA and 16S rRNA to study the bacterial community of polychlorinated biphenyl-polluted soil. Appl. Environ. Microbiol. 67, 1874-1884.

Ogunseitan, O.A. (2006). Soil Proteomics: Extraction and Analysis of Proteins from Soils. In Nucleic acids and proteins in soil, P. Nannipieri, and K. Smalla, eds. (Berlin, Springer), pp. 95-115.

Oldiges, M., Lutz, S., Pflug, S., Schroer, K., Stein, N., and Wiendahl, C. (2007). Metabolomics: current state and evolving methodologies and tools. Appl. Microbiol. Biotechnol. 76, 495-511.

Omura, S., Ikeda, H., Ishikawa, J., Hanamoto, A., Takahashi, C., Shinose, M., Takahashi, Y., Horikawa, H., Nakazawa, H., Osonoe, T., et al. (2001). Genome sequence of an industrial microorganism Streptomyces avermitilis: deducing the ability of producing secondary metabolites. Proc. Natl. Acad. Sci. U S A 98, 12215-12220.

Park, C., Novak, J.T., Helm, R.F., Ahn, Y.O., and Esen, A. (2008). Evaluation of the extracellular proteins in full-scale activated sludges. Water Res. 42, 3879-3889.

Parro, V., Moreno-Paz, M., and Gonzalez-Toril, E. (2007). Analysis of environmental transcriptomes by DNA microarrays. Environ. Microbiol. 9, 453-464.

Peric-Concha, N., and Long, P.F. (2003). Mining the microbial metabolome: a new frontier for natural product lead discovery. Drug Discov. Today 8, 1078-1084.

Piel, J., Hui, D., Fusetani, N., and Matsunaga, S. (2004). Targeting modular polyketide synthases with iteratively acting acyltransferases from metagenomes of uncultured bacterial consortia. Environ. Microbiol. 6, 921-927.

Pignatelli, M., Aparicio, G., Blanquer, I., Hernandez, V., Moya, A., and Tamames, J. (2008). Metagenomics reveals our incomplete knowledge of global diversity. Bioinformatics 24, 2124-2125.

Raes, J., and Bork, P. (2008). Molecular eco-systems biology: towards an understanding of community function. Nat. Rev. Microbiol. 6, 693-699.

Ram, R.J., Verberkmoes, N.C., Thelen, M.P., Tyson, G.W., Baker, B.J., Blake, R.C., 2nd, Shah, M., Hettich, R.L., and Banfield, J.F. (2005). Community proteomics of a natural microbial biofilm. Science 308, 1915-1920.

Rhee, S.K., Liu, X., Wu, L., Chong, S.C., Wan, X., and Zhou, J. (2004). Detection of genes involved in biodegradation and biotransformation in microbial communities by using 50-mer oligonucleotide microarrays. Appl. Environ. Microbiol. 70, 4303-4317.

Rodriguez-Valera, F. (2004). Environmental genomics, the big picture? FEMS Microbiol. Lett. 231, 153-158.

Roh, C., Villatte, F., Kim, B.G., and Schmid, R.D. (2006). Comparative study of methods for extraction and purification of environmental DNA from soil and sludge samples. Appl. Biochem. Biotechnol. 134, 97-112.

Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., Loiacono, K.A., Lynch, B.A., MacNeil, I.A., Minor, C., et al. (2000a). Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl. Environ. Microbiol. 66, 2541-2547.

Rondon, M.R., August, P.R., Bettermann, A.D., Brady, S.F., Grossman, T.H., Liles, M.R., Loiacono, K.A., Lynch, B.A., MacNeil, I.A., Minor, C., et al. (2000b). Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl. Environ. Microbiol. 66, 2541-2547.

Sattely, E.S., Fischbach, M.A., and Walsh, C.T. (2008). Total biosynthesis: in vitro reconstitution of polyketide and nonribosomal peptide pathways. Nat. Prod. Rep. 25, 757-793.

Schirmer, A., Gadkari, R., Reeves, C.D., Ibrahim, F., DeLong, E.F., and Hutchinson, C.R. (2005). Metagenomic analysis reveals diverse polyketide synthase gene clusters in microorganisms associated with the marine sponge Discodermia dissoluta. Appl. Environ. Microbiol. 71, 4840-4849.

Schmidt, T.M. (2006). The maturing of microbial ecology. Int. Microbiol. 9, 217-223.

Schulze, W.X., Gleixner, G., Kaiser, K., Guggenberger, G., Mann, M., and Schulze, E.D. (2005). A proteomic fingerprint of dissolved organic carbon and of soil particles. Oecologia 142, 335-343.

Schweder, T., Markert, S., and Hecker, M. (2008). Proteomics of marine bacteria. Electrophoresis 29, 2603-2616.

Seino, T., Sato, H., Yamamoto, A., Nemoto, A., Torimura, M., and Tao, H. (2007). Matrix-free laser desorption/ionization-mass spectrometry using self-assembled germanium nanodots. Anal. Chem. 79, 4827-4832.

Shrestha, P.M., Kube, M., Reinhardt, R., and Liesack, W. (2008). Transcriptional activity of paddy soil bacterial communities. Environ. Microbiol.

Small, J., Call, D.R., Brockman, F.J., Straub, T.M., and Chandler, D.P. (2001a). Direct detection of 16S rRNA in soil extracts by using oligonucleotide microarrays. Appl. Environ. Microbiol. 67, 4708-4716.

Small, J., Call, D.R., Brockman, F.J., Straub, T.M., and Chandler, D.P. (2001b). Direct detection of 16S rRNA in soil extracts by using oligonucleotide microarrays. Appl. Environ. Microbiol. 67, 4708-4716.

Sowell, S.M., Norbeck, A.D., Lipton, M.S., Nicora, C.D., Callister, S.J., Smith, R.D., Barofsky, D.F., and Giovannoni, S.J. (2008). Proteomic analysis of stationary phase in the marine bacterium "Candidatus Pelagibacter ubique". Appl. Environ. Microbiol. 74, 4091-4100.

Sowell, S.M., Wilhelm, L.J., Norbeck, A.D., Lipton, M.S., Nicora, C.D., Barofsky, D.F., Carlson, C.A., Smith, R.D., and Giovannoni, S.J. (2009). Transport functions dominate the SAR11 metaproteome at low-nutrient extremes in the Sargasso Sea. ISME J. 3, 93-105.

Starcevic, A., Zucko, J., Simunkovic, J., Long, P.F., Cullum, J., and Hranueli, D. (2008). ClustScan: an integrated program package for the semi-automatic annotation of modular biosynthetic gene clusters and in silico prediction of novel chemical structures. Nucleic Acids Res. 36, 6882-6892.

Stenuit, B., Eyers, L., Schuler, L., Agathos, S.N., and George, I. (2008). Emerging high-throughput approaches to analyze bioremediation of sites contaminated with hazardous and/or recalcitrant wastes. Biotechnol. Adv. 26, 561-575.

Steward, G.F., Jenkins, B.D., Ward, B.B., and Zehr, J.P. (2004). Development and testing of a DNA macroarray to assess nitrogenase (nifH) gene diversity. Appl. Environ. Microbiol. 70, 1455-1465.

Steward, G.F., and Rappe, M.S. (2007). What's the 'meta' with metagenomics? ISME J. 1, 100-102.

Tao, F. (2008). 1st NCI annual meeting on Clinical Proteomic Technologies for Cancer. Expert Rev. Proteomics 5, 17-20.

Taroncher-Oldenburg, G., Griner, E.M., Francis, C.A., and Ward, B.B. (2003). Oligonucleotide microarray for the study of functional gene diversity in the nitrogen cycle in the environment. Appl. Environ. Microbiol. 69, 1159-1171.

Thomas, T., Egan, S., Burg, D., Ng, C., Ting, L., and Cavicchioli, R. (2007). Integration of genomics and proteomics into marine microbial ecology. Mar. Ecol. Prog. Ser. 332, 291-299.

Tringe, S.G., and Hugenholtz, P. (2008). A renaissance for the pioneering 16S rRNA gene. Curr. Opin. Microbiol. 11, 442-446.

Turnbaugh, P.J., and Gordon, J.I. (2008). An invitation to the marriage of metagenomics and metabolomics. Cell 134, 708-713.

Tyson, G.W., Chapman, J., Hugenholtz, P., Allen, E.E., Ram, R.J., Richardson, P.M., Solovyev, V.V., Rubin, E.M., Rokhsar, D.S., and Banfield, J.F. (2004). Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37-43.

Urbanczyk-Wochniak, E., Luedemann, A., Kopka, J., Selbig, J., Roessner-Tunali, U., Willmitzer, L., and Fernie, A.R. (2003). Parallel analysis of transcript and metabolic profiles: a new approach in systems biology. EMBO Rep. 4, 989-993.

Urich, T., Lanzen, A., Qi, J., Huson, D.H., Schleper, C., and Schuster, S.C. (2008). Simultaneous assessment of soil microbial community structure and function through analysis of the meta-transcriptome. PLoS ONE 3, e2527.

Velculescu, V.E., Zhang, L., Vogelstein, B., and Kinzler, K.W. (1995). Serial analysis of gene expression. Science 270, 484-487.

Wagner, M., Smidt, H., Loy, A., and Zhou, J. (2007). Unravelling microbial communities with DNA-microarrays: challenges and future directions. Microb. Ecol. 53, 498-506.

Walker, A., and Parkhill, J. (2008). Single-cell genomics. Nat. Rev. Microbiol. 6, 176-177.

Wang, G.Y., Graziani, E., Waters, B., Pan, W., Li, X., McDermott, J., Meurer, G., Saxena, G., Andersen, R.J., and Davies, J. (2000). Novel natural products from soil DNA libraries in a streptomycete host. Org. Lett. 2, 2401-2404.

Warnecke, F., and Hugenholtz, P. (2007). Building on basic metagenomics with complementary technologies. Genome Biol. 8, 231.

Watve, M.G., Tickoo, R., Jog, M.M., and Bhole, B.D. (2001). How many antibiotics are produced by the genus Streptomyces? Arch. Microbiol. 176, 386-390.

Weinbauer, M.G., Fritz, I., Wenderoth, D.F., and Hofle, M.G. (2002). Simultaneous extraction from bacterioplankton of total RNA and DNA suitable for quantitative structure and function analyses. Appl. Environ. Microbiol. 68, 1082-1087.

Wilke, A., Ruckert, C., Bartels, D., Dondrup, M., Goesmann, A., Huser, A.T., Kespohl, S., Linke, B., Mahne, M., McHardy, A., et al. (2003). Bioinformatics support for high-throughput proteomics. J. Biotechnol. 106, 147-156.

Wilmes, P., and Bond, P.L. (2004). The application of two-dimensional polyacrylamide gel electrophoresis and downstream analyses to a mixed community of prokaryotic microorganisms. Environ. Microbiol. 6, 911-920.

Wilmes, P., and Bond, P.L. (2006). Metaproteomics: studying functional gene expression in microbial ecosystems. Trends Microbiol. 14, 92-97.

Wilmes, P., Wexler, M., and Bond, P.L. (2008). Metaproteomics provides functional insight into activated sludge wastewater treatment. PLoS ONE 3, e1778.

Wu, L., Liu, X., Schadt, C.W., and Zhou, J. (2006). Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl. Environ. Microbiol. 72, 4931-4941.

Xu, J. (2006). Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances. Mol. Ecol. 15, 1713-1731.

Yang, P., and Zhang, Z. (2008). A Clustering Based Hybrid System for Mass Spectrometry Data Analysis. In Pattern Recognition in Bioinformatics, M. Chetty, A. Ngom, and S. Ahmad, eds. (Heidelberg, Springer Berlin), pp. 98-109.

Yergeau, E., Kang, S., He, Z., Zhou, J., and Kowalchuk, G.A. (2007). Functional microarray analysis of nitrogen and carbon cycling genes across an Antarctic latitudinal transect. ISME J. 1, 163-179.

Yergeau, E., Schoondermark-Stolk, S.A., Brodie, E.L., Dejean, S., DeSantis, T.Z., Goncalves, O., Piceno, Y.M., Andersen, G.L., and Kowalchuk, G.A. (2009). Environmental microarray analyses of Antarctic soil microbial communities. ISME J. 3, 340-351.
