Approaches for our growing metagenomes. Kostas Konstantinidis, Carlton S. Wilder Associate Professor, School of Civil and Environmental Engineering & School of Biology (Adjunct), Center for Bioinformatics and Computational Genomics, Georgia Institute of Technology. ISME 15, Aug 25th, 2014


Page 1

Approaches for our

growing metagenomes

Kostas Konstantinidis

Carlton S. Wilder Associate Professor

School of Civil and Environmental Engineering & School of Biology (Adjunct)

Center for Bioinformatics and Computational Genomics

Georgia Institute of Technology

ISME 15 Aug 25th, 2014

Page 2

Adina Howe’s ideas for discussion

- How do you deal with poorly replicated data? The low n high p problem?
- What are the best approaches to re-analyze previous datasets with improved tools?
- What is the progress on integrating different sequencing platforms?
- How big a computer do I really need to do everything I want? Is it reasonable to expect access to this for myself?
- Is metagenomics really useful and worth the investment?
- What are the most useful tools you use regularly?
- How do you reduce dataset sizes?
- How do you share data?
- What kind of statistical tests are appropriate for low replicate data?
- What are the assumptions you make for metagenomics data/analyses?
- Which assumptions should you not make ever? Or which will come back and haunt us?
- What are the best metagenomic datasets?
- What is the dream experiment/dataset?
- What is the single largest obstacle in tackling a metagenome?
- How much data do I need? Is it possible for there to be too much data?
- Do you sequence deeper or for more replicates?
- How do you evaluate statistical power of your approaches?
- How do you visualize enormous datasets?

Too many! I will focus on a few…

Page 3

Is shotgun metagenomics really useful?

Not a panacea (like any other technology!)…but a powerful, hypothesis-generating tool.

If the experiment is designed well, metagenomics can also provide a mechanistic understanding of how microbes and their communities evolve and respond to perturbations, which genes they exchange horizontally, what mutations are selected, etc.

A few recent examples from our group

Luo et al., AEM 2014

Oh et al., Env. Microb 2013

Examples from our group in this meeting

Minjae Kim’s talk on Thursday

Kostas’ talk on Friday

Page 4

How much replication?

Not much, because replicates typically give the same picture (gene amplicons may be a different story). Differentially abundant taxa, genes, and pathways are easily detectable when differences are not marginal.

For time series: usually 3 replicates for one sampling point; for the remaining sampling points, no replication.

More replicates (n>=6) when we want to detect marginal differences between treatments. DESeq is a powerful package.

Always include a mock sample (i.e., one for which you know who is there and how abundant) to test for artifacts/errors, especially for gene amplicon work.
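The mock-sample check on this slide could be scripted roughly as follows. This is only a minimal sketch: the function name `mock_recovery`, the 2-fold tolerance, and the status labels are assumptions made for illustration and do not come from the talk.

```python
def mock_recovery(expected, observed, tolerance=2.0):
    """Compare observed counts in a mock sample against its known composition.

    expected, observed: dicts mapping taxon name -> count (any units).
    tolerance: hypothetical fold-change limit beyond which a taxon is flagged.
    Returns a dict: taxon -> (expected_fraction, observed_fraction, status).
    """
    total_exp = sum(expected.values())
    total_obs = sum(observed.values())
    if total_exp <= 0 or total_obs <= 0:
        raise ValueError("both samples need a positive total count")

    report = {}
    for taxon in sorted(set(expected) | set(observed)):
        exp_frac = expected.get(taxon, 0) / total_exp
        obs_frac = observed.get(taxon, 0) / total_obs
        if exp_frac == 0 and obs_frac == 0:
            continue  # listed but absent from both samples
        if exp_frac == 0:
            status = "unexpected"  # possible contaminant or misassignment
        elif obs_frac == 0:
            status = "missing"
        else:
            fold = obs_frac / exp_frac
            status = "ok" if 1 / tolerance <= fold <= tolerance else "biased"
        report[taxon] = (exp_frac, obs_frac, status)
    return report


if __name__ == "__main__":
    known = {"taxon_A": 50, "taxon_B": 50}
    seen = {"taxon_A": 900, "taxon_B": 100, "taxon_C": 5}
    for taxon, row in mock_recovery(known, seen).items():
        print(taxon, row)
```

Taxa flagged "unexpected", "missing", or "biased" point to the kind of artifacts the mock sample is meant to expose.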

Page 5

What coverage to obtain and why it matters

From Rodriguez-R and Konstantinidis, ISME 2014

Effect of average coverage on detection of differentially abundant features

A winter and a summer shotgun metagenome dataset from the Lake Lanier time series (Atlanta, GA) were subsampled and compared.

• Datasets with average coverage > ~50% perform well (e.g., assembly; detect differences).

• Avoid comparisons between datasets that differ >2 fold in terms of coverage.
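The 2-fold rule of thumb and the subsampling used for the comparison can be sketched as below. This is a minimal illustration only: the helper names `comparable` and `subsample` are hypothetical, and the coverage values are assumed to have been estimated beforehand as fractions between 0 and 1.

```python
import random


def comparable(cov_a, cov_b, max_fold=2.0):
    """Return True if two coverage estimates differ by no more than max_fold."""
    low, high = sorted((cov_a, cov_b))
    if low <= 0:
        return False  # a dataset with no estimated coverage cannot be compared
    return high / low <= max_fold


def subsample(reads, fraction, seed=0):
    """Randomly keep a fraction of the reads, without replacement."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(reads, int(len(reads) * fraction))


if __name__ == "__main__":
    print(comparable(0.80, 0.50))  # 1.6-fold difference
    print(comparable(0.90, 0.30))  # 3-fold difference
    print(len(subsample(list(range(1000)), 0.25)))
```

If two datasets fail the check, one option in the spirit of the slide is to subsample the deeper dataset and re-estimate its coverage before comparing.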

Page 6

Need for new tools

Nonpareil: Estimating coverage level of metagenomes

Rodriguez-R and Konstantinidis, ISME 2014

Our approach examines the redundancy of reads. It is free from assembly, reference gene databases (e.g., 16S rRNA gene), or clustering OTUs.

Note that more diverse communities require larger sequencing efforts to achieve the same level of coverage, hence their curves are located further to the right in the plot.
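To give a feel for the idea of read redundancy, here is a toy sketch. It is not Nonpareil's algorithm: exact duplicate reads are used as a stand-in for whatever read-matching criterion the tool applies, and the function name `redundancy_curve` is hypothetical.

```python
import random
from collections import Counter


def redundancy_curve(reads, fractions=(0.25, 0.5, 1.0), seed=0):
    """Fraction of redundant reads at increasing subsample sizes.

    A read counts as redundant if an identical read occurs elsewhere in the
    same subsample (a simplification made for this toy example).
    Returns a list of (subsample_size, redundant_fraction) pairs.
    """
    rng = random.Random(seed)
    curve = []
    for fraction in fractions:
        sub = rng.sample(reads, int(len(reads) * fraction))
        counts = Counter(sub)
        redundant = sum(1 for read in sub if counts[read] > 1)
        curve.append((len(sub), redundant / len(sub) if sub else 0.0))
    return curve


if __name__ == "__main__":
    # Low-diversity toy community: few distinct reads, many repeats.
    simple = ["read%d" % (i % 20) for i in range(400)]
    # High-diversity toy community: nearly every read is distinct.
    diverse = ["read%d" % (i % 350) for i in range(400)]
    print(redundancy_curve(simple))
    print(redundancy_curve(diverse))
```

In this toy setting the low-diversity read set reaches high redundancy at a much smaller sequencing effort than the high-diversity one, which mirrors the rightward shift described on the slide.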

Available through www.enve-omics.gatech.edu

Page 7

How to select the right tool?

-Test the tool first on a mock dataset! Sometimes the code does not work as it is supposed to, or as you anticipated…

-Learn some Perl/Python!

From Luo, Rodriguez-R and Konstantinidis, Methods in Enzymology 2013

Page 8

Some (potentially) useful approaches

An approach to assess assembly parameters and results based on in-silico generated “spiked-in” metagenomes

For some additional approaches, see: Luo, Rodriguez-R and Konstantinidis, Methods in Enzymology 2013
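The in-silico “spiked-in” idea on this slide might be prototyped along these lines. This is a simplified sketch under stated assumptions: reads are error-free and taken from the forward strand only, recovery is scored by exact k-mer matches, and all function names are hypothetical rather than taken from the cited work.

```python
import random


def simulate_reads(genome, read_len, n_reads, seed=0):
    """Draw error-free reads at random start positions of a known genome."""
    if read_len > len(genome):
        raise ValueError("read length exceeds genome length")
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [genome[s:s + read_len] for s in starts]


def spike_in(metagenome_reads, genome, read_len, coverage, seed=0):
    """Mix simulated reads of a known genome into a set of real reads."""
    n_reads = int(coverage * len(genome) / read_len)
    spiked = list(metagenome_reads) + simulate_reads(genome, read_len, n_reads, seed)
    random.Random(seed).shuffle(spiked)
    return spiked


def recovered_fraction(genome, contigs, k=21):
    """Fraction of the genome's distinct k-mers found in the assembled contigs."""
    genome_kmers = {genome[i:i + k] for i in range(len(genome) - k + 1)}
    if not genome_kmers:
        return 0.0
    contig_kmers = set()
    for contig in contigs:
        contig_kmers.update(contig[i:i + k] for i in range(len(contig) - k + 1))
    return len(genome_kmers & contig_kmers) / len(genome_kmers)


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    background = ["N" * 100] * 50  # placeholder for real metagenome reads
    reads = spike_in(background, genome, read_len=100, coverage=10)
    print(len(reads))
    # After assembling `reads` with the parameters under test, score the result:
    print(recovered_fraction(genome, [genome[:1000]]))
```

Because the spiked-in genome is known, the recovered fraction gives a direct readout of how well a given set of assembly parameters performs at a given coverage.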

Page 9

Challenges remaining

Gene functional annotation. Propagation of wrong/poor annotations; many genes still hypothetical. Need to keep supporting experimental work to decipher gene functions and curated databases.

Tools do not scale with the volume of data that become available. Need to work closer with computer engineers and scientists.

Binning of assembled contigs into populations, especially in complex communities (e.g., to model what each member of the community does). New approaches needed; longer sequencing reads; single cells.

Page 10

Additional lab presentations at ISME

Minjae Kim

Seasonal changes and nitrogen cycle genes in midwestern agricultural soils as revealed by metagenomics. Poster 199B, Tuesday.

Expanding the bioinformatics toolbox for the analysis of genomes and metagenomes. Poster 204B, Tuesday.

Microbial community degradation of widely used quaternary ammonium disinfectants and implications for controlling disinfectant-induced antibiotic resistance. Contributed talk 1400, Thursday.

Metagenomics reveal that bacterial species exist. Invited talk, Friday.

Page 11

Acknowledgements

Konstantinidis Lab

Janet Hatt, Ph.D.

Michael Weigand, Ph.D.

Samantha Waters, Ph.D.

Despina Tsementzi

Natasha DeLeon

Luis Orellana

Luis-Miguel Rodriguez-R.

Eric Johnston

Juliana Soto

Angela Pena

Minjae Kim

Yuanqi Wang

www.enve-omics.gatech.edu

Interested? Email:

[email protected]

Funding