eisen.all hands

A phylogeny driven genomic encyclopedia of bacteria and archaea (or what is GEBA anyway?) Jonathan A. Eisen October 27, 2009

Upload: jonathan-eisen

Posted on 28-Jan-2015

DESCRIPTION

Talk summarizing our GEBA (Genomic Encyclopedia of Bacteria and Archaea) project for the "All Hands" meeting at the Joint Genome Institute

TRANSCRIPT

Page 1: Eisen.All Hands

A phylogeny driven genomic encyclopedia of bacteria and archaea

(or what is GEBA anyway?)

Jonathan A. Eisen, October 27, 2009

Page 2: Eisen.All Hands

From http://genomesonline.org

Page 3: Eisen.All Hands

rRNA Tree of Life

Page 4: Eisen.All Hands

The Tree is not Happy

Page 5: Eisen.All Hands

[Figure: tree of bacterial phyla, with labeled lineages: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002

Page 6: Eisen.All Hands

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

As of 2002

Based on Hugenholtz, 2002

Page 7: Eisen.All Hands

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

As of 2002

Based on Hugenholtz, 2002

Page 8: Eisen.All Hands

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Archaea

As of 2002

Based on Hugenholtz, 2002

Page 9: Eisen.All Hands

Need for Tree Guidance Well Established

• Common approach within some eukaryotic groups

• Many small projects funded to fill in some bacterial or archaeal gaps

• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature

Page 10: Eisen.All Hands

• At least 100 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

• Solution: use the tree to systematically fill the gaps

(Figure legend: well-sampled phyla)

Page 11: Eisen.All Hands

http://www.jgi.doe.gov/programs/GEBA/pilot.html

Page 12: Eisen.All Hands

GEBA Pilot Project Overview

• Identify major branches in rRNA tree for which no genomes are available

• Identify a cultured representative for each group

• Grow > 200 of these and prepare DNA

• Sequence and finish 100

• Annotate, analyze, and release data

• Assess the benefits of tree-guided sequencing
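The first two steps of the overview (find branches with no genome, then pick a cultured representative) can be sketched as a simple filter. This is a minimal illustration, not the GEBA selection pipeline: it assumes the "major branches" have already been delineated from the rRNA tree into named groups, and the `Strain` class, `pick_targets` function, and all group and strain names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Strain:
    name: str
    cultured: bool    # is a pure culture available?
    has_genome: bool  # is a genome sequence already available?


def pick_targets(groups):
    """Return {group: strain name} for groups that lack any genome but have
    at least one cultured member; the first cultured member is chosen."""
    targets = {}
    for group, strains in groups.items():
        if any(s.has_genome for s in strains):
            continue  # branch already represented by a genome
        cultured = [s.name for s in strains if s.cultured]
        if cultured:
            targets[group] = cultured[0]
        # groups with no cultured member are skipped: no isolate to sequence
    return targets


# Hypothetical groups, for illustration only.
groups = {
    "group_A": [Strain("strain_A1", True, True)],
    "group_B": [Strain("strain_B1", True, False), Strain("strain_B2", True, False)],
    "group_C": [Strain("clone_C1", False, False)],
}
print(pick_targets(groups))  # {'group_B': 'strain_B1'}
```

Taking the first cultured member is an arbitrary tie-breaker here; a real selection would also weigh culture availability, DNA yield, and position within the branch.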

Page 13: Eisen.All Hands

GEBA Pilot Project: Components

• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)

• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)

• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)

• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)

• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)

• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)

• Adopt a microbe education project (Cheryl Kerfeld)

• Outreach (David Gilbert)

• $$$ (DOE, Eddy Rubin, Jim Bristow)

Page 14: Eisen.All Hands

Some Lessons From GEBA

Page 15: Eisen.All Hands

GEBA Lesson 1

rRNA Tree of Life is a Useful Guide and Genomes Improve Resolution

Page 16: Eisen.All Hands
Page 17: Eisen.All Hands

GEBA Lesson 2

Phylogenetically Guided Selection Can Help Annotate Other Genomes

Page 18: Eisen.All Hands

Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling

• Better definition of protein family sequence “patterns”

• Greatly improves “comparative” and “evolutionary” based predictions

• Conversion of hypothetical proteins into conserved hypotheticals

• Linking distantly related members of protein families

• Improved non-homology prediction

Kostas Mavrommatis

Natalia Ivanova

Thanos Lykidis

Nikos Kyrpides

Iain Anderson
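One of the listed gains, turning hypotheticals into conserved hypotheticals, can be illustrated with a toy relabeling step. This is a sketch under stated assumptions, not the JGI annotation pipeline: protein families are assumed to come from an earlier clustering step, the `label_hypotheticals` function is hypothetical, and the rule "conserved if the family spans at least two genomes" is an assumed threshold.

```python
from collections import defaultdict


def label_hypotheticals(members, hypotheticals, min_genomes=2):
    """members: iterable of (genome, protein, family) tuples from a clustering step.
    hypotheticals: protein IDs with no functional assignment.
    A hypothetical is relabeled 'conserved hypothetical' when its family
    spans at least `min_genomes` genomes (the threshold is an assumption)."""
    genomes_per_family = defaultdict(set)
    family_of = {}
    for genome, protein, family in members:
        genomes_per_family[family].add(genome)
        family_of[protein] = family
    labels = {}
    for protein in hypotheticals:
        # Proteins missing from `members` count as belonging to no family.
        n_genomes = len(genomes_per_family.get(family_of.get(protein), ()))
        labels[protein] = (
            "conserved hypothetical" if n_genomes >= min_genomes else "hypothetical"
        )
    return labels


before = [("g1", "p1", "famX"), ("g1", "p2", "famY"), ("g2", "p3", "famY")]
print(label_hypotheticals(before, {"p1"}))  # {'p1': 'hypothetical'}

# A newly sequenced genome contributes a second member of famX.
after = before + [("new_genome", "q7", "famX")]
print(label_hypotheticals(after, {"p1"}))  # {'p1': 'conserved hypothetical'}
```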

Page 19: Eisen.All Hands

GEBA Lesson 3

Phylogenetically Guided Selection Can Help Study Uncultured Organisms

Page 20: Eisen.All Hands

Environmental Shotgun Sequencing

[Figure: schematic of environmental DNA with steps labeled “shotgun” and “sequence”]
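To make the shotgun step concrete, here is a toy simulation that draws fixed-length reads at random from the pooled DNA of a small "community". It is a sketch assuming error-free reads and equal copy number for every genome; the `shotgun_reads` function and the toy sequences are hypothetical.

```python
import random


def shotgun_reads(genomes, read_len, n_reads, seed=0):
    """Sample fixed-length reads uniformly from the pooled DNA of several genomes.
    genomes: dict name -> sequence (each at least read_len long).
    Returns a list of (source_name, read) pairs."""
    rng = random.Random(seed)
    names = list(genomes)
    # Weight each genome by its number of possible start positions, i.e.
    # every genome is assumed to be present at equal copy number.
    weights = [len(genomes[n]) - read_len + 1 for n in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights)[0]
        seq = genomes[name]
        start = rng.randrange(len(seq) - read_len + 1)
        reads.append((name, seq[start:start + read_len]))
    return reads


community = {"org1": "ACGTTGCAACGTTGCA", "org2": "GGGGCCCCGGGGCCCC"}
for source, read in shotgun_reads(community, read_len=8, n_reads=4):
    print(source, read)
```

In a real metagenome the `source` label is exactly what is lost, which is what makes the next step hard.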

Page 21: Eisen.All Hands

[Figure: sequence fragments labeled “ABCDEFG” and “TUVWXYZ”]

Binning challenge
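A toy version of the binning problem shows why reference genomes from unsampled lineages matter. The sketch below assigns each read to the reference sharing the most k-mers; it is a deliberately simple stand-in, not the method used in the talk (real binning tools use sequence composition, alignment, and coverage). The `kmers` and `bin_reads` functions, k = 4, and all sequences are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_reads(reads, references, k=4):
    """Assign each read to the reference genome sharing the most k-mers.
    Reads sharing no k-mer with any reference stay unbinned (None)."""
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    bins = {}
    for read_id, read in reads.items():
        read_kmers = kmers(read, k)
        scores = {name: len(read_kmers & ks) for name, ks in ref_kmers.items()}
        best = max(scores, key=scores.get, default=None)
        bins[read_id] = best if best is not None and scores[best] > 0 else None
    return bins


reads = {"r1": "CGTTGCAA", "r2": "GGCCCCGG", "r3": "TATAATAT"}
refs = {"A": "ACGTTGCAACGTTGCA", "B": "GGGGCCCCGGGGCCCC"}
print(bin_reads(reads, refs))  # {'r1': 'A', 'r2': 'B', 'r3': None}

# Adding a reference genome from the previously unsampled lineage bins r3.
refs["C"] = "TATATAATATTATA"
print(bin_reads(reads, refs))  # {'r1': 'A', 'r2': 'B', 'r3': 'C'}
```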

Page 22: Eisen.All Hands

Metagenomic Analysis Improves

Sean Hooper

Amrita Pati

• Small but real improvement in metagenomic annotation and analysis

Page 23: Eisen.All Hands

GEBA Lesson 4

We have still only scratched the surface of microbial diversity

Page 24: Eisen.All Hands

Protein Family Rarefaction Curves

• Take a data set of multiple complete genomes

• Identify all protein families using MCL

• Plot # of genomes vs. # of protein families
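The last step of that recipe can be sketched as follows, assuming the protein families have already been computed (for example by MCL, which is not reimplemented here). Genomes are added in random order and the number of distinct families seen so far is averaged over many orderings. The `rarefaction_curve` function, its parameters, and the toy data are hypothetical.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families observed after adding
    1, 2, ..., N genomes, averaged over random genome orderings.
    genome_families: dict genome -> set of family IDs (e.g. MCL clusters)."""
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)  # a new random sampling order each round
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


families = {
    "g1": {"F1", "F2", "F3"},
    "g2": {"F1", "F2", "F4"},
    "g3": {"F1", "F5", "F6"},
}
curve = rarefaction_curve(families)
print(curve[0], curve[-1])  # 3.0 6.0
```

A curve that is still climbing steeply at the last genome suggests that many protein families remain to be discovered.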

Page 25: Eisen.All Hands
Page 26: Eisen.All Hands
Page 27: Eisen.All Hands
Page 28: Eisen.All Hands
Page 29: Eisen.All Hands
Page 30: Eisen.All Hands

Phylogenetic Distribution Novelty: 1st Bacterial Actin Related Protein

Haliangium ochraceum DSM 14365

Victor Kunin

Patrik D’haeseleer

Adam Zemla

Page 31: Eisen.All Hands

Phylogenetic Diversity with GEBA

Page 32: Eisen.All Hands

Phylogenetic Diversity: Isolates

Page 33: Eisen.All Hands

Phylogenetic Diversity: All
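Phylogenetic diversity of a set of genomes is commonly measured as Faith's PD, the total branch length of the tree spanned by those taxa. Assuming the plots above use a measure of this kind, a minimal sketch on a toy rooted tree looks like this; the `phylogenetic_diversity` function, the parent-pointer tree encoding, and all branch lengths are hypothetical.

```python
def phylogenetic_diversity(parent, branch_length, taxa):
    """Faith-style PD: total length of the branches connecting `taxa` to the root.
    parent: node -> parent node (the root maps to None).
    branch_length: node -> length of the branch from node to its parent."""
    counted = set()
    total = 0.0
    for node in taxa:
        # Walk toward the root, stopping at branches already counted.
        while node is not None and node not in counted:
            counted.add(node)
            if parent[node] is not None:
                total += branch_length[node]
            node = parent[node]
    return total


# Toy tree: root -> A (1.0) -> a1, a2 (0.5 each); root -> B (2.0) -> b1 (3.0)
parent = {"root": None, "A": "root", "B": "root", "a1": "A", "a2": "A", "b1": "B"}
branch_length = {"A": 1.0, "B": 2.0, "a1": 0.5, "a2": 0.5, "b1": 3.0}

print(phylogenetic_diversity(parent, branch_length, ["a1"]))        # 1.5
print(phylogenetic_diversity(parent, branch_length, ["a1", "a2"]))  # 2.0
print(phylogenetic_diversity(parent, branch_length, ["a1", "b1"]))  # 6.5
```

Adding the close relative `a2` raises PD by only 0.5, while adding the distant lineage `b1` raises it by 5.0, which is the intuition behind tree-guided genome selection.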

Page 34: Eisen.All Hands

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

(Figure legend: well-sampled phyla; poorly sampled phyla; no cultured taxa)

Page 35: Eisen.All Hands

Uncultured Lineages: Technical Approaches

• Get into culture

• Enrichment cultures

• If abundant in low-diversity ecosystems

• Flow sorting

• Microbeads

• Microfluidic sorting

• Single-cell amplification

Page 36: Eisen.All Hands

GEBA Lesson 6

Need Experiments from Across the Tree of Life too

Page 37: Eisen.All Hands

Adopt a Microbe

Page 38: Eisen.All Hands
Page 39: Eisen.All Hands

MICROBES

Page 40: Eisen.All Hands

A Happy Tree of Life

Page 41: Eisen.All Hands

Related Lesson 1

METADATA ROCKS

Page 42: Eisen.All Hands

SIGS

• The Genomic Standards Consortium

• The GSC is an open-membership working body which formed in September 2005.

• The goal of this international community is to promote mechanisms that standardize the description of genomes and the exchange and integration of genomic data.

• See http://gensc.org/gc_wiki/index.php/Main_Page
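Standardized genome descriptions are typically enforced as checklists of required fields. The sketch below shows the idea with a tiny validator; the field names in `REQUIRED_FIELDS` are illustrative assumptions and are NOT the official GSC checklist, and the `check_record` function and example record are hypothetical.

```python
from datetime import date

# Illustrative field names only; this is NOT the official GSC checklist.
REQUIRED_FIELDS = ("project_name", "collection_date", "lat_lon", "isolation_source")


def check_record(record, required=REQUIRED_FIELDS):
    """Return a list of problems found in a genome metadata record (a dict)."""
    problems = [f"missing {f}" for f in required if not str(record.get(f, "")).strip()]
    value = record.get("collection_date")
    if value:
        try:
            date.fromisoformat(value)  # expect YYYY-MM-DD
        except ValueError:
            problems.append("collection_date is not an ISO 8601 date")
    return problems


example = {
    "project_name": "example_project",
    "collection_date": "2009-10-27",
    "isolation_source": "soil",
}
print(check_record(example))  # ['missing lat_lon']
```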

Page 43: Eisen.All Hands