sequencing all of microbial life: challenges and opportunities

Post on 03-Jan-2016

Sequencing All of Microbial Life: Challenges and Opportunities

Rob Edwards

Argonne National Laboratory
San Diego State University

How much has been sequenced

[Figure: number of known sequences by year, annotated with the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing.]

How much will be sequenced

[Figure: projected sequencing, annotated with 100 people, everybody in Toronto, everybody in North America, all cultured bacteria, one genome from every species, and most major microbial environments.]

Rank Abundance Curves, Papers vs Genomes

• Microbial publications vs Genomes by Family

16S Abundance -- Human Intestine

16S Abundance -- Upland Pasture Soil

Environmental Genomics -- Wisconsin Soil

Line Island Metagenomics Transect

Environmental Genomics -- Whale Fall

There are big gaps in sequence space

• 6,400 total taxa
• About 380 are human, animal or plant pathogens
• 360 complete prokaryotic genomes published
• 56 archaeal and 940 bacterial genomes in progress
  – ~400 are pathogens
• ~5,000 prokaryotes not yet in play
  – We estimate about 4,800 non-pathogen taxa

The Bergey’s Manual

David H. Bergey

Strain Distribution in Collections

Collection                                                Strains

US Collections / BRCs
  American Type Culture Collection (ATCC)                   4,027
  USDA ARS Collection (NRRL)                                  223

European Collections
  Deutsche Sammlung von Mikroorganismen (DSMZ)              1,302
  Culture Collection, University of Gothenburg (CCUG)         183
  Pasteur Institute (CIP)                                     170
  Laboratory for Microbiology, Gent (LMG)                     101
  National Collection of Industrial and Marine Bacteria        25
  French Collection of Phytopathogens (CFPB)                   15
  National Collection of Type Cultures (NCTC)                  12
  National Collection of Phytopathogenic Bacteria              11

Asia
  Japan Collection of Microorganisms (JCM)                    185
  Institute of Fermentation, Osaka (IFO)                       34
  Korean Collection of Type Cultures (KCTC)                    28
  Institute of Applied Microbiology, Tokyo (IAM)               26
  National Institute of Technology and Evaluation (NBRC)       24
  All-Russian Collection of Microorganisms (VKM)               13

Estimated Sequencing Rates

Year                              2007    2008    2009    2010    2011    2012    2013    2014   Notes
Base pairs per dollar              200     300     450     675   1,013   1,519   2,278   3,417   50% improvement per year
Bacterial genome cost ($)       20,000  13,333   8,889   5,926   3,951   2,634   1,756   1,171   ~4M bp per genome
Number of genomes for $5M          250     375     563     844   1,266   1,898   2,848   4,271
Cumulative genomes sequenced       250     625   1,188   2,031   3,297   5,195   8,043  12,314
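The rows of the sequencing-rate table all follow from three inputs stated on the slide: 200 bp per dollar in 2007, a 50% improvement per year, and ~4M bp per genome, with $5M spent each year. The sketch below recomputes the table from those inputs. The function and parameter names are illustrative and not from the talk, and the half-up rounding is an assumption that happens to match the slide's figures (for example 562.5 shown as 563).

```python
import math


def round_half_up(x):
    """Round to the nearest integer with .5 going up (assumed to match the slide)."""
    return math.floor(x + 0.5)


def project(start_year=2007, n_years=8, bp_per_dollar=200.0,
            yearly_gain=1.5, genome_bp=4_000_000, budget=5_000_000):
    """Recompute the table: one dict per year, rounded only for display."""
    rows = []
    cumulative = 0.0  # running total kept unrounded, as the slide's totals imply
    for i in range(n_years):
        rate = bp_per_dollar * yearly_gain ** i   # bp per dollar in year i
        genomes = budget * rate / genome_bp       # genomes affordable that year
        cumulative += genomes
        rows.append({
            "year": start_year + i,
            "bp_per_dollar": round_half_up(rate),
            "genome_cost": round_half_up(genome_bp / rate),
            "genomes": round_half_up(genomes),
            "cumulative": round_half_up(cumulative),
        })
    return rows


if __name__ == "__main__":
    for row in project():
        print(row)
```

Changing `yearly_gain` or `budget` shows how sensitive the cumulative total is to the assumed rate of improvement.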

Target Selection

Type Culture Material

Sequencing / Assembly

Rapid Annotation (24 Hours)

Metabolic Reconstruction

Phenotype Microarrays

Target Selection

http://www.sequencingbergeys.org

Microbial Idol

Culturing by ATCC

>2,000 different media

Physical Conditions:
• Temperature (4 - 120 °C)
• pH (1.0 - 11.0)
• Salt (0 - 30%)
• Light (obligate phototrophs)
• Pressure (few obligate piezophiles)
• Redox: strict anaerobes, facultative, microaerobes, aerobes

Phenotyping by Biolog

[Figure: Biolog phenotype panels, labeled Carbon Pathways, Nitrogen Pathways, Sensitivity to Chemicals, Osmotic & Ion Effects, pH Effects, Biosynth. Pathways, P, S, N.]

Sequencing by JGI

FY 06: # Instruments
Sanger: 107
454: 1

FY 07: # Instruments
Sanger: 107
454: 2

35.4 Gb

45 Gb goal

Rapid Annotation Using Subsystems Technology

• Automated process consisting of:
  – Gene calling
  – Initial annotation of function
  – Initial metabolic reconstruction
• Process takes 1-7 hours depending on size and complexity of the genome
• ~20 genomes per day

http://www.nmpdr.org/anno-server/index48.cgi

Evaluation / Viewing

Feedback

Target Selection

Sequencing

Annotation

Metabolic Reconstruction

Phenotyping

Status

• 100 organism pilot - GEBA underway

• Requesting funding/approval for remainder

• Target selection about to go live

People

JGI: Jim Bristow, Jonathan Eisen, Phil Hugenholtz, Nikos Kyrpides, Paul Richardson, David Bruce
MSU: Jim Cole, George Garrity
U GA: Barney Whitman
UIUC: Gary Olsen
ATCC: David Emmerson, Tim Lilburn
Biolog: Stacy Montgomery, John Groat
ANL: Rick Stevens, Folker Meyer, Ross Overbeek, Veronika Vonstein
Hope: Matt DeJongh

Technical Feasibility FAQ

• How many genomes would the project propose to sequence?
  – About 5,000
• Who would produce the biomass needed for DNA extraction?
  – Type culture centers
• Will the biomass/DNA be available for distribution?
  – Yes, both the DNA and the libraries could be stored for distribution
• What throughput is needed for DNA production?
  – About 300 taxa per year at the beginning of the project, rising to 2,000 per year at the end
• What throughput is needed for sequencing?
  – 1.2 Gb/yr to 8 Gb/yr of finished sequence (300 to 2,000 genomes per year at ~4 Mb each)
• What combinations of sequencing technologies need to be employed?
  – Sanger and pyrosequencing initially
• What throughput is needed for annotation?
  – 24-hour turnaround from assembled sequence to initial availability
• Is it possible to have a standard set of phenotype assays given the broad spectrum of organisms and conditions?
  – We are considering Biolog as a model, but it is too limited
• How would the genomes be selected and prioritized?
  – At each cycle we choose genomes (e.g. via 16S) to minimize the diversity gaps
• Is it necessary to “close” the genomes?
  – We think no.
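The FAQ's selection rule (at each cycle, choose genomes, e.g. via 16S, to minimize the diversity gaps) is not spelled out in the talk. One plausible reading is a greedy "farthest-first" pick: repeatedly choose the candidate whose nearest already-sequenced relative is most distant. The sketch below illustrates that reading only. The function name, the pairwise-distance dictionary, and the toy taxa and distances are all hypothetical, not project data.

```python
def select_targets(dist, sequenced, candidates, n_picks):
    """Greedy farthest-first selection (an assumed reading of the FAQ's rule).

    dist maps frozenset({a, b}) to a pairwise distance, e.g. 16S divergence.
    """
    if not sequenced:
        raise ValueError("need at least one already-sequenced taxon")
    covered = list(sequenced)
    remaining = set(candidates) - set(covered)
    picks = []
    for _ in range(min(n_picks, len(remaining))):
        def gap(taxon):
            # Distance from this taxon to its closest covered relative.
            return min(dist[frozenset((taxon, s))] for s in covered)
        # Sorting first makes ties resolve deterministically by name.
        best = max(sorted(remaining), key=gap)
        picks.append(best)
        covered.append(best)   # the new genome now helps cover its neighbors
        remaining.remove(best)
    return picks


# Toy, made-up distances between four hypothetical taxa.
toy_dist = {
    frozenset(("A", "B")): 0.02, frozenset(("A", "C")): 0.20,
    frozenset(("A", "D")): 0.15, frozenset(("B", "C")): 0.21,
    frozenset(("B", "D")): 0.16, frozenset(("C", "D")): 0.05,
}

if __name__ == "__main__":
    print(select_targets(toy_dist, ["A"], ["B", "C", "D"], 2))
```

With only A sequenced, the first pick is C (the taxon farthest from A), after which D is a larger remaining gap than B, which sits close to A.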
