Genomes from metagenomes: recovery and analysis of population genomes
Kate Ormerod
Australian Centre for Ecogenomics,
School of Chemistry & Molecular Biosciences
Genomes from metagenomes
• Represent a consensus version of a particular species present within a dataset
Isolate sequencing
Metagenomic sequencing
Why assemble?
• We can pull functional and phylogenetic information from a metagenomic sample without any assembly, so why assemble?
  – Expanding the tree of life reference genomes
  – Gives context to the functional data: how much variation exists within a given family?
  – Functional assignment based on full length genes
  – Provides a basis for strain level comparisons
Generating population genomes
• Experimental design
  – Multiple samples
    • Different sites/biological entities
    • Time series
    • Different sample treatment
• Initial de novo assembly
  – Combined samples
• Binning
  – Based on coverage in individual samples and sequence composition
Experimental design
• How diverse is the environment you are sampling? How rare are the genomes you seek?
  – Assess using:
    • Test batch of shotgun samples
    • 16S rRNA profiling
• How easy is it to obtain samples?
• Is there likely to be host contamination?
• Depth can be achieved across multiple samples
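
The point about depth can be made concrete with a back-of-envelope calculation. The sketch below is not from the slides: it assumes coverage is simply bases sequenced from the target genome divided by genome size, and that `read_fraction` (a hypothetical parameter) is a rough guess at the share of reads coming from that genome. All numbers are invented for illustration.

```python
def expected_coverage(total_reads, read_length_bp, read_fraction, genome_size_bp):
    """Expected mean coverage of one genome in one shotgun sample.

    read_fraction is the share of reads assumed to come from the target
    genome (an assumption, e.g. a rough guess from a test batch).
    """
    bases_from_genome = total_reads * read_length_bp * read_fraction
    return bases_from_genome / genome_size_bp


def pooled_coverage(samples, genome_size_bp):
    """Sum expected coverage over samples that are assembled together.

    samples is a list of (total_reads, read_length_bp, read_fraction) tuples.
    """
    return sum(
        expected_coverage(reads, length, fraction, genome_size_bp)
        for reads, length, fraction in samples
    )


# Hypothetical numbers: 20 million 150 bp reads per sample, target genome
# at 0.1% of reads, 3 Mbp genome.
one_sample = expected_coverage(20_000_000, 150, 0.001, 3_000_000)
three_samples = pooled_coverage([(20_000_000, 150, 0.001)] * 3, 3_000_000)
print(f"one sample: {one_sample:.1f}x, three samples pooled: {three_samples:.1f}x")
```

Under these assumptions a rare genome that is too shallow to assemble from one sample gains depth linearly as comparable samples are combined, which is the argument for collecting several samples rather than sequencing one very deeply.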
Assembly & binning
• Assembly
  – Software
    • SPAdes
    • MetaVelvet
    • CLC Genomics Workbench
  – Read QC
    • e.g. Trimmomatic
  – Combined samples
    • All at once?
    • Subset?
  – Gap filling
    • e.g. ABySS
Assembly & binning
• Binning
  – Software
    • MetaBAT
    • CONCOCT
    • Canopy
    • GroopM
  – Coverage/co-abundance
    • Coverage within a single sample
    • Differential coverage
      – Coverage across different samples
  – Sequence composition
    • Tetranucleotide frequency patterns
      – Beware small contigs
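
As an illustration of the sequence composition signal, here is a minimal sketch of a tetranucleotide frequency profile for one contig. It is not the implementation used by any of the binning tools listed above. It assumes that each 4-mer is merged with its reverse complement (so strand does not matter) and that windows containing ambiguous bases are skipped; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a k-mer and its reverse complement by one shared key."""
    return min(kmer, reverse_complement(kmer))


# 256 tetramers collapse to 136 strand-independent ones.
CANONICAL_TETRAMERS = sorted(
    {canonical("".join(p)) for p in product("ACGT", repeat=4)}
)


def tetranucleotide_frequencies(contig):
    """Return the contig's 136-value tetranucleotide frequency profile."""
    seq = contig.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CANONICAL_TETRAMERS)
    return [counts[k] / total for k in CANONICAL_TETRAMERS]


profile = tetranucleotide_frequencies("ACGTACGTACGGTTCA")
print(len(profile), round(sum(profile), 6))
```

A 1 kb contig contributes fewer than 1,000 windows spread over 136 categories, so its profile is noisy. That is one way to read the warning about small contigs.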
Problems with population genomes
• Completeness
  – Insufficient coverage
  – Regions with different profile from the rest of the genome, e.g. duplications, lateral transfer, transposable elements
• Contamination
  – Misassembly: chimeric contigs; lateral transfer, transposable elements
Assessing genome quality
• CheckM
  – Measures completeness and contamination
  – Lineage specific marker gene assessment
• Read mapping
  – Detection of structural variation, low/high coverage, different insert sizes
• Reference genome coverage
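
The idea behind marker gene assessment can be sketched as follows. This is a deliberately simplified illustration, not CheckM's actual calculation (CheckM selects lineage-specific markers and groups them into collocated sets). It assumes a flat list of single-copy marker genes and treats every extra copy as contamination. The marker names and the `estimate_quality` function are hypothetical.

```python
from collections import Counter


def estimate_quality(marker_hits, marker_set):
    """Estimate completeness and contamination (both in percent) for one bin.

    marker_hits: marker gene names found in the bin, one entry per hit.
    marker_set: single-copy marker genes expected once in a complete genome.
    """
    if not marker_set:
        raise ValueError("marker_set must not be empty")
    counts = Counter(hit for hit in marker_hits if hit in marker_set)
    n_markers = len(marker_set)
    # Completeness: share of expected markers seen at least once.
    present = sum(1 for marker in marker_set if counts[marker] >= 1)
    # Contamination: extra copies beyond the single expected copy.
    extra_copies = sum(counts[m] - 1 for m in marker_set if counts[m] > 1)
    return 100.0 * present / n_markers, 100.0 * extra_copies / n_markers


# Toy bin: marker D is missing, marker B is found twice.
completeness, contamination = estimate_quality(
    ["A", "B", "B", "C"], {"A", "B", "C", "D"}
)
print(completeness, contamination)
```

In this toy bin a missing marker lowers completeness, while a duplicated marker raises contamination, since it suggests that sequence from a second organism has been binned together with the first.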
Improving population genomes
• Reassembly of individual genome bins
  – Extract reads mapping to individual bins (BamM)
• Reference guided assembly
• Different binning tool and/or assembly tool
metaQUAST
Analysis of population genomes
• Major advantage = culture independent
  – Less biased view of the diversity of the chosen environment
• Major challenge = culture independent
  – Therefore functional information may not be available
  – Describing new genus, family, phylum?
Finding the story in your genomes
• Inferring core metabolism
  – Energy
  – Food
• Environment specific points of interest
  – Antibiotic resistance
  – Virulence factors
• Comparisons to phylogenetic and environmental neighbours
Analysis of population genomes: an example
Ormerod, K.L., D.L.A. Wood, N. Lachner, et al. "Genomic characterization of the uncultured Bacteroidales family S24-7 inhabiting the guts of homeothermic animals." Microbiome (2016) 4:36.
Genome annotation
• Annotation
  – Prokka
  – NCBI
• Annotation + analysis
  – RAST
  – IMG
  – KBase
Functional categorisation
• Databases assign function to protein annotations using homology
  – CAZy: carbohydrate active enzymes
  – KEGG
  – COG
  – EggNOG
• Differentiation between:
  – Family members
  – Phylogenetic neighbours
  – Environmental neighbours
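
One simple way to compare genomes once functions are assigned is to treat each genome as a set of function identifiers. The sketch below is an assumption about how such a comparison could be done, not the analysis from the talk. It uses Jaccard similarity for overall overlap and a set difference for functions found only in the genome of interest. The identifiers `F1` to `F5` are placeholders, not real database entries.

```python
def jaccard_similarity(profile_a, profile_b):
    """Shared functions divided by all functions seen in either genome."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def functions_unique_to(target, others):
    """Functions annotated in target but in none of the other genomes."""
    seen_elsewhere = set().union(*others)
    return set(target) - seen_elsewhere


# Placeholder profiles for a population genome and two comparison genomes.
population_genome = {"F1", "F2", "F3"}
family_member = {"F1", "F2", "F4"}
environmental_neighbour = {"F2", "F5"}

print(jaccard_similarity(population_genome, family_member))
print(jaccard_similarity(population_genome, environmental_neighbour))
print(functions_unique_to(population_genome,
                          [family_member, environmental_neighbour]))
```

Presence/absence sets ignore gene copy number. Where copy number matters, per-family counts may be the more useful thing to compare.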
CAZy: carbohydrate active enzymes
KEGG & COG: intrafamily comparisons
KEGG & COG: interfamily comparisons
CAZy: niche companion comparison
Database bias
From annotations to pathways
• KEGG
  – 490 reference pathways
• BioCyc
  – MetaCyc
    • 2,453 pathways
    • 2,063 organisms
  – Pathway Tools
• RAST
  – ModelSEED
• IMG
• KBase
Pathway Tools
Metabolic gap filling
• Are there missing enzymes within core pathways?
  – Are these truly missing or an assembly artefact?
  – Is there something missing suggestive of reliance on other members of the community?
• How does your prediction compare to other, possibly cultured, species from a similar environment?
  – Are there known culturing requirements for related species?
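
The first question above can be turned into a simple check. The sketch below assumes a pathway is given as an ordered list of steps, each step being a set of alternative enzyme identifiers that can carry it out. This is a hypothetical representation, not the data model of KEGG, MetaCyc or Pathway Tools. The identifiers `E1`, `E2`, `E2b` and `E3` are placeholders.

```python
def pathway_gaps(pathway_steps, annotated_functions):
    """Return (completeness, indices of steps with no annotated enzyme).

    pathway_steps: list of sets, each holding alternative enzymes for a step.
    annotated_functions: enzyme identifiers annotated in the genome.
    """
    if not pathway_steps:
        raise ValueError("pathway_steps must not be empty")
    annotated = set(annotated_functions)
    missing = [
        index
        for index, alternatives in enumerate(pathway_steps)
        if not (set(alternatives) & annotated)
    ]
    completeness = 1 - len(missing) / len(pathway_steps)
    return completeness, missing


# Toy pathway with three steps; step 2 can use either of two enzymes.
steps = [{"E1"}, {"E2", "E2b"}, {"E3"}]
completeness, missing_steps = pathway_gaps(steps, {"E1", "E2b"})
print(f"{completeness:.2f} complete, missing steps: {missing_steps}")
```

A reported gap is only a starting point. The missing step still has to be checked against the bin's estimated completeness and against related genomes before it is read as a real dependency on other community members.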
Metabolic overview
Summary
• Plan
  – Test your environment before committing
  – Depth can be achieved across multiple samples
• Assemble
  – Combine all samples or subsets?
• Bin
  – More samples means improved differential coverage binning
  – Try multiple binning programs
• Annotate
  – Databases may not capture everything: make use of multiple databases
• Infer
  – Core metabolism can be inferred from reference pathways, phylogenetic neighbours and environmental neighbours
Useful references
• Sangwan, N., F. Xia and J. A. Gilbert (2016). "Recovering complete and draft population genomes from metagenome datasets." Microbiome 4(1): 1-11.
• Turaev, D. and T. Rattei (2016). "High definition for systems biology of microbial communities: metagenomics gets genome-centric and strain-resolved." Current Opinion in Biotechnology 39: 174-181.
• Franzosa, E. A., T. Hsu, A. Sirota-Madi, et al. (2015). "Sequencing and beyond: integrating molecular 'omics' for microbial community profiling." Nature Reviews Microbiology 13(6): 360-372.
• For the utility of different extraction methods: Albertsen, M., P. Hugenholtz, A. Skarshewski, et al. (2013). "Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes." Nature Biotechnology 31(6): 533-538.
Acknowledgements
ACE: Phil Hugenholtz, Gene Tyson, David Wood, Nancy Lachner, Joshua Daly, Nicola Angel, Serene Low
TRI, University of Queensland: Mark Morrison
University of Newcastle: Phil Hansbro, Shaan Gellatly
IMB, University of Queensland: Matt Cooper
AIBN, University of Queensland: Lars Nielsen, Cristiana Dal’Molin, Robin Palfreyman
QFAB: Jeremy Parsons