assembly of metagenomes

Assembly of metagenomes Lex Nederbragt Norwegian Sequencing Center & Centre for Ecological and Evolutionary Synthesis University of Oslo

Upload: lex-nederbragt

Post on 10-May-2015


DESCRIPTION

A talk I gave for the 2011 metagenomics course at the Biological Dept., Univ. of Oslo, April 2011.

TRANSCRIPT

Page 1: Assembly of metagenomes

Assembly of metagenomes

Lex Nederbragt
Norwegian Sequencing Center & Centre for Ecological and Evolutionary Synthesis
University of Oslo

Page 2: Assembly of metagenomes

What is assembly

• From reads to genome

Page 3: Assembly of metagenomes

Why assembly?

Wooley JC et al, PLoS Comput Biol. 2010 Feb 26;6(2):e1000667

Page 4: Assembly of metagenomes

How

Find overlap between reads
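The overlap step can be illustrated with a minimal sketch. It assumes exact suffix-prefix matches between error-free reads; the function name `overlap`, the `min_length` parameter, and the three toy reads are made up for illustration, and real assemblers tolerate sequencing errors and use indexed data structures rather than this pairwise scan.

```python
def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_length` bases exists.
    """
    start = 0
    while True:
        # Look for the first min_length bases of b somewhere in a.
        start = a.find(b[:min_length], start)
        if start == -1:
            return 0
        # Accept only if everything from that point to the end of a
        # is also a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


# Hypothetical toy reads, chosen so that each one overlaps the next by 4 bases.
reads = ["ATGCGTAC", "GTACCTGA", "CTGATTAG"]
for a in reads:
    for b in reads:
        if a != b and overlap(a, b) > 0:
            print(a, "->", b, "overlap:", overlap(a, b))
```

Raising `min_length` makes the overlap criterion stricter, at the cost of missing short true overlaps.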

Page 5: Assembly of metagenomes

How

Build consensus sequence
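A consensus can be sketched as a per-position majority vote over reads that have already been placed relative to each other. This is a simplified illustration, assuming ungapped reads with known offsets; the function name `consensus` and the toy reads are hypothetical, and ties between bases are not handled specially.

```python
from collections import Counter


def consensus(placed_reads):
    """Majority-vote consensus from (offset, read) pairs.

    Positions not covered by any read are reported as 'N'.
    """
    length = max(offset + len(read) for offset, read in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, read in placed_reads:
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )


# Hypothetical layout; the third read carries one sequencing error (T for A).
placed = [
    (0, "ATGCGTAC"),
    (4, "GTACCTGA"),
    (4, "GTTCCTGA"),
    (3, "CGTACCT"),
    (8, "CTGATTAG"),
]
print(consensus(placed))  # ATGCGTACCTGATTAG
```

The single erroneous base is outvoted by the other reads covering the same position, which is why higher read depth gives a more reliable consensus.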

Page 6: Assembly of metagenomes

Challenges

[Figure: DNA with a repetitive element, the shotgun reads derived from it, and the resulting contigs, including a collapsed contig]

Page 7: Assembly of metagenomes

Results

Lots of pieces

Page 8: Assembly of metagenomes

Mate pairs

Page 9: Assembly of metagenomes

Assembly with mate pairs

[Figure: paired reads link contigs across gaps into a scaffold]

Page 10: Assembly of metagenomes

Mate pairs

Scaffold: Contig NNNNN Contig NNNNN Contig
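The scaffold layout on this slide, contigs joined by runs of N, can be written out as a short sketch. It assumes the contig order and the gap sizes have already been estimated from the mate pairs; the function name `build_scaffold`, the `min_gap` parameter, and the toy sequences are illustrative only.

```python
def build_scaffold(contigs, gap_sizes, min_gap=1):
    """Join ordered contigs with runs of 'N' standing in for the gaps.

    `gap_sizes[i]` is the estimated gap between contigs[i] and contigs[i + 1].
    Estimates below `min_gap` are raised to `min_gap` so that contig
    boundaries stay visible in the scaffold.
    """
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * max(gap, min_gap))
        parts.append(contig)
    return "".join(parts)


# Hypothetical contigs and gap estimates.
print(build_scaffold(["ACGT", "GGCC", "TTAA"], [5, 5]))
# ACGTNNNNNGGCCNNNNNTTAA
```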

Page 11: Assembly of metagenomes

Mate pairs?

150–600 bases

454/Illumina

Illumina

Page 12: Assembly of metagenomes

Mate pairs!

Longer jumps:

Page 13: Assembly of metagenomes

Mate pairs

• Little used for metagenomics...

Page 14: Assembly of metagenomes

Why is assembly hard for metagenomes?

• Heterogeneous samples
  – many different genomes
  – overlap between genomes
    • e.g. 16S

• Non-species-specific contigs

http://rna.ucsc.edu/

Page 15: Assembly of metagenomes

When could it work

• One or a few dominating species
  – contigs might be species-specific

Page 16: Assembly of metagenomes

Specialized software

• Genovo

Page 17: Assembly of metagenomes

Specialized software

• Genovo
  – Uses a 'generative probabilistic model' of read generation
  – Assembler discovers 'likely sequence reconstructions under the model'

Page 18: Assembly of metagenomes

Use your favorite assembler

• Newbler (454)
• Velvet
• Euler
• SOAPdenovo
• ...
• Tweak parameters
  e.g. higher stringency for determining overlaps

Page 19: Assembly of metagenomes

Check contigs for

• Read depth
• GC frequency
• Tetranucleotide frequency
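The three per-contig checks listed above could be computed roughly as follows. This is a sketch with made-up function names; it counts tetranucleotides on one strand only (analyses often merge each tetranucleotide with its reverse complement, which is omitted here), and it approximates read depth as total read bases divided by contig length.

```python
from collections import Counter


def mean_read_depth(read_lengths, contig_length):
    """Average coverage: bases in the contig's reads divided by its length."""
    return sum(read_lengths) / contig_length


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetranucleotide_frequencies(seq):
    """Relative frequency of each overlapping 4-mer, skipping ambiguous ones."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    counts = {kmer: n for kmer, n in counts.items() if set(kmer) <= set("ACGT")}
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


# Hypothetical contig and read lengths.
contig = "ACGTACGTGGCC"
print(mean_read_depth([6, 6, 6, 6], len(contig)))  # 2.0
print(gc_fraction(contig))
print(tetranucleotide_frequencies(contig))
```

Contigs from the same genome are expected to show similar values for all three measures, so outliers are candidates for chimeric or misassigned contigs.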

Page 20: Assembly of metagenomes

Example

Read depth

Page 21: Assembly of metagenomes

Challenges

[Figure: DNA with a repetitive element, the shotgun reads derived from it, and the resulting contigs, including a collapsed contig]

Page 22: Assembly of metagenomes

Results

Lots of pieces

Higher read depth

DNA

Repetitive element
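One way to spot collapsed repeats is to flag contigs whose read depth is well above the typical depth. The sketch below assumes a single-genome setting; the function name `flag_high_depth`, the `fold` threshold, and the depth values are invented for illustration. In a metagenome, a higher depth may equally reflect a more abundant organism, so the flag is only a hint.

```python
from statistics import median


def flag_high_depth(depths, fold=1.8):
    """Names of contigs whose mean depth is at least `fold` times the median."""
    typical = median(depths.values())
    return sorted(name for name, depth in depths.items() if depth >= fold * typical)


# Hypothetical mean read depths per contig; c4 has roughly double coverage.
depths = {"c1": 20, "c2": 22, "c3": 19, "c4": 41, "c5": 21}
print(flag_high_depth(depths))  # ['c4']
```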

Page 23: Assembly of metagenomes

Example

One contig

Log scale!

Page 24: Assembly of metagenomes

Example

Page 25: Assembly of metagenomes

Example

Bacteroides

Proteobacteria

Cyanobacteria

Caulobacteraceae

Page 26: Assembly of metagenomes

Solution

• Split contigs on
  – read depth
  – GC%

• Use BLAST
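Splitting contigs on read depth and GC% can be sketched as a coarse two-dimensional binning. The function name `bin_contigs`, the bin widths, and the four example contigs are assumptions made for illustration; depth is binned on a log10 scale because depths in a metagenome can span orders of magnitude. A BLAST search of each bin would be a separate step and is not shown.

```python
import math
from collections import defaultdict


def bin_contigs(contigs, gc_step=0.10, log_depth_step=0.5):
    """Group contigs that fall in the same GC / log10(depth) grid cell.

    `contigs` maps a contig name to (gc_fraction, mean_depth); depth must be > 0.
    """
    bins = defaultdict(list)
    for name, (gc, depth) in contigs.items():
        key = (
            math.floor(gc / gc_step),
            math.floor(math.log10(depth) / log_depth_step),
        )
        bins[key].append(name)
    return dict(bins)


# Hypothetical contigs: two low-GC, low-depth and two high-GC, high-depth.
example = {
    "c1": (0.42, 12),
    "c2": (0.44, 15),
    "c3": (0.65, 180),
    "c4": (0.66, 210),
}
for key, names in bin_contigs(example).items():
    print(key, names)
```

A fixed grid is the simplest possible choice; contigs lying close to a cell boundary can end up in different bins, so clustering on the same two measures may work better in practice.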

Page 27: Assembly of metagenomes

Metagenomic ORFome Assembly

Ye Y, Tang H. 2009. J Bioinform Comput Biol 7: 455-471

Gene/protein-directed assembly

Page 28: Assembly of metagenomes

Iterative read mapping and assembly

Align reads to a single reference genome

'Update' the reference based on alignment

Align remaining reads again

Dutilh BE, Huynen MA, Strous M. 2009. Bioinformatics 25: 2878-2881.
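The three steps above can be illustrated with a toy loop. This is not the implementation from the cited paper: it assumes ungapped alignment with a mismatch limit, a majority-vote update, and very short sequences, and all names (`best_hit`, `iterative_mapping`, `max_mismatches`, `max_rounds`) and the example data are hypothetical.

```python
from collections import Counter


def best_hit(read, ref, max_mismatches):
    """Leftmost ungapped position with the fewest mismatches, or None."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(ref) - len(read) + 1):
        mm = sum(1 for x, y in zip(read, ref[pos:pos + len(read)]) if x != y)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos


def iterative_mapping(ref, reads, max_mismatches=1, max_rounds=10):
    """Map reads, update the reference, then retry the reads left over."""
    remaining = list(reads)
    for _ in range(max_rounds):
        columns = [Counter() for _ in ref]
        unmapped = []
        for read in remaining:
            pos = best_hit(read, ref, max_mismatches)
            if pos is None:
                unmapped.append(read)
            else:
                for i, base in enumerate(read):
                    columns[pos + i][base] += 1
        if len(unmapped) == len(remaining):
            break  # no read mapped in this round, so nothing can change
        # 'Update' the reference: majority base where reads mapped,
        # old base elsewhere.
        ref = "".join(
            col.most_common(1)[0][0] if col else old
            for col, old in zip(columns, ref)
        )
        remaining = unmapped
    return ref, remaining


# Hypothetical data: the sample differs from the reference at two positions.
reference = "ACGTTGCAAGCT"
sample_reads = ["ACGATG", "GATGGA", "TTTTTT"]
print(iterative_mapping(reference, sample_reads))
# ('ACGATGGAAGCT', ['TTTTTT'])
```

In this example the second read has two mismatches against the original reference and is rejected in the first round; it maps only after the first read has corrected one of those positions, which is the point of iterating.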

Page 29: Assembly of metagenomes

Reverse metagenomics

• Leptospirillum group III never cultured
• shotgun metagenomics
  – nitrogen fixation gene
  – GC content and read depth
  – Leptospirillum group III
• Culturable for the first time