Assembly of metagenomes

Post on 10-May-2015

DESCRIPTION

A talk I gave for the 2011 metagenomics course at the Biological Dept., Univ. of Oslo, April 2011

TRANSCRIPT

Assembly of metagenomes

Lex Nederbragt

Norwegian Sequencing Center &

Centre for Ecological and Evolutionary Synthesis

University of Oslo

What is assembly

• From reads to genome

Why assembly?

Wooley JC et al, PLoS Comput Biol. 2010 Feb 26;6(2):e1000667

How

Find overlap between reads
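As a rough illustration of the overlap step, the sketch below looks for the longest exact suffix-to-prefix match between two reads. The function name `suffix_prefix_overlap`, the `min_len` cutoff and the toy reads are assumptions made for this example. Real assemblers tolerate sequencing errors and use indexing rather than this brute-force comparison.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Only exact matches count. Returns 0 when no overlap of at least
    `min_len` bases exists.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


# Toy reads (made up for illustration), each overlapping the next by 4 bases.
reads = ["ATGCGTAC", "GTACCTGA", "CTGAAGTT"]
for left in reads:
    for right in reads:
        if left is not right:
            n = suffix_prefix_overlap(left, right)
            if n:
                print(left, "->", right, "overlap", n)
```

Raising `min_len` is the simplest form of a stricter overlap criterion.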

How

Build consensus sequence
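A minimal sketch of consensus building, assuming the reads have already been laid out against each other and padded with `-` to a common length. The layout below is invented for illustration. Each column is resolved by simple majority vote, which is only one of several possible consensus rules.

```python
from collections import Counter


def consensus(aligned_reads: list[str], gap: str = "-") -> str:
    """Majority-vote consensus over reads padded to equal length with `gap`."""
    result = []
    for column in zip(*aligned_reads):
        bases = [base for base in column if base != gap]
        if bases:  # skip columns that no read covers
            result.append(Counter(bases).most_common(1)[0][0])
    return "".join(result)


# Hypothetical layout; the third read carries one sequencing error (T -> A).
layout = [
    "ATGCGTAC--------",
    "----GTACCTGA----",
    "----GAACCTGA----",
    "--------CTGAAGTT",
]
print(consensus(layout))
```

The erroneous base is outvoted by the two reads that agree at that position.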

Challenges

Collapsed contig

Shotgun reads

DNA

Shotgun reads

Contigs

Repetitive element

Results

Lots of pieces

Mate pairs

Assembly with mate pairs

Paired reads

Gaps

Scaffold

Contigs

Mate pairs

Scaffold NNNNN NNNNN

Contig Contig Contig
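The scaffold picture above can be sketched in a few lines, assuming the order of the contigs and the gap sizes have already been estimated from the mate pair distances. The function name `build_scaffold` and the example sequences are hypothetical.

```python
def build_scaffold(contigs, gap_sizes):
    """Join ordered contigs, filling each estimated gap with that many N's."""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * gap)  # unknown bases between two linked contigs
        parts.append(contig)
    return "".join(parts)


# Three made-up contigs linked by mate pairs, with two gaps of about 5 bases.
scaffold = build_scaffold(["ATGCGT", "CCTGAA", "GGTTAC"], [5, 5])
print(scaffold)
```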

Mate pairs?

150–600 bases

454/Illumina

Illumina

Mate pairs!

Longer jumps:

Mate pairs

• Little used for metagenomics...

Why is assembly hard for metagenomes?

• Heterogeneous samples
  – many different genomes
  – overlap between genomes
    • e.g. 16S

• Non-species-specific contigs

http://rna.ucsc.edu/

When could it work

• One or a few dominating species
  – contigs might be species-specific

Specialized software

• Genovo
  – Uses a 'generative probabilistic model' of read generation
  – Assembler discovers 'likely sequence reconstructions under the model'

Use your favorite assembler

• Newbler (454)
• Velvet
• Euler
• SOAPdenovo
• ...
• Tweak parameters
  e.g. higher stringency for determining overlaps

Check contigs for

• Read depth
• GC frequency
• Tetranucleotide frequency
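The three checks can be computed per contig roughly as follows. This is a sketch under simple assumptions: read depth is taken as total bases in the reads placed on the contig divided by contig length, and tetranucleotide counts are taken on one strand only (analyses often also pool reverse complements, which is not done here). All function names are made up for the example.

```python
from collections import Counter


def mean_read_depth(read_lengths, contig_length):
    """Average coverage: total bases in the contig's reads / contig length."""
    return sum(read_lengths) / contig_length


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetranucleotide_freqs(seq: str) -> dict[str, float]:
    """Relative frequency of each 4-mer, skipping windows with N or other codes."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


print(mean_read_depth([100, 100, 100], 150))
print(gc_fraction("GGCCAATT"))
print(tetranucleotide_freqs("ACGTACGT"))
```

Contigs from the same genome are expected to give similar values on all three measures, so outliers are candidates for a closer look.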

Example

Read depth

Challenges

Collapsed contig

Shotgun reads

DNA

Shotgun reads

Contigs

Repetitive element

Results

Lots of pieces

Higher read depth

DNA

Repetitive element

Example

One contig

Log scale!

Example

Example

Bacteroides

Proteobacteria

Cyanobacteria

Caulobacteraceae

Solution

• Split contigs on
  – read depth
  – GC%
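One way to read "split on read depth and GC%" is to sort the contigs into groups by those two values. The sketch below does this with fixed cutoffs. The thresholds, contig names and numbers are invented for illustration, and in practice the cutoffs would be chosen by looking at the depth and GC distributions of the actual assembly.

```python
from collections import defaultdict


def bin_contigs(contigs, depth_edges, gc_edges):
    """Group (name, depth, gc) tuples by which depth and GC interval they fall in.

    `depth_edges` and `gc_edges` are ascending cutoffs; a value lands in
    bin k when it is >= exactly k of the cutoffs.
    """
    def bin_index(value, edges):
        return sum(value >= edge for edge in edges)

    bins = defaultdict(list)
    for name, depth, gc in contigs:
        key = (bin_index(depth, depth_edges), bin_index(gc, gc_edges))
        bins[key].append(name)
    return dict(bins)


# Hypothetical contigs: (name, mean read depth, GC fraction).
contigs = [
    ("contig1", 120.0, 0.35),
    ("contig2", 115.0, 0.37),
    ("contig3", 8.0, 0.62),
    ("contig4", 9.5, 0.64),
]
groups = bin_contigs(contigs, depth_edges=[50.0], gc_edges=[0.5])
print(groups)
```

Here the high-depth, low-GC contigs end up in one group and the low-depth, high-GC contigs in another, and each group could then be checked against a database.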

• Use BLAST

Metagenomic ORFome Assembly

Ye Y, Tang H. 2009. J Bioinform Comput Biol 7: 455-471

Gene/protein-directed assembly

Iterative read mapping and assembly

Align reads to a single reference genome

'Update' the reference based on alignment

Align remaining reads again

Dutilh BE, Huynen MA, Strous M. 2009. Bioinformatics 25: 2878-2881.
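The three steps above can be illustrated with a toy loop. This is not the implementation from the cited paper. It is a sketch that assumes ungapped placement of each read at the position with the fewest mismatches, a per-position majority vote to update the reference, and made-up sequences. The names `best_hit` and `iterate_reference` are hypothetical.

```python
from collections import Counter


def best_hit(read, ref, max_mismatches):
    """Ungapped placement of `read` on `ref` with the fewest mismatches.

    Returns (position, mismatches), or None if every placement exceeds
    `max_mismatches`.
    """
    best = None
    for pos in range(len(ref) - len(read) + 1):
        mism = sum(a != b for a, b in zip(read, ref[pos:pos + len(read)]))
        if mism <= max_mismatches and (best is None or mism < best[1]):
            best = (pos, mism)
    return best


def iterate_reference(reads, ref, max_mismatches=1, rounds=5):
    """Align reads, update the reference by majority vote, and repeat."""
    for _ in range(rounds):
        piles = [Counter() for _ in ref]
        for read in reads:
            hit = best_hit(read, ref, max_mismatches)
            if hit is not None:
                for offset, base in enumerate(read):
                    piles[hit[0] + offset][base] += 1
        # Keep the old base wherever no read was placed.
        updated = "".join(
            pile.most_common(1)[0][0] if pile else old
            for pile, old in zip(piles, ref)
        )
        if updated == ref:  # nothing changed, so stop iterating
            break
        ref = updated
    return ref


# Made-up data: the sample genome differs from the reference at two positions.
reference = "ATGGCGTATCGA"
reads = ["ATGGCT", "GCTTAC", "TACCGA"]
print(iterate_reference(reads, reference))
```

In this example the middle read spans both differences and cannot be placed in the first round. It only aligns after the other two reads have updated the reference.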

Reverse metagenomics

• Leptospirillum group III never cultured
• shotgun metagenomics
  – nitrogen fixation gene
  – GC content and read depth: Leptospirillum group III
• Culturable for the first time
