A Fast Hybrid Short Read Fragment Assembly Algorithm

Upload: ahmed-barker

Post on 31-Dec-2015



TRANSCRIPT

Page 1: A Fast Hybrid Short Read Fragment Assembly Algorithm

A Fast Hybrid Short Read Fragment Assembly Algorithm

Page 2: A Fast Hybrid Short Read Fragment Assembly Algorithm

Introduction

• Second-generation DNA sequencing technologies

• Traditional: Sanger shotgun sequencing

• New assembly techniques (2007 & 2008):

– SSAKE, VCAKE and SHARCGS: based on greedy extension

– Edena, Velvet and Euler-SR: graph-based

Page 3: A Fast Hybrid Short Read Fragment Assembly Algorithm

Taipan Method: Two steps

• 1. Greedy extension

– the contig is iteratively extended by one base at a time, in both the 3’ direction and the 5’ direction

• 2. Graph-based method

– assembles the contig constructed in the previous step
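The greedy extension step can be illustrated with a minimal Python sketch. This is not Taipan's implementation: the function names, the majority-vote rule, the tie handling and the `max_len` cap are assumptions made for illustration, and all reads are assumed to come from the same strand.

```python
from collections import Counter


def extend_right(contig, reads, k, max_len=10_000):
    """Extend the contig one base at a time at its right (3') end.

    A read votes for a base when it contains the contig's last k bases
    followed by at least one more base. Extension stops when nothing
    votes, when the top two candidate bases tie, or at max_len.
    """
    while len(contig) < max_len:
        suffix = contig[-k:]
        votes = Counter()
        for read in reads:
            pos = read.find(suffix)
            if pos != -1 and pos + k < len(read):
                votes[read[pos + k]] += 1
        ranked = votes.most_common(2)
        if not ranked:
            break  # no read overlaps the contig end
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            break  # ambiguous extension, stop instead of guessing
        contig += ranked[0][0]
    return contig


def greedy_extend(seed, reads, k):
    """Extend a seed in the 3' direction, then in the 5' direction."""
    contig = extend_right(seed, reads, k)
    # Extending to the left equals extending the reversed contig to the
    # right using reversed reads (plain reversal, same strand assumed).
    reversed_reads = [read[::-1] for read in reads]
    return extend_right(contig[::-1], reversed_reads, k)[::-1]
```

Calling `greedy_extend(seed, reads, k)` with a seed read and a minimal overlap `k` returns the extended contig. Stopping on ties is one simple way to avoid extending through ambiguous regions such as repeats.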

Page 4: A Fast Hybrid Short Read Fragment Assembly Algorithm

Example

• Usage: taipan -f {inputfilename} -k {minimal_overlap} [-t {threshold}] [-o {seed_occ}] [-v {verbose}] [-c {min_contig_length}]

• Result:

Page 5: A Fast Hybrid Short Read Fragment Assembly Algorithm

Optimal spliced alignments of short sequence reads

Fabio De Bona et al.

Bioinformatics, 2008

Page 6: A Fast Hybrid Short Read Fragment Assembly Algorithm

Genome vs. Transcriptome

• Analysis of sequence reads from genomic DNA

– Assemble the sequence

– Align the reads to the genome

• Transcriptome analysis

– First, align the single reads to the genome

– Then, merge the alignments to infer gene structures
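The merging step can be pictured with a small Python sketch that collapses overlapping read alignments into contiguous covered blocks. This is only an illustration under simplifying assumptions: alignments are given as half-open `(start, end)` genome coordinates, the function name is hypothetical, and real gene structure inference also has to handle spliced reads and strand.

```python
def merge_alignments(intervals):
    """Merge overlapping or touching (start, end) intervals.

    Intervals are half-open genome coordinates of aligned reads.
    Returns the covered blocks as a sorted list of tuples.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous block: grow it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]
```

Each resulting block is a candidate exon-like region. The gaps between blocks are only candidates for introns, since an uncovered stretch can also mean low expression or missing reads.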

Page 7: A Fast Hybrid Short Read Fragment Assembly Algorithm

Genome vs. Transcriptome

• Reconstruct the whole genome from genomic DNA data

• Reconstruct the transcriptome from EST data (transcribed cDNA)


Page 8: A Fast Hybrid Short Read Fragment Assembly Algorithm

Problem Formulation

Limitations:

1. Read length of next-generation (NG) sequencing is relatively short.

2. Read error rate (assumed to be 5%).


Page 9: A Fast Hybrid Short Read Fragment Assembly Algorithm

General Description

Smith-Waterman, extended with:

– Quality scores

– Splice site information

– Intron length
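For reference, the plain Smith-Waterman recurrence that the extensions build on can be sketched in Python as below. The scoring values (`match`, `mismatch`, `gap`) and the linear gap penalty are assumptions chosen for illustration. The sketch returns only the best local alignment score and includes none of the quality, splice site or intron length terms.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The 0 lets an alignment restart anywhere (local alignment).
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

The table costs time and memory proportional to `len(a) * len(b)`.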

Page 10: A Fast Hybrid Short Read Fragment Assembly Algorithm

Method

1. Original (Smith-Waterman)

2. With quality scores

3. With splice site information

4. With intron length
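To show how base quality could enter a substitution score (variant 2), here is a hedged Python sketch. It is not the scoring used in the paper: the expected-score formula, the function names and the default `match`/`mismatch` values are assumptions. Only the Phred relation between a quality value Q and the error probability, p = 10^(-Q/10), is standard.

```python
def phred_to_error(q):
    """Convert a Phred quality value to a base-call error probability."""
    return 10 ** (-q / 10)


def quality_aware_score(read_base, ref_base, q, match=2.0, mismatch=-1.0):
    """Expected substitution score given the base-call quality q.

    Assumption for illustration: a wrong call is equally likely to
    hide any of the three other bases.
    """
    p_err = phred_to_error(q)
    if read_base == ref_base:
        p_same = 1.0 - p_err   # the call is right and agrees with the reference
    else:
        p_same = p_err / 3.0   # the call is wrong and the true base agrees
    return p_same * match + (1.0 - p_same) * mismatch
```

With this weighting, a mismatch at a low-quality position is penalised slightly less than one at a high-quality position, and a match at a low-quality position earns less.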

Page 11: A Fast Hybrid Short Read Fragment Assembly Algorithm

Test Data

• 10 000 sequences with known alignments

• Three different scorings:

1. Quality information

2. Splice site predictions

3. Intron length