A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space

Author: Azzedine Boukerche, Jan M. Correa, Alba C. M. A. Melo, and Ricardo P. Jacobi

Publisher: IEEE Transactions on Computers, 2010

Presenter: Chin-Chung Pan

Date: 2011/04/20


Outline


- Introduction
- The DIALIGN Algorithm
- Related Work
- Design of the FPGA-Based Architectures
- The DIALIGN-Score Architecture
- Executing DIALIGN in Linear Space
- The DIALIGN-Alignment Architecture
- Experimental Results

Introduction

Smith-Waterman (SW) is the most widely used exact method for locally aligning two sequences. It is very accurate when the sequences have a single common region of high similarity. However, when the sequences share more than one region of high similarity, SW is not very effective, because it reports only the single best-scoring local alignment.

DIALIGN can be used for either local or global alignment, and for either pairwise or multiple sequence alignment. One drawback of DIALIGN is that it is slower than SW. To overcome this, researchers have proposed running DIALIGN in parallel and combining it with a fast local similarity search tool.

We propose and evaluate two FPGA-based accelerators that execute DIALIGN in linear space. One obtains the optimal DIALIGN score, and the other retrieves the DIALIGN alignment.

The DIALIGN Algorithm

The DIALIGN Algorithm

DIALIGN (DIAgonal ALIGNment) is a sequence alignment method that searches for gap-free fragments (diagonals) and aligns them.

For each DIALIGN pairwise alignment, the relevance (weight) of each diagonal found must be calculated before the algorithm attempts to align it. The weight is given by E(l, m) = -ln P(l, m), where P(l, m) is the probability that a diagonal D of length l has at least m matches.

Weighting the Significance of Diagonals.

Here p is the probability that the two residues at a given position of a diagonal match by chance. One may assume p = 0.25 for nucleic acid sequences and p = 0.05 for proteins.
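The weight formula can be checked numerically. The sketch below is a minimal illustration, not the authors' code. It assumes that P(l, m) is the binomial tail, meaning each of the l positions matches independently with probability p. That assumption is consistent with the 0.4375 and 0.827 values in the worked example for X = C T G, Y = C G. The function names are hypothetical.

```python
import math


def fragment_prob(l: int, m: int, p: float) -> float:
    """P(l, m): probability that a gap-free diagonal of length l has at least
    m matches, assuming each position matches independently with probability p
    (binomial tail; an assumption consistent with the worked example)."""
    return sum(math.comb(l, i) * p**i * (1 - p) ** (l - i) for i in range(m, l + 1))


def fragment_weight(l: int, m: int, p: float = 0.25) -> float:
    """E(l, m) = -ln P(l, m); p = 0.25 for nucleic acids, 0.05 for proteins."""
    return -math.log(fragment_prob(l, m, p))


if __name__ == "__main__":
    # The two matching diagonals of the X = CTG, Y = CG example.
    print(round(fragment_weight(1, 1), 3))  # 1.386
    print(round(fragment_weight(2, 1), 3))  # 0.827
```

A rarer event (a lower P) yields a larger weight, so long diagonals with many matches dominate.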

The DIALIGN Algorithm

For every pair of positions $(i, j)$ with $1 \le i \le L_1$ and $1 \le j \le L_2$, where $L_1$ and $L_2$ are the lengths of X and Y, one determines all integers $k$ with $0 \le k \le \min(i-1, j-1)$ for which the diagonal $(X_{i-k}, Y_{j-k}; \dots; X_i, Y_j)$ from $(i-k, j-k)$ to $(i, j)$ has a positive weight.

Next, for every pair $(i, j)$ as above, one defines a value score(i, j), which is the score of a maximum alignment of the prefixes $(X_1, \dots, X_i)$ and $(Y_1, \dots, Y_j)$.
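The definition of score(i, j) can be written as a recurrence. The form below is our own reconstruction, not a quotation from the paper. We chose it so that it reproduces the numbers in the worked example for X = C T G, Y = C G. The notation $D_{i,j,k}$ is ours: it denotes the diagonal of length $k+1$ ending at $(i, j)$. The weight $w$ of that diagonal is $E(l, m)$, and the boundary values are $\mathrm{score}(i, 0) = \mathrm{score}(0, j) = 0$.

$$\mathrm{score}(i,j) = \max\Bigl\{\mathrm{score}(i-1,j),\ \mathrm{score}(i,j-1),\ \max_{k:\,w(D_{i,j,k})>0}\bigl[\mathrm{score}(i-k-1,\,j-k-1) + w(D_{i,j,k})\bigr]\Bigr\}$$

For instance, at $(i, j) = (3, 2)$ the single-position diagonal {G, G} ($k = 0$, $w = 1.386$) extends $\mathrm{score}(2, 1) = 1.386$. This gives $\mathrm{score}(3, 2) = 2.772$.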

The DIALIGN Algorithm

The last fragment D_k aligned at position (i, j) is recovered by the function prec(i, j) = D_k. Among the fragments D_k that can be aligned at position (i, j), prec(i, j) chooses the one that yields the chain of fragments with the greatest score so far.
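The score recurrence and the prec traceback can be sketched in Python. This is a minimal full-matrix illustration under stated assumptions, not the authors' FPGA design:

- The diagonal weight uses the binomial tail for P(l, m) with p = 0.25.
- prec stores the last fragment as a (start_i, start_j, length) tuple, which is our own choice of representation.
- The helper names dialign and render are hypothetical.

Under the binomial tail, a single mismatch has weight -ln 1 = 0 and is skipped, because only positive-weight diagonals count. The worked example instead lists 0.288 for such cells, so some intermediate values differ: score(T, G) comes out as 1.386 here instead of 1.674. The final score and the alignment CTG / C-G still agree. The final score is 2·ln 4 ≈ 2.7726, which the example writes as 1.386 + 1.386 = 2.772.

```python
import math


def fragment_weight(l, m, p=0.25):
    """-ln P(l, m), P = probability of at least m matches in l positions
    (binomial tail, an assumption); 0 when m == 0, since then P = 1."""
    if m == 0:
        return 0.0
    tail = sum(math.comb(l, i) * p**i * (1 - p) ** (l - i) for i in range(m, l + 1))
    return -math.log(tail)


def dialign(x, y, p=0.25):
    """Full-matrix DIALIGN-style DP returning (score, chain of fragments).

    prec[i][j] holds the last fragment (start_i, start_j, length) of the best
    chain for prefixes x[:i] and y[:j], or None for an empty chain."""
    n1, n2 = len(x), len(y)
    score = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    prec = [[None] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            # No fragment ends at (i, j): inherit from the upper or left cell.
            if score[i - 1][j] >= score[i][j - 1]:
                best, frag = score[i - 1][j], prec[i - 1][j]
            else:
                best, frag = score[i][j - 1], prec[i][j - 1]
            matches = 0
            for k in range(min(i, j)):  # diagonal from (i-k, j-k) to (i, j)
                matches += x[i - 1 - k] == y[j - 1 - k]
                w = fragment_weight(k + 1, matches, p)
                if w > 0 and score[i - k - 1][j - k - 1] + w > best:
                    best = score[i - k - 1][j - k - 1] + w
                    frag = (i - k, j - k, k + 1)
            score[i][j], prec[i][j] = best, frag
    # Traceback: follow prec from (n1, n2) until the chain is empty.
    chain, i, j = [], n1, n2
    while prec[i][j] is not None:
        si, sj, length = prec[i][j]
        chain.append((si, sj, length))
        i, j = si - 1, sj - 1
    return score[n1][n2], chain[::-1]


def render(x, y, chain):
    """Two-row alignment: fragment residues stacked, other residues vs '-'."""
    top, bottom, i, j = [], [], 0, 0
    for si, sj, length in chain:
        top += list(x[i:si - 1]) + ["-"] * (sj - 1 - j)
        bottom += ["-"] * (si - 1 - i) + list(y[j:sj - 1])
        top += list(x[si - 1:si - 1 + length])
        bottom += list(y[sj - 1:sj - 1 + length])
        i, j = si - 1 + length, sj - 1 + length
    top += list(x[i:]) + ["-"] * (len(y) - j)
    bottom += ["-"] * (len(x) - i) + list(y[j:])
    return "".join(top), "".join(bottom)


if __name__ == "__main__":
    total, chain = dialign("CTG", "CG")
    print(f"{total:.4f}", chain)  # 2.7726 [(1, 1, 1), (3, 2, 1)]
    print(*render("CTG", "CG", chain), sep="\n")  # CTG then C-G
```

Storing only the last fragment per cell is enough for the traceback. After a fragment starting at (si, sj), the rest of the chain is exactly the best chain for the prefixes ending at (si - 1, sj - 1).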

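The DIALIGN Algorithm

Example: X = C T G, Y = C G (rows follow X, columns follow Y).

Possible diagonals for every position (the second entry in a cell is the length-2 diagonal ending there):

        C          G
  C     {C, C}     {C, G}
  T     {T, C}     {T, G}; {CT, CG}
  G     {G, C}     {G, G}; {TG, CG}

Diagonal weights of each position (second value: weight of the length-2 diagonal ending there):

        C          G
  C     1.386      0.288
  T     0.288      0.288; 0.827
  G     0.288      1.386; 0.827

Scores at each position (second value: candidate score through the length-2 diagonal):

        C          G
  C     1.386      1.386
  T     1.386      1.674; 0.827
  G     1.386      2.772; 0.827

Result:

  CTG
  C-G

Weights of the matching diagonals (p = 0.25):

$$P(1,1) = \binom{1}{1}(0.25)^1(0.75)^0 = 0.25, \qquad w(D_0) = E(1,1) = -\ln P(1,1) = 1.386$$

$$P(2,1) = \binom{2}{1}(0.25)^1(0.75)^1 + \binom{2}{2}(0.25)^2(0.75)^0 = 0.4375, \qquad w(D_1) = E(2,1) = -\ln P(2,1) = 0.827$$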

Related Work

Design of the FPGA-Based Architectures

In the case of DIALIGN, the recurrence relations are more complex than those of SW and involve a set of conditional statements. For this reason, the time each processing element (PE) needs to complete its operations can vary greatly.

We propose the use of wavefront array processors instead of systolic arrays for our FPGA-based architectures that execute DIALIGN.

We claim that wavefront array processors are better suited to this problem because communication between processing elements is asynchronous, occurring exactly when output data are available.
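The timing argument can be illustrated with a toy model. This is our own simplification, not the paper's hardware. Each PE (i, j) has a hypothetical, data-dependent latency lat[i][j].

- In the wavefront model, a PE fires as soon as its upper, left, and upper-left neighbours have produced output. DIALIGN's longer-range dependence on score(i-k-1, j-k-1) is covered automatically, because those cells finish no later than the upper-left neighbour.
- In the systolic model, every one of the n1 + n2 - 1 anti-diagonal steps is clocked at the worst-case latency.

Handshake overhead is ignored.

```python
import random


def wavefront_finish(lat):
    """Finish time if each PE fires as soon as its upper, left and upper-left
    neighbours have produced output (asynchronous, data-driven)."""
    n1, n2 = len(lat), len(lat[0])
    t = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            ready = max(t[i - 1][j], t[i][j - 1], t[i - 1][j - 1])
            t[i][j] = ready + lat[i - 1][j - 1]
    return t[n1][n2]


def systolic_finish(lat):
    """Finish time if all PEs advance in lockstep: each of the n1 + n2 - 1
    anti-diagonal steps lasts one clock sized for the slowest PE."""
    n1, n2 = len(lat), len(lat[0])
    worst = max(max(row) for row in lat)
    return (n1 + n2 - 1) * worst


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical latencies: most cells are cheap, some take a slow branch.
    lat = [[rng.choice([1, 1, 1, 4]) for _ in range(8)] for _ in range(6)]
    print("wavefront:", wavefront_finish(lat), "systolic:", systolic_finish(lat))
```

In this model, the wavefront finish time is the heaviest monotone path through the grid. Such a path visits at most n1 + n2 - 1 cells, so the wavefront time never exceeds the lockstep bound.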

The DIALIGN-Score Architecture

Executing DIALIGN in Linear Space

For each column j, only three values are stored: the maximum score, the row where it occurs, and the ending position of the fragment that precedes the highest-scoring fragment in column j.

In the example on this slide, the area comprising rows 1 to 15 and columns 1 to 17 needs to be reprocessed.
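The per-column bookkeeping can be sketched as follows. This is our interpretation, not the paper's hardware design. Scores are computed row by row. For each column j, we keep the best score seen so far, its row, and the end of the fragment that precedes the last fragment of that chain.

The DIALIGN recurrence reaches back to score(i-k-1, j-k-1) for any fragment length k + 1. A rolling window therefore only works if fragment length is bounded. The sketch adds a hypothetical cap, max_len, and keeps just the last max_len rows, which uses O(max_len · |Y|) memory. Weights again assume the binomial tail.

```python
import math
from collections import deque


def fragment_weight(l, m, p=0.25):
    """-ln P(l, m) with P the binomial tail (assumption); 0 when m == 0."""
    if m == 0:
        return 0.0
    tail = sum(math.comb(l, i) * p**i * (1 - p) ** (l - i) for i in range(m, l + 1))
    return -math.log(tail)


def dialign_linear(x, y, max_len=4, p=0.25):
    """Row-by-row DIALIGN-style scores keeping only the last max_len rows.

    max_len is a hypothetical cap on fragment length. Returns, per column j,
    (max score, row where it occurs, end of the fragment that precedes the
    last fragment of that best chain, or None)."""
    n2 = len(y)
    # Every cell is (score, end of last fragment, end of preceding fragment).
    empty = (0.0, None, None)
    rows = deque([[empty] * (n2 + 1)], maxlen=max_len)  # rows[-1] = row i-1
    col_best = [(0.0, 0, None)] * (n2 + 1)
    for i in range(1, len(x) + 1):
        cur = [empty]
        for j in range(1, n2 + 1):
            up, left = rows[-1][j], cur[j - 1]
            best = up if up[0] >= left[0] else left  # no fragment ends here
            matches = 0
            for k in range(min(i, j, max_len)):
                matches += x[i - 1 - k] == y[j - 1 - k]
                w = fragment_weight(k + 1, matches, p)
                s, last_end, _ = rows[-(k + 1)][j - k - 1]  # cell (i-k-1, j-k-1)
                if w > 0 and s + w > best[0]:
                    # A fragment now ends at (i, j); the chain's previous last
                    # fragment becomes its preceding fragment.
                    best = (s + w, (i, j), last_end)
            cur.append(best)
            if best[0] > col_best[j][0]:
                col_best[j] = (best[0], i, best[2])
        rows.append(cur)  # the deque drops the oldest row automatically
    return col_best


if __name__ == "__main__":
    for j, (s, row, prev_end) in enumerate(dialign_linear("CTG", "CG")):
        print(j, f"{s:.3f}", row, prev_end)
```

For X = CTG, Y = CG, column 2 records a score of 2·ln 4 at row 3, with the preceding fragment ending at (1, 1). Under this reading, the stored end position bounds the region that must be reprocessed to recover the earlier part of the alignment.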

The DIALIGN-Alignment Architecture

Experimental Results - Dataset

Experimental Results - Results for DIALIGN-Score

Experimental Results - Results for DIALIGN-Alignment
