cell processor based sequence alignment


Post on 18-Jul-2015


Prof. Vassil Alexandrov

Janko Straßburg

University of Reading, Aristotle University, University Carlos III

European Commission

Sequence Alignment on the Playstation 3
◦ Six SPEs
◦ Smith-Waterman Algorithm only

Accelerating Multiple Sequence Alignment with the Cell BE Processor
◦ Designed to accelerate a particular sequence alignment application

Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine
◦ Smith-Waterman Algorithm

Cell Broadband Engine

SIMD approach
◦ Working with vectors

Parallelisation
◦ Using multiple SPEs

HBA_HUMAN  GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL
           G+ +VK+HGKKV  A+++++AH+D++ +++++LS+LH  KL
HBB_HUMAN  GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

Bioinformatics

Used for aligning sequences of DNA nucleotides or amino acids (proteins)

Great amount of data, requiring lots of computational power

Matrix
◦ Scoring matrix
◦ Traceback matrix

          H    E    A    G    A    W    G    H    E    E
     0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
P   -8   -2   -9  -17  -25  -33  -42  -49  -57  -65  -73
A  -16  -10   -3   -4  -12  -20  -28  -36  -44  -52  -60
W  -24  -18  -11   -6   -7  -15   -5  -13  -21  -29  -37
H  -32  -14  -18  -13   -8   -9  -13   -7   -3  -11  -19
E  -40  -22   -8  -16  -16   -9  -12  -15   -7    3   -5
A  -48  -30  -16   -3  -11  -11  -12  -12  -15   -5    2
E  -56  -38  -24  -11   -6  -12  -14  -15  -12   -9    1

Needleman-Wunsch

Scoring matrix

Each cell’s value is based on its upper, left, and upper-left neighbour
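The neighbour dependency above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the talk: the gap penalty of -8 matches the border values of the matrix shown earlier, while the `match` and `mismatch` scores are placeholder assumptions standing in for a real substitution matrix.

```python
def nw_score_matrix(a, b, match=5, mismatch=-4, gap=-8):
    """Fill a Needleman-Wunsch scoring matrix for sequences a (rows) and b (columns).

    match/mismatch are illustrative assumptions; a real aligner would look
    the pair score up in a substitution matrix instead.
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]

    # Borders: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap

    # Each cell depends only on its upper-left, upper, and left neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # upper-left: align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,     # upper: gap in b
                          F[i][j - 1] + gap)     # left: gap in a
    return F


F = nw_score_matrix("PAWHEAE", "HEAGAWGHEE")
print(F[0])      # top border: multiples of the gap penalty
print(F[-1][-1])  # global alignment score under the assumed scoring
```

Only the border row and column are expected to agree with the matrix shown earlier, because the interior depends on the substitution scores.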

Main issue – data dependencies

Two possible approaches
◦ Use the existing code and modify it
◦ Develop the code from scratch

The second approach was chosen: the code was developed from scratch

One SPE – one row

Each SPE one cell behind the previous one

Not efficient
◦ DMA overhead

[Figure: SPE1 to SPE7, each assigned one row]
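The one-row-per-SPE scheme can be sketched as a schedule in which row r works on column t - r at time step t, so each row runs one cell behind the row above it. The function name and the step-based timing model are assumptions made for illustration; they are not taken from the talk.

```python
def staggered_schedule(n_rows, n_cols):
    """Return, for each time step, the (row, col) cells computed concurrently.

    Row r is imagined to belong to one SPE and starts r steps late, so it is
    always one cell behind the row above it.
    """
    steps = []
    for t in range(n_rows + n_cols - 1):
        step = [(r, t - r) for r in range(n_rows) if 0 <= t - r < n_cols]
        steps.append(step)
    return steps


for t, cells in enumerate(staggered_schedule(3, 4)):
    print(t, cells)
```

In this model every cell hands a single value to the next row at every step, which is one way to see where the per-cell DMA overhead comes from.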

Grouping cells into tiles

Tile size 8 x 8 up to 64 x 64

[Figure: tiles labelled T1, T2, T3]

Tiles grouped into blocks

Each block is 16 tiles high or more

Algorithm first covers one block, then moves to the next one

[Figure: matrix of n x m cells showing one block and one anti-diagonal]
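The block-by-block traversal can be sketched as follows. The sketch assumes that a block is a horizontal band of tile rows spanning the full width of the matrix and that tiles inside a block are processed anti-diagonal by anti-diagonal; the slides state the block height but not these details, and the function name is hypothetical.

```python
def block_wavefront_order(tile_rows, tile_cols, block_height=16):
    """List waves of (tile_row, tile_col) that could be computed in parallel.

    Assumption: a block is a band of `block_height` tile rows; one block is
    finished completely before the next one starts.
    """
    order = []
    for top in range(0, tile_rows, block_height):
        height = min(block_height, tile_rows - top)
        # Inside a block, tiles on the same anti-diagonal are independent.
        for d in range(height + tile_cols - 1):
            wave = [(top + r, d - r) for r in range(height)
                    if 0 <= d - r < tile_cols]
            order.append(wave)
    return order


for wave in block_wavefront_order(4, 3, block_height=2):
    print(wave)
```

With this ordering, the upper, left, and upper-left neighbours of every tile are always finished in an earlier wave.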

Wavefront algorithm also applied on the tile level

One anti-diagonal – one or more vectors

[Figure: example tile with its anti-diagonals mapped to vectors]
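The idea of treating one anti-diagonal as a vector can be sketched in plain Python, with lists standing in for SIMD registers. All cells with i + j = d depend only on anti-diagonals d - 1 and d - 2, so the three candidate values can be formed as whole vectors and combined with an element-wise maximum. The scoring parameters are the same illustrative assumptions as before, and nothing here uses actual SPE intrinsics.

```python
def nw_by_antidiagonals(a, b, match=5, mismatch=-4, gap=-8):
    """Fill a Needleman-Wunsch matrix one anti-diagonal at a time."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    for d in range(2, n + m + 1):              # cells (i, j) with i + j == d
        rows = range(max(1, d - m), min(n, d - 1) + 1)
        # Three "vectors" built only from the two previous anti-diagonals.
        diag = [F[i - 1][d - i - 1]
                + (match if a[i - 1] == b[d - i - 1] else mismatch)
                for i in rows]
        up = [F[i - 1][d - i] + gap for i in rows]
        left = [F[i][d - i - 1] + gap for i in rows]
        # Element-wise maximum; on SIMD hardware this would be a vector op.
        for i, value in zip(rows, map(max, diag, up, left)):
            F[i][d - i] = value
    return F


print(nw_by_antidiagonals("PAWH", "HEAG")[-1][-1])
```

Because no element of a vector reads another element of the same vector, the result is identical to the usual row-by-row fill.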

Always try to transfer as much as possible

Maximum transfer allowed – 16 KB

Integer size – 4 B

If tile size is 64, the transfer size is

64 x 64 x 2 = 8192 values, 8192 x 4 B = 32768 B = 32 KB

Solution – short integers

New transfer size: 64 x 64 x 2 = 8192 values, 8192 x 2 B = 16384 B = 16 KB
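The arithmetic above can be checked with a short script. The factor of 2 is taken directly from the slide, which does not say what it counts; the function and constant names are hypothetical.

```python
DMA_MAX_BYTES = 16 * 1024  # maximum size of a single DMA transfer, per the slide


def tile_transfer_bytes(tile_size, bytes_per_value, factor=2):
    """Bytes moved for one tile; `factor` is the unexplained x2 from the slide."""
    return tile_size * tile_size * factor * bytes_per_value


for name, width in (("int", 4), ("short", 2)):
    size = tile_transfer_bytes(64, width)
    print(name, size, size <= DMA_MAX_BYTES)
# prints:
# int 32768 False
# short 16384 True
```

One consequence of the switch that the slides do not discuss is that 16-bit signed values restrict scores to the range -32768 to 32767.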

[Figure: timeline showing DMA input and compute phases]

Each SPE – two tiles
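Holding two tiles per SPE suggests a double-buffering scheme, in which the DMA input for the next tile overlaps with the computation on the current one. That reading is an assumption, as is the simple timing model below; the function names and the unit-less durations are hypothetical.

```python
def serial_time(n_tiles, dma, compute):
    """Total time when each tile is fetched and then computed, with no overlap."""
    return n_tiles * (dma + compute)


def overlapped_time(n_tiles, dma, compute):
    """Total time with two buffers: fetch tile k+1 while computing tile k.

    Only the first fetch and the last compute are exposed; every stage in
    between costs the longer of the two phases.
    """
    if n_tiles == 0:
        return 0
    return dma + (n_tiles - 1) * max(dma, compute) + compute


print(serial_time(4, 1, 3), overlapped_time(4, 1, 3))
```

Under this model the transfer cost is hidden almost entirely whenever computing a tile takes at least as long as fetching one.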

The Cell Broadband Engine can be employed efficiently for sequence alignment

Further optimisation needed
◦ Reduction of context creations
◦ Inter-SPE communication
◦ Implementing sequence alignment across multiple pairs of sequences
◦ Using ALF – Accelerated Library Framework