Cell Processor Based Sequence Alignment
Prof. Vassil Alexandrov
Janko Straßburg
University of Reading, Aristotle University, University Carlos III
European Commission
Sequence Alignment on the Playstation 3
◦ Six SPEs
◦ Smith-Waterman Algorithm only
Accelerating Multiple Sequence Alignment with the Cell BE Processor
◦ Designed to accelerate a particular sequence alignment application
Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine
◦ Smith-Waterman Algorithm
HBA_HUMAN  GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL
           G+ +VK+HGKKV  A+++++AH+D++ +++++LS+LH  KL
HBB_HUMAN  GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL
Bioinformatics
Used for aligning sequences of DNA nucleotides or amino acids (proteins)
Large volumes of data, requiring substantial computational power
Matrix
◦ Scoring matrix
◦ Traceback matrix
          H    E    A    G    A    W    G    H    E    E
     0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
P   -8   -2   -9  -17  -25  -33  -42  -49  -57  -65  -73
A  -16  -10   -3   -4  -12  -20  -28  -36  -44  -52  -60
W  -24  -18  -11   -6   -7  -15   -5  -13  -21  -29  -37
H  -32  -14  -18  -13   -8   -9  -13   -7   -3  -11  -19
E  -40  -22   -8  -16  -16   -9  -12  -15   -7    3   -5
A  -48  -30  -16   -3  -11  -11  -12  -12  -15   -5    2
E  -56  -38  -24  -11   -6  -12  -14  -15  -12   -9    1
Needleman-Wunsch
Scoring matrix
Each cell’s value is based on its upper, left, and upper-left neighbour
Main issue – data dependencies
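The dependency on the upper, left, and upper-left neighbours can be sketched in a few lines of Python. This is only an illustration, not the presenters' Cell code: the function name and the match/mismatch scores are assumptions, and the gap penalty of -8 is taken from the first row of the example matrix.

```python
def nw_score_matrix(a, b, match=1, mismatch=-1, gap=-8):
    """Fill a Needleman-Wunsch scoring matrix row by row.

    match/mismatch are placeholder scores (assumption); a substitution
    matrix would normally be used for proteins.
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # First column and first row hold cumulative gap penalties.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # upper-left neighbour: align a[i-1] with b[j-1]
                F[i - 1][j] + gap,    # upper neighbour: gap in b
                F[i][j - 1] + gap,    # left neighbour: gap in a
            )
    return F
```

Because every cell reads three earlier cells, a row cannot simply be handed to one processor and computed in isolation; this is the data dependency the slide refers to.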
Two possible approaches
◦ Use the existing code and modify it
◦ Develop the code from scratch
The latter was chosen
One SPE – one row
Each SPE one cell behind the previous one
Not efficient
◦ DMA overhead
[Figure: matrix rows assigned to SPE1 through SPE7]
Tiles grouped into blocks
Each block is 16 tiles high or more
Algorithm first covers one block, then moves to the next one
[Figure: matrix of n × m cells divided into tiles, with labels marking one block and one anti-diagonal]
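One way to picture the block-by-block traversal is as a schedule of tile "waves". The sketch below is an assumption about how such a schedule could look, not the presenters' scheduler: `tile_schedule` and its parameters are hypothetical names, blocks are taken to be horizontal bands of tile rows, and tiles inside a block are visited one anti-diagonal at a time.

```python
def tile_schedule(tile_rows, tile_cols, block_height=16):
    """Yield lists of (row, col) tile coordinates.

    Tiles in the same list lie on one anti-diagonal of a block and do not
    depend on each other, so they could be handed to different SPEs.
    """
    for top in range(0, tile_rows, block_height):
        # The last block may be shorter than block_height.
        height = min(block_height, tile_rows - top)
        for d in range(height + tile_cols - 1):
            wave = [
                (top + r, d - r)
                for r in range(height)
                if 0 <= d - r < tile_cols
            ]
            yield wave
```

For a grid of 2 × 3 tiles with `block_height=2`, the waves are `[(0, 0)]`, `[(0, 1), (1, 0)]`, `[(0, 2), (1, 1)]`, `[(1, 2)]`: each tile appears only after its upper, left, and upper-left neighbours.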
Wavefront algorithm also applied on the tile level
One anti-diagonal – one or more vectors
[Figure: a tile of the scoring matrix, filled one anti-diagonal at a time]
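The reason an anti-diagonal maps onto vectors is that its cells depend only on the two previous anti-diagonals, never on each other. The following scalar Python sketch illustrates that ordering; it is not SIMD code, the function name and the match/mismatch scores are assumptions, and on the SPE the inner list of values would correspond to one or more vector operations.

```python
def fill_by_antidiagonals(a, b, match=1, mismatch=-1, gap=-8):
    """Fill the scoring matrix one anti-diagonal (i + j == d) at a time."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        F[i][0] = i * gap
    for j in range(cols):
        F[0][j] = j * gap
    for d in range(2, rows + cols - 1):
        # Interior cells (i >= 1, j >= 1) on this anti-diagonal.
        i_lo = max(1, d - (cols - 1))
        i_hi = min(rows - 1, d - 1)
        cells = [(i, d - i) for i in range(i_lo, i_hi + 1)]
        # All values are computed before any is stored: they read only
        # anti-diagonals d-1 and d-2, so they are mutually independent.
        values = [
            max(
                F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch),
                F[i - 1][j] + gap,
                F[i][j - 1] + gap,
            )
            for i, j in cells
        ]
        for (i, j), v in zip(cells, values):
            F[i][j] = v
    return F
```

The result is identical to a row-by-row fill; only the order of evaluation changes.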
Always try to transfer as much as possible
Maximum transfer allowed – 16 KB
Integer size – 4 B
If tile size is 64, the transfer size is
64 × 64 × 2 = 8192 values, 8192 × 4 B = 32 768 B = 32 KB
Solution – short integers (2 B)
New transfer size: 64 × 64 × 2 = 8192 values, 8192 × 2 B = 16 384 B = 16 KB
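The transfer-size arithmetic can be checked with a small helper. The names below are hypothetical, and the `arrays=2` default simply mirrors the factor of 2 in the slide's formula without assuming what the two arrays are.

```python
DMA_LIMIT_BYTES = 16 * 1024  # maximum size of a single transfer, per the slide


def transfer_bytes(tile_size, element_bytes, arrays=2):
    """Bytes needed to move `arrays` square tiles of tile_size x tile_size."""
    return tile_size * tile_size * arrays * element_bytes


def fits_in_one_transfer(tile_size, element_bytes, arrays=2):
    """True if the tile data fits within one 16 KB transfer."""
    return transfer_bytes(tile_size, element_bytes, arrays) <= DMA_LIMIT_BYTES
```

With these definitions, `transfer_bytes(64, 4)` gives 32768 (too large for one transfer) and `transfer_bytes(64, 2)` gives 16384, exactly the 16 KB limit.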
It is possible to employ the Cell Broadband Engine efficiently for sequence alignment
Further optimisation needed
◦ Reduction of context creations
◦ Inter-SPE communication
◦ Implementing sequence alignment across multiple pairs of sequences
◦ Using ALF – Accelerated Library Framework