www.bioalgorithms.infoan introduction to bioinformatics algorithms block alignment and the...
Post on 20-Dec-2015
219 views
TRANSCRIPT
![Page 1: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/1.jpg)
www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms
Block Alignment and the Four-Russians Speedup
Presenter: Yung-Hsing Peng
Date: 2004.12.03
![Page 2: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/2.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Outline
• Block Alignment• Four-Russians Speedup• Constructing LCS in Sub-quadratic
Time
![Page 3: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/3.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Is It Possible to Align Sequences in Subquadratic Time?• Dynamic Programming takes O(n2) for global
alignment• Can we do better?• Yes, use Four-Russians Speedup
![Page 4: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/4.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Blocks
partition
n n/t
n/t
t
tn
![Page 5: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/5.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Block Alignment: Examples
valid invalid
![Page 6: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/6.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Block Alignment Problem
• Goal: Find the longest block path through an edit graph
• Input: Two sequences, u and v partitioned into blocks of size t. This is equivalent to an n x n edit graph partitioned into t x t subgrids
• Output: The block alignment of u and v with the maximum score (longest block path through the edit graph
![Page 7: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/7.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Constructing Alignments within Blocks
n/t
Block pair represented by each small square
Solve mini-alignmnent problems
![Page 8: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/8.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Block Alignment: Dynamic Programming• Let si,j denote the optimal block alignment
score between the first i blocks of u and first j blocks of v
si,j = maxsi-1,j - block
si,j-1 - block
si-1,j-1 + i,,j
block is the penalty
for inserting or deleting an entire block
i,j is score of block
in row i, column j.
![Page 9: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/9.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Block Alignment Runtime (cont’d)
• Computing all i,j requires solving (n/t)*(n/t)
mini block alignments, each of size (t*t)
• So computing i,j takes O([n/t]*[n/t]*t*t) =
O(n2) time• This is the same as dynamic programming• How do we speed this up?
![Page 10: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/10.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Look-up Table for Four Russians Technique
Lookup table “Score”
AAAAAA
AAAAAC
AAAAAG
AAAAAT
AAAACA…
AAAAAA
AAAAAC
AAAAAG
AAAAAT
AAAACA
…
each sequence has t nucleotides
size is only n, instead of (n/t)*(n/t)
![Page 11: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/11.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
New Recurrency
• The new lookup table Score is indexed by a pair of t-nucleotide strings, so
si,j = maxsi-1,j - block
si,j-1 - block
si-1,j-1 + Score(ith block of v, jth block of u
![Page 12: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/12.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Four Russians Technique
• Let t = log(n), where t is block size, n is sequence size.
• Instead of having (n/t)*(n/t) mini-alignments, construct 4t x 4t mini-alignments for all pairs of strings of t nucleotides (huge size), and put in a lookup table.
• However, size of lookup table is not rreally that huge if t is small. Let t = (logn)/4. Then 4t x 4t
= n
![Page 13: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/13.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Four Russians Speedup Runtime• Each access takes O(logn) time, because we
have to transfer the little string into an index.
• Overall running time: O( [n2/t2]*logn )
• Since t = logn, substitute in:• O( [n2/{logn}2]*logn) > O( n2/logn )
![Page 14: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/14.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
So Far…
• We know how to speed up when we are solving block alignment
• In order to speed up the mini-alignment calculations to under n2, we create a lookup table of size n, which consists of all scores for all t-nucleotide pairs
• Running time goes from quadratic, O(n2), to subquadratic: O(n2/logn)
![Page 15: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/15.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Block Alignment vs. LCS
• Unlike the block partitioned graph, the LCS path does not have to pass through the vertices of the blocks.
block alignment longest common subsequence
![Page 16: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/16.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Points of Interest
block alignment has (n/t)*(n/t) = (n2/t2) points of interest
LCS alignment has O(n2/t) points of interest
![Page 17: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/17.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Traversing Blocks for LCS (cont’d)
• If we used this to compute the grid, it would take quadratic, O(n2) time, but we want to do better.
we know these scores
we can calculate these scores
t x t block
![Page 18: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/18.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Four Russians Speedup
• Build a lookup table for all possible values of the four variables: 1. subsequence of sequence u in this block (4t possibilities)
2. subsequence of sequence v in this block (4t possibilities)
3. all pairs of possible scores for the first row s*,j
4. all pairs of possible scores for the first column s*,j
• For each quadruple we store the value of the score for the last row and last column.
• This will be a huge table, but we can eliminate alignments scores that don’t make sense
![Page 19: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/19.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Efficient Encoding of Alignment Scores• Instead of recording numbers that correspond
to the index in the sequences u and v, we can use binary to encode the differences between the alignment scores
0 1 2 2 3 4
1 1 0 1 1
original encoding
binary encoding
![Page 20: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/20.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
An Example---- C T T C G A T G A
---- 0 0 0 0 0 0 0 0 0 0
T 0 0 1 1 1 1 1 1 1 1
T 0 0 1 2 2 2 2 2 2 2
A 0 0 1 2 2 2 3 3 3 3
C 0 1 1 2 3 3 3 3 3 3
G 0 1 1 2 3 4 4 4 4 4
T 0 1 2 2 3 4 4 5 5 5
G 0 1 2 2 3 4 4 5 6 6
C 0 1 2 2 3 4 4 5 6 6
A 0 1 2 2 3 4 5 5 6 7
![Page 21: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/21.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
An Example---- C T T C G A T G A
----
T 0/0 1 0 0 1/0 0 0 0 1/0
T 0 1 1
A 0 0 1
C 1 1 0
G 0/1 0 1 1 1/1 0 0 0 1/0
T 0 0 1
G 0 0 1
C 0 0 0
A 0/1 1 0 1 0/1 1 0 1 1/1
![Page 22: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/22.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Reducing Lookup Table Size
• 2t possible scores (t = # blocks)• 4t possible strings
• Lookup table size is (2t * 2t)*(4t * 4t) = 26t
• Let t = (logn)/4;• Table size is: 26((logn)/4) = n(6/4) = n(3/2)
• Time = O( [n2/t2]*logn )• O( [n2/{logn}2]*logn) > O( n2/logn )
![Page 23: Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03](https://reader035.vdocuments.us/reader035/viewer/2022062714/56649d485503460f94a23733/html5/thumbnails/23.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Summary
• We take advantage of the fact that for each block of t = log(n), we can pre-compute all possible scores and store them in a lookup table of size n(3/2)
• We used the Four Russian speedup to go from a quadratic running time for LCS to subquadratic running time: O(n2/logn)