Space/Time Tradeoff and Heuristic Approaches in Pairwise Alignment
Aligning two sequences of length ~1,000 requires a table of ~1,000,000 cells
Can we use less space if we only want the alignment score?
Hint: the table was constructed one row at a time
Alignment and Resources
- G A T T A C A
- 0 -2 -4 -6 -8 -10 -12 -14
C -2 -1 -3 -5 -7 -9 -8 -10
A -4 -3 1 -1 -3 -5 -7 -6
C -6 -5 -1 0 -2 -4 -3 -5
T -8 -7 -3 1 2 0 -2 -4
A -10 -9 -5 -1 0 4 2 0
G -12 -8 -7 -3 -2 2 3 1
If only the alignment score is needed, it can be computed using a matrix of only two rows
- G A T T A C A
- 0 -2 -4 -6 -8 -10 -12 -14
C -2 -1 -3 -5 -7 -9 -8 -10
A -4 -3 1 -1 -3 -5 -7 -6
C -6 -5 -1 0 -2 -4 -3 -5
T -8 -7 -3 1 2 0 -2 -4
A -10 -9 -5 -1 0 4 2 0
G -12 -8 -7 -3 -2 2 3 1
- G A T T A C A
- 0 -2 -4 -6 -8 -10 -12 -14
C
A
C
T
A
G
- G A T T A C A
- 0 -2 -4 -6 -8 -10 -12 -14
C -2 -1 -3 -5 -7 -9 -8 -10
A
C
T
A
G
- G A T T A C A
-
C -2 -1 -3 -5 -7 -9 -8 -10
A
C
T
A
G
- G A T T A C A
-
C -2 -1 -3 -5 -7 -9 -8 -10
A -4 -3 1 -1 -3 -5 -7 -6
C
T
A
G
- G A T T A C A
-
C
A -4 -3 1 -1 -3 -5 -7 -6
C
T
A
G
- G A T T A C A
-
C
A -4 -3 1 -1 -3 -5 -7 -6
C -6 -5 -1 0 -2 -4 -3 -5
T
A
G
- G A T T A C A
-
C
A
C -6 -5 -1 0 -2 -4 -3 -5
T
A
G
- G A T T A C A
-
C
A
C -6 -5 -1 0 -2 -4 -3 -5
T -8 -7 -3 1 2 0 -2 -4
A
G
- G A T T A C A
-
C
A
C
T -8 -7 -3 1 2 0 -2 -4
A
G
- G A T T A C A
-
C
A
C
T -8 -7 -3 1 2 0 -2 -4
A -10 -9 -5 -1 0 4 2 0
G
- G A T T A C A
-
C
A
C
T
A -10 -9 -5 -1 0 4 2 0
G
- G A T T A C A
-
C
A
C
T
A -10 -9 -5 -1 0 4 2 0
G -12 -8 -7 -3 -2 2 3 1
If the sequences have lengths m and n, only 2*min(m, n) cells are needed to compute the alignment score (the “window” could instead have been two columns, slid across the table)
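The two-row idea can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `alignment_score` is hypothetical, and the scoring values (match +2, mismatch -1, gap -2) are an assumption inferred from the numbers in the example table, since the slides do not state them.

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score, keeping only two rows of the table in memory."""
    # Put the shorter sequence along the columns, so each row is as short as possible.
    if len(b) > len(a):
        a, b = b, a
    # Row for the empty prefix of a: only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s,    # diagonal: align a[i-1] with b[j-1]
                          prev[j] + gap,      # up: gap in b
                          curr[j - 1] + gap)  # left: gap in a
        prev = curr  # slide the two-row window down by one row
    return prev[-1]


print(alignment_score("GATTACA", "CACTAG"))  # 1, the bottom-right cell of the table
```

Only `prev` and `curr` are alive at any time, which is why the trace-back arrows (and hence the alignment itself) are not available afterwards.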
The alignment itself cannot be recovered, because the trace-back arrows are not stored
It is possible to design an algorithm that uses m+n cells but still recovers the alignment
D. S. Hirschberg. Algorithms for the longest common subsequence problem. J. ACM, 24:664-675, 1977.
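One way to recover the alignment in linear space is a divide-and-conquer scheme in the style of Hirschberg: split one sequence in half, use two-row passes from both ends to find where the optimal path crosses the middle, and recurse on the two halves. The sketch below is illustrative only. The names `last_row`, `hirschberg` and `score` are hypothetical, and the scores (match +2, mismatch -1, gap -2) are an assumption inferred from the example table.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # assumed scores, inferred from the example table


def sub(x, y):
    return MATCH if x == y else MISMATCH


def last_row(a, b):
    """Last row of the global alignment table for a vs b, using two rows."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = max(prev[j - 1] + sub(a[i - 1], b[j - 1]),
                          prev[j] + GAP, curr[j - 1] + GAP)
        prev = curr
    return prev


def hirschberg(a, b):
    """Return an optimal global alignment of a and b as two gapped strings."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Either pair the single letter with its best partner in b, or gap it.
        j = max(range(len(b)), key=lambda k: sub(a, b[k]))
        if sub(a, b[j]) >= 2 * GAP:
            return "-" * j + a + "-" * (len(b) - j - 1), b
        return a + "-" * len(b), "-" + b
    mid = len(a) // 2
    left = last_row(a[:mid], b)               # forward scores into the middle row
    right = last_row(a[mid:][::-1], b[::-1])  # backward scores into the middle row
    split = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    top = hirschberg(a[:mid], b[:split])
    bottom = hirschberg(a[mid:], b[split:])
    return top[0] + bottom[0], top[1] + bottom[1]


def score(x, y):
    """Score of a gapped alignment given as two equal-length strings."""
    return sum(GAP if "-" in (p, q) else sub(p, q) for p, q in zip(x, y))


x, y = hirschberg("GATTACA", "CACTAG")
print(x)
print(y)
print(score(x, y))
```

The recursion roughly doubles the number of cells computed compared with a single pass, which is the time paid for the space saved.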
Given two sequences, each of length ~1,000:
the original algorithm must store ~1,000,000 = 1,000*1,000 cells
the modified version requires 2,000 = 2*min(1,000, 1,000) cells
If the value of a cell can be computed in 1 μs, how much time does each algorithm require?
The algorithms are impractical if you need to search through a database of hundreds of thousands of sequences
Heuristic approaches (BLAST, FASTA) have been developed to cope with this problem
May not find overall best alignment, but do well in practice
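The timing question can be worked through with a little arithmetic. The sketch below assumes that both variants still compute every cell of the table (the two-row version only stores fewer of them), and it uses a hypothetical database of 100,000 sequences of length 1,000 to stand in for "hundreds of thousands".

```python
m = n = 1_000
cell_time_us = 1                       # 1 microsecond per cell, as stated above

cells_computed = m * n                 # both variants fill in every cell
cells_stored_full = m * n              # original algorithm keeps the whole table
cells_stored_two_rows = 2 * min(m, n)  # modified version keeps two rows

seconds_per_pair = cells_computed * cell_time_us / 1_000_000

db_size = 100_000                      # hypothetical database size
hours_for_db = db_size * seconds_per_pair / 3600

print(cells_stored_full, cells_stored_two_rows)  # 1000000 2000
print(seconds_per_pair)                          # 1.0
print(round(hours_for_db, 1))                    # 27.8
```

Under these assumptions the two-row trick saves memory but not time: each pairwise comparison still takes about one second, so a single query against the whole database takes more than a day.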
Basic Local Alignment Search Tool – computes local alignments and performs very well in practice
Altschul, Gish, Miller, Myers, Lipman. Basic Local Alignment Search Tool. Journal of Molecular Biology, 215(3), 403-410, 1990.
BLAST
[Diagram: QUERY sequence(s) and BLAST database are inputs to the BLAST program, which produces the BLAST results]
Main idea: identify short stretches of high-scoring local alignment between the query and target sequences, then extend them
“The central idea of the BLAST algorithm is to confine attention to segment pairs that contain a word pair of length w with a score of at least T.”
Altschul et al. (1990)
The procedure:
use sliding window to extract all words of size w from query sequence
for each word build a “hit list” of words with pairwise score at least T
scan database for sequences that have words from “hit list”
extend each hit until score drops below some cutoff
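The four steps of the procedure can be sketched as follows. This is a toy illustration, not the actual BLAST implementation: all function names are hypothetical, the scoring (+5 match, -4 mismatch over a DNA alphabet) is an assumed stand-in for a real substitution matrix, and the extension rule (stop once the running score falls more than `drop` below the best seen) is only one possible reading of "until score drops below some cutoff".

```python
from itertools import product


def pair_score(a, b):
    return 5 if a == b else -4  # assumed toy scoring


def words(seq, w):
    """Step 1: all (position, word) pairs from a sliding window of size w."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def hit_list(word, alphabet, T):
    """Step 2: every word whose pairwise score against `word` is at least T."""
    return ["".join(c) for c in product(alphabet, repeat=len(word))
            if sum(pair_score(x, y) for x, y in zip(word, c)) >= T]


def extend_one_way(query, target, qi, ti, step, drop):
    """Step 4 (one direction): best score gained by ungapped extension."""
    best = run = 0
    while 0 <= qi < len(query) and 0 <= ti < len(target):
        run += pair_score(query[qi], target[ti])
        best = max(best, run)
        if best - run > drop:
            break
        qi += step
        ti += step
    return best


def best_hit_score(query, target, w, T, drop, alphabet="ACGT"):
    index = {}
    for qi, word in words(query, w):
        for hit in hit_list(word, alphabet, T):
            index.setdefault(hit, []).append(qi)
    best = 0
    for ti, tword in words(target, w):        # step 3: scan the target
        for qi in index.get(tword, []):
            seed = sum(pair_score(query[qi + k], target[ti + k]) for k in range(w))
            total = (seed
                     + extend_one_way(query, target, qi + w, ti + w, +1, drop)
                     + extend_one_way(query, target, qi - 1, ti - 1, -1, drop))
            best = max(best, total)
    return best


print(best_hit_score("GATTACA", "TTGATTACATT", w=3, T=15, drop=10))  # 35
```

Lowering `T` enlarges each hit list (with this scoring, `T=6` also admits words with one mismatch), which trades speed for sensitivity.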
Example with w=3, T=11, query = …FSGTWYA…
use sliding window to extract all words of size w from query sequence
… FSG, SGT, GTW, TWY, WYA, …
for each word build a “hit list” of words with pairwise score at least T
GTW:
  GTW 6,5,11 = 22
  ASW 0,1,11 = 12
  QTW -2,5,11 = 14
scan database for sequences that have words from “hit list”
extend each hit until score drops below some cutoff
ENFDKARFSGTWYAMAKKD
QNFDKTRYAGTWYAVAKKD
Adapted from JHMI 140.638.01
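The arithmetic in this example can be checked with a short script. It is a sketch under stated assumptions: the per-residue scores are copied from the example itself (they appear to be BLOSUM62 entries), only the pairs needed here are included, and the names `SLIDE_SCORES`, `words` and `word_score` are hypothetical.

```python
# (query residue, candidate residue) -> score, as quoted in the example
SLIDE_SCORES = {
    ("G", "G"): 6, ("T", "T"): 5, ("W", "W"): 11,
    ("G", "A"): 0, ("T", "S"): 1, ("G", "Q"): -2,
}


def words(seq, w):
    """All words of size w from a sliding window over seq."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def word_score(query_word, candidate):
    """Sum of the per-position scores between two words of equal length."""
    return sum(SLIDE_SCORES[(q, c)] for q, c in zip(query_word, candidate))


T = 11
query_words = words("FSGTWYA", 3)
hit_scores = {c: word_score("GTW", c) for c in ("GTW", "ASW", "QTW")}

print(query_words)  # ['FSG', 'SGT', 'GTW', 'TWY', 'WYA']
print(hit_scores)   # {'GTW': 22, 'ASW': 12, 'QTW': 14}
print(all(s >= T for s in hit_scores.values()))  # True
```

All three candidate words clear the threshold T=11, so each would be placed on the hit list for the query word GTW.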
Runs dynamic programming on a restricted part of the table
Lipman, Pearson. Rapid and sensitive protein similarity searches. Science, 227(4693): 1435-41, 1985.
Procedure:
identify all matches of size k between the sequences (dot-plot like); these matches will form diagonals in the matrix
keep only the top scoring matches (using PAMn, BLOSUMn) – the score for these matches is called init1
attempt to join any of the top scoring regions if they could form longer alignment – the score for these alignments is called initn
apply full dynamic programming on a narrow band around the high scoring diagonal – the score for the final alignment is called opt
FASTA
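The first step of the procedure (finding k-letter matches and seeing which diagonal they fall on) can be sketched as below. This covers only that first step: scoring with a PAM or BLOSUM matrix, joining regions, and the banded dynamic programming are left out. The function name `ktuple_diagonals` is hypothetical, and the two protein fragments are used purely as test data.

```python
from collections import Counter


def ktuple_diagonals(s, t, k):
    """Count exact k-letter matches between s and t on each diagonal (i - j)."""
    # Index every k-letter word of t by its start positions.
    positions = {}
    for j in range(len(t) - k + 1):
        positions.setdefault(t[j:j + k], []).append(j)
    # Look up each k-letter word of s and record the diagonal of every match.
    diagonals = Counter()
    for i in range(len(s) - k + 1):
        for j in positions.get(s[i:i + k], []):
            diagonals[i - j] += 1
    return diagonals


s = "ENFDKARFSGTWYAMAKKD"
t = "QNFDKTRYAGTWYAVAKKD"
diagonals = ktuple_diagonals(s, t, k=2)
best_diagonal, hits = diagonals.most_common(1)[0]
print(best_diagonal, hits)  # 0 10
```

A narrow band around the diagonal with the most matches is where the full dynamic programming of the last step would then be run.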
“Protein Structure prediction – a practical approach”
Python Programming
  be able to write Python functions
  be able to predict the output of a function
Chapter 4
  4.1: principles of sequence alignment
  4.2: scoring alignments, dot plots
  4.3: substitution matrices (high-level difference PAM vs BLOSUM)
  4.4: handling gaps
  4.5: types of alignment (pairwise only)
  4.6: searching databases (BLAST, FASTA)
Chapter 5
  5.1: substitution matrices (know how BLOSUM works, up to p.124)
  5.2: dynamic programming algorithms (skip pp.134, 135)
Chapter 8
  8.1: Jukes-Cantor, Kimura models (pp.271-273)
Exam Topics