![Page 1: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/1.jpg)
www.bioalgorithms.info An Introduction to Bioinformatics Algorithms
Dynamic Programming: Edit Distance
![Page 2: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/2.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
1. DNA Sequence Comparison and CF 2. Change Problem 3. Manhattan Tourist Problem 4. Longest Paths in Graphs 5. Sequence Alignment 6. Edit Distance
Outline�
![Page 3: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/3.jpg)
www.bioalgorithms.info An Introduction to Bioinformatics Algorithms
Section 5: Sequence Alignment
![Page 4: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/4.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
���#�,'��"'$' 0����)-�&����$" &%�&,�
• Recall that our original problem was to fit a similarity score on two DNA sequences:
• To do this we will use what is called an alignment matrix.
![Page 5: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/5.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
�$" &%�&,���,*"/���/�%($��
• Example of Alignment : 2 * k matrix ( k > m, n )
A T -- G T A T --
A T C G -- A -- C
letters of v
letters of w
T
T
4 matches 2 insertions 2 deletions
• Given 2 DNA sequences v and w of length m and n:
v: ATCTGAT m=7 w: TGCATA n=6
![Page 6: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/6.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Given two sequences v = v1 v2…vm and w = w1 w2…wn
a common subsequence of v and w is a sequence of positions in v: 1 < i1 < i2 < … < it < m and a sequence of positions in w: 1 < j1 < j2 < … < jt < n such that the it -th letter of v
is equal to the jt-th letter of w.
• Example: v = ATGCCAT, w = TCGGGCTATC. Then take: • i1 = 2, i2 = 3, i3 = 6, i4 = 7 • j1 = 1, j2 = 3, j3 = 8, j4 = 9 • This gives us that the common subsequence is TGAT.
'%%'&��-�+�)-�&���
![Page 7: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/7.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Given two sequences v = v1 v2…vm and w = w1 w2…wn
the Longest Common Subsequence (LCS) of v and w is a sequence of positions in v: 1 < i1 < i2 < … < iT < m and a sequence of positions in w: 1 < j1 < j2 < … < jT < n such that the it -th letter of v is equal to jt-th letter of w and T is maximal.
• Example: v = ATGCCAT, w = TCGGGCTATC. • Before we found that TGAT is a subsequence. • The longest subsequence is TGCAT. • But…how do we find the LCS of two sequences?
�'& �+,�'%%'&��-�+�)-�&���
![Page 8: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/8.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
T
G
C
A
T
A
C
1
2
3
4
5
6
7
0 i
A T C T G A T C0 1 2 3 4 5 6 7 8
j • Assign one sequence to the rows, and one to the columns.
��",� *�(!��'*�����*'�$�%�
![Page 9: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/9.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
T
G
C
A
T
A
C
1
2
3
4
5
6
7
0 i
A T C T G A T C0 1 2 3 4 5 6 7 8
j • Assign one sequence to the rows, and one to the columns.
• Every diagonal edge represents a match of elements.
��",� *�(!��'*�����*'�$�%�
![Page 10: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/10.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
T
G
C
A
T
A
C
1
2
3
4
5
6
7
0 i
A T C T G A T C0 1 2 3 4 5 6 7 8
j • Assign one sequence to the rows, and one to the columns.
• Every diagonal edge represents a match of elements.
• Therefore, in a path from source to sink, the diagonal edges represent a common subsequence. Common Subsequence: TGAT
��",� *�(!��'*�����*'�$�%�
![Page 11: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/11.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
T
G
C
A
T
A
C
1
2
3
4
5
6
7
0 i
A T C T G A T C0 1 2 3 4 5 6 7 8
j • LCS Problem: Find a path with the maximum number of diagonal edges.
Common Subsequence: TGAT
��",� *�(!��'*�����*'�$�%�
![Page 12: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/12.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Let vi = prefix of v of length i: v1 … vi
• and wj = prefix of w of length j: w1 … wj
• The length of LCS(vi,wj) is computed by:
'%(-,"& �,!������0&�%"���*' *�%%"& �
�
si, j =maxsi−1, jsi, j−1si−1, j−1 +1 if vi = w j
⎧
⎨ ⎪
⎩ ⎪
![Page 13: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/13.jpg)
www.bioalgorithms.info An Introduction to Bioinformatics Algorithms
Section 6: Edit Distance
![Page 14: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/14.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• The Hamming Distance dH(v, w) between two DNA sequences v and w of the same length is equal to the number of places in which the two sequences differ.
• Example: Given as follows, dH(v, w) = 8:
• However, note that these sequences are still very similar. • Hamming Distance is therefore not an ideal similarity score,
because it ignores insertions and deletions.
��%%"& �"+,�&���
v: ATATATAT w: TATATATA
![Page 15: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/15.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Levenshtein (1966) introduced the edit distance between two strings as the minimum number of elementary operations (insertions, deletions, and substitutions) needed to transform one string into the other
d(v,w) = MIN number of elementary operations to transform v w
��",�"+,�&���
![Page 16: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/16.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• So in our previous example, we can shift w one nucleotide to the right, and see that w is obtained from v by one insertion and one deletion:
• Hence the edit distance, d(v, w) = 2.
• Note: In order to provide this distance, we had to “fiddle” with the sequences. Hamming distance was easier to find.
��",�"+,�&�����/�%($����
v: ATATATAT- w: -TATATATA
![Page 17: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/17.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• We can transform TGCATAT ATCCGAT in 5 steps:
TGCATAT
��",�"+,�&�����/�%($����
![Page 18: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/18.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• We can transform TGCATAT ATCCGAT in 5 steps:
TGCATAT (delete last T)
��",�"+,�&�����/�%($����
![Page 19: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/19.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• We can transform TGCATAT ATCCGAT in 5 steps:
TGCATAT (delete last T) TGCATA (delete last A)
��",�"+,�&�����/�%($����
![Page 20: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/20.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• We can transform TGCATAT ATCCGAT in 5 steps:
TGCATAT (delete last T) TGCATA (delete last A) ATGCAT (insert A at front)
��",�"+,�&�����/�%($����
![Page 21: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/21.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• We can transform TGCATAT ATCCGAT in 5 steps:
TGCATAT (delete last T) TGCATA (delete last A) ATGCAT (insert A at front) ATCCAT (substitute C for G)
��",�"+,�&�����/�%($����
![Page 22: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/22.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• We can transform TGCATAT ATCCGAT in 5 steps:
TGCATAT (delete last T) TGCATA (delete last A) ATGCAT (insert A at front) ATCCAT (substitute C for G) ATCCGAT (insert G before last A)
��",�"+,�&�����/�%($����
![Page 23: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/23.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• We can transform TGCATAT ATCCGAT in 5 steps:
TGCATAT (delete last T) TGCATA (delete last A) ATGCAT (insert A at front) ATCCAT (substitute C for G) ATCCGAT (insert G before last A)
• Note: This only allows us to conclude that the edit distance is at most 5.
��",�"+,�&�����/�%($����
![Page 24: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/24.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Now we transform TGCATAT ATCCGAT in 4 steps:
ATGCATAT
��",�"+,�&�����/�%($����
![Page 25: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/25.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Now we transform TGCATAT ATCCGAT in 4 steps:
ATGCATAT (insert A at front)
��",�"+,�&�����/�%($����
![Page 26: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/26.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Now we transform TGCATAT ATCCGAT in 4 steps:
ATGCATAT (insert A at front) ATGCATAT (delete second T)
��",�"+,�&�����/�%($����
![Page 27: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/27.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Now we transform TGCATAT ATCCGAT in 4 steps:
ATGCATAT (insert A at front) ATGCATAT (delete second T) ATGCGAT (substitute G for A)
��",�"+,�&�����/�%($����
![Page 28: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/28.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Now we transform TGCATAT ATCCGAT in 4 steps:
ATGCATAT (insert A at front) ATGCATAT (delete second T) ATGCGAT (substitute G for A) ATCCGAT (substitute C for G)
��",�"+,�&�����/�%($����
![Page 29: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/29.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Now we transform TGCATAT ATCCGAT in 4 steps:
ATGCATAT (insert A at front) ATGCATAT (delete second T) ATGCGAT (substitute G for A) ATCCGAT (substitute C for G)
• Can we do even better? 3 steps? 2 steps? How can we know?
��",�"+,�&�����/�%($����
![Page 30: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/30.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Theorem: Given two sequences v and w of length m and n, the edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w) is the length of the longest common subsequence of v and w.
• This is great news, because it means that if solving the LCS problem for v and w is equivalent to finding the edit distance between them.
• Therefore, we will solve the LCS problem instead in the following slides.
��0���+-$,�
![Page 31: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/31.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Every alignment corresponds to a path from source to sink.
��,-*&�,'�,!����",� *�(!�
![Page 32: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/32.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Every alignment corresponds to a path from source to sink.
• Horizontal and vertical edges correspond to indels (deletions and insertions).
��,-*&�,'�,!����",� *�(!�
![Page 33: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/33.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Every alignment corresponds to a path from source to sink.
• Horizontal and vertical edges correspond to indels (deletions and insertions).
• Diagonal edges correspond to matches and mismatches.
��,-*&�,'�,!����",� *�(!�
![Page 34: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/34.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Suppose that our sequences are ATCGTAC, ATGTTAT.
• One possible alignment is:
• Edit graph path: (0,0)(1,1)(2,2)
(2,3)(3,4)etc.
Alignment as a Path in the Edit Graph: Example
![Page 35: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/35.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Initialize 0th row and 0th
column to be all zeroes.
Alignment with Dynamic Programming
![Page 36: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/36.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Si-1, j-1 + 1
Si, j = max Si-1, j
Si, j-1
• Initialize 0th row and 0th
column to be all zeroes.
• Use the following recursive formula to calculate Si,j for each i, j:
Alignment with Dynamic Programming
![Page 37: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/37.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Note that the placement of the arrows shows where a given score originated from: • if from the top • if from the left • if vi = wj
0&�%"���*' *�%%"& ��/�%($��
![Page 38: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/38.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Continuing with the dynamic programming algorithm fills the table.
Dynamic Programming Example
![Page 39: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/39.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Continuing with the dynamic programming algorithm fills the table.
• We first look for matches, highlighted in red, and then we fill in the rest of that row and column.
Dynamic Programming Example
![Page 40: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/40.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Continuing with the dynamic programming algorithm fills the table.
• We first look for matches, highlighted in red, and then we fill in the rest of that row and column.
Dynamic Programming Example
![Page 41: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/41.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Continuing with the dynamic programming algorithm fills the table.
• We first look for matches, highlighted in red, and then we fill in the rest of that row and column.
• As we can see, we do not simply add 1 each time.
Dynamic Programming Example
![Page 42: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/42.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Continuing with the dynamic programming algorithm fills the table.
• We first look for matches, highlighted in red, and then we fill in the rest of that row and column.
• As we can see, we do not simply add 1 each time.
Dynamic Programming Example
![Page 43: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/43.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Continuing with the dynamic programming algorithm fills the table.
• We first look for matches, highlighted in red, and then we fill in the rest of that row and column.
• As we can see, we do not simply add 1 each time.
Dynamic Programming Example
![Page 44: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/44.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Continuing with the dynamic programming algorithm fills the table.
• We first look for matches, highlighted in red, and then we fill in the rest of that row and column.
• As we can see, we do not simply add 1 each time.
Dynamic Programming Example
![Page 45: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/45.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Continuing with the dynamic programming algorithm fills the table.
• We first look for matches, highlighted in red, and then we fill in the rest of that row and column.
• As we can see, we do not simply add 1 each time.
Dynamic Programming Example
![Page 46: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/46.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• Continuing with the dynamic programming algorithm fills the table.
• We first look for matches, highlighted in red, and then we fill in the rest of that row and column.
• As we can see, we do not simply add 1 each time.
Dynamic Programming Example
![Page 47: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/47.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
1. LCS(v,w) 2. for i 1 to n
3. si,0 0
4. for j 1 to m
5. s0,j 0
6. for i 1 to n
7. for j 1 to m
8. si-1,j 9. si,j max si,j-1
10. si-1,j-1 + 1, if vi = wj 11. “ “ if si,j = si-1,j • bi,j “ “ if si,j = si,j-1 • “ “ if si,j = si-1,j-1 + 1
• return (sn,m, b)
0&�%"���$" &%�&,���+�-�'�'���
![Page 48: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/48.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• LCS(v,w) created the alignment grid.
• Follow the arrows backwards from the sink to the source to obtain the path corresponding to an optimal alignment:
�"&�"& ��&��(,"%�$��$" &%�&,�
![Page 49: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/49.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• LCS(v,w) created the alignment grid.
• Follow the arrows backwards from the sink to the source to obtain the path corresponding to an optimal alignment:
�"&�"& ��&��(,"%�$��$" &%�&,�
![Page 50: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/50.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• LCS(v,w) created the alignment grid.
• Follow the arrows backwards from the sink to the source to obtain the path corresponding to an optimal alignment:
�"&�"& ��&��(,"%�$��$" &%�&,�
![Page 51: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/51.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• LCS(v,w) created the alignment grid.
• Follow the arrows backwards from the sink to the source to obtain the path corresponding to an optimal alignment:
�"&�"& ��&��(,"%�$��$" &%�&,�
![Page 52: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/52.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• LCS(v,w) created the alignment grid.
• Follow the arrows backwards from the sink to the source to obtain the path corresponding to an optimal alignment:
�"&�"& ��&��(,"%�$��$" &%�&,�
![Page 53: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/53.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• LCS(v,w) created the alignment grid.
• Follow the arrows backwards from the sink to the source to obtain the path corresponding to an optimal alignment:
�"&�"& ��&��(,"%�$��$" &%�&,�
![Page 54: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/54.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• LCS(v,w) created the alignment grid.
• Follow the arrows backwards from the sink to the source to obtain the path corresponding to an optimal alignment:
�"&�"& ��&��(,"%�$��$" &%�&,�
![Page 55: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/55.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• LCS(v,w) created the alignment grid.
• Follow the arrows backwards from the sink to the source to obtain the path corresponding to an optimal alignment:
�"&�"& ��&��(,"%�$��$" &%�&,�
![Page 56: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/56.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• LCS(v,w) created the alignment grid.
• Follow the arrows backwards from the sink to the source to obtain the path corresponding to an optimal alignment:
�"&�"& ��&��(,"%�$��$" &%�&,�
![Page 57: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/57.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• LCS(v,w) created the alignment grid.
• Follow the arrows backwards from the sink to the source to obtain the path corresponding to an optimal alignment:
�"&�"& ��&��(,"%�$��$" &%�&,�
![Page 58: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/58.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• LCS(v,w) created the alignment grid.
• Follow the arrows backwards from the sink to the source to obtain the path corresponding to an optimal alignment:
�"&�"& ��&��(,"%�$��$" &%�&,�
![Page 59: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/59.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
1. PrintLCS(b,v,i,j) 2. if i = 0 or j = 0 3. return 4. if bi,j = “ “ 5. PrintLCS(b,v,i-1,j-1) 6. print vi 7. else 8. if bi,j = “ “ 9. PrintLCS(b,v,i-1,j) 10. else 11. PrintLCS(b,v,i,j-1)
�*"&,"& ��������#,*��#"& �
![Page 60: Dynamic Programming: Edit Distancecompeau.cbd.cmu.edu/wp-content/uploads/2016/08/Ch06... · 2016. 8. 25. · edit distance d(v,w) is given by d(v,w) = m + n – s(v,w), where s(v,w)](https://reader036.vdocuments.us/reader036/viewer/2022081616/5fe9f266f336e560e22cea99/html5/thumbnails/60.jpg)
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
• It takes O(nm) time to fill in the nxm dynamic programming matrix: the pseudocode consists of a nested “for” loop inside of another “for” loop.
• This is a very positive result for a problem that appeared much more difficult initially.
�����-&,"%��