dynamic programming: edit distance. aligning sequences without insertions and deletions: hamming...
TRANSCRIPT
![Page 1: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/1.jpg)
Dynamic Programming: Edit Distance
![Page 2: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/2.jpg)
Aligning Sequences without Insertions and Deletions: Hamming Distance
Given two sequences v and w:
v : ATATATAT
w : TATATATA
• The Hamming distance dH(v, w) = 8 is large, but the sequences are very similar.
![Page 3: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/3.jpg)
Aligning Sequences with Insertions and Deletions
By shifting one sequence over one position:
v : -ATATATAT
w : TATATATA-
• The edit distance: d(v, w) = 2.
• Hamming distance neglects insertions and deletions in the sequences.
![Page 4: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/4.jpg)
Edit Distance
Levenshtein (1966) introduced the edit distance between two strings as the minimum number of elementary operations (insertions, deletions, and substitutions) needed to transform one string into the other.
d(v, w) = minimum number of elementary operations to transform v → w
![Page 5: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/5.jpg)
Edit Distance vs Hamming Distance
V = ATATATAT
W = TATATATA
Hamming distance always compares i-th letter of v with i-th letter of w
Hamming distance: d(v, w) = 8
Computing Hamming distance is a trivial task.
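The position-by-position comparison described on this slide can be sketched in a few lines of Python. The function name `hamming_distance` is an illustrative choice, not something from the slides.

```python
def hamming_distance(v: str, w: str) -> int:
    """Count the positions at which v and w differ.

    Hamming distance is only defined for sequences of equal length,
    because the i-th letter of v is always compared with the i-th letter of w.
    """
    if len(v) != len(w):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(a != b for a, b in zip(v, w))


print(hamming_distance("ATATATAT", "TATATATA"))  # 8
```

Every position mismatches here, so the result equals the sequence length even though one sequence is just the other shifted by one letter.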
![Page 6: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/6.jpg)
Edit Distance vs Hamming Distance
V = ATATATAT
W = TATATATA
Hamming distance: d(v, w) = 8
Edit distance: d(v, w) = 2 (one insertion and one deletion)
W = TATATATA-
V = -ATATATAT
Hamming distance always compares the i-th letter of v with the i-th letter of w.
Edit distance may compare the i-th letter of v with the j-th letter of w.
How do we find which j goes with which i?
![Page 7: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/7.jpg)
Edit Distance: Example
TGCATAT → ATCCGAT in 5 steps
TGCATAT → (delete last T)
TGCATA → (delete last A)
TGCAT → (insert A at front)
ATGCAT → (substitute C for the G in 3rd position)
ATCCAT → (insert G before last A)
ATCCGAT (Done)
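The five steps can be replayed mechanically to check that they really turn TGCATAT into ATCCGAT. This is a small sketch; the helper names `delete`, `insert`, and `substitute` and their 0-based positions are assumptions made for illustration.

```python
def delete(s: str, pos: int) -> str:
    """Remove the character at 0-based index pos."""
    return s[:pos] + s[pos + 1:]


def insert(s: str, pos: int, ch: str) -> str:
    """Insert ch so that it ends up at 0-based index pos."""
    return s[:pos] + ch + s[pos:]


def substitute(s: str, pos: int, ch: str) -> str:
    """Replace the character at 0-based index pos with ch."""
    return s[:pos] + ch + s[pos + 1:]


s = "TGCATAT"
steps = [s]
s = delete(s, len(s) - 1); steps.append(s)   # delete last T  -> TGCATA
s = delete(s, len(s) - 1); steps.append(s)   # delete last A  -> TGCAT
s = insert(s, 0, "A"); steps.append(s)       # insert A at front -> ATGCAT
s = substitute(s, 2, "C"); steps.append(s)   # G in 3rd position becomes C -> ATCCAT
s = insert(s, 4, "G"); steps.append(s)       # insert G before last A -> ATCCGAT

print(" -> ".join(steps))
```

A sequence of 5 operations only shows that d(v, w) ≤ 5 for this pair; it does not by itself show that 5 is the minimum.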
![Page 8: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/8.jpg)
Alignment as a Path in the Edit Graph
      0 1 2 2 3 4 5 6 7 7
v =    A T _ G T T A T _
w =    A T C G T _ A _ C
      0 1 2 3 4 5 5 6 6 7
Corresponding path:
(0,0), (1,1), (2,2), (2,3), (3,4), (4,5), (5,5), (6,6), (7,6), (7,7)
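The mapping from an alignment to its path can be sketched as follows: walk the columns left to right, advancing i when the column consumes a letter of v and j when it consumes a letter of w. The function name `alignment_to_path` and the `_` gap symbol are illustrative assumptions.

```python
def alignment_to_path(v_row: str, w_row: str, gap: str = "_"):
    """Convert a two-row alignment into the list of (i, j) edit-graph vertices."""
    if len(v_row) != len(w_row):
        raise ValueError("both alignment rows must have the same number of columns")
    i = j = 0
    path = [(0, 0)]
    for a, b in zip(v_row, w_row):
        if a == gap and b == gap:
            raise ValueError("a column cannot contain two gaps")
        if a != gap:
            i += 1  # column uses the next letter of v
        if b != gap:
            j += 1  # column uses the next letter of w
        path.append((i, j))
    return path


print(alignment_to_path("AT_GTTAT_", "ATCGT_A_C"))
```

For the alignment on this slide the result is the path (0,0), (1,1), (2,2), (2,3), (3,4), (4,5), (5,5), (6,6), (7,6), (7,7).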
![Page 9: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/9.jpg)
Alignment as a Path in the Edit Graph
Old Alignment
      0 1 2 2 3 4 5 6 7 7
v =    A T _ G T T A T _
w =    A T C G T _ A _ C
      0 1 2 3 4 5 5 6 6 7
New Alignment
      0 1 2 2 3 4 5 6 7 7
v =    A T _ G T T A T _
w =    A T C G _ T A _ C
      0 1 2 3 4 4 5 6 6 7
![Page 10: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/10.jpg)
Dynamic programming (Cormen et al.)
• Optimal substructure: The optimal solution to the problem contains within it optimal solutions to subproblems.
• Overlapping subproblems: The optimal solutions to subproblems ("subsolutions") overlap. These subsolutions are computed over and over again when computing the global optimal solution.
For edit distance:
• Optimal substructure: We compute the minimum distance between substrings (prefixes of v and w) in order to compute the minimum distance between the entire strings.
• Overlapping subproblems: Most substring distances are needed 3 times (when moving right, diagonally, and down).
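The effect of overlapping subproblems can be made visible by counting work. The sketch below compares a plain recursion with a memoized one; both function names are hypothetical, and the base cases (distance to an empty prefix equals the other prefix's length) are the standard ones, assumed here because the slide does not state them.

```python
from functools import lru_cache


def naive_distance(v: str, w: str, counter: list) -> int:
    """Plain recursion: the same prefix pairs are solved again and again."""
    counter[0] += 1
    if not v:
        return len(w)
    if not w:
        return len(v)
    return min(
        naive_distance(v[:-1], w[:-1], counter) + (v[-1] != w[-1]),  # diagonal
        naive_distance(v[:-1], w, counter) + 1,                      # down
        naive_distance(v, w[:-1], counter) + 1,                      # right
    )


def memo_distance(v: str, w: str):
    """Memoized recursion: each prefix pair (i, j) is solved at most once."""
    @lru_cache(maxsize=None)
    def s(i: int, j: int) -> int:
        if i == 0:
            return j
        if j == 0:
            return i
        return min(
            s(i - 1, j - 1) + (v[i - 1] != w[j - 1]),
            s(i - 1, j) + 1,
            s(i, j - 1) + 1,
        )

    distance = s(len(v), len(w))
    return distance, s.cache_info().misses  # misses = distinct subproblems solved


calls = [0]
print(naive_distance("ATATATAT", "TATATATA", calls), calls[0])
print(memo_distance("ATATATAT", "TATATATA"))
```

For two strings of length 8 there are at most 9 × 9 = 81 distinct prefix pairs, so the memoized version solves at most 81 subproblems, while the plain recursion makes far more calls.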
![Page 11: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/11.jpg)
Dynamic Programming
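$$
s_{i,j} = \min
\begin{cases}
s_{i-1,j-1} + [v_i \neq w_j] \\
s_{i-1,j} + 1 \\
s_{i,j-1} + 1
\end{cases}
$$
where $[v_i \neq w_j]$ is 1 if the letters differ and 0 otherwise.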
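A bottom-up table fill for this recurrence might look like the sketch below. The boundary values s[i][0] = i and s[0][j] = j are not shown on the slide; they are the usual assumption that aligning a prefix against the empty string costs one operation per letter. The name `edit_distance` is illustrative.

```python
def edit_distance(v: str, w: str) -> int:
    """Fill the (len(v)+1) x (len(w)+1) table s row by row and return s[n][m]."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]

    # Assumed boundary conditions: only deletions or only insertions.
    for i in range(1, n + 1):
        s[i][0] = i
    for j in range(1, m + 1):
        s[0][j] = j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = min(
                s[i - 1][j - 1] + (v[i - 1] != w[j - 1]),  # match or substitution
                s[i - 1][j] + 1,                           # deletion from v
                s[i][j - 1] + 1,                           # insertion into v
            )
    return s[n][m]


print(edit_distance("ATATATAT", "TATATATA"))  # 2
print(edit_distance("TGCATAT", "ATCCGAT"))    # 4
```

The second call suggests that the 5-step transformation of TGCATAT into ATCCGAT is not optimal: under this recurrence 4 operations suffice. Running time and memory are both proportional to len(v) × len(w).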
![Page 12: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/12.jpg)
Levenshtein distance: Computation
![Page 13: Dynamic Programming: Edit Distance. Aligning Sequences without Insertions and Deletions: Hamming Distance Given two sequences v and w : v : The Hamming](https://reader036.vdocuments.us/reader036/viewer/2022082821/5697bfec1a28abf838cb88b8/html5/thumbnails/13.jpg)
Levenshtein distance: algorithm
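The algorithm on this slide is not included in the transcript. As a stand-in, here is a hedged sketch of one common formulation: fill the table with the recurrence from the Dynamic Programming slide, then trace back from the bottom-right corner to recover an alignment. The function name `align`, the `_` gap symbol, and the tie-breaking order (diagonal first) are assumptions, not the slide's own choices.

```python
def align(v: str, w: str, gap: str = "_"):
    """Return (distance, aligned_v, aligned_w) for one optimal alignment."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i
    for j in range(1, m + 1):
        s[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = min(
                s[i - 1][j - 1] + (v[i - 1] != w[j - 1]),
                s[i - 1][j] + 1,
                s[i][j - 1] + 1,
            )

    # Trace back: at each cell, follow an edge that explains its value.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + (v[i - 1] != w[j - 1]):
            top.append(v[i - 1]); bottom.append(w[j - 1])  # diagonal edge
            i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + 1:
            top.append(v[i - 1]); bottom.append(gap)       # letter of v against a gap
            i -= 1
        else:
            top.append(gap); bottom.append(w[j - 1])       # letter of w against a gap
            j -= 1
    return s[n][m], "".join(reversed(top)), "".join(reversed(bottom))


distance, row_v, row_w = align("ATATATAT", "TATATATA")
print(distance)
print(row_v)
print(row_w)
```

Several alignments can share the same minimum cost, so which one is printed depends on the tie-breaking order in the traceback.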