Alignment II: Dynamic Programming
Pair-wise sequence alignments
A: C A T - T C A - C
| | | | |
B: C - T C G C A G C
Idea: Display one sequence above another with spaces inserted in both to reveal similarity
Two types of alignment
S = CTGTCGCTGCACG
T = TGCCGTG
Global alignment:
CTGTCG-CTGCACG
-TGC-CG-TG----
Local alignment:
CTGTCGCTGCACG---------TGC-CGTG
Global alignment: Scoring
CTGTCG-CTGCACG
-TGC-CG-TG----
Reward for matches: $\alpha$; mismatch penalty: $\beta$; space penalty: $\gamma$
$\text{score}(A) = \alpha w - \beta x - \gamma y$
w = #matches, x = #mismatches, y = #spaces
Global alignment: Scoring
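Reward for matches: 10; mismatch penalty: 2; space penalty: 5

| S | C | T | G | T | C | G | - | C | T | G | C |
|---|---|---|---|---|---|---|---|---|---|---|---|
| T | - | T | G | C | - | C | G | - | T | G | - |
| Score | -5 | 10 | 10 | -2 | -5 | -2 | -5 | -5 | 10 | 10 | -5 |

Total = 11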
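The column-by-column scoring above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `score_alignment` and the use of an ASCII `-` for a space are assumptions.

```python
def score_alignment(row_s, row_t, match=10, mismatch=-2, space=-5):
    """Sum the column scores of an alignment given as two gapped rows."""
    if len(row_s) != len(row_t):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(row_s, row_t):
        if a == "-" and b == "-":
            raise ValueError("a column cannot pair two spaces")
        if a == "-" or b == "-":
            total += space       # one character lined up with a space
        elif a == b:
            total += match       # identical characters
        else:
            total += mismatch    # two different characters
    return total


# The 11-column example from the slide.
print(score_alignment("CTGTCG-CTGC", "-TGC-CG-TG-"))  # 11
```

The penalties are passed as negative numbers so that the score is a plain sum over columns.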
Optimum Alignment
• The score of an alignment is a measure of its quality
• Optimum alignment problem: Given a pair of sequences X and Y, find an alignment (global or local) with maximum score
• The similarity between X and Y, denoted sim(X,Y), is the maximum score of an alignment of X and Y
Alignment algorithms
• Global: Needleman-Wunsch
• Local: Smith-Waterman
• NW and SW use dynamic programming
• Variations:
– Gap penalty functions
– Scoring matrices
Global Alignment: Algorithm
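$C(i,j)$ = cost of an optimum alignment of $S_{1..i}$ and $T_{1..j}$
$S_{1..i}$ = prefix of length $i$ of $S$
$T_{1..j}$ = prefix of length $j$ of $T$
$$w(a,b) = \begin{cases} \alpha & \text{if } a = b \\ -\beta & \text{if } a \neq b \end{cases}$$
where $\alpha$ is the reward for a match and $\beta$ is the mismatch penalty.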
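Theorem. $C(i,j)$ satisfies the following relationships:
Initial conditions: $C(i,0) = -\gamma i$, $C(0,j) = -\gamma j$
Recurrence relation: For $1 \le i \le n$, $1 \le j \le m$:
$$C(i,j) = \max \begin{cases} C(i-1,j-1) + w(S_i,T_j) \\ C(i-1,j) - \gamma \\ C(i,j-1) - \gamma \end{cases}$$
where $\gamma$ is the space penalty.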
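The initial conditions and recurrence can be sketched directly as a table fill. This is a minimal Python illustration rather than the slides' own code: the function name `global_alignment_table` is hypothetical, and the defaults (+10 match, -2 mismatch, -5 space) are taken from the scoring example. The sequences are the pair from the opening pairwise-alignment slide with spaces removed.

```python
def global_alignment_table(s, t, match=10, mismatch=-2, space=-5):
    """Fill the (n+1) x (m+1) table C for global alignment of s and t."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    # Initial conditions: a prefix aligned against nothing costs one space per character.
    for i in range(1, n + 1):
        C[i][0] = i * space
    for j in range(1, m + 1):
        C[0][j] = j * space
    # Recurrence: best of diagonal (match/mismatch), up (space in t), left (space in s).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,
                          C[i - 1][j] + space,
                          C[i][j - 1] + space)
    return C


table = global_alignment_table("CATTCAC", "CTCGCAGC")
print(table[-1][-1])  # 33
```

The bottom-right entry is the similarity sim(S,T); filling the table takes O(nm) time and space.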
Justification
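An optimum alignment of $S_{1..i}$ and $T_{1..j}$ must end in one of three ways ($\gamma$ is the space penalty):
$S_i$ lined up with $T_j$:
S1 S2 . . . Si-1 Si
T1 T2 . . . Tj-1 Tj
Score: $C(i-1,j-1) + w(S_i,T_j)$
$S_i$ lined up with a space:
S1 S2 . . . Si-1 Si
T1 T2 . . . Tj -
Score: $C(i-1,j) - \gamma$
$T_j$ lined up with a space:
S1 S2 . . . Si -
T1 T2 . . . Tj-1 Tj
Score: $C(i,j-1) - \gamma$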
Example
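Case 1: Line up $S_i$ with $T_j$
S: C A T T C A C (last two characters are positions i-1 and i)
T: C - T T C A G (last two characters are positions j-1 and j)
Case 2: Line up $S_i$ with space
S: C A T T C A - C (A is position i-1, C is position i)
T: C - T T C A G - (G is position j)
Case 3: Line up $T_j$ with space
S: C A T T C A C - (C is position i)
T: C - T T C A - G (A is position j-1, G is position j)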
Computation Procedure
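$$C(i,j) = \max\{\, C(i-1,j-1) + w(S_i,T_j),\; C(i-1,j) - \gamma,\; C(i,j-1) - \gamma \,\}$$
where $\gamma$ is the space penalty.
[Figure: the table is filled from C(0,0) to C(n,m); each entry C(i,j) is computed from its three neighbors C(i-1,j-1), C(i-1,j), and C(i,j-1).]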
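+10 for match, -2 for mismatch, -5 for space

| | λ | C | T | C | G | C | A | G | C |
|---|---|---|---|---|---|---|---|---|---|
| λ | 0 | -5 | -10 | -15 | -20 | -25 | -30 | -35 | -40 |
| C | -5 | 10 | 5 | | | | | | |
| A | -10 | | | | | | | | |
| T | -15 | | | | | | | | |
| T | -20 | | | | | | | | |
| C | -25 | | | | | | | | |
| A | -30 | | | | | | | | |
| C | -35 | | | | | | | | |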
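| | λ | C | T | C | G | C | A | G | C |
|---|---|---|---|---|---|---|---|---|---|
| λ | 0 | -5 | -10 | -15 | -20 | -25 | -30 | -35 | -40 |
| C | -5 | 10 | 5 | 0 | -5 | -10 | -15 | -20 | -25 |
| A | -10 | 5 | 8 | 3 | -2 | -7 | 0 | -5 | -10 |
| T | -15 | 0 | 15 | 10 | 5 | 0 | -5 | -2 | -7 |
| T | -20 | -5 | 10 | 13 | 8 | 3 | -2 | -7 | -4 |
| C | -25 | -10 | 5 | 20 | 15 | 18 | 13 | 8 | 3 |
| A | -30 | -15 | 0 | 15 | 18 | 13 | 28 | 23 | 18 |
| C | -35 | -20 | -5 | 10 | 13 | 28 | 23 | 26 | 33 |

Traceback can yield both optimum alignments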
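A traceback can be sketched as a walk from the bottom-right table entry back to the top-left, following every predecessor that explains the current value. The Python below is an illustrative sketch under assumptions: the name `optimal_global_alignments` is hypothetical, spaces are written as `-`, and the row sequence is taken to be CATTCAC (the sequence from the opening pairwise-alignment slide).

```python
def optimal_global_alignments(s, t, match=10, mismatch=-2, space=-5):
    """Return (best score, list of all optimum global alignments) for s and t."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        C[i][0] = i * space
    for j in range(1, m + 1):
        C[0][j] = j * space
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w, C[i - 1][j] + space, C[i][j - 1] + space)

    results = []

    def walk(i, j, top, bottom):
        # top/bottom hold the alignment rows built so far, in reverse order.
        if i == 0 and j == 0:
            results.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0:
            w = match if s[i - 1] == t[j - 1] else mismatch
            if C[i][j] == C[i - 1][j - 1] + w:          # came from the diagonal
                walk(i - 1, j - 1, top + s[i - 1], bottom + t[j - 1])
        if i > 0 and C[i][j] == C[i - 1][j] + space:    # came from above
            walk(i - 1, j, top + s[i - 1], bottom + "-")
        if j > 0 and C[i][j] == C[i][j - 1] + space:    # came from the left
            walk(i, j - 1, top + "-", bottom + t[j - 1])

    walk(n, m, "", "")
    return C[n][m], results


score, alignments = optimal_global_alignments("CATTCAC", "CTCGCAGC")
print(score)
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

For this pair the walk branches once, at the entry with value 8 in the second T row, which is why two optimum alignments exist. In general the number of optimum alignments can grow very quickly, so enumerating all of them is only practical for small inputs.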
End-gap free alignment
• Gaps at the start or end of alignment are not penalized
Match: +2; mismatch and space: -1
Best global: Score = 1
Best end-gap free: Score = 9
Motivation: Shotgun assembly
• Shotgun assembly produces a large set of partially overlapping subsequences from many copies of one unknown DNA sequence.
• Problem: Use the overlapping sections to "paste" the subsequences together.
• Overlapping pairs will have a low global alignment score, but a high end-gap free score because of the overlap.
End-gap free alignment: Algorithm
• Same as global alignment, except:
– Initialize with zeros (free gaps at start)
– Locate max in the last row/column (free gaps at end)
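The two changes relative to global alignment can be sketched as follows. This is a hedged Python illustration: the function name `end_gap_free_score` is hypothetical, the defaults are the +10/-2/-5 scheme used in the examples, and the sequences CATTCAG and CTCGCAGC are assumed test inputs.

```python
def end_gap_free_score(s, t, match=10, mismatch=-2, space=-5):
    """Best alignment score when gaps at either end of s or t are free."""
    n, m = len(s), len(t)
    # Row 0 and column 0 stay at zero: gaps at the start are not penalized.
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,
                          C[i - 1][j] + space,
                          C[i][j - 1] + space)
    # Gaps at the end are free: take the maximum over the last row and last column.
    best_last_row = max(C[n])
    best_last_col = max(row[m] for row in C)
    return max(best_last_row, best_last_col)


print(end_gap_free_score("CATTCAG", "CTCGCAGC"))  # 38
```

Only the initialization and the choice of the final cell differ from the global algorithm; the recurrence itself is unchanged.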
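+10 for match, -2 for mismatch, -5 for gap

| | λ | C | T | C | G | C | A | G | C |
|---|---|---|---|---|---|---|---|---|---|
| λ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C | 0 | 10 | 5 | 10 | 5 | 10 | 5 | 0 | 10 |
| A | 0 | 5 | 8 | 5 | 8 | 5 | 20 | 15 | 10 |
| T | 0 | 0 | 15 | 10 | 5 | 6 | 15 | 18 | 13 |
| T | 0 | -2 | 10 | 13 | 8 | 3 | 10 | 13 | 16 |
| C | 0 | 10 | 5 | 20 | 15 | 18 | 13 | 8 | 23 |
| A | 0 | 5 | 8 | 15 | 18 | 13 | 28 | 23 | 18 |
| G | 0 | 0 | 3 | 10 | 25 | 20 | 23 | 38 | 33 |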
Local Alignment: Motivation
• Ignoring stretches of non-coding DNA:
– Non-coding regions are more likely to be subjected to mutations than coding regions.
– Local alignment between two sequences is likely to be between two exons.
• Locating protein domains:
– Proteins of different kinds and of different species often exhibit local similarities.
– Local similarities may indicate "functional subunits".
Local alignment: Example
S = g g t c t g a g
T = a a a c g a
Match: +2; mismatch and space: -1
Best local alignment:
g g t c t g a g
a a a c - g a -
Score = 5 (the aligned region is c t g a over c - g a: 2 - 1 + 2 + 2 = 5)
Local Alignment: Algorithm
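Initialize top row and leftmost column to zero.
$$C[i,j] = \max \begin{cases} C[i-1,j-1] + \text{score}(s[i], t[j]) \\ C[i-1,j] - \gamma \\ C[i,j-1] - \gamma \\ 0 \end{cases}$$
where $\gamma$ is the space penalty.
$C[i,j]$ = score of optimally aligning a suffix of $s[1..i]$ with a suffix of $t[1..j]$.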
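The local recurrence differs from the global one only in the zero initialization, the extra 0 option, and taking the maximum over the whole table. A minimal Python sketch follows; the function name `local_alignment_score` is an assumption, and the sequences are those of the local alignment example (match +2, mismatch and space -1).

```python
def local_alignment_score(s, t, match, mismatch, space):
    """Best local alignment score of s and t (Smith-Waterman style table fill)."""
    n, m = len(s), len(t)
    # Top row and leftmost column are zero: an empty alignment is always allowed.
    C = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(0,                      # start a fresh alignment here
                          C[i - 1][j - 1] + w,    # extend along the diagonal
                          C[i - 1][j] + space,    # s[i] lined up with a space
                          C[i][j - 1] + space)    # t[j] lined up with a space
            best = max(best, C[i][j])
    return best


print(local_alignment_score("ggtctgag", "aaacga", match=2, mismatch=-1, space=-1))  # 5
```

To recover the aligned region itself, one would trace back from the cell holding the maximum until a zero entry is reached.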
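+1 for a match, -1 for a mismatch, -5 for a space

| | λ | C | T | C | G | C | A | G | C |
|---|---|---|---|---|---|---|---|---|---|
| λ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| A | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| T | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| T | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| C | 0 | 1 | 0 | 2 | 0 | 1 | 0 | 0 | 1 |
| A | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 |
| C | 0 | 1 | 0 | 1 | 0 | 2 | 0 | 1 | 1 |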
Some Results
• Most pairwise sequence alignment problems can be solved in O(mn) time.
• Space requirement can be reduced to O(m+n), while keeping run-time fixed [Myers88].
• Highly similar sequences can be aligned in O(dn) time, where d measures the distance between the sequences [Landau86].
Reducing space requirements
• O(mn) tables are often the limiting factor in computing large alignments
• There is a linear space technique that only doubles the time required [Hirschberg77]
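IDEA: We only need the previous row to calculate the next

| | λ | C | T | C | G | C | A | G | C |
|---|---|---|---|---|---|---|---|---|---|
| λ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C | 0 | 10 | 5 | 10 | 5 | 10 | 5 | 0 | 10 |
| A | 0 | 5 | 8 | 5 | 8 | 5 | 20 | 15 | 10 |
| T | | | | | | | | | |
| T | | | | | | | | | |
| C | | | | | | | | | |
| A | | | | | | | | | |
| G | | | | | | | | | |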
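The row-by-row idea can be sketched as a generator that holds only the previous row in memory. This is an illustrative Python sketch, not the linear-space algorithm of [Hirschberg77] itself: it yields scores only, without a traceback. The name `table_rows` and the flag `free_start_gaps` are assumptions, as are the test sequences CATTCAG and CTCGCAGC.

```python
def table_rows(s, t, match=10, mismatch=-2, space=-5, free_start_gaps=True):
    """Yield the rows of the alignment table one at a time, using O(len(t)) memory."""
    m = len(t)
    # Row 0: zeros for end-gap free alignment, multiples of the space score for global.
    prev = [0] * (m + 1) if free_start_gaps else [j * space for j in range(m + 1)]
    yield prev
    for i in range(1, len(s) + 1):
        curr = [0 if free_start_gaps else i * space]
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            curr.append(max(prev[j - 1] + w,      # diagonal, from the previous row
                            prev[j] + space,      # up, from the previous row
                            curr[j - 1] + space)) # left, from the current row
        yield curr
        prev = curr  # the older row is no longer needed


for row in table_rows("CATTCAG", "CTCGCAGC"):
    print(row)
```

Keeping two rows is enough to compute the optimum score; recovering the alignment in linear space additionally needs the divide-and-conquer step, which this sketch omits.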
Linear-space Alignments
mn + ½ mn + ¼ mn + 1/8 mn + 1/16 mn + … = 2 mn
Affine Gap Penalty Functions
Gap penalty = h + gk
where
k = length of gap
h = gap opening penalty
g = gap continuation penalty
Can also be solved in O(nm) time using dynamic programming
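One common way to achieve the O(nm) bound uses three tables, one per type of last column. The Python below is a sketch of that three-table scheme under stated assumptions: the function name `affine_global_score` and the table names M, X, Y are illustrative and not from the slides, and a gap of length k is charged h + gk as defined above.

```python
NEG_INF = float("-inf")


def affine_global_score(s, t, match, mismatch, h, g):
    """Global alignment score where a gap of length k costs h + g*k."""
    n, m = len(s), len(t)
    # M: last column pairs s[i] with t[j]; X: s[i] with a space; Y: t[j] with a space.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(h + g * i)   # one gap of length i
    for j in range(1, m + 1):
        Y[0][j] = -(h + g * j)   # one gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + w
            # Opening a gap costs h + g; continuing an existing gap costs g.
            X[i][j] = max(M[i - 1][j] - h - g, X[i - 1][j] - g, Y[i - 1][j] - h - g)
            Y[i][j] = max(M[i][j - 1] - h - g, Y[i][j - 1] - g, X[i][j - 1] - h - g)
    return max(M[n][m], X[n][m], Y[n][m])


# One gap of length 2 (cost 2 + 2*1 = 4) plus one match (+1) gives -3.
print(affine_global_score("AAA", "A", match=1, mismatch=-1, h=2, g=1))  # -3
```

With h = 0 the penalty reduces to g per space, so the result coincides with the constant space penalty used in the earlier examples.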