alignment ii dynamic programming. 2 pair-wise sequence alignments a: c a t - t c a - c | | | | | b:...
Post on 22-Dec-2015
216 Views
Preview:
TRANSCRIPT
Alignment IIDynamic Programming
2
Pair-wise sequence alignments
A: C A T - T C A - C
| | | | |
B: C - T C G C A G C
Idea: Display one sequence above another with spaces inserted in both to reveal similarity
3
Two types of alignment
S = CTGTCGCTGCACGT = TGCCGTG
CTGTCGCTGCACG---------TGC-CGTG
CTGTCG-CTGCACG
-TGC-CG-TG----
Global alignment Local alignment
4
Global alignment: ScoringCTGTCG-CTGCACG
-TGC-CG-TG----
Reward for matches: Mismatch penalty: Space penalty:
score(A) = w – x - y
w = #matches x = #mismatchesy = #spaces
5
Global alignment: Scoring
C T G T C G – C T G C - T G C – C G – T G -
-5 10 10 -2 -5 -2 -5 -5 10 10 -5
Total = 11
Reward for matches:10
Mismatch penalty: 2Space penalty: 5
6
Optimum Alignment
• The score of an alignment is a measure of its quality
• Optimum alignment problem: Given a pair of sequences X and Y, find an alignment (global or local) with maximum score
• The similarity between X and Y, denoted sim(X,Y), is the maximum score of an alignment of X and Y
7
Alignment algorithms
• Global: Needleman-Wunsch• Local: Smith-Waterman• NW and SW use dynamic
programming• Variations:
– Gap penalty functions– Scoring matrices
8
Global Alignment: Algorithm
1..j1..i T and S of alignment optimum of Cost),( jiC
T of jlength of Prefix
S of i length of Prefix
..1
..1
j
i
T
S
ba
babaw
if
if),(
9
)1j,i(C
)j,1i(C
)T,S(w)1j,1i(C
max)j,i(Cji
j)j,0(Ci)0,i(C
Initial conditions:
Recurrence relation: For 1 i n, 1 j m:
Theorem. C(i,j) satisfies the following relationships:
10
Justification
S1 S2 . . . Si-1 Si
T1 T2 . . . Tj-1 Tj
C(i-1,j-1) + w(Si,Tj)
S1 S2 . . . Si-1 Si
T1 T2 . . . Tj —
C(i-1,j)
S1 S2 . . . Si —
T1 T2 . . . Tj-1 Tj
C(i,j-1)
11
Example
Case 1: Line up Si with Tj
S: C A T T C A C T: C - T T C A G
i - 1 i
jj -1
S: C A T T C A - C T: C - T T C A G -
Case 2: Line up Si with spacei - 1 i
j
S: C A T T C A C - T: C - T T C A - G
Case 3: Line up Tj with spacei
jj -1
12
Computation Procedure
C(n,m)
C(0,0)
C(i,j)
)1j,i(C,)j,1i(C),T,S(w)1j,1i(Cmax)j,i(C ji
C(i-1,j)C(i-1,j-1)
C(i,j-1)
13
λ C T C G C A G C
A
C
T
T
C
A
C
+10 for match, -2 for mismatch, -5 for space
0 -5 -10 -15 -20 -25 -30 -35 -40
-5
-10
-15
-20
-25
-30
-35
10 5
λ
14
0 -5 -10 -15 -20 -25 -30 -35 -40
-5 10 5 0 -5 -10 -15 -20 -25
-10 5 8 3 -2 -7 0 -5 -10
-15 0 15 10 5 0 -5 -2 -7
-20 -5 10 13 8 3 -2 -7 -4
-25 -10 5 20 15 18 13 8 3
-30 -15 0 15 18 13 28 23 18
-35 -20 -5 10 13 28 23 26 33
λ C T C G C A G C
A
C
T
T
C
A
C
λ
Traceback can yield both optimum alignments
**
15
End-gap free alignment
• Gaps at the start or end of alignment are not penalized
Best global Best end-gap free
Match: +2 Mismatch and space: -1
Score = 1 Score = 9
16
Motivation: Shotgun assembly
• Shotgun assembly produces large set of partially overlapping subsequences from many copies of one unknown DNA sequence.
• Problem: Use the overlapping sections to ”paste” the subsequences together.
• Overlapping pairs will have low global alignment score, but high end-space free score because of overlap.
17
Motivation: Shotgun assembly
18
Algorithm
• Same as global alignment, except:– Initialize with zeros (free gaps at
start)– Locate max in the last row/column
(free gaps at end)
19
10 5 10 5 10 5 0 10
λ C T C G C A G C
A
C
T
T
C
A
G
+10 for match, -2 for mismatch, -5 for gap
0 0 0 0 0 0 0 0 0
0
0
0
0
0
0
0
λ
5 8 5 8 5 20 15 10
0 15 10 5 6 15 18 13
-2 10 13 8 3 10 13 16
10 5 20 15 18 13 8 23 5 8 15 18 13 28 23 18
0 3 10 25 20 23 38 33
20
Local Alignment: Motivation
• Ignoring stretches of non-coding DNA:– Non-coding regions are more likely to be
subjected to mutations than coding regions.– Local alignment between two sequences is
likely to be between two exons.
• Locating protein domains:– Proteins of different kind and of different
species often exhibit local similarities– Local similarities may indicate ”functional
subunits”.
21
Local alignment: Example
Best local alignment:
Match: +2 Mismatch and space: -1
Score = 5
S = g g t c t g a gT = a a a c g a
g g t c t g a ga a a c – g a -
22
Local Alignment: Algorithm
Initialize top row and leftmost column to zero.
0
1,
,1
,]1,1[
max ,
jiC
jiC
jtisscorejiC
jiC
C [i, j] = Score of optimally aligning a suffix of s with a suffix of t.
23
0 0 0 0 0 0 0 0 0
0 1 0 1 0 1 0 0 1
0 0 0 0 0 0 2 0 0
0 0 1 0 0 0 0 1 0
0 0 1 0 0 0 0 0 0
0 1 0 2 0 1 0 0 1
0 0 0 0 1 0 2 0 0
0 1 0 1 0 2 0 1 1
λ C T C G C A G C
A
C
T
T
C
A
C
λ
+1 for a match, -1 for a mismatch, -5 for a space
24
Some Results
• Most pairwise sequence alignment problems can be solved in O(mn) time.
• Space requirement can be reduced to O(m+n), while keeping run-time fixed [Myers88].
• Highly similar sequences can be aligned in O(dn) time, where d measures the distance between the sequences [Landau86].
25
Reducing space requirements
• O(mn) tables are often the limiting factor in computing large alignments
• There is a linear space technique that only doubles the time required [Hirschberg77]
26
0 10 5 10 5 10 5 0 10
λ C T C G C A G C
A
C
T
T
C
A
G
IDEA: We only need the previous row to calculate the next
0 0 0 0 0 0 0 0 0λ
0 5 8 5 8 5 20 15 10
27
Linear-space Alignments
mn + ½ mn + ¼ mn + 1/8 mn + 1/16 mn + … = 2 mn
28
Affine Gap Penalty Functions
Gap penalty = h + gk
where
k = length of gaph = gap opening penaltyg = gap continuation penalty
Can also be solved in O(nm) time using dynamic programming
top related