![Page 1: Sequence Alignment - histo.ucsf.eduhisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides07_Alignment.pdf · The implementation of local alignment is the same as for global alignment, with](https://reader033.vdocuments.us/reader033/viewer/2022050607/5fae7e88dda477200079b968/html5/thumbnails/1.jpg)
Sequence Alignment
Mark Voorhies
4/12/2018
Mark Voorhies Sequence Alignment
Exercise: Scoring an ungapped alignment
Given two sequences and a scoring matrix, find the offset that yields the best scoring ungapped alignment.
def score(S, x, y):
    assert(len(x) == len(y))
    s = 0
    for (i, j) in zip(x, y):
        s += S[i][j]
    return s
def subseqs(x, y, i):
    if (i > 0):
        y = y[i:]
    elif (i < 0):
        x = x[-i:]
    L = min(len(x), len(y))
    return x[:L], y[:L]
def alignment(x, y, i):
    if (i > 0):
        x = "-"*i + x
    elif (i < 0):
        y = "-"*(-i) + y
    L = len(y) - len(x)
    if (L > 0):
        x += "-"*L
    elif (L < 0):
        y += "-"*(-L)
    return x, y
def ungapped(S, x, y):
    best = None
    best_score = None
    for i in range(-len(x)+1, len(y)):
        (sx, sy) = subseqs(x, y, i)
        s = score(S, sx, sy)
        if ((best_score is None) or (s > best_score)):
            best_score = s
            best = i
    return best, best_score, alignment(x, y, best)
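To try the ungapped search on real input, a scoring matrix has to exist in the `S[a][b]` form the functions index into. The sketch below is one way to set that up; `make_identity_matrix` and `best_offset` are hypothetical names, the +1/-1 scores are an assumption rather than values from the slides, and the search is restated in condensed form only so the block runs on its own.

```python
def make_identity_matrix(alphabet="ACGT", match=1, mismatch=-1):
    """Hypothetical helper: nested dict so that S[a][b] scores a against b."""
    return {a: {b: (match if a == b else mismatch) for b in alphabet}
            for a in alphabet}


def best_offset(S, x, y):
    """Condensed restatement of the ungapped search: score every offset."""
    results = []
    for i in range(-len(x) + 1, len(y)):
        # Positive offsets trim the start of y, negative ones the start of x.
        sx, sy = (x, y[i:]) if i >= 0 else (x[-i:], y)
        # zip stops at the shorter sequence, so only the overlap is scored.
        s = sum(S[a][b] for a, b in zip(sx, sy))
        results.append((s, i))
    # max returns the first best entry, so ties go to the smallest offset.
    return max(results, key=lambda r: r[0])


S = make_identity_matrix()
print(best_offset(S, "ACGT", "TTACGTTT"))  # (score, offset)
```

A positive offset means `x` starts that many positions into `y`; a negative offset means `y` starts inside `x`.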
Exercise: Scoring a gapped alignment
Write a new scoring function with separate penalties for opening a zero length gap (e.g., G = -11) and extending an open gap by one base (e.g., E = -1).
$$S_{\text{gapped}}(x, y) = S(x, y) + \sum_{i}^{\text{gaps}} \bigl(G + E \cdot \operatorname{len}(i)\bigr)$$
def gapped_score(seq1, seq2, s, g=0, e=-1):
    gap = None
    score = 0
    for pair in zip(seq1, seq2):
        assert(pair != ("-", "-"))
        try:
            curgap = pair.index("-")
        except ValueError:
            score += s[pair[0]][pair[1]]
            gap = None
        else:
            if (gap != curgap):
                score += g
                gap = curgap
            score += e
    return score
# Alternative version: explicit branches instead of try/except
def gapped_score(seq1, seq2, s, g=0, e=-1):
    gap = None
    score = 0
    for (c1, c2) in zip(seq1, seq2):
        if ((c1 == "-") and (c2 == "-")):
            raise ValueError
        elif (c1 == "-"):
            if (gap != 1):
                score += g
                gap = 1
            score += e
        elif (c2 == "-"):
            if (gap != 2):
                score += g
                gap = 2
            score += e
        else:
            score += s[c1][c2]
            gap = None
    return score
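One way to sanity-check a gapped scoring function is to compute the gap term of the formula separately and compare. The sketch below does that with `itertools.groupby`, which is a different technique from the loop in the slides; `gap_penalty` is a hypothetical name, and the defaults use the example values G = -11 and E = -1.

```python
from itertools import groupby


def gap_penalty(seq1, seq2, g=-11, e=-1):
    """Sum of (g + e * len) over every run of gaps in either sequence."""
    labels = []
    for a, b in zip(seq1, seq2):
        if a == "-" and b == "-":
            raise ValueError("column with gaps in both sequences")
        # 1: gap in seq1, 2: gap in seq2, 0: aligned pair
        labels.append(1 if a == "-" else 2 if b == "-" else 0)
    total = 0
    # groupby yields one group per run of identical consecutive labels,
    # so a gap that switches sequences counts as a newly opened gap.
    for label, run in groupby(labels):
        if label != 0:
            total += g + e * len(list(run))
    return total


# Two gaps of length 2: 2 * (-11 + 2 * -1) = -26
print(gap_penalty("ACGT--A", "A--TGCA"))
```

Adding this value to the substitution scores of the ungapped columns should reproduce what a full gapped scoring function returns for the same penalties.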
How many ways can we align two sequences?
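Binomial formula:
$$\binom{k}{r} = \frac{k!}{(k-r)!\,r!} \qquad \binom{2n}{n} = \frac{(2n)!}{n!\,n!}$$
Stirling’s approximation:
$$x! \approx \sqrt{2\pi}\; x^{x+\frac{1}{2}}\, e^{-x}$$
$$\binom{2n}{n} \approx \frac{2^{2n}}{\sqrt{\pi n}}$$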
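A quick numeric check shows how fast this count grows and how close the approximation is. The sketch below assumes, as the formulas suggest, that the count of interest for two sequences of length n is the central binomial coefficient; `exact_count` and `stirling_estimate` are hypothetical names.

```python
import math


def exact_count(n):
    """Central binomial coefficient (2n choose n), computed exactly."""
    return math.comb(2 * n, n)


def stirling_estimate(n):
    """Approximation 2^(2n) / sqrt(pi * n) obtained from Stirling's formula."""
    return 4.0 ** n / math.sqrt(math.pi * n)


for n in (5, 10, 50, 100):
    exact = exact_count(n)
    approx = stirling_estimate(n)
    print(n, exact, f"{approx:.3e}", f"ratio={approx / exact:.4f}")
```

The count grows roughly like 4^n, which is why enumerating every alignment is impractical even for short sequences.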
Dynamic Programming
Needleman-Wunsch
Smith-Waterman
The implementation of local alignment is the same as for global alignment, with a few changes to the rules:
- Initialize edges to 0 (no penalty for starting in the middle of a sequence).
- The maximum score is never less than 0, and no pointer is recorded unless the score is greater than 0 (note that this implies negative scores for gaps and bad matches).
- The trace-back starts from the highest score in the matrix and ends at a score of 0 (local, rather than global, alignment).
Because the naive implementation is essentially the same, the time and space requirements are also the same.
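|   |   | A | G | C | G | G | T | A |
|---|---|---|---|---|---|---|---|---|
|   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| G | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| A | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| G | 0 | 0 | 2 | 1 | 1 | 1 | 0 | 0 |
| C | 0 | 0 | 1 | 3 | 2 | 1 | 0 | 0 |
| G | 0 | 0 | 0 | 2 | 4 | 3 | 2 | 1 |
| G | 0 | 0 | 1 | 1 | 3 | 5 | 4 | 3 |
| A | 0 | 1 | 0 | 0 | 2 | 4 | 4 | 5 |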
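The three rule changes can be sketched in code for the two sequences in the example above (GAGCGGA down the side, AGCGGTA across the top). The scoring values are an assumption: the slides do not state them, but match = +1, mismatch = -1 and gap = -1 reproduce the best score of 5 and the bottom row of the example. `smith_waterman` is a hypothetical name, and ties are broken in favor of the diagonal move.

```python
def smith_waterman(x, y, match=1, mismatch=-1, gap=-1):
    """Local alignment sketch with a linear gap penalty (assumed scores)."""
    n, m = len(x), len(y)
    # Rule 1: edges are initialized to 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            candidates = [(H[i - 1][j - 1] + sub, "diag"),
                          (H[i - 1][j] + gap, "up"),
                          (H[i][j - 1] + gap, "left")]
            score, move = max(candidates, key=lambda c: c[0])
            # Rule 2: keep the cell at 0 with no pointer unless score > 0.
            if score > 0:
                H[i][j], ptr[i][j] = score, move
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Rule 3: trace back from the highest score until a cell without a pointer.
    i, j = best_pos
    ax, ay = [], []
    while ptr[i][j] is not None:
        move = ptr[i][j]
        if move == "diag":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == "up":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay)), H


best, ax, ay, H = smith_waterman("GAGCGGA", "AGCGGTA")
print(best)
print(ax)
print(ay)
```

Under these assumed scores the computed matrix may differ from a hand-filled one in individual cells, so it is worth comparing the two cell by cell.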
Final Homework
Implement Needleman-Wunsch global alignment with zero gap opening penalties. Try attacking the problem in this order:
1. Initialize and fill in a dynamic programming matrix by hand (e.g., try reproducing the example from my slides on paper).
2. Write a function to create the dynamic programming matrix and initialize the first row and column.
3. Write a function to fill in the rest of the matrix.
4. Rewrite the initialize and fill steps to store pointers to the best sub-solution for each cell.
5. Write a backtrace function to read the optimal alignment from the filled in matrix.
If that isn’t enough to keep you occupied, try implementing Smith-Waterman local alignment and/or non-zero gap opening penalties.
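For comparison after attempting the steps, here is one possible shape for the result, not a reference solution. It assumes match = +1, mismatch = -1 and a per-base gap extension penalty of -1 with zero opening penalty; `needleman_wunsch` and its parameters are hypothetical names, and ties are broken in favor of the diagonal move.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-1):
    """Global alignment sketch with a linear gap penalty (assumed scores)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    # Steps 2 and 4: first row and column hold accumulated gap penalties,
    # with pointers leading back to the origin.
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = j * gap, "left"
    # Steps 3 and 4: fill each cell from its best neighbor and record it.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            candidates = [(F[i - 1][j - 1] + sub, "diag"),
                          (F[i - 1][j] + gap, "up"),
                          (F[i][j - 1] + gap, "left")]
            F[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])
    # Step 5: backtrace from the bottom-right corner to the origin.
    i, j = n, m
    ax, ay = [], []
    while ptr[i][j] is not None:
        move = ptr[i][j]
        if move == "diag":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == "up":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("ACGT", "AGT"))
```

Unlike the local variant, scores here may go negative and the backtrace always spans both sequences end to end.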