BIC I, Week 4 lectures
Rhys Price Jones and Anne Haake
Rochester Institute of Technology
Overview of the need for Dynamic Programming
• Consider Fibonacci
• The obvious algorithm is elegant, easily derived from the definition, and clearly correct.
  (define fib
    (lambda (n)
      (if (<= n 1)
          1
          (+ (fib (- n 2)) (fib (- n 1))))))
• But it’s hopelessly inefficient
• Why?
• Because it makes repeated recursive calls with the same argument
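To make the repeated work visible, here is a minimal sketch that counts how often the naive definition is entered. The names `calls` and `counted-fib` are hypothetical and do not come from the course files; the body is otherwise the definition from the slide.

```scheme
;; calls: hypothetical global counter, incremented on every entry.
(define calls 0)

;; counted-fib: the slide's fib with one extra line that counts calls.
(define (counted-fib n)
  (set! calls (+ calls 1))
  (if (<= n 1)
      1
      (+ (counted-fib (- n 2)) (counted-fib (- n 1)))))
```

Evaluating `(counted-fib 20)` returns 10946 and leaves `calls` at 21891, so the call count grows as fast as the Fibonacci numbers themselves.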
The Traditional Solution
• Change the order in which the computations are performed
• Change the logic of the program
  – So that it works “bottom up” instead of “top down”
  – Fill an array with calculated values starting with (fib 0), then (fib 1), then (fib 2), etc.
• You can do it manually, as in fib.ss
• That is dynamic programming!
• The main problem is that it requires thought and programming and hence may introduce error.
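The bottom-up idea above can be sketched as follows. This is not the contents of fib.ss, which is not reproduced in these slides; `fib-bottom-up` is a hypothetical name, and a vector stands in for the array.

```scheme
;; fib-bottom-up: fills a vector from index 0 upwards, so every value
;; is computed exactly once. Uses the slide's convention fib(0) = fib(1) = 1.
(define (fib-bottom-up n)
  (let ((table (make-vector (+ (max n 1) 1) 1)))   ; slots 0 and 1 start at 1
    (do ((i 2 (+ i 1)))
        ((> i n) (vector-ref table n))             ; done: return slot n
      (vector-set! table i
                   (+ (vector-ref table (- i 2))
                      (vector-ref table (- i 1)))))))
```

The loop runs n - 1 times, so the work is linear in n rather than exponential.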
It’s not just Fibonacci
• Many programs “write themselves” from the specification of the problem.
• When that happens, we are extremely pleased
• Sadly, the resulting program is often inefficient
• But dynamic programming is a technique to make it efficient again.
Memo-izing
• Redefine the function calling mechanism so that:
  – We first check to see if we’ve made that calculation before
  – If no, go ahead and compute it but store the result in a hash table
  – If yes, look up the previously computed value in the hash table
• Do it once
• Inefficient code becomes efficient automatically with no re-programming
memolambda.ss
memofib.ss
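One way to sketch the check-then-store behaviour described above is a higher-order function. This is not the memolambda from memolambda.ss, whose source is not shown here; `memoize` and `memo-fib` are hypothetical names, and an association list is swapped in for the hash table so the sketch stays within portable Scheme (lookups are therefore linear in the table size, not constant).

```scheme
;; memoize: hypothetical helper that wraps a one-argument function f
;; so that each distinct argument is computed at most once.
(define (memoize f)
  (let ((table '()))                     ; association list of (argument . result)
    (lambda (n)
      (let ((hit (assv n table)))
        (if hit
            (cdr hit)                    ; seen before: reuse the stored result
            (let ((value (f n)))         ; first time: compute, then store
              (set! table (cons (cons n value) table))
              value))))))

;; The body is the slide's fib; only the wrapper is new.
(define memo-fib
  (memoize
   (lambda (n)
     (if (<= n 1)
         1
         (+ (memo-fib (- n 2)) (memo-fib (- n 1)))))))
```

The recursive calls go through `memo-fib`, not through the raw lambda, which is what lets the stored results cut off the repeated subcomputations.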
Another Example
• Pascal’s triangle
• Each entry is the sum of its parents
  – $C_{n,k} = C_{n-1,k-1} + C_{n-1,k}$
  – $C_{n,0} = C_{n,n} = 1$
• Leading to a program
• Runs really slowly
• Replace lambda by memolambda
badcomb.ss
goodcomb.ss
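As a rough illustration of the same fix for Pascal's triangle, the sketch below memoizes a two-argument function. It is not the code in badcomb.ss or goodcomb.ss; `memoize` and `comb` are hypothetical names, the base cases assumed are C(n,0) = C(n,n) = 1, and an association list keyed on the argument list stands in for a hash table.

```scheme
;; memoize: hypothetical helper for functions of any number of arguments.
;; The argument list itself is the lookup key (compared with equal?).
(define (memoize f)
  (let ((table '()))
    (lambda args
      (let ((hit (assoc args table)))
        (if hit
            (cdr hit)                        ; reuse the stored result
            (let ((value (apply f args)))    ; compute once, then store
              (set! table (cons (cons args value) table))
              value))))))

;; comb: entry k of row n, each entry the sum of its two parents.
;; Assumes 0 <= k <= n.
(define comb
  (memoize
   (lambda (n k)
     (if (or (= k 0) (= k n))
         1
         (+ (comb (- n 1) (- k 1))
            (comb (- n 1) k))))))
```

Without the wrapper, `(comb 30 15)` makes hundreds of millions of calls; with it, each (n, k) pair is computed once.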
Review of Pattern Matching
• Does CGGA appear within the sequence ATCGCGTAACGGAGATAGGCTTA ?
• More generally, where does pattern p (length n) appear within text t (length m)?
• Boyer-Moore or Knuth-Morris-Pratt give O(m+n) search
• If p is going to change a lot and t stays the same, a suffix tree can be built in O(m); each search is then O(n)
• If p is stable and there are lots of different t, a virtual machine can be built in O(n), and then each search is O(m)
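For comparison with the faster methods listed above, a naive search can be sketched in a few lines. `naive-search` is a hypothetical name; it tries every starting position in turn, so its worst case is O(mn) rather than O(m+n).

```scheme
;; naive-search: returns the 0-based index of the first occurrence of
;; pattern p in text t, or #f if p does not occur.
(define (naive-search p t)
  (let ((n (string-length p))
        (m (string-length t)))
    (let loop ((start 0))
      (cond ((> (+ start n) m) #f)                             ; ran off the end
            ((string=? p (substring t start (+ start n))) start)
            (else (loop (+ start 1)))))))
```

For the question on this slide, `(naive-search "CGGA" "ATCGCGTAACGGAGATAGGCTTA")` returns 9, counting positions from 0.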
Build a Virtual Machine
• Pattern: CGGA
• Text: ATCGCGTAACGGAGATAGGCTTA
[Figure: state machine for CGGA, with forward edges labeled C, G, G, A and further edges labeled AGT, ACGT, C, C, C, AT, AT, GT]
First Step through 23rd Step
[Figures: the same machine and text repeated on each slide, one slide per step for steps 1 to 16 and a single slide for steps 17 to 23]
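The machine for CGGA can be sketched as a transition function plus a loop over the text. The transitions below are an assumption, reconstructed from the pattern and the edge labels on the slide, not copied from the course code; `next-state` and `run-machine` are hypothetical names. State k means "the last k characters read are the first k characters of CGGA", and state 4 is the accepting state.

```scheme
;; next-state: assumed transition table for the pattern CGGA.
(define (next-state state ch)
  (cond ((= state 4) 4)                              ; accepting state absorbs A, C, G, T
        ((char=? ch #\C) 1)                          ; any C restarts a possible match
        ((and (= state 1) (char=? ch #\G)) 2)
        ((and (= state 2) (char=? ch #\G)) 3)
        ((and (= state 3) (char=? ch #\A)) 4)
        (else 0)))                                   ; everything else falls back to the start

;; run-machine: feeds the text one character per step and returns the
;; number of characters read when the accepting state is first reached,
;; or #f if it never is.
(define (run-machine text)
  (let loop ((i 0) (state 0))
    (cond ((= state 4) i)
          ((= i (string-length text)) #f)
          (else (loop (+ i 1)
                      (next-state state (string-ref text i)))))))
```

Each character is examined once, which is where the O(m) search time comes from. On the slide's text the machine accepts after 13 characters, so the match occupies characters 10 to 13.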
Pattern Matching – Conclusion
• Exact pattern matching is easy
• Often the naive algorithm is good enough
• Fast algorithms are readily available
• Sadly, not much use for biological tasks
Why not?
• What’s the difference?
• Mutation
• Insertion/deletion gaps
• We need an inexact way to compare two (or more) biological sequences
Pattern Matching vs. Sequence Alignment
• In the CS world, we talk of comparing strings, or matching patterns of characters within strings
• For biological applications, we talk of comparing sequences, or aligning sequences of nucleotides (or amino acids) to each other
Evolutionary Relatedness
• Consider ACCGT and CACGT
• How likely is it that they are “related”?
• Possible alignments:
  ACCGT    AC-CGT
  XX|||    -|-|||
  CACGT    -CACGT
• Which is better?
It Depends
• ACCGT    AC-CGT
  XX|||    -|-|||
  CACGT    -CACGT
• Scoring 2 for a match, -2 for a mismatch, and -1 for a gap, the alignments score 2 versus 6
• Scoring 2 for a match, 0 for a mismatch, and -2 for a gap, they score 6 versus 4
• And we haven’t even begun to consider experimental evidence that might cause us to rank some mutations better than others!
Distance measure
• Score 0 for a match
• 1 for a mismatch or gap
• Low score best!
• ACCGT    AC-CGT
  XX|||    -|-|||
  CACGT    -CACGT
• Now it’s 2 versus 2
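The three scoring schemes seen so far differ only in their numbers, so one function can check all of the totals. The sketch below uses a hypothetical name, `score-alignment`, and assumes the two rows of an alignment are given as equal-length strings with `-` marking a gap.

```scheme
;; score-alignment: walks the two rows column by column and adds up
;; the score of each column under the given scheme.
(define (score-alignment top bottom match mismatch gap)
  (let loop ((i 0) (total 0))
    (if (= i (string-length top))
        total
        (let ((a (string-ref top i))
              (b (string-ref bottom i)))
          (loop (+ i 1)
                (+ total
                   (cond ((or (char=? a #\-) (char=? b #\-)) gap)   ; gap column
                         ((char=? a b) match)                        ; match column
                         (else mismatch))))))))                      ; mismatch column
```

For example, `(score-alignment "AC-CGT" "-CACGT" 2 -2 -1)` gives 6, and passing `0 1 1` as the last three arguments turns the same function into the distance measure, where both alignments come out at 2.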
Global alignment
• For two sequences, ACCACC (columns) and ACACA (rows)
    - A C C A C C
  -
  A
  C
  A
  C
  A
• Use the scoring scheme to fill in the table, starting with first row and first column
First entries
• Using the distance measure
    - A C C A C C
  - 0 1 2 3 4 5 6
  A 1
  C 2
  A 3
  C 4
  A 5
• Each nucleotide<->gap costs 1 point
Extending inwards
• Extending the distance measure
    - A C C A C C
  - 0 1 2 3 4 5 6
  A 1 0 1 2 3 4 5
  C 2 1
  A 3 2
  C 4 3
  A 5 4
• Extending from North or West costs 1 point, from NW costs 0 (match) or 1 (mismatch)
• Pick cheapest of the three
More extension
•   - A C C A C C
  - 0 1 2 3 4 5 6
  A 1 0 1 2 3 4 5
  C 2 1 0 1 2 3 4
  A 3 2 1 1
  C 4 3 2 1
  A 5 4 3 2
• $m_{i,j} = \min(m_{i,j-1} + g,\ m_{i-1,j} + g,\ m_{i-1,j-1} + c_{ij})$
• where $c_{ij}$ = 0 for a match, 1 for a mismatch, and $g$ is the gap cost (1 here)
Getting there...
•   - A C C A C C
  - 0 1 2 3 4 5 6
  A 1 0 1 2 3 4 5
  C 2 1 0 1 2 3 4
  A 3 2 1 1 1 2 3
  C 4 3 2 1 2
  A 5 4 3 2 1
• $m_{i,j} = \min(m_{i,j-1} + 1,\ m_{i-1,j} + 1,\ m_{i-1,j-1} + c_{ij})$
• where $c_{ij}$ = 0 for a match, 1 for a mismatch
Almost done...
•   - A C C A C C
  - 0 1 2 3 4 5 6
  A 1 0 1 2 3 4 5
  C 2 1 0 1 2 3 4
  A 3 2 1 1 1 2 3
  C 4 3 2 1 2 1 2
  A 5 4 3 2 1 2
• $m_{i,j} = \min(m_{i,j-1} + 1,\ m_{i-1,j} + 1,\ m_{i-1,j-1} + c_{ij})$
• where $c_{ij}$ = 0 for a match, 1 for a mismatch
Finally, we can get a Global alignment
• One of the least-cost routes
    - A C C A C C
  - 0 1 2 3 4 5 6
  A 1 0 1 2 3 4 5
  C 2 1 0 1 2 3 4
  A 3 2 1 1 1 2 3
  C 4 3 2 1 2 1 2
  A 5 4 3 2 1 2 2
• Can you see how this path leads to the alignment?
  ACCACC
  AC-ACA
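The table-filling procedure from the last few slides can be sketched as below. This is not the code in globalig.ss or globaligm.ss, which is not shown in these slides; `global-distance` and the small table helpers are hypothetical names, and the sketch returns only the bottom-right entry, without the backtracking that recovers the alignment itself.

```scheme
;; A table is a vector of row vectors.
(define (make-table rows cols)
  (let ((table (make-vector rows)))
    (do ((i 0 (+ i 1)))
        ((= i rows) table)
      (vector-set! table i (make-vector cols 0)))))

(define (table-ref table i j) (vector-ref (vector-ref table i) j))
(define (table-set! table i j v) (vector-set! (vector-ref table i) j v))

;; global-distance: across labels the columns, down labels the rows.
;; Matches cost 0; mismatches and gaps cost 1.
(define (global-distance across down)
  (let* ((cols (+ (string-length across) 1))
         (rows (+ (string-length down) 1))
         (m (make-table rows cols)))
    (do ((j 0 (+ j 1))) ((= j cols))          ; first row: j gaps
      (table-set! m 0 j j))
    (do ((i 0 (+ i 1))) ((= i rows))          ; first column: i gaps
      (table-set! m i 0 i))
    (do ((i 1 (+ i 1))) ((= i rows))
      (do ((j 1 (+ j 1))) ((= j cols))
        (let ((c (if (char=? (string-ref down (- i 1))
                             (string-ref across (- j 1)))
                     0
                     1)))
          (table-set! m i j
                      (min (+ (table-ref m i (- j 1)) 1)          ; from West
                           (+ (table-ref m (- i 1) j) 1)          ; from North
                           (+ (table-ref m (- i 1) (- j 1)) c)))))) ; from NW
    (table-ref m (- rows 1) (- cols 1))))
```

`(global-distance "ACCACC" "ACACA")` returns 2, the bottom-right entry of the completed table.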
Global alignment program
• Distance measure
• Runnable program
• Dynamic Programming version
globalig.txt
globalig.ss
globaligm.ss
Global vs Local Alignment
• Global alignment seeks the best alignment between the two complete sequences. A global alignment between GATCCACCA and GTAACACA might be
  G-ATCCACCA
  |-|X|-||-|
  GTAAC-AC-A
• A local alignment is the best alignment between subsequences. A local alignment between GATCCACCA and GTAACACA might be
   gATCCACca
    |X|-||
  gtAAC-ACa
• Best local alignment depends on scoring scheme
Local Alignment
• For this demo, we will use a different measure
  – 2 for a match
  – -1 for a mismatch, -2 for a gap
  – Find best match within
    G C T C T G C G A A T A G
    C G T T G A G A T A C T C
The solution
• - G C T C T G C G A A T A G
- 0 0 0 0 0 0 0 0 0 0 0 0 0 0
C 0 0 2 0 2 0 0 2 0 0 0 0 0 0
G 0 2 0 1 0 1 2 0 4 2 0 0 0 2
T 0 0 1 2 0 2 0 1 2 3 1 2 0 0
T 0 0 0 3 1 2 1 0 0 1 2 3 1 0
G 0 2 0 1 2 0 4 2 2 0 0 1 2 3
A 0 0 1 0 0 1 2 3 1 4 2 0 3 1
G 0 2 0 0 0 0 3 1 5 3 3 1 1 5
A 0 0 1 0 0 0 1 2 3 7 5 3 3 3
T 0 0 0 3 1 2 0 0 1 5 6 7 5 3
A 0 0 0 1 2 0 1 0 0 3 7 5 9 7
C 0 0 2 0 3 1 0 3 1 1 5 6 7 8
T 0 0 0 4 2 5 3 1 2 0 3 7 5 6
C 0 0 2 2 6 4 4 5 3 1 1 5 6 4
• G C T C T G C G A A T A G
          | | x | | X | |
    C G T T G A G A - T A C T C
The Program
• Has dynamic programming to make it fast!
• This is basically Smith-Waterman
• Work has been done on different scoring schemes, gap penalties, etc.
• Runs in time O(mn)
localig.ss
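A score-only sketch of the Smith-Waterman recurrence is given below. It is not the code in localig.ss, which is not shown in these slides; `local-score` is a hypothetical name. Each cell takes the best of the three neighbouring extensions or 0, and the answer is the largest cell anywhere in the table. Only two rows are kept at a time, so the alignment itself is not recovered.

```scheme
;; local-score: across labels the columns, down labels the rows.
;; match, mismatch and gap are the scores of the three column types.
(define (local-score across down match mismatch gap)
  (let ((cols (+ (string-length across) 1))
        (rows (+ (string-length down) 1)))
    (let row-loop ((i 1) (prev (make-vector cols 0)) (best 0))
      (if (= i rows)
          best
          (let ((cur (make-vector cols 0)))
            (let col-loop ((j 1) (best best))
              (if (= j cols)
                  (row-loop (+ i 1) cur best)
                  (let* ((s (if (char=? (string-ref down (- i 1))
                                        (string-ref across (- j 1)))
                                match
                                mismatch))
                         (v (max 0                                   ; start afresh
                                 (+ (vector-ref prev (- j 1)) s)     ; from NW
                                 (+ (vector-ref prev j) gap)         ; from North
                                 (+ (vector-ref cur (- j 1)) gap)))) ; from West
                    (vector-set! cur j v)
                    (col-loop (+ j 1) (max best v))))))))))
```

Called as `(local-score "GCTCTGCGAATAG" "CGTTGAGATACTC" 2 -1 -2)`, it fills the same cells as the table on the previous slide, whose largest entry is 9. The two nested loops visit every cell once, which gives the O(mn) running time.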
Exercises
• that we will attempt in class:
  – amend global alignment program to do the “backtracking” needed for the alignment
• that will be homework
  – amend local alignment program to do the “backtracking” needed for the alignment