Post on 20-Dec-2015
Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis
Yaw-Ling Lin*, Tao Jiang, Kun-Mao Chao
* Dept CS & Info Mngmt, Providence Univ, Taiwan
Dept CS & Engineering, UC Riverside, USA
Dept CS & Info Engnr, Nat. Taiwan Univ, Taiwan
Yaw-Ling Lin, Providence, Taiwan 2
Outline
• Introduction
• Applications to Biomolecular Sequence Analysis
• Maximum Sum Consecutive Subsequence
• Maximum Average Consecutive Subsequence
• Implementation and Preliminary Experiments
• Concluding Remarks
Introduction
• Two fundamental algorithms in searching for interesting regions in sequences:
• Given a sequence of real numbers of length n and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum: an O(n)-time algorithm.
• Given a sequence of real numbers of length n and a lower bound L, find a consecutive subsequence of length at least L with the maximum average: an O(n log L)-time algorithm.
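To make the first problem concrete, here is a minimal Python sketch that finds a maximum-sum consecutive subsequence of length at most U in O(n) time. It uses prefix sums with a monotonic deque, which is a swapped-in standard technique and not the partition-based method presented in these slides. The function name `max_sum_at_most` and its return format are assumptions made for illustration.

```python
from collections import deque


def max_sum_at_most(a, U):
    """Return (best_sum, start, end) for a non-empty slice a[start:end]
    with end - start <= U that has the largest sum.

    Illustrative sketch only (prefix sums + monotonic deque)."""
    if U < 1 or not a:
        raise ValueError("need a non-empty sequence and U >= 1")

    # prefix[j] is the sum of a[0:j], so sum(a[i:j]) == prefix[j] - prefix[i].
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)

    best = None
    window = deque()  # candidate start indices, prefix values strictly increasing
    for j in range(1, len(a) + 1):
        i = j - 1  # newest admissible start index for a slice ending at j
        while window and prefix[window[-1]] >= prefix[i]:
            window.pop()  # a later start with a smaller prefix dominates
        window.append(i)
        while window[0] < j - U:
            window.popleft()  # this start would make the slice longer than U
        candidate = prefix[j] - prefix[window[0]]
        if best is None or candidate > best[0]:
            best = (candidate, window[0], j)
    return best
```

Each index enters and leaves the deque at most once, which is where the linear bound comes from. For example, `max_sum_at_most([5, -3, 4, -1, 2, -6], 3)` selects the slice `[5, -3, 4]`.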
Applications to Biomolecular Sequence Analysis (I)
• Locating GC-Rich Regions
– Finding GC-rich regions: an important problem in gene recognition and comparative genomics.
– CpG islands (200 ~ 1400 bp)
– [Huang’94]: O(nL)-time algorithm.
• Post-Processing Sequence Alignments
– Comparative analysis of human and mouse DNA: useful in gene prediction in the human genome.
– Mosaic effect: bad inner sequence.
– Normalized local alignment.
– Post-processing locally aligned subsequences
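The GC-rich region problem above can be phrased as a maximum-average query: score G and C as 1, A and T as 0, and look for the segment of length at least L with the highest average. The sketch below is a simple O(nL) baseline, not necessarily the method of [Huang’94] and not the O(n log L) algorithm of this talk. It relies on the fact that some optimal segment has length below 2L, since a longer segment splits into two parts of length at least L, one of which has an average no smaller than the whole. The names `gc_scores` and `max_avg_at_least` are hypothetical.

```python
def gc_scores(dna):
    """Map a DNA string to 0/1 scores: 1 for G or C, 0 otherwise."""
    return [1 if base in "GCgc" else 0 for base in dna]


def max_avg_at_least(a, L):
    """Return (average, start, end) for a slice a[start:end] with
    end - start >= L that has the largest average.

    Baseline sketch: only lengths L .. 2L-1 are tried, giving O(nL) time."""
    n = len(a)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= len(a)")

    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)

    best_sum, best_len, best_span = None, 1, None
    for start in range(n - L + 1):
        for end in range(start + L, min(n, start + 2 * L - 1) + 1):
            total, length = prefix[end] - prefix[start], end - start
            # Compare averages by cross-multiplication to avoid division.
            if best_span is None or total * best_len > best_sum * length:
                best_sum, best_len, best_span = total, length, (start, end)
    return (best_sum / best_len, best_span[0], best_span[1])
```

For instance, `max_avg_at_least(gc_scores("ATGCGCAT"), 4)` picks out the slice covering `GCGC`.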
Applications to Biomolecular Sequence Analysis (II)
• Annotating Multiple Sequence Alignments
– [Stojanovic’99]: conserved regions in biomolecular sequences.
– Numerical scores for columns of a multiple alignment; each column score shall be adjusted by subtracting an anchor value.
• Ungapped Local Alignments with Length Constraints
– Computing the length-constrained segment of each diagonal in the matrix with the largest sum (or average) of scores.
– Applications in motif identification.
Maximum Sum Consecutive Subsequence
<-4, 1, -2, 3> is left-negative; <5, -3, 4, -1, 2, -6> is not.
<5> <-3, 4> <-1, 2> <-6> is a minimal left-negative partition of the latter.
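The two examples above can be checked with a short Python sketch. It assumes the following definitions, which the slide text itself does not spell out: a sequence is left-negative when every proper prefix has a sum of at most zero, and a partition is minimal left-negative when every segment is left-negative and every segment except possibly the last has a positive sum. The function names are hypothetical, and the right-to-left merging scan is one way to build such a partition in linear time under those assumptions.

```python
def is_left_negative(seg):
    """True if every proper prefix of seg has sum <= 0 (assumed definition)."""
    total = 0
    for x in seg[:-1]:
        total += x
        if total > 0:
            return False
    return True


def mln_partition(a):
    """Split a into a minimal left-negative partition (list of lists).

    Scan right to left; a new segment absorbs the segments to its right
    while its own sum is still <= 0."""
    stack = []  # (start, end, sum) triples; the top is the leftmost segment
    for i in range(len(a) - 1, -1, -1):
        start, end, total = i, i + 1, a[i]
        while total <= 0 and stack:
            _, next_end, next_total = stack.pop()
            end = next_end
            total += next_total
        stack.append((start, end, total))
    return [a[s:e] for s, e, _ in reversed(stack)]
```

Every segment is pushed once and popped at most once, so the scan does O(n) work in total. Calling `mln_partition([5, -3, 4, -1, 2, -6])` reproduces the partition shown on the slide.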
Minimal left-negative partition
MLN-partition: linear time
Max-Sum with LC
Analysis of MSLC
Max Average Subsequence
<4, 2, 3, 8> is right-skew; <5, 3, 4, 1, 2, 6> is not.
<5> <3, 4> <1, 2, 6> is a decreasing right-skew partition of the latter.
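These examples can also be verified in a few lines of Python. The sketch assumes definitions that the slide text does not state: a sequence is right-skew when the average of every proper prefix is at most the average of the remaining suffix, and a partition is decreasing right-skew when each segment is right-skew and the segment averages strictly decrease from left to right. The function names are hypothetical; averages are compared by cross-multiplication so no floating-point division is involved.

```python
def is_right_skew(seg):
    """True if avg(prefix) <= avg(rest) for every proper prefix (assumed definition)."""
    total, n, prefix = sum(seg), len(seg), 0
    for i in range(1, n):
        prefix += seg[i - 1]
        # avg(prefix) > avg(rest)  <=>  prefix * (n - i) > (total - prefix) * i
        if prefix * (n - i) > (total - prefix) * i:
            return False
    return True


def drs_partition(a):
    """Split a into a decreasing right-skew partition (list of lists).

    Scan right to left; a new segment absorbs its right neighbour while
    its average does not exceed the neighbour's average."""
    stack = []  # (start, end, sum) triples; the top is the leftmost segment
    for i in range(len(a) - 1, -1, -1):
        start, end, total = i, i + 1, a[i]
        while stack:
            n_start, n_end, n_total = stack[-1]
            if total * (n_end - n_start) <= n_total * (end - start):
                stack.pop()
                end = n_end
                total += n_total
            else:
                break
        stack.append((start, end, total))
    return [a[s:e] for s, e, _ in reversed(stack)]
```

As with the left-negative case, each segment is pushed once and popped at most once, so the scan is linear. `drs_partition([5, 3, 4, 1, 2, 6])` yields the three segments shown on the slide, with averages 5, 3.5 and 3.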
Decreasing right-skew partition
DRS-partition: linear time
Max-Avg-Seq with LC
Locate good-partner
Analysis of MaxAvgSeq
Implementation and Preliminary Experiments
Conclusion
• Finding a max-sum subsequence of length at most U can be done in O(n) time.
• Finding a max-avg subsequence of length at least L can be done in O(n log L) time.
Recent Progress
• Lu (CMCT’2002): finding the max-avg subsequence of length at least L on binary (0,1) sequences. O(n)-time.
• Goldwasser, Kao, Lu (2002, manuscripts): finding the max-avg subsequence of length at least L and at most U on real sequences. O(n)-time.
• Tools: finding CpG islands using MAVG (joint work with Huang, X., Jiang, T. and Chao, K.-M.)
http://deepc2.zool.iastate.edu/aat/mavg/cgdoc.html
http://deepc2.zool.iastate.edu/aat/mavg/cg.html
Future Research
• Best k (nonintersecting) subsequences?
• Normalized local alignment?
• Measurement of goodness?