RNA Assembly Using Extending Method
![Page 1: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/1.jpg)
RNA Assembly Using Extending Method
Wei Xueliang
2010-04-07
![Page 2: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/2.jpg)
Overview
• Why abandon deBruijn.
• Why abandon Extended deBruijn.
• Introduction to current method.
• Handle the old problem.
• The new problem.
• Todo
![Page 3: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/3.jpg)
Why abandon deBruijn.
• De Bruijn Graph’s (dis)advantage:
– Very fast.
– Coverage distribution and K value strongly affect the result.
• Key: the coverage is not uniformly distributed in the RNA assembly.
– No best K value.
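A toy illustration of why no single K fits uneven coverage. This is a minimal sketch, not the data behind these slides: the sequence, the read layout, and the function names are made up for illustration. A large K resolves a 7-base repeat, but two reads from a thinly covered region that overlap by only 6 bases no longer share a node. A small K joins those reads, yet leaves the repeat as a branch.

```python
from collections import defaultdict

def debruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branching_nodes(graph):
    """Nodes with more than one successor, i.e. ambiguous extensions."""
    return [node for node, successors in graph.items() if len(successors) > 1]

def nodes_of(read, k):
    """All (k-1)-mers of a read."""
    return {read[i:i + k - 1] for i in range(len(read) - k + 2)}

def share_node(read_a, read_b, k):
    """True if two reads have a (k-1)-mer in common, so the graph can join them."""
    return bool(nodes_of(read_a, k) & nodes_of(read_b, k))

# Hypothetical transcript containing the same 7-base repeat twice.
REPEAT = "TTACGTA"
TRANSCRIPT = "ACGGA" + REPEAT + "CCGCT" + REPEAT + "GGCAT"

# Two reads from a sparsely covered region: they overlap by only 6 bases.
READ_A, READ_B = TRANSCRIPT[0:12], TRANSCRIPT[6:18]

for k in (5, 8, 9):
    branches = branching_nodes(debruijn([TRANSCRIPT], k))
    print(k, branches, share_node(READ_A, READ_B, k))
# 5 ['CGTA'] True
# 8 ['TTACGTA'] False
# 9 [] False
```

In this toy case only K = 9 removes the branch, and only K = 5 keeps the two sparse reads connected, so neither value works for both regions.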
![Page 4: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/4.jpg)
Why abandon deBruijn.
• The length of the red part is 27.
![Page 5: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/5.jpg)
deBruijn Graph of K = 28
![Page 6: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/6.jpg)
deBruijn Graph of K = 29
![Page 7: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/7.jpg)
deBruijn Graph of K = 30
![Page 8: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/8.jpg)
Why abandon deBruijn.
• Key: the coverage is not uniformly distributed in the RNA assembly.
– No best K value.
• Can we run the program many times with different K values?
• This is not the de novo assembler's job.
– Time.
– Provide highly accurate contigs within limited time.
– Scaffolding programs.
![Page 9: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/9.jpg)
Why abandon Extended deBruijn.
• My Extended de Bruijn method:
– Use two or more K values at the same time.
![Page 10: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/10.jpg)
Why abandon Extended deBruijn.
• Coverage changes faster than I expected, so many K values are needed.
• Converting between different K values is difficult.
• Memory problem for large K: when K > 32, each K-index needs > 50G (with a 10G data set).
• Throw the K away.
![Page 11: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/11.jpg)
Introduction to the new method
• From Pramila’s genome assembly method.
• Start from any Tag and do a correction.
• If successfully corrected, continue.
![Page 12: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/12.jpg)
Introduction to the new method
• Find all tags that overlap by at least 24 bp. (Magic number)
• Use these overlapping tags to extend the Base, then continue adding more tags.
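The two steps above can be sketched as a greedy loop. This is a simplified illustration under assumptions, not the author's implementation: the majority vote on the next base, the `max_len` guard, and all function names are hypothetical, and the toy call uses a minimum overlap of 8 instead of 24 so that short tags suffice.

```python
from collections import Counter

MIN_OVERLAP = 24  # the "magic number" from the slide

def overlap_len(base, tag, min_overlap, max_mismatch=0):
    """Longest suffix of base matching a prefix of tag, within max_mismatch."""
    longest = min(len(base), len(tag) - 1)  # the tag must add at least one base
    for n in range(longest, min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(base[-n:], tag[:n]))
        if mismatches <= max_mismatch:
            return n
    return 0

def extend(base, tags, min_overlap=MIN_OVERLAP, max_mismatch=0, max_len=10_000):
    """Append the majority next base voted by overlapping tags until none overlap."""
    while len(base) < max_len:  # guard against endless extension through cycles
        votes = Counter()
        for tag in tags:
            n = overlap_len(base, tag, min_overlap, max_mismatch)
            if n:
                votes[tag[n]] += 1  # the first base of the tag past the overlap
        if not votes:
            break
        base += votes.most_common(1)[0][0]
    return base

# Hypothetical transcript, with every 12-base window used as a tag.
transcript = "ACGGATTACGTACCGCTTTACGTAGGCAT"
tags = [transcript[i:i + 12] for i in range(len(transcript) - 11)]
print(extend(transcript[:12], tags, min_overlap=8) == transcript)  # True
```

Rescanning every tag on each step is quadratic and only meant to show the logic; a real run needs an index to find the overlapping tags.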
![Page 13: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/13.jpg)
Introduction to the new method
• How to find the overlapping tags quickly while allowing mismatches?
• Index and Union:
{Tag3}, {Tag2, Tag3}, {Tag3, Tag4}
Union => {Tag1, Tag2, Tag3, Tag4}
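One common way to realize an "index and union" lookup is seed-based filtering, sketched below. The slide does not say how the index is built, so the seed scheme, the window size, and the function names here are assumptions. The idea is to split the query window into seeds, look up each seed exactly, and union the hit sets. A tag with fewer mismatches than seeds must match at least one seed exactly, so it survives into the union.

```python
from collections import defaultdict

def build_index(tags, window, n_seeds):
    """Index each tag by the seeds that make up its first `window` bases."""
    seed_len = window // n_seeds
    index = defaultdict(set)
    for tag_id, tag in enumerate(tags):
        for s in range(n_seeds):
            seed = tag[s * seed_len:(s + 1) * seed_len]
            index[(s, seed)].add(tag_id)
    return index

def seed_hits(index, query, n_seeds):
    """One set of tag ids per seed of the query window (exact seed matches)."""
    seed_len = len(query) // n_seeds
    return [
        index.get((s, query[s * seed_len:(s + 1) * seed_len]), set())
        for s in range(n_seeds)
    ]

def candidates(index, query, n_seeds):
    """Union of the per-seed hit sets: tags worth a full mismatch check."""
    return set().union(*seed_hits(index, query, n_seeds))

# Toy data: a 12-base window split into three 4-base seeds.
tags = [
    "ACGTTGCAGGCCAA",  # exact match to the query window
    "ACGTTGCATGCCTT",  # one mismatch, in the third seed
    "ACCTTGCAGGCCGG",  # one mismatch, in the first seed
    "TTTTAAAACCCCGG",  # unrelated
]
index = build_index(tags, window=12, n_seeds=3)
query = "ACGTTGCAGGCC"  # e.g. the last 12 bases of the Base

print(seed_hits(index, query, 3))   # [{0, 1}, {0, 1, 2}, {0, 2}]
print(candidates(index, query, 3))  # {0, 1, 2}
```

The union only proposes candidates; each one still needs a direct comparison against the Base to count its mismatches.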
![Page 14: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/14.jpg)
Introduction to the new method
• How to find the next overlapping tags quickly while allowing mismatches?
• V1 <= U3
• V2 <= (U1 << 1) + 0
• V3 <= (U2 << 1) + 0
![Page 15: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/15.jpg)
Handle the old problem.
• What if the overlapping part is shorter than 24?
![Page 16: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/16.jpg)
Handle the old problem.
• Check the tags one by one, in descending order of overlap length.
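A minimal sketch of the ordering step, assuming exact suffix-prefix overlaps and hypothetical function names. The slide does not state how a decision is made while walking down the list, so the sketch only produces the ranked list. Each entry pairs an overlap length with the base that the tag would add.

```python
def exact_overlap(base, tag):
    """Length of the longest suffix of base that is a proper prefix of tag."""
    for n in range(min(len(base), len(tag) - 1), 0, -1):
        if base.endswith(tag[:n]):
            return n
    return 0

def ranked_votes(base, tags):
    """(overlap, next_base) for every overlapping tag, longest overlap first."""
    votes = []
    for tag in tags:
        n = exact_overlap(base, tag)
        if n:
            votes.append((n, tag[n]))
    return sorted(votes, reverse=True)

# Toy example with short sequences.
base = "GATTACAGG"
tags = ["ACAGGT", "AGGTC", "GGA", "CCC"]
print(ranked_votes(base, tags))  # [(5, 'T'), (3, 'T'), (2, 'A')]
```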
![Page 17: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/17.jpg)
Handle the old problem.
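| Overlap | A Count | A % | G Count | G % |
|---|---|---|---|---|
| 60 | 1 | 6.67% | 1 | 4.76% |
| 52 | 3 | 20.00% | 1 | 4.76% |
| 44 | 6 | 40.00% | 2 | 9.52% |
| 36 | 10 | 66.67% | 10 | 47.62% |
| 30 | 11 | 73.33% | 16 | 76.19% |
| 24 | 15 | 100.00% | 21 | 100.00% |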
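The % columns are consistent with each count being cumulative (tags with at least that much overlap) and divided by the final total, 15 for A and 21 for G. A short check of that reading, using the counts from the table; the function name is only illustrative:

```python
def cumulative_percent(cum_counts):
    """Each cumulative count as a percentage of the final total."""
    total = cum_counts[-1]
    return [round(100 * c / total, 2) for c in cum_counts]

overlaps = [60, 52, 44, 36, 30, 24]
a_counts = [1, 3, 6, 10, 11, 15]   # cumulative tags voting for A
g_counts = [1, 1, 2, 10, 16, 21]   # cumulative tags voting for G

print(cumulative_percent(a_counts))  # [6.67, 20.0, 40.0, 66.67, 73.33, 100.0]
print(cumulative_percent(g_counts))  # [4.76, 4.76, 9.52, 47.62, 76.19, 100.0]
```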
![Page 18: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/18.jpg)
Handle the old problem.
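| Overlap | A Count | A % | G (High Exp) Count | G (High Exp) % |
|---|---|---|---|---|
| 56 | 1 | 6.67% | 5 | 2.50% |
| 50 | 3 | 20.00% | 10 | 5.00% |
| 44 | 6 | 40.00% | 20 | 10.00% |
| 36 | 10 | 66.67% | 120 | 60.00% |
| 30 | 11 | 73.33% | 150 | 75.00% |
| 24 | 15 | 100.00% | 200 | 100.00% |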
![Page 19: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/19.jpg)
Handle the old problem.
• Degree of approximation.
![Page 20: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/20.jpg)
Handle the old problem.
• Fewer tips.
• No bubbles.
– Because we overlap with mismatches.
– Use whole tags.
![Page 21: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/21.jpg)
The new problem.
• Speed.
• The tail of the tag often has more errors.
– Reverse Extending Problem.
![Page 22: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/22.jpg)
Todo
• Handle Reverse Extending Problem.
• Speed
• Finish the comparison between the deBruijn method (Velvet) and my method.
• Paired End Tag.
![Page 23: RNA Assembly Using extending method](https://reader036.vdocuments.us/reader036/viewer/2022062410/56815cf7550346895dcaf998/html5/thumbnails/23.jpg)
• Thank you very much for your attention.