
Foldalign 2.2: multithreaded structural RNA alignment

Daniel Sundfeld (1,3), Jakob Havgaard (1,2), Alba C. M. A. de Melo (3), Jan Gorodkin (1,2)

(1) Center for non-coding RNA in Technology and Health

(2) IKVH, University of Copenhagen

(3) Department of Computer Science, University of Brasilia

19/02/2015 - Winterseminar

Sundfeld et al Foldalign 2.2 19/02/2015 1 / 16


Foldalign

Algorithms for simultaneously aligning and folding*:

Probabilistic methods: usually based on Stochastic Context-Free Grammars, which use statistical learning to build a solution from previously known answers.

Sankoff-like methods: minimize the free energy, calculating how each element of an RNA structure contributes to it.

*M. Ziv-Ukelson, I. Gat-Viks, Y. Wexler, and R. Shamir. A faster algorithm for RNA co-folding. In Proceedings of the 8th International Workshop on Algorithms in Bioinformatics, pages 174–185, 2008.


Foldalign

The time complexity of O(L^6), where L is the length of the sequences, makes the Sankoff algorithm prohibitive for long sequences. Foldalign uses several constraints to reduce the time requirement:

The final alignment is limited to λ nucleotides, and the maximum length difference between any two subsequences being aligned is limited to δ nucleotides.

Subalignments with a score below a threshold are eliminated.

With these constraints, the time complexity is reduced to O(L^2 λ^2 δ^2).
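To get a feel for what the constraints buy, the two bounds can be compared numerically. The sketch below is only an order-of-magnitude illustration: it ignores constant factors and the score-threshold pruning, and the function names are invented for this example. The parameter values are the ones reported for the 16S rRNA experiment in the results of this talk.

```python
def sankoff_bound(length: int) -> int:
    """Unconstrained Sankoff bound, L^6 (constant factors ignored)."""
    return length ** 6


def foldalign_bound(length: int, lam: int, delta: int) -> int:
    """Constrained bound, L^2 * lambda^2 * delta^2 (constant factors ignored)."""
    return length ** 2 * lam ** 2 * delta ** 2


if __name__ == "__main__":
    # Values from the 16S rRNA experiment: |S2| = 2019, lambda = 2019, delta = 100.
    L, lam, delta = 2019, 2019, 100
    ratio = sankoff_bound(L) / foldalign_bound(L, lam, delta)
    print(f"L^6 / (L^2 lambda^2 delta^2) = {ratio:.3g}")
```

When λ equals the sequence length, as in this example, the ratio simplifies to L^2 / δ^2, so the reduction comes entirely from δ.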


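Foldalign

Foldalign recursion function.

The simplified Foldalign recursion:

\[
D_{i,j,k,l} = \max
\begin{cases}
D_{i+1,j-1,k+1,l-1} + S_{bp}(n_i, n_j, n_k, n_l, \sigma_{bp}) & \text{(a)} \\
D_{i+1,j-1,k,l} + S_{bpiI}(n_i, n_j, -, -, \sigma_{bpiI}) & \text{(b)} \\
D_{i,j,k+1,l-1} + S_{bpiK}(-, -, n_k, n_l, \sigma_{bpiK}) & \text{(c)} \\
D_{i+1,j,k+1,l} + S_{al}(n_i, n_k, \sigma_{al}) & \text{(d)} \\
D_{i,j-1,k,l-1} + S_{ar}(n_j, n_l, \sigma_{ar}) & \text{(e)} \\
D_{i+1,j,k,l} + S_{glI}(n_i, -, \sigma_{glI}) & \text{(f)} \\
D_{i,j-1,k,l} + S_{glrI}(n_j, -, \sigma_{glrI}) & \text{(g)} \\
D_{i,j,k+1,l} + S_{glK}(-, n_k, \sigma_{glK}) & \text{(h)} \\
D_{i,j,k,l-1} + S_{grK}(-, n_l, \sigma_{grK}) & \text{(i)} \\
\max\limits_{i<m<j,\ k<n<l} \{ D_{i,m,k,n} + D_{m+1,j,n+1,l} + \sigma_{mbl} \} & \text{(j)}
\end{cases}
\tag{1}
\]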
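Cases (a) to (i) each read a single previously computed cell at a fixed offset from (i, j, k, l). Tabulating those offsets makes the data dependencies explicit. The sketch below simply transcribes the subscripts of equation (1); the names OFFSETS and cases_reading are invented for this illustration, and the bifurcation case (j) is left out because its predecessors are not at a fixed offset.

```python
# Offsets (di, dj, dk, dl) of the cell read by each non-bifurcation case of
# equation (1), transcribed from the subscripts of D on the right-hand side.
OFFSETS = {
    "a": (+1, -1, +1, -1),
    "b": (+1, -1, 0, 0),
    "c": (0, 0, +1, -1),
    "d": (+1, 0, +1, 0),
    "e": (0, -1, 0, -1),
    "f": (+1, 0, 0, 0),
    "g": (0, -1, 0, 0),
    "h": (0, 0, +1, 0),
    "i": (0, 0, 0, -1),
}


def cases_reading(axis: int, step: int) -> list[str]:
    """Cases whose predecessor is shifted by `step` along `axis` (0=i, 1=j, 2=k, 3=l)."""
    return sorted(case for case, off in OFFSETS.items() if off[axis] == step)


if __name__ == "__main__":
    print("read from i + 1:", cases_reading(0, +1))
    print("read from k + 1:", cases_reading(2, +1))
```

Several cases read from i + 1 or k + 1 and none from i − 1 or k − 1, so cells with larger i and k have to be available first.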


Foldalign

Graphical Representation

J. Gorodkin and W. L. Ruzzo. RNA Sequence, Structure and Function: Computational and Bioinformatic Methods. Humana Press, 2014.


Foldalign

Foldalign algorithm.

1: for i = LengthI → 1 do
2:   for k = LengthK → 1 do
3:     for j = 1 → λ do
4:       for l = (j − δ) → (j + δ) do
5:         if Ms_{i,j,k,l} can be the right part of a bifurcation then
6:           Ml_{i,j,k,l} = Ms_{i,j,k,l}
7:         end if
8:         Expand Position(i, j, k, l)
9:         for Wm = 1 → (λ − j) do
10:          for Wn = (Wm − δ) → (Wm + δ) do
11:            Expand MB(i, j, k, l, Wm, Wn)
12:          end for
13:        end for
14:      end for
15:    end for
16:  end for
17: end for
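The loop nest above can be mirrored in a few lines of Python to see which index tuples it visits. This is only a sketch of the iteration order: the cell updates (Expand Position, Expand MB) are left out, the function names are invented, and skipping values of l and Wn below 1 is an assumption, since the slide does not say how out-of-range bounds are handled.

```python
from typing import Iterator, Tuple


def cells(len_i: int, len_k: int, lam: int, delta: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (i, j, k, l) in the order of algorithm lines 1-4."""
    for i in range(len_i, 0, -1):              # line 1: i = LengthI -> 1
        for k in range(len_k, 0, -1):          # line 2: k = LengthK -> 1
            for j in range(1, lam + 1):        # line 3: j = 1 -> lambda
                # line 4: l = (j - delta) -> (j + delta); l < 1 is skipped (assumption)
                for l in range(max(1, j - delta), j + delta + 1):
                    yield i, j, k, l


def bifurcation_windows(j: int, lam: int, delta: int) -> Iterator[Tuple[int, int]]:
    """Yield (Wm, Wn) in the order of algorithm lines 9-10."""
    for wm in range(1, lam - j + 1):           # line 9: Wm = 1 -> (lambda - j)
        # line 10: Wn = (Wm - delta) -> (Wm + delta); Wn < 1 is skipped (assumption)
        for wn in range(max(1, wm - delta), wm + delta + 1):
            yield wm, wn


if __name__ == "__main__":
    visited = list(cells(len_i=3, len_k=3, lam=4, delta=1))
    print(len(visited), "cells; first:", visited[0], "last:", visited[-1])
```

Each (i, k) pair owns an independent block of (j, l) work, which is the unit the multithreaded version distributes.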


Parallel Foldalign

Parallel algorithm.

The multithreaded algorithm is based on the fact that cells in the i loop can be calculated in parallel. Some data dependencies exist inside the loop, so conditions are added to avoid race conditions.

Nthreads threads are created, each one handling different i values:

Thread 1:

1: for i = LI → 1 step Nthreads do
2:   for k = LK → 1 do
3:     (...)
16:  end for
17: end for

Thread 2:

1: for i = (LI − 1) → 1 step Nthreads do
2:   for k = LK → 1 do
3:     (...)
16:  end for
17: end for

(...)
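The strided split of i values can be written down directly. In this small sketch the function name is invented and threads are numbered from 0, so "Thread 1" on the slide corresponds to thread_index 0.

```python
def rows_for_thread(thread_index: int, len_i: int, n_threads: int) -> range:
    """Rows (i values) handled by one thread; thread_index is 0-based here."""
    return range(len_i - thread_index, 0, -n_threads)


if __name__ == "__main__":
    for t in range(2):
        print(f"thread {t + 1}:", list(rows_for_thread(t, len_i=20, n_threads=2)))
```

Every i from 1 to LI is assigned to exactly one thread, and consecutive i values always land on different threads whenever more than one thread is used.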


Foldalign

Execution sample.

Single thread: the (i, k) iterations run one after another, with lines 3 to 16 of the algorithm executed inside each one.

Step 1: i = 20, k = 20
Step 2: i = 20, k = 19
Step 3: i = 20, k = ...
Step 4: i = 20, k = 1
Step 5: i = 19, k = 20
Step 6: i = 19, k = 19

Foldalign

Execution sample.

Two threads, each advancing i in steps of 2, with lines 3 to 16 of the algorithm executed inside each (i, k) iteration. Thread 2 shows "waiting" until thread 1 has moved on to k = 18.

Step  Thread 1            Thread 2
1     i = 20, k = 20      i = 19, waiting
2     i = 20, k = 19      i = 19, waiting
3     i = 20, k = 18      i = 19, k = 20
4     i = 20, k = (...)   i = 19, k = (...)
5     i = 18, k = 20      i = 17, waiting
6     i = 18, k = 19      i = 17, waiting
7     i = 18, k = 18      i = 17, k = 20
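The waiting behaviour in this sample can be imitated with a condition variable that tracks how far each row has progressed. The sketch below is not Foldalign's implementation: the actual conditions are not shown on the slides, so it assumes that row i only depends on row i + 1, and it uses a lag of two k steps because that is what the sample shows. All names are invented for the illustration.

```python
import threading


def run_wavefront(len_i: int, len_k: int, n_threads: int, lag: int = 2):
    """Visit every (i, k); row i runs step s only after row i + 1 finished s - 1 + lag steps."""
    done = [0] * (len_i + 2)       # done[i] = number of k steps finished in row i
    done[len_i + 1] = len_k        # sentinel so the first row never waits
    cond = threading.Condition()
    order = []                     # completion log, only touched while holding cond

    def worker(t: int) -> None:
        for i in range(len_i - t, 0, -n_threads):      # strided rows, as on the slides
            for step in range(1, len_k + 1):
                need = min(step - 1 + lag, len_k)
                with cond:
                    cond.wait_for(lambda: done[i + 1] >= need)
                k = len_k - step + 1
                # lines 3-16 of the algorithm would process (i, k) here
                with cond:
                    done[i] = step
                    order.append((i, k))
                    cond.notify_all()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return order


if __name__ == "__main__":
    log = run_wavefront(len_i=6, len_k=5, n_threads=2)
    position = {cell: n for n, cell in enumerate(log)}
    # every cell finishes after the cell with the same k in the row above it
    print(all(position[(i + 1, k)] < position[(i, k)] for (i, k) in log if i < 6))
```

With this scheme a thread never reads a cell of row i + 1 that has not been written yet, while the two rows still overlap for most of the k loop.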


Parallel Foldalign

Parallel algorithm.

Other mechanisms are necessary for the final implementation:

Thread pool: to avoid thread creation overhead, a fixed number of threads is created and each thread waits for an i value.

Protection: the global memory needs protection mechanisms to avoid simultaneous writes.

and other implementation details...
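The two mechanisms named above can be illustrated with Python's standard library. This is a generic sketch of the pattern, not Foldalign's code: a fixed set of worker threads blocks on a queue of i values, and a lock guards the shared result store. The names run_pool and work are invented, and the dependencies between rows are deliberately ignored here.

```python
import queue
import threading


def run_pool(i_values, n_threads, work):
    """Run work(i) for each i on a fixed set of threads; return {i: result}."""
    tasks = queue.Queue()
    results = {}                   # shared ("global") memory
    lock = threading.Lock()        # protects results against simultaneous writes

    def worker() -> None:
        while True:
            i = tasks.get()        # block until an i value is available
            if i is None:          # sentinel: no more work for this thread
                return
            value = work(i)
            with lock:
                results[i] = value

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for th in threads:
        th.start()
    for i in i_values:
        tasks.put(i)
    for _ in threads:              # one sentinel per worker
        tasks.put(None)
    for th in threads:
        th.join()
    return results


if __name__ == "__main__":
    squares = run_pool(range(20, 0, -1), n_threads=4, work=lambda i: i * i)
    print(len(squares), "rows processed")
```

The threads are created once and reused for every i value, so the creation cost is paid n_threads times rather than once per row.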


Results

Multithreaded Programming: Expectation vs. Reality


Results

Random Sequences Performance


Results

Real Sequences Performance

16S ribosomal RNA from the Gutell website, http://www.rna.icmb.utexas.edu/ (|S1| = 2315, |S2| = 2019, λ = 2019, δ = 100).


Conclusion

A multithreaded version of Foldalign is now available.

The implementation was carefully designed to keep all previous functions, and it opens new possibilities to search for structured RNAs in much longer sequences in reasonable time. In our experiments, it was possible to search 2.5-2.7 times faster with random sequences, and 4.18 times faster with 8 cores and 5.03 times faster with 14 cores.
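The reported speedups can be put in perspective with two standard quantities. The sketch below computes the parallel efficiency (speedup divided by core count) and, purely as an assumption, the serial fraction that Amdahl's law would imply for the same figures. The talk itself does not apply Amdahl's law, and the function names are invented.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Speedup divided by core count; 1.0 would be perfect scaling."""
    return speedup / cores


def amdahl_serial_fraction(speedup: float, cores: int) -> float:
    """Serial fraction f giving `speedup` on `cores` under Amdahl's law,
    S = 1 / (f + (1 - f) / N), solved for f."""
    return (cores / speedup - 1) / (cores - 1)


if __name__ == "__main__":
    for speedup, cores in [(4.18, 8), (5.03, 14)]:   # figures from the conclusion
        print(cores, "cores:",
              f"efficiency {parallel_efficiency(speedup, cores):.2f},",
              f"implied serial fraction {amdahl_serial_fraction(speedup, cores):.2f}")
```

Efficiency drops from roughly 0.52 at 8 cores to roughly 0.36 at 14 cores, so the additional cores contribute progressively less.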


Acknowledgements


Thank you!

Questions?
