multiply aligning rna sequences
DESCRIPTION
Multiply Aligning RNA Sequences. RNA, Phylogeny, SAR, Re-Sequencing. Cédric Notredame, Comparative Bioinformatics Group, Bioinformatics and Genomics Program. Open Questions in Multiple Sequence Alignments. Aligning Protein Sequences. Aligning RNA Sequences.
TRANSCRIPT
![Page 1: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/1.jpg)
Multiply Aligning RNA Sequences
- RNA
- Phylogeny
- SAR
- Re-Sequencing
Cédric Notredame
Comparative Bioinformatics Group
Bioinformatics and Genomics Program
![Page 2: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/2.jpg)
Open Questions in Multiple Sequence Alignments
Aligning Protein Sequences
Aligning RNA Sequences
![Page 3: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/3.jpg)
Accurately Aligning Protein Sequences
Remains challenging with sequences of less than 20% identity
These sequences can be structural homologues
Correct alignments can help discover functional sites
Expresso/3D-Coffee is currently the most accurate way of combining sequence and structural information
Available on www.tcoffee.org
![Page 4: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/4.jpg)
Comparing ncRNAs
![Page 5: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/5.jpg)
ncRNAs Comparison
And ENCODE said… “nearly the entire genome may be represented in primary transcripts that extensively overlap and include many non-protein-coding regions”
Who Are They?
– tRNA, rRNA, snoRNAs
– microRNAs, siRNAs
– piRNAs
– long ncRNAs (Xist, Evf, Air, CTN, PINK…)
How Many of Them?
– Open question
– 30,000 is a common guess
– Harder to detect than proteins
![Page 6: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/6.jpg)
Detecting ncRNAs in silico: a long way to go…
RNase P (Not in ENCODE)
![Page 7: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/7.jpg)
Lizard        ---GG--TGGAGACTAGTCTGAATTGGGTTATGAAG--CCA--
Rat           GGCGG--GGGAGAGTAGTCTGAATTGGGTTATGAGG--CCC--
Hedgehog      GACGG--GGGAGAGTAGTCTGAATTAGGTTATGGGG--CCC--
Shrew         GACGG-CGGGAGAGTAGTCTGAATTGGGTTATGAGG--CCC--
Medaka        GTGAG--TGGAGAGTAGTCTGAATTGGGT---------TCT--
X.tropicalis  AGCGG-CGGGAGAGTAGTCTGACTTGGGTTATGAGG--TGC--
Cat           GACGG--GGGAGAGTAGTCTGAATTGGGTTATGAGGCCCCC--
Dog           -------------------------------------------
Rhesus        GGCGG--GGGAGAGTAGTCTGAATTGGGTTATGAGG--TCC--
Mouse         GGCGG--GGGAGAGTAGTCTGAATTGGGTTATGAGG--CCC--
Chimp         GGCGG--AGGAGAGTAGTCTGAATTGGGTTATGAGG--TCC--
Human         GGCGG--AGGAGAGTAGTCTGAATTGGGTTATGAGG--TCC--
TreeShrew     GCGCG--GGGAGAGTAGTCTGAATTGGGTTATGAGG--CCC--
[Figure labels: alignment (UCSC or RFAM); structure (RNAalifold prediction or RFAM); search (CMsearch); genome]
![Page 8: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/8.jpg)
Results for RNase P

Mammalian alignment
Alignment   Structure   Results
UCSC        Predicted   Nothing
RFAM        Predicted   Nothing
UCSC        RFAM        Nothing
RFAM        RFAM        OK

Vertebrate alignment
Alignment   Structure   Results
UCSC        Predicted   Nothing
RFAM        Predicted   Nothing
UCSC        RFAM        OK
RFAM        RFAM        OK

Matthias Zytneki
![Page 9: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/9.jpg)
Results for RNase P
Better Alignments = Better Predictions
Matthias Zytneki, Thomas Derrien, Roderic Guigo, Ramin Shiekhattar
Qualitative Improvement
Quantitative Improvement
![Page 10: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/10.jpg)
ncRNAs can have different sequences and Similar Structures
![Page 11: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/11.jpg)
ncRNAs Can Evolve Rapidly
CCAGGCAAGACGGGACGAGAGTTGCCTGG
CCTCCGTTCAGAGGTGCATAGAACGGAGG
**-------*--**---*-**------**
[Figure: each of the two sequences drawn as a hairpin (stem and loop)]
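The conservation line on this slide can be recomputed from the two sequences. The sketch below is illustrative only: the function names are hypothetical, and the stem length of 8 is an assumption chosen because both sequences pair perfectly over their first and last 8 bases.

```python
def conservation_line(a: str, b: str) -> str:
    """Return '*' where two equal-length sequences match and '-' elsewhere."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return "".join("*" if x == y else "-" for x, y in zip(a, b))


def stem_is_complementary(seq: str, stem_len: int) -> bool:
    """Check Watson-Crick pairing between the first and last stem_len bases."""
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    return all((seq[i], seq[-1 - i]) in pairs for i in range(stem_len))


# The two sequences shown on the slide.
SEQ1 = "CCAGGCAAGACGGGACGAGAGTTGCCTGG"
SEQ2 = "CCTCCGTTCAGAGGTGCATAGAACGGAGG"

line = conservation_line(SEQ1, SEQ2)
identity = line.count("*") / len(line)  # fraction of identical positions

print(line)
print(f"identity: {identity:.0%}")
# STEM_LEN = 8 is an assumption made for this illustration.
STEM_LEN = 8
print(stem_is_complementary(SEQ1, STEM_LEN), stem_is_complementary(SEQ2, STEM_LEN))
```

Only 10 of the 29 positions are identical, yet both sequences can close the same kind of stem, which is the point the slide makes about compensated mutations.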
![Page 12: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/12.jpg)
ncRNAs are Difficult to Align
Same structure, low sequence identity
Small alphabet, short sequences: alignments are often non-significant
![Page 13: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/13.jpg)
Obtaining the Structure of a ncRNA is difficult
Hard to Align The Sequences Without the Structure
Hard to Predict the Structures Without an Alignment
![Page 14: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/14.jpg)
The Holy Grail of RNA Comparison: Sankoff’s Algorithm
![Page 15: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/15.jpg)
The Holy Grail of RNA Comparison: Sankoff’s Algorithm
Simultaneous Folding and Alignment
– Time Complexity: $O(L^{2n})$
– Space Complexity: $O(L^{3n})$
In Practice, for Two Sequences:
– 50 nucleotides: 1 min., 6 M.
– 100 nucleotides: 16 min., 256 M.
– 200 nucleotides: 4 hours, 4 G.
– 400 nucleotides: 3 days, 3 T.
Forget about
– Multiple sequence alignments
– Database searches
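The running times quoted above grow by a factor of 16 each time the length doubles, which is what a fourth-power law (2n with n = 2 sequences) predicts. The sketch below extrapolates from the 50-nucleotide figure under that assumption; the function name and the exact power-law form are illustrative, and the memory figures are not modelled.

```python
BASE_LENGTH = 50     # nucleotides, taken from the slide
BASE_MINUTES = 1.0   # running time at 50 nucleotides, taken from the slide


def estimated_minutes(length: int, exponent: int = 4) -> float:
    """Extrapolate running time assuming time grows as length ** exponent."""
    return BASE_MINUTES * (length / BASE_LENGTH) ** exponent


for length in (50, 100, 200, 400):
    minutes = estimated_minutes(length)
    print(f"{length:4d} nt: {minutes:6.0f} min = {minutes / 60:5.1f} h = {minutes / 1440:4.2f} days")
```

Under this assumption 200 nucleotides take 256 minutes (a little over 4 hours) and 400 nucleotides take 4096 minutes (about 2.8 days), in line with the figures on the slide.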
![Page 16: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/16.jpg)
The next best Thing: Consan
Consan = Sankoff + a few constraints
Use of Stochastic Context Free Grammars
– Tree-shaped HMMs
– Made sparse with constraints
The constraints are derived from the most confident positions of the alignment
Equivalent of Banded DP
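For readers unfamiliar with banded dynamic programming, here is a minimal sketch of the general idea on plain sequence alignment. It is not Consan: Consan constrains a stochastic context-free grammar, whereas this example only restricts a Needleman-Wunsch score matrix to cells within a fixed distance of the diagonal. The function name and scoring values are assumptions made for illustration.

```python
def banded_alignment_score(a, b, band, match=1, mismatch=-1, gap=-1):
    """Global alignment score using only cells with |i - j| <= band."""
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the end of both sequences")
    # For clarity the full matrix is allocated; a real implementation
    # would store only the band to save memory as well as time.
    score = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0 and j > 0:
                step = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, score[i - 1][j - 1] + step)
            if i > 0:
                best = max(best, score[i - 1][j] + gap)  # gap in b
            if j > 0:
                best = max(best, score[i][j - 1] + gap)  # gap in a
            score[i][j] = best
    return score[n][m]


print(banded_alignment_score("GATTACA", "GATTACA", band=1))
print(banded_alignment_score("ACGT", "AGT", band=1))
```

Cells outside the band keep a score of minus infinity, so the work per row is proportional to the band width rather than to the full sequence length.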
![Page 17: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/17.jpg)
Going Multiple….
Structural Aligners
![Page 18: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/18.jpg)
Game Rules
Using Structural Predictions
– Produces better alignments
– Is computationally expensive
Use as much structural information as possible while doing as little computation as possible…
![Page 19: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/19.jpg)
Adapting T-Coffee To
RNA Alignments
![Page 20: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/20.jpg)
T-Coffee and Consistency…
![Page 21: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/21.jpg)
T-Coffee and Consistency…
![Page 22: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/22.jpg)
T-Coffee and Consistency…
![Page 23: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/23.jpg)
T-Coffee and Consistency…
![Page 24: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/24.jpg)
Consistency: Conflicts and Information
[Figure: alignments among X, Y, Z and W]
Partly Consistent: Less Reliable
Fully Consistent: More Reliable
Y-Z is unhappy; X-W is unhappy
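The idea of rewarding pairs that are supported by a third sequence can be sketched as follows. This is a simplified illustration and not the T-Coffee code: it only re-weights pairs already present in a primary library, the data structures are hypothetical, and the weights are invented for the example. Each residue is written as a (sequence name, position) tuple.

```python
from collections import defaultdict


def extend_library(primary):
    """Add to each pair's weight the support it receives through third residues.

    primary maps (residue_1, residue_2) -> weight. For every residue r3 aligned
    to both members of a pair, the pair gains min(w(r1, r3), w(r3, r2)).
    """
    neighbours = defaultdict(dict)
    for (r1, r2), weight in primary.items():
        neighbours[r1][r2] = weight
        neighbours[r2][r1] = weight
    extended = {}
    for (r1, r2), weight in primary.items():
        total = weight
        for r3, w13 in neighbours[r1].items():
            if r3 != r2 and r3 in neighbours[r2]:
                total += min(w13, neighbours[r2][r3])
        extended[(r1, r2)] = total
    return extended


# Invented weights: X1-Y1 is supported through Z1, while W1 disagrees
# (it links X1 to Y2 instead of Y1) and therefore adds nothing.
primary = {
    (("X", 1), ("Y", 1)): 80,
    (("X", 1), ("Z", 1)): 90,
    (("Z", 1), ("Y", 1)): 70,
    (("X", 1), ("W", 1)): 60,
    (("W", 1), ("Y", 2)): 50,
}
extended = extend_library(primary)
print(extended[(("X", 1), ("Y", 1))])
```

A pair confirmed by other sequences ends up with a higher weight than an isolated or contradicted pair, which is how consistency separates reliable from unreliable positions.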
![Page 25: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/25.jpg)
R-Coffee: Modifying T-Coffee at the Right Place
Incorporation of Secondary Structure information within the Library
Two Extra Components for the T-Coffee Scoring Scheme
– A new Library
– A new Scoring Scheme
![Page 26: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/26.jpg)
[Figure: R-Coffee pipeline]
RNA Sequences
RNAplfold: Secondary Structures
Consan or Mafft / Muscle / ProbCons: Primary Library
R-Coffee Extension: R-Coffee Extended Primary Library
R-Score: Progressive Alignment Using The R-Score
![Page 27: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/27.jpg)
R-Coffee Extension
[Figure: a C-C column base-paired with a G-G column]
TC Library
– G G Score X
– C C Score Y
Goal: Embedding RNA Structures Within The T-Coffee Libraries
The R-extension can be added on top of any existing method.
![Page 28: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/28.jpg)
R-Coffee Scoring Scheme
[Figure: a C-C column base-paired with a G-G column]
R-Score(CC) = MAX(TC-Score(CC), TC-Score(GG))
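The formula above can be read as: the score of an aligned pair is the larger of its own T-Coffee score and the T-Coffee score of the pair formed by the base-paired partners. A minimal sketch of that rule follows; the dictionaries, the function name and the numbers are hypothetical and are not taken from the R-Coffee implementation.

```python
def r_score(tc_score, pair, partner_a, partner_b):
    """Return max(TC-Score(pair), TC-Score(partner pair)) when both residues are paired.

    tc_score  maps (pos_in_a, pos_in_b) -> T-Coffee score (missing pairs count as 0).
    partner_a maps a position in sequence A to its base-paired position (if any).
    partner_b does the same for sequence B.
    """
    i, j = pair
    direct = tc_score.get((i, j), 0)
    if i in partner_a and j in partner_b:
        partner_pair = (partner_a[i], partner_b[j])
        return max(direct, tc_score.get(partner_pair, 0))
    return direct  # unpaired residues keep their plain T-Coffee score


# Invented example: positions 3 and 10 are base-paired in both sequences.
tc_score = {(3, 3): 20, (10, 10): 75, (5, 5): 40}
partner_a = {3: 10, 10: 3}
partner_b = {3: 10, 10: 3}

print(r_score(tc_score, (3, 3), partner_a, partner_b))
print(r_score(tc_score, (5, 5), partner_a, partner_b))
```

In this example the weakly supported pair (3, 3) inherits the stronger score of its partner pair (10, 10), while the unpaired position keeps its own score.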
![Page 29: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/29.jpg)
Validating R-Coffee
![Page 30: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/30.jpg)
RNA Alignments are harder to validate than Protein Alignments
Protein Alignments: use of structure-based reference alignments
RNA Alignments: no real structure-based reference alignments
– The structures are mostly predicted from sequences
– Circularity
![Page 31: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/31.jpg)
BraliBase and the BraliScore
Database of Reference Alignments
388 multiple sequence alignments.
Evenly distributed between 35 and 95 percent average sequence identity
Contain 5 sequences selected from the RNA family database Rfam
The reference alignment is based on a SCFG model based on the full Rfam seed dataset (~100 sequences).
![Page 32: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/32.jpg)
BraliBase SPS Score
RFam MSA
$\mathrm{SPS} = \dfrac{\text{Number of Identically Aligned Pairs}}{\text{Number of Aligned Pairs}}$
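A sum-of-pairs score of this kind can be computed by listing the residue pairs that share a column in each alignment. The sketch below assumes that the denominator counts the aligned pairs of the reference alignment and that alignments are given as dictionaries of equal-length gapped strings; both are assumptions, and the function names are hypothetical.

```python
def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    alignment maps sequence name -> gapped string ('-' for gaps).
    Residues are identified as (sequence name, index in the ungapped sequence).
    """
    names = sorted(alignment)
    counters = {name: 0 for name in names}
    pairs = set()
    for col in range(len(alignment[names[0]])):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, counters[name]))
                counters[name] += 1
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add((present[a], present[b]))
    return pairs


def sps(test, reference):
    """Fraction of the reference's aligned pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned pairs")
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


reference = {"s1": "AC-GU", "s2": "ACAGU"}
test = {"s1": "ACG-U", "s2": "ACAGU"}
print(sps(test, reference))
```

Here the test alignment reproduces three of the four reference pairs, so its score is 0.75.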
![Page 33: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/33.jpg)
BraliBase: SCI Score
RNApfold
(((…)))…((..)) G Seq1
(((…)))…((..)) G Seq2
(((…)))…((..)) G Seq3
(((…)))…((..)) G Seq4
(((…)))…((..)) G Seq5
(((…)))…((..)) G Seq6
RNAalifold
(((…)))…((..)) G ALN
[Formula: SCI, computed from G ALN, Average G Seq and Cov (Covariance)]
![Page 34: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/34.jpg)
BRaliScore
Braliscore= SCI*SPS
![Page 35: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/35.jpg)
R-Coffee + Regular Aligners
Method          Avg Braliscore           Net Improv.
                direct   +T     +R       +T     +R
-----------------------------------------------------------
Poa             0.62     0.65   0.70     48     154
Pcma            0.62     0.64   0.67     34     120
Prrn            0.64     0.61   0.66     -63    45
ClustalW        0.65     0.65   0.69     -7     83
Mafft_fftnts    0.68     0.68   0.72     17     68
ProbConsRNA     0.69     0.67   0.71     -49    39
Muscle          0.69     0.69   0.73     -17    42
Mafft_ginsi     0.70     0.68   0.72     -49    39
-----------------------------------------------------------
Net Improvement = # R-Coffee wins - # R-Coffee losses
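The net improvement is a simple tally over test sets. The sketch below shows the counting on invented per-dataset scores; the numbers are made up purely to illustrate the arithmetic and are not results from the benchmark, and the function name is hypothetical.

```python
def net_improvement(baseline_scores, rcoffee_scores):
    """Number of datasets where R-Coffee scores higher minus where it scores lower."""
    wins = sum(r > b for b, r in zip(baseline_scores, rcoffee_scores))
    losses = sum(r < b for b, r in zip(baseline_scores, rcoffee_scores))
    return wins - losses  # ties contribute nothing


# Invented scores for four datasets (illustration only).
baseline = [0.60, 0.70, 0.65, 0.80]
with_rcoffee = [0.66, 0.70, 0.70, 0.75]
print(net_improvement(baseline, with_rcoffee))
```

With two wins, one loss and one tie, the net improvement in this toy example is 1.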
![Page 36: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/36.jpg)
RM-Coffee + Regular Aligners
Method          Avg Braliscore           Net Improv.
                direct   +T     +R       +T     +R
-----------------------------------------------------------
Poa             0.62     0.65   0.70     48     154
Pcma            0.62     0.64   0.67     34     120
Prrn            0.64     0.61   0.66     -63    45
ClustalW        0.65     0.65   0.69     -7     83
Mafft_fftnts    0.68     0.68   0.72     17     68
ProbConsRNA     0.69     0.67   0.71     -49    39
Muscle          0.69     0.69   0.73     -17    42
Mafft_ginsi     0.70     0.68   0.72     -49    39
-----------------------------------------------------------
RM-Coffee4      0.71     /      0.74     /      84
![Page 37: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/37.jpg)
R-Coffee + Structural Aligners
Method          Avg Braliscore           Net Improv.
                direct   +T     +R       +T     +R
-----------------------------------------------------------
Stemloc         0.62     0.75   0.76     104    113
Mlocarna        0.66     0.69   0.71     101    133
Murlet          0.73     0.70   0.72     -132   -73
Pmcomp          0.73     0.73   0.73     142    145
T-Lara          0.74     0.74   0.69     -36    -8
Foldalign       0.75     0.77   0.77     72     73
-----------------------------------------------------------
Dynalign        ---      0.63   0.62     ---    ---
Consan          ---      0.79   0.79     ---    ---
-----------------------------------------------------------
RM-Coffee4      0.71     /      0.74     /      84
![Page 38: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/38.jpg)
How Best is the Best….
Method         vs. R-Coffee-Consan    vs. RM-Coffee4
M-Locarna      234 ***                183 **
Stral          169 ***                62
FoldalignM     146                    61
Murlet         130 *                  -12
Rnasampler     129 *                  -27
T-Lara         125 *                  -30
Poa            241 ***                217 ***
T-Coffee       241 ***                199 ***
Prrn           232 ***                198 ***
Pcma           218 ***                151 ***
Proalign       216 ***                150 **
Mafft fftns    206 ***                148 *
ClustalW       203 ***                136 ***
Probcons       192 ***                128 *
Mafft ginsi    170 ***                115
Muscle         169 ***                111
![Page 39: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/39.jpg)
Range of Performances
Effect of Compensated Mutations
![Page 40: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/40.jpg)
Split Alignments and RNA
Few of the new long RNAs are reported with a secondary structure
Two explanations
– They do not have a secondary structure
– It is hard to predict the structure
To predict the structure
– One needs homologues to build an MSA
To find homologues one needs to find them
![Page 41: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/41.jpg)
Split Alignments and RNA
- Protein Split Alignments
- Guided by Primary structure
Transcript
genome
![Page 42: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/42.jpg)
Split Alignments and RNA
CCAGGCAAGACGGGACGAGAGTTGCCTGG
CCTCCGTTC AGAGGTGCATA GAACGGAGG
![Page 43: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/43.jpg)
Split Alignments and RNA
Homology appears through secondary structures
One needs to evaluate all possible secondary structures
Very computationally intensive
![Page 44: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/44.jpg)
Conclusion/Future Directions
T-Coffee/Consan is currently the best MSA protocol for ncRNAs
Testing how important the accuracy of the secondary structure prediction is
Going deeper into Sankoff’s territory: predicting and aligning simultaneously
Solving the split alignment problem
![Page 45: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/45.jpg)
www.tcoffee.org
Credits and Web Servers
Andreas Wilm (UCD)
Des Higgins (UCD)
Sebastien Moretti (SIB)
Ioannis Xenarios (SIB)
Matthias Zytneki (CRG)
Thomas Derrien (CRG)
Roderic Guigo (CRG)
Ramin Shiekhattar (CRG)
CRG, SIB, UCD
![Page 46: Multiply Aligning RNA Sequences](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813ff4550346895dab10ca/html5/thumbnails/46.jpg)