RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions
MCMC sampling of RNA structures with pseudoknots
Dirk Metzler
Johann Wolfgang Goethe-Universität Frankfurt am Main
Fachbereich Informatik und Mathematik
joint work with Markus Nebel, Universität Kaiserslautern
Outline
1 RNA folding with Stochastic Context-Free Grammars
  Basic Model
  Dynamic Programming
2 RNA folding with Pseudoknots
  Pseudoknots
  Combine SCFG with Pseudoknots
  Bayesian Sampling of RNA structures with pseudoknots
3 Performance on Data
  tmRNA
  Simulated Data
tmRNA of Escherichia Coli
GGGGCUGAUUCUGGAUUCGACGGGAUUUGCGAAACCCAAGGUGCAUGCCGAGGGGCGGUUGGCCUCGUAAAAAGCCGCAAAAAAUAGUCGCAAACGACGAAAACUACGCUUUAGCAGCUUAAUAACCUGCUUAGAGCCCUCUCUCCCUAGCCUCCGCUCUUAGGACGGGGAUCAAGAGAGGUCAAACCCAAAAGAGAUCGCGUGGAAGCCCUGCCUGGGGUUGAAGCGUUAAAACUUAAUCAGGCUAGUUUGUUAGUGGCGUGUCCGUCCGCAGCUGGCAAGCGAAUGUAAAGACUGACUAAGCAUGUAGUACCGAGGAUGUAGGAAUUUCGGACGCGGGUUCAACUCCCGCCAGCUCCACCA
[Figure: secondary-structure drawing of the sequence above]
RNAfold: implementation of the Zuker algorithm in the Vienna RNA Package
Zuker (1989); Zuker et al. (1999)
Stochastic Context-Free Grammar (SCFG)
Terminal symbols: A, C, G, U
Non-terminal symbols: S, L, F
Rules with probabilities:
S → LS | L
F → xLSy | xFy
L → x | axFyb
with x, y, a, b ∈ {A, C, G, U}
forbidden: xFy → xLy
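As a minimal sketch of how such a grammar generates a sequence together with its structure, the following Python code expands the start symbol S recursively and returns the sequence with a dot-bracket string. The rule probabilities, base-pair frequencies and base frequencies are illustrative assumptions (they mirror the parameter values quoted later in these slides, with the pseudoknot symbol Q left out), and the function names are hypothetical, not taken from any published software.

```python
import random

# Illustrative values (assumptions for this sketch, pseudoknot symbol Q left out).
P_S_LS = 0.85    # S -> LS, otherwise S -> L
P_F_XFY = 0.75   # F -> xFy, otherwise F -> xLSy
P_L_X = 0.94     # L -> x, otherwise L -> axFyb
PAIRS = {"CG": 0.275, "GC": 0.275, "AU": 0.175, "UA": 0.175, "GU": 0.05, "UG": 0.05}
BASES = {"A": 0.35, "C": 0.20, "G": 0.20, "U": 0.25}


def draw(rng, table):
    """Draw a key of `table` with probability proportional to its value."""
    keys = list(table)
    return rng.choices(keys, weights=[table[k] for k in keys])[0]


def gen_S(rng):
    """Expand S into a chain of L symbols; return (sequence, dot-bracket)."""
    seq, struct = gen_L(rng)
    while rng.random() < P_S_LS:              # S -> LS: one more L in the chain
        more_seq, more_struct = gen_L(rng)
        seq, struct = seq + more_seq, struct + more_struct
    return seq, struct                        # S -> L ends the chain


def gen_L(rng):
    """Expand L into an unpaired base or into the first two pairs of a stem."""
    if rng.random() < P_L_X:                  # L -> x
        return draw(rng, BASES), "."
    outer, inner = draw(rng, PAIRS), draw(rng, PAIRS)   # L -> axFyb
    seq, struct = gen_F(rng)
    return outer[0] + inner[0] + seq + inner[1] + outer[1], "((" + struct + "))"


def gen_F(rng):
    """Expand F: extend the stem (F -> xFy) or open a loop (F -> xLSy)."""
    pair = draw(rng, PAIRS)
    if rng.random() < P_F_XFY:
        seq, struct = gen_F(rng)
    else:
        l_seq, l_struct = gen_L(rng)
        s_seq, s_struct = gen_S(rng)
        seq, struct = l_seq + s_seq, l_struct + s_struct
    return pair[0] + seq + pair[1], "(" + struct + ")"


if __name__ == "__main__":
    sequence, structure = gen_S(random.Random(1))
    print(sequence)
    print(structure)
```

Because F can only end with F → xLSy, every loop enclosed by a stem contains at least two L symbols, which is the effect of the "forbidden" rule on the slide.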
Generating RNA structure from SCFG
[Animation: step-by-step generation of a structure from the start symbol S. Repeated use of S → LS produces a chain of L symbols; each L becomes either an unpaired base (L → x) or the outer two base pairs of a stem (L → axFyb); stems are extended by F → xFy and branch into loops by F → xLSy, whose L and S symbols are expanded in the same way until only bases remain.]
Computing the Probability of a sequence
For a given sequence S = (s_1, ..., s_n) ∈ {a,c,g,u}^n and given probabilities of the grammar rules
S → LS | L
F → xLSy | xFy
L → x | axFyb
compute the probability that the start symbol S is transformed into the sequence S.
Computing the Probability of a sequence: partial problems
\[ \Phi_{ij}(X) := \Pr(X \text{ is transformed into } s_i, \dots, s_j) \]
Aim: compute \Phi_{1n}(S)!
\[
\Phi_{ij}(F) = \Phi_{i+1,j-1}(F) \cdot \Pr(F \to xFy) \cdot \pi_{s_i s_j}
  + \sum_{k} \pi_{s_i s_j} \cdot \Pr(F \to xLSy) \cdot \Phi_{i+1,k}(L) \cdot \Phi_{k+1,j-1}(S)
\]
[Figure: F spanning s_i ... s_j is decomposed either into the pair (s_i, s_j) around an inner F spanning s_{i+1} ... s_{j-1}, or into the pair (s_i, s_j) around L spanning s_{i+1} ... s_k and S spanning s_{k+1} ... s_{j-1}]
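The recursion for Φij(F) can be turned into a bottom-up dynamic program over increasing span lengths. The sketch below is one possible way to do this in Python; the recursions for S and L are not spelled out on the slide and are written here by analogy from the grammar rules, the parameter values are illustrative assumptions (pseudoknot symbol Q left out), and the function name `inside` is hypothetical.

```python
# Illustrative values (assumptions for this sketch, pseudoknot symbol Q left out).
P = {"S->LS": 0.85, "S->L": 0.15, "F->xFy": 0.75, "F->xLSy": 0.25,
     "L->x": 0.94, "L->axFyb": 0.06}
PAIR = {"CG": 0.275, "GC": 0.275, "AU": 0.175, "UA": 0.175, "GU": 0.05, "UG": 0.05}
BASE = {"A": 0.35, "C": 0.20, "G": 0.20, "U": 0.25}


def inside(seq):
    """Return phi with phi[X][i][j] = Pr(X is transformed into seq[i..j]), 0-based, inclusive."""
    n = len(seq)
    phi = {X: [[0.0] * n for _ in range(n)] for X in "SLF"}

    def pi2(i, j):
        # Probability of emitting the pair (seq[i], seq[j]); 0 for non-pairing bases.
        return PAIR.get(seq[i] + seq[j], 0.0)

    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # F needs at least 4 positions: x, L, S, y.
            if length >= 4:
                branch = sum(phi["L"][i + 1][k] * phi["S"][k + 1][j - 1]
                             for k in range(i + 1, j - 1))
                phi["F"][i][j] = pi2(i, j) * (P["F->xFy"] * phi["F"][i + 1][j - 1]
                                              + P["F->xLSy"] * branch)
            # L is a single unpaired base or two pairs around an F (at least 8 positions).
            single = P["L->x"] * BASE[seq[i]] if length == 1 else 0.0
            stem = (P["L->axFyb"] * pi2(i, j) * pi2(i + 1, j - 1) * phi["F"][i + 2][j - 2]
                    if length >= 8 else 0.0)
            phi["L"][i][j] = single + stem
            # S is a single L or an L followed by a shorter S.
            split = sum(phi["L"][i][k] * phi["S"][k + 1][j] for k in range(i, j))
            phi["S"][i][j] = P["S->L"] * phi["L"][i][j] + P["S->LS"] * split
    return phi


if __name__ == "__main__":
    rna = "GGGAACCC"
    print(inside(rna)["S"][0][len(rna) - 1])   # Phi_{1n}(S) for this toy sequence
```

The three nested loops (span length, start position, split point k) give a running time cubic in the sequence length, as is usual for inside algorithms.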
Sampling a Structure from Posterior in SCFG model
Sample each step of the parse tree according to its contribution to Φij(X):
Split the start symbol S over s_1 ... s_n into L over s_1 ... s_k and S over s_{k+1} ... s_n with probability
\[ \frac{\Pr(S \to LS) \cdot \Phi_{1k}(L) \cdot \Phi_{k+1,n}(S)}{\Phi_{1n}(S)} \]
Expand this L into a stem, L → axFyb, with probability
\[ \frac{\Pr(L \to axFyb) \cdot \Phi_{3,k-2}(F) \cdot \pi_{s_1 s_k} \, \pi_{s_2 s_{k-1}}}{\Phi_{1k}(L)} \]
Split the remaining S over s_{k+1} ... s_n into L over s_{k+1} ... s_h and S over s_{h+1} ... s_n with probability
\[ \frac{\Pr(S \to LS) \cdot \Phi_{k+1,h}(L) \cdot \Phi_{h+1,n}(S)}{\Phi_{k+1,n}(S)} \]
Continuing in this way yields a complete parse tree over s_1 ... s_n.
[Figure: parse tree with S, L and F nodes above the sequence s_1, s_2, ..., s_n]
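The sampling scheme can be sketched as a stochastic traceback through the table of Φ values: at each non-terminal, every applicable rule and split point is chosen with probability proportional to its contribution to Φij(X). The Python code below is a sketch under stated assumptions, not the authors' implementation: the parameter values are illustrative (pseudoknot symbol Q left out), the recursions for S and L are written by analogy from the grammar rules, and all function names are hypothetical.

```python
import random

# Illustrative values (assumptions for this sketch, pseudoknot symbol Q left out).
P = {"S->LS": 0.85, "S->L": 0.15, "F->xFy": 0.75, "F->xLSy": 0.25,
     "L->x": 0.94, "L->axFyb": 0.06}
PAIR = {"CG": 0.275, "GC": 0.275, "AU": 0.175, "UA": 0.175, "GU": 0.05, "UG": 0.05}
BASE = {"A": 0.35, "C": 0.20, "G": 0.20, "U": 0.25}


def inside(seq):
    """phi[X][i][j] = Pr(X is transformed into seq[i..j]), 0-based, inclusive."""
    n = len(seq)
    phi = {X: [[0.0] * n for _ in range(n)] for X in "SLF"}
    pi2 = lambda i, j: PAIR.get(seq[i] + seq[j], 0.0)
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if length >= 4:
                branch = sum(phi["L"][i + 1][k] * phi["S"][k + 1][j - 1]
                             for k in range(i + 1, j - 1))
                phi["F"][i][j] = pi2(i, j) * (P["F->xFy"] * phi["F"][i + 1][j - 1]
                                              + P["F->xLSy"] * branch)
            single = P["L->x"] * BASE[seq[i]] if length == 1 else 0.0
            stem = (P["L->axFyb"] * pi2(i, j) * pi2(i + 1, j - 1) * phi["F"][i + 2][j - 2]
                    if length >= 8 else 0.0)
            phi["L"][i][j] = single + stem
            split = sum(phi["L"][i][k] * phi["S"][k + 1][j] for k in range(i, j))
            phi["S"][i][j] = P["S->L"] * phi["L"][i][j] + P["S->LS"] * split
    return phi


def pick(rng, options):
    """Draw one value from (weight, value) pairs, proportionally to the weights."""
    return rng.choices([v for _, v in options], weights=[w for w, _ in options])[0]


def sample_pairs(seq, rng):
    """Sample the base pairs (0-based) of one structure for a non-empty `seq`."""
    n, phi = len(seq), inside(seq)
    pairs, todo = [], [("S", 0, n - 1)]
    while todo:
        X, i, j = todo.pop()
        if X == "L":
            if i < j:                        # only L -> axFyb can span several bases
                pairs += [(i, j), (i + 1, j - 1)]
                todo.append(("F", i + 2, j - 2))
            continue                         # i == j: L -> x, an unpaired base
        if X == "S":
            options = [(P["S->L"] * phi["L"][i][j], [("L", i, j)])]
            options += [(P["S->LS"] * phi["L"][i][k] * phi["S"][k + 1][j],
                         [("L", i, k), ("S", k + 1, j)]) for k in range(i, j)]
        else:                                # X == "F": seq[i] pairs with seq[j]
            pairs.append((i, j))
            options = [(P["F->xFy"] * phi["F"][i + 1][j - 1], [("F", i + 1, j - 1)])]
            options += [(P["F->xLSy"] * phi["L"][i + 1][k] * phi["S"][k + 1][j - 1],
                         [("L", i + 1, k), ("S", k + 1, j - 1)])
                        for k in range(i + 1, j - 1)]
        todo.extend(pick(rng, options))
    return sorted(pairs)


if __name__ == "__main__":
    print(sample_pairs("GGGAACCC", random.Random(1)))
```

Options with weight zero are never drawn, so the traceback only visits spans that the corresponding non-terminal can actually generate. Repeating the call yields independent samples from the posterior over pseudoknot-free structures under these assumed parameters.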
Pseudoknots
Structure of Escherichia Coli tmRNA
Picture taken from the tmRNA website: http://www.indiana.edu/~tmrna/
[Figure: structure of the E. coli tmRNA as estimated by RNAfold, next to the structure given on the tmRNA website]
[Figure: comparison over sequence positions 0 to 300 of the base pairs given on the tmRNA webpage with those estimated by RNAfold]
Models with restricted types of pseudoknots
L. Cai, R. L. Malmberg and Y. Wu (2003) Stochastic modeling of RNA pseudoknotted structures: a grammatical approach. Bioinformatics 19:i66-i73.
E. Rivas and S. R. Eddy (1999) A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol. 285:2053-2068.
J. Reeder and R. Giegerich (2004) Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics 5:104.
... and several others
restrict the way pseudoknots may intersect
compute THE BEST structure for a given RNA sequence
Aims for our Pseudoknot Grammar / Method
a priori no restrictions on pseudoknot interactions
sampling structures from posterior distributions
efficient if a high number of pseudoknots is unlikely
a prior distribution rather than a biologically meaningful model
simplicity!
Combine SCFG with pseudoknots
S → LS | L
F → xLSy | xFy
L → x | axFyb | Q
1 The SCFG generates an RNA sequence with Q-symbols
2 Q-symbols are mated at random
3 Q-Q pairs produce stems
[Figure: sequence containing Q-symbols]
Notations
S: given sequence
Q: configuration of Q-stems
Ψ: SCFG parse tree
Ω = [Ψ, Q]: structure = {(i, j) | positions i and j are paired}
θ: model parameters
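Aims:
Compute
\[ L_S(\theta) = \Pr\nolimits_\theta(S) = \sum_{\Psi,Q} \Pr\nolimits_\theta(S \mid Q, \Psi) \cdot \Pr\nolimits_\theta(\Psi) \cdot \Pr\nolimits_\theta(Q \mid \Psi) \]
Sample RNA structure according to
\[ \Pr(\Omega \mid S) = \sum_{\Psi,Q \,:\, [\Psi,Q]=\Omega} \Pr(\Psi,Q \mid S) \]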
For fixed Q doable by dynamic programming
Compute
\[ \Pr(Q \mid S) = \sum_{\Psi} \Pr(\Psi,Q \mid S) \]
Sample SCFG parse tree Ψ according to Pr(Ψ | Q, S)
Sample structure Ω according to Pr(Ω | Q, S)
Compute
\[ \arg\max_{\Psi} \Pr(\Psi \mid Q,S) \]
Bayesian sampling of Ω
Strategy for sampling an RNA structure Ω according to its posterior probability Pr(Ω | S) for a given RNA sequence S:
1 Sample Q_i according to Pr(Q | S) by the Markov-Chain Monte Carlo (MCMC) method.
2 Sample Ψ_i according to Pr(Ψ | Q_i, S) by dynamic programming.
3 Then Ω_i = [Ψ_i, Q_i] is a sample from
\[
\Pr(\Omega \mid S) = \sum_{\Psi,Q \,:\, [\Psi,Q]=\Omega} \Pr(\Psi,Q \mid S)
 = \sum_{\Psi,Q \,:\, [\Psi,Q]=\Omega} \Pr(\Psi \mid Q,S) \cdot \Pr(Q \mid S)
\]
Markov-Chain Monte Carlo (MCMC)
MCMC: construct a Markov chain Q_0, Q_1, Q_2, ... with stationary distribution Pr(Q | S) and let it converge.
Metropolis-Hastings: given the current state Q_i, propose Q′ with probability p(Q_i → Q′). Accept Q_{i+1} := Q′ with probability
\[ \min\left\{ 1, \frac{p(Q' \to Q_i) \cdot \Pr(Q' \mid S)}{p(Q_i \to Q') \cdot \Pr(Q_i \mid S)} \right\} \]
otherwise Q_{i+1} := Q_i
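The acceptance rule can be illustrated with a generic Metropolis-Hastings loop. The following Python sketch is not tied to Q-stem configurations: the state space, the proposal and the target in the toy example are hypothetical stand-ins, and the target only needs to be known up to a constant factor because it enters through a ratio.

```python
import random


def metropolis_hastings(start, propose, proposal_prob, target, steps, rng):
    """Return the chain [Q_0, ..., Q_steps].

    propose(q, rng)      draws a candidate q' given the current state q
    proposal_prob(a, b)  is p(a -> b), the probability of proposing b from a
    target(q)            is proportional to the stationary probability of q
    """
    chain = [start]
    current = start
    for _ in range(steps):
        candidate = propose(current, rng)
        ratio = (proposal_prob(candidate, current) * target(candidate)) / (
            proposal_prob(current, candidate) * target(current))
        if rng.random() < min(1.0, ratio):
            current = candidate              # accept: Q_{i+1} := Q'
        chain.append(current)                # otherwise Q_{i+1} := Q_i
    return chain


# Toy example (hypothetical): three states with unnormalised weights 1, 2, 7.
WEIGHTS = [1.0, 2.0, 7.0]


def toy_target(state):
    return WEIGHTS[state]


def toy_propose(state, rng):
    # Propose one of the two other states with equal probability.
    return rng.choice([s for s in range(3) if s != state])


def toy_proposal_prob(source, destination):
    return 0.5 if source != destination else 0.0


if __name__ == "__main__":
    chain = metropolis_hastings(0, toy_propose, toy_proposal_prob, toy_target,
                                20000, random.Random(42))
    print([chain.count(s) / len(chain) for s in range(3)])
```

In the toy example the visit frequencies should approach 0.1, 0.2 and 0.7. The start state must have positive target probability, since the target of the current state appears in the denominator.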
Proposals for Qi+1
Candidates for Pseudoknots
[Figure: tmRNA of rice bacterium, candidate stems plotted against sequence position (0 to 350); SCFG stems are shown by probability class (0.05-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0, >=1.0) together with HSP scores]
[Figure: the same plot with weighted scores in place of HSP scores]
Weight for Q-stem proposal
Proposal probability for an HSP
\[ \propto \frac{1 - e^{(\text{alignment score}) \cdot c_1}}{\max\{(\text{SCFG stem probability}),\, c_2\}} \]
c_1 = 10^{-6}, c_2 = 10^{-5}
Looking for Optima
We search for
\[ \arg\max_{[\Psi,Q]} \Pr([\Psi,Q] \mid S) \]
by simulated annealing.
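The slides do not give the annealing schedule or the move set, so the following Python sketch only illustrates the general idea of simulated annealing under assumptions: a symmetric proposal, a geometric cooling schedule, and a Metropolis-type acceptance rule applied to log scores divided by the temperature. In the setting above the log score would be log Pr([Ψ,Q] | S) up to an additive constant; here a hypothetical toy score over the integers 0 to 20 stands in for it.

```python
import math
import random


def simulated_annealing(start, propose, log_score, steps, t_start, t_end, rng):
    """Return the best state seen while cooling from t_start down to t_end."""
    current, best = start, start
    for step in range(steps):
        # Geometric cooling schedule (an assumption of this sketch).
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        candidate = propose(current, rng)
        delta = log_score(candidate) - log_score(current)
        # Uphill moves are always accepted, downhill moves with prob. exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
        if log_score(current) > log_score(best):
            best = current
    return best


# Toy example (hypothetical): the score is maximal at x = 13.
def toy_log_score(x):
    return -float((x - 13) ** 2)


def toy_propose(x, rng):
    # Step one position left or right, clamped to the range 0..20.
    return min(20, max(0, x + rng.choice([-1, 1])))


if __name__ == "__main__":
    print(simulated_annealing(0, toy_propose, toy_log_score, 2000, 5.0, 0.01,
                              random.Random(3)))
```

At high temperature the chain moves almost freely; as the temperature falls, downhill moves become rare and the chain settles near a maximum. Keeping track of the best state seen guards against ending on a slightly worse neighbour.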
Model parameter values used
Rule          Probability
S → LS        0.85
S → L         0.15
F → xLSy      0.25
F → xFy       0.75
L → x         0.9
L → axFyb     0.06
L → Q         0.04

Base-pair frequencies: π_CG = π_GC = 27.5%, π_AU = π_UA = 17.5%, π_GU = π_UG = 5%
Base frequencies: π_A = 35%, π_C = 20%, π_G = 20%, π_U = 25%
[Figure: Treponema pallidum pre-tmRNA, posterior vs. most probable structure; x-axis: position (0 to 350); legend: probability classes 0.05-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0, >=1.0]
[Figure: Treponema pallidum pre-tmRNA, posterior vs. known structure; same axis and legend]
[Figure: Treponema pallidum pre-tmRNA, predictions vs. real structure; x-axis: position (0 to 350); legend: RNAfold, pknotsRG, McQFold, RNAfold & pknotsRG, McQFold & RNAfold, McQFold & pknotsRG, All]
[Figure: tmRNA of rice bacterium, posterior vs. most probable structure; x-axis: position (0 to 350); legend: probability classes 0.05-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0, >=1.0]
[Figure: tmRNA of rice bacterium, posterior vs. known structure; same axis and legend]
[Figure: tmRNA of rice bacterium, predictions vs. known structure; x-axis: position (0 to 350); legend: RNAfold, pknotsRG, McQFold, RNAfold & pknotsRG, McQFold & RNAfold, McQFold & pknotsRG, All]
Comparison with RNAfold and pknotsRG
351 tmRNA sequences of length > 200 from http://www.indiana.edu/~tmrna/

                      estimated to be paired   estimated not to be paired
in fact paired                  A                          a
in fact not paired              B                          b

correctness: A/(A + B)
sensitivity: A/(A + a)
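The two measures can be computed directly from a predicted and a reference structure. The Python sketch below assumes that the counts A, B and a are taken over base pairs (i, j); the slide does not say whether pairs or single positions are counted, so this is an assumption, and the function name is hypothetical.

```python
def correctness_and_sensitivity(predicted_pairs, true_pairs):
    """Return (correctness, sensitivity) for two collections of base pairs (i, j)."""
    predicted, true = set(predicted_pairs), set(true_pairs)
    A = len(predicted & true)    # estimated to be paired and in fact paired
    B = len(predicted - true)    # estimated to be paired but in fact not paired
    a = len(true - predicted)    # in fact paired but not estimated to be paired
    correctness = A / (A + B) if A + B else float("nan")
    sensitivity = A / (A + a) if A + a else float("nan")
    return correctness, sensitivity


if __name__ == "__main__":
    predicted = [(1, 10), (2, 9), (3, 8)]
    known = [(1, 10), (2, 9), (4, 7), (5, 6)]
    print(correctness_and_sensitivity(predicted, known))
```

In the example, two of the three predicted pairs are correct and two of the four known pairs are found, so correctness is 2/3 and sensitivity is 1/2.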
[Figure: tmRNA, correctness and sensitivity of RNAfold, pknotsRG and McQFold; y-axis from 0.0 to 1.0]
[Figure: tmRNA, estimated pairing probabilities; x-axis: estimated probabilities (0.0 to 1.0); y-axis: observed relative frequencies (0.0 to 1.0)]
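A plot of estimated probabilities against observed relative frequencies can be produced by binning. The Python sketch below assumes one estimated pairing probability and one 0/1 outcome (paired in the known structure or not) per candidate pair, and an equal-width binning into ten classes; the binning used for the original plot is not stated, and the function name is hypothetical.

```python
def calibration_table(probabilities, outcomes, bins=10):
    """Return (bin_low, bin_high, observed relative frequency) for each non-empty bin."""
    counts = [0] * bins
    hits = [0] * bins
    for p, paired in zip(probabilities, outcomes):
        b = min(int(p * bins), bins - 1)     # p == 1.0 falls into the last bin
        counts[b] += 1
        hits[b] += 1 if paired else 0
    return [(b / bins, (b + 1) / bins, hits[b] / counts[b])
            for b in range(bins) if counts[b]]


if __name__ == "__main__":
    estimated = [0.05, 0.05, 0.95, 0.95, 0.95, 0.95]
    observed = [0, 0, 1, 1, 1, 0]
    for low, high, freq in calibration_table(estimated, observed):
        print(f"{low:.1f}-{high:.1f}: {freq:.2f}")
```

If the estimated probabilities are well calibrated, the observed relative frequency in each bin lies close to the bin's range, that is, the points lie near the diagonal.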
Simulated Data
200 folded sequences of length 300-460
generated according to our model
same parameters as above
McQFold uses same parameters
of course unfair to RNAfold and pknotsRG
[Figure: simulated data, correctness and sensitivity of RNAfold, pknotsRG and McQFold; y-axis from 0.0 to 1.0]
[Figure: simulated data; x-axis: estimated probabilities (0.0 to 1.0); y-axis: observed relative frequencies (0.0 to 1.0)]
Links
D. Metzler, M. Nebel (2006) Predicting RNA Secondary Structures with Pseudoknots by MCMC Sampling. Submitted.
Preprint: www.cs.uni-frankfurt.de/~metzler/McQFold/McQFold.pdf
Homepage of McQFold software: www.cs.uni-frankfurt.de/~metzler/McQFold/
Conclusions and Future Plans
Estimation of RNA structure from sequence can be very uncertain.
Uncertainty should be assessed. This can be done by Bayesian sampling.
Perhaps better to combine information from related sequences.
Combine SCFGs with Q-stems to allow pseudoknots in structural alignments and/or structural sequence profiles.
Thanks: Martina Fröhlich, Christian Färber, Markus Nebel,audience