The RNA World, Fitness Landscapes, and all that
Peter F. Stadler
Bioinformatics Group, Dept. of Computer Science & Interdisciplinary Center for Bioinformatics, University of Leipzig
RNomics Group, Fraunhofer Institute IZI, Leipzig
Institute for Theoretical Chemistry, Univ. of Vienna (external faculty)
The Santa Fe Institute (external faculty)
FOGA07, Cd. Mexico, Jan 2007
Why RNA?
Until relatively recently: Central Dogma of Molecular Biology, DNA → RNA → Protein. DNA = “genetic memory”, RNA = working copy, proteins do the work.
Around 1980: discovery of catalytic RNAs (Nobel Prize for Tom Cech and Sidney Altman); nevertheless long considered “exotic” remnants from the ancient RNA world ⟹ RNA World Hypothesis for the Origin of Life.
Around 2000: the structure of the ribosome shows that the ribosome is an “RNA enzyme”.
Around 2000: microRNAs are discovered as a large class of regulatory RNAs that inhibit translation of proteins.
2006: the ENCODE project shows that human gene expression is quite different from textbook knowledge.
RNA Bioinformatics
RNA Secondary Structures are an appropriate level of description
explain the thermodynamics of RNA Structures
often highly conserved in evolution
can be computed efficiently
Many Functional RNAs are Structured
(a) Group I intron P4-P6 domain
(b) Hammerhead ribozyme
(c) HDV ribozyme
(d) Yeast tRNA^Phe
(e) L1 domain of 23S rRNA
Hermann & Patel, JMB 294, 1999
The RNA Model
[Figure: cloverleaf secondary structure drawing of the tRNA sequence below]
GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
Formal Definition
A secondary structure on a sequence $s$ is a collection $\Omega$ of pairs $(i,j)$ with $i<j$ such that
Base pairing rules are respected, i.e., $(i,j)\in\Omega$ implies that $(s_i,s_j)$ form an allowed pair (GC, CG, AU, UA, GU, UG).
Each base is involved in at most one pair, i.e., $\Omega$ is a matching: $(i,j),(i,k)\in\Omega$ implies $j=k$, and $(i,k),(j,k)\in\Omega$ implies $i=j$.
$(i,j)\in\Omega$ implies $|j-i|>3$ (steric constraint).
No-crossing rule: $(i,j),(k,l)\in\Omega$ and $i<k$ implies either $i<k<l<j$ or $i<j<k<l$. This excludes so-called pseudoknots.
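The four conditions of the definition can be checked mechanically. The following is a minimal sketch, assuming 1-based positions and a structure given as a list of pairs; the function name `is_secondary_structure` and the parameter `min_sep` are illustrative choices, not part of the talk.

```python
# Allowed base pairs, as listed in the definition.
ALLOWED = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def is_secondary_structure(seq, pairs, min_sep=3):
    """Return True if `pairs` (1-based (i, j) tuples) is a valid secondary structure on `seq`."""
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(seq)):
            return False
        if (seq[i - 1], seq[j - 1]) not in ALLOWED:   # base pairing rules
            return False
        if j - i <= min_sep:                          # steric constraint |j - i| > 3
            return False
        if i in used or j in used:                    # each base in at most one pair
            return False
        used.update((i, j))
    # No-crossing rule: reject i < k < j < l for any two pairs (pseudoknot).
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True
```

For example, the nested pairs (1,10), (2,9), (3,8) on GGGAAAACCC pass the check, while two crossing G-C pairs are rejected as a pseudoknot.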
Let’s count the structures . . .
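Counting secondary structures. Given a sequence of length $n$, let $\Pi_{kl}=1$ if sequence positions $k,l$ can form a pair (GC, CG, AU, UA, GU, UG) and $\Pi_{kl}=0$ otherwise.
$N_{kl}$ = number of structures of the subsequence from $k$ to $l$.
Basic recursion: position $k$ is either unpaired, or it pairs with some position $j$, which splits the remainder into the part enclosed by the pair and the part to the right of $j$:
$$N_{kl} = N_{k+1,l} + \sum_{j=k+m}^{l} \Pi_{kj}\, N_{k+1,j-1}\, N_{j+1,l}$$
Here $m$ is the smallest allowed distance $j-k$ between paired positions.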
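The counting recursion translates directly into a memoized function. This is a sketch under stated assumptions: indices are 0-based, intervals too short to hold a pair count as one (open) structure, and `m = 4` is chosen to match the steric constraint $|j-i|>3$ from the formal definition; the name `count_structures` is illustrative.

```python
from functools import lru_cache

PAIRS = {"GC", "CG", "AU", "UA", "GU", "UG"}

def count_structures(seq, m=4):
    """Number of secondary structures compatible with `seq` (0-based indices internally)."""
    @lru_cache(maxsize=None)
    def N(k, l):
        if l - k < m:                 # too short for any pair: only the open structure
            return 1
        total = N(k + 1, l)           # position k unpaired
        for j in range(k + m, l + 1): # position k paired with j
            if seq[k] + seq[j] in PAIRS:
                total += N(k + 1, j - 1) * N(j + 1, l)
        return total
    return N(0, len(seq) - 1)
```

For GGAAACC the function counts the open structure, four single-pair structures, and one structure with two nested pairs.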
RNA Folding in a nutshell
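[Figure: decomposition of a structure on the interval [i, j]: either i is unpaired, or i pairs with some k, splitting the interval into [i+1, k−1] and [k+1, j]]
$$N_{ij} = N_{i+1,j} + \sum_{k:\,(i,k)\ \text{pair}} N_{i+1,k-1}\, N_{k+1,j}$$
$$E_{ij} = \min\Bigl\{ E_{i+1,j},\ \min_{k:\,(i,k)\ \text{pair}} \bigl( E_{i+1,k-1} + E_{k+1,j} + \varepsilon_{ik} \bigr) \Bigr\}$$
$$Z_{ij} = Z_{i+1,j} + \sum_{k:\,(i,k)\ \text{pair}} Z_{i+1,k-1}\, Z_{k+1,j}\, \exp(-\varepsilon_{ik}/RT)$$
Partition function: $Z = \sum_{\Omega} \exp(-E(\Omega)/RT)$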
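The recursions for $E_{ij}$ and $Z_{ij}$ share the same decomposition and differ only in the algebra (min/+ versus +/×). The sketch below uses a toy energy model that is an assumption of this example, not the talk's: every base pair contributes the same energy `eps`, with `RT = 0.6` and a minimum pair distance `m = 4`.

```python
import math
from functools import lru_cache

PAIRS = {"GC", "CG", "AU", "UA", "GU", "UG"}

def fold(seq, eps=-1.0, RT=0.6, m=4):
    """Return (minimum energy, partition function) for a toy per-pair energy `eps`."""
    @lru_cache(maxsize=None)
    def E(i, j):
        if j - i < m:
            return 0.0                      # only the open structure
        best = E(i + 1, j)                  # i unpaired
        for k in range(i + m, j + 1):       # i paired with k
            if seq[i] + seq[k] in PAIRS:
                best = min(best, E(i + 1, k - 1) + E(k + 1, j) + eps)
        return best

    @lru_cache(maxsize=None)
    def Z(i, j):
        if j - i < m:
            return 1.0
        total = Z(i + 1, j)
        for k in range(i + m, j + 1):
            if seq[i] + seq[k] in PAIRS:
                total += Z(i + 1, k - 1) * Z(k + 1, j) * math.exp(-eps / RT)
        return total

    n = len(seq)
    return E(0, n - 1), Z(0, n - 1)
```

With `eps=0.0` every structure has Boltzmann weight 1, so the partition function reduces to the number of structures, which gives a quick consistency check between the three recursions.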
A word on the Partition Function
The partition function is the link between the combinatorics of the structures (in general: states in an ensemble) and the thermodynamic properties of the physical ensemble, e.g.:
Free energy: $G = -RT \ln Z$
Expected energy: $\langle E \rangle = RT^2\, \frac{\partial \ln Z}{\partial T}$
Heat capacity: $C_p = -T\, \frac{\partial^2 G}{\partial T^2}$
[Figure: heat capacity C(T) in kcal/(mol·K) versus temperature T in °C (0 to 100) for the sequence CUGUAUUGUUGUAUAGCCCGUGUGGUAAUAUGG]
Realistic Energy Model
[Figure: loop decomposition of an RNA secondary structure with labelled loop types: hairpin loop, stacking pair, bulge, interior loop, and multi-loop, each delimited by a closing base pair and interior base pairs]
Parameters from a large number of melting experiments by Douglas Turner, David Mathews, John SantaLucia, and others
Recursions for Linear RNAs
[Figure: graphical decomposition rules for the arrays F, C, M, and M1; the cases for C include hairpin and interior loops]
Folding Kinetics
RNA molecules may have kinetic traps which prevent them from reaching equilibrium within the lifetime of the molecule. Long molecules are often trapped in such meta-stable states during transcription.
Possible solutions are:
Stochastic folding simulations can predict folding pathways and final structures. Computationally expensive, few programs available.
Predicting structures for growing fragments of the sequence can show whether large-scale refolding will occur during transcription. Cheap but inaccurate.
Analysis of the energy landscape based on complete suboptimal folding can identify possible traps (local minima).
Kinetic Folding Algorithm
Simulate folding kinetics by a Monte Carlo type algorithm:
Generate all neighbors using the move set.
Assign rates to each move, e.g. $P_i = \min\{1, \exp(-\Delta E / kT)\}$.
Select a move with probability proportional to its rate.
Advance the clock by $1/\sum_i P_i$.
[Figure: a structure and its possible moves with rates P1 to P8]
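The four steps of the kinetic folding algorithm can be sketched on any landscape given by a neighbor function and an energy function. The example below is a hedged illustration: instead of RNA structures it uses a hypothetical toy landscape (integers on a line with made-up energies `TOY_ENERGY`), and the function names are assumptions of this sketch.

```python
import math
import random

def kinetic_step(state, neighbors, energy, kT, rng):
    """One rejection-free move: returns (new_state, time_increment)."""
    moves = neighbors(state)                                   # all neighbors via the move set
    rates = [min(1.0, math.exp(-(energy(y) - energy(state)) / kT)) for y in moves]
    new_state = rng.choices(moves, weights=rates, k=1)[0]      # probability proportional to rate
    return new_state, 1.0 / sum(rates)                         # advance clock by 1 / sum_i P_i

def simulate(start, neighbors, energy, kT, t_max, seed=0):
    """Run moves until the clock reaches t_max; returns a list of (time, state)."""
    rng = random.Random(seed)
    state, t = start, 0.0
    trajectory = [(t, state)]
    while t < t_max:
        state, dt = kinetic_step(state, neighbors, energy, kT, rng)
        t += dt
        trajectory.append((t, state))
    return trajectory

# Hypothetical toy landscape: positions 0..4 on a line, neighbors differ by one step.
TOY_ENERGY = [0.0, 1.0, 0.5, 2.0, -1.0]

def toy_energy(x):
    return TOY_ENERGY[x]

def toy_neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < len(TOY_ENERGY)]
```

For RNA, `neighbors` would enumerate the structures reachable by opening or closing one base pair, and `energy` would evaluate the energy model.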
Characterization of Landscapes
A landscape consists of a configuration space $V$, a move set within that configuration space, and an energy function $f: V \to \mathbb{R}$.
Simplest move set for secondary structures: opening and closing of base pairs.
Speed of optimization depends on the roughness of the landscape.
Measures of roughness suggested in the literature:
Number of local optima
Correlation lengths (e.g. along a random walk)
Lengths of adaptive walks
Folding temperature vs. glass temperature, $T_f/T_g$
Energy barriers between the local optima, especially the maximum barrier height (“depth” in the SA literature)
Ruggedness
[Photos: Bryce Canyon, UT (rugged); Capulin Volcano, NM (“smooth”)]
Measures of Ruggedness
Number of Local Minima and Maxima
Length of Adaptive Walks
Correlation Functions:
Random walk $X: x_0, x_1, x_2, \dots$ on $V$ (Markov process on $V$).
⟹ “Time series” $f(x_0), f(x_1), f(x_2), \dots$
⟹ Autocorrelation function
$$r(s) = \frac{\bigl\langle (f(x_{t+s}) - \langle f\rangle)\,(f(x_t) - \langle f\rangle) \bigr\rangle_t}{\bigl\langle (f(x_t) - \langle f\rangle)^2 \bigr\rangle_t}$$
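The autocorrelation function can be estimated from a sampled random walk. As a hedged illustration, the sketch below walks on the Boolean hypercube (one random bit flip per step) with the hypothetical test function f = number of ones; the function names and the choice of landscape are assumptions of this example. For this additive function one expects $r(s)$ close to $(1-2/n)^s$.

```python
import random

def random_walk_series(n, steps, seed=0):
    """Time series f(x_0), f(x_1), ... for f = number of ones along a random walk on {0,1}^n."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    ones = sum(x)
    series = [ones]
    for _ in range(steps):
        i = rng.randrange(n)
        ones += 1 - 2 * x[i]      # flipping 0 -> 1 adds a one, 1 -> 0 removes one
        x[i] ^= 1
        series.append(ones)
    return series

def autocorrelation(series, s):
    """Empirical r(s): lag-s covariance divided by the variance of the series."""
    mean = sum(series) / len(series)
    var = sum((v - mean) ** 2 for v in series) / len(series)
    m = len(series) - s
    cov = sum((series[t + s] - mean) * (series[t] - mean) for t in range(m)) / m
    return cov / var
```

The estimate is statistical, so it only approaches the theoretical value for long walks.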
For simplicity:
$$f(x) \to f(x) - \bar f, \qquad \bar f = \frac{1}{|V|} \sum_{x\in V} f(x)$$
(Subtract the landscape average.)
Random walk = Markov process with transition matrix $\mathbf{T}$.
Theorem. $r(s) = \langle f, \mathbf{T}^s f\rangle / \langle f, f\rangle$.
Theorem. $r(s)$ is an exponential function iff $f$ is an eigenfunction of $\mathbf{T}$. We say $(V, f, \mathbf{T})$ is an elementary landscape.
For $d$-regular graphs: relation with graph Laplacians
$$-\Delta = D - A = d\,(I - \mathbf{T})$$
Elementary landscapes are eigenfunctions of the graph Laplacian.
Amplitude Spectra
Let $\varphi_k$ be an orthonormal basis of eigenvectors of $\mathbf{T}$. We will be interested in the “Fourier transform” $a_k$, where
$$f(x) = \sum_k a_k \varphi_k(x)$$
Theorem. $\mathrm{var}[f] := \frac{1}{|V|} \sum_x f(x)^2 = \sum_{k\neq 0} |a_k|^2$.
Collect terms that belong to the same eigenspace:
$$B_p = \frac{1}{\mathrm{var}[f]} \sum_{k:\, -\Delta\varphi_k = \Lambda_p \varphi_k} |a_k|^2$$
“Amplitude spectrum of $f$”.
Quantities derived from Amplitude Spectra
Autocorrelation function:
$$r(s) = \sum_{p\neq 0} B_p\, (1 - \Lambda_p/D)^s$$
Correlation length:
$$\ell = \sum_{s=0}^{\infty} r(s) = D \sum_{p\neq 0} B_p/\Lambda_p$$
Elementary landscapes: $f$ is elementary if and only if $B_p = 1$ for a single $p$, and $0$ otherwise.
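For a concrete case, the amplitude spectrum can be computed by brute force on the Boolean hypercube with the Hamming move set. The sketch assumes the standard Walsh functions $\varphi_I(x) = (-1)^{\sum_{i\in I} x_i}$ as eigenbasis, with $D = n$ and $\Lambda_p = 2p$ for $p = |I|$; configurations are encoded as integers, and the function names are illustrative.

```python
def amplitude_spectrum(f, n):
    """Return [B_0, ..., B_n] for f on {0,1}^n, with configurations encoded as integers."""
    size = 1 << n
    values = [f(x) for x in range(size)]
    power = [0.0] * (n + 1)
    for mask in range(1, size):                      # skip mask 0: the landscape average
        # Walsh coefficient a_I for the index set I encoded by `mask`
        a = sum(v * (-1) ** bin(mask & x).count("1") for x, v in enumerate(values)) / size
        power[bin(mask).count("1")] += a * a         # collect by interaction order p = |I|
    var = sum(power)
    return [b / var for b in power]

def correlation_length(B, n):
    """l = D * sum_p B_p / Lambda_p with D = n and Lambda_p = 2p."""
    return n * sum(B[p] / (2 * p) for p in range(1, n + 1))
```

The number-of-ones function is elementary with all weight on p = 1, while the product of two bits splits its variance between p = 1 and p = 2.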
Some Elementary Landscapes
| Problem | Move Set | D | Λ | ℓ/n |
|---|---|---|---|---|
| NAES | Hamming | n | 4 | 1/4 |
| p-spin | Hamming | n | 2p | 1/(2p) |
| WP | Hamming | n | 4 | 1/4 |
| GC | Hamming | (α − 1)n | 2α | (1 − 1/α)/2 |
| XY-Hamiltonian | Hamming | (α − 1)n | 2α | (1 − 1/α)/2 |
| | cyclic | 2n | 8 sin²(π/α) | 1/[4 sin²(π/α)] |
| GBP | Exchange | n²/4 | 2(n − 1) | 1/8 · n/(n − 1) |
| symmetric TSP | Transposition | n(n − 1)/2 | 2(n − 1) | 1/4 |
| | Inversions | n(n − 1)/2 | n | (1 − 1/n)/4 |
| GMP | Transposition | n(n − 1)/2 | 2(n − 1) | n/4 |

NAES = Not-All-Equal-Satisfiability, WP = Weight Partition, GC = Graph Coloring with α colors, GBP = Graph Bipartitioning, TSP = Traveling Salesman Problem, GMP = Graph Matching Problem.
XY-Hamiltonian: $\sum_{i<j} J_{ij} \cos\bigl(\frac{2\pi}{\alpha}(x_i - x_j)\bigr)$
Amplitude Spectra of RNA Landscapes
[Figure: amplitude spectra $B_p$ versus interaction order $p$ (1 to 25) in four panels: Ensemble Free Energies, Ground State Energies, Distance to Maximal Hairpin, Distance to Open Structure]
Amplitude Spectra of RNA Landscapes
[Figure: amplitude spectra $B(p)$ versus interaction order $p$ (1 to 12) in four panels: Ensemble Free Energies, Ground State Energies, Distance to Maximal Hairpin, Distance to Open Structure]
Neutrality
Degree of neutrality: $E[\nu_x]$ is the expected number of neutral neighbors of $x$.
Suppose $\mu_0 > 0$. Then
$$E[\nu_x] = \sum_{y \in \partial x} \mu^{c_x(y)}$$
where
$$c_x(y) = \bigl|\{\, j \mid \theta_j(x) \neq \theta_j(y) \,\}\bigr|, \qquad y \in \partial x$$
Similar equations can be obtained for $E[\nu_x \nu_y]$.
⟹ Correlation length $\ell$ and degree of neutrality $\nu_x$ can be tuned independently of each other in short-range p-spin models.
Energy barriers
$$E[s,w] = \min\bigl\{ \max[\, f(z) \mid z \in p \,] \bigm| p: \text{path from } s \text{ to } w \bigr\}$$
$$B(s) = \min\bigl\{ E[s,w] - f(s) \bigm| w: f(w) < f(s) \bigr\}$$
Depth and Difficulty (borrowed from simulated annealing theory):
$$D = \max\bigl\{ B(s) \bigm| s \text{ is not a global minimum} \bigr\}$$
$$\psi = \max\Bigl\{ \frac{B(s)}{f(s) - f(\min)} \Bigm| s \text{ is not a global minimum} \Bigr\}$$
Energy Barriers and Barrier Trees
Some topological definitions. A structure is a
local minimum if its energy is lower than the energy of all neighbors
local maximum if its energy is higher than the energy of all neighbors
saddle point if there are at least two local minima that can be reached by a downhill walk starting at this point
Calculating barrier trees
[Figure: successive stages of the flooding algorithm on a landscape with local minima M1, M2, M3 and saddle points S12, S23, together with the resulting barrier tree]
The flooding algorithm:
Read conformations in energy-sorted order.
For each conformation x we have three cases:
(a) x is a local minimum if it has no neighbors we've already seen
(b) x belongs to basin B(s) if all known neighbors belong to B(s)
(c) if x has neighbors in several basins B(s1), ..., B(sk), then it is a saddle point that merges these basins. Basins B(s1), ..., B(sk) are then united and assigned to the deepest local minimum.
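The three cases of the flooding algorithm map onto a short union-find loop. The following is a minimal sketch, not the barriers program itself: it assumes the full list of states fits in memory, ignores the special handling that energy-degenerate states would need, and uses a hypothetical one-dimensional toy landscape (`TOY`) for illustration.

```python
def barrier_tree(states, energy, neighbors):
    """Flooding algorithm. Returns (local_minima, merges), where each merge is
    (absorbed_minimum, surviving_minimum, saddle, barrier_height_of_absorbed)."""
    basin = {}     # state -> local minimum of the basin it was assigned to
    parent = {}    # union-find over local minima, pointing towards deeper minima

    def find(m):
        while parent[m] != m:
            m = parent[m]
        return m

    merges = []
    for x in sorted(states, key=energy):               # energy-sorted order
        roots = {find(basin[y]) for y in neighbors(x) if y in basin}
        if not roots:                                  # (a) no known neighbor: new local minimum
            basin[x] = x
            parent[x] = x
            continue
        deepest = min(roots, key=energy)
        basin[x] = deepest                             # (b) one basin: x joins it
        for r in roots - {deepest}:                    # (c) several basins: x is a saddle
            merges.append((r, deepest, x, energy(x) - energy(r)))
            parent[r] = deepest
    return list(parent), merges

# Hypothetical toy landscape: seven states on a line with made-up energies.
TOY = [0.0, 2.0, 1.0, 3.0, 0.5, 4.0, 2.5]

def toy_energy(x):
    return TOY[x]

def toy_neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < len(TOY)]
```

Each merge record gives the saddle and the barrier height of the absorbed minimum, from which a barrier tree and the depth of the landscape follow.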
Information from the Barrier Trees
Local minima
Saddle points
Barrier heights
Gradient basins
Partition functions and free energies of (gradient) basins
Depth and Difficulty of the landscape
N.B.: A gradient basin is the set of all initial points from which a gradient walk (steepest descent) ends in the same local minimum.
Energy Landscape of a Toy Sequence
[Figure: refolding path of a toy sequence through structures 1 to 11, plotted as energy in kcal/mol (−8 to 2) versus steps (arbitrary), together with the barrier tree of the landscape annotated with barrier heights]
Folding Kinetics
Transition rates from $x$ to $y$:
$$r_{yx} = r_0\, e^{-\frac{E^{\neq}_{yx} - E(x)}{RT}} \quad \text{for } x \neq y, \qquad r_{xx} = -\sum_{y \neq x} r_{yx}$$
Kinetics as a Markov process:
$$\frac{dp_x}{dt} = \sum_{y \in X} r_{xy}\, p_y(t)$$
Transition states: $E^{\neq}_{yx} = \max\{E(x), E(y)\}$, or more complex models (Tacker et al. 1994, Schmitz et al. 1996).
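The rate matrix and the master equation can be sketched for a small state space. The example below assumes the simple transition-state model $E^{\neq}_{yx} = \max\{E(x), E(y)\}$, states indexed by integers, a dictionary `neighbors` giving the move set, and a plain explicit Euler integrator; these are illustrative choices, and a production code would more likely diagonalize the rate matrix.

```python
import math

def rate_matrix(E, neighbors, RT=0.6, r0=1.0):
    """r[y][x] is the rate from x to y; diagonal entries make each column sum to zero."""
    n = len(E)
    r = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in neighbors[x]:
            transition_state = max(E[x], E[y])
            r[y][x] = r0 * math.exp(-(transition_state - E[x]) / RT)
        r[x][x] = -sum(r[y][x] for y in neighbors[x])
    return r

def integrate(r, p0, t_end, dt=0.01):
    """Explicit Euler integration of dp_x/dt = sum_y r[x][y] * p_y."""
    p = list(p0)
    n = len(p)
    for _ in range(int(round(t_end / dt))):
        dp = [sum(r[x][y] * p[y] for y in range(n)) for x in range(n)]
        p = [p[x] + dt * dp[x] for x in range(n)]
    return p
```

Because these rates satisfy detailed balance, the population should relax to the Boltzmann distribution, which is a convenient sanity check on a three-state chain.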
Reduced Description of the Folding Dynamics
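Macrostates = classes of a partition of the state space.
Partition function for a macrostate:
$$Z_\alpha = \sum_{x\in\alpha} \exp(-E(x)/RT)$$
Free energy of a macrostate:
$$G(\alpha) = -RT \ln Z_\alpha$$
Transition rates between macrostates, for $\alpha \neq \beta$:
$$r_{\beta\alpha} = \sum_{y\in\beta}\sum_{x\in\alpha} r_{yx}\, \mathrm{Prob}[x|\alpha] = \frac{1}{Z_\alpha} \sum_{y\in\beta}\sum_{x\in\alpha} r_{yx}\, e^{-E(x)/RT}$$
The rates $r_{\beta\alpha}$ can be computed “on the fly” while executing the barriers program.
Transition state free energy:
$$G^{\neq}_{\beta\alpha} = -RT \ln \sum_{y\in\beta}\sum_{x\in\alpha} e^{-E^{\neq}_{xy}/RT}$$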
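The macrostate quantities follow from the microscopic rates by Boltzmann-weighted summation. The sketch below assumes states indexed by integers, a microscopic rate matrix `r[y][x]` (rate from x to y) supplied by the caller, and a partition given as a list of lists; the function name `macro_rates` is illustrative.

```python
import math

def macro_rates(E, r, partition, RT=0.6):
    """Return (Z, G, R): macrostate partition functions, free energies, and rates R[b][a]."""
    Z = [sum(math.exp(-E[x] / RT) for x in block) for block in partition]
    G = [-RT * math.log(z) for z in Z]
    m = len(partition)
    R = [[0.0] * m for _ in range(m)]
    for a, alpha in enumerate(partition):
        for b, beta in enumerate(partition):
            if a != b:
                # sum over y in beta, x in alpha of r_yx * Prob[x | alpha]
                R[b][a] = sum(r[y][x] * math.exp(-E[x] / RT)
                              for y in beta for x in alpha) / Z[a]
    return Z, G, R
```

When every macrostate contains a single state, the reduced rates coincide with the microscopic ones and each free energy equals the state's energy.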
[Figure: “lilly”, a simple model sequence: barrier tree with local minima 1 to 14, and three panels of population probability versus time (10^-1 to 10^3, log scale) for the mfe structure and local minima 2 to 6]
[Figure: two panels of population probability versus time (log scale) for the mfe structure and selected local minima]
Refolding of a tRNA molecule.
Summary I:
RNA structures can be computed efficiently by means of dynamic programming
Computations are based on a set of carefully measured energy parameters and an additive energy model
Algorithms exist for ground state energy and structure, full partition functions, density of states, interacting structures, ...
The folding kinetics of a given RNA sequence can also be investigated at the level of secondary structures
VIENNA RNA PACKAGE
PART II: How Do RNAs Evolve
Basic Assumption
Selection acts on secondary structures, mutation acts on the underlying sequences
⟹ We need to understand the sequence-to-structure map of RNAs
(hang on, we’ll discuss the empirical evidence for that a bit later)
Sewall Wright’s Fitness Landscapes
[Figure: sketch of a fitness landscape, fitness versus phenotype]
What do realistic fitness landscapes look like?
Biological Landscapes
The RNA case is a special case of a very general paradigm:
genotype ↦ phenotype ↦ fitness
What is the relationship between genotype and phenotype?
Central topic in any theory of evolution because:
* Selection acts on the phenotype
* Mutation/recombination acts on the genotype
Biopolymers as the simplest model: the molecule is both genotype (sequence) and phenotype (structure).
The map from genotype to phenotype is determined by physical chemistry:
⟺ folding problem
Computational Analysis of the RNA Map
There are many more sequences than structures.
Structures as (.)-strings: 3 letters (with constraints)
⟹ fewer than $3^n$ structures
but $4^n$ sequences.
⟹ Redundancy
How are sequences folding into the same structure distributed in sequence space?
Neutral set: $S(\psi) = \{\, x \in Q^n_\alpha \mid f(x) = \psi \,\}$
Sensitivity and Neutrality
[Figure: effect of a single point mutation: secondary structure drawings of a tRNA sequence and of a one-point mutant; histogram of the distribution of structure distances (frequency on a log scale from 10^-4 to 10^0 versus structure distance from 0 to 300)]
The Random Graph Model
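Approach: model $S(\psi)$ as a random induced subgraph $\Gamma$ with a given value
$$\lambda = \frac{\langle \#\text{neutral neighbors} \rangle}{(\alpha - 1)\, n}$$
Threshold value:
$$\lambda^* = 1 - \left(\frac{1}{\alpha}\right)^{\frac{1}{\alpha-1}}$$
Theorem. [Reidys, Stadler, Schuster] If $\lambda > \lambda^*$ then $\Gamma$ is a.s. dense and connected; if $\lambda < \lambda^*$ then $\Gamma$ is a.s. neither dense nor connected.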
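To get a feeling for the numbers, the threshold can be evaluated for a few alphabet sizes. The short sketch below only evaluates the two formulas of the random graph model; the function names are illustrative.

```python
def connectivity_threshold(alpha):
    """lambda* = 1 - (1/alpha)^(1/(alpha - 1)) for an alphabet of size alpha."""
    return 1.0 - alpha ** (-1.0 / (alpha - 1))

def neutrality(mean_neutral_neighbors, n, alpha):
    """lambda = <#neutral neighbors> / ((alpha - 1) * n) for sequences of length n."""
    return mean_neutral_neighbors / ((alpha - 1) * n)

for alpha in (2, 4, 6):
    print(alpha, round(connectivity_threshold(alpha), 3))
```

For the four-letter alphabet the threshold is about 0.37, so under this model a sequence of length 100 with 300 one-point mutants would need on average more than roughly 111 neutral neighbors for its neutral network to be dense and connected.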
A complication: Base Pairing Rules
Unpaired bases: alphabet $\mathcal{A} = \{A, U, G, C\}$
Paired bases: 5' and 3' side correlated: alphabet $\mathcal{B} = \{AU, UA, GC, CG, GU, UG\}$.
Thus consider only the set of compatible sequences $C(\psi)$: $S(\psi) \subseteq C(\psi) \equiv Q_4^{n_u} \times Q_6^{n_p}$.
⟹ Two neutrality parameters $\lambda_u$ and $\lambda_p$
Connected Components of Neutral Networks
[Figure: connected components of neutral networks as a function of $\lambda_u$ (horizontal axis) and $\lambda_p$ (vertical axis), each from 0.0 to 1.0]
Legend: gray = many small components; red = 1 connected component; green = 2 equal-sized components; yellow = 3 components, size 2:1; blue = 4 equal-sized components
Explanation for this deviation from the random graph model in terms of the energy model: some structures can be made only with a significant bias in the G/C ratio.
[Figures: distribution P(d) of the length d of neutral paths; covering radius versus chain length n (from enumeration, from inverse folding, and a lower bound); distance to target structure]
Closest Approach
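Intersection Theorem. For any two secondary structures $\phi$ and $\psi$,
$$C(\phi) \cap C(\psi) \neq \emptyset$$
What is the distance between neutral networks?
$$\delta(\phi,\psi) = \min\{\, d(x,y) \mid f(x) = \phi \text{ and } f(y) = \psi \,\}$$
Random graph theory: if $\lambda > \lambda^*$ then $\delta(\phi,\psi) \approx 2$.
Computer simulations: upper bounds on $\delta(\phi,\psi)$:

| n | GC | AU | AUGC |
|---|---|---|---|
| 50 | 5.6 | 2.6 | 2.1 |
| 70 | 9.3 | 4.6 | 3.4 |
| 100 | 13.0 | 7.8 | 5.6 |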
Accessibility
Fontana & Schuster 1998
Idea: the “interface” between two structures is large if they are “similar”.
More precisely: structure ψ is accessible from φ if x ∈ S(φ) is likely to have a neighbor (mutant) x′ ∈ S(ψ).
Structural characterization of “easy” (continuous) transitions:
Shortening of stacks
Elongation of stacks
Opening of constrained stacks
Closing of constrained stacks
Summary II: Sequence-Structure Map of RNA
1. Redundancy: Many more sequences than structures
2. Sensitivity: Small changes in the sequences may lead to large changes in the structure
3. Neutrality: A substantial fraction of mutations does not alter the structure.
4. Isotropy: S(ψ) is “randomly” embedded in C (ψ).
Implications:
1. Neutral Networks: S(ψ) forms a connected “percolating” network in sequence space for all “common” structures.
2. Shape Space Covering: Almost all structures can be found in a relatively small neighborhood of almost every sequence.
3. Mutual Accessibility: The neutral networks of any two structures almost touch each other somewhere in sequence space.
Simulated Trajectories
[Figure: simulated trajectories: structure distance and projection coordinate versus time (arbitrary units)]
Punctuated equilibria = diffusion of neutral networks + constant rate of innovation + exponential selection of rare mutants
Proc.Natl.Acad.Sci. 93: 397-401 (1996)
Diffusion Constant
... can be deduced from the Moran model:
$$D = \lambda\, \frac{6Anp}{3 + 4Np\,(1 + 1/N)} \sim \begin{cases} (3/2)\,A\,(n/N) & p \gg 0 \text{ or } N \gg 1 \\ 2Anp & p \ll 1 \end{cases}$$
A ... replication rate
n ... sequence length
N ... population size
p ... mutation rate
λ ... neutrality of network
Dynamics of Interacting Replicators
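$$I_k + I_j \longrightarrow I_l + I_k + I_j$$
With mutation:
$$\dot{x}_k = x_k \Bigl( \sum_j A_{kj} x_j - \sum_{i,j} A_{ij} x_i x_j \Bigr) + \sum_{l,j} \bigl( Q_{kl} A_{lj} x_j x_l - Q_{lk} A_{kj} x_k x_j \bigr)$$
where
$$Q_{kl} = (1-p)^{n - d(k,l)} \left( \frac{p}{\alpha - 1} \right)^{d(k,l)}$$
How does this behave in sequence space?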
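The mutation matrix $Q_{kl}$ depends only on the Hamming distance $d(k,l)$ between two sequences. The sketch below builds it by brute force for short sequences; the function names and the default alphabet string are illustrative, and the enumeration is only feasible for small $n$ because the matrix has $\alpha^n$ rows.

```python
from itertools import product

def hamming(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def mutation_matrix(n, p, alphabet="AUGC"):
    """Q[k][l] = (1 - p)^(n - d) * (p / (alpha - 1))^d with d = Hamming distance of k and l."""
    seqs = ["".join(s) for s in product(alphabet, repeat=n)]
    alpha = len(alphabet)
    Q = [[(1 - p) ** (n - hamming(k, l)) * (p / (alpha - 1)) ** hamming(k, l)
          for l in seqs]
         for k in seqs]
    return seqs, Q
```

Each row sums to one, since a copy of any template must be some sequence, and the matrix is symmetric because it depends only on the distance.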
Simplest case: $A_{kl} = A_0 \,(1 - d(I_k, I_l)/n)$:
[Figure: two panels. Left: diversity versus time. Right: g(τ) versus time lag τ.]
$$g(\tau) = \frac{1}{T_2 - T_1 + 1} \sum_{t=T_1}^{T_2} \| p(t + \tau) - p(t) \|^2$$
B.M.R. Stadler, Adv. Complex Syst. (2003)
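The quantity g(τ) is a time-averaged mean squared displacement, and it is straightforward to compute from a recorded trajectory. A minimal sketch in Python with NumPy follows, assuming that p(t) is stored as one row per time step (for example, the population barycenter in some vector embedding of sequence space; this interpretation is an assumption, not stated on the slide).

```python
import numpy as np


def g(p, tau, T1, T2):
    """Mean squared displacement over lag tau, averaged over t = T1..T2 inclusive.

    p is an array with one row (or one scalar) per time step.
    """
    p = np.asarray(p, dtype=float)
    if T1 < 0 or T2 < T1 or T2 + tau >= len(p):
        raise ValueError("need 0 <= T1 <= T2 and T2 + tau < len(p)")
    t = np.arange(T1, T2 + 1)
    disp = p[t + tau] - p[t]        # displacement vectors p(t + tau) - p(t)
    return np.sum(disp ** 2) / (T2 - T1 + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    walk = np.cumsum(rng.normal(size=(5000, 10)), axis=0)  # toy random walk
    for tau in (10, 100, 1000):
        print(tau, g(walk, tau, 0, 3000))
```

For diffusive motion g(τ) grows linearly in τ, so the slope of g against τ gives an estimate proportional to the diffusion constant.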
[Figure: two log-log panels. Left: D versus p. Right: D versus N/n.]
Left: Diffusion coefficient D as a function of the mutation rate for N = 10, 20, 30, 40, 80 and n = 10, 20, 30, 40, 80 such that N/n = 1, after equilibration for $10^5$ time steps. Right: Dependence of the ratio D/p on N/n.
An RNA-Based Model in the Plane
Target hypercycle with 8 members.
Model: Hypercyclically coupled species, each sequence has a function that depends on its structure.
Spatial Extension: CA Model
[Figure panels: Possible Replicators, Possible Catalysts, Actual Catalysts]
Rules of replication. For each of the neighbors (•) of the empty cell (marked by a bold outline), the replication rate ρz is computed, taking into account their neighbors in the direction of the replication () as potential catalysts. The neighbor with the largest value of ρz invades the empty position. In this example, for the chosen replicator, only three of its neighbors are catalysts according to the hypercycle topology.
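The replication rule can be illustrated with a toy cellular automaton step. The Python sketch below is only one possible reading and makes several assumptions that go beyond the caption: species are labelled 1 to 8 with 0 for an empty cell, species i is catalysed by species i - 1 (cyclically), the "neighbors in the direction of the replication" are taken to be the cells adjacent to both the replicator and the empty cell, the rate ρz is a hypothetical `base + boost * (number of catalysts)`, and the lattice is periodic.

```python
import numpy as np

N_SPECIES = 8   # size of the target hypercycle
EMPTY = 0       # species are labelled 1..N_SPECIES

# Moore neighborhood offsets (assumed lattice geometry)
MOORE = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def catalyst_of(species):
    """Species assumed to catalyse `species` in a cyclic hypercycle."""
    return (species - 2) % N_SPECIES + 1


def neighbors(grid, r, c):
    """Moore neighbors of cell (r, c) with periodic boundaries."""
    rows, cols = grid.shape
    return [((r + dr) % rows, (c + dc) % cols) for dr, dc in MOORE]


def replication_rate(grid, empty, z, base=1.0, boost=10.0):
    """Hypothetical rate rho_z for replicator z copying into the empty cell."""
    shared = set(neighbors(grid, *empty)) & set(neighbors(grid, *z))
    wanted = catalyst_of(grid[z])
    n_catalysts = sum(1 for cell in shared if grid[cell] == wanted)
    return base + boost * n_catalysts


def invader(grid, empty):
    """Occupied neighbor with the largest rate, or None if all are empty."""
    candidates = [z for z in neighbors(grid, *empty) if grid[z] != EMPTY]
    if not candidates:
        return None
    return max(candidates, key=lambda z: replication_rate(grid, empty, z))


if __name__ == "__main__":
    grid = np.zeros((5, 5), dtype=int)
    grid[2, 1], grid[2, 3], grid[1, 1] = 2, 3, 1   # species 1 catalyses species 2
    winner = invader(grid, (2, 2))
    print(winner, grid[winner])
```

In the small example, the replicator of species 2 has its catalyst (species 1) next to both itself and the empty cell, so it receives the highest rate and is chosen to invade.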
Spirals formed after 3000 generations in an evolution experiment started with 300 random sequences in the absence of parasites.
See also Boerlijst & Hogeweg (1993).
Diffusion in Sequence Space
[Figure: two panels. Left: g(tau) versus time lag (tau). Right: diffusion constant versus mutation rate.]
Summary III: Dynamics of RNA Evolution
Neutrality of the Sequence-Structure Map implies diffusion/drift-like motion in sequence space, independent of the details of the selection/mutation mechanisms and of whether spatial extension is taken into account or not.
=⇒ The basic assumption of molecular phylogenetics, namely a dominating influence of drift in sequence evolution, holds true even when phenotypic evolution is dominated by interactions (co-evolution).
TODO: Development of a rigorous mathematical theory describing the motion in sequence space of a population with strong interactions.
Acknowledgments: It’s not my fault . . .
Peter Schuster, Walter Fontana, Wolfgang Schnabl
Christoph Flamm, Ivo L. Hofacker
Christian M. Reidys
Camille Stephan-Otto Attolini, Bärbel Stadler