Conformation Networks: an Application to Protein Folding
Zoltán Toroczkai and Erzsébet Ravasz, Center for Nonlinear Studies; Gnana Gnanakaran (T-10), Theoretical Biology and Biophysics; Los Alamos National Laboratory.
![Page 1: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/1.jpg)
Conformation Networks: an Application to Protein Folding
Zoltán Toroczkai
Center for Nonlinear Studies
Erzsébet Ravasz
Center for Nonlinear Studies
Gnana Gnanakaran (T-10)
Theoretical Biology and Biophysics
Los Alamos National Laboratory
![Page 2: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/2.jpg)
Proteins
the most complex molecules in nature
globular or fibrous
basic functional units of a cell
chains of amino acids (50 – $10^3$)
peptide bonds link the backbone
unique 3D structure (native physiological conditions)
biological function
fold in nanoseconds to minutes
about 1000 known 3D structures: X-ray crystallography, NMR
Native state
![Page 3: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/3.jpg)
Myoglobin
153 residues, molecular weight = 17181 Da, 1260 atoms
Main function: primary oxygen storage and carrier in muscle tissue
It contains a heme (iron-containing porphyrin) group in the center: C34H32N4O4FeHO
![Page 4: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/4.jpg)
Protein conformations
• defined by dihedral angles
2 angles, each with 2-3 local minima of the torsion energy
N monomers → about $10^N$ different conformations
Amino-acid
![Page 5: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/5.jpg)
Levinthal’s paradox
• Levinthal’s paradox, 1968
finding the native state by random sampling is not possible
40 monomer polypeptide, $10^{13}$ conf/s
$3 \times 10^{19}$ years to sample all
universe ~ $2 \times 10^{10}$ years old
Wetlaufer, P.N.A.S. 70, 691 (1973)
Levinthal, J. Chim. Phys. 65, 44-45 (1968)
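The numbers in Levinthal's estimate can be checked with a few lines of arithmetic. The sketch below assumes roughly 10 conformations per monomer and a year of 365.25 days; the variable names are illustrative only.

```python
# Levinthal estimate: time needed to enumerate all conformations of a 40-mer.
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumption: Julian year

n_monomers = 40
conformations = 10 ** n_monomers        # assumption: ~10 conformations per monomer
sampling_rate = 1e13                    # conformations sampled per second

years_needed = conformations / sampling_rate / SECONDS_PER_YEAR
age_of_universe_years = 2e10            # value quoted on the slide

print(f"years to sample all conformations: {years_needed:.1e}")
print(f"ratio to the age of the universe:  {years_needed / age_of_universe_years:.1e}")
```

The result is about 3 × 10^19 years, roughly a billion times the quoted age of the universe.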
nucleation
folding pathways
• Anfinsen: thermodynamic hypothesis
native state is at the global minimum of the free energy
Epstein, Goldberger & Anfinsen, Cold Spring Harbor Symp. Quant. Biol. 28, 439 (1963)
![Page 6: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/6.jpg)
Free energy landscapes
• Bryngelson & Wolynes, 1987
free energy landscape
Bryngelson & Wolynes, P.N.A.S. 84, 7524 (1987)
a random hetero-polymer typically does NOT fold
Davidson & Sauer, P.N.A.S. 91, 2146 (1994)
Experiment:
– random sequences
– GLU, ARG, LEU
– 80-100 amino-acids
~ 95% did not fold in a stable manner
![Page 7: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/7.jpg)
Funnels
• Leopold, Montal & Onuchic, 1992
Leopold, Montal & Onuchic, P.N.A.S. 89, 8721 (1992)
many folding pathways
Energy funnels
Given any amino-acid sequence: can we tell if it is a good folder?
experiments (X-ray, NMR)
molecular dynamics simulations
homology modeling
Difficult and slow
![Page 8: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/8.jpg)
Molecular dynamics
• State of the art
supercomputer (LANL)
Ribosome in explicit solvent:
– targeted MD
– $2.64 \times 10^6$ atoms ($2.5 \times 10^5$ + water)
– Q machine, 768 processors
– 260 days of simulation (event: 2 ns)
Sanbonmatsu, Joseph & Tung, P.N.A.S. 102, 15854 (2005)
distributed computing (Stanford, Folding@home)
– more than 100,000 CPUs
– simulation of complete folding event
» BBA5, 23-residue, implicit water
» 10,000 CPU days/folding event (~1 μs)
Shirts & Pande, Science 290, 1903 (2000); Snow, Nguyen, Pande, Gruebele, Nature 420, 102 (2002)
~ $10^{16}$ times slower
![Page 9: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/9.jpg)
Configuration networks
• Configuration networks
NODE: configuration
LINK: change of one degree of freedom (angle)
refinement of angle values → continuous case
Protein conformations
dihedral angles have few preferred values
Ramachandran map
PDB structures
Ramachandran & Sasisekharan, J. Mol. Biol. 7, 95 (1963)
• Helix
• Sheet
• other
![Page 10: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/10.jpg)
Why networks?
• VERY LARGE: 100 monomers → $10^{100}$ nodes. However:
Generic features of folding are determined by STATISTICAL properties of the configuration network:
degree distribution
average distance
clustering
degree correlations
Albert & Barabási, Rev. Mod. Phys. 74, 67 (2002); Newman, SIAM Rev. 45, 167 (2003)
toolkit from network research
captures the high dimensionality
faster algorithms to simulate folding events
pre-screening synthetic proteins
insights into misfolding
![Page 11: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/11.jpg)
A real example
• The Protein Folding Network: F. Rao, A. Caflisch, J. Mol. Biol. 342, 299 (2004)
beta3s: 20 monomers, antiparallel beta sheets
MD simulation, implicit water
330 K, equilibrium: folded ⇌ random coil
NODE -- 8 letters / AA (local secondary struct)
LINK -- 2ps transition
![Page 12: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/12.jpg)
Its native conformation has been studied by NMR experiments:
De Alba et al., Prot. Sci. 8, 854 (1999).
Beta3s in aqueous solution forms a monomeric triple-stranded antiparallel beta sheet in equilibrium with the denatured state.
• Simulations at 330 K
• The average folding time from the denatured state: ~83 ns
• The average unfolding time: ~83 ns
• Simulation time: ~12.6 μs
• Coordinates saved every 20 ps ($5 \times 10^5$ snapshots in 10 μs)
• Secondary structures: H, G, I, E, B, T, S, - (α-helix, $3_{10}$ helix, π-helix, extended, isolated β-bridge, hydrogen-bonded turn, bend and unstructured).
• The native state: -EEEESSEEEEEESSEEEE-
• There are approx. $8^{18} \approx 10^{16}$ conformations.
• Nodes: conformations; links: transitions.
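The node and link definitions above can be sketched in a few lines: each distinct secondary-structure string is a node, and two nodes are linked when they appear in consecutive snapshots. The function name and the toy trajectory below are hypothetical and are not simulation data; only the native-state string and the 8^18 count come from the slide.

```python
from collections import Counter

def build_conformation_network(snapshots):
    """Nodes are distinct secondary-structure strings; a link joins two
    conformations seen in consecutive snapshots (self-transitions are skipped)."""
    node_weights = Counter(snapshots)            # how often each node is visited
    links = Counter()
    for a, b in zip(snapshots, snapshots[1:]):
        if a != b:
            links[frozenset((a, b))] += 1        # undirected link with a transition count
    degree = Counter()
    for link in links:
        for node in link:
            degree[node] += 1
    return node_weights, links, degree

# Hypothetical toy trajectory; the first string is the native state quoted above.
native = "-EEEESSEEEEEESSEEEE-"
toy = [native, "-EEEESSEEEEEESSEEE--", native, "--EEESSEEEEEESSEEEE-", native,
       "-EEEESSEEEEEESSEEE--", "--EEESSEEEEEESSEEE--"]

weights, links, degree = build_conformation_network(toy)
print(len(weights), "nodes,", len(links), "links, native degree =", degree[native])
print(f"8**18 = {8**18:.2e} possible strings")
```

Keeping the visit counts and transition counts alongside the bare topology makes it possible to weight nodes and links later, which is a design choice of this sketch rather than something stated on the slide.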
![Page 13: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/13.jpg)
Many real-world networks are scale free
hubs
co-authorship (γ = 1 – 2.5)
citations (γ = 3)
sexual contacts (γ = 3.4)
movie actors (γ = 2.3)
Internet (γ = 2.4)
World Wide Web (γ = 2.1/2.5)
Genetic regulation (γ = 1.3)
Protein-protein interactions (γ = 2.4)
Metabolic pathways (γ = 2.2)
Food webs (γ = 1.1)
Barabási & Albert, Science 286, 509 (1999)
Scale-free network
[Plot: degree distributions of beta3s and of the randomized chain]
Many reasons behind SF topology
• Why is the protein network scale free?
• Why does the randomized chain have a similar degree distribution?
• Why is γ = −2?
![Page 14: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/14.jpg)
Robot arm networks
[Figure: robot-arm configuration networks, panels n = 0, n = 1, n = 2; nodes are labeled by joint states such as 0, 1, 2; 00, 01, ..., 22; and 000, 100, 200, 010, 020, 021]
n-dimensional hypercube
binomial degree distribution
• Steric constraints?
missing nodes
missing links
Swiss cheese
Homogeneous
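The binomial degree distribution of the unconstrained robot-arm network can be checked by brute force. The sketch below assumes three states per joint and assumes that a link moves exactly one joint to an adjacent state (0-1 or 1-2); the function name and the choice of forbidden configurations are hypothetical.

```python
from collections import Counter
from itertools import product

def robot_arm_degrees(n_joints, n_states=3, forbidden=frozenset()):
    """Degree of every allowed configuration. Two configurations are linked
    when exactly one joint moves to an adjacent state (assumed move rule)."""
    degrees = {}
    for conf in product(range(n_states), repeat=n_joints):
        if conf in forbidden:
            continue
        k = 0
        for joint in range(n_joints):
            for step in (-1, 1):
                new_state = conf[joint] + step
                if 0 <= new_state < n_states:
                    neighbor = conf[:joint] + (new_state,) + conf[joint + 1:]
                    if neighbor not in forbidden:
                        k += 1
        degrees[conf] = k
    return degrees

n = 6
free = robot_arm_degrees(n)
# Without constraints the degree equals n plus the number of joints in the
# middle state, i.e. a binomial distribution B(n, 1/3) shifted by n.
assert all(k == n + conf.count(1) for conf, k in free.items())
print(sorted(Counter(free.values()).items()))

# "Swiss cheese": forbid an arbitrary, hypothetical set of configurations.
holes = frozenset(c for c in free if sum(c) % 5 == 0)
print(sorted(Counter(robot_arm_degrees(n, forbidden=holes).values()).items()))
```

Removing nodes lowers some degrees but keeps the distribution narrow, which is the sense in which the network stays homogeneous.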
![Page 15: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/15.jpg)
A bead-chain model
• Beads on a chain in 3D: robot arm model
similar to Cα protein models
rod-rod angle
3 positions around axis
N = 18; rod-rod angle = 120°; state 2212112212111122
N = 6; rod-rod angle = 90°
Honeycutt & Thirumalai, Biopolymers 32, 695 (1992)
Homogeneous network
![Page 16: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/16.jpg)
Another example: L = 7, rod-rod angle = 75°, r = 0.25
state “00100”
allowed state
forbidden state
![Page 17: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/17.jpg)
Adding monomers not only increases the number of nodes in the network but also its dimensionality!! The combined effect is small-world.
![Page 18: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/18.jpg)
Shortcuts in Folding Space
![Page 19: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/19.jpg)
![Page 20: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/20.jpg)
The “dilemma”
HOMOGENEOUS
• from studies of conformation networks
bead chain
robot arm
SCALE FREE
• from polypeptide MD simulations
beta3s
randomized version
?
![Page 21: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/21.jpg)
Gradient Networks
Gradients of a scalar (temperature, concentration, potential, etc.) induce flows (heat, particles, currents, etc.).
Naturally, gradients will induce flows on networks as well.
Ex.: Load balancing in parallel computation and packet routing on the internet
Y. Rabani, A. Sinclair and R. Wanka, Proc. 39th Symp. on Foundations of Computer Science (FOCS), 1998: “Local Divergence of Markov Chains and the Analysis of Iterative Load-balancing Schemes”
References:
Z. T. and K.E. Bassler, “Jamming is Limited in Scale-free Networks”, Nature 428, 716 (2004)
Z. T., B. Kozma, K.E. Bassler, N.W. Hengartner and G. Korniss, “Gradient Networks”, http://www.arxiv.org/cond-mat/0408262
![Page 22: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/22.jpg)
Setup:
Let G = G(V, E) be an undirected graph, which we call the substrate network.
The vertex set: $V = \{x_0, x_1, \ldots, x_{N-1}\} \equiv \{0, 1, 2, \ldots, N-1\}$
The edge set: $E \subset V \times V$, $e \in E$, $e = (x_i, x_j) = (i, j)$, $(x_i, x_i) \notin E$ (no self-loops)
A simple representation of E is via the N × N adjacency (or incidence) matrix A:
$$A(x_i, x_j) = a_{ij} = \begin{cases} 1 & \text{if } (i,j) \in E \\ 0 & \text{if } (i,j) \notin E \end{cases} \qquad (1)$$
Let us consider a scalar field $\{h\}: V \to \mathbb{R}$.
Set of nearest neighbor nodes on G of i: $S_i^{(1)}$
![Page 23: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/23.jpg)
Definition 1 The gradient $\nabla h(i)$ of the field {h} in node i is a directed edge:
$$\nabla h(i) = (i, \mu(i)) \qquad (2)$$
which points from i to that nearest neighbor $\mu(i) \in S_i^{(1)} \cup \{i\}$ on G for which the increase in the scalar is the largest, i.e.:
$$\mu(i) = \arg\max_{j \in S_i^{(1)} \cup \{i\}} \{h_j\} \qquad (3)$$
The weight associated with edge $(i, \mu)$ is given by:
$$|\nabla h(i)| = h_\mu - h_i$$
If $\mu(i) = i$ then $\nabla h(i) = (i, i) \equiv 0(i)$. The self-loop $0(i)$ is a loop through i with zero weight.
Definition 2 The set F of directed gradient edges on G together with the vertex set V forms the gradient network:
$$\nabla G = \nabla G(V, F)$$
If (3) admits more than one solution, then the gradient in i is degenerate.
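Definitions 1 and 2 translate almost directly into code. The sketch below is a minimal illustration, assuming distinct scalar values; the function name, the name `mu` for the chosen neighbor, and the 5-node example graph are hypothetical.

```python
def gradient_network(adjacency, h):
    """Return mu, where mu[i] is the node with the largest scalar among i and
    its nearest neighbors. The directed gradient edge at i is (i, mu[i]);
    mu[i] == i is the zero-weight self-loop. Assumes all scalars are distinct."""
    mu = {}
    for i, neighbors in adjacency.items():
        mu[i] = max([i, *neighbors], key=lambda j: h[j])
    return mu

# Hypothetical 5-node substrate graph: the path 0-1-2-3-4 plus the edge 1-3.
adjacency = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
h = {0: 0.43, 1: 0.10, 2: 0.87, 3: 0.50, 4: 0.65}

mu = gradient_network(adjacency, h)
for i in sorted(mu):
    print(f"node {i}: gradient edge ({i}, {mu[i]}), weight {h[mu[i]] - h[i]:.2f}")
```

In this example nodes 0, 2 and 4 carry self-loops, while nodes 1 and 3 both point to node 2.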
![Page 24: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/24.jpg)
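In the following we will only consider scalar fields with non-degenerate gradients. This means:
$$\mathrm{Prob}\{h_i = h_j \text{ if } (i,j) \in E\} = 0$$
Theorem 1 Non-degenerate gradient networks form forests.
Proof: every gradient edge that is not a self-loop points to a node with a strictly larger scalar, and every node has exactly one outgoing gradient edge, so a cycle would require the scalar to increase all the way around it, which is impossible.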
![Page 25: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/25.jpg)
Theorem 2 The number of trees in this forest = number of local maxima of {h} on G.
[Figure: example graph with scalar values on the nodes]
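Theorems 1 and 2 can be checked numerically on a random example. The sketch below assumes a small random substrate graph with i.i.d. uniform scalars; the function name and the sizes are hypothetical. It verifies that following gradient edges never cycles and counts trees (self-loops) against local maxima.

```python
import random

def count_trees_and_maxima(adjacency, h):
    # Gradient target of each node: the largest-h node among itself and its neighbors.
    mu = {i: max([i, *nbrs], key=lambda j: h[j]) for i, nbrs in adjacency.items()}
    # Each tree has exactly one root, the node carrying the self-loop.
    roots = {i for i in mu if mu[i] == i}
    maxima = {i for i, nbrs in adjacency.items() if all(h[i] > h[j] for j in nbrs)}
    # Following gradient edges from any node must end at a root (no cycles).
    for i in mu:
        steps, node = 0, i
        while mu[node] != node:
            node, steps = mu[node], steps + 1
            assert steps <= len(mu), "cycle found: not a forest"
    return len(roots), len(maxima)

rng = random.Random(0)
n, p = 200, 0.03                          # hypothetical random substrate graph
adjacency = {i: [] for i in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        if rng.random() < p:
            adjacency[i].append(j)
            adjacency[j].append(i)
h = {i: rng.random() for i in range(n)}   # i.i.d. scalars, distinct with probability 1

n_trees, n_maxima = count_trees_and_maxima(adjacency, h)
print(n_trees, n_maxima)
```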
![Page 26: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/26.jpg)
For Erdős-Rényi random graph substrates with i.i.d. random numbers as scalars, the in-degree distribution is:
In the limit $N \to \infty$, $p \to 0$, $Np = z = \text{const.}$, $z \gg 1$:
$$R_N(l) \approx \frac{1}{z\,l}, \qquad 1 \le l < z = Np$$
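The in-degree distribution on an Erdős-Rényi substrate can be measured directly. The sketch below is a small simulation under stated assumptions: the sizes are illustrative and far from the N → ∞ limit, the function name is hypothetical, and the 1/(z·l) column is only a reference curve for the large-z form of R(l), so rough agreement is the most one should expect.

```python
import random
from collections import Counter

def er_gradient_in_degrees(n, z, seed=0):
    """In-degree of every node in the gradient network of a G(n, p) graph with
    p = z / n and i.i.d. uniform scalars (self-loops are not counted)."""
    rng = random.Random(seed)
    p = z / n
    adjacency = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adjacency[i].append(j)
                adjacency[j].append(i)
    h = [rng.random() for _ in range(n)]
    in_degree = [0] * n
    for i in range(n):
        target = max([i, *adjacency[i]], key=lambda j: h[j])
        if target != i:
            in_degree[target] += 1
    return in_degree

n, z = 1000, 20                       # illustrative sizes (assumption)
counts = Counter(er_gradient_in_degrees(n, z))
for l in (1, 2, 4, 8, 16):
    print(f"l={l:2d}  measured R(l)={counts[l] / n:.4f}   1/(z*l)={1 / (z * l):.4f}")
```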
![Page 27: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/27.jpg)
![Page 28: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/28.jpg)
The Configuration model
A. Clauset, C. Moore, Z.T., E. Lopez, to be published.
![Page 29: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/29.jpg)
K-th Power of a Ring
Generating functions:
$$g(z) = \sum_k p_k z^k$$
$$R(z) = \int_0^1 dx \; g\!\left(1 - (1 - z)\,\frac{x\, g'(x)}{g'(1)}\right)$$
![Page 30: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/30.jpg)
$R_K(l)$ is given piecewise on the ranges $1 \le l \le K-1$, $l = K$, $K+1 \le l \le 2K-1$ and $l = 2K$, with $R_K(2K) = \frac{1}{4K+1}$. [The expressions for the other three ranges are not recoverable from the extracted text.]
![Page 31: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/31.jpg)
[Plot: in-degree distribution versus 2K+l]
Power law with exponent γ = −3
![Page 32: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/32.jpg)
![Page 33: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/33.jpg)
The energy landscape
What generates γ = −2?
• Energy associated with each node (configuration)
the gradient network
most favorable transitions
T=0 backbone of the flow
MD simulation
tracks the flow network
biased walk close to the gradient network
trees
basins of local minima
The REM generates an exponent of −1.
![Page 34: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/34.jpg)
Model ingredients
• A network model of configuration spaces
network topology
homogeneous
degree correlations
constrained (folded): small k_conf, lower energy
loose (random coil): large k_conf, higher energy
k, E increases
how to associate energies
![Page 35: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/35.jpg)
Random geometric graph
• random geometric graph
in higher D: similar to hypercube with holes
degree correlations
[Plot: E versus k]
• Energy proportional to connectivity
R=0.113, <k>=20
Dall & Christensen, Phys.Rev.E 66, 026121 (2002)
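A random geometric graph with energy proportional to connectivity can be sketched as follows. Several choices here are assumptions: points are placed in a 2D unit square without periodic boundaries, N = 500 is picked so that R = 0.113 gives a mean degree near 20, a tiny random term breaks energy ties, and each node's gradient edge points to the lowest-energy node among itself and its neighbors.

```python
import random
from collections import Counter

def rgg_energy_gradient(n=500, radius=0.113, seed=1):
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = pts[i][0] - pts[j][0], pts[i][1] - pts[j][1]
            if dx * dx + dy * dy < radius * radius:
                nbrs[i].append(j)
                nbrs[j].append(i)
    # Energy proportional to connectivity; the small random term only breaks
    # ties (an assumption of this sketch, to keep the gradient non-degenerate).
    energy = [len(nbrs[i]) + 1e-3 * rng.random() for i in range(n)]
    in_degree = [0] * n
    for i in range(n):
        # Gradient edge toward the lowest-energy node in the closed neighborhood.
        target = min([i, *nbrs[i]], key=lambda j: energy[j])
        if target != i:
            in_degree[target] += 1
    mean_k = sum(len(x) for x in nbrs) / n
    return mean_k, in_degree

mean_k, in_degree = rgg_energy_gradient()
print(f"<k> = {mean_k:.1f}")
print(sorted(Counter(in_degree).items()))
```

Because of boundary effects the measured mean degree comes out somewhat below 20; a periodic box would remove that bias.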
![Page 36: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/36.jpg)
N=30000, <k> = 1000, d=2.
![Page 37: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/37.jpg)
Exponent is −2
2 essential ingredients:
1) k1-k2 correlations
2) <E> monotonic with k
![Page 38: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/38.jpg)
Bead-chain model
• more realistic model: bead-chain
configuration network: excluded volume
energy: Lennard-Jones
Lennard-Jones potential
[Figure: Lennard-Jones potential with repulsive and attractive regions]
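As an illustration of a Lennard-Jones energy for a bead chain, the sketch below uses the standard 12-6 form. The parameter values, the exclusion of bonded neighbors from the sum, and the two 4-bead configurations are assumptions of this example, not details taken from the slides.

```python
import math

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy: repulsive at short range, attractive
    beyond the minimum at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def chain_energy(beads, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over non-bonded bead pairs (|i - j| >= 2, assumed)."""
    total = 0.0
    for i in range(len(beads)):
        for j in range(i + 2, len(beads)):
            total += lennard_jones(math.dist(beads[i], beads[j]), epsilon, sigma)
    return total

r_min = 2 ** (1 / 6)                       # location of the minimum for sigma = 1
print(lennard_jones(r_min))                # well depth, approximately -epsilon

# Hypothetical 4-bead configurations with unit bond length.
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(chain_energy(straight), chain_energy(bent))
```

The bent configuration brings non-bonded beads closer to the attractive well and therefore has the lower energy of the two.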
![Page 39: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/39.jpg)
L = 30, rod-rod angle = 75°
![Page 40: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/40.jpg)
![Page 41: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/41.jpg)
The case of the α-helix
AKA peptide
• ALA: orange• LYS: blue• TYR: green
MD simulations, no water.
![Page 42: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/42.jpg)
T = 400
More than one simulation
3 different runs: yellow, red and green
The MD traced network
The role of temperature
![Page 43: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/43.jpg)
![Page 44: Conformation Networks: an Application to Protein Folding](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d4c550346895dba7a7d/html5/thumbnails/44.jpg)
Conclusions
• A network approach was introduced to study sterically constrained conformations of ball-chain-like objects.
• This network approach is based on the “statistical dogma” stating that generic features must be the result of statistical properties of the networks and should not depend on details.
• Protein conformation dynamics happens in high-dimensional spaces that are not adequately described by simplistic reaction coordinates.
• The dynamics performs a locally biased sampling of the full conformational network. For low enough temperatures the sampled network is a gradient graph, which is typically a scale-free structure.
• The −2 degree exponent appears at and below the temperature where the basins of the local energy minima become kinetically disconnected.
• Understanding the protein folding network has the potential of leading to faster simulation algorithms, towards closing the gap between nature’s speed and ours.
Coming up: conditions on side chain distributions for the existence of funneled energy landscapes.