evangelos coutsias dept of mathematics and statistics ......evangelos coutsias dept of mathematics...
TRANSCRIPT
Loop closure and molecular shapespaces
Evangelos CoutsiasDept of Mathematics and Statistics,
University of New Mexico
IMA-1/15/08
HybridLoop Closure
sampling protein loops with Sobol’squasirandom algorithm
using a combination of inverse kinematic and constraint methods
Applications to loop modeling, exhaustive conformational covering of ring moleculres, the
assembly of helical proteins, Concerted Move MC,...2
Many versions of analytic loop closure
• Go & Scheraga (1970) (transcendental)• Lee & Liang (Mec. Mach. Theory, 1988) (polynomial,
efficient 16x16 resultant, arbitrary chain)• Wedemayer & Scheraga (1999) (polynomial, planar
tripeptide, Sylvester (24x24) resultant etc.)• Coutsias, Seok, Jacobson, Dill (JCC, 2004) (polynomial,
general tripeptide/arbitrary loop, Sylvester resultant)• Coutsias, Seok, Wester, Dill (IJQC, 2005) (polynomial,
arbitrary structure, Dixon (16x16) resultant)• many others.... 3
4
baseend effector: position and orientation are prescribed
arbitrary chain
rotatable arm (R-joint)
5
( ) iijij dtRRdR −×Γ=Effect of changing one torsion grows with distance from torsional axisiΓ
ijRiR
jR
jdRidt
6
1Γ
j
04
1=×Γ= ∑
=iij
iij dtRdR
4Γ
2Γ
3Γ
4 torsions can be changed together so that their combined effect cancels at a given point
fixed end
7a
04
1=×Γ= ∑
=iij
iij dtRdR
0
:
44332211 =+++
×Γ
×Γ=
jjjj
iji
ijiij
uauauaua
RR
u
Given 4 vectors in 3-space, there is always a way to combine them so that they add up to zero, while the coefficients do not all vanish:
we can always combine 4 rotations so that they leave a given point invariant in 3-space 7b
[ ] 0
4
3
2
1
4321 =
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
aaaa
uuuu jjjj
System of 3 equations (the u’s are 3-vectors) in 4unknowns (the a’s): a nontrivial solution (null vector)is guaranteed
7c
04
1=×Γ= ∑
=iij
iij dtRdR
1Γ
j
4Γ
2Γ
3Γ 1+j2+j
Although atom j is now fixed, the chain past j rotates about an effective axis through j
8
baseend effector
The end triad is moved due to change in the j-th torsion
jR
jt
2−NR
1−NRNR
9a
base Using six “adjustor” torsions cancels effect of changing j-th torsion, keeping end triad (and all subsequent atoms) fixed in space
jR
jt
2−NR
1−NRNR
9b
02,1
2 =×Γ= −=
− ∑ iNi
N
iiN dtRdR
01,1
1 =×Γ= −=
− ∑ iNi
N
iiN dtRdR
0,1
=×Γ= ∑=
iNi
N
iiN dtRdR
( ) 6,,1;,,7 …… == ittBt Nii
Loop closure conditions (arbitrary indexing)
base Could determine the values of the adjustor torsions that cancel the change in j-th torsion by integrating a system of differential equations
jR
jt
2−NR
1−NRNR
6,...,1
:,
=
∂∂
==∂∂
i
tBF
tt
j
iji
j
i
ijiijk
kji
kij
N
iij Rvv
tBvtR ×Γ=
⎭⎬⎫
⎩⎨⎧
∂∂
+Δ=Δ ∑∑==
,6
17
This applies to all atoms, including the end triad where displacement must vanish
NNNjRj ,1,2;0 −−==Δ
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
×Γ−×Γ−×Γ−
=
⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢
⎣
⎡
∂∂
∂∂
⎟⎟⎟
⎠
⎞
⎜⎜⎜
⎝
⎛
×Γ×Γ×Γ×Γ×Γ×Γ
−
−
−−
−−
Nii
Nii
Nii
i
i
NN
NN
NN
RRR
tB
tB
RRRRRR
,
1,
2,
6
1
,66,11
1,661,11
2,662,11
Kinematic Jacobian
The rates of change of the “adjustor” torsions with respect to an arbitrary (sampled) torsion can be found by solving such systems. Here, except for degenerate arrangements, the six columns are independent. But there are 9 rows. The system is solvable, because there are 3 relations among those (“left null vectors”), due to the invariance of the distances among the end triad atoms.
( ) ( ) 0
,2
2
=−⋅−=−
≠=−
jijiji
ji
RRdRdRRRd
jiconstRR
Solvability conditions for rate-of-change system: 3 relations for i,j=N-2,N-1,N
locked hinge
free hinge
pivot
bond
free torsion
pivot torsion
virtual torsion9c
Our New View of Loop Closure
Old View6 rotations/6 constraints
New View3 rotations/3 constraints
Transcendental equation Polynomial equation for a more general case
10
In the new frameIn the original frame
Representation of Loop Structures
Coutsias, Seok, Jacobson, Dill, JCC 200411
α1−nC
αnC
α1+nC
In the body frame of thethree carbons, the anchor bonds lie in cones about the fixed base.
αC
Given the distance and angle
constraints, three types of virtual
motions are encountered in the
body frame12
α1−nC α
1+nC
αnC
With the base andthe lengths of the two peptidevirtual bonds fixed, the vertex
is constrained to lie ona circle.
αα11 +− − nn CC
αnC
Tripeptide Loop ClosureN
αC'C
Bond vectorsfixed in space
Fixed distance
13
αnC
δ
τ
z
σ
α1+nC
Peptide axis rotation:With the two end carbons fixed in space, the peptide unit rotates about the virtual bond
δτσ +=
14
z
σ
τ
θ
α
η
ξ
x
y ),,,;( ηξθαστ ±= FTransfer Function for concerted rotations: rotations of two peptide units about respective virtual bonds couple to preserve bond angle at Ca atom
15
τ
c
O
σ
ξ
y
x
zThe 4-bar spherical linkage
θ
α
η
16
τσξ θ η
α
x
BN M
Az
y
STetrahedral Angle
17
22
2
22
2
12sin;
11cos
2tan
12sin;
11cos
2tan
uu
uuu
ww
www
+=
+−
=⇔⎟⎠⎞
⎜⎝⎛=
+=
+−
=⇔⎟⎠⎞
⎜⎝⎛=
τττ
σσσ
18
Equation of the Tetrahedral AngleRaoul Bricard, 1897
(study of flexible octahedra)
( )( )
( )( )ηξαθ
ηξαθηξ
ηξαθηξαθ
+++=+−+=
−=−++=−−+=
coscoscoscossinsin4
coscoscoscos
EDCBA
( ) ( ) 0222 =++++ EDuCuwwBAu
19
Bricard Flexible Octahedra
C
D
E
A
FB
A
C 'C
D
B 'B
E
A
B
C
D
20
EA
F
B
CD
1α
2α
3α
1τ
2τ
3τ
constraints: enforced as relations among the 3 torsions
21
( ) ( )
( ) ( )
( ) ( )
2/tan
0
0
0
0023201311
2102
2322
0022203211
2302
2222
0021202111
2202
2122
iiu
CuCuuCuCuC
BuBuuBuBuB
AuAuuAuAuA
τ=
=++++
=++++
=++++
General 6-member ring (Octahedron)
22
1C
2C
3C
4C
5C
6C 3τ1
2
4
slightly perturbed 6-member ring may assume 4 distinct conformations
Canonical cyclohexane
o
o
o
DCEEACCEAACE
FABABC
5.3260
115
===<=
==<=<=<
==<==<
δβ
α
γ
( ) ( )( ) ( )( )
( ) ( )γαγγ
αγγαγ
++=−−=
−==−+=
coscoscos1
coscoscoscos
i
i
ii
i
EC
DBA
( ) ( ) 02, 12
122
12
1 =++++= ++++ iiiiiiiiiiiii EuuCuuBuuAuuG
( ) ( ) ( ) 0,,, 133221 === uuGuuGuuG
( ) 0, =uuG( )
( )
Chair:
Twist Boat:
⎟⎟⎠
⎞⎜⎜⎝
⎛ +−±=
−====
−
+
−
βγτβαβαγ
τ
sinsincossinsincoscoscoscos
,,3,2,1,tan2
1
122
1
s
ststiu
ii
i
rigid, comes from double rootof tetrahedral equation
For flexible solutions to exist, the tetrahedral equation must be satisfied by arbitrary triplets! This is not an obvious fact.
( ) ( ) 02, 2222 =++++= ECuvvuBvAuvuG
Bricard Conjecture
For the system of biquadratics to have a continuum of solutions, it must be derivable via successive elimination, from the equations:
00
321321
211332
=+++=+++
uuPuNuMuLupunuumuulu
easily seen to be a sufficient condition, not obvious that it is necessary (and not proven)!
Sufficient conditions for flexibility
The system of biquadratics can be reduced toBricard’s sufficient condition if:
BCBAEDB
22 −=
=
It is easy to see that the cyclohexane coefficients satisfy these relations; therefore the three equations in three variables are equivalent to two equations in three variables!
uw vw
u
v
conformation circle for cyclohexanetwist-boat. As u spans the feasible range, so do v and w. Any triplet (u,v,w) corresponds to a molecular conformation
v
w
The conformation space of a flexible octahedron is composed of circles
A. Bushmelev & I. Sabitov
x
( )( )βα
βα−−+=+−+=
cos2cos2
min
max
abbaxabbax
3M1M
4M2M
αβb
a
3M1M
4M
2M
αβ
b
a
'4M
'2M
The extreme values of x are found when M4 lies on the plane of M1, m2, M3. If the range for the upper half overlap with that of the lower, there are solutions.
case 2: one circle
cases 3(resp. 4): 2 circles, fused at 1(resp. 2) points
case 3: 2 circles
EA
F
B
CD
1α
2α
3α
1τ2τ
3τ
23a
EA
F
B
CD
1α
2α
3α
1τ
2τ
3τ
1δ2δ
3δ
1θ
1η
1ξ'D
'F
'B
23b
( ) ( )
( ) ( )
( ) ( )
2/tan,1
0
0
0
0023201311
2102
2322
0022203211
2302
2222
0021202111
2202
2122
iiii
iii u
uv
CvCuvCuCvC
BvBuvBuBvB
AvAuvAuAvA
δ=ΔΔ−
Δ+=
=++++
=++++
=++++
24
( )( )( )
( ) ( ) ( )
( ) ( ) ( ) 0
0
0
301312
32
302312232
0001202
1102112212
2120221
2222
1=++
=++
=+++++
+++
uCuuCuuC
uBuuBuuB
AuAuAuAuAuA
uAuAuA
Three constraint equations:three generalized angles
25
Elimination in Polynomial systems
• Pairwise elimination: Sylvester resultant• Simultaneous elimination: Dixon resultant• Both methods lead to a single polynomial in
one of the variables: 16th degree• Differ in complexity and numerical
accuracy
26
( )( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )⎥
⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
1322211
1322211
1322211
2121
,,,
,,,vpvpvvpvpupuvpupupuup
vvuuC
( ) ( )( )22112121
det,,,vuvu
Cvvuud−−
=
Dixon resultant: details
Elimination matrix
Dixon polynomial
Vanishes at all common roots of original system.27
( )
[ ]
[ ]TuuuuuuuuuuU
vvvvvvvvvvV
VDUvvuud
2312
21212
31
211
321
32
221
222121
2121
1
1
,,,
=
=
=
Dixon matrix: coefficients of Dixon polynomial, arranged by monomials in x, y
If u1, u2 are roots of original system, U becomes a right null vector. Therefore the vanishing of det D is a necessary condition for the existence of a common root.
28
( )
⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢
⎣
⎡
==
000000000000000
00000
00
00
00
00
:det4
210
210
210
210
2221
21
1211
21
0201
21
2220
20
1210
20
0200
20
2221
21
1211
21
0201
21
2220
20
1210
20
0200
20
2220
20
1210
20
0200
20
2120
10
1110
10
0100
10
2220
20
1210
20
0200
20
2120
10
1110
10
0100
10
2222
CCCCCC
CCCCCC
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
AABB
SSCBAD
The Dixon resultant contains an extraneous factor (deg. 16)Coutsias, Seok, Wester, Dill, IJQC 2005
29
⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢
⎣
⎡
=
000000000000000
0000000
0000
00
210
210
210
210
210210
210210
210210
210210
iii
iii
iii
iii
iiiiii
iiiiii
iiiiii
iiiiii
i
CCCCCC
CCCCCC
NNNMMMNNNMMM
MMMLLLMMMLLL
S
031232
031232
SuSuSS
LuLuLL kkkk
++=
++=
30
00
00
det10
33
=⎟⎟⎠
⎞⎜⎜⎝
⎛⎥⎦
⎤⎢⎣
⎡−−
−⎥⎦
⎤⎢⎣
⎡SSI
uS
I
Eliminating the extraneous factor there results a matrix polynomial of degree 2; its determinant can be computed with a companion matrix of size 16X16. Then the real eigenvalues give u3 for all possible conformations. Values for the other two variables are found from the appropriate eigenvector components
Successive elimination (Sylvester resultant method) results in a matrix of size 6X6 but with quartic coefficients, leading to a companion matrix of size 24X24 (but still a 16th degree polynomial)
31
QZ or Characteristic PolynomialEither:
• (1) Solve generalized eigenproblem using QZ algorithm (cubic order in matrix size, here approx. 16^3 ~4kflops)Or:
• (2a) Use Lagrange’s expansion in complementary minors to efficiently compute the characteristic polynomial of the Dixon resultant (~2.2kflops).
• (2b) Use Sturm sequences (count number of real zeros between two values) for an efficient computation of real zeros from characteristic polynomial (bisection/Newton: variable, but low, cost)
Rotate segments 1,2 about resp. virtual axes by angles τ1, τ2
Rotate entire loop about virtual axis 3 by angle τ3
Choose values of the τ1,τ2,τ3 to fix bond angles θ at 3 junctions
1τ2τ
3τ1θ
2θ
3θ
Method calculates all (0-16)possible loop configurationsthat bridge given gap.
are anchorbonds defining the two endsof the loop.
VC
TriaxialLoop Closure
1 2
3
32
10 solutions from RNA modeling
33
Extensions to More General Cases
Longer Loop Closure
General Loop Sampling
previous
new
34
α3C
α2C
α1C
3x
2x
1x
3z
1z2z
y
yy 1τ
2τ
3τ33 δτ +
22 δτ +
+
Reconstruction Procedure:
1. Rotate about axis 3 by angle to place triangle into space. Compute local space axes.
2. Bring objects 1,2 into their own body frame, so that the corresponding C atoms are located in the positive-X direction (while the Z-axis is the body axis).
3. Rotate each of 1, 2 by ,
4. Place 1, 2 in space, bringing each set of body axes into coincidence with respective space axes.
α2C 3τ−
2τ1τ
1σ
1τ
1α
1a2a
3a4a
3x
2x
1x1R
4R
1z
4z1s
4s
3G
2GCrank
Follower
Two-revolute, two-spherical-pair mechanism 35
Lee & Liang, Mech. Mach. Theory, 1988
36a
Lee-Liang
Loop
Closure
S. Pollock, Undergrad. thesis, UNM ‘07
36bS. Pollock, Undergrad. thesis, UNM ‘07
36cS. Pollock, Undergrad. thesis, UNM ‘07
36dS. Pollock, Undergrad. thesis, UNM ‘07
36eS. Pollock, Undergrad. thesis, UNM ‘07
“The Mt. Everest of kinematics”
A. Freudenstein
36f
The formulation is quite involved; however it eventually leads to a generalized eigenproblem of identical form as the one found in the Triaxial case with the Dixon resultant, so the solution from that point on is identical for the two methods. Key difference is the considerable additional overhead for carrying out the Hartenberg-Denavit transform. The numerical computation in this form is approximately 3 times slower than for the Triaxial-Dixon algorithm (comparison in progress).
1θ 8θ
7θ
6θ
5θ4θ
3θ2θ
o115
Comparison of algorithms: exhaustive covering of the conformational space of cyclooctane. Two torsions set to arbitrary values, other 6 determined to satisfy closure 37
Pollock & Coutsias, (preprint, 2007)38a
Low res. plots of solution no. and energy (only lowest energy state shown, no solvation)
energyno. of solutions
6
6
6
4
4
2
2
8θ
6θ
AutoEncoder
IsoMap
Cyclooctane as Benchmark for Automated Dimensionality Reduction
• Automated dimensionality reduction algorithms offer the potential for visualization of high dimensional energy landscapes and more efficient sampling in surrogate spaces for general problems in protein folding, molecular recognition, self assembly, etc.
• Examples:– Principal Component Analysis (PCA)– Locally Linear Embedding (LLE)– IsoMap– Autoencoder Neural Network (AE)
• How well do they perform on cyclooctane?with M. Brown, J.-P. Watson, Sandia Nat. Labs
Automated Dimensionality Reduction Must Be Able To…
1. Identify the dimensionality of a manifold2. Provide a forward map into low
dimensional space (Visualization)3. Provide a reverse map into high
dimensional space (Recover meaningful structure from low-dimensional samples)
4. Reconstruction error – Error between original structure and structure obtained following forward and reverse mapping.
with M. Brown, J.-P. Watson, Sandia Nat. Labs
Results
• Can obtain successful reduction from D=72 (with hydrogens), 24, or 8 (dihedral angles) to ~3-5, but not 2.
with M. Brown, J.-P. Watson, Sandia Nat. Labs
3-D Embedding from Isomapwith M. Brown, J.-P. Watson, Sandia Nat. Labs
Manifold Tearing?• Cannot guarantee a smooth embedding of an
n-manifold in n dimensions• For the problems of interest, it may not be
necessary to ensure a strict preservation of topology as long as the reconstruction error is minimized.
• Tearing the manifold or considering only a subset of charts at a time may allow for a more efficient encoding– Self organizing maps or methods for automating
the tearing might be beneficialwith M. Brown, J.-P. Watson, Sandia Nat. Labs
Examples• By restricting 1 of the independent dihedral
angles to (0,pi/2) domain or by using a single molecular dynamics trajectory a successful reduction to 2 dimensions is obtained:
with M. Brown, J.-P. Watson, Sandia Nat. Labs
Completeness/efficiency issues• Complete sampling: use perturbation theory of eigenvalues
to ensure sampling at sufficient resolution to detect all possible bifurcations and crossings
• Efficiency: .5 degree grid sampling requires ~1hr CPU time on a 2.8GHz Xeon processor (compared to 4 months for
calculation by Ros et al JCC 2007, using distance geometry)
BxAxBxAx βαλ =⇒=
( )ε
βαλ
OFE
yByAyByAFBBEAA
=
=⇒=⇒⎭⎬⎫
+=+=
22,
~~~~~~~~~
( )2~,~~,~ εβα OxByxAy HH +=
Stewart & Sun, Matrix Perturbation Theory, Academic Press, 1990
40
Numerical Sensitivity Issues• Numerical errors accumulate roughly like the square root of
the number of algebraic operations• Setting up the resultant for the LL algorithm involves approx
4x the amount of computation as for the TLC algorithm• The two resultants have similar forms, with nonzero entries in
the same locations, so that the generalized eigenproblemsinvolve identical computations
• The sensitivity of the eigenproblem depends on the separation of eigenvalues/eigenvectors
• Near degeneracies, the sensitivity in the eigenproblem serves as an amplifier of numerical errors in the matrix elements
Dangerous territory: in a broad sampling need to tread carefully for trouble may be lurking!
Bricard curve
1st chaotic ridge, in 6-8 projection
4
6
38b
8θ
6θ
39
1st chaotic ridge, shown in t5-t8 projection, is pathological for both TLC and LL algorithms
8θ
5θ
10^-5 stepsize
Cyclooctane conformation analysis(under the hood!)
• The TLC calculations employ 3 equations and use t5, t8 as control parameters
• The LL computations employ additionally descriptions in terms of (t6,t8) and (t4,t8), not possible for TLC
• The various descriptions are equivalent, as alternative 2d projections of a 2-parameter surface in 8 dimensional space.
• The phenomena seen can be more easily explained by employing a 4-equation generalization of TLC, using as control parameters the angles (a1, a2) which serve to completely characterize the quadrilateral formed by atoms (1-3-5-7)
1α
2α
3(5)
β
γ1z
2z
3z4z
1(1)
2(3)
4(7)
( )2/tan ii z=τ
(6)
(8)
(2)
(4)
34z
1
2
4
δ
2/tan,1 ii
ii
iii δ
ττσ =Δ
Δ+Δ−
=
( )2/tan ii s=σ
δ−= 44 zs
3
1
2
4
δ
4s4z
( )
( ) ( ) 12/tan2/tan2cos0
2/tan,1
02
02
02
02
21
2212142
242
21
242
1241431
231
24
231
2232322
222
23
222
1221211
211
22
211
≤=≤
=ΔΔ+
Δ−=
=++++
=++++
=++++
=++++
ααδ
δτ
τσ
σσσσσσ
ττττττ
σσσσσσ
ττττττ
i
ii
EDCBA
EDCBA
EDCBA
EDCBA
( )
( )
πααα
γπβ
βαγβ
αγβαγ
≤+≤
−=
−−=−=
=−=+−=
21
2
,02
2coscossin2
coscos2coscos
i
ii
i
iii
ii
EC
DBA
The coefficients for cyclooctane; the canonical values are γ=115, β=32.5 deg.
Determining backbone torsions:spherical trigonometry
1
34
2
5
1z 2z
2t 3t
2αβ
γ
3122
2222
cossinsincoscoscossinsincoscoscossinsincoscoscossinsincoscos
tt
βγβγτβαβαβγβγτβαβα
−=−−=−
Canonical cyclooctanequasi-planar conformation
o
o
o
5.32817
90571135357713115812123
21
===<=
==<=<==<=<
==<==<
δβ
αα
γ
( ) ( ) ( ) ( )( ) ( ) ( )( )
( ) ( )γγγ
γπγγγγπγ
sincoscos1
cos2/coscossincos2/coscos
−=−−=
=−==+=−+=
i
i
ii
i
EC
DBA
Square scaffold conformation
( ) ( )
43214
1421
2414
21
24
21
24
:021sin12cos
τττττ
ττττγττττττγ
→→→→
=−−++−++cyclic
1τ
4τ
3τ
2τ
o115=α
o5.32== δβo90=γ
1
23
4
2
34
5
6
7
8
1t
7t
6t
5t
4t 3t
2t
8t
Flexible conformation
{ } [ ]( ) 02
,,,,,,,2222
81
=++++
−−−−==
ECuvvuBvAu
vvuuvvuut ii
When the 1-3-5-7 frame forms a square, t6=-u and t8=v lie on a Bricard curve and atoms 2,4,6,8 may rotate freely about the square frame. Two eigenvalues coincide, leading to the sensitive behavior. This coincidence is accidental, not associated with merger and disappearance of solutions, i.e. not associated with a boundary where solution number changes
v
u
Bricard curve
( )( )2/tan
2/tan
8
6
tvtu
=−=
The 1st chaotic ridge is a Bricard curve, shown here in terms of the half-tangents of the control parameters
2nd chaotic ridge, (6-8). Inaccurate solutions found.
4
6
38c
another accidental degeneracy, with no simple geometrical interpretation
6θ
8θ
610−
710−
Hybrid Algorithm
• Triaxial Loop Closure (paired axes) preferrablewhen applicable – faster, more robust
• Lee-Liang (independent axes) loop closure: more general, employed when necessary
• Similar resultant: equivalent solution as eigenproblem or polynomial (new formulation)
• Automatic switching between two methods• Additional algebraic safeguards against accidental
degeneracies (optimal choice of solution variable,nongeneric flexibility detection)
41
locked hinge
free hinge
bond
free torsion
42
locked hinge
free hinge
bond
virtual torsions
constraints
virtual axes
5 coupled tetrahedrals=> 2*2^5 = 64 possible solutions
example: additional backbone distance constraints
43
locked hinge
free hinge
bond
first loop
Example: Sampling a contact: sample triad positions and double loop closure
second loop
44
locked hinge
free hinge
bond
first loop
Example: Sampling a contact: sample triad positions and double loop closure
second loop
45
Sobol sampling
A quasirandom sampling scheme that retains nearly uniform density in an n-
dimensional box for arbitrary resolution
46
47
seed = 1 to 100 +
48a
seed = 1 to 100 + , seed = 101 to 200 +
48-b
seed = 1 to 100 + , seed = 101 to 200 o
48c
seed = 1 to 1000 + , seed = 9001 to 10000 +
48d
seed = 1 to 10000 +
48e
General
49
Proline
Glycine
Ramachandran regions
Ramachandran region sampling
• Sobol samples hypercubes• dedicating two dimensions per residue, would
“waste” ¾ of sample points at the most unlikely areas (probability<5%)
• Simple: 95% probability region for each residue is pixelated at 5deg squares, and laid out in a linear array. Each residue is assigned 1 Sobol dimension
• Advanced: each pixel in 95% region stretched according to probability measure
50
Applications
• modeling of ring molecules • Loop refinement• Complementing Rosetta Relax algorithm
for sampling loop and helix variability• A simple algorithm for Helical Protein
Assembly using maximal hydrophobic packing
51
refinement:CASP7 target T308
best model
native best 5 among 100 TLC samples
Loop
preliminary result
Coutsias, Jacobson, Dill (2007)4
Sampling loops and repositioning helices in Rosetta
Mandell, Coutsias, Kortemme (2007, work in progress)
Can apply to longer loops by sampling dihedrals and applying closure
Can also perturb fragments like helices and close surrounding loops
50
Current Backbone Flexibility in Rosetta: Relax
Ensemble generated using Rosetta Relax from a single PDZ domain
Natural structural variation of a family of PDZ domains
Single PDZ domainwith ligand
Fragment insertion +local backbone moves
51
Loop closure complements fragment insertion
• Combining loop closure with fragment insertion might yield even more realistic ensembles
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
0
0.5
1
1.5
2
2.5
RM
SD
fro
m 1
BE9
C-a
lph
a
Residue C-alpha
Rosetta Loop closure Natural
Rosetta Relax Natural
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
0
0.5
1
1.5
2
2.5
RM
SD
fro
m 1
BE9
C-a
lph
a
Residue C-alpha
Rosetta Loop closure Natural
Loop closure
sheets helix
52
53
Triaxial Loop Closure - An Efficient Method for Sampling
BackbonesModeling the antigen-binding antibody H3 loopApo conformation
starting from Holo
using Rosetta Relax
triaxial loop closure(D. Mandell, Kortemme lab,UCSF)
54
Comparison: Rosetta Relax vs. TLC
• the Rosetta Relax structures are generated by fragment insertion and small torsion moves over the whole structure,
• the TLC loop is generated by choosing 3 fixed pivots, and sobol-sampling the remaining torsions (the rest of the protein is fixed). Best candidate selected using Rosetta’s energy function.
• The Roseta Relax protocol thus took much longer (~1000 times, since it is operating on the entire structure).
55
Modeling PDZ domain variability
• PDZ domains are topologies frequently found in scaffold proteins that organize complexes, particularly those involved in cell trafficking.
• They frequently bind the C-terminus of a protein ligand, as well as peptides internal to the ligand. PDZ domains have a well-defined core typically consisting of 6 strands and 2 helices arranged in a compact, globular structure.
• For example, substrate binding consistently occurs at the interface formed by two strands and a helix.
1be9 helix ensemble, generated by perturbation of helix torsions, corrected by loop closure. The helix forms an interface together with the adjacent beta strand also shown as a backbone trace. The ligand is shown in green with side chains.
To compare the ensemble with naturally occurring variation, the wild type of this PDZ domain is shown in cyan superimposed with another PDZ domain in blue. The helix from the first PDZ domain is in purple backbone trace and the second is in orange backbone trace. The ligandsare also shown in dark green and light green (side-chain trace only for the second).
much of the variationbetween the naturally occurring helices is captured in the ensemble
TLC as a refinement strategyin new Rosetta loop modelingprotocol (1thg 12 residue loop)
native new
old
with D. Mandell, T. Kortemme, UCSF
Minimization of Angle Deviationin loop modeling by fragment assembly
Joulyan Lee, Soongsil U., Chaok Seok, Seoul Natl. U., (S. Korea), and E.Coutsias
• Fragment assembly methods often applied to protein structure problems.• When structures are generated for a segment such as a protein loop,
assembled fragments are not fit into the frame of the protein of interest exactly.
• Deviation of the dihedral angles of the loop from the fragment angles are minimized here to maintain the features obtained from the structure database as well as possible,
• Two methods have been tried: 1) Monte Carlo simulation and 2) A local minimization in the space of loop conformations satisfying the loop closure constraint. In both method, root-mean-square deviation in dihedral angles is used as the objective function.
Application of Our Loop Closure Method to Protein Loop Modeling: Move Sets in
Monte Carlo MinimizationMonte Carlo Minimization: Sampling of Local Minima
LTD Move Sets Baysal and Meirovitch
J. Phys. Chem. A 101, 2185 (1997).Loop Closure Move Sets
movie
move
minimize
8 residue loop of turkey egg whitelysozyme
Finding Native structure
Loop Modeling Using Monte Carlo Minimization: Example
movie
Loop Modeling: Increased Efficiency and Accuracy
LTD
CSJD
Native
Coutsias et al, J. Comput. Chem. 25, 510 (2004)Baysal, C. and Meirovitch, H., J. Phys. Chem. A, 101, 2185, (1997 )
Monte Carlo simulation: 1 driver angle is perturbed randomly within 10 deg, 6 torsion angles are used to close the loop. kT=0.5 deg and 2000 MC steps. 20 independent simulations starting from different initial loop closure were performed for each starting conformation generated from fragment assembly.Minimization using the kinematic Jacobian: 100 steps of steepest descent minimization, and subsequent LBFGS-b minimization (termination criterion: function decrease: 10^7*machine precision, gradient: 10^(-3)). 20 independent minimization starting from different initial loop closure.Results: two loops, 8-residue loop of 135l (residues 84-91) and 12-residue loop of (35-46), were tested. R_ave is RMSD averaged over the different conformations generated from fragment assembly. R_min is the min RMSD. Dphi_init is the initial deviation in angles, and dphi_ave is the final deviation averaged over the conformations. The overallperformance of the two methods is similar, but the computation time is much faster with Jac method.
with C. Seok , J. Lee
Protein135l 1eco
length8 12
# of conf from fragment assembly 87 127
dphi_init (deg)34.3 28.0
R_ave (A) MC 2.5 3.9
Jac 2.7 5.2
R_min (A) MC 1.0 1.3
Jac 1.1 1.3
dphi_ave (deg) MC 18.1 15.8
Jac 17.0 15.4
Time (sec) MC 38.9 91.7
Jac 5.4 5.1
with C. Seok , J. Lee
135l
0
5
10
15
20
25
30
35
0 0.5 1 1.5 2 2.5 3 3.5 4
RMSD (A)
Devi
ation (
deg)
MC
Jac
with C. Seok , J. Lee
1eco
0
5
10
15
20
25
30
0 2 4 6 8 10
RMSD (A)
Devi
ation (
deg)
MC
Jac
with C. Seok , J. Lee
Assembly of helical proteins
A simple heuristic based on fast loop closure and maximal hydrophobic packing—as measured by radius
of gyration of the Ca atoms in hydrophobic residuesMotivated from need to improve assembly
performance of Dill group’s Zipping & Assembly strategy for tertiary structure prediction
G.A. Wu, E.A. Coutsias, K.A. Dill, Iterative Assembly of Helical Proteins by Optimal Hydrophobic Packing, (2007, in press)
56
Related works• Huang, Samudrala, Ponder (JMB 1999): Rg packing• Yue & Dill (Prot Sci 2000): special angles• Fain & Levitt (JMB 2001): database scoring; graph
theoretic/combinatorial assembly• Nanias et al. (PNAS 2003): simplified energy (Ca rep.)• Narang et al. (Phys Chem Chem Phys 2005): torsion sampling,
knowledge constraints• McAllister et al. (Proteins, 2006), contact prediction, distance
restraints• Zhang, Hu, Kim (PNAS 2002): torsion angle dynamics• …• Issues: Assembly, (Loop closure), Scoring
57
Paths not followed 1: Object assembly en masse: identical equations to loop closure. A possible approach to avoid searching. Assemble all elements at once into geometrically feasible configurations.
58
1, 2 rings
3 rings
4 rings
all possible graph topologies with up to 4 rings, with only 3-node contacts
http://topology.health.unm.edu
Graph Theory: assemble using topological graphs and combinatorics
59
Wester, Pollock, Coutsias, Allu, Muresan & Oprea, Topological Analysis of Molecular Scaffolds II: Analysis of Chemical Databases (submitted) (2007)
The assembly algorithm
• Begin with a protein whose secondary structure is known to contain helices (as determined, e.g. by DSSP, Kabsch,Sander, Biopolymers 22, 2577-2637 (1983) ) remove loops and consider the problem of placing the helices relative to each other
• Align two helices; score each alignment; select best subset, close loop(s).
• Align next helix with assemblage of first two, close loop; iterate
• Cluster/rank by RgH; select best candidates based on a hydrophobic packing criterion.
60
Initial arrangement: juxtaposition of Ca atoms for two hydrophobic residues at 10 A
61a
Initial arrangement: juxtaposition of Ca atoms for two hydrophobic residues at 10 A
• 1) do shortest loop 1st
61b
Initial arrangement: juxtaposition of Ca atoms for two hydrophobic residues at 10 A
• 1) do shortest loop 1st• 2) always add one helix at a time (i.e. the 2nd
shortest loop may not be second, unless it is next to the shortest loop) 61c
• starting with long loops, we would need to keep less compact conformations (in addition to compact ones) in order to ensure the native conformation is covered.
62a
• starting with long loops, we would need to keep less compact conformations (in addition to compact ones) in order to ensure the native conformation is covered.
• by adding long loop closure at the later steps, we have a more limited conformation space to explore due of excluded volume effects (i.e. steric constraints with the pre-assembled parts).
62b
• starting with long loops, we would need to keep less compact conformations (in addition to compact ones) in order to ensure the native conformation is covered.
• by adding long loop closure at the later steps, we have a more limited conformation space to explore due of excluded volume effects (i.e. steric constraints with the pre-assembled parts).
• Amber force field energy minimization is done after loop closure for better sterics (30-60 sd/cg steps)
62c
Sample all possible hydrophobic pairings between two helices
2 DoF sampled: translation and rotation about alignment axis
Limited by loop closability
Ensemble of structures generated, ranked by RgH, clustered by mutual RMSD, lowest RgH structure kept per cluster
Cluster cutoff heuristic: (n-1)Ang, n = #of helices in assembly63
2HEP (2hx)
Topology 1
a
rmsd = 1.27(32)
1PRB (3hx):
assembly order:
2-12
1
Topology 2
a
rmsd = 1.5(1)
Larger proteins
• subtle connectivity topology errors mayexist, although overall RMSD is still good
66
2CRO (5 helix)assembly order:
2-1-3-41
4
3
2
Topology 22
a
b
Topology 21
rmsd = 3.99(55)
Disulfide bridged proteins
• RgH scoring not discriminating: native notamong few best structures
• Imposing known disulfide bonds introduces sufficient restrictions (at least in the 5 proteins we attempted) that still allowed us to sample near-native structures
• TLC applicable to S2 bridges (but not included in current implementation); used amber9 with restraints to close bridges for these studies
68
1C5A (4hx)
assembly order:
3-2-1
1
23a
Topology 6 (21)RMSD = 2(103), 2.49(17)
1RPO 1.52(3) 2 helices
70
1BDD 1.58(24) 3 helices
71
1HP8 2.45(30) (3 helices, 2 S2 bridges)
72
2MHR 2.14(8) 4 helices
73
2ICP 5.10(15), 4.09(137) 5 helices
74
75
Subtleties re. clustering rmsd
• a large rmsd cutoff (~ 3 Ang) during clustering, may lead to a smaller number of centroids but may end up throwing the smallest rmsd conformation away, assuming (falsely) it is represented by another in the cluster
• using a low rmsd cutoff for clustering may keep best rmsd conf, but will become computationally too expensive, or may miss the best conf among the top ranking ones by RgH (because of too many candidates, with insufficient variability).
76
Helical protein gallery: nativevs. lowest rmsd model
)32(27.1 )3(52.1 )24(58.1 )19(92.1 )21(67.1
)13(51.1
)6(93.1
)20(80.1 )1(50.1 )13(03.2
)1(80.1 )2(21.2 )34(50.2 )10(67.2 )19(96.2
77a
[ ]3 [ ]2 [ ]3 [ ]2 [ ]2
)30(29.2 )8(91.2 )22(54.4 )47(21.3 )42(22.4
)8(14.2 )43(00.4 )12(01.4 )15(10.5 )23(59.4
)30(45.2 )9(69.1 )17(49.2 )2(21.3 )28(73.2
77b
Thanks
• Ken Dill, Albert Wu (now at UC Berkeley), Matthew Jacobson, Tanja Kortemme, Dan Mandell, Pharmaceutical Chemistry, UCSF
• Michael Wester, Sara Pollock (now at Dept of Applied Math, U. Washington), Dept of Math and Division of Biocomputing, UNM
• Mike Brown, Jean-Paul Watson, Sandia National Labs
• Chaok Seok, Dept. of Chemistry, Seoul National University, S. Korea, Julian Lee, Soongsil University, S. Korea
• Bob Lewis, Fordham U., Manfred Minimair, Setton Hall U.
78
Support: NIH, UNM Division of Biocomputing
PEP25:CLLRMKSAC
21
3
BA
C
Designing a 9-peptide ring
pep virtual bond3-pep bridge
design triangle
cysteine bridge 9-pepring
ModelingR. Larson’s9-peptide
(P-G-P-G-dA)
native
sample from exhaustive enumeration of conformations for cyclic pentapeptide P-G-P-G-dA