evangelos coutsias dept of mathematics and statistics ......evangelos coutsias dept of mathematics...

Loop closure and molecular shapespaces

Evangelos CoutsiasDept of Mathematics and Statistics,

University of New Mexico

IMA-1/15/08

HybridLoop Closure

sampling protein loops with Sobol’squasirandom algorithm

using a combination of inverse kinematic and constraint methods

Applications to loop modeling, exhaustive conformational covering of ring moleculres, the

assembly of helical proteins, Concerted Move MC,...2

Many versions of analytic loop closure

• Go & Scheraga (1970) (transcendental)• Lee & Liang (Mec. Mach. Theory, 1988) (polynomial,

efficient 16x16 resultant, arbitrary chain)• Wedemayer & Scheraga (1999) (polynomial, planar

tripeptide, Sylvester (24x24) resultant etc.)• Coutsias, Seok, Jacobson, Dill (JCC, 2004) (polynomial,

general tripeptide/arbitrary loop, Sylvester resultant)• Coutsias, Seok, Wester, Dill (IJQC, 2005) (polynomial,

arbitrary structure, Dixon (16x16) resultant)• many others.... 3

baseend effector: position and orientation are prescribed

arbitrary chain

rotatable arm (R-joint)

5

( ) iijij dtRRdR −×Γ=Effect of changing one torsion grows with distance from torsional axisiΓ

ijRiR

jR

jdRidt

6

1Γ

j

04

1=×Γ= ∑

=iij

iij dtRdR

4Γ

2Γ

3Γ

4 torsions can be changed together so that their combined effect cancels at a given point

fixed end

7a

04

1=×Γ= ∑

=iij

iij dtRdR

0

:

44332211 =+++

×Γ

×Γ=

jjjj

iji

ijiij

uauauaua

RR

u

Given 4 vectors in 3-space, there is always a way to combine them so that they add up to zero, while the coefficients do not all vanish:

we can always combine 4 rotations so that they leave a given point invariant in 3-space 7b

[ ] 0

4

3

2

1

4321 =

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

aaaa

uuuu jjjj

System of 3 equations (the u’s are 3-vectors) in 4unknowns (the a’s): a nontrivial solution (null vector)is guaranteed

7c

04

1=×Γ= ∑

=iij

iij dtRdR

1Γ

j

4Γ

2Γ

3Γ 1+j2+j

Although atom j is now fixed, the chain past j rotates about an effective axis through j

8

baseend effector

The end triad is moved due to change in the j-th torsion

jR

jt

2−NR

1−NRNR

9a

base Using six “adjustor” torsions cancels effect of changing j-th torsion, keeping end triad (and all subsequent atoms) fixed in space

jR

jt

2−NR

1−NRNR

9b

02,1

2 =×Γ= −=

− ∑ iNi

N

iiN dtRdR

01,1

1 =×Γ= −=

− ∑ iNi

N

iiN dtRdR

0,1

=×Γ= ∑=

iNi

N

iiN dtRdR

( ) 6,,1;,,7 …… == ittBt Nii

Loop closure conditions (arbitrary indexing)

base Could determine the values of the adjustor torsions that cancel the change in j-th torsion by integrating a system of differential equations

jR

jt

2−NR

1−NRNR

6,...,1

:,

=

∂∂

==∂∂

i

tBF

tt

j

iji

j

i

ijiijk

kji

kij

N

iij Rvv

tBvtR ×Γ=

⎭⎬⎫

⎩⎨⎧

∂∂

+Δ=Δ ∑∑==

,6

17

This applies to all atoms, including the end triad where displacement must vanish

NNNjRj ,1,2;0 −−==Δ

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

×Γ−×Γ−×Γ−

=

⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢

⎣

⎡

∂∂

∂∂

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛

×Γ×Γ×Γ×Γ×Γ×Γ

−

−

−−

−−

Nii

Nii

Nii

i

i

NN

NN

NN

RRR

tB

tB

RRRRRR

,

1,

2,

6

1

,66,11

1,661,11

2,662,11

Kinematic Jacobian

The rates of change of the “adjustor” torsions with respect to an arbitrary (sampled) torsion can be found by solving such systems. Here, except for degenerate arrangements, the six columns are independent. But there are 9 rows. The system is solvable, because there are 3 relations among those (“left null vectors”), due to the invariance of the distances among the end triad atoms.

( ) ( ) 0

,2

2

=−⋅−=−

≠=−

jijiji

ji

RRdRdRRRd

jiconstRR

Solvability conditions for rate-of-change system: 3 relations for i,j=N-2,N-1,N

locked hinge

free hinge

pivot

bond

free torsion

pivot torsion

virtual torsion9c

Our New View of Loop Closure

Old View6 rotations/6 constraints

New View3 rotations/3 constraints

Transcendental equation Polynomial equation for a more general case

10

In the new frameIn the original frame

Representation of Loop Structures

Coutsias, Seok, Jacobson, Dill, JCC 200411

α1−nC

αnC

α1+nC

In the body frame of thethree carbons, the anchor bonds lie in cones about the fixed base.

αC

Given the distance and angle

constraints, three types of virtual

motions are encountered in the

body frame12

α1−nC α

1+nC

αnC

With the base andthe lengths of the two peptidevirtual bonds fixed, the vertex

is constrained to lie ona circle.

αα11 +− − nn CC

αnC

Tripeptide Loop ClosureN

αC'C

Bond vectorsfixed in space

Fixed distance

13

αnC

δ

τ

z

σ

α1+nC

Peptide axis rotation:With the two end carbons fixed in space, the peptide unit rotates about the virtual bond

δτσ +=

14

z

σ

τ

θ

α

η

ξ

x

y ),,,;( ηξθαστ ±= FTransfer Function for concerted rotations: rotations of two peptide units about respective virtual bonds couple to preserve bond angle at Ca atom

15

τ

c

O

σ

ξ

y

x

zThe 4-bar spherical linkage

θ

α

η

16

τσξ θ η

α

x

BN M

Az

y

STetrahedral Angle

17

22

2

22

2

12sin;

11cos

2tan

12sin;

11cos

2tan

uu

uuu

ww

www

+=

+−

=⇔⎟⎠⎞

⎜⎝⎛=

+=

+−

=⇔⎟⎠⎞

⎜⎝⎛=

τττ

σσσ

18

Equation of the Tetrahedral AngleRaoul Bricard, 1897

(study of flexible octahedra)

( )( )

( )( )ηξαθ

ηξαθηξ

ηξαθηξαθ

+++=+−+=

−=−++=−−+=

coscoscoscossinsin4

coscoscoscos

EDCBA

( ) ( ) 0222 =++++ EDuCuwwBAu

19

http://eo.wikipedia.org/wiki/Raoul_Bricard

Bricard Flexible Octahedra

C

D

E

A

FB

A

C 'C

D

B 'B

E

A

B

C

D

20

EA

F

B

CD

1α

2α

3α

1τ

2τ

3τ

constraints: enforced as relations among the 3 torsions

21

( ) ( )

( ) ( )

( ) ( )

2/tan

0

0

0

0023201311

2102

2322

0022203211

2302

2222

0021202111

2202

2122

iiu

CuCuuCuCuC

BuBuuBuBuB

AuAuuAuAuA

τ=

=++++

=++++

=++++

General 6-member ring (Octahedron)

22

1C

2C

3C

4C

5C

6C 3τ1

2

4

slightly perturbed 6-member ring may assume 4 distinct conformations

Canonical cyclohexane

o

o

o

DCEEACCEAACE

FABABC

5.3260

115

===<=

==<=<=<

==<==<

δβ

α

γ

( ) ( )( ) ( )( )

( ) ( )γαγγ

αγγαγ

++=−−=

−==−+=

coscoscos1

coscoscoscos

i

i

ii

i

EC

DBA

( ) ( ) 02, 12

122

12

1 =++++= ++++ iiiiiiiiiiiii EuuCuuBuuAuuG

( ) ( ) ( ) 0,,, 133221 === uuGuuGuuG

( ) 0, =uuG( )

( )

Chair:

Twist Boat:

⎟⎟⎠

⎞⎜⎜⎝

⎛ +−±=

−====

−

+

−

βγτβαβαγ

τ

sinsincossinsincoscoscoscos

,,3,2,1,tan2

1

122

1

s

ststiu

ii

i

rigid, comes from double rootof tetrahedral equation

For flexible solutions to exist, the tetrahedral equation must be satisfied by arbitrary triplets! This is not an obvious fact.

( ) ( ) 02, 2222 =++++= ECuvvuBvAuvuG

Bricard Conjecture

For the system of biquadratics to have a continuum of solutions, it must be derivable via successive elimination, from the equations:

00

321321

211332

=+++=+++

uuPuNuMuLupunuumuulu

easily seen to be a sufficient condition, not obvious that it is necessary (and not proven)!

Sufficient conditions for flexibility

The system of biquadratics can be reduced toBricard’s sufficient condition if:

BCBAEDB

22 −=

=

It is easy to see that the cyclohexane coefficients satisfy these relations; therefore the three equations in three variables are equivalent to two equations in three variables!

uw vw

u

v

conformation circle for cyclohexanetwist-boat. As u spans the feasible range, so do v and w. Any triplet (u,v,w) corresponds to a molecular conformation

v

w

The conformation space of a flexible octahedron is composed of circles

A. Bushmelev & I. Sabitov

x

( )( )βα

βα−−+=+−+=

cos2cos2

min

max

abbaxabbax

3M1M

4M2M

αβb

a

3M1M

4M

2M

αβ

b

a

'4M

'2M

The extreme values of x are found when M4 lies on the plane of M1, m2, M3. If the range for the upper half overlap with that of the lower, there are solutions.

case 2: one circle

cases 3(resp. 4): 2 circles, fused at 1(resp. 2) points

case 3: 2 circles

EA

F

B

CD

1α

2α

3α

1τ2τ

3τ

23a

EA

F

B

CD

1α

2α

3α

1τ

2τ

3τ

1δ2δ

3δ

1θ

1η

1ξ'D

'F

'B

23b

( ) ( )

( ) ( )

( ) ( )

2/tan,1

0

0

0

0023201311

2102

2322

0022203211

2302

2222

0021202111

2202

2122

iiii

iii u

uv

CvCuvCuCvC

BvBuvBuBvB

AvAuvAuAvA

δ=ΔΔ−

Δ+=

=++++

=++++

=++++

24

( )( )( )

( ) ( ) ( )

( ) ( ) ( ) 0

0

0

301312

32

302312232

0001202

1102112212

2120221

2222

1=++

=++

=+++++

+++

uCuuCuuC

uBuuBuuB

AuAuAuAuAuA

uAuAuA

Three constraint equations:three generalized angles

25

Elimination in Polynomial systems

• Pairwise elimination: Sylvester resultant• Simultaneous elimination: Dixon resultant• Both methods lead to a single polynomial in

one of the variables: 16th degree• Differ in complexity and numerical

accuracy

26

( )( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )⎥

⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡=

1322211

1322211

1322211

2121

,,,

,,,vpvpvvpvpupuvpupupuup

vvuuC

( ) ( )( )22112121

det,,,vuvu

Cvvuud−−

=

Dixon resultant: details

Elimination matrix

Dixon polynomial

Vanishes at all common roots of original system.27

( )

[ ]

[ ]TuuuuuuuuuuU

vvvvvvvvvvV

VDUvvuud

2312

21212

31

211

321

32

221

222121

2121

1

1

,,,

=

=

=

Dixon matrix: coefficients of Dixon polynomial, arranged by monomials in x, y

If u1, u2 are roots of original system, U becomes a right null vector. Therefore the vanishing of det D is a necessary condition for the existence of a common root.

28

( )

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎣

⎡

==

000000000000000

00000

00

00

00

00

:det4

210

210

210

210

2221

21

1211

21

0201

21

2220

20

1210

20

0200

20

2221

21

1211

21

0201

21

2220

20

1210

20

0200

20

2220

20

1210

20

0200

20

2120

10

1110

10

0100

10

2220

20

1210

20

0200

20

2120

10

1110

10

0100

10

2222

CCCCCC

CCCCCC

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

AABB

SSCBAD

The Dixon resultant contains an extraneous factor (deg. 16)Coutsias, Seok, Wester, Dill, IJQC 2005

29

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎣

⎡

=

000000000000000

0000000

0000

00

210

210

210

210

210210

210210

210210

210210

iii

iii

iii

iii

iiiiii

iiiiii

iiiiii

iiiiii

i

CCCCCC

CCCCCC

NNNMMMNNNMMM

MMMLLLMMMLLL

S

031232

031232

SuSuSS

LuLuLL kkkk

++=

++=

30

00

00

det10

33

=⎟⎟⎠

⎞⎜⎜⎝

⎛⎥⎦

⎤⎢⎣

⎡−−

−⎥⎦

⎤⎢⎣

⎡SSI

uS

I

Eliminating the extraneous factor there results a matrix polynomial of degree 2; its determinant can be computed with a companion matrix of size 16X16. Then the real eigenvalues give u3 for all possible conformations. Values for the other two variables are found from the appropriate eigenvector components

Successive elimination (Sylvester resultant method) results in a matrix of size 6X6 but with quartic coefficients, leading to a companion matrix of size 24X24 (but still a 16th degree polynomial)

31

QZ or Characteristic PolynomialEither:

• (1) Solve generalized eigenproblem using QZ algorithm (cubic order in matrix size, here approx. 16^3 ~4kflops)Or:

• (2a) Use Lagrange’s expansion in complementary minors to efficiently compute the characteristic polynomial of the Dixon resultant (~2.2kflops).

• (2b) Use Sturm sequences (count number of real zeros between two values) for an efficient computation of real zeros from characteristic polynomial (bisection/Newton: variable, but low, cost)

Rotate segments 1,2 about resp. virtual axes by angles τ1, τ2

Rotate entire loop about virtual axis 3 by angle τ3

Choose values of the τ1,τ2,τ3 to fix bond angles θ at 3 junctions

1τ2τ

3τ1θ

2θ

3θ

Method calculates all (0-16)possible loop configurationsthat bridge given gap.

are anchorbonds defining the two endsof the loop.

VC

TriaxialLoop Closure

1 2

3

32

10 solutions from RNA modeling

33

Extensions to More General Cases

Longer Loop Closure

General Loop Sampling

previous

new

34

α3C

α2C

α1C

3x

2x

1x

3z

1z2z

y

yy 1τ

2τ

3τ33 δτ +

22 δτ +

+

Reconstruction Procedure:

1. Rotate about axis 3 by angle to place triangle into space. Compute local space axes.

2. Bring objects 1,2 into their own body frame, so that the corresponding C atoms are located in the positive-X direction (while the Z-axis is the body axis).

3. Rotate each of 1, 2 by ,

4. Place 1, 2 in space, bringing each set of body axes into coincidence with respective space axes.

α2C 3τ−

2τ1τ

1σ

1τ

1α

1a2a

3a4a

3x

2x

1x1R

4R

1z

4z1s

4s

3G

2GCrank

Follower

Two-revolute, two-spherical-pair mechanism 35

Lee & Liang, Mech. Mach. Theory, 1988

36a

Lee-Liang

Loop

Closure

S. Pollock, Undergrad. thesis, UNM ‘07

36bS. Pollock, Undergrad. thesis, UNM ‘07

36cS. Pollock, Undergrad. thesis, UNM ‘07

36dS. Pollock, Undergrad. thesis, UNM ‘07

36eS. Pollock, Undergrad. thesis, UNM ‘07

“The Mt. Everest of kinematics”

A. Freudenstein

36f

The formulation is quite involved; however it eventually leads to a generalized eigenproblem of identical form as the one found in the Triaxial case with the Dixon resultant, so the solution from that point on is identical for the two methods. Key difference is the considerable additional overhead for carrying out the Hartenberg-Denavit transform. The numerical computation in this form is approximately 3 times slower than for the Triaxial-Dixon algorithm (comparison in progress).

http://affiliates.allposters.com/link/redirect.asp?item=1985700&AID=424961&PSTID=1&LTID=2&lang=1

1θ 8θ

7θ

6θ

5θ4θ

3θ2θ

o115

Comparison of algorithms: exhaustive covering of the conformational space of cyclooctane. Two torsions set to arbitrary values, other 6 determined to satisfy closure 37

Pollock & Coutsias, (preprint, 2007)38a

Low res. plots of solution no. and energy (only lowest energy state shown, no solvation)

energyno. of solutions

6

6

6

4

4

2

2

8θ

6θ

AutoEncoder

IsoMap

Cyclooctane as Benchmark for Automated Dimensionality Reduction

• Automated dimensionality reduction algorithms offer the potential for visualization of high dimensional energy landscapes and more efficient sampling in surrogate spaces for general problems in protein folding, molecular recognition, self assembly, etc.

• Examples:– Principal Component Analysis (PCA)– Locally Linear Embedding (LLE)– IsoMap– Autoencoder Neural Network (AE)

• How well do they perform on cyclooctane?with M. Brown, J.-P. Watson, Sandia Nat. Labs

Automated Dimensionality Reduction Must Be Able To…

1. Identify the dimensionality of a manifold2. Provide a forward map into low

dimensional space (Visualization)3. Provide a reverse map into high

dimensional space (Recover meaningful structure from low-dimensional samples)

4. Reconstruction error – Error between original structure and structure obtained following forward and reverse mapping.

with M. Brown, J.-P. Watson, Sandia Nat. Labs

Results

• Can obtain successful reduction from D=72 (with hydrogens), 24, or 8 (dihedral angles) to ~3-5, but not 2.


3-D Embedding from Isomapwith M. Brown, J.-P. Watson, Sandia Nat. Labs

Manifold Tearing?• Cannot guarantee a smooth embedding of an

n-manifold in n dimensions• For the problems of interest, it may not be

necessary to ensure a strict preservation of topology as long as the reconstruction error is minimized.

• Tearing the manifold or considering only a subset of charts at a time may allow for a more efficient encoding– Self organizing maps or methods for automating

the tearing might be beneficialwith M. Brown, J.-P. Watson, Sandia Nat. Labs

Examples• By restricting 1 of the independent dihedral

angles to (0,pi/2) domain or by using a single molecular dynamics trajectory a successful reduction to 2 dimensions is obtained:


Completeness/efficiency issues• Complete sampling: use perturbation theory of eigenvalues

to ensure sampling at sufficient resolution to detect all possible bifurcations and crossings

• Efficiency: .5 degree grid sampling requires ~1hr CPU time on a 2.8GHz Xeon processor (compared to 4 months for

calculation by Ros et al JCC 2007, using distance geometry)

BxAxBxAx βαλ =⇒=

( )ε

βαλ

OFE

yByAyByAFBBEAA

=

=⇒=⇒⎭⎬⎫

+=+=

22,

~~~~~~~~~

( )2~,~~,~ εβα OxByxAy HH +=

Stewart & Sun, Matrix Perturbation Theory, Academic Press, 1990

40

Numerical Sensitivity Issues• Numerical errors accumulate roughly like the square root of

the number of algebraic operations• Setting up the resultant for the LL algorithm involves approx

4x the amount of computation as for the TLC algorithm• The two resultants have similar forms, with nonzero entries in

the same locations, so that the generalized eigenproblemsinvolve identical computations

• The sensitivity of the eigenproblem depends on the separation of eigenvalues/eigenvectors

• Near degeneracies, the sensitivity in the eigenproblem serves as an amplifier of numerical errors in the matrix elements

Dangerous territory: in a broad sampling need to tread carefully for trouble may be lurking!

Bricard curve

1st chaotic ridge, in 6-8 projection

4

6

38b

8θ

6θ

39

1st chaotic ridge, shown in t5-t8 projection, is pathological for both TLC and LL algorithms

8θ

5θ

10^-5 stepsize

Cyclooctane conformation analysis(under the hood!)

• The TLC calculations employ 3 equations and use t5, t8 as control parameters

• The LL computations employ additionally descriptions in terms of (t6,t8) and (t4,t8), not possible for TLC

• The various descriptions are equivalent, as alternative 2d projections of a 2-parameter surface in 8 dimensional space.

• The phenomena seen can be more easily explained by employing a 4-equation generalization of TLC, using as control parameters the angles (a1, a2) which serve to completely characterize the quadrilateral formed by atoms (1-3-5-7)

1α

2α

3(5)

β

γ1z

2z

3z4z

1(1)

2(3)

4(7)

( )2/tan ii z=τ

(6)

(8)

(2)

(4)

34z

1

2

4

δ

2/tan,1 ii

ii

iii δ

ττσ =Δ

Δ+Δ−

=

( )2/tan ii s=σ

δ−= 44 zs

3

1

2

4

δ

4s4z

( )

( ) ( ) 12/tan2/tan2cos0

2/tan,1

02

02

02

02

21

2212142

242

21

242

1241431

231

24

231

2232322

222

23

222

1221211

211

22

211

≤=≤

=ΔΔ+

Δ−=

=++++

=++++

=++++

=++++

ααδ

δτ

τσ

σσσσσσ

ττττττ

σσσσσσ

ττττττ

i

ii

EDCBA

EDCBA

EDCBA

EDCBA

( )

( )

πααα

γπβ

βαγβ

αγβαγ

≤+≤

−=

−−=−=

=−=+−=

21

2

,02

2coscossin2

coscos2coscos

i

ii

i

iii

ii

EC

DBA

The coefficients for cyclooctane; the canonical values are γ=115, β=32.5 deg.

Determining backbone torsions:spherical trigonometry

1

34

2

5

1z 2z

2t 3t

2αβ

γ

3122

2222

cossinsincoscoscossinsincoscoscossinsincoscoscossinsincoscos

tt

βγβγτβαβαβγβγτβαβα

−=−−=−

Canonical cyclooctanequasi-planar conformation

o

o

o

5.32817

90571135357713115812123

21

===<=

==<=<==<=<

==<==<

δβ

αα

γ

( ) ( ) ( ) ( )( ) ( ) ( )( )

( ) ( )γγγ

γπγγγγπγ

sincoscos1

cos2/coscossincos2/coscos

−=−−=

=−==+=−+=

i

i

ii

i

EC

DBA

Square scaffold conformation

( ) ( )

43214

1421

2414

21

24

21

24

:021sin12cos

τττττ

ττττγττττττγ

→→→→

=−−++−++cyclic

1τ

4τ

3τ

2τ

o115=α

o5.32== δβo90=γ

1

23

4

2

34

5

6

7

8

1t

7t

6t

5t

4t 3t

2t

8t

Flexible conformation

{ } [ ]( ) 02

,,,,,,,2222

81

=++++

−−−−==

ECuvvuBvAu

vvuuvvuut ii

When the 1-3-5-7 frame forms a square, t6=-u and t8=v lie on a Bricard curve and atoms 2,4,6,8 may rotate freely about the square frame. Two eigenvalues coincide, leading to the sensitive behavior. This coincidence is accidental, not associated with merger and disappearance of solutions, i.e. not associated with a boundary where solution number changes

v

u

Bricard curve

( )( )2/tan

2/tan

8

6

tvtu

=−=

The 1st chaotic ridge is a Bricard curve, shown here in terms of the half-tangents of the control parameters

2nd chaotic ridge, (6-8). Inaccurate solutions found.

4

6

38c

another accidental degeneracy, with no simple geometrical interpretation

6θ

8θ

610−

710−

Hybrid Algorithm

• Triaxial Loop Closure (paired axes) preferrablewhen applicable – faster, more robust

• Lee-Liang (independent axes) loop closure: more general, employed when necessary

• Similar resultant: equivalent solution as eigenproblem or polynomial (new formulation)

• Automatic switching between two methods• Additional algebraic safeguards against accidental

degeneracies (optimal choice of solution variable,nongeneric flexibility detection)

41

locked hinge

free hinge

bond

free torsion

42

locked hinge

free hinge

bond

virtual torsions

constraints

virtual axes

5 coupled tetrahedrals=> 2*2^5 = 64 possible solutions

example: additional backbone distance constraints

43

locked hinge

free hinge

bond

first loop

Example: Sampling a contact: sample triad positions and double loop closure

second loop

44

locked hinge

free hinge

bond

first loop

Example: Sampling a contact: sample triad positions and double loop closure

second loop

45

Sobol sampling

A quasirandom sampling scheme that retains nearly uniform density in an n-

dimensional box for arbitrary resolution

46

seed = 1 to 100 +

48a

seed = 1 to 100 + , seed = 101 to 200 +

48-b

seed = 1 to 100 + , seed = 101 to 200 o

48c

seed = 1 to 1000 + , seed = 9001 to 10000 +

48d

seed = 1 to 10000 +

48e

General

49

Proline

Glycine

Ramachandran regions

Ramachandran region sampling

• Sobol samples hypercubes• dedicating two dimensions per residue, would

“waste” ¾ of sample points at the most unlikely areas (probability<5%)

• Simple: 95% probability region for each residue is pixelated at 5deg squares, and laid out in a linear array. Each residue is assigned 1 Sobol dimension

• Advanced: each pixel in 95% region stretched according to probability measure

50

Applications

• modeling of ring molecules • Loop refinement• Complementing Rosetta Relax algorithm

for sampling loop and helix variability• A simple algorithm for Helical Protein

Assembly using maximal hydrophobic packing

51

refinement:CASP7 target T308

best model

native best 5 among 100 TLC samples

Loop

preliminary result

Coutsias, Jacobson, Dill (2007)4

Sampling loops and repositioning helices in Rosetta

Mandell, Coutsias, Kortemme (2007, work in progress)

Can apply to longer loops by sampling dihedrals and applying closure

Can also perturb fragments like helices and close surrounding loops

50

Current Backbone Flexibility in Rosetta: Relax

Ensemble generated using Rosetta Relax from a single PDZ domain

Natural structural variation of a family of PDZ domains

Single PDZ domainwith ligand

Fragment insertion +local backbone moves

51

Loop closure complements fragment insertion

• Combining loop closure with fragment insertion might yield even more realistic ensembles

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

0

0.5

1

1.5

2

2.5

RM

SD

fro

m 1

BE9

C-a

lph

a

Residue C-alpha

Rosetta Loop closure Natural

Rosetta Relax Natural

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

0

0.5

1

1.5

2

2.5

RM

SD

fro

m 1

BE9

C-a

lph

a

Residue C-alpha

Rosetta Loop closure Natural

Loop closure

sheets helix

52

Triaxial Loop Closure - An Efficient Method for Sampling

BackbonesModeling the antigen-binding antibody H3 loopApo conformation

starting from Holo

using Rosetta Relax

triaxial loop closure(D. Mandell, Kortemme lab,UCSF)

54

Comparison: Rosetta Relax vs. TLC

• the Rosetta Relax structures are generated by fragment insertion and small torsion moves over the whole structure,

• the TLC loop is generated by choosing 3 fixed pivots, and sobol-sampling the remaining torsions (the rest of the protein is fixed). Best candidate selected using Rosetta’s energy function.

• The Roseta Relax protocol thus took much longer (~1000 times, since it is operating on the entire structure).

55

Modeling PDZ domain variability

• PDZ domains are topologies frequently found in scaffold proteins that organize complexes, particularly those involved in cell trafficking.

• They frequently bind the C-terminus of a protein ligand, as well as peptides internal to the ligand. PDZ domains have a well-defined core typically consisting of 6 strands and 2 helices arranged in a compact, globular structure.

• For example, substrate binding consistently occurs at the interface formed by two strands and a helix.

1be9 helix ensemble, generated by perturbation of helix torsions, corrected by loop closure. The helix forms an interface together with the adjacent beta strand also shown as a backbone trace. The ligand is shown in green with side chains.

To compare the ensemble with naturally occurring variation, the wild type of this PDZ domain is shown in cyan superimposed with another PDZ domain in blue. The helix from the first PDZ domain is in purple backbone trace and the second is in orange backbone trace. The ligandsare also shown in dark green and light green (side-chain trace only for the second).

much of the variationbetween the naturally occurring helices is captured in the ensemble

TLC as a refinement strategyin new Rosetta loop modelingprotocol (1thg 12 residue loop)

native new

old

with D. Mandell, T. Kortemme, UCSF

Minimization of Angle Deviationin loop modeling by fragment assembly

Joulyan Lee, Soongsil U., Chaok Seok, Seoul Natl. U., (S. Korea), and E.Coutsias

• Fragment assembly methods often applied to protein structure problems.• When structures are generated for a segment such as a protein loop,

assembled fragments are not fit into the frame of the protein of interest exactly.

• Deviation of the dihedral angles of the loop from the fragment angles are minimized here to maintain the features obtained from the structure database as well as possible,

• Two methods have been tried: 1) Monte Carlo simulation and 2) A local minimization in the space of loop conformations satisfying the loop closure constraint. In both method, root-mean-square deviation in dihedral angles is used as the objective function.

Application of Our Loop Closure Method to Protein Loop Modeling: Move Sets in

Monte Carlo MinimizationMonte Carlo Minimization: Sampling of Local Minima

LTD Move Sets Baysal and Meirovitch

J. Phys. Chem. A 101, 2185 (1997).Loop Closure Move Sets

movie

move

minimize

8 residue loop of turkey egg whitelysozyme

Finding Native structure

Loop Modeling Using Monte Carlo Minimization: Example

movie

Loop Modeling: Increased Efficiency and Accuracy

LTD

CSJD

Native

Coutsias et al, J. Comput. Chem. 25, 510 (2004)Baysal, C. and Meirovitch, H., J. Phys. Chem. A, 101, 2185, (1997 )

Monte Carlo simulation: 1 driver angle is perturbed randomly within 10 deg, 6 torsion angles are used to close the loop. kT=0.5 deg and 2000 MC steps. 20 independent simulations starting from different initial loop closure were performed for each starting conformation generated from fragment assembly.Minimization using the kinematic Jacobian: 100 steps of steepest descent minimization, and subsequent LBFGS-b minimization (termination criterion: function decrease: 10^7*machine precision, gradient: 10^(-3)). 20 independent minimization starting from different initial loop closure.Results: two loops, 8-residue loop of 135l (residues 84-91) and 12-residue loop of (35-46), were tested. R_ave is RMSD averaged over the different conformations generated from fragment assembly. R_min is the min RMSD. Dphi_init is the initial deviation in angles, and dphi_ave is the final deviation averaged over the conformations. The overallperformance of the two methods is similar, but the computation time is much faster with Jac method.

with C. Seok , J. Lee

Protein135l 1eco

length8 12

# of conf from fragment assembly 87 127

dphi_init (deg)34.3 28.0

R_ave (A) MC 2.5 3.9

Jac 2.7 5.2

R_min (A) MC 1.0 1.3

Jac 1.1 1.3

dphi_ave (deg) MC 18.1 15.8

Jac 17.0 15.4

Time (sec) MC 38.9 91.7

Jac 5.4 5.1


135l

0

5

10

15

20

25

30

35

0 0.5 1 1.5 2 2.5 3 3.5 4

RMSD (A)

Devi

ation (

deg)

MC

Jac


1eco

0

5

10

15

20

25

30

0 2 4 6 8 10

RMSD (A)

Devi

ation (

deg)

MC

Jac


Assembly of helical proteins

A simple heuristic based on fast loop closure and maximal hydrophobic packing—as measured by radius

of gyration of the Ca atoms in hydrophobic residuesMotivated from need to improve assembly

performance of Dill group’s Zipping & Assembly strategy for tertiary structure prediction

G.A. Wu, E.A. Coutsias, K.A. Dill, Iterative Assembly of Helical Proteins by Optimal Hydrophobic Packing, (2007, in press)

56

Related works• Huang, Samudrala, Ponder (JMB 1999): Rg packing• Yue & Dill (Prot Sci 2000): special angles• Fain & Levitt (JMB 2001): database scoring; graph

theoretic/combinatorial assembly• Nanias et al. (PNAS 2003): simplified energy (Ca rep.)• Narang et al. (Phys Chem Chem Phys 2005): torsion sampling,

knowledge constraints• McAllister et al. (Proteins, 2006), contact prediction, distance

restraints• Zhang, Hu, Kim (PNAS 2002): torsion angle dynamics• …• Issues: Assembly, (Loop closure), Scoring

57

Paths not followed 1: Object assembly en masse: identical equations to loop closure. A possible approach to avoid searching. Assemble all elements at once into geometrically feasible configurations.

58

1, 2 rings

3 rings

4 rings

all possible graph topologies with up to 4 rings, with only 3-node contacts

http://topology.health.unm.edu

Graph Theory: assemble using topological graphs and combinatorics

59

Wester, Pollock, Coutsias, Allu, Muresan & Oprea, Topological Analysis of Molecular Scaffolds II: Analysis of Chemical Databases (submitted) (2007)

The assembly algorithm

• Begin with a protein whose secondary structure is known to contain helices (as determined, e.g. by DSSP, Kabsch,Sander, Biopolymers 22, 2577-2637 (1983) ) remove loops and consider the problem of placing the helices relative to each other

• Align two helices; score each alignment; select best subset, close loop(s).

• Align next helix with assemblage of first two, close loop; iterate

• Cluster/rank by RgH; select best candidates based on a hydrophobic packing criterion.

60

Initial arrangement: juxtaposition of Ca atoms for two hydrophobic residues at 10 A

61a


• 1) do shortest loop 1st

61b


• 1) do shortest loop 1st• 2) always add one helix at a time (i.e. the 2nd

shortest loop may not be second, unless it is next to the shortest loop) 61c

• starting with long loops, we would need to keep less compact conformations (in addition to compact ones) in order to ensure the native conformation is covered.

62a


• by adding long loop closure at the later steps, we have a more limited conformation space to explore due of excluded volume effects (i.e. steric constraints with the pre-assembled parts).

62b


• by adding long loop closure at the later steps, we have a more limited conformation space to explore due of excluded volume effects (i.e. steric constraints with the pre-assembled parts).

• Amber force field energy minimization is done after loop closure for better sterics (30-60 sd/cg steps)

62c

Sample all possible hydrophobic pairings between two helices

2 DoF sampled: translation and rotation about alignment axis

Limited by loop closability

Ensemble of structures generated, ranked by RgH, clustered by mutual RMSD, lowest RgH structure kept per cluster

Cluster cutoff heuristic: (n-1)Ang, n = #of helices in assembly63

2HEP (2hx)

Topology 1

a

rmsd = 1.27(32)

1PRB (3hx):

assembly order:

2-12

1

Topology 2

a

rmsd = 1.5(1)

Larger proteins

• subtle connectivity topology errors mayexist, although overall RMSD is still good

66

2CRO (5 helix)assembly order:

2-1-3-41

4

3

2

Topology 22

a

b

Topology 21

rmsd = 3.99(55)

Disulfide bridged proteins

• RgH scoring not discriminating: native notamong few best structures

• Imposing known disulfide bonds introduces sufficient restrictions (at least in the 5 proteins we attempted) that still allowed us to sample near-native structures

• TLC applicable to S2 bridges (but not included in current implementation); used amber9 with restraints to close bridges for these studies

68

1C5A (4hx)

assembly order:

3-2-1

1

23a

Topology 6 (21)RMSD = 2(103), 2.49(17)

1RPO 1.52(3) 2 helices

70

1BDD 1.58(24) 3 helices

71

1HP8 2.45(30) (3 helices, 2 S2 bridges)

72

2MHR 2.14(8) 4 helices

73

2ICP 5.10(15), 4.09(137) 5 helices

74

Subtleties re. clustering rmsd

• a large rmsd cutoff (~ 3 Ang) during clustering, may lead to a smaller number of centroids but may end up throwing the smallest rmsd conformation away, assuming (falsely) it is represented by another in the cluster

• using a low rmsd cutoff for clustering may keep best rmsd conf, but will become computationally too expensive, or may miss the best conf among the top ranking ones by RgH (because of too many candidates, with insufficient variability).

76

Helical protein gallery: nativevs. lowest rmsd model

)32(27.1 )3(52.1 )24(58.1 )19(92.1 )21(67.1

)13(51.1

)6(93.1

)20(80.1 )1(50.1 )13(03.2

)1(80.1 )2(21.2 )34(50.2 )10(67.2 )19(96.2

77a

[ ]3 [ ]2 [ ]3 [ ]2 [ ]2

)30(29.2 )8(91.2 )22(54.4 )47(21.3 )42(22.4

)8(14.2 )43(00.4 )12(01.4 )15(10.5 )23(59.4

)30(45.2 )9(69.1 )17(49.2 )2(21.3 )28(73.2

77b

Thanks

• Ken Dill, Albert Wu (now at UC Berkeley), Matthew Jacobson, Tanja Kortemme, Dan Mandell, Pharmaceutical Chemistry, UCSF

• Michael Wester, Sara Pollock (now at Dept of Applied Math, U. Washington), Dept of Math and Division of Biocomputing, UNM

• Mike Brown, Jean-Paul Watson, Sandia National Labs

• Chaok Seok, Dept. of Chemistry, Seoul National University, S. Korea, Julian Lee, Soongsil University, S. Korea

• Bob Lewis, Fordham U., Manfred Minimair, Setton Hall U.

78

Support: NIH, UNM Division of Biocomputing

PEP25:CLLRMKSAC

21

3

BA

C

Designing a 9-peptide ring

pep virtual bond3-pep bridge

design triangle

cysteine bridge 9-pepring

ModelingR. Larson’s9-peptide

(P-G-P-G-dA)

native

sample from exhaustive enumeration of conformations for cyclic pentapeptide P-G-P-G-dA

evangelos coutsias dept of mathematics and statistics ......evangelos coutsias dept of mathematics...

Documents