
SUPPORTING INFORMATION

FOR

Theoretical modeling of multiprotein complexes by iSPOT: Integration of small-angle X-ray scattering, hydroxyl radical footprinting, and computational docking

Wei Huang1*, Krishnakumar M. Ravikumar1*, Marc Parisien2, and Sichun Yang1


SI Figure 1. A combined rotational-translational RotPPR method for MD simulations.

(A) A rotationally uniform pose generator. On the left is a set of uniform grid points on a protein's surface, based on a recent concept that was originally developed for rigid-body docking. Shown are 25 Fibonacci grid points, each representing the projection of the center-of-mass of one protein on the surface of the other protein. For comparison, the right panel shows a non-uniform distribution of the same number of grid points (25) generated using traditional equal-angular meshing, where points are concentrated near the north and south poles.

(B) Translation-centric PPR sampling. The pull-push-release (PPR) cycle is repeated: a biasing force, applied via a harmonic spring between the centers-of-mass, drives the two proteins to follow a target distance Rt that alternates between its upper and lower bounds (Rmax and Rmin, respectively). Below a cutoff distance Rc, this biasing force is removed and the system is released for free brute-force simulation (marked as green regions).


(C) Combination of the PPR sampling and the rotationally uniform pose generator. As illustrated on one protein's surface, a brute-force simulation that starts from a pose generated by the rotationally uniform pose generator (shown as the projection of the center-of-mass of one protein on the other protein's surface, marked by a red circle) samples only a limited region around its starting point. The RotPPR method, which combines the translation-centric PPR strategy with the rotational pose generator, enables extensive conformational sampling.


SI Figure 2. Rapid convergence of exhaustive RotPPR search. (A) Projection of the centers-of-mass of FKBP12 on the surface of TGFβ (front and back views), where each simulation snapshot is represented by a blue dot. (B) Reverse projection of the center of TGFβ on the surface of FKBP12. (C) A subset of configurations (those making contacts in the region marked by a red circle in panel B) is reversely projected, showing that the rotations of both proteins are extensively sampled. (D) The sampling starts to converge as a function of the number of PPR cycles from each starting pose; each cycle lasts 20 ns, for a total of 200 ns per PPR run. This analysis was performed by dividing the first protein's surface into a set of grids, each with a 20° angular separation, projecting the structures within each grid onto the second protein's surface at the same 20° resolution, and then counting how many grids are sampled on the second surface. For clarity, only 5% of the total simulation data are shown in (A) and (B).


SI Figure 3. Comparison of fRMSD and oRMSD clustering. Ensemble structures of the same cluster (near the crystal structure of TGFβ-FKBP12) are shown when fRMSD (A, B) and oRMSD (C, D) clustering are used. Within the same cluster, all structures are aligned in two different ways: one with the full complex used for alignment (A, C) and the other with only FKBP12 (in blue) used for alignment (B, D). Compared with the fRMSD clustering results (A, B), oRMSD clustering (C, D) appears to yield a sharper, better-defined structure ensemble with respect to the relative orientation between the proteins. The same cutoff value of 8 Å was used for both the fRMSD and oRMSD clustering algorithms.


SI Figure 4. Assessment of sparseness in footprinting data coverage. (A) All surface residues (Nfp=116) used for modeling are separated into two categories: buried in the interface (Nin=16; in red) and outside the interface (Nout=98; in blue). (B) Half of all residues are randomly selected from each category (f=50%). (C) A strong correlation is found, with a high correlation coefficient of ρ=0.9, when only half of the data points (a fraction f=50%, i.e., 57 residues) were used. (D) The histogram-frequency distribution of the correlation coefficient ρ for three different fractions: f=75%, f=50%, and f=25%. Each histogram was calculated from a set of 10,000 random sample sets of residues. (E) A plot of ρ values (mean and s.d.) calculated using various fractions of Nin and Nout.


1. Exhaustive RotPPR search in MD-based docking simulations

1.1. Translation-centric PPR sampling. Following our published protocol (Ravikumar et al.,

2012), we used a pull-push-release (PPR) sampling scheme for coarse-grained docking

simulations of two individual proteins, where each protein has a high-resolution structure

available. The PPR scheme focuses on sampling the translational degrees of freedom (DoFs) between the proteins along the

center-of-mass distance R6 (shown in Fig. 2). The PPR works by repeating the following three

steps (illustrated in SI Fig. 1): (i) pull the two proteins away from each other based on a given

starting configuration, (ii) push them closer together to dock, and (iii) release the push-pull bias

so that the two proteins can interact freely with one another. For the pull and push portions of

each PPR cycle, a biasing potential EPPR is imposed on the center-of-mass distance R6 between

the two proteins as follows,

\[
E_{\mathrm{PPR}} =
\begin{cases}
0, & \text{if } R_t \le R_c \text{ and } r_m < 7.6\ \text{Å}, \\
k_{\mathrm{PPR}}\,(R_6 - R_t)^2, & \text{if } R_t > R_c,
\end{cases}
\tag{1}
\]

where Rt is the target distance that oscillates linearly within the confined region Rmin ≤ Rt ≤ Rmax, and this Rt distance is closely followed by the instantaneous distance R6 (shown in SI Fig. 1) before the system enters the release portion. A cutoff distance Rc was used to define the entrance and exit of the unbiased simulation region, and a minimum distance rm between any pair of residues was used to define whether docking contacts were made. In the simulations, a force constant of kPPR = 100 kcal/(mol Å²) was used. Rmax, Rc, and Rm were chosen on a case-by-case basis depending on the size of the protein complex. For each complex, at least two PPR cycles were used during the RotPPR sampling.
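The biasing scheme of SI Eq. 1 can be illustrated with a minimal Python sketch. The function names, the `phase` parameterization of one pull-push sweep, and the numerical distances in the usage lines are hypothetical choices made for illustration; only the 7.6 Å contact threshold and the force constant of 100 come from the text. SI Eq. 1 does not state what happens when Rt ≤ Rc without contacts, so the sketch assumes the harmonic bias stays on in that case.

```python
R_M_CONTACT = 7.6  # Å, contact threshold on the minimum inter-residue distance (SI Eq. 1)


def target_distance(phase, r_min, r_max):
    """Target distance Rt for a phase in [0, 1): linear pull, then linear push.

    The triangle-wave shape is an assumption; the text only says that Rt
    oscillates linearly between Rmin and Rmax.
    """
    phase = phase % 1.0
    if phase < 0.5:  # pull: Rmin -> Rmax
        return r_min + (r_max - r_min) * (phase / 0.5)
    return r_max - (r_max - r_min) * ((phase - 0.5) / 0.5)  # push: Rmax -> Rmin


def e_ppr(r6, r_t, r_c, r_m, k_ppr=100.0):
    """Biasing energy (kcal/mol) for center-of-mass distance r6 and target r_t (Å)."""
    if r_t <= r_c and r_m < R_M_CONTACT:
        return 0.0  # release region: the simulation runs unbiased
    # SI Eq. 1 gives the harmonic term for Rt > Rc; applying it to the remaining
    # case (Rt <= Rc but no contact yet) is an assumption of this sketch.
    return k_ppr * (r6 - r_t) ** 2


# Hypothetical numbers: halfway through the pull, 2 Å away from the target.
r_t = target_distance(0.25, r_min=10.0, r_max=50.0)
print(r_t, e_ppr(r6=32.0, r_t=r_t, r_c=20.0, r_m=12.0))
```

Keeping the release test as the first branch makes the unbiased region explicit, which mirrors the green regions described for SI Fig. 1B.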

1.2. A rotationally uniform pose generator. In contrast to the traditional angular mesh grid used in most protein-ligand docking, a set of uniformly distributed initial poses was generated for protein-protein docking simulations, spanning five rotational degrees of freedom (i.e., ω1 – ω5 shown in Fig. 2), as follows:

1. The center of each protein was fixed at the origin and its principal axes were aligned along the global Cartesian coordinate axes.

2. The largest principal axis of each protein (aligned to the x-direction; marked by an arrow in Fig. 2) was rotated to point toward each uniformly distributed grid point on a spherical surface of unit radius (illustrated in SI Fig. 1A). These points, often known as Fibonacci grids, were generated on the surface at polar and azimuthal angles (Θ, φ) determined by an index k and the golden ratio Φ = (1+√5)/2. The index k runs over (−a, −a+1, −a+2, …, −1, 0, 1, …, a−2, a−1, a), where a = (n²−1)/2 [21-23]. The pose generation results in a total of n² poses for each protein, independently. Any resulting steric clashes between the two proteins were removed by translating the second protein away along the x-axis. In combination, a total of n⁴ starting poses were generated for docking simulations, covering the four rotational DoFs (i.e., ω1 – ω4 shown in Fig. 2).

3. The rotational DoF between the two proteins (ω5 shown in Fig. 2) was divided n times as well. As such, a grand total of n⁵ starting poses was generated over all five rotational DoFs. For all simulations presented in this work, n = 5 was used, resulting in a grand total of n⁵ = 3,125 starting poses.
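The grid construction in step 2 can be sketched in Python as follows. The exact angle expressions did not survive in this document, so the sketch assumes the commonly used Fibonacci lattice, with latitude arcsin(2k/(2a+1)) and longitude 2πk/Φ; the function names are hypothetical.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2  # Φ


def fibonacci_grid(n):
    """Return n**2 unit vectors spread nearly uniformly over a sphere.

    Assumes the standard Fibonacci lattice; n must be odd so that
    a = (n**2 - 1) / 2 is an integer and k runs over -a..a.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that (n**2 - 1) / 2 is an integer")
    a = (n * n - 1) // 2
    points = []
    for k in range(-a, a + 1):
        z = 2.0 * k / (2 * a + 1)               # sine of the latitude, evenly spaced
        rho = math.sqrt(1.0 - z * z)            # distance from the polar axis
        phi = 2.0 * math.pi * k / GOLDEN_RATIO  # azimuth advances by golden-ratio turns
        points.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return points


def count_starting_poses(n):
    """n**2 orientations per protein (n**4 combined) times n divisions of ω5."""
    return n ** 5


grid = fibonacci_grid(5)
print(len(grid), count_starting_poses(5))
```

Spacing the points evenly in z rather than in polar angle is what avoids the crowding near the poles that equal-angular meshing produces.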

2. Energy function of RotPPR docking simulations

2.1. Predictive coarse-grained (CG) modeling. Following Ravikumar et al. (2012), we defined the energy function between two interacting proteins (EPPI), where each residue is coarse-grained and represented by a single bead at its Cα position. This CG modeling is predictive because it does not use prior knowledge of the protein complex structure. Specifically, EPPI includes contributions from electrostatic (Eelec) and hydrophobic (EH) interactions (Kim and Hummer, 2008; Ravikumar et al., 2012).

\[
E_{\mathrm{PPI}} = \sum_{i,j} \left[ E_{\mathrm{elec}}(i,j) + E_{\mathrm{H}}(i,j) \right],
\qquad
E_{\mathrm{elec}}(i,j) = \frac{q_i q_j}{4\pi\epsilon_0 D_{\mathrm{eff}}\, r_{ij}},
\]

where q_i is the charge of residue i, ε0 is the vacuum permittivity, and r_ij is the distance between residues i and j. An effective dielectric coefficient D_eff = D_s exp(r_ij/ξ) was used, with D_s = 10 and ξ = 8.2 Å. Residue charges of q_i = +e for Lys and Arg, +0.5e for His, and −e for Asp and Glu were used to mimic the condition at pH 7. Hydrophobic interactions (EH) were modeled either as an attractive interaction or as a purely repulsive interaction, given by

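\[
E_{\mathrm{H}} =
\begin{cases}
\varepsilon_{ij}\left[5\left(\dfrac{\sigma_{ij}}{r_{ij}}\right)^{12} - 6\left(\dfrac{\sigma_{ij}}{r_{ij}}\right)^{10}\right] & \text{(attractive)} \\[2ex]
\varepsilon_{ij}\,5\left(\dfrac{\sigma_{ij}}{r_{ij}}\right)^{12}\left[1 - \exp\left(-\dfrac{(r_{ij}-\sigma_{ij})^{4}}{2}\right)\right] & \text{(purely repulsive)}
\end{cases}
\tag{2}
\]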

where ε_ij = α(e_ij^MJ + β), based on the Miyazawa-Jernigan statistical energy e_ij^MJ between residues i and j (Miyazawa and Jernigan, 1996). We used d = 3.8 Å and σ_ij = γ(r_i + r_j)/2, where r_i is the van der Waals radius of coarse-grained residue i. The values α = 0.4, β = 1.3, and γ = 1.25 were used, as in previous publications (Huang et al., 2014; Huang et al., 2013; Ravikumar et al., 2012). The parameters ε_ij and σ_ij for pairwise residue interactions were described previously (Huang et al., 2014).
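As a rough illustration of the pairwise terms above, the following Python sketch evaluates the screened electrostatic energy and the attractive 12-10 branch of the hydrophobic term for a single residue pair. The function names and the residue-charge table layout are hypothetical, the Coulomb prefactor of about 332.06 kcal Å/(mol e²) is a standard unit conversion that the text does not state, and the purely repulsive branch is left out.

```python
import math

COULOMB_KCAL = 332.06  # approx. e^2 / (4 pi eps0) in kcal*Å/(mol*e^2); assumed unit conversion
D_S = 10.0             # dielectric prefactor quoted in the text
XI = 8.2               # Å, screening length quoted in the text

# Residue charges (in units of e) used to mimic pH 7; all other residues are neutral.
CHARGE = {"LYS": 1.0, "ARG": 1.0, "HIS": 0.5, "ASP": -1.0, "GLU": -1.0}


def e_elec(q_i, q_j, r_ij):
    """Electrostatic energy (kcal/mol) with D_eff = D_s * exp(r_ij / xi)."""
    d_eff = D_S * math.exp(r_ij / XI)
    return COULOMB_KCAL * q_i * q_j / (d_eff * r_ij)


def e_h_attractive(eps_ij, sigma_ij, r_ij):
    """Attractive 12-10 branch of the hydrophobic term (first case of SI Eq. 2)."""
    s = sigma_ij / r_ij
    return eps_ij * (5.0 * s ** 12 - 6.0 * s ** 10)


# Hypothetical Lys-Asp pair separated by 8.2 Å.
print(e_elec(CHARGE["LYS"], CHARGE["ASP"], 8.2))
```

The 12-10 branch evaluates to −ε_ij at r_ij = σ_ij, which is a quick sanity check on the coefficients 5 and 6.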

2.2. Structural flexibility of individual proteins. A modest amount of structural flexibility

(within individual proteins) is allowed for and defined by a structure-based Gō-type model as

described previously (Yang et al., 2006), where each residue was simplified by a single bead at

its Cα position. See details in our previous publication (Ravikumar et al., 2012).

2.2. Langevin-based simulations. The RotPPR sampling was implemented as a Langevin-based coarse-grained simulation using CHARMM (Brooks et al., 1983). The simulations were carried out at a temperature of 300 K with a friction coefficient of 50 ps⁻¹. A time step of 0.01 ps was used. Each PPR cycle lasted 20 ns. Coordinates were saved every 500 ps during the last 5 ns of the release part of each PPR cycle. A series of 10 PPR cycles was performed from each starting pose.
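The simulation settings above imply the following bookkeeping, worked out here in integer femtoseconds to avoid rounding. The variable names are hypothetical, and the frame count assumes that exactly one endpoint of the 5 ns saving window is included.

```python
TIME_STEP_FS = 10             # 0.01 ps
CYCLE_LENGTH_FS = 20_000_000  # 20 ns per PPR cycle
SAVE_INTERVAL_FS = 500_000    # coordinates saved every 500 ps
SAVE_WINDOW_FS = 5_000_000    # last 5 ns of the release part
CYCLES_PER_POSE = 10
N_POSES = 5 ** 5              # starting poses for n = 5

steps_per_cycle = CYCLE_LENGTH_FS // TIME_STEP_FS                   # integration steps per cycle
frames_per_cycle = SAVE_WINDOW_FS // SAVE_INTERVAL_FS               # saved frames per cycle
time_per_pose_ns = CYCLES_PER_POSE * CYCLE_LENGTH_FS // 1_000_000   # ns simulated per pose
total_frames = N_POSES * CYCLES_PER_POSE * frames_per_cycle         # frames over all poses

print(steps_per_cycle, frames_per_cycle, time_per_pose_ns, total_frames)
```

The 200 ns per starting pose obtained this way matches the total quoted in the caption of SI Figure 2.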

2.3. Structure Clustering. The clustering scheme of grouping similar structures together is

based on two specific RMSD metrics (defined below). It consists of two phases: assignment and

merging. In the assignment phase, each structure is assigned to a cluster if its shortest distance

(dm) to its respective cluster center is within a cutoff distance (Dmax). Here, the center was

defined as the "averaged" structure from all the members within the corresponding cluster. A

new cluster is created if dm > Dmax, which is achieved in a similar fashion to the widely used

nonhierarchical clustering algorithm based on a self-organizing neural net (Feig et al., 2004;

Karpen et al., 1993). The center of each cluster is updated whenever a new member is

assigned to the cluster. In the merging phase, two clusters are merged only if all the members

of one cluster are within the cut-off distance to another cluster's center. Subsequently, the

information about centers of new or reassigned clusters is updated accordingly after merging.

This process of assignment and merging is repeated until fewer than 1% of all structures change their membership relative to the previous iteration. To measure the distance of each structure to its cluster center, two specific distance metrics, namely fRMSD and oRMSD, were used as follows.
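The two-phase scheme described above can be sketched in Python on generic points. This is a simplified stand-in, not the authors' implementation: plain Euclidean distance between coordinate tuples replaces fRMSD or oRMSD, the cluster center is the arithmetic mean of the members, and all function names and the iteration cap are hypothetical.

```python
import math


def _mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(c) / len(vectors) for c in zip(*vectors)]


def _recenter(points, labels):
    """Renumber clusters as 0..K-1 and recompute each center from its members."""
    remap = {old: new for new, old in enumerate(sorted(set(labels)))}
    labels = [remap[l] for l in labels]
    centers = [_mean([p for p, l in zip(points, labels) if l == j])
               for j in range(len(remap))]
    return labels, centers


def _merge(points, labels, centers, d_max):
    """Merge cluster a into b when every member of a lies within d_max of b's center."""
    merged = True
    while merged:
        merged = False
        for a in range(len(centers)):
            for b in range(len(centers)):
                if a != b and all(math.dist(p, centers[b]) <= d_max
                                  for p, l in zip(points, labels) if l == a):
                    labels = [b if l == a else l for l in labels]
                    labels, centers = _recenter(points, labels)
                    merged = True
                    break
            if merged:
                break
    return labels, centers


def cluster(points, d_max, tol=0.01, max_iter=50):
    """Repeat assignment and merging until fewer than tol of the labels change."""
    labels, centers = [-1] * len(points), []
    for _ in range(max_iter):
        previous = list(labels)
        for i, p in enumerate(points):          # assignment phase
            dists = [math.dist(p, c) for c in centers]
            if dists and min(dists) <= d_max:
                j = dists.index(min(dists))     # join the nearest existing cluster
            else:
                centers.append(list(p))         # d_m > D_max: start a new cluster
                j = len(centers) - 1
            labels[i] = j
            centers[j] = _mean([q for q, l in zip(points, labels) if l == j])
        labels, centers = _recenter(points, labels)
        labels, centers = _merge(points, labels, centers, d_max)  # merging phase
        if sum(a != b for a, b in zip(labels, previous)) / len(points) < tol:
            break
    return labels, centers


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
print(cluster(pts, d_max=1.0)[0])
```

Swapping `math.dist` for an RMSD function, and `_mean` for an averaged structure, would bring the sketch closer to the procedure in the text.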

2.3.1. fRMSD. The traditional RMSD (referred to hereafter as fRMSD defined for an entire "full"

protein-protein complex) was used as a distance metric. Specifically, the fRMSD value for two

structures u and v was calculated after the alignment of their Cα atoms by

\[
\mathrm{fRMSD} = \sqrt{\frac{\sum_i \left(\mathbf{x}_i^{u} - \mathbf{x}_i^{v}\right)^2}{N_A + N_B}}
\tag{3}
\]


where i's are the indices of Cα atoms over the entire protein-protein complex. N_A and N_B are the numbers of Cα atoms in the two proteins (say, protein A and protein B), respectively, and x_i^u and x_i^v are the position vectors of the ith Cα atom in structures u and v, respectively.

2.3.2. oRMSD. In contrast with fRMSD, an orientation-specific RMSD metric, hereafter termed

oRMSD, was used to account for the relative, mutual orientation between two proteins.

Specifically, the oRMSD value between two structures u and v is defined by

\[
\mathrm{oRMSD} = \sqrt{\frac{N_A\,\mathrm{RMSD}_A^{2} + N_B\,\mathrm{RMSD}_B^{2}}{N_A + N_B}}
\tag{4}
\]

where RMSD_A is the structural distance of protein A between the two structures (u, v), calculated after the alignment is performed on the basis of protein B alone,

\[
\mathrm{RMSD}_A = \sqrt{\frac{\sum_j \left(\mathbf{x}_j^{u} - \mathbf{x}_j^{v}\right)^2}{N_A}}
\tag{5}
\]

where j's are the indices of Cα atoms in protein A. The same procedure is applied to protein B, whose RMSD_B value is given by

\[
\mathrm{RMSD}_B = \sqrt{\frac{\sum_k \left(\mathbf{x}_k^{u} - \mathbf{x}_k^{v}\right)^2}{N_B}}
\tag{6}
\]

where k's are the indices of Cα atoms of protein B. Thus, the oRMSD value is defined by

explicitly accounting for the relative orientations between protein A and protein B (SI Eq. 4).
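The two metrics of SI Eqs. 3-6 can be compared with a short NumPy sketch. It assumes that the "alignment" is a least-squares rigid superposition (done here with the Kabsch algorithm via SVD) and that the Cα coordinates are stored as an array with protein A's atoms first; both are assumptions, and the function names are hypothetical.

```python
import numpy as np


def superpose(coords, fit_idx, reference):
    """Rigidly move all of coords so that its fit_idx atoms best match reference's."""
    mobile, target = coords[fit_idx], reference[fit_idx]
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    u, _, vt = np.linalg.svd((mobile - mc).T @ (target - tc))
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0  # exclude improper rotations
    rot = u @ np.diag([1.0, 1.0, d]) @ vt            # rotation for row vectors
    return (coords - mc) @ rot + tc


def rmsd(x, y):
    """Root-mean-square deviation between two already-superposed coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((x - y) ** 2, axis=1))))


def frmsd(u, v):
    """SI Eq. 3: RMSD after aligning the full complex."""
    return rmsd(superpose(u, np.arange(len(u)), v), v)


def ormsd(u, v, n_a):
    """SI Eq. 4: RMSD of A after aligning on B, combined with RMSD of B after aligning on A."""
    a, b = np.arange(n_a), np.arange(n_a, len(u))
    n_b = len(u) - n_a
    rmsd_a = rmsd(superpose(u, b, v)[a], v[a])  # SI Eq. 5
    rmsd_b = rmsd(superpose(u, a, v)[b], v[b])  # SI Eq. 6
    return float(np.sqrt((n_a * rmsd_a ** 2 + n_b * rmsd_b ** 2) / (n_a + n_b)))


# Hypothetical 4+4 bead complex; protein B is rotated 90° about its own center.
rng = np.random.default_rng(0)
u = rng.normal(size=(8, 3)) * 5.0
rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
w = u.copy()
center_b = u[4:].mean(axis=0)
w[4:] = (u[4:] - center_b) @ rz.T + center_b
print(frmsd(u, w), ormsd(u, w, n_a=4))
```

When both proteins are internally rigid, as in this toy example, oRMSD is larger than fRMSD because the global alignment cannot absorb a change in relative orientation.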

2.4. Energetic stability. After the post-simulation clustering analyses, each conformational

cluster was assigned an effective energy Eeff by

(7)


where EPPI is the protein-protein interaction energy of each configuration (see SI Eq. 2). Note that within each cluster, all configurations were weighted according to a Boltzmann distribution, so Eeff effectively measures the relative energetics across different conformational clusters, conceptually similar to what has been used in WHAM (Kumar et al., 1995; Roux, 1995).


References

Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., Karplus, M., 1983. CHARMM - a Program for Macromolecular Energy, Minimization, and Dynamics Calculations. J Comput Chem 4, 187-217.

Feig, M., Karanicolas, J., Brooks, C.L., 3rd, 2004. MMTSB Tool Set: enhanced sampling and multiscale modeling methods for applications in structural biology. J Mol Graph Model 22, 377-395.

Huang, W., Ravikumar, K.M., Yang, S., 2014. A Newfound Cancer-Activating Mutation Reshapes the Energy Landscape of Estrogen-Binding Domain. J Chem Theory Comput 10, 2897-2900.

Kim, Y.C., Hummer, G., 2008. Coarse-grained models for simulations of multiprotein complexes: application to ubiquitin binding. J Mol Biol 375, 1416-1433.

Kumar, S., Rosenberg, J.M., Bouzida, D., Swendsen, R.H., Kollman, P.A., 1995. Multidimensional Free-Energy Calculations Using the Weighted Histogram Analysis Method. J Comput Chem 16, 1339-1350.

Miyazawa, S., Jernigan, R.L., 1996. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J Mol Biol 256, 623-644.

Roux, B., 1995. The Calculation of the Potential of Mean Force Using Computer-Simulations. Comput Phys Commun 91, 275-282.

Yang, S., Onuchic, J.N., Levine, H., 2006. Effective stochastic dynamics on a protein folding energy landscape. J Chem Phys 125, 054910.

