SUPPORTING INFORMATION
FOR
Theoretical modeling of multiprotein complexes by iSPOT: Integration of small-angle X-ray
scattering, hydroxyl radical footprinting, and computational docking
Wei Huang1*, Krishnakumar M. Ravikumar1*, Marc Parisien2, and Sichun Yang1
SI Figure 1. A combined rotational-translational RotPPR method for MD simulations.
(A) A rotationally uniform pose generator. On the left is a set of uniform grid points on a
protein's surface, based on a recent concept that was originally developed for rigid-body docking.
Shown are 25 Fibonacci grid points on a protein's surface, each representing the projection
of the center-of-mass of one protein on the surface of another protein. For comparison,
shown on the right is a non-uniform distribution of the same number of grid points (25) generated
using traditional equal-angular meshing, where more points are concentrated near the north and
south poles.
(B) Translation-centric PPR sampling. In each repeated pull-push-release (PPR) cycle, a
biasing force is imposed via a harmonic spring between the centers-of-mass to drive the two
proteins along a target distance Rt that alternates between its upper and lower bounds (Rmax
and Rmin, respectively). Below a cutoff distance Rc, this biasing force is removed and the system
is released for free brute-force simulations (marked as green regions).
(C) Combination of the PPR sampling and the rotationally uniform pose generator. Illustrated on
one protein's surface, a brute-force simulation starts from a pose generated by the rotationally
uniform pose generator (shown as the projection of the center-of-mass of one protein on the
other protein's surface, marked by a red circle) and samples only a limited region around its
starting point. The RotPPR method, which combines the translation-centric PPR strategy with
the rotational pose generator, enables extensive conformational sampling.
SI Figure 2. Rapid convergence of exhaustive RotPPR search. (A) Projection of the centers-of-
mass of FKBP12 on the surface of TGFβ (front and back views), where each simulation
snapshot is represented by a blue dot. (B) The reverse projection of the center of TGFβ on
the surface of FKBP12. (C) A subset of configurations (those making contacts in the region
marked by a red circle in panel B) is reversely projected, showing that both proteins' rotations
are extensively sampled. (D) The sampling starts to converge as a function of the number of
PPR cycles run from each starting pose; each cycle lasts 20 ns, for a total of 200 ns per PPR run.
This analysis was performed by dividing the first protein's surface into a set of grids, each with
a 20° angular separation, projecting the structures within each grid onto the second protein's
surface at the same 20° resolution, and then counting how many grids are sampled on the second
surface. For clarity, only 5% of the total simulation data are shown in (A) and (B).
SI Figure 3. Comparison of fRMSD and oRMSD clustering. Ensemble structures of the same
cluster (the one nearest the TGFβ-FKBP12 crystal structure) are shown when fRMSD (A, B) and
oRMSD (C, D) clustering are used. Within the same cluster, all structures are aligned in two
different ways: on the full complex (A, C) and on FKBP12 alone (in blue; B, D). Compared with
the fRMSD clustering results (A, B), oRMSD clustering (C, D) appears to yield a sharper,
better-defined structure ensemble with respect to the relative orientation between the proteins.
The same cutoff value of 8 Å was used for both the fRMSD and oRMSD clustering algorithms.
SI Figure 4. Assessment of sparseness in footprinting data coverage. (A) All surface residues
(Nfp=116) used for modeling are separated into two categories: buried in the interface (Nin=16; in
red) and outside the interface (Nout=98; in blue). (B) Half of all residues are randomly selected
from each category (f=50%). (C) A strong correlation is found, with a high correlation
coefficient of ρ=0.9, when only half of the data points (a fraction f=50%, i.e., 57 residues) were
used. (D) The histogram-frequency distribution of the correlation coefficient ρ for three different
fractions, f=75%, f=50%, and f=25%. Each histogram was calculated from a set of 10,000 random
sample sets of residues. (E) A plot of ρ values (mean and s.d.) calculated using various fractions
of Nin and Nout.
1. Exhaustive RotPPR search in MD-based docking simulations
1.1. Translation-centric PPR sampling. Following our published protocol (Ravikumar et al.,
2012), we used a pull-push-release (PPR) sampling scheme for coarse-grained docking
simulations of two individual proteins, where each protein has a high-resolution structure
available. The PPR scheme focuses on sampling the translational degrees of freedom (DoFs)
between the proteins along the center-of-mass distance R6 (shown in Fig. 2). The PPR works by
repeating the following three
steps (illustrated in SI Fig. 1): (i) pull the two proteins away from each other based on a given
starting configuration, (ii) push them closer together to dock, and (iii) release the push-pull bias
so that the two proteins can interact freely with one another. For the pull and push portions of
each PPR cycle, a biasing potential EPPR is imposed on the center-of-mass distance R6 between
the two proteins as follows,
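\[
E_{PPR} =
\begin{cases}
0, & \text{if } R_t \le R_c \text{ and } r_m < 7.6\ \text{Å}, \\
k_{PPR}\,(R_6 - R_t)^2, & \text{if } R_t > R_c,
\end{cases}
\tag{1}
\]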
where Rt is the target distance that oscillates linearly within the confined range Rmin ≤ Rt ≤ Rmax,
and this Rt distance is closely followed by the instantaneous distance R6 (shown in SI Fig. 1)
before entering the release portion. A cutoff distance Rc was used to define the entrance and
exit of the unbiased simulation region, and a minimum distance rm between any pair of
residues was used to define when docking contacts are made. In the simulations, a force
constant of kPPR = 100 kcal/(mol Å) was used. Rmax, Rc, and Rm were chosen on a case-by-case
basis depending on the size of the protein complex. For each complex, at least two PPR cycles
were used during the RotPPR sampling.
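The biasing potential of SI Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration, not the CHARMM implementation used for the simulations. The function and argument names are hypothetical, and the release cutoff `r_c` must be supplied by the caller because it was chosen per complex. Eq. 1 does not cover the case where the target distance is below Rc but no residue pair is yet within 7.6 Å; the sketch assumes the bias stays on in that case.

```python
K_PPR = 100.0          # force constant from the text, kcal/(mol Å) as stated there
CONTACT_CUTOFF = 7.6   # Å; threshold on the minimum inter-residue distance r_m


def ppr_bias_energy(r6, r_t, r_c, r_m, k_ppr=K_PPR):
    """Biasing energy on the center-of-mass distance r6 (all distances in Å).

    r_t: current target distance; r_c: release cutoff (chosen per complex);
    r_m: current minimum distance between any pair of residues.
    """
    released = r_t <= r_c and r_m < CONTACT_CUTOFF
    if released:
        return 0.0  # release portion: unbiased simulation
    # Pull/push portion (R_t > R_c). ASSUMPTION: the bias also stays on when
    # R_t <= R_c but no docking contact has formed, a case Eq. 1 does not cover.
    return k_ppr * (r6 - r_t) ** 2


# Biased (pull/push) example, then a released example.
print(ppr_bias_energy(r6=30.0, r_t=28.0, r_c=20.0, r_m=12.0))
print(ppr_bias_energy(r6=18.0, r_t=18.5, r_c=20.0, r_m=5.0))
```

Keeping the release test in a single boolean makes it easy to swap in a different rule for the unspecified case.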
1.2. A rotationally uniform pose generator. In contrast to the traditional angular mesh-grid used
in most protein-ligand docking, a set of uniformly distributed initial poses was generated for
protein-protein docking simulations, spanning five rotational degrees of freedom (i.e., ω1 –
ω5 shown in Fig. 2) as follows:
1. The center of each protein was fixed at the origin and its principal axes were aligned
along the global Cartesian coordinate axes.
2. The largest principal axis of each protein (aligned to the x-direction; marked by an arrow
in Fig. 2) is rotated to point toward each of a set of uniformly distributed grid points on a
unit-radius spherical surface (illustrated in SI Fig. 1A). These points, often known as
Fibonacci grids, were generated on the surface at polar and azimuthal angles (Θ, φ)
determined by an integer index k and the golden ratio Φ = (1+√5)/2. The index k runs over
(-a, -a+1, -a+2, …, -1, 0, 1, …, a-2, a-1, a), where a = (n²-1)/2 [21-23]. The pose generation
results in a total of n² poses for each protein, independently. Any resulting steric clashes
between the two proteins are removed by translating the second protein away along the
x-axis. In combination, a total of n⁴ starting poses were generated for docking simulations,
covering the four rotational DoFs (i.e., ω1 – ω4 shown in Fig. 2).
3. The rotational DoF between the two proteins (ω5 shown in Fig. 2) was divided n times
as well. As such, a grand total of n⁵ starting poses are generated over all five rotational
DoFs. For all simulations presented in this work, the value of n=5 was used,
resulting in a grand total of n⁵ = 3,125 starting poses.
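The grid construction and the pose count can be illustrated with a short Python sketch. The angle formula used here is the common Fibonacci-lattice form (latitude arcsin(2k/(2a+1)), longitude 2πk/Φ). It is an assumption made for illustration and may differ from the exact convention used for the simulations. The function name `fibonacci_grid` is hypothetical.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2


def fibonacci_grid(n):
    """Return n*n unit vectors spread nearly uniformly over a sphere.

    ASSUMPTION: standard Fibonacci-lattice angles; n*n must be odd so that
    the index k can run symmetrically over -a..a with a = (n*n - 1) / 2.
    """
    if (n * n) % 2 == 0:
        raise ValueError("n*n must be odd for a symmetric index range")
    a = (n * n - 1) // 2
    points = []
    for k in range(-a, a + 1):
        latitude = math.asin(2 * k / (2 * a + 1))
        longitude = 2 * math.pi * k / GOLDEN_RATIO
        points.append((
            math.cos(latitude) * math.cos(longitude),
            math.cos(latitude) * math.sin(longitude),
            math.sin(latitude),
        ))
    return points


n = 5
grid = fibonacci_grid(n)
poses_per_protein = len(grid)                 # n^2 orientations of one protein
starting_poses = poses_per_protein ** 2 * n   # n^4 pairs times n steps in ω5
print(poses_per_protein, starting_poses)
```

With n = 5 the index k runs from -12 to 12, which reproduces the 25 grid points per protein and the 3,125 starting poses quoted above.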
2. Energy function of RotPPR docking simulations
2.1. Predictive coarse-grained (CG) modeling. Following Ravikumar et al. (2012), we defined
the energy function between two interacting proteins (EPPI), where each
residue is coarse-grained and represented by a single bead at its Cα position. This CG
modeling is predictive because it does not use prior knowledge of the protein complex
structure. Specifically, EPPI includes contributions from electrostatic (Eelec) and hydrophobic
(EH) interactions (Kim and Hummer, 2008; Ravikumar et al., 2012),
\[
E_{PPI} = \sum_{i,j} \left[ E_{elec}(i,j) + E_{H}(i,j) \right],
\]
where \(E_{elec} = q_i q_j / (4\pi\epsilon_0 D_{eff}\, r_{ij})\) (\(q_i\) is the charge of residue i, and \(\epsilon_0\) is
the vacuum permittivity). An effective dielectric coefficient \(D_{eff} = D_s e^{r_{ij}/\xi}\) was used with
Ds = 10 and ξ = 8.2 Å. Residue charges of \(q_i\) = +e for Lys and Arg, +0.5e for His, and −e for Asp
and Glu were used to mimic the condition at pH = 7. Hydrophobic interactions (EH) were modeled
either as an attractive interaction or as a purely repulsive interaction (first and second lines
of SI Eq. 2, respectively), given by
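\[
E_H =
\begin{cases}
\varepsilon_{ij}\left[5\left(\sigma_{ij}/r_{ij}\right)^{12} - 6\left(\sigma_{ij}/r_{ij}\right)^{10}\right] \\[4pt]
\varepsilon_{ij}\,5\left(\sigma_{ij}/r_{ij}\right)^{12}\left[1 - \exp\left(-\dfrac{(r_{ij}-\sigma_{ij})^{4}}{2}\right)\right]
\end{cases}
\tag{2}
\]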
where \(\varepsilon_{ij} = \alpha\,(e_{ij}^{MJ} + \beta)\) is based on the Miyazawa-Jernigan statistical energy \(e_{ij}^{MJ}\)
between residues i and j (Miyazawa and Jernigan, 1996). We used d = 3.8 Å and
\(\sigma_{ij} = \gamma\,(r_i + r_j)/2\), where \(r_i\) is the van der Waals radius of coarse-grained residue i. The
values α = 0.4, β = 1.3, and γ = 1.25 were used, as in previous publications (Huang et al., 2014;
Huang et al., 2013; Ravikumar et al., 2012). The parameters \(\varepsilon_{ij}\) and \(\sigma_{ij}\) for pair-wise residue
interactions were described previously
(Huang et al., 2014).
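As an illustration of the electrostatic term, the following Python sketch evaluates Eelec for one residue pair with the distance-dependent dielectric Deff = Ds exp(rij/ξ). It assumes energies in kcal/mol and distances in Å, so that 1/(4πε0) takes the standard value of about 332.06 kcal Å/(mol e²). That unit choice, the three-letter residue keys, and the function names are assumptions of the sketch.

```python
import math

COULOMB_CONSTANT = 332.0637   # 1/(4*pi*eps0) in kcal Å / (mol e^2); assumed units
D_S = 10.0                    # dielectric prefactor from the text
XI = 8.2                      # Å, screening length from the text

# Residue charges (units of e) mimicking pH 7; all other residues are neutral.
CHARGES = {"LYS": 1.0, "ARG": 1.0, "HIS": 0.5, "ASP": -1.0, "GLU": -1.0}


def effective_dielectric(r_ij):
    """Distance-dependent dielectric D_eff = D_s * exp(r_ij / xi)."""
    return D_S * math.exp(r_ij / XI)


def elec_energy(res_i, res_j, r_ij):
    """Screened Coulomb energy between two coarse-grained residues (kcal/mol)."""
    q_i = CHARGES.get(res_i, 0.0)
    q_j = CHARGES.get(res_j, 0.0)
    return COULOMB_CONSTANT * q_i * q_j / (effective_dielectric(r_ij) * r_ij)


# The attraction between opposite charges decays quickly with distance.
for r in (5.0, 8.2, 15.0):
    print(r, elec_energy("LYS", "ASP", r))
```

Because Deff grows exponentially with rij, the interaction is effectively short-ranged beyond a few multiples of ξ.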
2.2. Structural flexibility of individual proteins. A modest amount of structural flexibility
(within individual proteins) is allowed, as defined by a structure-based Gō-type model
described previously (Yang et al., 2006), in which each residue is represented by a single bead
at its Cα position. See details in our previous publication (Ravikumar et al., 2012).
2.2. Langevin-based simulations. The RotPPR sampling was implemented as a Langevin-
based coarse-grained simulation using CHARMM (Brooks et al., 1983). The simulations were
carried out at a temperature of 300 K with a friction coefficient of 50 ps⁻¹. A time-step of 0.01 ps
was used. Each PPR cycle lasted 20 ns. Coordinates were saved every 500 ps during the last 5
ns of the release part of each PPR cycle. A series of 10 PPR cycles was performed from each
starting pose.
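The settings above imply the following bookkeeping, worked out here in Python with integer femtoseconds to avoid rounding. The snapshot count assumes exactly one saved frame per 500 ps interval in the 5 ns window. The variable names are illustrative.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000

time_step_fs = 10                     # 0.01 ps
cycle_length_fs = 20 * FS_PER_NS      # 20 ns per PPR cycle
save_interval_fs = 500 * FS_PER_PS    # coordinates saved every 500 ps
save_window_fs = 5 * FS_PER_NS        # last 5 ns of the release part
cycles_per_pose = 10

steps_per_cycle = cycle_length_fs // time_step_fs
snapshots_per_cycle = save_window_fs // save_interval_fs  # ASSUMPTION: one frame per interval
snapshots_per_pose = snapshots_per_cycle * cycles_per_pose
time_per_pose_ns = cycles_per_pose * cycle_length_fs // FS_PER_NS

print(steps_per_cycle, snapshots_per_cycle, snapshots_per_pose, time_per_pose_ns)
```

The 200 ns per starting pose obtained this way matches the total per PPR run quoted in the caption of SI Figure 2.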
2.3. Structure Clustering. The clustering scheme of grouping similar structures together is
based on two specific RMSD metrics (defined below). It consists of two phases: assignment and
merging. In the assignment phase, each structure is assigned to a cluster if its shortest distance
(dm) to its respective cluster center is within a cutoff distance (Dmax). Here, the center was
defined as the "averaged" structure from all the members within the corresponding cluster. A
new cluster is created if dm > Dmax, which is achieved in a similar fashion to the widely used
nonhierarchical clustering algorithm based on a self-organizing neural net (Feig et al., 2004;
Karpen et al., 1993). The center of each cluster is updated whenever a new member is
assigned to the cluster. In the merging phase, two clusters are merged only if all the members
of one cluster are within the cut-off distance to another cluster's center. Subsequently, the
information about centers of new or reassigned clusters is updated accordingly after merging.
This process of assignment and merging is repeated until fewer than 1% of the total
structures change membership relative to the previous iteration. To measure the
distance of each structure to its cluster center, two specific distance metrics, namely fRMSD
and oRMSD, were used as follows.
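The two phases can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the clustering code used for the analysis. Structures are stand-in coordinate tuples compared with a plain Euclidean distance, where the actual analysis uses fRMSD or oRMSD. The cluster center is the coordinate-wise mean of its members. Only one assignment pass and one merging pass are shown, and the outer repeat-until-converged loop is omitted. The names `assign`, `merge`, and `centroid` are hypothetical.

```python
import math


def euclidean(a, b):
    """Placeholder distance metric (the analysis uses fRMSD or oRMSD)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(members):
    """'Averaged' structure: coordinate-wise mean of all members."""
    n = len(members)
    return tuple(sum(col) / n for col in zip(*members))


def assign(structures, d_max, dist=euclidean):
    """Assignment phase: join the nearest cluster if d_m <= d_max, else start a new one."""
    clusters, centers = [], []
    for s in structures:
        if centers:
            d_m, j = min((dist(s, c), idx) for idx, c in enumerate(centers))
        if not centers or d_m > d_max:
            clusters.append([s])
            centers.append(tuple(s))
        else:
            clusters[j].append(s)
            centers[j] = centroid(clusters[j])  # update center on every new member
    return clusters


def merge(clusters, d_max, dist=euclidean):
    """Merging phase: merge cluster a into b if ALL members of a lie within d_max of b's center."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(len(clusters)):
                if a == b:
                    continue
                center_b = centroid(clusters[b])
                if all(dist(s, center_b) <= d_max for s in clusters[a]):
                    clusters[b].extend(clusters[a])
                    del clusters[a]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return clusters


points = [(0.0, 0.0), (2.5, 0.0), (1.0, 0.0), (1.2, 0.0)]
first_pass = assign(points, d_max=2.0)
print(len(first_pass), len(merge(first_pass, d_max=2.0)))
```

In this toy input the second point initially founds its own cluster, but once the first cluster's center has shifted toward it, the merging pass absorbs it. This order dependence is what the merging phase is designed to correct.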
2.3.1. fRMSD. The traditional RMSD (referred to hereafter as fRMSD defined for an entire "full"
protein-protein complex) was used as a distance metric. Specifically, the fRMSD value for two
structures u and v was calculated after the alignment of their Cα atoms by
\[
\mathrm{fRMSD} = \sqrt{\frac{\sum_i \left(x_i^u - x_i^v\right)^2}{N_A + N_B}}
\tag{3}
\]
where the index i runs over the Cα atoms of the entire protein-protein complex, NA and NB are
the numbers of Cα atoms in the two proteins (say, protein A and protein B), respectively, and
\(x_i^u\) and \(x_i^v\) are the position vectors of the ith Cα atom in structures u and v, respectively.
2.3.2. oRMSD. In contrast with fRMSD, an orientation-specific RMSD metric, hereby termed
oRMSD, was used to account for the relative, mutual orientation between two proteins.
Specifically, the oRMSD value between two structures u and v is defined by
\[
\mathrm{oRMSD} = \sqrt{\frac{N_A\,\mathrm{RMSD}_A^2 + N_B\,\mathrm{RMSD}_B^2}{N_A + N_B}}
\tag{4}
\]
where \(\mathrm{RMSD}_A\) is the structural distance of protein A between the two structures (u, v), with the
alignment performed on the basis of protein B alone,
\[
\mathrm{RMSD}_A = \sqrt{\frac{\sum_j \left(x_j^u - x_j^v\right)^2}{N_A}}
\tag{5}
\]
where the index j runs over the Cα atoms in protein A. The same procedure is applied to
protein B (with the alignment performed on protein A alone), whose \(\mathrm{RMSD}_B\) value is given by
\[
\mathrm{RMSD}_B = \sqrt{\frac{\sum_k \left(x_k^u - x_k^v\right)^2}{N_B}}
\tag{6}
\]
where the index k runs over the Cα atoms of protein B. Thus, the oRMSD value is defined by
explicitly accounting for the relative orientations between protein A and protein B (SI Eq. 4).
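The arithmetic of SI Eqs. 3-6 can be sketched in Python as below. The sketch assumes the superposition step (on the full complex for fRMSD, or on the partner protein for RMSD_A and RMSD_B) has already been carried out upstream, for example with a least-squares fit, so that only the distance sums remain. The function names are hypothetical.

```python
import math


def rmsd_no_fit(coords_u, coords_v):
    """RMSD between two matched lists of (x, y, z) positions, with NO superposition.

    ASSUMPTION: the caller has already aligned structure v onto structure u
    on the appropriate atom set before calling this function.
    """
    if len(coords_u) != len(coords_v) or not coords_u:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(
        (a - b) ** 2
        for p, q in zip(coords_u, coords_v)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords_u))


def ormsd(rmsd_a, rmsd_b, n_a, n_b):
    """Combine the two cross-aligned RMSDs into oRMSD (SI Eq. 4)."""
    return math.sqrt((n_a * rmsd_a ** 2 + n_b * rmsd_b ** 2) / (n_a + n_b))


# Toy numbers only: protein A deviates by 3 Å when aligned on B, and B by 4 Å
# when aligned on A, for two proteins of equal (made-up) size.
print(ormsd(rmsd_a=3.0, rmsd_b=4.0, n_a=100, n_b=100))
```

Applied to all Cα atoms after a full-complex alignment, `rmsd_no_fit` gives fRMSD. Applied to one protein after aligning on the other, it gives the RMSD_A or RMSD_B that enters `ormsd`.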
2.4. Energetic stability. After the post-simulation clustering analyses, each conformational
cluster was assigned an effective energy Eeff by
(7)
where EPPI is the protein-protein interaction energy in each configuration (see SI Eq. 2). Note
that within each cluster, all configurations were weighted based on a Boltzmann distribution, so
Eeff effectively measures the relative energetics across different conformational clusters,
conceptually similar to what has been used in WHAM (Kumar et al., 1995; Roux, 1995).
References
Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., Karplus, M., 1983. CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations. J Comput Chem 4, 187-217.
Feig, M., Karanicolas, J., Brooks, C.L., 3rd, 2004. MMTSB Tool Set: enhanced sampling and multiscale modeling methods for applications in structural biology. J Mol Graph Model 22, 377-395.
Huang, W., Ravikumar, K.M., Yang, S., 2014. A Newfound Cancer-Activating Mutation Reshapes the Energy Landscape of Estrogen-Binding Domain. J Chem Theory Comput 10, 2897-2900.
Kim, Y.C., Hummer, G., 2008. Coarse-grained models for simulations of multiprotein complexes: application to ubiquitin binding. J Mol Biol 375, 1416-1433.
Kumar, S., Rosenberg, J.M., Bouzida, D., Swendsen, R.H., Kollman, P.A., 1995. Multidimensional Free-Energy Calculations Using the Weighted Histogram Analysis Method. J Comput Chem 16, 1339-1350.
Miyazawa, S., Jernigan, R.L., 1996. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J Mol Biol 256, 623-644.
Roux, B., 1995. The Calculation of the Potential of Mean Force Using Computer Simulations. Comput Phys Commun 91, 275-282.
Yang, S., Onuchic, J.N., Levine, H., 2006. Effective stochastic dynamics on a protein folding energy landscape. J Chem Phys 125, 054910.