Point Scattering: A New Geometric Invariant with
Applications From (Nano)Clusters to Biomolecules
ERNESTO ESTRADA
Complex Systems Research Group, X-ray Unit, RIAIDT, Edificio CACTUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
Received 2 May 2006; Revised 10 August 2006; Accepted 16 August 2006
DOI 10.1002/jcc.20541
Published online 16 January 2007 in Wiley InterScience (www.interscience.wiley.com).
Abstract: A new geometric invariant is defined from ‘‘first principles’’ for a point ensemble, which can represent
clusters, molecules, crystals, and biomolecules. The scattering of a point ensemble is defined in terms of the Euclidean
distance matrix and a vector measuring the weighted departure of the points from the cluster centre. Using the Rayleigh–
Ritz theorem this function is maximized obtaining the point scattering of the ensemble. The point scattering shows
several properties which are useful for studying clusters, molecules, crystals, and biomolecules. We examined different
natural clusters of hard spheres such as colloidal particles and fullerenes, as well as protein–peptide complexes and the
effect of temperature on protein structure. In all cases point scattering differentiates point ensembles with different
structures, which are not distinguished by other geometric invariants, such as the second moment of mass distribution,
surface areas, and volumes. Point scattering also shows better correlation with thermodynamic parameters of binding
and describes the interior cavities of hollowed ensembles better than the other geometric measures.
© 2007 Wiley Periodicals, Inc. J Comput Chem 28: 767–777, 2007
Key words: Euclidean distance; discrete geometry; molecular geometry; protein packing; fullerenes; clusters
Introduction
A geometric invariant is a quantity which remains unchanged
under certain classes of transformation, such as the group of
translations or rotations.1 They are useful for comparing objects
because they usually reflect intrinsic properties of objects. Among
the most well-known geometrical invariants we can mention the
perimeter, second moment of mass distribution, surface area, and
volume.1 The last two are particularly of great interest in chemis-
try because of their relationship with the concept of packing,
which is a fundamental and essential characteristic of natural sys-
tems.2,3 In particular these concepts are necessary in understand-
ing protein structure and for uncovering the relationship between
packing and stability as well as for the study of solute hydropho-
bicity.4 A very well known example is provided by the average
packing density inside proteins,5–8 which is as high as in crystalline
solids. It is known that a number of oligomeric proteins,
including acetylcholine receptors, aquaporins, tight junction
occludin and claudins, and gap junction channels, arrange into
densely packed clusters, arrays or strands in the plasma mem-
brane.9–13 The optimal packing of tubes has also received atten-
tion as a model for understanding the way in which a DNA mole-
cule could be packed within a small virus.14–17 Packing is also an
important characteristic of clusters, such as colloidal materials
like photonic crystals and macroporous media.18,19
In macromolecules, such as proteins and nucleic acids, the sur-
face area and volume are commonly used as geometric invari-
ants,6,20,21 which are related to various molecular properties, such
as stability, solubility, crystal packing or molecular recognition.
Some of these measures, such as the hydrophobic surface area,
excluded volume, and radius of gyration, have been incorporated
into energy minimization algorithms used in studies of the three-
dimensional (3D) structure of proteins.22–24 The development of
global optimization procedures for clusters, crystals, and biomole-
cules is of great importance in fields ranging from protein struc-
ture prediction to the design of microprocessor circuitry.25 In
computational geometry, the second moment of mass distribution
is often used as a geometric invariant which is related to packing
density26–30 and has been found to be of great utility in the study
of colloidal clusters obtained from densely packed micro-
spheres.18,19
In this article, we introduce a new measure of packing that
accounts for the scattering of points in a discrete ensemble. This
measure solves some of the problems found with other packing
measures and reveals several other useful properties with which
to study clusters, molecules, crystals, and biomolecules.
Contract/grant sponsor: Ramon y Cajal, Spain.
Correspondence to: E. Estrada; e-mail: [email protected]
Theoretical Approach
Some ‘‘Classical’’ Geometric Invariants
Some of the most widely used geometric invariants are related to
the area and volume of surfaces. There are three definitions of
‘‘surface’’ which are widely used as geometric invariants of
points, particularly molecules, as well as for the definition of
packing measures.31–33 They are the van der Waals (vdW) sur-
face, the solvent accessible surface (SA), and the molecular sur-
face (MS). The first is defined as the surface of what is covered
by the points, i.e., atoms, which are represented by spherical balls
with radii equal to their vdW radii. The second surface is gener-
ated by the center of the solvent, which is modeled by a rigid
sphere, when rolling over the vdW surface of the cluster or mole-
cule. The third surface is generated by the front of the same sol-
vent sphere. The area and volume of such surfaces can then be
determined by approximation techniques or by analytic methods.
One of these methods, which treats volume and area overlaps
fully and accurately, is the ‘‘alpha shape method.’’20
The second moment of mass distribution is defined as $M_2 = \sum_{i=1}^{n} \|r_i - r_0\|^2$, where $r_i$ is the centre coordinate of the ith vertex,
e.g., centre of cluster or atom, and $r_0 = n^{-1} \sum_{i=1}^{n} r_i$ is the centre
of mass of the object, e.g., cluster or molecule, subject to the constraint
$\|r_i - r_j\| \ge 2$ for $i \ne j$. (Here $\|\cdots\|$ denotes the usual Euclidean
distance.) M2 is the sum, divided by n, of the squared distances
between all pairs of points. That is, $M_2 = \frac{1}{n} \sum_{i>j} (r_{ij})^2$,
which geometrically represents the sum of the areas of the squares
formed by the edges of length rij.26 M2 has also been interpreted as
an energy function in determining clusters of hard spheres26 as
well as in communications, where the points represent a constellation
of n signals with total energy M2.34,35 It is known that the clusters
obtained using this energy function are quite different from
those using other potentials, such as Lennard–Jones, for n ≥ 8. The
second moment is intimately related to the radius of gyration, Rgyr,
another parameter which provides information regarding the global
conformation of a system, which is widely used in polymer statistics.36
It can be calculated from the second moment M2 as follows:
$R_{gyr} = (M_2/n)^{1/2}$.
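The two expressions for M2 given above (sum of squared distances to the centroid, and sum of squared pairwise distances divided by n) can be checked against each other numerically. The following is a minimal sketch in Python with NumPy, not the program used in this work; the function names and the test object (four unit-radius spheres on a square of side 2) are chosen here for illustration only.

```python
import numpy as np

def second_moment(coords):
    """M2 = sum_i ||r_i - r_0||^2, with r_0 the centroid of the points."""
    r = np.asarray(coords, dtype=float)
    r0 = r.mean(axis=0)
    return float(np.sum((r - r0) ** 2))

def second_moment_pairwise(coords):
    """M2 = (1/n) * sum_{i>j} r_ij^2, the pairwise form of the same quantity."""
    r = np.asarray(coords, dtype=float)
    n = len(r)
    diff = r[:, None, :] - r[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)       # matrix of squared distances
    return float(sq.sum() / (2.0 * n))    # sq.sum() counts every pair twice

def radius_of_gyration(coords):
    """R_gyr = (M2 / n)^(1/2)."""
    return (second_moment(coords) / len(coords)) ** 0.5

# Illustrative object: centres of four unit-radius spheres on a square of side 2.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
m2_centroid = second_moment(square)
m2_pairs = second_moment_pairwise(square)
r_gyr = radius_of_gyration(square)
print(m2_centroid, m2_pairs, r_gyr)
```

For this square both forms give M2 = 8 (each centre lies at squared distance 2 from the centroid), and the radius of gyration is the square root of 2.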
Definition of Point Scattering
In this work we will use the term ‘‘point’’ to designate a body
whose spatial extent and internal motion and structure, if any, are
irrelevant to the specific problem under study, which is the pack-
ing of points in the discrete object. Consequently, a point here
can be a sphere, colloidal particle or an atom which is part of a
cluster or a molecule, which represents the discrete object.
We start by considering an object O formed by n points. Let us
represent the object by means of a column vector x, whose kth entry captures the relative departure of point k from the geometrical
centre, o, of the object. The entries of the vector x, $x_i$, take values between 0 and 1. They represent a sort of weighted distance
from the center of the object in which the outlying points receive
more weight than the points closest to the center. We impose the
restriction that the norm of this weight vector x be one, i.e., $x^T x = 1$.
Then, based on the Euclidean distances between the pairs of
points in the object, rij, we can define a measure for the spreading
of the points in O in a similar way as used in spectral clustering
techniques37–39:
$$S(O) = \sum_{i=1}^{n} \sum_{j=1}^{n} r_{ij} x_i x_j = x^T D x \qquad (1)$$
where D is the Euclidean distance matrix of the points in the
object. The function S(O) increases with the increase in the sepa-
ration between points as well as with the departure of the points
from the centre of the object. Consequently, we consider S(O) as a measure of the scattering of the points in the discrete object,
which will take minimum values for the least scattered objects.
We are interested in finding the maximum value of the scattering
function, which can be obtained by maximizing the expression
(2). Let {$\lambda_1, \lambda_2, \ldots, \lambda_n$} be the eigenvalues of D in nonincreasing order
and let $x_i$ be the orthonormal eigenvector corresponding
to the ith eigenvalue: $\|x_i\|^2 = x_i^T x_i = 1$.40 Then, according
to the Rayleigh–Ritz theorem,41 we have:
$$S(O) = \max_{x} \{ x^T D x \mid x^T x = 1 \} = \lambda_1(D) \qquad (2)$$
where $\lambda_1(D)$ is the spectral radius or largest eigenvalue of D. This maximum is attained when $x = x_1$, where $x_1$ is the principal eigenvector
of D. It is clear that $\lambda_1(D)$, which is our measure of point
scattering, remains unchanged under translation and rotation. This
geometric measure can also be made invariant to scaling by normalizing
the distances by using a canonical representation of the object.
However, we will not consider this item in the current work.
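The definition above can be illustrated numerically. The sketch below (Python with NumPy, written for illustration and not the in-house program described later) computes S(O) as the largest eigenvalue of the distance matrix, checks that the quadratic form of randomly drawn unit vectors never exceeds it, and checks the behaviour under rotation, translation, and scaling. The random test points, the rotation angle, and all function names are assumptions made here.

```python
import numpy as np

def distance_matrix(coords):
    """Euclidean distance matrix D of a set of points."""
    r = np.asarray(coords, dtype=float)
    return np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)

def point_scattering(coords):
    """S(O): largest eigenvalue of the symmetric matrix D."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return float(np.linalg.eigvalsh(distance_matrix(coords))[-1])

rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 3))          # six arbitrary points in 3D
D = distance_matrix(pts)
S = point_scattering(pts)

# Rayleigh-Ritz: x^T D x <= lambda_1 for every unit vector x.
trial = rng.normal(size=(1000, 6))
trial /= np.linalg.norm(trial, axis=1, keepdims=True)
best_trial = float(np.einsum("ki,ij,kj->k", trial, D, trial).max())

# A rotation about the z axis followed by a translation leaves S unchanged.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
S_moved = point_scattering(pts @ rot.T + np.array([1.0, -2.0, 3.0]))

# Uniform scaling of the coordinates rescales S by the same factor.
S_scaled = point_scattering(2.0 * pts)
print(S, best_trial, S_moved, S_scaled)
```

The last check shows why a normalization of the distances would be needed for scale invariance: doubling all coordinates doubles S(O).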
The interpretation of the principal eigenvector of D as a rela-
tive weighted measure of the departure of a point from the centre
of the object can be understood by means of the following analy-
sis. It is known that the principal eigenvector is proportional to
the row sum of a matrix M formed by summing all powers of the
distance matrix, weighted by corresponding powers of the recip-
rocal of the principal eigenvalue:
$$M = \lim_{n \to \infty} \frac{1}{n} \left( D + \lambda_1^{-1} D^2 + \lambda_1^{-2} D^3 + \cdots + \lambda_1^{-n} D^{n+1} \right). \qquad (3)$$
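Why the row sums of M recover the principal eigenvector can be seen from the spectral decomposition of D. The short derivation below is a sketch added for illustration. It uses the fact that D is symmetric, non-negative, and irreducible for distinct points, so that $\lambda_1$ is a simple eigenvalue and $|\lambda_j| \le \lambda_1$ for all j. Here $\mathbf{1}$ denotes the all-ones vector and m replaces the summation limit called n in Eq. (3), to avoid a clash with the number of points; both are notation introduced here.

```latex
D = \sum_{j=1}^{n} \lambda_j \, x_j x_j^{T}
\quad\Longrightarrow\quad
\lambda_1^{-k} D^{k+1}
  = \sum_{j=1}^{n} \lambda_j \left( \frac{\lambda_j}{\lambda_1} \right)^{k} x_j x_j^{T} .

% Averaging over k = 0, ..., m: terms with |\lambda_j| < \lambda_1 decay,
% and a term with \lambda_j = -\lambda_1 alternates in sign and averages out.
M = \lim_{m \to \infty} \frac{1}{m} \sum_{k=0}^{m} \lambda_1^{-k} D^{k+1}
  = \lambda_1 \, x_1 x_1^{T},
\qquad
M \mathbf{1} = \lambda_1 \left( x_1^{T} \mathbf{1} \right) x_1 \;\propto\; x_1 .
```

The vector of row sums $M\mathbf{1}$ is therefore a positive multiple of $x_1$, which is the proportionality stated in the text.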
Then, we can consider an object formed by three points placed
on a straight line, in such a way that point i is equidistant from
points j and k, i.e., i is placed at the centre of the object. Let
$\sigma_a = \sum_b r_{ab}$ be the sum of distances for the point a in the discrete
object, i.e., the sum of the ath row or column of D. It is obvious
that the minimal value of $\sigma_a$ will be obtained for the point which is
at the centre of the object, which has the lowest entries in D:
$\min_a \sigma_a = \sigma_i$. The same is also true for the different powers of D, so
that the lowest value of the row sum for the matrix M corresponds
to the point at the centre of the object. As the ith row sum
of this matrix is proportional to the ith entry of the principal eigenvector
of D, the lowest entry of $x_1$ corresponds to the
point at the center of the object. As we add more points, for
instance in different orbits from the center, the points more distant
from the center will have larger row sums in M and
larger components in $x_1$. Then, it is clear that $x_1$ is the relative departure
of the points from the centre of the cluster weighted in a way
that the most distant points are weighted more strongly than the
768 Estrada • Vol. 28, No. 4 • Journal of Computational Chemistry
Journal of Computational Chemistry DOI 10.1002/jcc
points closest to the centre. This measure is not appropriate for
comparing points in different clusters because points which are
equidistant from the centre in two different clusters will give the
same values of $x_1$. For instance, the values of $x_1$ for the arrangement
of unit spheres in a square and in a tetrahedron are identical, i.e.,
$x_1 = (0.5, 0.5, 0.5, 0.5)$ for both objects. In these objects the points
are equidistant from the respective centers, but the distances from
the center to the points in both objects are different, which makes
$x_1$ inappropriate for comparing points in the square to points in the
tetrahedron. Consequently, the use of the following measure is
more appropriate for comparing points in different objects. Since
$x_1$ is a $\lambda_1$-eigenvector, one has $x_1(i) = \sum_j r_{ij} x_1(j) / \lambda_1$, so that
$$S(i) = x_1(i)\, \lambda_1(D) \qquad (4)$$
can be considered as a local geometric invariant for
the points of the cluster.
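The three examples used in this argument (three collinear points, the square, and the tetrahedron) are small enough to check directly. The following sketch in Python with NumPy is illustrative only; the coordinates assume unit-radius spheres in contact (nearest-neighbour distance 2), and the function names are hypothetical.

```python
import numpy as np

def distance_matrix(coords):
    r = np.asarray(coords, dtype=float)
    return np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)

def principal_pair(coords):
    """Return (lambda_1, x_1), with x_1 of unit norm and non-negative entries."""
    vals, vecs = np.linalg.eigh(distance_matrix(coords))
    x1 = vecs[:, -1]
    if x1.sum() < 0:              # eigenvectors are only defined up to sign
        x1 = -x1
    return float(vals[-1]), x1

def local_scattering(coords):
    """S(i) = x_1(i) * lambda_1(D), as in Eq. (4)."""
    lam, x1 = principal_pair(coords)
    return lam * x1

# Three collinear points j, i, k with i in the middle.
line = [(-2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
_, x_line = principal_pair(line)

# Square of side 2 and regular tetrahedron of edge 2.
square = [(1, 1, 0), (-1, 1, 0), (-1, -1, 0), (1, -1, 0)]
s = 1.0 / np.sqrt(2.0)
tetra = [(1, 0, -s), (-1, 0, -s), (0, 1, s), (0, -1, s)]
_, x_square = principal_pair(square)
_, x_tetra = principal_pair(tetra)
S_square = local_scattering(square)
S_tetra = local_scattering(tetra)
print(x_line, x_square, x_tetra, S_square, S_tetra)
```

The central point of the line receives the smallest entry of the principal eigenvector. The square and the tetrahedron share the same eigenvector, with all entries equal to 0.5, but differ in S(i): 2 + √2 for each point of the square and 3 for each point of the tetrahedron.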
Another consequence of the current representation of discrete
objects is that M2 can also be calculated from the spectrum of D. It
can be shown that M2 is equal to the half sum of the diagonal entries
of $D^2$ divided by n, i.e., $M_2 = \frac{1}{2n} \mathrm{Tr}(D^2) = \frac{1}{2n} \sum_{i=1}^{n} (D^2)_{ii}$, where Tr stands for
the trace of the matrix. It is well known that $\mathrm{Tr}(D^k) = \sum_{j=1}^{n} (\lambda_j)^k$,
from which it follows that $M_2 = \frac{1}{2n} \sum_{j=1}^{n} (\lambda_j)^2$. The radius of gyration,
Rgyr, is then easily expressed in terms of the spectrum of D
as: $R_{gyr} = \left[ \frac{1}{2n^2} \sum_{j=1}^{n} (\lambda_j)^2 \right]^{1/2}$.
Computational Methods
The calculations of the point scattering measure for the different
objects described in this work were carried out using a Matlab® program
developed in-house. The input for the program is the distance
matrix, which is previously obtained from the Cartesian coordinates of
the points using an implementation in MODESLAB (www.modeslab.com).
The outputs of the Matlab® program are the principal eigenvalue
and eigenvector of the distance matrix, which correspond to the
geometric invariants introduced here. Other geometric invariants were
also calculated in this work for the sake of comparison. The first of
them is the second moment of mass distribution, which was calculated
using the squared eigenvalues of the distance matrix obtained from the
Matlab® program according to: $M_2 = \frac{1}{2n} \sum_{j=1}^{n} (\lambda_j)^2$. Surface areas and
volumes (SA or vdW) were calculated by the grid method
according to the implementation of the Bodor et al.42 approach
used in Hyperchem™. This method uses the atomic radii of Gavezotti.43
In SA calculations the solvent probe radius used was 1.4 Å
and we always used 50 points per cube side. The calculations of SA
surface areas and volumes of the square grid with holes were carried
out using the same grid approach but using hydrogen atoms
with radius equal to 1 Å (unit radius spheres) instead of using the
Gavezotti radius for hydrogen, which is 1.17 Å (this value can be
modified in the file VDWGRID.TXT of Hyperchem™).
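The workflow described in this section (coordinates, then distance matrix, then principal eigenvalue and eigenvector, then M2 from the squared eigenvalues) can be sketched as follows. This is a Python/NumPy illustration written for this text, not the Matlab® program or the MODESLAB implementation, whose internals are not described here. The function name `analyse`, the dictionary keys, and the octahedron test object are assumptions. M2 is computed with the factor 1/(2n) that follows from $M_2 = \frac{1}{2n} \mathrm{Tr}(D^2)$ and is compared with the centroid definition.

```python
import numpy as np

def analyse(coords):
    """Return point scattering, principal eigenvector, and M2 computed two ways."""
    r = np.asarray(coords, dtype=float)
    n = len(r)
    D = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    eigenvalues, eigenvectors = np.linalg.eigh(D)     # ascending eigenvalues
    return {
        "S": float(eigenvalues[-1]),
        # entries of the principal eigenvector share one sign, so abs() fixes it
        "x1": np.abs(eigenvectors[:, -1]),
        "M2_spectral": float(np.sum(eigenvalues ** 2) / (2 * n)),
        "M2_centroid": float(np.sum((r - r.mean(axis=0)) ** 2)),
    }

# Illustrative object: regular octahedron of edge 2 (six unit-radius spheres).
a = np.sqrt(2.0)
octahedron = [(a, 0, 0), (-a, 0, 0), (0, a, 0), (0, -a, 0), (0, 0, a), (0, 0, -a)]
result = analyse(octahedron)
print(result)
```

For the octahedron every row of D sums to 8 + 2√2, which is therefore the point scattering, and both routes give M2 = 12.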
Results and Discussion
Point Scattering in Clusters and Crystals
Clusters of hard spheres can represent a variety of structures
found in nature ranging from pollen grains to crystals and virus
capsids. When applicable, the examples of clusters studied here
can also be considered as crystals in which the spheres are occupying
the lattice points of a space lattice.44 When considering clusters
of hard spheres we will take the centers of the spheres as the
points forming the object. Here we will consider hard spheres,
i.e., noninterpenetrating spheres, of unit radius. Thus, $r_{ij} \ge 2$ for
every pair of points in the object.
Scattering in Clusters with Degenerated Second Moments
As a first example we will consider the optimal clusters in two
dimensions (2D). Graham and Sloane conjectured 15 years ago
that for $n \ne 4$ every optimal packing is (up to limits imposed by
symmetry) a subset of the hexagonal lattice A2, which is generated
by $(1, 0)$ and $(-1/2, \sqrt{3}/2)$.29
The exclusion of the four-point cluster appears to be due to the
well-known fact that the second moment of mass distribution is
unable to differentiate between a square and a rhombic arrangement of
points, as they both have $M_2 = 8a^2$, where a is the radius of the
circles. This result contradicts the intuitive idea that the rhombus,
which is a subset of A2, is more tightly packed than the square,
which is not a subset of the hexagonal lattice. This situation is found
not only for this pair of clusters but for several other pairs of clus-
ters. In Figure 1 we illustrate some of these examples. On the third
and fourth lines of Figure 1 we show two examples that extend this
observation to 3D clusters. They correspond to a pair of clusters of
six spheres forming a rhombic bipyramid (rhombic octahedron) and
a triangular prism. The last example is provided by the pair of clus-
ters with eight spheres forming a cube and a triangular biprism.
The consideration of S(O) clearly differentiates all these pairs
of different clusters according to their packings. First, S(O) indicates that the rhombus has lower point scattering than the square,
as expected from the fact that the first is a subset of the hexagonal
lattice. It also indicates that the prism and the biprism are less
scattered than the rhombic bipyramid and the cube, respectively
(Fig. 1). In all cases the clusters with optimal packing (according
to second moment of mass distribution)26 having 4 (tetrahedron),
5 (triangular bipyramid), 6 (octahedron or square bipyramid), and
8 (dodecadeltahedron or snub disphenoid) nodes show the lowest
values of S(O) (see last column of Fig. 1).
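The square and rhombus case can be reproduced in a few lines. The sketch below (Python with NumPy, for illustration) assumes unit-radius circles in contact, so that both figures have side 2 and the rhombus has a short diagonal of length 2. The coordinates and function names are chosen here.

```python
import numpy as np

def distance_matrix(coords):
    r = np.asarray(coords, dtype=float)
    return np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)

def second_moment(coords):
    r = np.asarray(coords, dtype=float)
    return float(np.sum((r - r.mean(axis=0)) ** 2))

def point_scattering(coords):
    return float(np.linalg.eigvalsh(distance_matrix(coords))[-1])

h = np.sqrt(3.0)
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
rhombus = [(0.0, 0.0), (2.0, 0.0), (1.0, h), (3.0, h)]   # subset of the hexagonal lattice

M2_square, M2_rhombus = second_moment(square), second_moment(rhombus)
S_square, S_rhombus = point_scattering(square), point_scattering(rhombus)
print(M2_square, M2_rhombus)   # both equal 8 for a = 1
print(S_square, S_rhombus)     # approximately 6.83 and 6.80
```

Both arrangements have M2 = 8, while the point scattering is 4 + 2√2 ≈ 6.83 for the square and about 6.80 for the rhombus, so S(O) ranks the rhombus as the less scattered object.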
Square Lattices with Holes
As a second example we build a toy model consisting of a square
lattice of 16 spheres from which we remove two spheres at random.
In total there are 21 different configurations of 14 spheres plus
two holes in this square lattice. In Figure 2 we illustrate the
square lattice with sphere numbering as used here and three configurations
of 14 spheres and two holes. In Table 1 we give the
values of several geometric invariants: second moment, SA surface
areas and volumes, and point scattering, for these
21 configurations of 14 spheres. The vdW surface area and the volume
are the same for all these objects: 175.80 Å² and 58.91 Å³,
respectively. As can be seen, there are four pairs and a triple of
clusters having identical values of SA surface areas, one pair, three
triples, and one quadruple of clusters with identical values of the
SA volumes and three pairs of clusters with degenerate values of
M2. However, there is not a single pair of clusters with identical
values of S(O). The geometric invariants show good linear correlations
with each other. The best linear correlation is observed for
M2 and S(O), with a correlation coefficient > 0.99. In general, S(O) shows excellent correlations with the other geometric invariants
(Table 2). For instance, S(O) has larger correlation coefficients with
the SA surface areas and the volumes than M2.

Figure 1. Clusters of hard spheres with identical values of the second moment of mass distribution (first
two columns), which are differentiated by the point scattering. The last column shows the optimal packing
for the corresponding number of spheres according to Sloane et al.30
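The count of 21 configurations and the corresponding values of M2 and S(O) for the square lattice with two holes can be regenerated with a short enumeration. The sketch below (Python with NumPy) is an illustration under stated assumptions: spheres are numbered 1 to 16 row by row, neighbouring centres are 2 apart, and two configurations count as the same when one is mapped onto the other by a rotation or reflection of the square. All function names are hypothetical.

```python
import itertools
import numpy as np

SIDE = 4          # 4 x 4 lattice
SPACING = 2.0     # unit-radius spheres in contact

def symmetry_images(pair):
    """Images of a pair of cells under the 8 symmetries of the square."""
    cells = [divmod(i - 1, SIDE) for i in pair]       # (row, col), 0-based
    images = set()
    for transpose, flip_r, flip_c in itertools.product((False, True), repeat=3):
        img = []
        for r, c in cells:
            if transpose:
                r, c = c, r
            if flip_r:
                r = SIDE - 1 - r
            if flip_c:
                c = SIDE - 1 - c
            img.append(r * SIDE + c + 1)
        images.add(tuple(sorted(img)))
    return images

def invariants(holes):
    """Return (S(O), M2) for the lattice with the given spheres removed."""
    pts = np.array([(SPACING * ((i - 1) % SIDE), SPACING * ((i - 1) // SIDE))
                    for i in range(1, SIDE * SIDE + 1) if i not in holes])
    D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    S = float(np.linalg.eigvalsh(D)[-1])
    M2 = float(np.sum((pts - pts.mean(axis=0)) ** 2))
    return S, M2

# One representative (the smallest label) per symmetry class.
classes = sorted({min(symmetry_images(p))
                  for p in itertools.combinations(range(1, SIDE * SIDE + 1), 2)})
results = {holes: invariants(holes) for holes in classes}

for holes, (S, M2) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(holes, round(S, 3), round(M2, 2))
```

The enumeration yields 21 classes. With this numbering the 1,4 cluster (two corner holes on the same side) has M2 = 124 − 36/14 ≈ 121.43, and its point scattering is lower than that of the 6,11 cluster (two central holes), in line with the ordering shown in Figure 2.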
Hollowed Clusters: Fullerenes and Virus Capsids
The third example of clusters to be studied here is that of carbon
clusters or fullerenes. A fullerene is a closed cage formed by n carbon atoms distributed in a degree-3 network of pentagons and
hexagons on the surface of a spheroid.45 Consequently, fullerenes
are good representatives of hollow clusters in which the carbon
atoms on the surface create nonpolar cavities of different sizes in
the interior of the cluster.46 In fact, these cavities have been used
to model the penetration of water molecules into the nonpolar interior
of proteins. Vaitheeswaran et al. have shown that up to nine water
molecules can be easily accommodated in the internal cavity of
C180.47 They have shown that the stability of encapsulated water
clusters depends critically on cavity size. The existence of other
endohedral complexes with fullerenes has been observed experi-
mentally. Among the most significant, one can cite the existence
of ‘‘bucky-onions,’’ in which one fullerene is encapsulated inside
another one, such as C60 @ C240 @ C540 . . . ,48 or the existence
of endohedral metal encapsulated fullerenes, such as Sc2 @ C84
or Sc3 @ C82.49 Thus, the study of the geometric invariants of
these carbon clusters is of great importance for understanding
these inclusion phenomena, which critically depend on the rela-
tion between the size of the cluster and the size of the interior cavity.
Fullerenes also represent interesting models of viral protective protein
shells (capsids), as it is well known that there are numerous
roughly spherical viruses whose capsids display perfect icosahedral
symmetry.50 For instance, the tobacco ringspot virus capsid is composed
of 60 copies of a 513-amino-acid capsid protein, each of which
corresponds to one of the atoms in C60 ‘‘buckminsterfullerene.’’51

Figure 2. Square grids of spheres of unit radius numbered from 1 to 16 (A). Three clusters of 14
spheres, which are subsets of the square grid, having the lowest (B) (1,4-cluster), intermediate (C) (1,7-
cluster), and the largest scattering (D) (6,11-cluster) among the 21 clusters. The numbers indicate the
spheres which were removed according to the numbering given in A.
Here we study the areas and the volumes (vdW and solvent
accessible) as well as the second moment of mass distribution
and point scattering of 19 fullerenes ranging from C20 to C540. In
Table 3 we give the values of the geometric invariants for all these
carbon clusters.
In general there is very good linear correlation between the
different pairs of measures. However, plots of the geometric
invariants versus the size of the carbon clusters reveal interesting
characteristics of the different invariants. In Figure 3 we illustrate
these plots for S(O) and the vdW area and volume.
The behavior of M2 is similar to that of S(O). The solvent accessible area and volume plots are similar to those of the vdW analogues.
As can be seen in Figure 3, surface area and volume increase
linearly with the size of the clusters (these values are plotted on an
inverse scale in Fig. 3). This means that the size of the interior cavities
of these fullerenes increases linearly with the size of the cluster.
However, a different picture is provided by S(O) (as well as by M2),
which shows a nonlinear increase of the scattering as a function of
the cluster size. This indicates that according to S(O) the size of the interior nonpolar cavity of a fullerene increases nonlinearly with the
number of carbon atoms, an observation with important
consequences for the study of inclusion complexes in fullerene
cavities. For instance, while the optimal number of water molecules
that can be encapsulated in C140 is four, giving (H2O)4 @
C140, this number increases up to eight for C180, which leads to
(H2O)8 @ C180.47 Torrens has calculated the volume and area of the
internal cavities of smaller fullerenes, C60, C70, and C82, in which it
is observed that the volume of the internal cavity increases nonlinearly
with the size of the carbon cluster, in agreement with our current
findings even for these small fullerenes.52 These results illustrate
the utility of the point scattering as a geometric invariant which
contains information not duplicated by other invariants.
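The contrast between the nearly linear growth of the surface area and the nonlinear growth of S(O) can be quantified from the tabulated values. The sketch below (Python with NumPy) fits a power law to the Table 3 entries for the icosahedral fullerenes C60 to C540. The choice of this subset and the log-log fit are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

# (number of atoms, S(O), vdW surface area) copied from Table 3, Ih fullerenes only.
rows = [
    (60,   280.98,  397.31),
    (180, 1470.1,  1097.36),
    (240, 2277.8,  1472.53),
    (320, 3476.6,  1917.71),
    (540, 7596.0,  3196.6),
]
n, S, area = np.array(rows, dtype=float).T

# Slope of a straight-line fit in log-log coordinates = power-law exponent.
slope_S = float(np.polyfit(np.log(n), np.log(S), 1)[0])
slope_area = float(np.polyfit(np.log(n), np.log(area), 1)[0])
print(round(slope_S, 2), round(slope_area, 2))
```

For these five entries the fitted exponent is close to 1.5 for S(O) and slightly below 1 for the vdW surface area. An exponent near 3/2 is what a simple heuristic suggests for points spread over a sphere whose area grows in proportion to n: each row sum of D then scales as n times the radius, and the radius as the square root of n. This heuristic is offered here only as a plausibility check.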
Point Scattering in Proteins
The previous analysis of S(O) in clusters of hard spheres provides
several suggestions as to how this measure quantifies the scatter-
ing of a set of points. Models based on sphere packing have been
used for studying optimal properties of proteins.53 In this sense,
the study of the toy model of spheres in a square lattice with two
holes clearly indicates that the least scattered structures are those
in which the holes are as far as possible from the centre of the
cluster. On the contrary, the structure with the largest scattering
(see D in Fig. 2) corresponds to the one in which the two holes
are adjacent to the middle point of the cluster. This is an impor-
tant characteristic for the study of proteins, which are believed to
have efficiently packed interiors. The existence of internal cav-
ities has also been considered to be an important characteristic of
these macromolecules, which in general are well accounted for
by the S(O) index (see structure B in Fig. 2). Consequently, we
will study here the relationships between S(O), as well as other
geometric invariants, and the thermodynamic parameters of bind-
ing between peptides and proteins in the complex RNase-S. In
this case, point scattering can be referred to as the atomic scatter-
ing as it is calculated for all atoms in the proteins.
The protein–peptide complex RNase-S is obtained by cleavage
of bovine pancreatic ribonuclease A (RNase A) with subtilisin to
give an ‘‘S protein’’ and an ‘‘S peptide.’’ These two fragments can
be reconstituted to give rise to RNase-S, which is catalytically
active with a structure very similar to that of ribonuclease-A
(RNase-A).54,55 The S peptide consists of the first 20 amino acids
of RNase-A, but it has been shown that a truncated version formed
by residues 1–15 forms a complex with the S protein which is structurally
identical to RNase-S. There are two hydrophobic residues
in the S peptide that contribute significantly to the stability of
RNase-S. These residues, methionine 13 (M13) and phenylalanine
8 (F8), are buried inside the RNase-S core.55,56 In mutation experiments
they have been replaced by several other, smaller
hydrophobic amino acids of different sizes, resulting in mutated S
peptides of the type F8X and M13X, where X represents alanine
(A), methionine (M), norleucine (Nle), α-aminobutyric acid
(ANB), valine (V), leucine (L) or isoleucine (I). The structures of
the RNase-S complexes with S peptide mutants were determined
by X-ray crystallography and the free energies (ΔG°) and enthalpies
(ΔH°) of the S-peptide-protein binding were determined using
titration calorimetry.57 In Table 4 we show the values of the differences
in the thermodynamic parameters upon mutation: ΔΔG° and ΔΔH°, e.g., ΔΔG° = ΔG° (mutant) − ΔG° (wild type).

Table 1. Values of Geometric Invariants for the 21 Clusters of Spheres of Unit Radius Which are Subsets of the Square Grid as Explained in Figure 2.
Cluster   S(O)     M2       SASA (Å²)   VSA (Å³)
1,4       53.239   121.43   301.68      386.13
1,16      53.633   123.99   306.13      388.62
1,2       54.587   128.28   301.69      386.13
1,3       54.927   129.14   316.03      399.08
1,8       55.254   130.85   315.52      399.08
1,12      55.362   131.71   315.52      399.08
2,3       56.620   137.43   323.42      402.26
1,6       56.648   137.71   316.38      404.09
1,7       56.704   138.57   316.40      404.09
2,5       56.738   137.71   329.87      412.01
1,11      56.823   139.43   316.40      404.09
2,8       56.937   138.57   330.40      412.05
2,12      57.063   139.43   330.40      412.05
2,14      57.098   139.71   329.85      412.05
2,15      57.129   139.99   329.85      412.05
2,6       58.339   146.57   332.28      415.51
2,7       58.422   146.85   330.73      417.06
2,10      58.529   147.43   330.73      417.06
2,11      58.568   147.71   330.73      417.06
6,7       59.980   155.71   332.19      420.54
6,11      60.042   155.99   331.06      422.08

Table 2. Correlation Coefficients for the Pairs of Geometric Invariants Analyzed for the 21 Clusters of 14 Spheres of Unit Radius Which are Subsets of the Square Grid.
          M2      SASA    VSA
S(O)      0.998   0.867   0.952
M2                0.838   0.936
SASA                      0.963
SASA, solvent accessible surface area; VSA, solvent accessible volume.
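The correlation coefficients reported in Table 2 are Pearson coefficients between the columns of Table 1, and they can be recomputed from the tabulated values. The sketch below (Python with NumPy) is an illustration that copies the four numerical columns of Table 1. It assumes the values are transcribed correctly, and small differences from Table 2 may arise from rounding in the table.

```python
import numpy as np

# Columns: S(O), M2, SASA, VSA for the 21 clusters, in the row order of Table 1.
table1 = np.array([
    (53.239, 121.43, 301.68, 386.13), (53.633, 123.99, 306.13, 388.62),
    (54.587, 128.28, 301.69, 386.13), (54.927, 129.14, 316.03, 399.08),
    (55.254, 130.85, 315.52, 399.08), (55.362, 131.71, 315.52, 399.08),
    (56.620, 137.43, 323.42, 402.26), (56.648, 137.71, 316.38, 404.09),
    (56.704, 138.57, 316.40, 404.09), (56.738, 137.71, 329.87, 412.01),
    (56.823, 139.43, 316.40, 404.09), (56.937, 138.57, 330.40, 412.05),
    (57.063, 139.43, 330.40, 412.05), (57.098, 139.71, 329.85, 412.05),
    (57.129, 139.99, 329.85, 412.05), (58.339, 146.57, 332.28, 415.51),
    (58.422, 146.85, 330.73, 417.06), (58.529, 147.43, 330.73, 417.06),
    (58.568, 147.71, 330.73, 417.06), (59.980, 155.71, 332.19, 420.54),
    (60.042, 155.99, 331.06, 422.08),
])

labels = ["S(O)", "M2", "SASA", "VSA"]
# np.corrcoef treats each row as one variable, hence the transpose.
r = np.corrcoef(table1.T)

for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        print(f"{labels[i]:5s} vs {labels[j]:5s}: r = {r[i, j]:.3f}")
```

The upper triangle printed by the loop corresponds entry by entry to the layout of Table 2.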
Table 3. Values of Geometric Invariants for Several Fullerenes Studied Here.
Fullerene    S(O)      M2          SAvdW (Å²)   SASA (Å²)   VvdW (Å³)   VSA (Å³)
C20 (Ih)     53.68     82.22       170.27       318.28      186.06      525.78
C24 (D6)     71.51     121.32      195.21       353.05      221.84      598.12
C26 (D3h)    80.61     142.18      205.91       361.59      238.09      630.73
C28 (Td)     89.81     163.53      216.32       376.62      254.00      660.92
C30 (D5h)    100.68    192.98      227.75       391.77      270.24      696.19
C32 (D3)     109.86    213.92      239.76       398.30      287.05      726.42
C36 (D6h)    131.20    271.05      262.47       425.65      320.10      791.65
C50 (D5h)    214.86    521.84      342.04       500.42      436.83      1,016.75
C60 (Ih)     280.98    742.29      397.31       552.89      516.99      1,169.50
C76 (D2)     405.93    1,223.88    499.18       637.41      662.44      1,453.72
C78 (D3h)    420.25    1,276.38    507.84       651.25      676.12      1,478.92
C78 (D3h)    421.07    1,283.57    508.42       648.06      676.15      1,477.48
C78 (D3)     421.14    1,283.94    508.27       648.89      675.92      1,478.12
C78 (C2v)    420.40    1,277.84    508.27       648.89      675.92      1,478.12
C80 (Ih)     438.35    1,353.59    522.74       668.90      697.89      1,522.54
C180 (Ih)    1,470.1   6,756.38    1,097.36     1,200.40    1,536.2     3,139.81
C240 (Ih)    2,277.8   12,161.70   1,472.53     1,542.21    2,070.28    4,177.71
C320 (Ih)    3,476.6   21,244.27   1,917.71     1,952.61    2,722.84    5,430.73
C540 (Ih)    7,596.0   60,084.84   3,196.6      3,131.93    4,571.25    9,004.91

Figure 3. Plot of geometric measures (in reverse scale) versus number of carbon atoms in fullerenes.

We calculated the values of S(O) as well as areas and vdW
and SA volumes for eight protein–peptide complexes. The values
are given in Table 4 where we also provide the correlation coefficients
of the linear fits between the thermodynamic parameters
and the geometric measures. As can be seen S(O) gives the best
linear correlations for both thermodynamic properties, with
correlation coefficients identical to those of VSA. The best linear
correlation previously obtained for ΔΔH° was by using the
occluded surface area, which measures the internal packing of a
protein, and shows a correlation coefficient identical to those
obtained here with S(O) and VSA, i.e., 0.98. Occluded surface
area shows a correlation coefficient of 0.95 for ΔΔG°, which is
slightly lower than the one obtained here by using S(O) and VSA.
In contrast, other geometric invariants, such as cavity volumes
and SA area, show very poor correlations, while the packing measure
called ‘‘depth’’ shows correlation coefficients of approximately
0.90 for both thermodynamic parameters of binding.
As previously indicated, the values of S(i) can be used as a
local scattering parameter for the points of a cluster. In the particular
case of proteins, S(i) corresponds to atomic contributions to
protein scattering. It is also possible to consider the average val-
ues of S(i) for the atoms in an amino acid as the scattering of
amino acid residues in the protein: S(res). Using this approach we
have calculated the values of S(res) for all the amino acids in the
wild type RNAse A solved at 1.6 Å (2rns). In Figure 4 we plot
the values of S(res) for all residues in 2rns, where the lowest val-
ues of S(res) correspond to the least scattered residues and the
higher values to the most scattered ones. In the same figure we
illustrate the ten least scattered residues as well as the ten most
scattered ones. As can be seen, the least scattered residues corre-
spond to those which are located in the interior of the protein in a
region close to the centre of the structure, which is commonly
identified as the most packed region. The most scattered residues
are located far away from the middle of the protein, in the periph-
eral regions of the protein where residues 1, 68, 88–94, and 113
are found.
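The residue-level quantity S(res) described above is an average of the atomic values S(i) over the atoms of each residue. A minimal sketch of that averaging step is given below in Python with NumPy. Reading atomic coordinates and residue labels from a structure file is left out, and the five collinear "atoms" with the labels A1, B2, and C3 are a hypothetical toy input used only to show the behaviour.

```python
from collections import defaultdict
import numpy as np

def residue_scattering(coords, residue_ids):
    """Average the atomic scattering S(i) = x_1(i) * lambda_1 over each residue."""
    r = np.asarray(coords, dtype=float)
    D = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    vals, vecs = np.linalg.eigh(D)                  # ascending eigenvalues
    s_atom = vals[-1] * np.abs(vecs[:, -1])         # S(i), as in Eq. (4)
    groups = defaultdict(list)
    for s, res in zip(s_atom, residue_ids):
        groups[res].append(s)
    return {res: float(np.mean(values)) for res, values in groups.items()}

# Toy input: five collinear atoms grouped into three hypothetical residues.
coords = [(-4, 0, 0), (-2, 0, 0), (0, 0, 0), (2, 0, 0), (4, 0, 0)]
labels = ["A1", "A1", "B2", "C3", "C3"]
s_res = residue_scattering(coords, labels)
print(s_res)
```

In this toy case the central residue B2 has the lowest value and the two terminal residues have equal, higher values, which mirrors the interior versus periphery pattern described for the protein.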
In a similar way S(res) can be used to analyze the changes in
residue scattering produced by external factors such as temperature.
As an example we calculate the values of S(res) for all the amino
acids in the RNAse-A, which is a kidney-shaped monomeric
enzyme of 124 residues. The structure of this protein has been
determined by X-ray diffraction at a resolution of 1.5 Å at nine different
temperatures ranging from 98 to 320 K.53 In previous studies
it has been shown that the protein molecule expands slightly with
increasing temperature which affects principally the degree of fold-
ing of the main backbone of the protein.6,58–60 It was also previ-
ously observed that the 3D structure of RNAse-A undergoes no
dramatic change over the range of temperatures analyzed. Here we
analyze the changes in scattering of individual amino acids with
changing temperature. In Figure 5 we plot the change in the values
of S(res) during step-by-step ‘‘heating’’ of RNAse-A. The first plot
shows the difference in S(res) for the amino acids in the structures
determined at 98 and 130 K, i.e., $\Delta S(\mathrm{res}) = S_{130\,\mathrm{K}}(\mathrm{res}) - S_{98\,\mathrm{K}}(\mathrm{res})$.
As can be seen in this figure, most of the amino acids
undergo only small changes in their scattering when analyzed in this
step-by-step manner. However, a dramatic change in scattering is
observed for residue Gln101, which appears as an intense peak in
most of the plots. For instance, in the 130–98 K plot this residue
appears with an intense positive peak, which indicates that Gln101
is less scattered at 98 K than at 130 K. However, this situation is
inverted in the next step, which indicates that Gln101 returns to a
less scattered conformation when passing from 130 to 160 K. In
Figure 6 we illustrate these two conformations for Gln101. This
alternating change of conformation of Gln101 is observed for all
RNAse-A structures below 220 K and it still appears for the changes
at the highest temperatures, i.e., 120–260 K and 320–280 K. This
residue is located in one of the extended loops of the protein, which
have been previously observed to undergo the most intense move-
ments. In general, the most packed regions at the interior of the pro-
tein structure do not suffer large variations in the scattering com-
pared to the atoms which are in the protruding loops. An exception
is the 180–220 K transition where the scattering of most of the resi-
dues is altered. This change coincides with the known fact that in
the neighborhood of 200 K changes occur in the dynamic properties
of many proteins in solutions and in the crystal state.58 These
dynamic changes are believed to be produced primarily in the coor-
dination shells of water that are bound to the surface of the protein,
which in this case shows up as larger changes in the packing
of amino acid residues throughout the protein.
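The step-by-step differences plotted in Figure 5 amount to subtracting the residue scattering at the lower temperature from that at the higher one. The sketch below illustrates this under stated assumptions: the helper names (`scattering_change`, `largest_change`) are hypothetical and the S(res) values are invented for illustration, not taken from the RNAse-A structures.

```python
def scattering_change(s_low, s_high):
    """Delta S(res) = S_high(res) - S_low(res) for residues present in both structures."""
    return {res: s_high[res] - s_low[res] for res in s_low if res in s_high}

def largest_change(delta):
    """Residue with the largest absolute change in scattering."""
    return max(delta, key=lambda res: abs(delta[res]))

# Invented S(res) values for three residues at two temperatures (illustration only).
s_98 = {100: 2.0, 101: 3.0, 102: 2.5}
s_130 = {100: 2.1, 101: 5.0, 102: 2.4}

delta = scattering_change(s_98, s_130)
peak = largest_change(delta)
print(peak, delta[peak])
```

A positive value of the difference means the residue is less scattered at the lower temperature, which is how the positive Gln101 peak in the 130–98 K plot is read.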
These results indicate that the point scattering is
a convenient geometric invariant, which can account for the
vdW interactions between peptides and proteins as well as for the
effects of temperature on protein structure and dynamics. Consequently,
this measure can be useful as a geometric parameter
in empirical potentials for energy minimization algorithms used in the
study of the 3D structure of proteins.
Table 4. Values of Geometric Invariants as well as Thermodynamic Binding Parameters for the Protein–Peptide Complexes in RNAse-S.

PDB       S        SAvdW (Å²)   SASA (Å²)   VvdW (Å³)   VSA (Å³)    ΔΔH° (25 °C)   ΔΔG° (kcal mol⁻¹)
2rln      18,137   11,753.2     6,653.4     10,343.8    22,508.9    −7.9           −0.8
1rbg      18,029   11,758.4     6,586.4     10,324.9    22,391.9    −2.5           −0.7
1rbh      18,030   11,755.7     6,596.6     10,324.9    22,380.3    −2.2           −0.5
1rbi      18,031   11,745.4     6,577.7     10,311.2    22,307.4    −1.9           −0.2
1rbd      18,019   11,719.5     6,539.8     10,292.7    22,294.0    1.3            0.7
1d5e      17,618   11,438.7     6,653.3     10,113.1    22,117.1    10.0           2.9
1d5d      17,554   11,384.2     6,674.8     10,083.6    22,073.1    14.7           3.6
1d5h      17,543   11,447.9     6,593.7     10,117.4    22,037.2    17.7           5.1
r(ΔΔH°)   −0.98    −0.94        −0.28       −0.96       −0.98
r(ΔΔG°)   −0.96    −0.94        −0.31       −0.95       −0.96

The correlation coefficients between the geometric invariants and thermodynamic parameters are given in the last two rows.
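As a worked check of the last two rows of Table 4, the sketch below recomputes the correlation between S and the two thermodynamic parameters directly from the tabulated values. It assumes that the reported coefficients are ordinary Pearson correlation coefficients, which the text does not state explicitly; the function name `pearson_r` is ours.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Columns S, ddH and ddG of Table 4, in the row order 2rln ... 1d5h.
S = [18137, 18029, 18030, 18031, 18019, 17618, 17554, 17543]
ddH = [-7.9, -2.5, -2.2, -1.9, 1.3, 10.0, 14.7, 17.7]
ddG = [-0.8, -0.7, -0.5, -0.2, 0.7, 2.9, 3.6, 5.1]

r_H = pearson_r(S, ddH)
r_G = pearson_r(S, ddG)
print(round(r_H, 2), round(r_G, 2))
```

Under that assumption the recomputed values round to the tabulated −0.98 and −0.96 for the S column.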
Conclusions
We have analyzed the main deficiencies of some of the most
widely used ‘‘classical’’ geometric invariants for the analysis of
clusters, molecules, and macromolecular systems, such as pro-
teins. This analysis has led to the introduction of a new measure,
which is based on first principles and resolves the deficiencies
previously observed in other geometric measures,
such as second moment of mass distribution, surface areas, and
volumes. This new geometric invariant accounts for the scattering
of points from the center of the object, e.g., cluster, molecule,
protein, etc. We have proved that the principal eigenvalue of the
Euclidean distance matrix represents the point scattering of the
object studied. The Euclidean distance matrix is built by considering
the distances between all pairs of points in the discrete object,
such as the centers of spheres in clusters or interatomic distances
in molecules. This matrix can be obtained experimentally, for
example by X-ray diffraction, or by empirical or ab initio optimization
procedures, such as molecular mechanics or quantum
chemical calculations.
Figure 4. Values of residue scattering in ribonuclease-S (PDB 2rns). The largest values of the scattering
correspond to the most scattered residues, e.g., 88–94, and the lowest values of scattering correspond to
the least scattered residues, e.g., 44–47.
Figure 5. Change of residue scattering in RNAse-A as an effect of temperature.
Figure 6. Change in conformation of residue Gln101 at two different temperatures which produces the
change in scattering observed in Figure 5. At 98 K this residue is less scattered than at 130 K, which
produces a large positive peak in the first plot of Figure 5. [Color figure can be viewed in the online
issue, which is available at www.interscience.wiley.com.]
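The statement that the point scattering is the principal eigenvalue of the Euclidean distance matrix can be illustrated with a short sketch. It is not the program used in this work: power iteration is chosen here only because it needs no external library (it assumes at least three distinct points, for which the distance matrix has a single dominant positive eigenvalue), and the function names are hypothetical.

```python
import math

def distance_matrix(points):
    """Pairwise Euclidean distances between points given as coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]

def principal_eigenvalue(matrix, tol=1e-12, max_iter=10_000):
    """Largest eigenvalue of a symmetric nonnegative matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # normalized starting vector
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # e.g. a single point: the matrix is zero
            return 0.0
        new_lam = sum(wi * vi for wi, vi in zip(w, v))  # Rayleigh quotient v^T D v
        v = [x / norm for x in w]
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam

# Equilateral triangle with unit sides: the eigenvalues of D are 2, -1, -1.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
s_triangle = principal_eigenvalue(distance_matrix(triangle))

# The same triangle rotated by 90 degrees and translated.
moved = [(-y + 5.0, x - 2.0) for x, y in triangle]
s_moved = principal_eigenvalue(distance_matrix(moved))
print(s_triangle, s_moved)
```

Because only interpoint distances enter the matrix, the rotated and translated copy gives the same value, which is the invariance property claimed for the measure. For large ensembles a library eigensolver for symmetric matrices would normally be preferred over this simple iteration.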
In this work we have illustrated some of the advantages of the
point scattering for studying hypothetical and real clusters, such as
those that can arise from dense packings of colloidal microspheres
as well as carbon clusters or fullerenes. We have also studied
‘‘atomic’’ scattering in proteins by illustrating the relationship
between this measure and the binding energetics of peptide–pro-
tein interaction as well as the effects of temperature on protein
structure and dynamics. In all cases point scattering appears to be
a convenient geometric invariant, which is easily and exactly computed
for any discrete configuration of points and accounts for important
structural characteristics of the objects studied that are
invariant under rotations and translations of the object.
One of the most significant applications of geometric invari-
ants in chemistry is that they are used in measuring packing.61
For instance, packing efficiency of a given atom is simply defined
as the ratio of the space it could minimally occupy to the space
that it actually does occupy.61 Just as volume and
surface area are used to define packing efficiency in different
ways, the new geometric invariant we have introduced here can
be used for similar purposes, giving it a wide spectrum of applications
in different areas of research.
Acknowledgments
The author thanks Ms. Y. Gutierrez for assistance in developing a
computer program to calculate the point scattering. Prof. D. J.
Klein and Dr. J. A. Rodríguez-Velázquez are also acknowledged
for useful comments and clarifications.
References
1. Mumford, D.; Fogarty, J.; Kirwan, F. Geometric Invariant Theory;
Springer: New York, 1994.
2. Thompson, D. On Growth and Form; Cambridge University Press:
Cambridge, 1961.
3. Tarnai, T. Struct Topol 1984, 9, 39.
4. Hofinger, S.; Zerbetto, F. Chem Soc Rev 2005, 34, 1012.
5. Liang, J.; Dill, K. A. Biophys J 2001, 81, 751.
6. Fleming, P. J.; Richards, F. M. J Mol Biol 2000, 299, 487.
7. Tsai, J.; Taylor, R.; Chothia, C.; Gerstein, M. J Mol Biol 1999, 290, 253.
8. Pintar, A.; Carugo, O.; Pongor, S. Biophys J 2003, 84, 2553.
9. Knupp, C.; Squire, J. M. Adv Protein Chem 2005, 70, 375.
10. Niu, S. L.; Mitchell, D. C. Biophys J 2005, 89, 1833.
11. Tsukita, S.; Furuse, M. Trends Cell Biol 1999, 149, 268.
12. Tsukita, S.; Furuse, M. J Cell Biol 2000, 149, 13.
13. Yang, B.; Brown, D.; Verkman, A. S. J Biol Chem 1996, 271, 4577.
14. Odijk, T. Biophys J 1998, 75, 1223.
15. Maritan, A.; Micheletti, C.; Trovato, A.; Banavar, J. R. Nature
2000, 406, 287.
16. Stasiak, A.; Maddocks, J. H. Nature 2000, 406, 251.
17. Banavar, J. R.; Maritan, A. Rev Mod Phys 2003, 75, 23.
18. Manoharan, V. N.; Elsesser, M. T.; Pine, D. J. Science 2003, 301, 483.
19. Yi, G.-R.; Manoharan, V. N.; Michel, E.; Elsesser, M. T.; Yang, S.-M.;
Pine, D. J. Adv Mater 2004, 16, 1204.
20. Liang, J.; Edelsbrunner, H.; Fu, P.; Sudhakar, P. V.; Subramanian, S.
Proteins: Struct Funct Genet 1998, 33, 1.
21. Voss, N. R.; Gerstein, M. J Mol Biol 2005, 346, 477.
22. Kuszewski, J.; Gronenborn, A. M. M.; Clore, G. M. J Am Chem Soc
1999, 121, 2337.
23. Baker, B. M.; Murphy, K. P. Methods Enzymol 1998, 295, 294.
24. Luque, I.; Freire, E. Methods Enzymol 1998, 295, 100.
25. Wales, D. J.; Scheraga, H. A. Science 1999, 285, 1368.
26. Conway, J. H. M.; Sloane, N. J. A. Discrete Comput Geom 1995, 13,
282.
27. Conway, J. H.; Sloane, N. J. A. Sphere Packing, Lattices and Groups,
3rd ed.; Springer: New York, 1999.
28. Chow, T. Y. Combinatorica 1995, 15, 151.
29. Graham, R. L.; Sloane, N. J. A. Discrete Comput Geom 1990, 5, 1.
30. Sloane, N. J. A.; Hardin, R. H.; Duff, T. D. S.; Conway, J. H. Discrete
Comput Geom 1995, 14, 237.
31. Richards, F. M. Ann Rev Biophys Bioeng 1977, 6, 151.
32. Lee, B.; Richards, F. M. J Mol Biol 1971, 55, 379.
33. Connolly, T. J Appl Cryst 1985, 16, 548.
34. Campopiano, C. N.; Blazer, B. G. IEEE Trans Commun 1962, 10, 90.
35. Foschini, G. J.; Gitlin, R. D.; Weinstein, S. B. IEEE Trans Commun
1974, 22, 28.
36. Flory, P. J. Statistical Mechanics of Chain Molecules; Interscience:
New York, 1969.
37. Hotta, S.; Inoue, K.; Urahama, K. Electron Commun Jpn 2003, 86,
80.
38. Inoue, K.; Urahama, K. Pattern Recognit Lett 1999, 20, 699.
39. Ng, A. Y.; Jordan, M. I.; Weiss, Y. In Advances in Neural Information
Processing Systems; Dietterich, T. G.; Becker, S.; Ghahramani,
Z., Eds.; MIT Press: Cambridge, MA, 2002; Vol. 14.
40. Bogomolny, E.; Bohigas, O.; Schmit, C. J Phys A: Math Gen 2003,
36, 3595.
41. Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University
Press: Cambridge, 1990.
42. Bodor, N.; Gabanyi, Z.; Wong, C. J Am Chem Soc 1989, 111,
3783.
43. Gavezotti, A. J Am Chem Soc 1983, 100, 5220.
44. Patterson, A. L. Rev Sci Instrum 1941, 12, 206.
45. Kroto, H. W.; Walton, A. R. M. The Fullerenes: New Horizons for
the Chemistry, Physics and Astrophysics of Carbon; Cambridge Uni-
versity Press: Cambridge, 1993.
46. Cioslowski, J. Electronic Structure Calculations on Fullerenes and
Their Derivatives; Oxford University Press: Oxford, 1995.
47. Vaitheeswaran, S.; Yin, H.; Rasaiah, J. C.; Hummer, G. Proc Natl
Acad Sci USA 2004, 101, 17002.
48. Ugarte, D. Nature 1992, 359, 707.
49. Shinohara, H. Prog Phys 2000, 63, 843.
50. Richman, D. D.; Whitley, R. J.; Hayden, F. G. Clinical Virology, 2nd
ed.; ASM Press: Washington, DC, 2002.
51. Chandrasekar, V.; Johnson, J. E. Structure 1998, 6, 157.
52. Torrens, F. Int J Mol Sci 2001, 2, 72.
53. Shen, M.; Davis, F. P.; Sali, A. Chem Phys Lett 2005, 405, 224.
54. Raines, R. T. Chem Rev 1998, 98, 1045.
55. Richards, F. M.; Wyckoff, H. W. Enzymes 1971, 4, 647.
56. Hearn, R. P.; Richards, F. M.; Sturtevant, J. M.; Watt, G. D. Biochem-
istry 1971, 10, 806.
57. Ratnaparkhi, G. S.; Varadarajan, R. Biochemistry 2000, 39, 12365.
58. Tilton, R. F.; Dewan, J. C.; Petsko, G. A. Biochemistry 1992, 31,
2469.
59. Estrada, E. Proteins: Struct Funct Bioinformat 2004, 54, 727.
60. Estrada, E. J Chem Inf Comput Sci 2004, 44, 1238.
61. Gerstein, M.; Richards, F. M. In the International Tables for Crystal-
lography; Rossman, M.; Arnold, A., Eds.; Kluwer: Dordrecht, 2001;
Vol. F, Ch. 22.1.1, p. 531.