Spatial Autocorrelation of Amino Acid Replacement Rates in the Vasopressin Receptor Family
Lorraine Marsh
Received: 3 June 2008 / Accepted: 11 November 2008 / Published online: 4 December 2008
© Springer Science+Business Media, LLC 2008
Abstract Evolutionary rates of sites can be independent
of one another or correlated in some fashion. Significant
spatial autocorrelation was observed for site amino acid
replacement rates in vasopressin receptor family proteins
(VPRs). Spatial autocorrelation of rates is the propensity of
residues to lie near other residues of similar rate in the
folded protein structure. Optimal correlation occurred at a
distance suggesting that residues in contact had correlated
rates. As another way to study the same phenomenon, VPR
was partitioned into >40 contiguous (10 Å)³ spatial
clusters for amino acid replacement rate estimation. Parti-
tioning was done without preconception of functional
regions of the protein and with a random partition control.
Cluster rates exhibited an overdispersed distribution sug-
gesting that rates were not randomly distributed in the
spatial partitions. In tests, cluster partitioning improved
maximum likelihood and Bayesian likelihood models for
VPR evolution. Spatial clusters with outlier rates, or line-
age-specific clusters differing in rate, proved to contain
VPR features likely to be under selection. Thus the spatial
autocorrelation observed is probably not just a statistical
finding, but likely has an evolutionary basis in protein
function.
Keywords Rate variation · Autocorrelation · Clustering · Vasopressin receptor · Bayesian phylogenetic inference · Gamma rate distribution
Introduction
The study of amino acid replacement rate variation allows
better prediction of rates, may improve phylogenetic infer-
ence, and gives insight into selection. There are many
sources of amino acid replacement rate variation among
sites. A number of sophisticated models have been proposed
to correlate site rate with tertiary structure of a protein (Dean
et al. 2002; Choi et al. 2007; Robinson et al. 2003; Marsh
and Griffiths 2005). Many of these are based on secondary
structure, solvent accessibility, and functionality of each site.
In some models protein domains are allowed to evolve at
independent rates (Van Damme et al. 2007). A general
perspective has been that residues that are constrained by
functional or by structural roles are less free to evolve and,
hence, exhibit a lower rate of change. Such roles are typically
spatially limited in proteins. However, because of the folded
structure of proteins, sites that make up a single spatial
domain may not be contiguous in the primary sequence.
A variety of approaches to integrating structural data
into evolutionary models has been proposed. One class of
approaches involves models assigning amino acid
replacement rates to specific classes of structural sites.
Examples are sites located at the surface of the protein or
part of a specific secondary structure or part of a functional
site (Dean et al. 2002; Choi et al. 2006). Methods for
evolutionary analysis based on the folded structure of
proteins have, however, found limited use, in part because
of their complexity and the lack of programs that accept
these analyses as input. However, the advent of major
efforts to solve protein structures provides a widening set
of X-ray crystallographic templates which have great
potential for evolutionary studies.
An alternative approach to site variation is to ignore
protein and DNA origins of rate differences and, instead, to
L. Marsh (✉)
Department of Biology, Long Island University, Brooklyn,
NY 11201, USA
e-mail: [email protected]
J Mol Evol (2009) 68:28–39
DOI 10.1007/s00239-008-9183-4
look at variation as a purely statistical problem. A number
of different functions have been used to fit site variation,
but the most common is the discretized gamma rate distribution
(Yang 1994). The parameters of this distribution are
usually fit to the data. Despite the convenience, the gamma
rate distribution often underfits the observed data, and the
underlying problem of the source of rate variation remains.
The gamma rate distribution is essentially a curve-fitting
process which treats residue change as a black box. Though
it models the amino acid replacement rate variation and
significantly accommodates many patterns of rate varia-
tion, it does not attempt to capture the underlying biology
of amino acid replacement rate variation (Felsenstein
2001). Other models have been proposed to better fit rate
distributions (Ninio et al. 2007; Huelsenbeck and Suchard
2007). For example, a gamma mixture model fits many
proteins better than the single gamma model (Mayrose
et al. 2005). Some rate distributions are multimodal, for
instance, and poorly fit by the gamma rate distribution but
fit well by mixtures. Discrete models with higher numbers
of parameters that attempt to fit data better have also been
proposed (Susko et al. 2003). These are improvements on
the curve-fitting properties of the distribution but do not
address the biological basis of amino acid replacement rate
variation.
An approach to incorporating the concept of structure
implicitly is to consider that adjacent residues in the linear
sequence are spatially adjacent as well. Linear autocorre-
lation in primary protein sequence describes the tendency
of residues of like function to lie near each other in the
linear sequence and to share amino acid replacement rates
(Stern and Pupko 2006; Chakrabarti and Lanczycki 2007).
Linear autocorrelation of rates can be due to DNA features
(Elango et al. 2008) or protein structure (Mayrose et al.
2007). Linear autocorrelation provides a model to predict
amino acid replacement rate based on sequence position.
However, not all residues adjacent in the folded protein
structure are adjacent in the linear sequence.
The use of structure-based evolutionary models is lim-
ited by the requirement that the protein structure must be
known. Fortunately, protein structures form families and
related proteins share similar structures or folds. For pro-
teins sharing folds a model of a protein of interest can be
inferred using the structure of related (homologous) pro-
tein. Homology modeling is a method in which a protein of
unknown structure is modeled on a template protein of
known structure. Though homology-modeled proteins may
be only approximate at the atomic level, they tend to be
accurate in topology as long as the correct template is
chosen and alignment is correct (Fiser and Sali 2003). For
instance, the grouping of residues in the vicinity of active
sites is generally valid in properly prepared homology
models even if the model and structure differ in detail
(Chakrabarti and Sowdhamini 2004). For many evolu-
tionary purposes the accuracy of homology models may be
more than adequate and may extend the utility of structural
methods to most proteins (Marsh and Griffiths 2005).
The vasopressin receptor (VPR) is a nonapeptide
receptor of the G-protein-coupled receptor family (GPCR),
related to rhodopsin. The VPR is interesting because it
contains several distinct functional regions: peptide ligand
binding, G-protein binding, core protein stability, and
receptor switch domain. Residues that function together are
predicted to be scattered in the primary sequence of VPRs.
In addition, the VPR has a paralogue, the oxytocin receptor
(OR). The VPR and OR exhibit some sensitivity to each
other's ligands (vasopressin and oxytocin differ at only two
positions). The VPR acts in blood pressure and fluid
homeostasis, whereas the OR mediates uterine contractions
in labor. Evolutionary selection may differ in these
receptors of different function, providing opportunities to
study paralogue specificity.
Here we describe approaches for analyzing amino acid
replacement rate variation based on spatial autocorrelation.
One method involved correlation of rates in space. The
other method was based on the determination of rates in
three-dimensional (3D) spatial clusters of residues set as
partitions. A variety of tests was used to show that amino
acid replacement rates of residues in these clusters are
correlated and significant to evolutionary analyses. The
method allowed identification of clusters of residues of
VPRs that might share a common selection.
Methods
Sequence Sampling
Sequences were retrieved from the National Center for
Biotechnology Information (NCBI). Diverse vertebrate
vasopressin receptor homologues and paralogues were
selected (Table 1). Octopus octopressin receptor was cho-
sen as an outgroup. Multiple alignment of sequences was
performed using Clustal W (Thompson et al. 1994) except
for the alignment with rhodopsin, which was performed as
described below. Clustal W was run with a gap opening
penalty of 10, a gap extension penalty of 0.2, and the Gonnet
series substitution matrix. Structures were retrieved from
the Protein Data Bank (PDB) (Berman et al. 2000). The
sequence from the rhodopsin structure was taken from the
appropriate PDB file.
Homology Modeling
Homology modeling was used to generate a structural
protein model for study. The human V1aR receptor (414
amino acid residues) was modeled using a rhodopsin
template. Rhodopsin is one of only two possible templates,
and is the best-studied template for modeling G-protein-
coupled receptors. V1aR and rhodopsin (PDB code: 1f88)
were aligned using a dynamic programming method with
gaps suppressed in transmembrane domains. Transmem-
brane segments in GPCRs cannot tolerate gaps or insertions
and must occur in a fixed order. Thus weakly similar
segments can be aligned. The alignment of conserved
GPCR motifs (Shi and Javitch 2002) in each of the seven
transmembrane domains was confirmed by eye. Model-
ler8v1 (Fiser and Sali 2003) was used for homology
modeling with the Automodel setting. No attempt was
made to refine the loops of the receptor, which were also
modeled on rhodopsin and contained conserved Cys resi-
dues that constrain rhodopsin loops. Visually, the modeled
loops appeared to be located in appropriate regions of the
receptor.
Autocorrelation Calculations
To understand the role of clustered amino acid replacement
rates, autocorrelation, the tendency of such rates to cluster,
was studied. Autocorrelation of site variation was deter-
mined by finding the correlation of residue amino acid
replacement rate with rates of neighboring residues. This
was accomplished by quantifying the similarity of an index
residue amino acid replacement rate to the rates of residues
within a threshold distance in the context of the folded
protein. All residues in turn were used as index residues so
the analysis was not biased.
The autocorrelation approach adopted here would be
compatible with either parsimony- or model-based meth-
ods of determining rate. Though parsimony has some
limitations, it was adopted for this study. Evolutionary
amino acid replacement rates for each residue of the
protein were determined on a parsimony tree inferred by
Paup* 4.0 (Swofford 1998) using the category ‘tree steps’
as a measure of site rate. Moran’s (1950) I was determined
to calculate spatial autocorrelation. Moran’s I takes values
between -1 (negative autocorrelation) and 1 (positive
autocorrelation), with 0 representing no autocorrelation. Distances
were measured from the C-beta carbon of all residues
except glycine, for which the C-alpha carbon was used.
Significance levels were determined by nonparametric
bootstrap replication. Sites within the distance threshold
were resampled to test if the value of Moran’s I remained
above 0. To eliminate the effect of linear autocorrelation,
only sites separated by more than seven residues were
included in some calculations.
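The calculation described above can be illustrated with a minimal sketch. It assumes the standard form of Moran's I with binary distance weights (a pair counts when it lies within the threshold), including the usual N/W normalization; whether the original analysis used exactly this normalization is an assumption. The function name `morans_i`, its parameters, and the toy coordinates are hypothetical and are not taken from the programs used in the study.

```python
import math

def morans_i(coords, rates, max_dist, min_seq_sep=0):
    """Moran's I with binary weights: w_ij = 1 when residues i and j lie
    within max_dist of each other and are more than min_seq_sep positions
    apart in the linear sequence (illustrative sketch, not the study code)."""
    n = len(rates)
    mean = sum(rates) / n
    s = [r - mean for r in rates]      # deviation of each site from the mean rate
    cross = 0.0                        # sum over pairs of w_ij * s_i * s_j
    weight_total = 0                   # sum over pairs of w_ij
    for i in range(n):
        for j in range(n):
            if i == j or abs(i - j) <= min_seq_sep:
                continue
            if math.dist(coords[i], coords[j]) <= max_dist:
                cross += s[i] * s[j]
                weight_total += 1
    denom = sum(x * x for x in s)
    if weight_total == 0 or denom == 0:
        return float("nan")            # no qualifying pairs or no rate variation
    return (n / weight_total) * cross / denom

# Toy data (hypothetical): four pseudo-residues on a line, 1 Angstrom apart.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
clustered = morans_i(coords, [1, 1, 5, 5], max_dist=1.0)    # similar rates adjacent
alternating = morans_i(coords, [1, 5, 1, 5], max_dist=1.0)  # dissimilar rates adjacent
```

In this sketch, setting `min_seq_sep=7` would correspond to including only sites separated by more than seven residues in the sequence, which is how linear autocorrelation was excluded in some calculations.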
Spatial Partitions
Using spatially clustered sites, spatial dependence of rates
could be determined. VPRs were partitioned into 3D spa-
tially contiguous clusters. These clusters were not
contiguous in linear sequence. Using the program Contact
(Marsh 2006) with settings to output clusters, (10 Å)³ cubic
regions were defined. A limit of a minimum of six residues
per cluster was set and residues not associated with a valid
cluster were placed in a spare, nonspatial cluster. The total
number of clusters was 47 (46 spatial clusters averaging
8.43 sites and 1 spare cluster of 26 sites). Residues falling
into more than one spatial cluster were randomly assigned
to one or the other. The net result was an assignment of
each residue to one and only one spatial grouping. New
data matrices were generated by rearranging sites of the
data matrix to make contiguous partitions containing sites
of each cluster. Rates were estimated independently for 46
partitions.
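The partitioning steps above might be sketched as follows. This is a simplified illustration that bins residues on a fixed cubic grid; it does not reproduce the Contact program, and because fixed grid cells do not overlap, the random assignment of shared residues is not needed here. The function names `cube_partitions` and `reorder_sites` and the toy coordinates are hypothetical.

```python
import math
from collections import defaultdict

def cube_partitions(coords, edge=10.0, min_size=6):
    """Assign each residue to the grid cube of side `edge` (Angstroms) that
    contains it. Cubes holding fewer than `min_size` residues are pooled
    into a single nonspatial 'spare' partition."""
    cubes = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = (math.floor(x / edge), math.floor(y / edge), math.floor(z / edge))
        cubes[key].append(idx)
    partitions, spare = [], []
    for key in sorted(cubes):
        members = cubes[key]
        if len(members) >= min_size:
            partitions.append(members)
        else:
            spare.extend(members)
    return partitions, sorted(spare)

def reorder_sites(alignment, partitions, spare):
    """Rearrange alignment columns so the sites of each partition are contiguous."""
    order = [i for part in partitions for i in part] + spare
    return ["".join(seq[i] for i in order) for seq in alignment], order

# Toy data (hypothetical): six residues share one cube, two fall in a sparse cube.
coords = [(1, 1, 1), (12, 1, 1), (2, 2, 2), (3, 3, 3),
          (15, 2, 2), (4, 4, 4), (5, 5, 5), (6, 6, 6)]
partitions, spare = cube_partitions(coords)
reordered, order = reorder_sites(["ABCDEFGH"], partitions, spare)
```

The toy example shows that sites of one spatial cluster need not be adjacent in the sequence, which is why the columns of the data matrix have to be rearranged before partition rates can be estimated.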
An advantage of the 47 clusters was that they allocated
each residue once and only once. A disadvantage was that
Table 1 Receptor proteins used in analysis

| Source | Abbreviation(s) | Ligand |
| --- | --- | --- |
| Homo sapiens, human | VPR1a, V1BRhum, V2Rhum | Vasopressin |
| | OXYRhum | Oxytocin |
| Bos taurus, cow | VPR1Cow, V2Rbovin | Vasopressin |
| | OXYRbovin | Oxytocin |
| Rattus norvegicus, rat | V1ARrat, V1BRrat, V2Rrat | Vasopressin |
| | OXYRrat | Oxytocin |
| Gallus gallus, chicken | VTR1chick, VTRpitchick | Vasotocin |
| Rana catesbeiana, frog | VTRrana | Vasotocin |
| Bufo marinus, toad | MTRbufma | Mesotocin |
| Takifugu rubripes, fish | VTR1Ataki, VTR1Btaki | Vasotocin |
| Catostomus commersoni, fish | ITRcatos | Isotocin |
| Octopus vulgaris, octopus | OPRoct | Octopressin |
if a special protein feature lay at the boundary of two or
more clusters, it would be split. A second set of clusters
was defined, also (10 Å)³ cubes, but now comprised of 380
overlapping clusters. This set was used for evolutionary
feature detection and to find individual clusters with
anomalous behavior. To determine amino acid replacement
rates for the 380 clusters, an iterative analysis was used.
One cluster was selected at a time to make a partition, with
the remainder of the protein comprising a second partition.
The amino acid replacement rate of the cluster partition
was recorded and the likelihood of the model with the
partitions was noted.
Sites were also partitioned based on surface accessibil-
ity. The VP1aR protein was analyzed as described for
accessibility (Marsh and Griffiths 2005). Sites were divided
into surface (≥4.9 Å² accessible surface) or core (<4.9 Å²
accessible surface). The cutoff value for categorizing as
core or surface was determined by optimization of maxi-
mum likelihood (ML) trials. The estimated evolutionary
rate for the surface partition was 2.21 times that of the core
partition. It should be noted that in shape and scale, these
accessibility partitions differed greatly from the spatial
partitions described above.
Maximum Likelihood Analysis
ML studies were carried out to compare cluster methods to
other evolutionary models. In particular, we wanted to
understand whether clustered rate variation based on
location in the 3D structure of the protein fit evolutionary
data better than alternative models for the data. Models
tested included a single rate model, the gamma rate dis-
tribution model, and mixed models. ML studies used the
Codeml module of PAML3.14 application. Mega3.1 was
used to generate a VPR topology using the neighbor-joining
(NJ) method and Poisson correction setting. Variations
in tree topology are predicted not to have a large effect on
the analysis used here. Indels in alignment files and the
structural files were removed by the ‘complete deletion’
method. Files for Paml were rearranged during partitioning
to allow noncontiguous sites of each cluster to be grouped.
Simulated unpartitioned proteins were generated with
Evolver in the Paml package with a JTT evolutionary
model and a gamma rate distribution model (α = 0.78,
based on a VPR tree analysis). Sites of simulated protein
were randomly reordered to remove latent linear
autocorrelation.
Model Comparison
The ML analyses provided information about how well a
model fit a set of evolutionary data. It was important to
compare models and test whether differences were
significant. For instance, we wanted to know if the cluster
model was significantly better than a single amino acid
replacement rate model on the VPR data set. Models were
compared using the Akaike Information Criterion (AIC).
AIC = $2(L(\theta_2) - L(\theta_1)) - 2(p_2 - p_1)$, where $L(\theta_1)$ is the
likelihood of one model given its $p_1$ inferred parameters and
$L(\theta_2)$ is the likelihood of an alternative, not necessarily
nested, model. Alternatively, Bayesian models were com-
pared using a Bayes factor. To calculate the Bayes factor,
the harmonic mean of posterior probabilities was taken as
an estimate of average likelihood (Huelsenbeck and Ron-
quist 2001; Newton and Raftery 1994). The Bayes factor
was calculated as the ratio of likelihoods of two models.
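As a worked illustration of the two comparison statistics, the sketch below computes the AIC difference between two models and a log Bayes factor from harmonic-mean likelihood estimates. It assumes the likelihoods are supplied as log-likelihoods, as reported by ML and MCMC programs, and that the harmonic mean is taken over likelihoods sampled from the posterior; all function names and numbers are hypothetical.

```python
import math

def aic_difference(lnL1, p1, lnL2, p2):
    """AIC difference as defined in the text: positive values favor model 2.
    lnL1, lnL2 are log-likelihoods; p1, p2 are numbers of free parameters."""
    return 2.0 * (lnL2 - lnL1) - 2.0 * (p2 - p1)

def harmonic_mean_lnL(lnL_samples):
    """Log of the harmonic mean of sampled likelihoods, computed in log space.
    HM = n / sum(1 / L_i), so log HM = log n - logsumexp(-lnL_i)."""
    neg = [-x for x in lnL_samples]
    m = max(neg)
    logsumexp = m + math.log(sum(math.exp(v - m) for v in neg))
    return math.log(len(lnL_samples)) - logsumexp

def log_bayes_factor(samples_model2, samples_model1):
    """Log of the ratio of harmonic-mean likelihood estimates (model 2 over model 1)."""
    return harmonic_mean_lnL(samples_model2) - harmonic_mean_lnL(samples_model1)

# Hypothetical numbers: model 2 gains 50 log-likelihood units for 46 extra parameters.
delta = aic_difference(lnL1=-1000.0, p1=0, lnL2=-950.0, p2=46)
# Two sampled likelihoods, 1 and 0.5, have harmonic mean 2/3.
hm = harmonic_mean_lnL([0.0, math.log(0.5)])
```

Working in log space avoids numerical underflow, since the likelihoods of protein alignments are typically far too small to represent directly.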
Amino Acid Replacement Rate Dispersion Between
Clusters
Clusters were analyzed for overdispersion of rate, that is,
greater variation in rate between clusters than expected by
chance. The mean number of evolutionary steps per cluster
was used to classify clusters into rate categories and the
distribution of rate classes was tested statistically. A par-
simony tree generated by Paup* 4.0 was used to determine
evolutionary steps for each site in a cluster. Steps were then
summed for each cluster and adjusted for cluster size. For
Poisson tests of overdispersion the variability of number of
sites for each cluster necessitated a different approach.
Clusters of different sizes could not be mixed. Instead the
problem was divided and groups of clusters with the same
number of residues were analyzed together. For each size
class of cluster, a comparison was made to the model
Poisson distribution for the data mean and sample number
appropriate for that size cluster. The results of each size
class analysis were then merged using weighted sums cor-
responding to the number of clusters in each size class. The
net result was a comparison of the rates of clusters to a
model Poisson-based curve. To determine significance the
calculations were repeated using jackknife (50%) replicas,
dropping sites out of clusters. Site rates can be thought of as
having variability due to autocorrelation effects and vari-
ability due to random error. This test only analyzed
variability of rates due to autocorrelation, but random error
was predicted to be Poisson-distributed and hence not
interfere with the analysis. The statistic variance/mean was
evaluated, with a value >1 indicating overdispersion. The
Poisson distribution is expected for randomly distributed
amino acid replacement rates, whereas overdispersed or
autocorrelated rates will exhibit broader distribution curves.
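The variance/mean statistic and the grouping by cluster size can be sketched as below. This is a simplified reading of the procedure: clusters are grouped by number of residues, the index is computed within each size class, and the classes are merged by a weighted average using the number of clusters per class. The jackknife significance test is omitted, and the function names and toy step counts are hypothetical.

```python
from collections import defaultdict

def dispersion_index(counts):
    """Sample variance divided by mean; about 1 for Poisson counts, >1 if overdispersed."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean

def pooled_dispersion(cluster_steps):
    """cluster_steps holds one list of per-site parsimony steps per cluster.
    Clusters are compared only with clusters of the same size, and size
    classes are merged with weights equal to their number of clusters."""
    by_size = defaultdict(list)
    for steps in cluster_steps:
        by_size[len(steps)].append(sum(steps))
    weighted, n_clusters = 0.0, 0
    for totals in by_size.values():
        if len(totals) < 2 or sum(totals) == 0:
            continue                   # index undefined for this size class
        weighted += len(totals) * dispersion_index(totals)
        n_clusters += len(totals)
    return weighted / n_clusters if n_clusters else float("nan")

# Toy data (hypothetical): four two-site clusters, two slow and two fast.
index = pooled_dispersion([[0, 0], [5, 5], [0, 1], [6, 4]])
```

In the toy data the cluster totals are 0, 10, 1, and 10, giving an index well above 1, the pattern expected when slow and fast sites are grouped in space rather than scattered at random.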
Simulated Spatially Autocorrelated Trees
The ability of the cluster model to function in phylogenetic
inference was studied. Simulation was used to test the
ability of various models to detect the correct topology of a
tree. The goal was to assess the potential of the structural/
spatial model in phylogenetic reconstruction. Phylogenies
were simulated using the Evolver program in the PAML
3.14 package (Yang 1997). To simulate autocorrelated
protein phylogenies, an unrooted tree of four proteins with
unequal branch lengths was used, with no molecular clock.
Twenty partitions were made in a simulated protein of 400
residues. Each partition was set to a fixed rate to mimic a
spatial cluster. For trials the distribution of amino acid
replacement rates among partitions during simulation was
set by a gamma rate distribution with α = 1 or 2. All trees
simulated for this test emulated spatial autocorrelation but
the analysis models varied and included spatial autocorre-
lation, gamma model, and single rate. More than 100 trials
with randomly simulated trees were analyzed by ML with a
given model and the tree topology with the highest likeli-
hood was noted. The same set of data was analyzed by each
model. The proportion of correct topologies was deter-
mined for each method.
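The rate structure of such a simulated protein could be set up roughly as follows. The sketch only draws one rate per partition from a gamma distribution and assigns it to every site in that partition; it does not simulate sequences, which in the study was done with Evolver. Scaling the gamma distribution to mean 1 and rescaling the drawn rates so that their average is exactly 1 are assumptions, and the function name is hypothetical.

```python
import random

def simulate_partition_rates(n_sites=400, n_partitions=20, alpha=1.0, seed=0):
    """Draw one relative rate per partition from a gamma distribution with
    shape alpha and mean 1, then give every site its partition's rate."""
    rng = random.Random(seed)
    size = n_sites // n_partitions
    # random.gammavariate(alpha, beta) has mean alpha * beta, so beta = 1/alpha gives mean 1
    rates = [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_partitions)]
    mean = sum(rates) / len(rates)
    rates = [r / mean for r in rates]          # assumption: normalize the average rate to 1
    site_rates = [rates[i // size] for i in range(n_sites)]
    return rates, site_rates

rates, site_rates = simulate_partition_rates(alpha=1.0)
```

Every site within a partition shares one rate, which is what makes each partition mimic a spatial cluster.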
Bayesian Evolutionary Analysis
To test the utility of the spatial partition method in phy-
logenetic analysis, Bayesian phylogenetic reconstruction
was carried out. Evolutionary analysis was carried out with
the VPR data set. Partitioned (clustered) and nonpartitioned
(gamma, single rate) models were tested. MrBayes1.1.1
(Huelsenbeck and Ronquist 2001) was used. For the cluster
model, data matrix partitions were set to VPR spatial
clusters. Partitions were derived from 47 (10 Å)³ clusters,
each of which contained a mutually exclusive set of resi-
dues and at least six residues. Each partition was allowed to
equilibrate to its own rate.
An unpartitioned analysis using the gamma model
served as a control, with four amino acid replacement rate
categories used with the discrete gamma option. Bayesian
reconstruction was carried out using four independent
MCMC processes, three heated and one cold. The protein
model was JTT. For Bayes factor analysis, the harmonic
mean of each posterior probability was calculated and used
in model selection as described above.
Results and Discussion
Spatial Autocorrelation of Amino Acid Replacement
Rates and Evolution
Structural influences play an important role in protein
evolution. One of the main goals of this work was to study
whether amino acid replacement rates in VPRs (Table 1)
were spatially clustered and, if so, whether those clusters
had functional significance. The first test was to assay
whether residues positioned next to one another in the
folded structure had, on average, similar rates of evolu-
tionary change. It is known that amino acid replacement
rates of residues near one another in the linear protein
sequence are autocorrelated (Stern and Pupko 2006;
Mayrose et al. 2007). We wanted to determine whether that
correlation could be extended to 3D structure.
Spatial autocorrelation of amino acid replacement rates
was tested on different spatial scales. Moran’s I was used to
quantitate autocorrelation using steps on a parsimony VPR
tree as a surrogate for site rates.
$$I = \frac{\sum_i \sum_j s_i s_j}{\sum_i s_i^2} \qquad (1)$$
where si was the difference between the number of steps at
a site i and the mean number of steps for the population of
sites. Autocorrelation and clustering, by their natures,
depend on distance. The analysis was thresholded to
include only residue pairs with a through-space Euclidian
distance less than some test value representing spatial
scale. Correlation of the amino acid replacement rates of
residues separated by different distances was analyzed. As
shown in Fig. 1, the peak distance for autocorrelation of
rates was 7 Å, which is approximately the range of interaction
of an amino acid residue with surrounding residues.
One interpretation of this result is that rates of amino acid
residues that are in contact in the folded protein tend to be
similar. Autocorrelation values were modest, but they were
significant (P < 0.01) by bootstrap analysis and remained
significant when linear autocorrelation was removed from
the calculation. At a 7-Å spatial distance, Moran’s I with
linear autocorrelation removed was 0.497 (bootstrap
P < 0.005), higher than with linear autocorrelation included.
Thus there is a significant chance that any amino acid
[Figure: autocorrelation (y-axis) plotted against distance in Angstroms (x-axis)]
Fig. 1 Spatial autocorrelation of amino acid replacement rates in
VPRs. Autocorrelation (Moran’s I) was calculated for different
residue distances to test for possible clustering of rates. Each distance
was inclusive, that is, it included all intermediate distances. The rates
were significantly autocorrelated, using a nonparametric bootstrap
replication test (P < 0.01)
residue of VPR will be in contact with residues of similar
rate.
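A bootstrap test of whether Moran's I remains above 0 might look like the sketch below. It resamples neighboring residue pairs with replacement and reports the fraction of replicates in which the statistic is not positive; resampling pairs rather than individual sites is one interpretation of the procedure and is an assumption, as are the function names and toy data.

```python
import random

def pair_morans_i(pairs, s, n):
    """Moran's I from a list of unordered neighbor pairs (i, j) and the rate
    deviations s. Counting each pair once gives the same value as the
    symmetric double sum over binary weights."""
    cross = sum(s[i] * s[j] for i, j in pairs)
    denom = sum(x * x for x in s)
    return (n / len(pairs)) * cross / denom

def bootstrap_p(pairs, rates, n_boot=1000, seed=0):
    """Fraction of bootstrap replicates in which Moran's I is not above 0."""
    rng = random.Random(seed)
    n = len(rates)
    mean = sum(rates) / n
    s = [r - mean for r in rates]
    not_positive = 0
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]   # resample pairs with replacement
        if pair_morans_i(sample, s, n) <= 0:
            not_positive += 1
    return not_positive / n_boot

# Toy data (hypothetical): neighbors with similar rates versus dissimilar rates.
p_similar = bootstrap_p([(0, 1), (2, 3)], [1, 1, 5, 5])
p_dissimilar = bootstrap_p([(0, 2), (1, 3)], [1, 1, 5, 5])
```

A small value indicates that positive autocorrelation is robust to which neighbor pairs happen to be included.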
Testing a Spatial Cluster Model of Amino Acid
Replacement Rate Variation
Since spatial autocorrelation occurred, we wanted to
determine whether a model incorporating these results
would be practical. A pure autocorrelation model proved
too unwieldy to be useful. Instead, a simpler, related model
based on partitioning proteins by spatial criteria was tested
to see if it exhibited an improved fit for protein evolution.
A distance of 7 Å (the peak from the previous analysis) is
approximately the average distance between points in a
(10-Å)³ cube. VPR was divided into 46 contiguous (10-Å)³
clusters (plus 1 cluster for stray residues from poorly
occupied clusters), which were used to partition the pro-
tein. This partitioning of data into amino acid replacement
rate classes based on spatial clusters is termed the ‘cluster
model.’ Each cluster had on average 8.4 residues and a
minimum of 6 residues.
To determine if a cluster model improved the descrip-
tion of VPR evolution, ML analysis was used. The goal
was to determine which model fit the evolutionary data
better. For this analysis a fixed tree was used, with branch
length estimation. The amino acid replacement rate of each
spatial cluster partition was independently estimated. ML
analysis with the spatial partitions led to a significantly
better model fit than the single amino acid replacement rate
model (likelihood ratio test [LRT], P < 0.001). Table 2
shows model comparisons as determined by AIC. These
model comparisons were not nested and could not use the
LRT. AIC model comparison is a method in which addition
of parameters is penalized (Akaike 1974). For the cluster
model, amino acid replacement rate parameters had to be
estimated for 46 additional partitions. But despite the
penalization incurred for estimation of these parameters,
the model was supported. The gamma rate distribution was
also tested. As expected the gamma distribution model of
rates was supported as being superior to the single rate
model. A double model with the cluster model plus the
gamma model was somewhat better than the gamma model
alone or the cluster model alone. The observation that the
double model was more likely than either single model
suggested that the gamma and spatial models might capture
distinct features of protein evolution for this data set.
A number of models for spatial effects on evolution
consider differences in amino acid replacement rate
between inaccessible core residues of a protein and
accessible surface residues. When the VPR proteins were
partitioned based on accessibility, the AIC improved
(Table 2). The magnitude of the improvement, however,
was unexpectedly low. This result probably reflected the
fact that the VPR is a membrane protein. Membrane pro-
teins can exhibit lower accessibility effects than soluble
globular proteins (Goldman et al. 1998; Choi et al. 2007).
Accessibility and spatial clustering are very different ways
of partitioning structure, but each captured significant
correlations with the amino acid replacement rate. The
relative importance of accessibility and spatial effects
could not be generalized by these experiments on a single
protein.
Not all rate partitionings of the protein are equal. When
partitions were generated randomly, rather than being
based on spatial proximity, the AIC test showed that the
random partitionings were inferior to spatial partitioning
(Table 2). Thus success with the partitioning method
appeared to require partitions of clustered residues. It is
reasonable to imagine that these clusters often contain
groups of residues that comprise spatially discrete natural
features. The poor AIC tests of random partitions also
support the validity of the homology model used here as a
structural reference. A poor structural model would gen-
erate partitions no different than random partitions.
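A random partition control of this kind can be generated as in the following sketch, which shuffles site indices and cuts them into groups. Matching the random partition sizes to the sizes of the spatial partitions is an assumption made here so that the two partitionings differ only in which sites are grouped together; the function name and sizes are hypothetical.

```python
import random

def random_partitions(partition_sizes, seed=0):
    """Shuffle all site indices and cut them into partitions whose sizes
    match the supplied spatial partition sizes."""
    rng = random.Random(seed)
    sites = list(range(sum(partition_sizes)))
    rng.shuffle(sites)
    parts, start = [], 0
    for size in partition_sizes:
        parts.append(sorted(sites[start:start + size]))
        start += size
    return parts

# Toy example (hypothetical sizes): three partitions covering 24 sites.
parts = random_partitions([6, 8, 10])
```

Running the same ML analysis on several such random partitionings gives the baseline against which the spatial partitions are judged.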
Variance of Cluster Amino Acid Replacement Rates
Further tests were performed to examine the nature of the
rate clusters. It was of interest to determine whether the
distribution of amino acid replacement rates of clusters in
proteins was overdispersed. Overdispersed rates would be
more scattered than a Poisson distribution which describes
random clusters. Overdispersion is a predicted feature of
autocorrelation and would be an important confirmation
that the spatial autocorrelation was present. Also, if clusters
with outlying rates existed, these clusters would be can-
didates for regions in which selection of some sort is
Table 2 ML comparison of cluster and gamma models with VPR evolutionary data

| Model 1 (no. parameters) | Model 2 (no. parameters) | AIC^a |
| --- | --- | --- |
| One rate (0) | 47 spatial partitions (46) | 482.2 |
| One rate (0) | Gamma rate distribution (1) | 619.6 |
| One rate (0) | Core/surface partition (1) | 158.6 |
| Gamma model (1) | 47 spatial partitions (46) | -197.4 |
| 47 spatial partitions (46) | 47 spatial partitions + gamma (47) | 264.6 |
| Gamma model (1) | 47 spatial partitions + gamma (47) | 73.2 |
| 47 spatial partitions (46) | 47 random partitions (46) | -317.8 |
| 47 spatial partitions (46) | 47 random partitions (46) | -380.0 |
| 47 spatial partitions (46) | 47 random partitions (46) | -353.6 |

^a Akaike Information Criterion. A positive value indicates that model 2 fits better than model 1
occurring. The VPR protein was subjected to cluster
analysis to determine dispersion (Fig. 2). The results
indicated that rates of spatial partitions are overdispersed in
VPR. Compared to a Poisson distribution, there were sig-
nificantly more low rate and high rate clusters in the VPR
amino acid replacement rate distribution. This analysis
used clusters of about eight residues, and larger or smaller
clusters could give different results. Since the chosen
cluster size (10 Å)³ was about one-fourth to one-third of
the protein diameter, larger clusters might become larger
than functional regions of the protein. From a biological
perspective, autocorrelation might be highest in clusters
with specialized function (negative selection, low rate) or
lack of function (lack of negative selection, fast rate). It is
possible that some clusters were more autocorrelated than
others and that the variance analysis included a mixture of
strongly and weakly autocorrelated clusters. Nonetheless,
the analysis was significant for overdispersion, supporting
autocorrelation.
To further study the nature of spatial differences in
evolutionary amino acid replacement rate, variance in rate
for simulated VPRs and real VPRs was compared. Simu-
lated proteins were generated, without structural input,
under a gamma rate distribution with the gamma α parameter
set to that estimated from the VPR data set. The amino
acid replacement rates of 380 overlapping spatially defined
(10-A)3 cubic clusters were tested. The distribution of rate
estimates for the natural and simulated VPRs is shown in
Fig. 3. The amino acid replacement rates of clusters in
simulated proteins generated under a gamma rate distribu-
tion (which is overdispersed) exhibited little variance, while
rates for real VPRs were significantly more scattered. This
result shows that spatial autocorrelation is a selected feature
of natural proteins not present in typical simulated proteins.
The gamma distribution assures overall site rate overdispersion
but does not cluster the rates spatially. Simulated
proteins, in their simplest form, cannot be used to test for
evolution based on spatial features of proteins.
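The contrast between gamma-distributed site rates and spatially clustered rates can be sketched with a toy simulation. Everything here is an assumption made for illustration: sites are scattered uniformly in a box, clusters are non-overlapping 10 Å cubes (the analysis above used overlapping clusters and ML rate estimates), α = 0.9 is an arbitrary value, and all function names are hypothetical.

```python
import random
import statistics
from collections import defaultdict

def cube_index(coord, edge=10.0):
    """Map an (x, y, z) coordinate in angstroms to the cubic cell containing it."""
    return tuple(int(c // edge) for c in coord)

def cluster_mean_rates(coords, rates, edge=10.0):
    """Average the site rates within each non-overlapping cubic cell."""
    cells = defaultdict(list)
    for coord, rate in zip(coords, rates):
        cells[cube_index(coord, edge)].append(rate)
    return [statistics.fmean(values) for values in cells.values()]

def simulate(n_sites=2000, box=60.0, alpha=0.9, seed=1):
    """Return (variance of cluster means with independent site rates,
    variance of cluster means when all sites in a cell share one rate)."""
    rng = random.Random(seed)
    coords = [tuple(rng.uniform(0, box) for _ in range(3)) for _ in range(n_sites)]
    # (a) No structural input: every site draws its own gamma rate (mean 1).
    independent = [rng.gammavariate(alpha, 1 / alpha) for _ in coords]
    # (b) Spatial autocorrelation: one gamma rate per cell, shared by its sites.
    cell_rate = {}
    shared = []
    for coord in coords:
        key = cube_index(coord)
        if key not in cell_rate:
            cell_rate[key] = rng.gammavariate(alpha, 1 / alpha)
        shared.append(cell_rate[key])
    return (statistics.variance(cluster_mean_rates(coords, independent)),
            statistics.variance(cluster_mean_rates(coords, shared)))

var_independent, var_shared = simulate()
print(f"variance of cluster rates, independent sites: {var_independent:.3f}")
print(f"variance of cluster rates, shared cell rates: {var_shared:.3f}")
```

Averaging independent gamma rates within a cell shrinks the variance among cells, whereas rates shared within a cell retain it. This is the sense in which a gamma distribution alone produces site rate overdispersion without spatial clustering.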
Simulated Evolution with Spatial Autocorrelation
A variety of other evolutionary tests was used to examine
the cluster model. One test of a model is its ability to
reconstruct phylogenies that were generated using the
model to simulate protein trees. As described above, simple
simulated proteins do not capture the features of folded
proteins. To simulate proteins with structure, simulated
proteins were generated with partitions equivalent to
clusters in a protein structure. Simulations were carried out
using the cluster model to generate mock 3D-based trees.
Trees of four proteins were generated, with long external
branches and a short internal branch. These conditions have
been shown to add complexity to phylogenetic inference
(Felsenstein 1978). The gamma distribution was used to
simulate the amino acid replacement rates of the autocorrelated
partitions and the α value was varied (Table 3) (see
Methods). Alternative evolutionary models (single rate,
gamma, cluster) were used with the PAML3.14 application
to attempt to predict the generating topology. The cluster
model was significantly better than the gamma model for
one of the tested sets of conditions (Table 3, row 1).
However, overall the cluster model and gamma model had
similar predictive performances on the cluster simulated
trees. These results suggest that for some trees the cluster
[Figure 2 plot: x-axis, Rate; y-axis, Frequency]
Fig. 2 Amino acid replacement rate variation in clusters. Rates in
clusters set as partitions were compared to a model Poisson
distribution to determine if cluster rates were overdispersed. Dashed
lines, Poisson model expected for normally dispersed observations;
solid line, observed distribution of rate. Poisson model was rejected
by jackknife test for dispersion (P < 0.001). Analysis performed on
VPR data set
[Figure 3 plot: x-axis, Ln Rate; y-axis, Number of Clusters]
Fig. 3 Differential distribution of amino acid replacement rates in
VPRs and simulated proteins. The estimated amino acid replacement
rates of 380 overlapping spatial clusters were analyzed for distribution
of rates. As a comparison, a simulated protein, for which spatial
information was irrelevant, was analyzed in the same way. Dashed
line, distribution of rates for VPR proteins; solid line, distribution of
rate in the simulated protein. The differences in variances were
significant (P < 0.01) using a Fligner nonparametric test
model would produce more accurate results. However, no
single model was best under all conditions. Under some
conditions the misspecified single rate model performed
better than the cluster model that generated the data. This
occurred with a tree known to cause difficulty for ML
inference (Yang 1996).
Bayesian Phylogenetic Inference with Cluster Partitions
Spatial autocorrelation was applied to a VPR phylogenetic
analysis to see if the cluster method could create an
accurate phylogeny. A model based on the cluster method
was used for Bayesian phylogenetic inference. The VPR
matrix was divided into 47 spatial partitions. Based on ML
analysis the phylogenetic tree contained short internal
branches between the major groupings of VPRs (VPR1a,
VPR1b, VPR2, OTR) as the trees in the simulation did (not
shown). The analysis was also performed with the gamma
rate distribution model. As shown in Fig. 4 (cluster) and
Fig. 5 (gamma model), the cluster method was able to
produce a consistent tree with resolution similar to or better
than that with the gamma model. With the cluster model
the placement of teleost and amphibian receptors on the
oxytocin receptor lineage was also more in accord with
accepted evolutionary relationships for lower vertebrates
and mammals. It is notable that the cluster method is
compatible with existing phylogenetic applications once
spatial partitions have been assigned. The model should be
applicable to many proteins that have spatially defined
features expected to evolve at a rate different from the bulk
Table 3 Topology inference on trees simulating spatial autocorrelation
Tree                            Partition rate     Inference model: % correct topology
                                distribution, α    Cluster   Gamma   Single rate
1. ((A:3,B:3),.1(C:3,D:3))      1a                 47.6*     32.9    40.5
2. ’’                           2                  87.3      87.3    86.4
3. ((A:3,B:.1),.1(C:3,D:.1))    1                  78.1      75.2    5.2
4. ’’                           2                  67.3      65.5    65.5
5. ((A:3,B:3),.1(C:.1,D:.1))    1                  49.1      49.1    99.1
6. ’’                           2                  66.4      63.6    80.0
* P < 0.005, comparing cluster model to gamma rate distribution
a Discrete gamma α value for distribution of partition rates during simulation
[Figure 4 tree image. Tip labels: OPR OCT, ITR CATOS, MTR BUFMA, OXYR RAT, OXYR BOVIN, OXYR HUMAN, VTR1 CHICK, V2R RAT, V2R BOVIN, V2R HUMAN, VTRpit CHICK, V1BR HUMAN, V1BR RAT, VTR1b TAKI, VTR1a TAKI, VTR RANA, VPR1 COW, hVPR1a, V1AR RAT]
Fig. 4 VPR phylogeny generated with a cluster partition model.
Bayesian phylogenetic inference with VPR-related receptors. Forty-seven
spatial partitions were used, with amino acid replacement rates
allowed to vary between partitions. The reconstruction shown is a 50%
majority-rule consensus tree. Numbers at nodes represent the
posterior probability that the node is supported
of the protein. A special feature of the cluster method, as
presented, is that the location of spatial features need not be
defined in advance.
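To illustrate how assigned spatial partitions might be passed to an existing application, the sketch below writes a MrBayes block with one character set per cluster and rates allowed to vary among partitions. This is an assumed workflow rather than the settings used here: the function name, the cluster labels, and the choice of commands (`charset`, `partition`, `set partition`, `prset ... ratepr=variable`) are illustrative.

```python
def mrbayes_partition_block(site_to_cluster, name="spatial"):
    """Build a MrBayes block from per-column cluster labels.

    site_to_cluster[i] is the (hypothetical) cluster label of alignment
    column i + 1; labels must be sortable, e.g. integers.
    """
    clusters = {}
    for column, label in enumerate(site_to_cluster, start=1):
        clusters.setdefault(label, []).append(column)

    lines = ["begin mrbayes;"]
    charset_names = []
    for k, label in enumerate(sorted(clusters), start=1):
        charset = f"cluster{k}"
        charset_names.append(charset)
        columns = " ".join(str(c) for c in clusters[label])
        lines.append(f"  charset {charset} = {columns};")
    # Declare the partition, activate it, and unlink rates among partitions.
    lines.append(f"  partition {name} = {len(charset_names)}: {', '.join(charset_names)};")
    lines.append(f"  set partition = {name};")
    lines.append("  prset applyto=(all) ratepr=variable;")
    lines.append("end;")
    return "\n".join(lines)

# Toy assignment of five alignment columns to two clusters.
print(mrbayes_partition_block([0, 0, 1, 1, 0]))
```

Because residues in a spatial cluster are generally not adjacent in sequence, each character set lists individual alignment columns rather than ranges.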
As described above, ML analysis supported the cluster
model. That analysis involved only estimation of branch
lengths and not tree topology. With the Bayesian phylo-
genetic analysis there was an opportunity to test the cluster
model in the broader context of tree inference. Bayes factor
analysis of the Bayesian phylogenetic reconstructions showed a
pattern of model support similar to that of the ML analysis
(Table 4). In particular, both the cluster model and the
gamma rate distribution model were better than a single
rate model by Bayes factor analysis. The addition of the
cluster model to the gamma model (gamma rate distribu-
tion within each cluster partition) significantly improved
the likelihood, again supporting the concept that these two
approaches to among-sites amino acid replacement rate
variation capture distinct features of the VPR data set. The
cluster model with 47 partitions was superior in compari-
sons to a model with 47 random partitions unrelated to
spatial clustering. The posterior probabilities of the spatial
autocorrelation and the double gamma autocorrelation
models converged after <1 million cycles and 2 million
cycles, respectively, of the MCMC process. The cluster
models, despite their size and a certain level of ineffi-
ciency, did not require special treatment other than
allowing sufficient time for MCMC equilibration.
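The Bayes factors in Table 4 follow from differences between the tabulated Ln likelihoods. A minimal sketch of that arithmetic, assuming the tabulated values serve as estimates of Ln marginal likelihood and the Ln Bayes factor is their difference; the dictionary and function names are illustrative only.

```python
# Ln likelihood values as listed in Table 4.
LN_LIKELIHOOD = {
    "single rate": -11041,
    "cluster": -10711,
    "gamma": -10624,
    "gamma + cluster": -10518,
}

def ln_bayes_factor(model, reference, table=LN_LIKELIHOOD):
    """Ln Bayes factor of `model` over `reference` (positive favors `model`)."""
    return table[model] - table[reference]

for model, reference in [("cluster", "single rate"),
                         ("gamma", "single rate"),
                         ("gamma + cluster", "gamma")]:
    print(f"{model} vs {reference}: {ln_bayes_factor(model, reference)}")
```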
Finding Spatial Regions of Anomalous Amino Acid
Replacement Rates
One goal for analysis of autocorrelated clusters was to
identify candidates for selection. The justification for the
cluster model was the concept that amino acid replacement
rate variation for protein sites might have a biological
underpinning. Functional features of a protein may evolve
at rates different from the protein as a whole. If biological
function supports spatial autocorrelation, then clusters
identified by ML as having anomalous amino acid
replacement rates might be associated with selection.
Regions of proteins evolving at a slower than normal or
faster than normal rate were identified. The most important
regions were identified by the increase in likelihood that
occurred when the region was set as a partition against the
bulk of the protein. These partitions were not necessarily
the same as those with the fastest or slowest amino acid
replacement rate. Instead they seemed to be the partitions
with a large number of sites of uniformly high or low rates.
From the list of the effect of each individual cluster par-
tition on ML, the partition with the largest effect was
chosen (Fig. 6). This low amino acid replacement rate
cluster mapped to the VPR core including portions of
transmembrane (TM) domain 1, TM2, and TM7. This
cluster had, by far, the largest effect on likelihood when set
as a partition during analysis. A high amino acid replace-
ment rate cluster with a large effect on ML (not shown)
contained a portion of the intracellular C-terminal domain
of VPR including cytoplasmically facing residues of helix
8. Both the low rate and the high rate clusters were rea-
sonable candidates for regions whose amino acid
replacement rates would be correlated for functional rea-
sons. The low rate cluster included a portion of an H-
bonded network thought to stabilize GPCRs in the off state
[Figure 5 tree image. Tip labels: OPR OCT, VTR RANA, VPR1 COW, hVPR1a, V1AR RAT, VTR1b TAKI, VTR1a TAKI, VTRpit CHICK, V1BR HUMAN, V1BR RAT, VTR1 CHICK, V2R RAT, V2R BOVIN, V2R HUMAN, MTR BUFMA, ITR CATOS, OXYR RAT, OXYR BOVIN, OXYR HUMAN]
Fig. 5 VPR phylogeny with gamma amino acid replacement rate
distribution. A control Bayesian phylogenetic analysis with a gamma
rate distribution model. A 50% majority-rule consensus tree is shown.
Numbers at nodes represent posterior probability that the node is
supported. The gamma α value estimate was 0.891 in this analysis
Table 4 Assessing model quality by Bayes factors derived from
Bayesian phylogenetic reconstructions of a VPR phylogeny
Model               Likelihood (Ln)   Bayes factor a (Ln)
Comparison to single rate model
Single rate         -11,041           [0]
Cluster             -10,711           330
Gamma               -10,624           417
Comparison to gamma model
Gamma + cluster     -10,518           106
a Model compared to model (Gelman et al. 2004)
prior to activation (Okada et al. 2002). Mutation of resi-
dues in this region constitutively activates some GPCRs
(Robinson et al. 1992). The C-terminal high rate region, by
contrast, is potentially one of the areas of the receptor
involved in coupling to G protein. Since the receptors
included in this phylogeny include types coupling to sev-
eral different G-protein types, it is reasonable that rates in
this region would be highly variable (Strader et al. 1994).
Presumably there has been selection for change in this
region as the G-protein type involved in coupling changed.
Specific residues in this region have been shown to be
essential for individual GPCR family members to function
and couple to their cognate G-protein type.
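The partition scan described in this section, in which each cluster is set as a partition against the bulk of the protein and ranked by the gain in likelihood, can be sketched with a deliberately simplified stand-in model. The assumption here is that each site is summarized by a replacement count with a Poisson likelihood; the analysis above instead used ML on the phylogeny. All names and counts are hypothetical.

```python
import math

def _total_log_mean(total, n):
    """total * ln(total / n), with 0 * ln(0) taken as 0."""
    if total == 0 or n == 0:
        return 0.0
    return total * math.log(total / n)

def partition_gain(counts, cluster):
    """Log-likelihood gain when sites in `cluster` get their own Poisson rate.

    With rates at their maximum likelihood values (the group means), terms
    that do not depend on the rates cancel between the two models.
    """
    members = set(cluster)
    inside = [c for i, c in enumerate(counts) if i in members]
    outside = [c for i, c in enumerate(counts) if i not in members]
    return (_total_log_mean(sum(inside), len(inside))
            + _total_log_mean(sum(outside), len(outside))
            - _total_log_mean(sum(counts), len(counts)))

def rank_clusters(counts, clusters):
    """Return (cluster name, gain) pairs sorted by gain, largest first."""
    gains = {name: partition_gain(counts, sites) for name, sites in clusters.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)

# Toy per-site replacement counts and candidate clusters (site indices).
site_counts = [0, 0, 0, 0, 0, 0, 3, 4, 2, 5, 3, 4, 9, 3, 2, 4]
candidates = {
    "uniform_low": [0, 1, 2, 3, 4, 5],  # many sites, uniformly slow
    "mixed": [4, 5, 6, 7],              # slow and average sites together
    "single_fast": [12],                # one very fast site
}
for name, gain in rank_clusters(site_counts, candidates):
    print(f"{name}: gain = {gain:.2f}")
```

In this toy case the cluster of many uniformly slow sites gives a larger gain than the single fastest site, matching the observation that the most influential partitions were not necessarily those with the most extreme rate.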
Earlier it was shown that the amino acid replacement
rate distribution of spatial clusters was significantly over-
dispersed. One interpretation of that analysis was that rates
in VPRs were randomly distributed according to some
distribution with overdispersed properties. A more bio-
logical explanation might be that VPRs contain functional
regions and that the differing rates associated with selec-
tion on those regions contribute to overdispersion and
spatial autocorrelation. This second perspective suggests
that the extreme rate clusters are not outliers of an over-
dispersed statistical distribution. Instead it appears that they
represent, as above, biologically significant regions.
Lineage Specific Amino Acid Replacement Rate
Clusters
To this point the methods described have focused on
cluster rates across the entire VPR phylogeny. Defining
clusters of residues whose selection differs by lineage is an
interesting problem. By specifically seeking clusters that
differed in amino acid replacement rate by lineage, the
focus was on structural regions whose function had chan-
ged over the tree. For this analysis the paralogue OTR and
VP1aR lineages were separated and ML analysis was
performed separately on each grouping. Figure 7 shows a
cluster with a high estimated rate difference between the
OTR and the VP1aR lineages. This cluster includes part of
the extracellular loop region which has been implicated in
ligand binding (Shi and Javitch 2002). The OTR amino
acid replacement rate was lower than the VP1aR rate for
this cluster (the lineage cluster). The most likely explana-
tion for the lineage cluster is that evolutionary constraints
are relaxed for VP1aR receptors. A change in evolutionary
constraints for other VPR receptors (due to a change in
ligand) has been associated with changes in the ligand
binding site (Cho et al. 2007). Oxytocin and vasopressin,
the principal ligands of OTR and VP1aR, respectively, are
similar, but not identical, ligands. Each receptor binds the
ligand of the other to some extent, and this might have
physiological significance. So lineage-specific changes in
the ligand-binding region are of interest. Though this
technique cannot detect selection per se, it can highlight
regions that are candidates for various types of selection.
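Once a rate has been estimated for each cluster on each lineage, the search for lineage-specific clusters reduces to ranking clusters by how strongly the two estimates differ. The sketch below uses the absolute log ratio as the ranking score, which is an assumption rather than the criterion used for Fig. 7; the cluster names and rate values are hypothetical.

```python
import math

def rank_by_lineage_difference(rates_a, rates_b):
    """Rank clusters by |ln(rate_a / rate_b)|, largest difference first.

    Returns (cluster, ln ratio) pairs; a negative ln ratio means the
    cluster evolves more slowly on lineage A than on lineage B.
    """
    shared = rates_a.keys() & rates_b.keys()
    scored = [(name, math.log(rates_a[name] / rates_b[name])) for name in shared]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Toy per-cluster rate estimates for two lineages (not values from this study).
otr_rates = {"c1": 0.4, "c2": 1.0, "c3": 1.3}
v1a_rates = {"c1": 1.6, "c2": 1.1, "c3": 1.2}
for cluster, ln_ratio in rank_by_lineage_difference(otr_rates, v1a_rates):
    print(f"{cluster}: ln(OTR rate / V1aR rate) = {ln_ratio:.2f}")
```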
Conclusion
A significant amount of protein amino acid replacement
rate variation is correlated with the location of the site in
the folded protein structure. For sites in VPRs, the rate was
spatially autocorrelated or clustered. This clustering of
amino acid replacement rates was supported by several
independent approaches including ML and Bayesian anal-
yses. The cluster model is conceptually simple and can be
applied to a number of evolutionary analyses. This method
allowed useful partitioning of sites into spatial clusters. The
current model focused on clusters of residues in the protein
which might be described as functional units. These clus-
ters of amino acids may have shared rates because of
shared selection and shared function. The cluster method
captured the fact that amino acid replacement rates are
Fig. 6 Outlier amino acid replacement rate cluster of the VPR
receptors. Three hundred eighty overlapping spatial clusters were
tested for those that improved the likelihood most when set as a
separate rate partition. A structural model of human V1aR is shown.
The low-rate cluster is indicated in gray. Residues in the cluster are
presented as spacefilling spheres to emphasize that the amino acid
residues in the cluster are spatially contiguous. The rest of the protein
is depicted by a white ribbon schematic display with the ligand
binding domain toward the top of the figure and the G-protein binding
domain at the bottom
clustered without requiring that the functional regions of
the protein be identified. When functional regions were
known, that information was incorporated. The ability to
identify the rate of clustered sites allowed identification of
regions with differing rates of evolution. As an example,
we were able to identify a region of the VP1aR ligand
binding site with a rate different from that of the corre-
sponding region of OTR, suggesting differences in
selection for peptide ligand interaction on the two para-
logue lineages. Thus this autocorrelation/cluster model
helped provide insight into the evolution of VPR functions.
It is likely that autocorrelation will apply to other proteins,
especially proteins with regions that carry out some spe-
cific function.
Acknowledgments I thank J. L. Thorne and an anonymous
reviewer for helpful suggestions. Carole Griffiths provided stimulat-
ing discussion and advice. The LIU Biocomputing facility provided
resources.
References
Akaike H (1974) A new look at the statistical model identification.
IEEE Trans Automat Contr AC-19:716–723
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H,
Shindyalov IN, Bourne PE (2000) The protein data bank.
Nucleic Acids Res 28:235–242
Chakrabarti S, Lanczycki CJ (2007) Analysis and prediction of
functionally important sites in proteins. Protein Sci 16:4–13
Chakrabarti S, Sowdhamini R (2004) Regions of minimal structural
variation among members of protein domain superfamilies:
application to remote homology detection and modelling using
distant relationships. FEBS Lett 569:31–36
Cho HJ, Acharjee S, Moon MJ, Oh DY, Vaudry H, Kwon HB, Seong
JY (2007) Molecular evolution of neuropeptide receptors with
regard to maintaining high affinity to their authentic ligands. Gen
Comp Endocrinol 153:98–107
Choi SS, Vallender EJ, Lahn BT (2006) Systematically assessing the
influence of three-dimensional structural context on the molec-
ular evolution of mammalian proteomes. Mol Biol Evol
23:2131–2133
Choi SC, Hobolth A, Robinson DM, Kishino H, Thorne JL (2007)
Quantifying the impact of protein tertiary structure on molecular
evolution. Mol Biol Evol 24:1769–1782
Dean AM, Neuhauser C, Grenier E, Golding GB (2002) The pattern
of amino acid replacements in alpha/beta-barrels. Mol Biol Evol
19:1846–1864
Elango N, Kim SH, Vigoda E, Yi SV (2008) Mutations of different
molecular origins exhibit contrasting patterns of regional
substitution rate variation. PLoS Comput Biol 4:e1000015
Felsenstein J (1978) Cases in which parsimony or compatibility
methods will be positively misleading. Syst Zool 27:401–410
Felsenstein J (2001) Taking variation of evolutionary rates between
sites into account in inferring phylogenies. J Mol Evol 53:447–455
Fiser A, Sali A (2003) Modeller: generation and refinement of
homology-based protein structure models. Methods Enzymol
374:461–491
Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Model checking
and improvement. In: Gelman A, Carlin JB, Stern HS, Rubin DB
(eds) Bayesian data analysis. Chapman and Hall, New York,
pp 157–192
Goldman N, Thorne JL, Jones DT (1998) Assessing the impact of
secondary structure and solvent accessibility on protein evolu-
tion. Genetics 149:445–458
Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference
of phylogenetic trees. Bioinformatics 17:754–755
Huelsenbeck JP, Suchard MA (2007) A nonparametric method for
accommodating and testing across-site rate variation. Syst Biol
56:975–987
Marsh L (2006) Evolution of structural shape in bacterial globin-
related proteins. J Mol Evol 62:575–587
Marsh L, Griffiths C (2005) Protein structural influences in rhodopsin
evolution. Mol Biol Evol 22:894–904
Mayrose I, Friedman N, Pupko T (2005) A gamma mixture model
better accounts for among site rate heterogeneity. Bioinformatics
21(Suppl 2):ii151–ii158
Mayrose I, Doron-Faigenboim A, Bacharach E, Pupko T (2007)
Towards realistic codon models: among site variability and
dependency of synonymous and non-synonymous rates. Bioin-
formatics 23:i319–i327
Moran PA (1950) Notes on continuous stochastic phenomena.
Biometrika 37:17–23
Newton MA, Raftery AE (1994) Approximate Bayesian inference by
the weighted likelihood bootstrap (with discussion). J Roy Stat
Soc Ser B 56:3–48
Ninio M, Privman E, Pupko T, Friedman N (2007) Phylogeny
reconstruction: increasing the accuracy of pairwise distance
estimation using Bayesian inference of evolutionary rates.
Bioinformatics 23:e136–e141
Okada T, Fujiyoshi Y, Silow M, Navarro J, Landau EM, Shichida Y
(2002) Functional role of internal water molecules in rhodopsin
revealed by X-ray crystallography. Proc Natl Acad Sci USA
99:5982–5987
Fig. 7 Spatial clusters of amino acids evolving at different rates in
the oxytocin receptor and vasopressin 1a receptor lineages. Overlap-
ping spatial clusters were tested for regions with a different amino
acid replacement rate for two lineages. The best candidate is shown in
gray spacefill presentation. General features are similar to those in
Fig. 6. This region is evolving more slowly in oxytocin receptors than
V1a receptors. The selected residues comprise a portion of the region
of the predicted binding site for vasopressin/oxytocin
Robinson PR, Cohen GB, Zhukovsky EA, Oprian DD (1992)
Constitutively active mutants of rhodopsin. Neuron 9:719–725
Robinson DM, Jones DT, Kishino H, Goldman N, Thorne JL (2003)
Protein evolution with dependence among codons due to tertiary
structure. Mol Biol Evol 20:1692–1704
Shi L, Javitch JA (2002) The binding site of aminergic G protein-
coupled receptors: the transmembrane segments and second
extracellular loop. Annu Rev Pharmacol Toxicol 42:437–467
Stern A, Pupko T (2006) An evolutionary space-time model with
varying among-site dependencies. Mol Biol Evol 23:392–400
Strader CD, Fong TM, Tota MR, Underwood D, Dixon RAF (1994)
Structure and function of G protein-coupled receptors. Annu Rev
Biochem 63:101–132
Susko E, Field C, Blouin C, Roger AJ (2003) Estimation of rates-
across-sites distributions in phylogenetic substitution models.
Syst Biol 52:594–603
Swofford DL (1998) PAUP*: phylogenetic analysis using parsimony
(*and other methods). Version 4. Sinauer Associates, Sunderland,
MA
Thompson JD, Higgins DG, Gibson TJ (1994) Clustal W: improving
the sensitivity of progressive multiple sequence alignment
through sequence-weighting, position-specific gap penalties,
and weight matrix choice. Nucleic Acids Res 22:4673–4680
Van Damme EJ, Nakamura-Tsuruta S, Smith DF, Ongenaert M,
Winter HC, Rouge P, Goldstein IJ, Mo H, Kominami J, Culerrier
R, Barre A, Hirabayashi J, Peumans WJ (2007) Phylogenetic and
specificity studies of two-domain GNA-related lectins: genera-
tion of multispecificity through domain duplication and
divergent evolution. Biochem J 404:51–61
Yang Z (1994) Maximum likelihood phylogenetic estimation from
DNA sequences with variable rates over sites: approximate
methods. J Mol Evol 39:306–314
Yang Z (1996) Phylogenetic analysis using parsimony and likelihood
methods. J Mol Evol 42:294–307
Yang Z (1997) PAML: a program package for phylogenetic analysis
by maximum likelihood. Comput Appl BioSci 13:555–556