
Evolutionary aspects of protein structure and folding

Edward N Trifonov and Igor N Berezovsky†

The traditional reconstruction of molecular events of the past based on sequence conservation becomes very vague beyond one to two billion years ago. There are certain molecular features, however, such as polymer flexibility and loop closure, that are conserved merely because of their physical nature. This allows one to penetrate the earliest stages of protein evolution.

Addresses
Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa 31905, Israel; e-mail: [email protected]
† Department of Structural Biology, The Weizmann Institute of Science, POB 26, Rehovot 76100, Israel; e-mail: [email protected]

Correspondence: Edward N Trifonov

Current Opinion in Structural Biology 2003, 13:110–114

This review comes from a themed issue on Folding and binding
Edited by Jane Clarke and Gideon Schreiber

0959-440X/03/$ – see front matter
© 2003 Elsevier Science Ltd. All rights reserved.

DOI 10.1016/S0959-440X(03)00005-8

Introduction
There are many important aspects of protein evolution and folding [1–10], each of them deserving a thorough review [11–16]. In this paper, we focus on the earliest stages of protein evolution [17] and their impact on the structure and folding of contemporary proteins.

Biological systems are believed to have evolved en route from simple to complex, from small to large, guided by a multitude of laws of Nature. As both nucleic acids and proteins are polymers, they obey the laws of polymer physics. This generally neglected and recently revived association has challenged the very basics of our understanding of protein structure and evolution.

From the perspective of polymer statistics, every polymer chain may occasionally return to itself. That is, some points of the free chain trajectory may come within a short reach of one another, forming a closed loop.
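The idea of a chain "returning to itself" can be made concrete with a small Monte Carlo experiment. The sketch below is illustrative only and is not the authors' method: it assumes a textbook freely rotating chain (unit bonds, a fixed deflection angle between successive bonds, uniformly random azimuth) and counts a loop as closed when the chain end comes within a hypothetical capture radius of its start. The function names, the 60-degree deflection and the capture radius are all assumptions made for this example.

```python
import math
import random


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _perpendicular(v):
    """A unit vector perpendicular to the unit vector v (Gram-Schmidt)."""
    axis = min(range(3), key=lambda i: abs(v[i]))  # axis least aligned with v
    u = [-v[axis] * v[i] for i in range(3)]  # e_axis minus its projection on v
    u[axis] += 1.0
    norm = math.sqrt(sum(c * c for c in u))
    return [c / norm for c in u]


def freely_rotating_chain(n_bonds, deflection, rng):
    """Unit bond vectors of a freely rotating chain: each bond deviates from
    the previous one by the fixed angle `deflection` (radians) around a
    uniformly random azimuth. The first bond points along +z."""
    bonds = [[0.0, 0.0, 1.0]]
    cos_t, sin_t = math.cos(deflection), math.sin(deflection)
    for _ in range(n_bonds - 1):
        v = bonds[-1]
        p = _perpendicular(v)
        q = _cross(v, p)  # v, p, q form an orthonormal frame
        phi = rng.uniform(0.0, 2.0 * math.pi)
        bonds.append([cos_t * v[i]
                      + sin_t * (math.cos(phi) * p[i] + math.sin(phi) * q[i])
                      for i in range(3)])
    return bonds


def closure_probability(n_bonds, deflection, capture_radius, trials, seed=0):
    """Fraction of sampled chains whose end lies within `capture_radius`
    of their start, i.e. chains that close into a loop."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        bonds = freely_rotating_chain(n_bonds, deflection, rng)
        end = [sum(b[i] for b in bonds) for i in range(3)]
        if math.sqrt(sum(c * c for c in end)) <= capture_radius:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical parameters: 60-degree deflection, capture radius 1.5 bonds.
    for n in range(2, 31, 4):
        p = closure_probability(n, math.pi / 3, 1.5, trials=2000)
        print(f"{n:3d} bonds: closure probability {p:.4f}")
```

Because very short chains cannot bend back on themselves and very long chains rarely find their own origin, one would expect the estimated closure probability to be low at both extremes and largest at an intermediate chain length set by the chain's flexibility.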
The closed loops have a typical size characteristic of a given type of polymer: the more flexible the chain, the smaller the loops. Accordingly, the polypeptide chains of globular proteins contain numerous closed loops with a contour length of 10–50 residues, depending on whether the loops are structured or unstructured; the majority of the loops comprise 25–35 amino acid residues. The same laws of chain statistics apply to DNA molecules, for which the optimal loop (ring) has a contour length of 300–600 base pairs. DNA molecules are substantially more rigid than polypeptides, which explains the greater contour length of DNA loops (rings). At three base pairs per codon, such a ring would encode a protein of 100–200 amino acid residues, which is typical of modern protein folds. In this review, we focus on the implications of these polymer-statistical considerations for protein structure, evolution and folding.

Length increments in protein evolution
An evolving protein chain may grow by increments of one, several or many residues inserted into the chain or added to its ends [16]. Many molecular mechanisms can bring about such changes. We believe that proteins (and their respective genes) have passed through several evolutionary stages, each with its own characteristic increment. Unfortunately, the existence or nonexistence of such characteristic size increments has never been a focus of studies of protein evolution, which consequently has not been regarded as a process with distinct steps or stages. However, one candidate unit size increment was detected as early as 1929 by Svedberg in his ultracentrifugation experiments. He wrote: "The proteins...can, with regard to molecular weight, be divided into four subgroups.... The molecular masses characteristic of the three higher sub-groups are - as a first approximation - derived from molecular mass of the first sub-group by multiplying by the integers two, three,..." [18].
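The DNA-ring arithmetic and Svedberg's integer-multiple observation reduce to simple calculations. The helpers below (`encoded_residues`, `svedberg_series`) are hypothetical names for a minimal sketch; it ignores stop codons and non-coding sequence, and it expresses sizes in residues rather than daltons. The unit of about 160 residues is Svedberg's own estimate of the increment.

```python
CODON_LENGTH = 3  # base pairs per codon, i.e. per encoded amino acid


def encoded_residues(ring_bp):
    """Residues encodable by a DNA stretch of `ring_bp` base pairs,
    ignoring stop codons, introns and other non-coding sequence."""
    return ring_bp // CODON_LENGTH


def svedberg_series(unit, n_groups=4):
    """Sizes of Svedberg's sub-groups as integer multiples (1, 2, 3, ...)
    of the smallest sub-group's size, per his first approximation."""
    return [k * unit for k in range(1, n_groups + 1)]


# Optimal DNA ring of 300-600 bp -> protein of 100-200 residues.
print(encoded_residues(300), encoded_residues(600))  # 100 200
# Taking the unit as about 160 residues:
print(svedberg_series(160))  # [160, 320, 480, 640]
```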
The first estimate of this size increment, also by Svedberg, was about 160 amino acid residues. This is within the range of recent estimates of protein domain sizes [19–21], 100–200 residues, irrespective of the type of fold (domain). Apparently, Svedberg's observation reflects one of the latest stages of protein evolution (see below): the formation of multidomain protein structures. Another distinct size increment range, 25–35 residues, corresponding to the contour length of the closed loops in proteins mentioned above, was first detected only very recently [22–26]. The structural significance and evolutionary implications of these two units of protein size are discussed below.

Flexibility of polymer chains and loop closure
A freely suspended fluctuating polymer chain may adopt numerous conformations in space. Its path may change direction at any point, so that, after some number of monomer steps, the flexible chain loses its original orientation. Flexibility is measured by the so-called persistence length (a), defined such that the average direction cosine decreases e-fold over this length [27]. For example, according to experimental estimates by Flory [28], for mixed unstructured polypeptide chains a 4–5
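The persistence-length definition can be checked numerically. This sketch assumes a freely rotating chain model, which is an illustration and not Flory's treatment: for a fixed deflection angle θ between successive unit bonds, the mean direction cosine between bonds k apart is exactly cos(θ)^k, so it falls e-fold after a = -1/ln(cos θ) bonds. The Monte Carlo estimate instead finds where the sampled cosine first drops below 1/e. All function names and parameter values are hypothetical, and treating one bond as one residue is an assumption made only for illustration.

```python
import math
import random


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _perpendicular(v):
    """A unit vector perpendicular to the unit vector v (Gram-Schmidt)."""
    axis = min(range(3), key=lambda i: abs(v[i]))  # axis least aligned with v
    u = [-v[axis] * v[i] for i in range(3)]
    u[axis] += 1.0
    norm = math.sqrt(sum(c * c for c in u))
    return [c / norm for c in u]


def freely_rotating_chain(n_bonds, deflection, rng):
    """Unit bonds; each deviates from the previous one by `deflection`
    radians around a uniformly random azimuth. First bond is +z."""
    bonds = [[0.0, 0.0, 1.0]]
    cos_t, sin_t = math.cos(deflection), math.sin(deflection)
    for _ in range(n_bonds - 1):
        v = bonds[-1]
        p = _perpendicular(v)
        q = _cross(v, p)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        bonds.append([cos_t * v[i]
                      + sin_t * (math.cos(phi) * p[i] + math.sin(phi) * q[i])
                      for i in range(3)])
    return bonds


def persistence_length_analytic(deflection):
    """a = -1 / ln(cos(deflection)) bonds, for 0 < deflection < pi/2."""
    if not 0.0 < deflection < math.pi / 2:
        raise ValueError("deflection must lie strictly between 0 and pi/2")
    return -1.0 / math.log(math.cos(deflection))


def mean_direction_cosine(n_bonds, deflection, trials, seed=0):
    """Average cosine between the first bond and bond k, k = 0..n_bonds-1."""
    rng = random.Random(seed)
    sums = [0.0] * n_bonds
    for _ in range(trials):
        for k, b in enumerate(freely_rotating_chain(n_bonds, deflection, rng)):
            sums[k] += b[2]  # first bond is +z, so b . b0 equals b[2]
    return [s / trials for s in sums]


def persistence_length_mc(deflection, n_bonds=15, trials=5000, seed=0):
    """Separation (in bonds) at which the sampled mean direction cosine
    first drops below 1/e, linearly interpolated between neighbours."""
    c = mean_direction_cosine(n_bonds, deflection, trials, seed)
    target = 1.0 / math.e
    for k in range(1, n_bonds):
        if c[k] < target:
            return (k - 1) + (c[k - 1] - target) / (c[k - 1] - c[k])
    raise ValueError("chain too short for an e-fold decay; increase n_bonds")


# Hypothetical choice: the deflection angle that gives a = 4.5 bonds.
theta = math.acos(math.exp(-1.0 / 4.5))
print(persistence_length_analytic(theta))  # approximately 4.5
print(persistence_length_mc(theta))  # near 4.5, subject to sampling noise
```

The threshold search mirrors the e-fold definition directly; fitting an exponential to the whole cosine curve would be an equally reasonable alternative.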

