Frozen, but No Accident – Why the 20 Standard Amino Acids were Selected

Andrew J. Doig

Department of Chemistry and Manchester Institute of Biotechnology, 131 Princess Street, University of Manchester, Manchester M7 1DN, UK

Email: [email protected]

Running Title: Why the 20 Standard Amino Acids were Selected

Keywords: protein structure; evolution; RNA world; protein folding; protein stability


Abstract

The 20 standard amino acids encoded by the Genetic Code were adopted during the RNA World, around 4 billion years ago. This amino acid set could be regarded as a frozen accident, implying that other possible structures could equally well have been chosen for use in proteins. Amino acids were not primarily selected for their ability to support catalysis, since the RNA World already had highly effective cofactors to perform reactions such as oxidation, reduction and transfer of small molecules. Rather, they were selected to enable the formation of soluble structures with close-packed cores, allowing the presence of ordered binding pockets. Factors to take into account when assessing why a particular amino acid might be used include its component atoms, functional groups, biosynthetic cost, use in a protein core or on the surface, solubility and stability. Applying these criteria to the 20 standard amino acids, and considering some other simple alternatives that are not used, we find that there are excellent reasons for the selection of every amino acid. Rather than being a frozen accident, the set of amino acids selected appears to be near ideal.


Introduction

Why the particular 20 amino acids were selected to be encoded by the Genetic Code remains a puzzle. Is this standard set the optimal choice, or would many other amino acids be just as useful? While we can expect some arbitrary choices, where a selected amino acid is no better than another that is not used, how much luck was involved in this choice, leaving us with a frozen or historical accident [1] that we are now unable to change? Here, I argue that there are excellent reasons for using (or not using) each possible amino acid and that the set used is near optimal.

The RNA World

Protein synthesis is likely to have arisen following the RNA World [2], approximately 4 billion years ago, a time between the origin of life and the Last Universal Common Ancestor (LUCA) of all life on Earth. Life then used RNA, cofactors and metals to perform catalysis and DNA as the genetic material. The planetary atmosphere was mildly reducing, with no free oxygen, so metabolism was entirely anaerobic.

Cofactors, such as NAD, flavins, S-adenosyl methionine, pyridoxal phosphate and many metals, are superb catalysts for a wide range of chemical reactions, notably redox reactions, transfer of small groups and electron transfer. Their essentiality for metabolism, and the presence of RNA fragments in their structures, suggest that they were present in the RNA World [3-7]. Since redox chemistry and small group transfer were so well covered by cofactors, there was no need for proteins to perform these functions. Amino acids were instead selected to promote folding into close-packed structures, forming binding pockets lined with non-polar and charged groups and with oriented hydrogen bond donors and acceptors.


In terms of the three simple properties of charge, size and hydrophobicity, the selected amino acids show a high diversity compared to random alternative sets [8, 9]. While a wide range of these properties is clearly desirable, additional factors must also play a role.

In 1981, Weber & Miller considered the same question of why these 20 amino acids were chosen, making many insightful points, though largely considering the ease of synthesis of each amino acid, particularly under prebiotic conditions, and their susceptibility to unwanted chemical reactions [10]. If protein synthesis arose from the RNA World, however, life was already biochemically sophisticated and the environment was substantially modified from the conditions prevailing during abiogenesis. Arguments based on prebiotic conditions are thus not especially helpful in rationalizing amino acid selection. Cleaves discussed the origin of the biologically coded amino acids in 2010, in a review concentrating on prebiotic syntheses, stability, chirality, biosynthetic accessibility and cost [11]. Here, I re-evaluate the amino acids, mostly from the perspective of how they affect protein structure and stability. I will assume that proteins are composed only of α-amino acids, though alternative biopolymers are a fascinating field in themselves.

Criteria for Selecting Amino Acids

Choice of Atoms. Are there alternatives to C, H, N, O and S? Amino acids need to be made of atoms that are abundant on Earth and have useful properties. Atoms such as selenium and antimony might have interesting chemistry, but their rarity precludes their use. Metal ions are too soluble in water to be retained in a side chain. Halogens polarise bonds, due to their high electronegativity, and can be prone to nucleophilic attack that releases them as halide ions. Silicon and phosphorus are likely to be the only plausible additional atoms that could be used. Phosphorus is not especially abundant, however, and was already heavily used in the RNA World. Protein phosphorylation is widespread, though it takes place at selected sites only, perhaps to help avoid unnecessary use of this essential atom. Phosphoserine is also easily hydrolysed. Silicon is abundant, but has a strong tendency to form four bonds to oxygen rather than to any other atom. Organosilanes appear to have few functional groups that would have a clear advantage over those formed by carbon.

Functional Groups. The choice of functional groups is rather limited in small molecules when using only C, H, O, N or S. Amides, amines, hydroxyls, carboxyls and carbon-nitrogen bonds, present in the standard set, are stable groups that can form hydrogen bonds and electrostatic interactions. Esters, anhydrides, nitriles and many other groups are too readily hydrolysed in water. Ketones and aldehydes are too easily reduced or oxidised, and are susceptible to nucleophilic attack. Carbon-carbon double and triple bonds are also more reactive than single bonds.

Biosynthetic Cost. Protein synthesis takes a major share of the energy resources of a cell [12]. Table 1 shows the cost of biosynthesis of each amino acid, measured in terms of the number of glucose and ATP molecules required. These data are often non-intuitive. For example, Leu costs only 1 ATP, but its isomer Ile costs 11. Why, therefore, would life ever use Ile instead of Leu, if they have the same properties? Larger is not necessarily more expensive: Asn and Asp cost more in ATP than their larger alternatives Gln and Glu, and large Tyr costs only two ATP, compared to 15 for small Cys. The high cost of the sulphur-containing amino acids is notable.
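The point that size does not predict cost can be checked with the four ATP figures quoted in the paragraph above. The sketch below is illustrative only: it uses just those four values (Table 1 itself is not reproduced here), and the side-chain heavy-atom counts are tallied from the chemical structures as a crude, assumed proxy for side-chain size.

```python
# Minimal sketch: does side-chain size predict ATP cost?
# ATP costs are only the four values quoted in the text; Table 1 is not reproduced.
ATP_COST = {"Leu": 1, "Ile": 11, "Tyr": 2, "Cys": 15}

# Side-chain heavy atoms beyond C-alpha (C, O, S), counted from the structures.
# Used here as a rough, assumed proxy for side-chain size.
SIDE_CHAIN_HEAVY_ATOMS = {"Leu": 4, "Ile": 4, "Tyr": 8, "Cys": 2}


def atp_per_heavy_atom(aa: str) -> float:
    """ATP spent per side-chain heavy atom for one amino acid."""
    return ATP_COST[aa] / SIDE_CHAIN_HEAVY_ATOMS[aa]


# Order the residues by size and, separately, by cost.
by_size = sorted(ATP_COST, key=lambda aa: SIDE_CHAIN_HEAVY_ATOMS[aa])
by_cost = sorted(ATP_COST, key=lambda aa: ATP_COST[aa])

if __name__ == "__main__":
    for aa in by_size:
        print(f"{aa}: {SIDE_CHAIN_HEAVY_ATOMS[aa]} heavy atoms, "
              f"{ATP_COST[aa]} ATP, {atp_per_heavy_atom(aa):.2f} ATP/atom")
    print("size order:", by_size)
    print("cost order:", by_cost)
```

The size order (Cys, Leu, Ile, Tyr) and the cost order (Leu, Tyr, Ile, Cys) disagree, and the isomers Leu and Ile have identical atom counts yet differ eleven-fold in ATP.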

Burial and Surface. Proteins have close-packed cores with the same density as organic solids and side chains fixed into a single conformation [13]. A solid core is essential to stabilise proteins and to form a rigid structure with well-defined binding sites. Non-polar side chains have therefore been selected to stabilise close-packed hydrophobic cores. Conversely, proteins are dissolved in water, so other side chains are used on a protein surface to keep them soluble in an aqueous environment.

Solubility. Amino acids need to be soluble in the highly concentrated aqueous environment inside the cell. The least soluble amino acid in water at pH 7 is Tyr (https://www.anaspec.com/html/amino_acids_properties.html), so any amino acid less soluble than this may not be acceptable.

Stability. Even with stable functional groups, some amino acids are prone to unwanted reactions, such as cyclisation or acyl transfer, that can lead to decomposition or racemisation.

Which Amino Acids Came First?

It is plausible that the first proteins used a subset of the 20 and a simplified Genetic Code, with the first amino acids acquired from the environment. Several studies agree on a consensus set of these amino acids, comprising Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr and Val [14-17]. It is likely that folded proteins with some desirable function can be produced from a subset of the 20 [15, 18, 19], though this has not yet been demonstrated for this set of 10. For example, Longo et al. showed that β-trefoil proteins can be produced using just these 10 plus Asn, Gln and Arg in high salt [20]. The consensus set of 10 is notable for containing amino acids with a high propensity for α-helices (e.g. Ala and Leu) and β-sheets (e.g. β-branched Val, Ile and Thr), though no positively charged side chains. Additional amino acids required the evolution of metabolic pathways, increasing the set to 20. As additional amino acids are added to the code, the advantage of adding a further amino acid decreases compared to the risk of introducing too many deleterious mutations simultaneously [21]. While these 10 might have been readily available for the first proteins, they were not the only amino acids present. The question still remains as to why these 10 were selected from a much larger pool of amino acids.
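The set relationships in the paragraph above can be made explicit with a few set operations. This is only a bookkeeping sketch: the three-letter codes and the consensus list come from the text, while treating Arg, His and Lys as the positively charged side chains is an assumption made here for illustration.

```python
# Bookkeeping sketch of the amino acid sets discussed above.
STANDARD_20 = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}
# Consensus early set [14-17] as listed in the text.
EARLY_10 = {"Ala", "Asp", "Glu", "Gly", "Ile", "Leu", "Pro", "Ser", "Thr", "Val"}
# Set used by Longo et al. [20]: the early 10 plus Asn, Gln and Arg.
LONGO_SET = EARLY_10 | {"Asn", "Gln", "Arg"}
# Assumption for illustration: side chains counted as (potentially) positive.
POSITIVE = {"Arg", "His", "Lys"}

# Residues that would have needed new metabolic pathways.
late_additions = STANDARD_20 - EARLY_10

if __name__ == "__main__":
    print("added later:", sorted(late_additions))
    print("positive in early set:", sorted(EARLY_10 & POSITIVE))
    print("positive in Longo set:", sorted(LONGO_SET & POSITIVE))
```

The output lists the ten later additions and shows that Arg is the only positively charged residue in the Longo et al. set, while the consensus 10 contain none.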

Energetics of Protein Folding


Folded proteins are stabilised by hydrogen bonding, removal of non-polar groups from water (the hydrophobic effect), van der Waals forces, salt bridges and disulphide bonds. Folding is opposed by loss of conformational entropy, where rotation around bonds is restricted, and by the introduction of strain. These forces are well balanced, so that the overall free energy changes for all the steps in protein folding are close to zero.

Protein folding can be broken down into three stages (Table 2). Whether a protein actually folds by this mechanism (though it may well do [22, 23]) does not matter here, as I am simply showing alternative environments for a side chain. Firstly, an unfolded protein can form isolated secondary structure. ΔG for forming a poly(Ala) α-helix in water is close to zero [24], so the penalty for restricting the backbone to a helical conformation is nearly equal and opposite to the benefit of forming amide-amide hydrogen bonds. Secondly, non-polar surfaces on secondary structural elements meet, excluding water and forming a fluid, non-polar, liquid-like core. Formation of these dry molten globules is driven by the hydrophobic effect. Hydrogen bonds may also strengthen, as the dielectric constant is lower in a non-polar environment. Finally, the molten globule freezes, forming a folded protein. Side chains that were able to rotate in the molten globule are now locked into a single rotamer, costing conformational entropy [25], offset by improved van der Waals bonding [22]. A rough cost for freezing a side chain is given by the number of rotatable bonds (Table 1; Fig. 1). Strain is also introduced, as many side chains are forced to adopt conformations that differ from their most stable state [26-28]. In addition, all stages are at risk of aggregation, giving useless or even toxic aggregates.
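The rough rule that freezing cost scales with the number of rotatable bonds can be turned into a back-of-the-envelope estimate. The sketch below assumes that each rotatable side-chain bond has three equally populated staggered rotamers, so fixing it costs at most R ln 3 in entropy; real rotamers are not equally populated, so this is an upper bound rather than a measured value. Bond counts follow the text where it gives them (Phe 2, Met 3, Lys and Arg 4) and standard χ-angle counts otherwise.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
T = 298.15       # temperature, K

# Rotatable side-chain bonds (chi angles) that must be fixed on folding.
# Terminal methyl rotations are not counted.
ROTATABLE_BONDS = {
    "Gly": 0, "Ala": 0, "Val": 1, "Leu": 2, "Ile": 2,
    "Phe": 2, "Met": 3, "Lys": 4, "Arg": 4,
}


def max_freezing_cost_kj(aa: str, rotamers_per_bond: int = 3,
                         temperature: float = T) -> float:
    """Upper bound on T*dS (kJ/mol) for locking a side chain into one rotamer.

    Assumes every rotatable bond has `rotamers_per_bond` equally
    populated states before freezing; unequal populations lower the cost.
    """
    n = ROTATABLE_BONDS[aa]
    return n * R * temperature * math.log(rotamers_per_bond) / 1000.0


if __name__ == "__main__":
    for aa in ROTATABLE_BONDS:
        print(f"{aa}: {ROTATABLE_BONDS[aa]} bonds, "
              f"<= {max_freezing_cost_kj(aa):.1f} kJ/mol")
```

Under these assumptions each bond costs at most about 2.7 kJ/mol at 25 °C, so burying Lys or Arg would cost up to four times as much as burying Val.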

Isolated secondary structure, molten globules and aggregates are potential traps that proteins must avoid. Protein folding may stall at the isolated secondary structure stage if the non-polar surface area to be buried is too small, and at the molten globule stage if the cost of freezing the side chains is too high. Non-polar side chains should therefore have a high surface area to maximise bonding, while not being too flexible, which would make the loss of conformational entropy in the freezing step too high. A protein surface must have polar groups to keep the protein soluble. If β-sheets are overly favoured, non-functional and possibly toxic amyloid may form.

Proteins are also stabilised by hydrogen bonds using side chains, either to other side chains or to the backbone. While a hydrogen bond might be intrinsically strong, formation of the bond requires fixing the side chain, costing conformational entropy, and strain if the conformation is non-optimal.

The 20 Standard Amino Acids

Considering the environment 4 billion years ago, and general reasons for choosing side chains, we can now consider the 20 individually:

Gly. Gly is tiny, cheap and has the most flexible backbone. The complete lack of a side chain means that it can adopt conformations unreachable by other amino acids and can enter narrow spaces.

Ala. Ala is small and cheap. Its CH3 side chain has little surface to form van der Waals bonds, but it has a profound effect on the backbone, restricting it to either the α or β conformations. It is therefore more energetically favourable to put Ala into secondary structure than Gly. The formation of an isolated α-helix in water restricts rotations in the side chains of all amino acids except Ala and Gly [29]. Hence, Ala has the highest preference to be in isolated α-helices [30].

Val, Leu, Ile, Phe. These hydrocarbon side chains are clearly present to drive protein folding by forming hydrophobic cores, yet why exactly these were selected deserves an explanation. Firstly, why are there so many hydrocarbon side chains, rather than just, say, Leu? Multiple hydrocarbon side chains may be used because they are required to fill a protein core with no clashes and no holes. A variety of pieces is needed to fit all the gaps within a protein core. Each side chain can adopt a number of rotamers with similar energies [31], thus giving a range of possible shapes. Leu and Ile are therefore both needed to increase the range of possible hydrocarbon 3D shapes, despite Ile being more costly to synthesise (Table 1).

Val, Leu, Ile and Phe are striking in that they all have branched side chains rather than straight chains. Using a branch gives one fewer dihedral angle to be fixed compared to a straight chain with the same number of carbons [32]. Side chains therefore enter a protein core not just because they are hydrophobic, but also because they do not lose too much conformational entropy when they fold. Similarly, rings are rigid, so Phe has a large hydrocarbon surface and only two bonds to be fixed.

Hydrocarbon side chains larger than Phe, Ile or Leu are probably not used because they would be less soluble as amino acids. A cyclohexane ring is also less soluble than a benzene ring [33], so Phe is used rather than cyclohexylalanine.

Ile and Thr differ from the other 18 in that they have centres of chirality in their side chains. Alloisoleucine has the opposite chirality at its Cβ. The selection of isoleucine over alloisoleucine seems to be chance.

Met. The explanation above for why branched side chains are preferred over straight chains fails for Met. Met has three bonds to be fixed if it is used in a protein core (Table 1), giving less stabilisation than one might expect for its size. The first use of Met in proteins, however, is likely to have been not for protein stability but for initiation of protein synthesis, using the AUG codon and N-formylmethionine. Occasional use of AUG would then allow Met to be found at other sites in a protein, such as in forming sulphur-aromatic interactions [34]. It may also be useful in forming a hydrophobic core with its unique shape. Met is prone to oxidation at its sulphur, potentially leading to loss of protein function, but this would not have occurred before the Great Oxidation Event around 2.3 billion years ago. Met is also the most expensive amino acid to biosynthesise (Table 1). Life is now stuck with this non-ideal amino acid.

Lys, Arg. Lys and Arg are used to give a protein solubility and for catalysis. They are rarely buried, not only because they have positively charged groups, but also because these groups are on the ends of long, flexible, straight chains with four rotatable bonds. Their functional groups are also valuable when a positive charge is needed. Arg, in particular, is a superb hydrogen bond donor, with five NH bonds polarised by its delocalised positive charge.

Non-polar residues, such as Leu and Ile, are also often found on the protein surface [35]. One might expect this to be a problem for the protein, as they will reduce protein solubility. If there are Lys or Arg side chains nearby, however, non-polar groups can form hydrophobic interactions with the CH2 groups in the Lys and Arg side chains [36]. Non-polar side chains would thus be less tolerated on the surface if the charged side chains had fewer CH2 groups.

Ser, Thr, Asn, Gln, Tyr. These amino acids frequently hydrogen bond to other side chains, amide groups in the backbone, substrates or ligands. The intrinsic benefit of hydrogen bond formation is offset by the conformational entropic cost of fixing the side chain in position and by strain. Functional groups selected for hydrogen bonding are thus on the ends of short chains, with just one or two CH2 groups, so that the cost of restricting them is low and their biosynthetic cost is minimised.

The aromatic ring in Tyr lowers the pKa of its OH, making it easier to form an O⁻ group and act as an acid. Tyr can also form functional radicals, such as in ribonucleotide reductase, photosystem II and prostaglandin H synthase [37], and can transport electrons in pili [38].


Allothreonine, with the opposite chirality at its Cβ, decomposes twice as fast [39], so threonine may have been selected for this reason.

Asp, Glu. The carboxyl group is stable, has a negative charge for strong electrostatic bonds, binds metals, is an excellent hydrogen bond acceptor, increases protein solubility and can transfer protons. Its two-fold symmetry means that the entropic cost of fixing the carboxyl group is low.

Since Asp and Glu (and Asn and Gln) have the same functional groups, why do we have two pairs of these amino acids? At some sites Asp and Glu (or Asn and Gln) can readily substitute for each other. However, this is not always the case. The first residue preceding the α-helix is the N-cap; Asp and Asn are strongly favoured here since they can accept hydrogen bonds from free backbone NH groups [40, 41]. Glu and Gln cannot do this, as their extra CH2 group pushes their functional groups out beyond the helix terminus. Similarly, at the second (N2) position of an α-helix, Gln and Glu are frequently found since they can loop around and hydrogen bond to the backbone NH groups [42]. Asn and Asp are too short to form these rings. Glu to Asp or Gln to Asn can thus sometimes be very non-conservative substitutions, so all four are used.

Cys. A key function of proteins is to bind metals. Soft metals, such as copper, zinc and cadmium, bind more tightly to sulphur than to oxygen. Cys may therefore have been selected for its metal binding properties, despite its high biosynthetic cost (Table 1). In particular, Cys is commonly used to bind iron-sulphur clusters. These cofactors are found in a wide variety of metalloproteins, play crucial roles in metabolism and are very ancient [43]. The SH side chain is also an effective nucleophile; it has a lower pKa than OH, readily forming S⁻.

Cys is the only side chain able to perform redox reactions, by forming disulphide bonds that can stabilise folded proteins. Cys was selected in an anaerobic environment, however, where disulphide bond formation would have been rare or non-existent. It is therefore not plausible that Cys was selected for its ability to form disulphides, and its subsequent use for this purpose, starting very approximately 1.5 billion years later, following the Great Oxidation Event, is a lucky accident.

His. While the range of chemical reactions available to the 20 side chains is dismally poor, they are good at proton transfer. Rates of proton transfer are maximised when the general acid or base has a pKa close to the pH of its environment. His, with a pKa of around 6.5 that is easily tuned by varying its environment, is an excellent side chain for general acid and base catalysis, and is thus abundantly found in active sites requiring proton transfer. It is also often found binding metals.
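The link between pKa and proton-transfer ability can be illustrated with the Henderson-Hasselbalch equation. A group that must both donate and accept a proton is available in each form with probabilities f and 1 − f, where f is the protonated fraction; the product f(1 − f) peaks at 0.25 when pH = pKa. This product is a simplification chosen for illustration, not a rate law from the text, and apart from the His value of 6.5 given above, the pKa values are approximate, rounded figures for free side chains that shift in a folded protein.

```python
# Illustrative sketch: which ionisable side chain is best poised for
# proton transfer at a given pH?
# Approximate free side-chain pKa values (rounded; sources differ by a few
# tenths). His uses the ~6.5 quoted in the text.
PKA = {
    "Asp": 3.9, "Glu": 4.1, "His": 6.5, "Cys": 8.3,
    "Tyr": 10.1, "Lys": 10.5, "Arg": 12.5,
}


def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of the group carrying the proton."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def shuttle_weight(pka: float, ph: float) -> float:
    """f * (1 - f), largest (0.25) when pH == pKa.

    A simplified proxy for a group that must act as both acid and base.
    """
    f = fraction_protonated(pka, ph)
    return f * (1.0 - f)


if __name__ == "__main__":
    ph = 7.0
    ranked = sorted(PKA.items(), key=lambda kv: -shuttle_weight(kv[1], ph))
    for aa, pka in ranked:
        print(f"{aa}: pKa {pka:4.1f}, protonated {fraction_protonated(pka, ph):.3f}, "
              f"weight {shuttle_weight(pka, ph):.3f}")
```

At pH 7, His comes out on top, roughly a quarter protonated, whereas Asp sits almost entirely deprotonated and Lys almost entirely protonated.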

Pro. The unique structure of Pro, with no backbone NH group and its ring restricting its backbone, means that it is incompatible with many sites in a protein. Nevertheless, it can be highly stabilising if its ring locks it into a desired conformation, such as within turns. Indeed, it has the lowest conformational entropic cost of folding of any amino acid. Pro is the simplest stable ring structure linking the Cα and N in an amino acid.

Trp. Trp, with its double aromatic ring, is the least abundant amino acid and the most expensive to synthesise. It has some distinct properties: with an absorption maximum at around 280 nm, Trp is well suited to be a UV-B chromophore [44]. The UVR8 protein in plants, for example, uses an excitonically coupled Trp pyramid for this purpose [45], detecting potentially damaging levels of UV radiation. The Trp indole chromophore may therefore have been selected to help protect organisms from the intense UV radiation that was present on early Earth in the absence of an ozone layer.


Radical formation is often seen as problematic, as it can lead to oxidative damage, and Trp is the amino acid most susceptible to radical formation [46]. This ability to form radicals can, however, be used by proteins for electron transfer, with chains of Trp used to transfer electrons across a protein [47, 48].

Amino Acids Not Selected

We can now consider why some amino acids were not selected (Fig. 2), even though they are unlikely to be difficult or expensive to synthesise.

D-amino acids. Mirror-image D-amino acids can adopt unusual backbone conformations to drive the formation of specific folded structures, such as turns and helix terminators [49]. Nevertheless, D-amino acids have steric clashes with the backbone when placed in secondary structure formed by L-amino acids. Proteins must therefore be either all L or all D so that helices and sheets can form, and this choice appears to be arbitrary.

Disubstituted at Cα. All 20 amino acids have a hydrogen at the Cα, but it is possible to have a carbon here instead. The simplest of these disubstituted amino acids is α-aminoisobutyric acid, with two methyl groups attached to the Cα. This additional side chain restricts the conformation of the backbone to a helix, with the rare 3₁₀-helix favoured over the more common α-helix [50]. Disubstituted amino acids are therefore probably not used because of this lack of flexibility.

Trisubstituted at Cβ. If the H attached to the Cβ in Val is replaced by another CH3, we would have a C(CH3)3 side chain, giving tert-leucine. This would be very hydrophobic for just one rotatable bond. However, side chain selection is a balance between its stabilising effect on a protein and the ability to adopt a range of conformations, and a quaternary carbon would be highly restrictive.

Straight Chain Hydrocarbons. It is remarkable that the simplest hydrocarbon side chain after Ala, CH2CH3 (α-aminobutyric acid or homoalanine), is not used. Similarly, norvaline and norleucine, which have CH2CH2CH3 and CH2CH2CH2CH3 side chains, respectively, are not used, despite each having a similar hydrophobicity to its branched isomers. However, as explained above, they cost more conformational entropy to fold than their branched isomers. If Ala is extended to α-aminobutyric acid, it gains an additional rotatable bond to be fixed; extending it further to Val gives extra non-polar surface without adding any more bonds to be fixed. Hence, Val is favoured over α-aminobutyric acid. Straight chain hydrocarbons favour isolated α-helix formation [51], so would promote the formation of solvent-exposed helices or dry molten globules, not folded proteins. Similarly, lipids use long straight chains to ensure that membranes remain fluid. Norvaline and norleucine are also prone to misincorporation into proteins, in place of leucine or methionine, respectively [52].
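The branched-versus-straight argument can be summarised by counting, for each side chain, how many heavy atoms are gained per rotatable bond that must be frozen. Heavy-atom count is used here only as a crude stand-in for non-polar surface area, an assumption made for illustration; the abbreviations Abu, Nva and Nle (α-aminobutyric acid, norvaline, norleucine) are conventional but are not used elsewhere in this paper.

```python
import math

# (side-chain heavy atoms, rotatable side-chain bonds to fix on folding)
# Heavy atoms are a rough proxy for non-polar surface (assumption).
SIDE_CHAINS = {
    "Ala": (1, 0),
    "Abu": (2, 1),  # alpha-aminobutyric acid, CH2CH3
    "Val": (3, 1),
    "Nva": (3, 2),  # norvaline, CH2CH2CH3
    "Leu": (4, 2),
    "Ile": (4, 2),
    "Nle": (4, 3),  # norleucine, CH2CH2CH2CH3
    "Phe": (7, 2),
}

# (branched, straight-chain) isomer pairs compared in the text.
ISOMER_PAIRS = [("Val", "Nva"), ("Leu", "Nle"), ("Ile", "Nle")]


def atoms_per_frozen_bond(name: str) -> float:
    """Heavy atoms gained per rotatable bond; infinite if nothing to freeze."""
    atoms, bonds = SIDE_CHAINS[name]
    return math.inf if bonds == 0 else atoms / bonds


if __name__ == "__main__":
    for branched, straight in ISOMER_PAIRS:
        print(f"{branched}: {atoms_per_frozen_bond(branched):.2f}  vs  "
              f"{straight}: {atoms_per_frozen_bond(straight):.2f}")
```

Each branched isomer has the same number of atoms as its straight-chain counterpart but one fewer bond to freeze, and the step from Abu to Val adds a carbon without adding a bond. Phe, at 3.5 atoms per bond, shows why a rigid ring is attractive.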

Short Chain Amines. The amine group NH3+ is found on the end of a long chain of four CH2 groups. Shorter chains, notably CH2NH3+, would have the same functional group and cost less to biosynthesise. The Lys side chain has been selected to be on the protein surface, driven there not just by the presence of the highly polar group, but also by the amine being on the end of a long straight chain, which maximises the number of bonds to be fixed if the side chain were buried. The CH2NH3+ side chain is not used since it is conflicted: its lack of carbons means that it would have few rotatable bonds to fix, thus encouraging its burial, while the amine group would prefer the surface. Amino acids with amine side chains are also prone to acyl transfer and lactamisation (Fig. 3), with the likelihood of these unwanted reactions decreasing with chain length [53, 54]. Arginine analogues with one or two CH2 groups can also cyclise (Fig. 4) [10].

Long Chain Carboxyls. Carboxyls are found with just one or two CH2 groups, and not on the end of long straight chains. Asp and Glu are used to hydrogen bond or transfer protons. If they were on the ends of long chains, it would cost more conformational entropy for them to fold into a single conformation for hydrogen bonding. Long chain carboxyls could play a similar role to Lys in conferring solubility, but having two polar atoms in the carboxyl group, compared to one in an amine, makes the carboxyl more suitable for hydrogen bonding than the amine.

Functional Groups Bonded to Cα. Functional groups are never bonded directly to the Cα; thus we see a CH2 between a hydroxyl, carboxyl, thiol, amide, indole, phenyl or imidazole group and the Cα. Large groups attached to the Cα, such as indole, phenyl and imidazole, greatly restrict the allowed conformations of the backbone. Aromatic or carboxyl groups attached to the Cα would allow easy racemisation at the Cα, as proton loss there is facilitated by the adjacent aromatic ring or carbonyl group. Amino acids with OH or SH attached to the Cα are unstable. In general, a CH2R side chain is most often preferred, since it allows α and β structure, is not too expensive, is stable and does not lose too much conformational entropy when folded.

Conclusion

While we will never know for sure what happened 4 billion years ago, many of the ideas discussed here are testable, principally by introducing unnatural amino acids into proteins or by using only a subset of amino acids. For example, it may be impossible, or at least much more difficult, to make folded, functional proteins that use only straight chain hydrocarbon side chains, lack short chain hydrogen bonding groups, or use short chain positive side chains instead of Lys or Arg.

There are excellent reasons for the choice of every one of the twenty amino acids and for the non-use of other apparently simple alternatives. If all else fails, one can resort to chance, or a “frozen accident”, as an explanation. The only true frozen accident may be the use of isoleucine rather than alloisoleucine. I therefore believe that when we study the xenobiochemistry of life on other planets, or if we reran the events at the end of the RNA World, alien organisms will be using much the same group of amino acids that we find here on Earth.

Acknowledgements

I thank my colleagues Simon Lovell, Alex Jones, Douglas Kell, Pedro Mendes, Andy Munro and Simon Webb for many helpful discussions.

References

1. Crick, F. H. C. (1968) The Origin of the Genetic Code, J Mol Biol. 38, 367-379.

2. Pressman, A., Blanco, C. & Chen, I. A. (2015) The RNA World as a Model System to Study the Origin of Life, Curr Biol. 25, R953-R963.

3. White, H. B. (1976) Coenzymes as Fossils of an Earlier Metabolic State, J Mol Evol. 7, 101-104.

4. Jadhav, V. R. & Yarus, M. (2002) Coenzymes as coribozymes, Biochimie. 84, 877-888.

5. Saran, D., Frank, J. & Burke, D. H. (2003) The tyranny of adenosine recognition among RNA aptamers to coenzyme A, BMC Evol Biol. 3.

6. Denessiouk, K. A., Rantanen, V. V. & Johnson, M. S. (2001) Adenine recognition: A motif present in ATP-, CoA-, NAD-, NADP-, and FAD-dependent proteins, Proteins. 44, 282-291.

7. Chen, X., Li, N. & Ellington, A. D. (2007) Ribozyme catalysis of metabolism in the RNA world, Chem Biodivers. 4, 633-655.

8. Philip, G. K. & Freeland, S. J. (2011) Did Evolution Select a Nonrandom "Alphabet" of Amino Acids?, Astrobiology. 11, 235-240.

9. Ilardo, M., Meringer, M., Freeland, S., Rasulev, B. & Cleaves, H. J. (2015) Extraordinarily Adaptive Properties of the Genetically Encoded Amino Acids, Sci Rep. 5.

10. Weber, A. L. & Miller, S. L. (1981) Reasons for the Occurrence of the Twenty Coded Protein Amino Acids, J Mol Evol. 17, 273-284.

11. Cleaves, H. J. (2010) The origin of the biologically coded amino acids, J Theor Biol. 263, 490-498.

12. Akashi, H. & Gojobori, T. (2002) Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis, Proc Natl Acad Sci USA. 99, 3695-3700.

13. Richards, F. M. (1977) Areas, Volumes, Packing, and Protein-Structure, Annu Rev Biophys Bioeng. 6, 151-176.

14. Brooks, D. J., Fresco, J. R., Lesk, A. M. & Singh, M. (2002) Evolution of amino acid frequencies in proteins over deep time: Inferred order of introduction of amino acids into the genetic code, Mol Biol Evol. 19, 1645-1655.

15. Doi, N., Kakukawa, K., Oishi, Y. & Yanagawa, H. (2005) High solubility of random-sequence proteins consisting of five kinds of primitive amino acids, Protein Eng Des Sel. 18, 279-284.
16. McDonald, G. D. & Storrie-Lombardi, M. C. (2010) Biochemical Constraints in a Protobiotic Earth Devoid of Basic Amino Acids: The "BAA(-) World", Astrobiology. 10, 989-1000.
17. Longo, L. M. & Blaber, M. (2012) Protein design at the interface of the pre-biotic and biotic worlds, Arch Biochem Biophys. 526, 16-21.
18. Davidson, A. R., Lumb, K. J. & Sauer, R. T. (1995) Cooperatively Folded Proteins in Random Sequence Libraries, Nature Structural Biology. 2, 856-864.
19. Riddle, D. S., Santiago, J. V., Bray-Hall, S. T., Doshi, N., Grantcharova, V. P., Yi, Q. & Baker, D. (1997) Functional rapidly folding proteins from simplified amino acid sequences, Nature Structural Biology. 4, 805-809.
20. Longo, L. M., Lee, J. & Blaber, M. (2013) Simplified protein design biased for prebiotic amino acids yields a foldable, halophilic protein, Proceedings of the National Academy of Sciences of the United States of America. 110, 2135-2139.
21. Higgs, P. G. (2009) A four-column theory for the origin of the genetic code: tracing the evolutionary pathways that gave rise to an optimized code, Biol Direct. 4.
22. Baldwin, R. L., Frieden, C. & Rose, G. D. (2010) Dry molten globule intermediates and the mechanism of protein unfolding, Proteins. 78, 2725-2737.
23. Baldwin, R. L. & Rose, G. D. (2013) Molten globules, entropy-driven conformational change and protein folding, Curr Opin Struct Biol. 23, 4-10.
24. Scholtz, J. M., Marqusee, S., Baldwin, R. L., York, E. J., Stewart, J. M., Santoro, M. & Bolen, D. W. (1991) Calorimetric determination of the enthalpy change for the α-helix to coil transition of an alanine peptide in water, Proc Nat Acad Sci USA. 88, 2854-8.
25. Doig, A. J. & Sternberg, M. J. (1995) Side-chain conformational entropy in protein folding, Prot Sci. 4, 2247-51.
26. Maiorov, V. & Abagyan, R. (1998) Energy strain in three-dimensional protein structures, Fold Des. 3, 259-269.
27. Penel, S. & Doig, A. J. (2001) Rotamer strain energy in protein helices - Quantification of a major force opposing protein folding, J Mol Biol. 305, 961-968.
28. Ventura, S., Vega, M. C., Lacroix, E., Angrand, I., Spagnolo, L. & Serrano, L. (2002) Conformational strain in the hydrophobic core and its implications for protein folding and design, Nature Structural Biology. 9, 485-493.
29. Creamer, T. P. & Rose, G. D. (1992) Side-chain entropy opposes α-helix formation but rationalizes experimentally determined helix-forming propensities, Proc Nat Acad Sci USA. 89, 5937-41.
30. Rohl, C. A., Chakrabartty, A. & Baldwin, R. L. (1996) Helix propagation and N-cap propensities of the amino acids measured in alanine-based peptides in 40 volume percent trifluoroethanol, Prot Sci. 5, 2623-37.
31. Lovell, S. C., Word, J. M., Richardson, J. S. & Richardson, D. C. (2000) The penultimate rotamer library, Proteins: Struc Funct Genet. 40, 389-408.
32. Doig, A. J. (1996) Thermodynamics of amino acid side-chain internal rotations, Biophys Chem. 61, 131-141.
33. McAuliffe, C. (1966) Solubility in Water of Paraffin, Cycloparaffin, Olefin, Acetylene, Cycloolefin, and Aromatic Hydrocarbons, J Phys Chem. 70.

34. Valley, C. C., Cembran, A., Perlmutter, J. D., Lewis, A. K., Labello, N. P., Gao, J. & Sachs, J. N. (2012) The Methionine-aromatic Motif Plays a Unique Role in Stabilizing Protein Structure, Journal of Biological Chemistry. 287, 34979-34991.
35. Chothia, C. (1976) The nature of the accessible and buried surfaces in proteins, J Mol Biol. 105, 1-14.
36. Andrew, C. D., Penel, S., Jones, G. R. & Doig, A. J. (2001) Stabilising non-polar/polar side chain interactions in the α-helix, Proteins: Struct Funct Genet. 45, 449-455.
37. Pedersen, J. Z. & Finazzi-Agrò, A. (1993) Protein-Radical Enzymes, FEBS Letters. 325, 53-58.
38. Vargas, M., Malvankar, N. S., Tremblay, P. L., Leang, C., Smith, J. A., Patel, P., Snoeyenbos-West, O., Nevin, K. P. & Lovley, D. R. (2013) Aromatic Amino Acids Required for Pili Conductivity and Long-Range Extracellular Electron Transport in Geobacter sulfurreducens, mBio. 4.
39. Schroeder, R. A. & Bada, J. L. (1977) Kinetics and Mechanism of the Epimerization and Decomposition of Threonine in Fossil Foraminifera, Geochim Cosmochim Acta. 41, 1087-1095.
40. Serrano, L. & Fersht, A. R. (1989) Capping and α-helix stability, Nature. 342, 296-9.
41. Doig, A. J. & Baldwin, R. L. (1995) N- and C-capping preferences for all 20 amino acids in α-helical peptides, Prot Sci. 4, 1325-36.
42. Penel, S., Hughes, E. & Doig, A. J. (1999) Side-chain structures in the first turn of the α-helix, J Mol Biol. 287, 127-143.
43. Wachtershauser, G. (1992) Groundworks for an Evolutionary Biochemistry - The Iron Sulfur World, Prog Biophys Mol Biol. 58, 85-201.
44. Fritsche, E., Schafer, C., Calles, C., Bernsmann, T., Bernshausen, T., Wurm, M., Hubenthal, U., Cline, J. E., Hajimiragha, H., Schroeder, P., Klotz, L. O., Rannug, A., Furst, P., Hanenberg, H., Abel, J. & Krutmann, J. (2007) Lightening up the UV response by identification of the arylhydrocarbon receptor as a cytoplasmatic target for ultraviolet B radiation, Proceedings of the National Academy of Sciences of the United States of America. 104, 8851-8856.
45. Christie, J. M., Arvai, A. S., Baxter, K. J., Heilmann, M., Pratt, A. J., O'Hara, A., Kelly, S. M., Hothorn, M., Smith, B. O., Hitomi, K., Jenkins, G. I. & Getzoff, E. D. (2012) Plant UVR8 Photoreceptor Senses UV-B by Tryptophan-Mediated Disruption of Cross-Dimer Salt Bridges, Science. 335, 1492-1496.
46. Hawkins, C. L. & Davies, M. J. (2001) Generation and propagation of radical reactions on proteins, Biochim Biophys Acta-Bioenerg. 1504, 196-219.
47. Park, H. W., Kim, S. T., Sancar, A. & Deisenhofer, J. (1995) Crystal Structure of DNA Photolyase from Escherichia coli, Science. 268, 1866-1872.
48. Chaves, I., Pokorny, R., Byrdin, M., Hoang, N., Ritz, T., Brettel, K., Essen, L. O., van der Horst, G. T. J., Batschauer, A. & Ahmad, M. (2011) The Cryptochromes: Blue Light Photoreceptors in Plants and Animals in Annual Review of Plant Biology, Vol 62 (Merchant, S. S., Briggs, W. R. & Ort, D., eds) pp. 335-364, Annual Reviews, Palo Alto.
49. Mahalakshmi, R. & Balaram, P. (2006) The Use of D-Amino Acids in Peptide Design in D-Amino Acids: A New Frontier in Amino Acid and Protein Research - Practical Methods and Protocols (Konno, R., Brückner, H., D'Aniello, A., Fisher, G. H., Noriko, F. & Homma, H., eds) pp. 415-430, Nova Science Publishers.
50. Karle, I. L. & Balaram, P. (1990) Structural characteristics of α-helical peptide molecules containing Aib residues, Biochemistry. 29, 6747-6756.
51. Padmanabhan, S. & Baldwin, R. L. (1991) Straight-chain non-polar amino acids are good helix-formers in water, J Mol Biol. 219, 135-7.

52. Alvarez-Carreno, C., Becerra, A. & Lazcano, A. (2013) Norvaline and Norleucine May Have Been More Abundant Protein Components during Early Stages of Cell Evolution, Orig Life Evol Biosph. 43, 363-375.
53. Poduska, K., K. G., Silaev, A. B. & Rudinger, J. (1965) Amino acids and peptides. LII. Intramolecular aminolysis of amide bonds in derivatives of α,γ-diaminobutyric acid, α,β-diaminopropionic acid, and ornithine, Collect Czech Chem Comm. 30, 2410-2433.
54. Hettinger, T. P. & Craig, L. C. (1970) Edeine. 4. Structures of Antibiotic Peptides Edeine-A1 and Edeine-B1, Biochemistry. 9, 1224.
55. Phillips, R., Kondev, J., Theriot, J. & Garcia, H. G. (2013) Physical Biology of the Cell, 2nd edn, Garland Science.


Table 1. Amino Acid Properties

Amino Acid | Molecules per Cell [55] | Biosynthetic Cost in Glucose Equivalents | Anaerobic Biosynthetic Cost in ATP Equivalents [55] | Number of Rotatable Bonds
Ala | 2.9 × 10⁸ | 0.5 | 1 | 0
Arg | 1.7 × 10⁸ | 0.5 | 13 | 4
Asn | 1.4 × 10⁸ | 0.5 | 5 | 2
Asp | 1.4 × 10⁸ | 0.5 | 2 | 2
Cys | 5.2 × 10⁷ | 0.5 | 15 | 1
Glu | 1.5 × 10⁸ | 0.5 | -1 | 3
Gln | 1.5 × 10⁸ | 0.5 | 0 | 3
Gly | 3.5 × 10⁸ | 0.5 | 2 | 0
His | 5.4 × 10⁷ | 1 | 7 | 2
Ile | 1.7 × 10⁸ | 1 | 11 | 2
Leu | 2.6 × 10⁸ | 1.5 | 1 | 2
Lys | 2.0 × 10⁸ | 1 | 9 | 4
Met | 8.8 × 10⁷ | 1 | 23 | 3
Phe | 1.1 × 10⁸ | 2 | 2 | 2
Pro | 1.3 × 10⁸ | 0.5 | 4 | 0
Ser | 1.2 × 10⁸ | 0.5 | 2 | 1
Thr | 1.5 × 10⁸ | 0.5 | 8 | 1
Trp | 3.3 × 10⁷ | 2.5 | 7 | 2
Tyr | 7.9 × 10⁷ | 2 | 2 | 2
Val | 2.4 × 10⁸ | 1 | 2 | 1
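As an illustration of how the columns of Table 1 can be compared, the Python sketch below computes Spearman's rank correlation between per-cell abundance and each biosynthetic cost column, together with the abundance-weighted mean cost per residue. It is a hypothetical example, not an analysis from this article: the names `TABLE_1`, `average_ranks`, `pearson`, `spearman` and `weighted_mean_cost` are invented here, the per-cell counts are assumed to be of order 10⁷ to 10⁸ molecules, and tied values are given the mean of their ranks.

```python
"""Hypothetical sketch: abundance versus biosynthetic cost, using Table 1 values."""
from statistics import mean

# Table 1 values, assuming per-cell counts of order 10^7 to 10^8 molecules.
# name -> (molecules per cell, glucose equivalents, anaerobic ATP equivalents, rotatable bonds)
TABLE_1 = {
    "Ala": (2.9e8, 0.5, 1, 0),
    "Arg": (1.7e8, 0.5, 13, 4),
    "Asn": (1.4e8, 0.5, 5, 2),
    "Asp": (1.4e8, 0.5, 2, 2),
    "Cys": (5.2e7, 0.5, 15, 1),
    "Glu": (1.5e8, 0.5, -1, 3),
    "Gln": (1.5e8, 0.5, 0, 3),
    "Gly": (3.5e8, 0.5, 2, 0),
    "His": (5.4e7, 1, 7, 2),
    "Ile": (1.7e8, 1, 11, 2),
    "Leu": (2.6e8, 1.5, 1, 2),
    "Lys": (2.0e8, 1, 9, 4),
    "Met": (8.8e7, 1, 23, 3),
    "Phe": (1.1e8, 2, 2, 2),
    "Pro": (1.3e8, 0.5, 4, 0),
    "Ser": (1.2e8, 0.5, 2, 1),
    "Thr": (1.5e8, 0.5, 8, 1),
    "Trp": (3.3e7, 2.5, 7, 2),
    "Tyr": (7.9e7, 2, 2, 2),
    "Val": (2.4e8, 1, 2, 1),
}


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # start and end are 0-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def weighted_mean_cost(column):
    """Abundance-weighted mean of one cost column (1 = glucose, 2 = anaerobic ATP)."""
    total = sum(row[0] for row in TABLE_1.values())
    return sum(row[0] * row[column] for row in TABLE_1.values()) / total


if __name__ == "__main__":
    abundance = [row[0] for row in TABLE_1.values()]
    for column, label in ((1, "glucose equivalents"), (2, "anaerobic ATP equivalents")):
        cost = [row[column] for row in TABLE_1.values()]
        print(f"{label}: Spearman rho = {spearman(abundance, cost):+.2f}, "
              f"abundance-weighted mean cost = {weighted_mean_cost(column):.2f}")
```

Running the script prints one line per cost column; the sign and size of rho indicate whether cheaper amino acids tend to be more abundant in this data set. Ranks are used rather than raw values because the per-cell counts span roughly an order of magnitude and the ATP costs include a negative entry.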

Table 2. Possible Structures of a Protein

Structure | Secondary Structure | Side-Chain Properties | Main Thermodynamic Changes upon Formation
Unfolded | None | Solvent exposed | -
Isolated Secondary Structure | α-Helices and/or β-sheets | Solvent exposed | Loss of conformational entropy offset by hydrogen bonding and water released from polar groups
Dry Molten Globule | α-Helices and/or β-sheets | Buried non-polar side chains. Fluid, liquid-like core, with side chains able to rotate | Loss of conformational entropy offset by gain in entropy of water released from non-polar groups
Folded | α-Helices and/or β-sheets | Buried non-polar side chains. Rigid, solid-like core, with side chains locked into a single rotamer | Gain of van der Waals and hydrogen bonds, offset by loss of conformational entropy and introduction of strain
Aggregated | Low α-helix; high β-sheet | Buried non-polar side chains. Rigid, solid-like core, with side chains locked into a single rotamer | Loss of conformational entropy offset by gain in entropy of water released from protein surface

Figure Legends

Fig. 1. The 20 Standard Amino Acids showing Rotatable Bonds

Fig. 2. Examples of Amino Acids not Selected. A) D-Alanine. B) Norvaline. C) t-Leucine. D) α-Aminoisobutyric acid. E) Diaminopropionic acid. F) Homoglutamic acid.

Fig. 3. Acyl Transfer and Lactamisation in Amino Acids with Amines

Fig. 4. Cyclisation of Arg Analogues
