Protein Structure
Principles & Architecture
Marjolein Thunnissen, Dep. of Biochemistry & Structural Biology, Lund University, September 2011
Bioinformatics
Homology, pattern and 3D structure searches need databases and database management tools, search techniques, and dedicated tools for sequence and structure comparison, detection of similarity, homology modelling etc. All this is the object of bioinformatics.
Why use bioinformatics?
• An explosive growth in the amount of biological information.
• A more global perspective in experimental design.
• Data-mining.
• The potential for uncovering phylogenetic relationships and evolutionary patterns.
Role of (bio)informatics in drug discovery
Pipeline: Genome --> Gene --> Protein --> HTS --> Hit --> Lead --> Candidate drug
Disciplines along the pipeline: Genomics, Bioinformatics, Structural bioinformatics, Chemoinformatics, Structure-Based Drug Design, ADMET Modelling
Structural bioinformatics techniques are valuable in areas from target identification to lead discovery
Why study 3D structures of biological macromolecules?
1. FUNCTION IS STRUCTURE!
2. Sequence homology is not enough to identify functional relationships.
3. Protein folding is still not fully understood. Predictions do not work satisfactorily.
4. Drug design. Pharmaceutical industry
Proteins are polymers
Proteins are formed by a chain of repeating molecules. One such molecule is called an amino-acid. There are 20 types of amino-acids, but they all have a common backbone or main-chain:
The protein chain is formed by linking the amino-acids together. The linkage is called the peptide bond:
The chain of amino-acids linked to each other by peptide bonds is also called: polypeptide chain.
In DNA code: 20 different amino-acids.
In proteins 20 different amino-acids are found. The names of the different amino-acids can be given as a 3-letter code or a 1-letter code: Alanine --> Ala --> A
The amino-acids can be divided into sub-groups dependent on the nature of their side-chain.
Group 1 Hydrophobic: Ala (A), Val (V), Leu (L), Ile (I), Phe (F), Pro (P) and Met (M)
Group 2 Charged: Asp (D), Glu (E), Arg (R), Lys (K)
Group 3 Polar: Ser (S), Thr (T), Cys (C), Asn (N), Gln (Q), His (H), Tyr (Y) and Trp (W)
Group 4 No special properties: Gly (G)
Alternatively there is also a 5th group:
Group 5 Aromatic rings: Phe (F), Tyr (Y), Trp (W) and His (H)
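The grouping above can be written down as a small lookup table. The following is a minimal sketch that uses only the four main groups as listed on the slide; the function names and the example peptide are hypothetical and only serve as illustration.

```python
# Side-chain groups as listed on the slide (one-letter codes).
GROUPS = {
    "hydrophobic": set("AVLIFPM"),
    "charged": set("DERK"),
    "polar": set("STCNQHYW"),
    "glycine": set("G"),
}


def classify(residue):
    """Return the group name for a one-letter amino-acid code."""
    for name, members in GROUPS.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown amino acid: {residue!r}")


def composition(sequence):
    """Count how many residues of a sequence fall into each group."""
    counts = {name: 0 for name in GROUPS}
    for residue in sequence:
        counts[classify(residue)] += 1
    return counts


# Hypothetical peptide, made up for illustration only.
print(composition("MKTAYGDE"))
```

Other textbooks draw the group boundaries differently (for example, Trp and Tyr are sometimes counted as hydrophobic), so the table is a convention and not a fixed property.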
The 20 amino-acids: hydrophobic residues
Alanine (Ala, A) Valine (Val, V) Proline (Pro, P)
Isoleucine (Ile, I)
Leucine (Leu, L)
Phenylalanine (Phe, F) Methionine (Met, M)
The 20 amino-acids: charged residues
Arginine (Arg, R) Lysine (Lys, K)
Aspartic acid (Asp, D) Glutamic acid (Glu, E)
The 20 amino-acids: polar residues
Serine (Ser, S) Threonine (Thr, T) Tyrosine (Tyr, Y) Histidine (His, H)
Cysteine (Cys, C) Asparagine (Asn, N) Glutamine (Gln, Q) Tryptophan (Trp, W)
The 20 amino-acids: Glycine
Glycine, (Gly, G)
Structure in four dimensions
Due to the fact that there are 20 different amino-acids, proteins are described in different dimensions.
Primary Structure: Amino-acid sequence.
Secondary Structure: Local regular structure: α-helices and β-sheets.
Tertiary Structure: Packing of secondary structure into one or several compact globular domains.
Quaternary Structure: The overall arrangement if the protein consists of several polypeptide chains.
Special properties of amino-acids
Since there are 4 different groups attached to the central Cα atom of an amino-acid (except for Glycine), it is an asymmetric atom.
Amino acids are therefore chiral molecules. There are two forms: L-form and D-form:
The natural configuration of amino acids in proteins is always the L-form.
Cysteines can form cross-links
Cysteine residues from different parts of the sequence can link together in a disulfide bridge to form cross-links. The environment needs to be oxidizing. Within the cell the environment is reducing, so disulfide bridges are not often seen there. They are quite normal for extracellular proteins.
These cross-links give extra stability to a protein structure. They can also link two polypeptide chains together.
Properties of the peptide bond
The peptide bond unit containing the atoms C(n), O(n) and N(n+1) is a rigid plane, with bond lengths and angles nearly the same for each of these units in a polypeptide chain.
The freedom in conformation of this chain comes from rotation around the bonds N(n+1)-Cα(n+1) and Cα(n+1)-C(n+1).
Phi-Psi angles
The rotation around N-Cα is called phi (φ) and the rotation around Cα-C is called psi (ψ). Each amino acid is associated with these two conformational angles. If phi and psi are known for each residue, the conformation of the whole backbone chain is known, since the peptide planes are so rigid.
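Each of these angles is a dihedral angle defined by four consecutive backbone atoms: φ by C(n-1), N(n), Cα(n), C(n), and ψ by N(n), Cα(n), C(n), N(n+1). The sketch below computes a dihedral angle from four points using plain Python. The function names and the test coordinates are made up for illustration; real coordinates would come from a structure file.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees for four points, in the range -180..180.

    0 degrees is the cis (eclipsed) arrangement, 180 degrees is trans.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """phi of residue n: C(n-1), N(n), CA(n), C(n)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi of residue n: N(n), CA(n), C(n), N(n+1)."""
    return dihedral(n, ca, c, n_next)


# Artificial points (not a real peptide) that span a right angle.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```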
Goto King Basic
Ramachandran plot
Most combinations of φ/ψ are not allowed since they would cause steric collisions between side chains and main chain (kinemage).
The φ/ψ pairs can be plotted against each other. Such a plot is called a Ramachandran plot. The residues will cluster in certain areas. These areas are named after the secondary structure the residues have.
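As a rough illustration of how residues are assigned to areas of the plot, the sketch below sorts a φ/ψ pair into a named region. The rectangular boundaries are coarse assumptions chosen for this example; they are not taken from the slides, and real validation tools use contoured regions derived from observed structures.

```python
def ramachandran_region(phi, psi):
    """Assign a phi/psi pair (degrees) to a coarse Ramachandran region.

    The boundaries are approximate and illustrative only.
    """
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha"         # right-handed helical region
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"          # extended strand region
    if 20 <= phi <= 90 and 0 <= psi <= 90:
        return "left-handed"   # left-handed helical region
    return "other"


# Typical helix and strand angles quoted elsewhere in these slides.
print(ramachandran_region(-60, -50))
print(ramachandran_region(-135, 135))
```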
Ramachandran plot of barnase
Glycine residues
Glycine residues lack a side chain. Therefore they can adopt a much wider range of conformations than other residues. Glycines are often used to allow unusual main chain conformations (like a tight turn).
Low and high energy conformation (allowed and disallowed):
Certain side chain conformations are energetically more favourable than others: these are more frequently seen in proteins. These conformers are called rotamers.
Rotamers for Phe
Go to King Basic no 4
Forces holding proteins together
Electrostatic interactions
  Ionic interactions: e.g. salt bridges
  Dipolar interactions: dipole-dipole, induced dipole
Hydrogen bonds: shared H-atom
Hydrophobic packing: mainly entropic
Salt bridges and polar interactions
Ionic interactions occur either between fully charged groups (ionic), or between partially charged groups (dipole-dipole).
The force of attraction between δ+ and δ- decreases rapidly with distance. In the absence of water these interactions can be very strong.
In protein molecules ionic bonds occur between the charged residues. Combinations: Arg-Asp, Arg-Glu, Lys-Asp and Lys-Glu. Dipole-dipole interactions can occur e.g. between Asn-Thr or Ser-Gln (many more combinations are possible).
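The effect of water on these interactions can be estimated with Coulomb's law in a medium of relative dielectric constant ε. The sketch below computes the interaction energy (not the force) between two point charges. The dielectric values 4 for a protein interior and 80 for bulk water are commonly quoted approximations and are assumptions here, as is the 3 Å distance.

```python
# Conversion factor so that charges in units of e and distances in Å
# give an energy in kcal/mol.
COULOMB_CONSTANT = 332.06


def coulomb_energy(q1, q2, distance, dielectric):
    """Coulomb energy in kcal/mol between two point charges.

    q1, q2: charges in units of the elementary charge
    distance: separation in Å
    dielectric: relative dielectric constant of the surrounding medium
    """
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance)


# A +1/-1 pair at 3 Å (assumed distance), buried versus in water.
buried = coulomb_energy(+1, -1, 3.0, dielectric=4)    # assumed protein interior
solvated = coulomb_energy(+1, -1, 3.0, dielectric=80)  # assumed bulk water
print(round(buried, 1), round(solvated, 1))
```

With these assumptions the buried pair is about twenty times more strongly attracted than the solvated one, which is the sense in which the interactions are very strong in the absence of water.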
Hydrogen bonds
Examples from macromolecules: proteins and DNA.
Hydrogen bonds occur when one hydrogen is shared between two atoms (mostly O and N atoms). One atom “donates” the hydrogen while the other “accepts” it.
The hydrogen bond is the strongest when it is in a straight line.
Hydrophobic interactions
Hydrophobic interactions are one of the major forces that make proteins fold. They are based on the fact that apolar and polar molecules do not like to mix, e.g. water and oil do not mix. The hydrophobic effect is really an entropy phenomenon: by clustering the hydrophobic molecules together, fewer water molecules are ordered.
In proteins this means that the protein folds such that a core arises in which hydrophobic residues are buried.
Secondary structure
The main driving force behind protein folding is to pack hydrophobic residues into the interior of the protein thereby creating a hydrophobic core.
Problem: the backbone of an amino acid contains some highly polar atoms: O and N.
These atoms have to be ”neutralized”
Neutralization is achieved by formation of hydrogen bonds: the O is an acceptor, while the N is a donor.
Secondary structure is an elegant way for the protein to bury the polar peptide bond in the protein interior.
There are two types of secondary structure: alpha helices and beta sheets
Alpha (α) helices
α-helices are found in proteins when consecutive residues all have φ/ψ angles of approximately -60° and -50°. This gives rise to helix formation. The α-helix is right-handed, has 3.6 residues per turn and rises 1.5 Å per residue. In proteins α-helices range from 4 to 5 residues up to over 40 residues in length, with an average length of about 10 residues (15 Å).
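The numbers above fix the geometry of an ideal α-helix. A minimal sketch of the arithmetic follows; the function names are chosen for illustration.

```python
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5  # Å along the helix axis


def helix_pitch():
    """Rise along the axis for one full turn, in Å."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE


def helix_length(n_residues):
    """Length along the axis of an ideal helix of n residues, in Å."""
    return n_residues * RISE_PER_RESIDUE


def helix_turns(n_residues):
    """Number of turns made by an ideal helix of n residues."""
    return n_residues / RESIDUES_PER_TURN


def rotation_per_residue():
    """Rotation around the axis from one residue to the next, in degrees."""
    return 360.0 / RESIDUES_PER_TURN


print(helix_length(10))  # the average helix of 10 residues
```

One turn therefore rises about 5.4 Å, and each residue is rotated by about 100° relative to the previous one.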
Hydrogen bonding pattern in an α-helix
In the α-helix a very regular pattern of hydrogen bonds is formed. Hydrogen bonds are formed between the C=O of residue n and the NH of residue n+4. Therefore all these polar atoms are joined through hydrogen bonds. Exceptions are the NH atoms at the beginning of the helix and the O atoms at the end of the helix. The ends of the helix are polar and are found most often at the surface of the protein.
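The n to n+4 pattern can be enumerated directly. The sketch below lists the hydrogen bonds inside an idealized helix and the groups left without a partner at either end. It only considers bonds within the helix itself; the residue numbering and function names are assumptions for illustration.

```python
def helix_hydrogen_bonds(first, last):
    """(acceptor, donor) pairs: C=O of residue n bonded to N-H of residue n+4.

    first and last are the residue numbers of the helix ends (inclusive).
    """
    return [(n, n + 4) for n in range(first, last - 3)]


def unpaired_groups(first, last):
    """Residues whose N-H or C=O has no partner inside the helix."""
    bonds = helix_hydrogen_bonds(first, last)
    acceptors = {a for a, _ in bonds}
    donors = {d for _, d in bonds}
    residues = range(first, last + 1)
    free_nh = [r for r in residues if r not in donors]
    free_co = [r for r in residues if r not in acceptors]
    return free_nh, free_co


# Idealized helix spanning residues 1 to 10.
print(helix_hydrogen_bonds(1, 10))
print(unpaired_groups(1, 10))
```

For any helix of at least four residues, the first four N-H groups and the last four C=O groups are left unpaired, which is why the helix ends are polar.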
Amphipathic α-helix
A very common position for an α-helix is on the surface of the protein. This means that one side of the helix points towards the solution and the other side towards the hydrophobic core. Because there are 3.6 residues per turn, patterns arise where residues change from hydrophobic to hydrophilic every 3 to 4 residues. The helix is polar on one side and hydrophobic on the other: amphipathic. A way to look at sequences in a helix is to use a helical wheel representation. This is a projection of the residues on a plane perpendicular to the axis of the helix.
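A helical wheel places each residue 100° further around the axis than the previous one (360° / 3.6). The sketch below computes these positions and a crude amphipathicity score: the length of the averaged vector obtained by counting hydrophobic residues as +1 and all others as -1. The score and both peptides are assumptions made up for this example, and the hydrophobic set is the one listed earlier in these slides.

```python
import math

DEGREES_PER_RESIDUE = 360 / 3.6   # 100 degrees, from 3.6 residues per turn
HYDROPHOBIC = set("AVLIFPM")      # hydrophobic group as listed in these slides


def wheel_positions(sequence):
    """(residue, angle in degrees) on the wheel, first residue at 0."""
    return [(res, (i * DEGREES_PER_RESIDUE) % 360)
            for i, res in enumerate(sequence)]


def amphipathic_score(sequence):
    """Crude amphipathicity measure, roughly 0 (none) up to 1.

    Hydrophobic residues contribute +1 and all others -1 along their
    direction on the wheel; the summed vector is divided by the length.
    """
    x = y = 0.0
    for res, angle in wheel_positions(sequence):
        sign = 1.0 if res.upper() in HYDROPHOBIC else -1.0
        x += sign * math.cos(math.radians(angle))
        y += sign * math.sin(math.radians(angle))
    return math.hypot(x, y) / len(sequence)


# Hypothetical peptides: hydrophobic residues every 3 to 4 positions,
# versus a strictly alternating pattern.
print(amphipathic_score("LKKLLKKLLKLLKKLLKK"))
print(amphipathic_score("LK" * 9))
```

The first peptide puts all its leucines on one face of the wheel and scores high, while the alternating peptide spreads them around the axis and scores close to zero.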
Goto King: Motif
Connecting helices: Helix-turn-helix motif
DNA-binding motif
Ca-binding motif
The Ca-binding motif: residue conservation within the Ca2+-binding motif
Four-helix bundle
Examples of 4-helix bundles
The globin domain
Beta (β)-strands
The second major type of secondary structure is the β-sheet. In contrast with α-helices these are not built from one continuous stretch of sequence but from a combination of several regions of the polypeptide chain. These regions are 5 to 10 residues long and the residues are in a fully extended conformation with φ/ψ angles of around -135°/135°. Such a region is called a β-strand.
The β-strands are aligned adjacent to each other so that hydrogen bonds can be formed between the C=O groups of one strand and the NH groups of another strand. The sheets that are formed are pleated: Cα atoms are alternately a little above and a little below the plane of the β-sheet.
There are two alignments possible: parallel and anti-parallel.
β-sheets parallel & antiparallel.
A sheet is called parallel if the amino-acids in the strands run all in the same biochemical direction (amino-terminal to carboxyl-terminal). If the strands have an alternating pattern N --> C and then C--> N etc then it is an antiparallel sheet.
Hydrogen bonding in β-sheets.
The hydrogen bonding pattern is quite different between parallel and antiparallel sheets. In the antiparallel sheet there are narrowly spaced hydrogen bonds alternating with more widely spaced ones. The parallel sheet has more evenly spaced hydrogen bonds.
Mixed β-sheets
Almost all β-sheets (whatever their type) have twisted strands. This twist always has the same handedness: a right-handed twist.
β-sheets can also have a mixed character, partially parallel and partially antiparallel: mixed β-sheets. These are less common than purely antiparallel β-sheets.
Goto King: Motifs
Loops and turns
Most proteins are built from several secondary structure elements which are linked to each other by loop regions. These loop regions differ in size and shape. The main chain C=O and NH groups in loops do not interact with each other through hydrogen bonds; instead they are exposed. This is one reason that loops are often found on the surface of proteins, where these atoms can make hydrogen bonds with water molecules. Charged and polar residues are often found in loops.
Some loops (especially in antiparallel β-sheets) are quite common: they are called hairpin loops.
How to represent protein structures?
In order to obtain most information from pictures about protein structure we need to simplify. We use schematic cartoons for doing that
Topology diagram
In order to have an overview of all the secondary structure elements and the order in which they appear in a protein, simple schematic drawings have been developed. These are called topology diagrams. In these, β-strands are represented by arrows and α-helices by cylinders.
β-sheet topology diagrams
Antiparallel β-sheet in aspartate transcarbamoylase
Antiparallel barrel in plastocyanin
Parallel β-sheet in flavodoxin
Tertiary structure: motifs
Some simple combinations of secondary structure elements occur in many different proteins. These can consist of e.g. two helices connected by a loop, or two β-strands and a helix. These combinations have been called supersecondary structure or motifs.
Some of these motifs have a particular function (e.g. DNA binding) but others seem to have no biological role and are used as building blocks.
Greek Key motif
This motif occurs in proteins with 4 adjacent antiparallel β-strands. Since the topology diagram resembles an ornamental pattern used in ancient Greece, it was called the Greek key.
This motif is structural and no specific function is associated with it.
The eight strands in γ-crystallin are arranged in two Greek key motifs
βαβ motif
For antiparallel β-sheets we can link the strands with small loops (quite often hairpins). For parallel β-sheets, however, we need longer loops or cross-over segments. These segments are frequently made by α-helices. The whole unit then looks like β-strand - loop - α-helix - loop - β-strand. This is called the βαβ motif. The loops in this motif can differ in length (from only several residues to nearly 100) and can contain more secondary structure elements. The motif can in principle have two hands (helix below or above the strands), but the latter is much more common.
Adding βαβ motifs together:
Two ways to join the units together, giving:
open twisted α−β structure
α−β barrels
Three main types of structure based on βαβ motifs
Closed barrel: Triosephosphate isomerase
Open twisted β-sheet: Alcohol dehydrogenase
Open barrel: Ribonuclease inhibitor
The active site in all α/β barrels is in a pocket formed by the loop regions that connect the carboxy ends of the β strands with the adjacent α helices.
A view from the top of the barrel of the active site of the enzyme RuBisCo (ribulose bisphosphate carboxylase)
Motifs are used as building blocks.
Motifs and secondary structure elements are used as a kind of Lego blocks to form 3-dimensional structures. If the resulting structure can fold independently it is called a domain.
Fatty acid binding protein: beta barrel + helix-loop-helix
Lac-repressor: many motifs e.g. helix-loop-helix and 4 helix bundle
Large polypeptide chains fold into several domains
Large polypeptide chains often fold into several domains. Often these domains are also units of function, e.g. a DNA binding domain, a catalytic domain, an interaction domain etc. Certain domain folds are used in many different proteins.
Classes of structures
In general all protein structures can be placed into three groups:
all α-helical proteins
all β-sheet proteins
α/β proteins
α-domain structures
Many different types of structures can be formed by α-helices alone. The first protein structures solved (myoglobin and hemoglobin) had only α-helices. Their fold is called the globin fold.
Hemoglobin
α-domain structures
The helices in an all-helical domain can be packed in an almost parallel manner. This gives rise to two different types of packing: 4-helix bundles or larger arrangements.
All β-structures
Up-and-down barrel: Retinol binding protein
Up-and-down sheet, propeller-like fold: Influenza neuraminidase
All-β structures are predominantly antiparallel (no helices to make crossovers) and consist of packed sheets.
Superoxide dismutase (SOD) comprises eight antiparallel β-strands.
All β-structures (2)
Jelly-roll barrel (2 x Greek key): Viral coat proteins
β-helix: Pectate lyase
α/β and α+β structures
These are the most common structures found. They consist of a central sheet (mixed or parallel) surrounded by α-helices (α/β) or of segregated α and β regions (α+β). There are many variations in these classes (e.g. see how βαβ-units pack). Often the secondary structure elements provide structural strength while loops are involved in the function of the protein.
α+β Lysozyme
α/β tyrosyl-tRNA transferase
Protein structure universe
Membrane proteins
Membrane proteins account for up to two thirds of known druggable targets. Receptors (especially G-protein-coupled receptors, GPCRs) and ion channels are particularly important targets. Structural information is still limited but growing.
Four different ways in which protein molecules may be bound to a membrane (general topology):
• Membrane anchor by one transmembrane helix
• α-helical integral membrane protein
• β-sheet integral membrane protein
• Membrane-anchored protein by amphiphilic helices
Hydropathy plots can be used for predicting transmembrane helices:
Plots for the polypeptide chains L and M of the reaction center
The photosynthetic reaction center of a purple bacterium, Nobel Prize in 1988
First high resolution structural information for a membrane protein
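A hydropathy plot of the kind mentioned above averages a per-residue hydrophobicity value over a sliding window along the sequence. The sketch below assumes the Kyte-Doolittle scale, a window of 19 residues and a threshold of 1.6, which are commonly used for transmembrane helices; the slides do not say which scale was used for the reaction center plots, and the test sequence is made up.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not named in the slides).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every window of the given length."""
    values = [KYTE_DOOLITTLE[res] for res in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start positions (0-based) of windows at or above the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, score in enumerate(profile) if score >= threshold]


# Hypothetical sequence: a stretch of 19 leucines between charged segments.
example = "DEKR" * 5 + "L" * 19 + "DEKR" * 5
print(candidate_tm_windows(example))
```

Consecutive start positions above the threshold mark one candidate transmembrane segment. A prediction like this is only a first indication and does not replace an experimental structure.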
GPCR: Rhodopsin
7 membrane spanning helices
First structural information by EM
High resolution structures by X-ray diffraction
Low resolution models
High resolution model
Ion-channel: The potassium channel.
Viewed perpendicular to the plane of the membrane
The way the selectivity filter is formed: Main-chain atoms line the walls of this narrow passage with carbonyl oxygen atoms pointing into the pore, forming binding sites for K+ ions.
16 β-strands form an antiparallel β barrel that traverses the membrane.
Example all β integral membrane protein: porin
Molecular structure of a porin. Central channel allows passage of molecules across the membrane
Example membrane-anchored by amphipathic helices: cyclo-oxygenase
Important enzyme in the prostaglandin pathway. Aspirin targets this enzyme.
If you would like to know more about protein structure:
Anders Liljas et al.: Textbook of Structural Biology, ISBN-10: 9812772081
Greg Petsko & Dagmar Ringe: Protein Structure and Function
Carl Brändén & John Tooze: Introduction to Protein Structure
Philip Bourne & Helge Weissig: Structural Bioinformatics, in particular chapters 22 & 23
http://kinemage.biochem.duke.edu/teaching/anatax/index.html
At LU: Department Of Biochemistry and Structural Biology.