bioinf1 protein structure - lunds universitet · to helix formation. the α-helix is right-handed...

Protein Structure

Principles & Architecture

Marjolein Thunnissen, Dep. of Biochemistry & Structural Biology, Lund University, September 2011

Bioinformatics

Homology, pattern and 3D structure searches need databases and database-management tools, search techniques, and dedicated tools for sequence and structure comparison, detection of similarity, homology modelling, etc. All of this is the subject of bioinformatics.

Why use bioinformatics?

• An explosive growth in the amount of biological information.
• A more global perspective in experimental design.
• Data mining.
• The potential for uncovering phylogenetic relationships and evolutionary patterns.

Role of (bio)informatics in drug discovery

Figure: Genome → Gene → Protein → HTS → Hit → Lead → Candidate drug, supported along the way by genomics, bioinformatics, structural bioinformatics, chemoinformatics, structure-based drug design and ADMET modelling.

Structural bioinformatics techniques are valuable in areas from target identification to lead discovery

Why study 3D structures of biological macromolecules?

1. FUNCTION IS STRUCTURE!

2. Sequence homology is not enough to identify functional relationships.

3. Protein folding is still not fully understood. Structure predictions do not work satisfactorily.

4. Drug design in the pharmaceutical industry.

Proteins are polymers

Proteins are formed by a chain of repeating molecules called amino acids. There are 20 types of amino acids, but they all have a common backbone, or main chain:

The protein chain is formed by linking the amino-acids together. The linkage is called the peptide bond:

The chain of amino acids linked to each other by peptide bonds is also called a polypeptide chain.

The genetic code in DNA specifies 20 different amino acids.

In proteins, 20 different amino acids are found. Their names can be given as a three-letter code or a one-letter code: Alanine → Ala → A
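As a small illustration of the two naming schemes, the Python sketch below converts between three-letter and one-letter codes for the 20 standard amino acids. The dictionary and function names are hypothetical choices made for this example, and non-standard residues are simply not handled (they raise a `KeyError`).

```python
# Standard three-letter -> one-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
# Reverse mapping: one-letter -> three-letter.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert a list of three-letter codes (any letter case) to a one-letter string."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


def to_three_letter(sequence):
    """Convert a one-letter sequence string to a list of three-letter codes."""
    return [ONE_TO_THREE[c] for c in sequence.upper()]


print(to_one_letter(["ALA", "Gly", "trp"]))  # AGW
print(to_three_letter("MKV"))                # ['Met', 'Lys', 'Val']
```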

The amino-acids can be divided into sub-groups dependent on the nature of their side-chain.

Group 1, hydrophobic: Ala (A), Val (V), Leu (L), Ile (I), Phe (F), Pro (P) and Met (M)
Group 2, charged: Asp (D), Glu (E), Arg (R), Lys (K)
Group 3, polar: Ser (S), Thr (T), Cys (C), Asn (N), Gln (Q), His (H), Tyr (Y) and Trp (W)
Group 4, no special properties: Gly (G)

Alternatively, a fifth group can be defined:
Group 5, aromatic rings: Phe (F), Tyr (Y), Trp (W) and His (H)
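The grouping above can be written down directly as data. This sketch, with hypothetical names, encodes groups 1 to 4 exactly as listed and counts how the residues of a sequence are distributed over them. Group 5 overlaps with the other groups, so it is kept as a separate set.

```python
from collections import Counter

# Groups 1-4 as listed above; together they cover all 20 amino acids exactly once.
GROUPS = {
    "hydrophobic": set("AVLIFPM"),
    "charged": set("DERK"),
    "polar": set("STCNQHYW"),
    "no special properties": set("G"),
}
# Group 5 (aromatic rings) overlaps with groups 1 and 3.
AROMATIC = set("FYWH")


def group_of(residue):
    """Return the name of the main group (1-4) of a one-letter residue code."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue code: {residue!r}")


def group_composition(sequence):
    """Count how many residues of a sequence fall into each main group."""
    return Counter(group_of(r) for r in sequence.upper())


seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
print(group_composition(seq))
print(sum(r in AROMATIC for r in seq))  # number of aromatic residues
```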

The 20 amino-acids: hydrophobic residues

Alanine (Ala, A) Valine (Val, V) Proline (Pro, P)

Isoleucine (Ile, I)

Leucine (Leu, L)

Phenylalanine (Phe, F) Methionine (Met, M)

The 20 amino-acids: charged residues

Arginine (Arg, R) Lysine (Lys, K)

Aspartic acid (Asp, D) Glutamic acid (Glu, E)

The 20 amino-acids: polar residues

Serine (Ser, S) Threonine (Thr, T) Tyrosine (Tyr, Y) Histidine (His, H)

Cysteine (Cys, C) Asparagine (Asn, N) Glutamine (Gln, Q) Tryptophan (Trp, W)

The 20 amino-acids: Glycine

Glycine, (Gly, G)

Structure in four dimensions

Protein structure is described at four different levels:

Primary structure: the amino-acid sequence.
Secondary structure: local regular structure, i.e. α-helices and β-sheets.
Tertiary structure: the packing of secondary structure elements into one or several compact globular domains.
Quaternary structure: the overall arrangement of the polypeptide chains, if the protein consists of several chains.

Special properties of amino-acids

Since four different groups are attached to the central Cα atom of an amino acid (except for glycine, which carries two hydrogens), the Cα is an asymmetric atom.

Amino acids are therefore chiral molecules. There are two forms, the L-form and the D-form:

The natural configuration of amino acids in proteins is always the L-form.

Cysteines can form cross-links

Cysteine residues from different parts of the sequence can link together through a disulfide bridge to form cross-links. Disulfide formation requires an oxidizing environment; since the environment within the cell is reducing, such cross-links are rarely seen in intracellular proteins, but they are quite normal in extracellular proteins.

These cross-links give extra stability to a protein structure. They can also link two polypeptide chains together.
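In a 3D structure, candidate disulfide bridges are usually recognised by a short distance between the sulfur (SG) atoms of two cysteines; a bonded S-S pair is roughly 2 Å apart. The sketch below assumes a 2.5 Å cutoff and uses hypothetical coordinates. It is an illustration, not a validated detection criterion.

```python
import math
from itertools import combinations

# An S-S bond is about 2.05 Å long; 2.5 Å is an assumed, illustrative cutoff.
SS_CUTOFF = 2.5


def find_disulfides(sg_atoms, cutoff=SS_CUTOFF):
    """Return pairs of cysteine residue numbers whose SG atoms lie within cutoff.

    sg_atoms maps a residue number to the (x, y, z) coordinates of its SG atom.
    """
    bonds = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(sg_atoms.items()), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            bonds.append((res_a, res_b))
    return bonds


# Hypothetical SG coordinates (Å) for three cysteines.
cys_sg = {
    26: (10.0, 4.0, 7.0),
    84: (11.5, 5.2, 7.6),   # about 2.0 Å from Cys26
    110: (20.0, 1.0, 3.0),
}
print(find_disulfides(cys_sg))  # [(26, 84)]
```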

Properties of the peptide bond

The peptide bond unit containing the atoms $\mathrm{C}_n$, $\mathrm{O}_n$ and $\mathrm{N}_{n+1}$ is a rigid plane, with bond lengths and angles nearly the same for each of these units in a polypeptide chain.

The conformational freedom of the chain comes from rotation around the bonds $\mathrm{N}_{n+1}$-$\mathrm{C}^{\alpha}_{n+1}$ and $\mathrm{C}^{\alpha}_{n+1}$-$\mathrm{C}_{n+1}$.

Phi-Psi angles

The rotation around the N-Cα bond is called phi (φ) and the rotation around the Cα-C bond is called psi (ψ). Each amino acid residue is thus associated with these two conformational angles. Because the peptide planes are so rigid, knowing φ and ψ for every residue means that the conformation of the whole backbone chain is known.

Go to KiNG: Basic
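Since φ and ψ are dihedral (torsion) angles, they can be computed from four consecutive backbone atom positions: φ from C(i-1), N(i), Cα(i), C(i) and ψ from N(i), Cα(i), C(i), N(i+1). A minimal sketch using the standard atan2 formula, assuming atoms are given as plain (x, y, z) tuples:

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (between -180 and 180) defined by four atom positions.

    phi: C(i-1), N(i), CA(i), C(i);  psi: N(i), CA(i), C(i), N(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: the last atom is rotated a quarter turn out of the first plane.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))  # about 90
```

The sign follows the usual convention that a clockwise rotation of the front bond onto the back bond, viewed along the central bond, is positive.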

Ramachandran plot

Most combinations of φ/ψ are not allowed, since they would cause steric collisions between the side chains and the main chain (see the kinemage).

The φ/ψ pairs can be plotted against each other; such a plot is called a Ramachandran plot. The residues cluster in certain areas, which are named after the secondary structure that the residues adopt.

Ramachandran plot of barnase
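To make the idea of clustering concrete, this sketch assigns a φ/ψ pair to the nearest of a few region centres. The α (-60°, -50°) and β (-135°, 135°) centres are the values quoted in this lecture; the left-handed helix centre and the 60° distance cutoff are illustrative assumptions. Real Ramachandran validation uses empirically derived contour maps rather than circles around centres.

```python
import math

# Region centres (phi, psi) in degrees. Alpha and beta are the values quoted in
# this lecture; the left-handed helix centre is an approximate assumption.
REGION_CENTRES = {
    "alpha (right-handed helix)": (-60.0, -50.0),
    "beta (extended strand)": (-135.0, 135.0),
    "alpha-L (left-handed helix)": (60.0, 45.0),
}


def angle_diff(a, b):
    """Smallest signed difference between two angles in degrees (handles wrap-around)."""
    return (a - b + 180.0) % 360.0 - 180.0


def classify(phi, psi, max_distance=60.0):
    """Assign (phi, psi) to the nearest region centre, or 'other' if none is close enough."""
    best, best_d = "other", max_distance
    for name, (c_phi, c_psi) in REGION_CENTRES.items():
        d = math.hypot(angle_diff(phi, c_phi), angle_diff(psi, c_psi))
        if d <= best_d:
            best, best_d = name, d
    return best


for phi, psi in [(-57, -47), (-120, 130), (80, 0), (0, 180)]:
    print(phi, psi, classify(phi, psi))
```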

Glycine residues

Glycine residues lack a side chain. Therefore they can adopt a much wider range of conformations than other residues. Glycines are often used where unusual main-chain conformations are needed (like a tight turn).

Low- and high-energy conformations (allowed and disallowed):

Certain side-chain conformations are energetically more favourable than others, so they are seen more frequently in proteins. These conformers are called rotamers.

Rotamers for Phe

Go to KiNG: Basic, no. 4
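Side-chain torsion angles cluster near staggered values. As a sketch, a χ1 angle can be assigned to the nearest of the three staggered positions (+60°, 180°, -60°). The p/t/m labels follow one published convention (the penultimate rotamer library); other labels such as g+/g- are also in use with differing sign conventions, so treat the labelling as an assumption.

```python
# Staggered chi1 positions (degrees). The p/t/m labels are one naming convention;
# g+/g-/t labels are also common but differ in sign convention between authors.
CHI1_ROTAMERS = {"p": 60.0, "t": 180.0, "m": -60.0}


def nearest_rotamer(chi1):
    """Return the label of the staggered chi1 rotamer closest to chi1 (degrees)."""
    def distance(centre):
        # Angular distance with wrap-around, e.g. -178 is 2 degrees from 180.
        return abs((chi1 - centre + 180.0) % 360.0 - 180.0)
    return min(CHI1_ROTAMERS, key=lambda label: distance(CHI1_ROTAMERS[label]))


for chi1 in (-65.0, 175.0, -178.0, 55.0):
    print(chi1, nearest_rotamer(chi1))
```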

Forces holding proteins together

• Electrostatic interactions: ionic interactions (e.g. salt bridges) and dipolar interactions (dipole-dipole, induced dipole)
• Hydrogen bonds: a shared H atom
• Hydrophobic packing: mainly entropic

Salt bridges and polar interactions

Electrostatic interactions occur either between fully charged groups (ionic interactions) or between partially charged groups (dipole-dipole interactions).

The force of attraction between δ+ and δ- decreases rapidly with distance. In the absence of water these interactions can be very strong.
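The strong effect of water can be illustrated with Coulomb's law for two point charges, E = 332 q1 q2 / (εr r) kcal/mol, with charges in units of e and distance in Å. The relative permittivity εr is about 80 for water; the value 4 used below for a protein interior is a common but rough assumption, since a protein has no single dielectric constant. Dipole-dipole interactions fall off even faster with distance than this 1/r dependence.

```python
# Coulomb's law in units convenient for proteins:
# E [kcal/mol] = 332.06 * q1 * q2 / (eps_r * r), q in elementary charges, r in Å.
COULOMB_CONSTANT = 332.06


def coulomb_energy(q1, q2, r, eps_r):
    """Electrostatic interaction energy (kcal/mol) between two point charges."""
    return COULOMB_CONSTANT * q1 * q2 / (eps_r * r)


# A +1/-1 pair (e.g. Lys/Glu) at 4 Å: in water (eps_r ~ 80) versus an assumed
# low-dielectric protein interior (eps_r ~ 4).
for eps_r in (80.0, 4.0):
    print(eps_r, round(coulomb_energy(+1, -1, 4.0, eps_r), 2))
```

The attraction is about twenty times stronger in the low-dielectric environment, which is the point made above about interactions in the absence of water.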

In protein molecules, ionic bonds occur between oppositely charged residues, in the combinations Arg-Asp, Arg-Glu, Lys-Asp and Lys-Glu. Dipole-dipole interactions can occur, e.g., between Asn and Thr or between Ser and Gln (many more combinations are possible).
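A simple way to list salt bridges in a structure is to look for the charged side-chain atoms of Arg/Lys and Asp/Glu lying close together. The sketch uses standard PDB atom names, a 4 Å cutoff (a commonly used criterion, assumed here) and hypothetical coordinates; the function name is illustrative.

```python
import math

# Side-chain atoms that carry the charge (standard PDB atom names).
ACIDIC = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}
BASIC = {"ARG": {"NE", "NH1", "NH2"}, "LYS": {"NZ"}}
CUTOFF = 4.0  # Å; a commonly used, but assumed, distance criterion


def salt_bridges(atoms, cutoff=CUTOFF):
    """Find basic/acidic residue pairs with any charged atoms within cutoff.

    atoms: iterable of (res_name, res_num, atom_name, (x, y, z)).
    """
    acids = [a for a in atoms if a[2] in ACIDIC.get(a[0], set())]
    bases = [a for a in atoms if a[2] in BASIC.get(a[0], set())]
    pairs = set()
    for b_name, b_num, _, b_xyz in bases:
        for a_name, a_num, _, a_xyz in acids:
            if math.dist(b_xyz, a_xyz) <= cutoff:
                pairs.add((f"{b_name}{b_num}", f"{a_name}{a_num}"))
    return sorted(pairs)


# Hypothetical coordinates (Å).
atoms = [
    ("LYS", 12, "NZ", (0.0, 0.0, 0.0)),
    ("GLU", 40, "OE1", (2.9, 0.5, 0.0)),
    ("GLU", 40, "OE2", (3.8, 1.5, 0.0)),
    ("ASP", 77, "OD1", (10.8, 8.0, 8.0)),
    ("ARG", 80, "NH1", (8.0, 8.0, 8.0)),
]
print(salt_bridges(atoms))  # [('ARG80', 'ASP77'), ('LYS12', 'GLU40')]
```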

Hydrogen bonds

Examples from macromolecules: proteins and DNA

Hydrogen bonds occur when a hydrogen atom is shared between two atoms (mostly O and N atoms). One atom “donates” the hydrogen while the other “accepts” it.

A hydrogen bond is strongest when the donor, the hydrogen and the acceptor lie in a straight line.
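Because hydrogen bonds depend on both distance and geometry, structure-analysis programs often use simple geometric criteria. This sketch assumes a donor-acceptor distance of at most 3.5 Å and a D-H...A angle of at least 120°; both cutoffs are illustrative assumptions, and published criteria vary.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_hbond(donor, hydrogen, acceptor, max_da=3.5, min_angle=120.0):
    """Geometric H-bond test with assumed cutoffs: D...A distance and D-H...A angle."""
    return (math.dist(donor, acceptor) <= max_da
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)


N, H = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)   # toy N-H donor, 1 Å bond
print(is_hbond(N, H, (2.9, 0.0, 0.0)))   # straight line: True
print(is_hbond(N, H, (1.0, 2.0, 0.0)))   # 90 degrees at H: False
```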

Hydrophobic interactions

The hydrophobic effect is one major reason why proteins fold. It is based on the fact that apolar and polar molecules do not like to mix, e.g. water and oil do not mix. The hydrophobic effect is really an entropic phenomenon: by clustering the hydrophobic molecules together, fewer water molecules become ordered around them.

In proteins this means that the protein folds such that a core arises in which the hydrophobic residues are buried.

Secondary structure

The main driving force behind protein folding is to pack hydrophobic residues into the interior of the protein thereby creating a hydrophobic core.

Problem: the backbone of an amino acid contains some highly polar atoms: O and N.

These atoms have to be “neutralized”.

Neutralization is achieved by forming hydrogen bonds: the O is an acceptor, while the N (through its H) is a donor.

Secondary structure is an elegant way for the protein to bury the polar peptide bond in the protein interior.

There are two types of secondary structure: alpha helices and beta sheets

Alpha (α) helices

α-helices are found in proteins when consecutive residues all have φ/ψ angles of approximately -60° and -50°, respectively. This gives rise to helix formation. The α-helix is right-handed, with 3.6 residues per turn and a rise of 1.5 Å per residue. In proteins, α-helices range from 4 or 5 residues up to over 40 residues in length, with an average length of about 10 residues (15 Å).
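The helix parameters above lead to simple arithmetic: the pitch (rise per turn) is 3.6 × 1.5 Å = 5.4 Å, each residue is rotated 360°/3.6 = 100° around the axis, and a 10-residue helix is 10 × 1.5 Å = 15 Å long. A small sketch of these calculations for an ideal helix (names are illustrative):

```python
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5  # Å along the helix axis


def helix_geometry(n_residues):
    """Length (Å) and number of turns of an ideal alpha-helix of n residues."""
    length = n_residues * RISE_PER_RESIDUE
    turns = n_residues / RESIDUES_PER_TURN
    return length, turns


pitch = RESIDUES_PER_TURN * RISE_PER_RESIDUE   # Å per full turn
print(round(pitch, 2))                         # 5.4
print(helix_geometry(10))                      # 15 Å long, about 2.8 turns
print(round(360.0 / RESIDUES_PER_TURN, 1))     # 100.0 degrees per residue
```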

Hydrogen bonding pattern in an α-helix

In the α-helix a very regular pattern of hydrogen bonds is formed: the C=O of residue n forms a hydrogen bond with the NH of residue n+4. Therefore all of these polar atoms are joined through hydrogen bonds, except for the NH groups at the beginning of the helix and the C=O groups at the end of the helix. Because of these, the ends of the helix are polar, and they are found most often at the surface of the protein.
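The n → n+4 pattern can be enumerated for an idealised helix to see which groups are left without a partner. This sketch assumes that every residue in the given range is helical and ignores proline (which has no NH); the function names are hypothetical.

```python
def helix_hbonds(first, last):
    """Backbone H-bonds C=O(i) ... H-N(i+4) in an ideal alpha-helix spanning first..last."""
    return [(i, i + 4) for i in range(first, last - 3)]


def unpaired_groups(first, last):
    """Residues whose NH or C=O has no hydrogen-bond partner inside the helix."""
    pairs = helix_hbonds(first, last)
    bonded_co = {co for co, _ in pairs}
    bonded_nh = {nh for _, nh in pairs}
    residues = range(first, last + 1)
    free_nh = [r for r in residues if r not in bonded_nh]
    free_co = [r for r in residues if r not in bonded_co]
    return free_nh, free_co


print(helix_hbonds(1, 10))     # [(1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (6, 10)]
print(unpaired_groups(1, 10))  # ([1, 2, 3, 4], [7, 8, 9, 10])
```

The four free NH groups at the start and four free C=O groups at the end are exactly the polar helix ends described above.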

Amphipathic α-helix

A very common position for an α-helix is at the surface of the protein, with one side of the helix facing the solution and the other side facing the hydrophobic core. Because there are 3.6 residues per turn, patterns arise in which residues change from hydrophobic to hydrophilic every 3 to 4 residues. The helix is then polar on one side and hydrophobic on the other: it is amphipathic. A useful way to look at the sequence of a helix is the helical wheel representation, a projection of the residues onto a plane perpendicular to the helix axis.

Go to KiNG: Motif
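A helical wheel places residue i at an angle of i × 100° (360°/3.6). A common way to quantify amphipathicity is the hydrophobic moment: the length of the vector sum of residue hydrophobicities placed at those angles. The sketch below uses a crude binary scale (group 1 residues count as 1, all others as 0), which is an assumption; published analyses use numeric hydrophobicity scales.

```python
import math

ANGLE_PER_RESIDUE = 100.0      # 360 degrees / 3.6 residues per turn
# Assumed binary hydrophobicity: group 1 (hydrophobic) residues = 1, others = 0.
HYDROPHOBIC = set("AVLIFPM")


def wheel_angles(sequence):
    """Angular position (degrees) of each residue on a helical wheel."""
    return [(i * ANGLE_PER_RESIDUE) % 360.0 for i in range(len(sequence))]


def hydrophobic_moment(sequence):
    """Length of the vector sum of residue hydrophobicities around the wheel."""
    x = y = 0.0
    for residue, angle in zip(sequence, wheel_angles(sequence)):
        h = 1.0 if residue in HYDROPHOBIC else 0.0
        x += h * math.cos(math.radians(angle))
        y += h * math.sin(math.radians(angle))
    return math.hypot(x, y)


# Designed example: Leu wherever the wheel angle is below 180 degrees, Lys elsewhere.
amphipathic = "LLKKLLKKLKKLLKKLLK"
uniform = "A" * 18
print(hydrophobic_moment(amphipathic))  # large: one hydrophobic face
print(hydrophobic_moment(uniform))      # close to zero: no preferred face
```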

Connecting helices: Helix-turn-helix motif

DNA-binding motif Ca-binding motif

The Ca²⁺-binding motif. Residue conservation within the Ca²⁺-binding motif

Four-helix bundle. Examples of four-helix bundles:

The globin domain

Beta (β)-strands

The second major type of secondary structure is the β-sheet. In contrast with α-helices, β-sheets are not built from a continuous stretch of sequence but from a combination of several regions of the polypeptide chain. These regions are 5 to 10 residues long, and the residues are in a fully extended conformation with φ/ψ angles of around -135°/135°. Such a region is called a β-strand.

The β-strands are aligned adjacent to each other so that hydrogen bonds can form between the C=O groups of one strand and the NH groups of a neighbouring strand. The resulting sheets are pleated: the Cα atoms lie alternately slightly above and slightly below the plane of the β-sheet.

There are two alignments possible: parallel and anti-parallel.

β-sheets parallel & antiparallel.

A sheet is called parallel if all its strands run in the same biochemical direction (amino terminus to carboxyl terminus). If the strands alternate in direction (N → C, then C → N, etc.), it is an antiparallel sheet.

Hydrogen bonding in β-sheets

The hydrogen-bonding pattern is quite different in parallel and antiparallel sheets. In an antiparallel sheet, narrowly spaced hydrogen bonds alternate with more widely spaced ones. In a parallel sheet, the hydrogen bonds are more evenly spaced.

Mixed β-sheets

Almost all β-sheets (whatever their type) have twisted strands, and this twist always has the same handedness: a right-handed twist.

β-sheets can also have a mixed character: partially parallel and antiparallel: mixed β-sheets. These are the most common β-sheets in proteins.

Go to KiNG: Motifs
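The definitions of parallel, antiparallel and mixed sheets can be expressed as a small function. It assumes that the strands are listed in their spatial order in the sheet (not their order in the sequence), each with a direction of +1 or -1 relative to the first strand; the function name is hypothetical.

```python
def classify_sheet(directions):
    """Classify a beta-sheet from the directions (+1/-1) of its strands in spatial order."""
    if len(directions) < 2:
        raise ValueError("a sheet needs at least two strands")
    # Each pair of neighbouring strands is either parallel or antiparallel.
    pair_types = {"parallel" if a == b else "antiparallel"
                  for a, b in zip(directions, directions[1:])}
    if pair_types == {"parallel"}:
        return "parallel"
    if pair_types == {"antiparallel"}:
        return "antiparallel"
    return "mixed"


print(classify_sheet([+1, +1, +1, +1]))   # parallel
print(classify_sheet([+1, -1, +1, -1]))   # antiparallel
print(classify_sheet([+1, +1, -1, -1]))   # mixed
```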

Loops and turns

Most proteins are built from several secondary structure elements that are linked to each other by loop regions. These loop regions differ in size and shape. In loops, the main-chain C=O and NH groups are generally not hydrogen-bonded to each other but are exposed. This is one reason why loops are often found at the surface of proteins, where these atoms can form hydrogen bonds with water molecules. Loops often contain charged and polar residues. Some loops (especially in antiparallel β-sheets) are very common; they are called hairpin loops.

How to represent protein structures?

To obtain the most information from pictures of protein structures, we need to simplify them. Schematic cartoons are used for this.

Topology diagram

To give an overview of all the secondary structure elements and the order in which they appear in a protein, simple schematic drawings have been developed. These are called topology diagrams. In these, β-strands are represented by arrows and α-helices by cylinders.
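A topology diagram starts from the ordered list of secondary structure elements. This sketch derives that list from a per-residue secondary structure string, assuming that H marks helix, E marks strand and anything else is loop (loosely modelled on DSSP-style one-letter codes). The minimum element length of 3 and the function name are arbitrary, hypothetical choices.

```python
from itertools import groupby

# Assumed one-letter labels: H = helix, E = strand, anything else = loop.
SYMBOLS = {"H": "helix (cylinder)", "E": "strand (arrow)"}


def secondary_structure_elements(ss, min_length=3):
    """List helices and strands in sequence order as (type, start, end), 1-based."""
    elements, position = [], 1
    for label, run in groupby(ss):
        length = len(list(run))
        if label in SYMBOLS and length >= min_length:
            elements.append((SYMBOLS[label], position, position + length - 1))
        position += length
    return elements


# Hypothetical assignment of a beta-alpha-beta unit.
ss = "--" + "E" * 5 + "---" + "H" * 10 + "--" + "E" * 5 + "-"
for element in secondary_structure_elements(ss):
    print(element)
```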

β-sheet topology diagrams

Antiparallel β-sheet in aspartate transcarbamoylase

Antiparallel barrel in plastocyanin

Parallel β-sheet in flavodoxin

Tertiary structure: motifs

Some simple combinations of secondary structure elements occur in many different proteins. These can consist of, e.g., two helices connected by a loop, or two β-strands and a helix. Such combinations are called supersecondary structures or motifs.

Some of these motifs have a particular function (e.g. DNA binding), but others seem to have no biological role of their own and are used as building blocks.

Greek Key motif

This motif occurs in proteins with four adjacent antiparallel β-strands. Because its topology diagram resembles an ornamental pattern used in ancient Greece, it is called the Greek key.

This motif is structural and no specific function is associated with it.

The eight strands in γ-crystallin are arranged in two Greek key motifs

βαβ motif

In antiparallel β-sheets the strands can be linked by short loops (quite often hairpins), but parallel β-sheets need longer loops or crossover connections. These connections are frequently formed by α-helices, so that the whole unit is β-strand - loop - α-helix - loop - β-strand. This is called the βαβ motif. The loops in this motif can vary in length (from only a few residues to nearly 100) and can contain further secondary structure elements. The motif can have two handednesses (helix below or above the strands), but the latter is much more common.

Adding βαβ motifs together:

There are two ways to join the units together, giving:

open twisted α/β structures, or α/β barrels

Three main types of structure based on βαβ motifs

Closed barrel: triosephosphate isomerase

Open twisted β-sheet: alcohol dehydrogenase

Open barrel: ribonuclease inhibitor

The active site in all α/β barrels is in a pocket formed by the loop regions that connect the carboxy ends of the β-strands with the adjacent α-helices.

A view from the top of the barrel onto the active site of the enzyme RuBisCo (ribulose bisphosphate carboxylase)

Motifs are used as building blocks.

Motifs and secondary structure elements are used as a kind of Lego blocks to form three-dimensional structures. If the resulting structure can fold independently, it is called a domain.

Fatty acid binding protein: beta barrel + helix-loop-helix

Lac-repressor: many motifs e.g. helix-loop-helix and 4 helix bundle

Large polypeptide chains fold into several domains

Large polypeptide chains often fold into several domains, which are often also units of function, e.g. a DNA-binding domain, a catalytic domain or an interaction domain. Certain domain folds are used in many different proteins.

Classes of structures

In general, all protein structures can be placed into three groups:

• all α-helical proteins
• all β-sheet proteins
• α/β proteins

α-domain structures

Many different types of structures can be formed by α-helices alone. The first protein structures to be solved (myoglobin and hemoglobin) contain only α-helices. Their fold is called the globin fold.

Hemoglobin

α-domain structures

The helices in an all-helical domain can be packed in an almost parallel manner. This gives rise to two different types of packing: four-helix bundles or larger arrangements.

All β-structures

Up-and-down barrel: retinol-binding protein

Up-and-down sheets in a propeller-like fold: influenza neuraminidase

All-β structures are predominantly antiparallel (there are no helices to make crossovers) and consist of packed sheets.

Superoxide dismutase (SOD) comprises eight antiparallel β-strands

All β-structures (2)

Jelly-roll barrel (2 × Greek key): viral coat proteins

β-helix: pectate lyase

α/β and α+β structures

These are the most common structures found. They consist of a central sheet (mixed or parallel) surrounded by α-helices (α/β), or of segregated α and β regions (α+β). There are many variations within these classes (e.g. in how βαβ units pack). Often the secondary structure elements provide structural strength, while the loops are involved in the function of the protein.

α+β: lysozyme

α/β: tyrosyl-tRNA synthetase

Protein structure universe

Membrane proteins

Membrane proteins account for up to two-thirds of known druggable targets. Receptors, especially G-protein-coupled receptors (GPCRs), and ion channels are particularly important targets. Structural information is still limited but growing.

Four different ways in which protein molecules may be bound to a membrane.

General topology:
• membrane anchor by one transmembrane helix
• α-helical integral membrane protein
• β-sheet integral membrane protein
• membrane-anchored protein bound by amphiphilic helices

Hydropathy plots can be used for predicting transmembrane helices:

Plots for the polypeptide chains L and M of the reaction center

The photosynthetic reaction center of a purple bacterium: Nobel Prize in Chemistry 1988

First high resolution structural information for a membrane protein
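A hydropathy plot averages a hydrophobicity scale over a sliding window along the sequence. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and flags windows whose average exceeds 1.6, values often quoted for transmembrane-helix prediction; treat the threshold as a rule of thumb rather than a reliable predictor. The example sequence and function names are invented for illustration.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(sequence, window=19):
    """(centre residue, window average) pairs; residues numbered from 1."""
    values = [KD[r] for r in sequence.upper()]
    half = window // 2
    return [(i + half + 1, sum(values[i:i + window]) / window)
            for i in range(len(values) - window + 1)]


def predicted_tm_positions(sequence, window=19, threshold=1.6):
    """Window centres whose average hydropathy exceeds the (assumed) threshold."""
    return [pos for pos, h in hydropathy_profile(sequence, window) if h > threshold]


# Invented sequence: soluble segment, 19 hydrophobic residues, soluble segment.
seq = "MSKDERNQKDE" + "LLIVALLIVAVLLFAIVLL" + "RKDEQNSKE"
print(predicted_tm_positions(seq))
```

Plotting the profile (position against window average) gives the familiar hydropathy plot; peaks above the threshold mark candidate transmembrane helices.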

GPCR: Rhodopsin

7 membrane-spanning helices

First structural information by EM; high-resolution structures by X-ray diffraction

Low-resolution models; high-resolution model

Ion-channel: The potassium channel.

Viewed perpendicular to the plane of the membrane

How the selectivity filter is formed: main-chain atoms line the walls of this narrow passage, with their carbonyl oxygen atoms pointing into the pore and forming binding sites for K⁺ ions.

16 β-strands form an antiparallel β barrel that traverses the membrane.

Example all β integral membrane protein: porin

Molecular structure of a porin. The central channel allows molecules to pass across the membrane.

Example of a protein anchored to the membrane by amphipathic helices: cyclooxygenase

Important enzyme in the prostaglandin pathway. Aspirin targets this enzyme.

If you would like to know more about protein structure:

Anders Liljas et al.: Textbook of Structural Biology, ISBN-10: 9812772081

Greg Petsko & Dagmar Ringe: Protein Structure and Function

Carl Brändén & John Tooze: Introduction to Protein Structure

Philip Bourne & Helge Weissig: Structural Bioinformatics, in particular chapters 22 & 23

http://kinemage.biochem.duke.edu/teaching/anatax/index.html

At LU: Department of Biochemistry and Structural Biology.
