Protein structure prediction: The holy grail of bioinformatics

Upload: betty-oliver

Post on 16-Jan-2016


Page 1: Protein structure prediction: The holy grail of bioinformatics

Protein structure prediction: The holy grail of bioinformatics

Page 2: Protein structure prediction: The holy grail of bioinformatics

Proteins: Four levels of structural organization:

Primary structure

Secondary structure

Tertiary structure

Quaternary structure

Page 3: Protein structure prediction: The holy grail of bioinformatics

Primary structure = the linear amino acid sequence

Page 4: Protein structure prediction: The holy grail of bioinformatics

Secondary structure = spatial arrangement of amino-acid residues that are adjacent in the primary structure

Page 5: Protein structure prediction: The holy grail of bioinformatics

α-helix = a helical structure whose chain coils tightly as a right-handed screw, with all the side chains sticking outward in a helical array. The tight structure of the α-helix is stabilized by same-strand hydrogen bonds between NH groups and CO groups spaced at four amino-acid residue intervals.

Page 6: Protein structure prediction: The holy grail of bioinformatics

The β-pleated sheet is made of loosely coiled β strands that are stabilized by hydrogen bonds between -NH and -CO groups from adjacent strands.

Page 7: Protein structure prediction: The holy grail of bioinformatics

An antiparallel β sheet. Adjacent β strands run in opposite directions. Hydrogen bonds between NH and CO groups connect each amino acid to a single amino acid on an adjacent strand, stabilizing the structure.

Page 8: Protein structure prediction: The holy grail of bioinformatics

A parallel β sheet. Adjacent β strands run in the same direction. Hydrogen bonds connect each amino acid on one strand with two different amino acids on the adjacent strand.

Page 9: Protein structure prediction: The holy grail of bioinformatics
Page 10: Protein structure prediction: The holy grail of bioinformatics

Silk fibroin

Page 11: Protein structure prediction: The holy grail of bioinformatics

• α-helix
• β-sheet (parallel and antiparallel)
• tight turns
• flexible loops
• irregular elements (random coil)

Page 12: Protein structure prediction: The holy grail of bioinformatics

Tertiary structure = three-dimensional structure of protein

Page 13: Protein structure prediction: The holy grail of bioinformatics

The tertiary structure is formed by the folding of secondary structures by covalent and non-covalent forces, such as hydrogen bonds, hydrophobic interactions, salt bridges between positively and negatively charged residues, as well as disulfide bonds between pairs of cysteines.

Page 14: Protein structure prediction: The holy grail of bioinformatics

Quaternary structure = spatial arrangement of subunits and their contacts.

Page 15: Protein structure prediction: The holy grail of bioinformatics
Page 16: Protein structure prediction: The holy grail of bioinformatics

Holoproteins & Apoproteins

[Figure labels: apoprotein, prosthetic group, holoprotein]

Page 17: Protein structure prediction: The holy grail of bioinformatics

Apohemoglobin = 2α + 2β

Page 18: Protein structure prediction: The holy grail of bioinformatics

Prosthetic group

Heme

Page 19: Protein structure prediction: The holy grail of bioinformatics

Hemoglobin = Apohemoglobin + 4 Heme

Page 20: Protein structure prediction: The holy grail of bioinformatics
Page 21: Protein structure prediction: The holy grail of bioinformatics

Sela M, White FH, & Anfinsen CB. 1959. The reductive cleavage of disulfide bonds and its application to problems of protein structure. Biochim. Biophys. Acta. 31:417-426.

Christian B. Anfinsen, 1916-1995

Page 22: Protein structure prediction: The holy grail of bioinformatics

Not all proteins fold independently; some need chaperones.

Page 23: Protein structure prediction: The holy grail of bioinformatics
Page 24: Protein structure prediction: The holy grail of bioinformatics
Page 25: Protein structure prediction: The holy grail of bioinformatics
Page 26: Protein structure prediction: The holy grail of bioinformatics

Reducing agents:
• Ammonium thioglycolate (alkaline), pH 9.0-10
• Glyceryl monothioglycolate (acid), pH 6.5-8.2

Page 27: Protein structure prediction: The holy grail of bioinformatics

Oxidant

Page 28: Protein structure prediction: The holy grail of bioinformatics

What do we need to know in order to state that the tertiary structure of a protein has been solved?

Ideally: We need to determine the position of all atoms and their connectivity.

Less ideally: We need to determine the position of all Cα atoms (backbone structure).

Page 29: Protein structure prediction: The holy grail of bioinformatics

Protein structure: Limitations and caveats

• Not all proteins or parts of proteins assume a well-defined 3D structure in solution.

• Protein structure is not static; different parts of the structure undergo different degrees of thermal motion.

• There may be a number of slightly different conformations in solution.

• Some proteins undergo conformational changes when interacting with other molecules (ligands, other proteins, nucleic acids).

Page 30: Protein structure prediction: The holy grail of bioinformatics

Experimental Protein Structure Determination

• X-ray crystallography
– most accurate
– in vitro
– needs crystals
– ~$100-200K per structure

• NMR
– fairly accurate
– in solution
– no need for crystals
– limited to very small proteins

• Cryo-electron-microscopy
– imaging technology
– low resolution

Page 31: Protein structure prediction: The holy grail of bioinformatics

Why predict protein structure?

• Structural knowledge = some understanding of function and mechanism of action

• Predicted structures can be used in structure-based drug design

• It can help us understand the effects of mutations on structure and function

• It is a very interesting scientific problem (still unsolved in its most general form after more than 50 years of effort)

Page 32: Protein structure prediction: The holy grail of bioinformatics

Secondary structure prediction

Page 33: Protein structure prediction: The holy grail of bioinformatics

• Historically, the first structure prediction methods predicted secondary structure

• Can be used to improve alignment accuracy

• Can be used to detect domain boundaries within proteins with remote sequence homology

• Often the first step towards 3D structure prediction

• Informative for mutagenesis studies

Secondary structure prediction

Page 34: Protein structure prediction: The holy grail of bioinformatics

Protein Secondary Structures (Simplifications)

COIL (everything else)

β-STRAND

α-HELIX

Page 35: Protein structure prediction: The holy grail of bioinformatics

Assumptions

• The entire information for forming secondary structure is contained in the primary sequence

• side groups of residues will determine structure

• examining windows of 13-17 residues is sufficient to predict secondary structure

α-helices 5–40 residues long; β-strands 5–10 residues long

Page 36: Protein structure prediction: The holy grail of bioinformatics

Predicting Secondary Structure From Primary Structure

• accuracy 64-75%
• higher accuracy for α-helices than for β-sheets
• accuracy is dependent on protein family
• predictions of engineered (artificial) proteins are less accurate

Page 37: Protein structure prediction: The holy grail of bioinformatics

A surprising result!

Chameleon sequences

Page 38: Protein structure prediction: The holy grail of bioinformatics

The “Chameleon” sequence

TEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK

TEAVDAWTVEKAFKTFANDNGVDGAWTVEKAFKTFTVTEK

sequence 1 sequence 2

Replace both sequences with an engineered peptide (“chameleon”)

Source: Minor and Kim. 1996. Nature 380:730-734

α-helix β-strand
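The two 40-residue lines on this slide can be compared directly to see where the engineered peptide sits. The sketch below is only an illustration: it assumes the first line is the original protein and the second line is the variant with the peptide substituted at both positions, and the helper name `longest_repeated_substring` is made up for this example.

```python
def longest_repeated_substring(seq):
    """Return (substring, start positions) for the longest substring occurring at least twice."""
    n = len(seq)
    for length in range(n - 1, 0, -1):            # try the longest candidates first
        seen = {}
        for start in range(n - length + 1):
            seen.setdefault(seq[start:start + length], []).append(start)
        repeats = {s: pos for s, pos in seen.items() if len(pos) > 1}
        if repeats:
            # If several repeats share this length, pick the one that starts first
            best = min(repeats, key=lambda s: repeats[s][0])
            return best, repeats[best]
    return "", []


# The two sequences exactly as printed on the slide
native = "TEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK"
engineered = "TEAVDAWTVEKAFKTFANDNGVDGAWTVEKAFKTFTVTEK"

# The peptide that appears twice in the second sequence, and where (0-based)
peptide, starts = longest_repeated_substring(engineered)

# Positions at which the two printed sequences differ
changed = [i for i, (a, b) in enumerate(zip(native, engineered)) if a != b]

print(peptide, starts, changed)
```

All differences between the two lines fall inside the two copies of the repeated peptide, which matches the idea of one sequence placed in two structural contexts.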

Page 39: Protein structure prediction: The holy grail of bioinformatics

Measures of prediction accuracy

• Qindex and Q3

• Correlation coefficient

Page 40: Protein structure prediction: The holy grail of bioinformatics

Qindex

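Qindex: (Qhelix, Qstrand, Qcoil, Q3) - percentage of residues correctly predicted as α-helix, β-strand, coil, or for all 3 conformations.

Drawbacks:
- even a random assignment of structure can achieve a high score (Holley & Karplus 1991)

$$Q_3 = \frac{N_{\text{predicted}}}{N_{\text{observed}}} \times 100$$

where N_predicted is the number of residues whose conformation is predicted correctly and N_observed is the total number of residues.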
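A minimal sketch of how the Q index could be computed from two equal-length strings of per-residue states. The one-letter labels H (helix), E (strand) and C (coil) and the function name `q_scores` are assumptions made for this example; the slides do not fix a notation.

```python
def q_scores(observed, predicted):
    """Per-state Q values and overall Q3, as percentages.

    observed / predicted: equal-length strings over the states H, E, C.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")

    scores = {}
    for state in "HEC":
        # Residues observed in this state, and how many of those were predicted correctly
        n_observed = sum(1 for o in observed if o == state)
        n_correct = sum(1 for o, p in zip(observed, predicted) if o == state and p == state)
        scores["Q" + state] = 100.0 * n_correct / n_observed if n_observed else float("nan")

    # Q3: correctly predicted residues over all residues, all three states together
    n_correct_all = sum(1 for o, p in zip(observed, predicted) if o == p)
    scores["Q3"] = 100.0 * n_correct_all / len(observed)
    return scores


example = q_scores("HHHHEEEECC", "HHHCEEECCC")
print(example)
```

In this toy example 8 of 10 residues are predicted correctly, so Q3 is 80%.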

Page 41: Protein structure prediction: The holy grail of bioinformatics

Correlation coefficient

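For a given conformation a (helix, strand, or coil):

p_a = true positives
o_a = false positives (overpredicted)
n_a = true negatives
u_a = false negatives (underpredicted)

$$C_a = \frac{p_a n_a - u_a o_a}{\sqrt{(n_a + u_a)(n_a + o_a)(p_a + u_a)(p_a + o_a)}}$$

C = 1 (= 100%) for a perfect prediction.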
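The four counts on this slide (true positives p, false positives o, true negatives n, false negatives u) combine into the correlation coefficient C = (pn − uo) / sqrt((n+u)(n+o)(p+u)(p+o)). A small sketch for one conformation at a time follows; the H/E/C labels and the function name are assumptions for illustration, and returning 0.0 when the denominator is zero is a convention chosen here, not something the slides specify.

```python
import math


def correlation_coefficient(observed, predicted, state):
    """Correlation coefficient C for one conformation (e.g. 'H')."""
    p = n = o = u = 0
    for obs, pred in zip(observed, predicted):
        if pred == state and obs == state:
            p += 1          # true positive
        elif pred == state:
            o += 1          # false positive (overpredicted)
        elif obs == state:
            u += 1          # false negative (underpredicted)
        else:
            n += 1          # true negative
    denominator = math.sqrt((n + u) * (n + o) * (p + u) * (p + o))
    if denominator == 0:
        return 0.0          # convention assumed here for the degenerate case
    return (p * n - u * o) / denominator


c_helix = correlation_coefficient("HHHHEEEECC", "HHHCEEECCC", "H")
c_perfect = correlation_coefficient("HHHHEEEECC", "HHHHEEEECC", "H")
print(round(c_helix, 3), c_perfect)
```

Unlike the Q index, this measure penalizes both over- and underprediction of a state.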

Page 42: Protein structure prediction: The holy grail of bioinformatics

Methods of secondary structure prediction

Page 43: Protein structure prediction: The holy grail of bioinformatics

Chou & Fasman (1974 & 1978): Some residues have particular secondary-structure preferences. Based on empirical frequencies of residues in α-helices, β-sheets, and coils.

Examples: Glu → α-helix; Val → β-strand

First generation methods: single residue statistics

Page 44: Protein structure prediction: The holy grail of bioinformatics

Chou-Fasman method

Name            P(H)  P(E)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Alanine         142    83    66      0.06   0.076   0.035   0.058
Arginine         98    93    95      0.07   0.106   0.099   0.085
Aspartic Acid   101    54   146      0.147  0.11    0.179   0.081
Asparagine       67    89   156      0.161  0.083   0.191   0.091
Cysteine         70   119   119      0.149  0.05    0.117   0.128
Glutamic Acid   151    37    74      0.056  0.06    0.077   0.064
Glutamine       111   110    98      0.074  0.098   0.037   0.098
Glycine          57    75   156      0.102  0.085   0.19    0.152
Histidine       100    87    95      0.14   0.047   0.093   0.054
Isoleucine      108   160    47      0.043  0.034   0.013   0.056
Leucine         121   130    59      0.061  0.025   0.036   0.07
Lysine          114    74   101      0.055  0.115   0.072   0.095
Methionine      145   105    60      0.068  0.082   0.014   0.055
Phenylalanine   113   138    60      0.059  0.041   0.065   0.065
Proline          57    55   152      0.102  0.301   0.034   0.068
Serine           77    75   143      0.12   0.139   0.125   0.106
Threonine        83   119    96      0.086  0.108   0.065   0.079
Tryptophan      108   137    96      0.077  0.013   0.064   0.167
Tyrosine         69   147   114      0.082  0.065   0.114   0.125
Valine          106   170    50      0.062  0.048   0.028   0.053

Page 45: Protein structure prediction: The holy grail of bioinformatics

Amino acid  Pα    Pβ    Pt
Glu         1.51  0.37  0.74
Met         1.45  1.05  0.60
Ala         1.42  0.83  0.66
Val         1.06  1.70  0.50
Ile         1.08  1.60  0.50
Tyr         0.69  1.47  1.14
Pro         0.57  0.55  1.52
Gly         0.57  0.75  1.56
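To make the idea of single-residue propensities concrete, the sketch below averages the P(H) and P(E) values from the Chou-Fasman table over a short segment and picks the larger one. This is a deliberately simplified illustration, not the full Chou-Fasman procedure (which has nucleation and extension rules not covered here); the cutoff of 100 and the function names are assumptions made for this example.

```python
# P(H) and P(E) values copied from the Chou-Fasman table (one-letter codes)
P_HELIX = {"A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
           "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
           "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106}
P_STRAND = {"A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
            "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
            "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170}


def mean_propensity(segment, table):
    """Average propensity of a segment according to one column of the table."""
    return sum(table[aa] for aa in segment) / len(segment)


def classify_segment(segment, cutoff=100):
    """Return 'H', 'E' or 'C' for a short segment (simplified rule, cutoff assumed)."""
    helix = mean_propensity(segment, P_HELIX)
    strand = mean_propensity(segment, P_STRAND)
    if helix > cutoff and helix >= strand:
        return "H"
    if strand > cutoff and strand > helix:
        return "E"
    return "C"


print(classify_segment("EAMAEL"), classify_segment("VIYVTV"), classify_segment("GPNGSP"))
```

A Glu/Ala/Met-rich segment comes out as helix, a Val/Ile/Tyr-rich segment as strand, and a Gly/Pro-rich segment as neither, in line with the preferences listed in the table.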

Page 46: Protein structure prediction: The holy grail of bioinformatics

Chou-Fasman Method

• Accuracy: Q3 = 50-60%

Page 47: Protein structure prediction: The holy grail of bioinformatics

Second generation methods: segment statistics

• Similar to single-residue methods, but incorporating additional information (adjacent residues, segmental statistics).

• Problems:
– Low accuracy: Q3 below 66%.
– Q3 of β-strands (E): 28% - 48%.
– Predicted structures were too short.

Page 48: Protein structure prediction: The holy grail of bioinformatics

The GOR method

• developed by Garnier, Osguthorpe & Robson
• builds on Chou-Fasman Pij values
• evaluates each residue PLUS the adjacent 8 N-terminal and 8 carboxyl-terminal residues
• sliding window of 17 residues
• underpredicts β-strand regions
• GOR method accuracy Q3 = ~64%
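The 17-residue sliding window (the central residue plus 8 neighbours on each side) can be sketched as below. Only the window extraction is shown; the GOR scoring itself depends on statistical tables that are not reproduced in these slides. Padding the chain ends with "-" and the function name `windows` are assumptions made for this example.

```python
def windows(sequence, flank=8, pad="-"):
    """Yield (position, residue, window) for every residue.

    Each window holds `flank` residues on either side of the central residue
    (2 * flank + 1 = 17 for the default), with the chain ends padded.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        yield i, residue, padded[i:i + 2 * flank + 1]


sequence = "TEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK"
all_windows = list(windows(sequence))

first_position, first_residue, first_window = all_windows[0]
print(first_window)
```

Every residue gets exactly one window, so a per-residue prediction can be made by scoring each window in turn.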

Page 49: Protein structure prediction: The holy grail of bioinformatics

Third generation methods

• Third generation methods reached 77% accuracy.
• They rest on two new ideas:

1. A biological idea: using evolutionary information based on conservation analysis of multiple sequence alignments.

2. A technological idea: using neural networks.

Page 50: Protein structure prediction: The holy grail of bioinformatics

Artificial Neural Networks

An attempt to imitate the human brain (assuming that this is the way it works).

Page 51: Protein structure prediction: The holy grail of bioinformatics

Neural network models

- machine learning approach
- provide training sets of structures (e.g. α-helices, non-α-helices)
- computers are trained to recognize patterns in known secondary structures
- provide test set (proteins with known structures)
- accuracy ~70–75%
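A neural network needs its training examples as numbers rather than letters. The slides do not say how residues are encoded, so the sketch below shows one common choice as an assumption: each window position becomes 21 zeros and ones (20 amino acids plus one padding symbol), and the positions are concatenated into one input vector. The function name `one_hot_window` is made up for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_window(window, pad="-"):
    """Encode a residue window as a flat list of 0/1 values, 21 per position."""
    alphabet = AMINO_ACIDS + pad
    vector = []
    for residue in window:
        column = [0] * len(alphabet)
        # str.index raises ValueError for symbols outside the alphabet
        column[alphabet.index(residue)] = 1
        vector.extend(column)
    return vector


encoded = one_hot_window("--A")
print(len(encoded), sum(encoded))
```

A 17-residue window encoded this way gives 17 × 21 = 357 inputs; the label for training would be the known state (for example α-helix or non-α-helix) of the central residue.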

Page 52: Protein structure prediction: The holy grail of bioinformatics

Reasons for improved accuracy

• Align sequence with other related proteins of the same protein family

• Find members that have a known structure

• If there are significant matches between structure and sequence, assign secondary structures to the corresponding residues

Page 53: Protein structure prediction: The holy grail of bioinformatics

New and Improved Third-Generation Methods

Exploit evolutionary information. Based on conservation analysis of multiple sequence alignments.

• PHD (Q3 ~ 70%)

Rost B, Sander, C. (1993) J. Mol. Biol. 232, 584-599.

• PSIPRED (Q3 ~ 77%)

Jones, D. T. (1999) J. Mol. Biol. 292, 195-202.

Arguably remains the top secondary structure prediction method (won all CASP competitions since 1998).

Page 54: Protein structure prediction: The holy grail of bioinformatics

Secondary Structure Prediction Summary

1st Generation - 1970s
• Q3 = 50-55%
• Chou & Fasman, GOR

2nd Generation - 1980s
• Q3 = 60-65%
• Qian & Sejnowski, GORIII

3rd Generation - 1990s
• Q3 = 70-80%
• PHD, PSIPRED

Many 3rd+ generation methods exist:
PSI-PRED - http://bioinf.cs.ucl.ac.uk/psipred/
JPRED - http://www.compbio.dundee.ac.uk/~www-jpred/
PHD - http://www.embl-heidelberg.de/predictprotein/predictprotein.html
NNPRED - http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html

Page 55: Protein structure prediction: The holy grail of bioinformatics

The sequence-structure gap

More than 13,137,813 known protein sequences, 76,495 experimentally determined structures.

Page 56: Protein structure prediction: The holy grail of bioinformatics

The sequence-structure gap

[Figure: plot of the number of sequences and the number of structures; axis ticks from 0 to 200,000]

The gap is getting bigger.

Page 57: Protein structure prediction: The holy grail of bioinformatics

Protein Secondary Structures (Simplifications)

COIL (everything else)

β-STRAND

α-HELIX

Page 58: Protein structure prediction: The holy grail of bioinformatics

Beyond Secondary Structure, Before Tertiary Structure

Supersecondary structures (motifs): small, discrete, commonly observed aggregates of secondary structures
• helix-loop-helix

Domains: independent units of structure
• β barrel
• four-helix bundle

The terms “domain” and “motif” are sometimes used interchangeably.

Page 59: Protein structure prediction: The holy grail of bioinformatics

Helix-loop-helix

Page 60: Protein structure prediction: The holy grail of bioinformatics

Beyond Secondary Structure, Before Tertiary Structure

Folds: Compact folding arrangements of a polypeptide chain (a protein or part of a protein).

The terms “domain” and “fold” are sometimes used interchangeably.

Page 61: Protein structure prediction: The holy grail of bioinformatics

EF Fold

Found in Calcium binding proteins such as Calmodulin

Page 62: Protein structure prediction: The holy grail of bioinformatics

Leucine Zipper

Page 63: Protein structure prediction: The holy grail of bioinformatics

Rossmann Fold

• The beta-alpha-beta-alpha-beta subunit
• Often present in nucleotide-binding proteins

Page 64: Protein structure prediction: The holy grail of bioinformatics

β sandwich, β barrel

Page 65: Protein structure prediction: The holy grail of bioinformatics

horseshoe

Page 66: Protein structure prediction: The holy grail of bioinformatics

Four helix bundle

• 24 amino acid peptide with a hydrophobic surface
• Assembles into 4 helix bundle through hydrophobic regions
• Maintains solubility of membrane proteins

Page 67: Protein structure prediction: The holy grail of bioinformatics

TIM Barrel

Page 68: Protein structure prediction: The holy grail of bioinformatics

PDB New Fold Growth

• The number of unique folds in nature is fairly small (possibly a few thousand)

• 90% of new structures submitted to PDB in the past three years have similar structural folds in PDB

New fold

Old fold

Page 69: Protein structure prediction: The holy grail of bioinformatics

Protein data bank

• http://www.rcsb.org/pdb/

Page 70: Protein structure prediction: The holy grail of bioinformatics

Protein 3D structure data: The structure of a protein consists of the 3D (X,Y,Z) coordinates of each non-hydrogen atom of the protein. Some protein structures also include coordinates of covalently linked prosthetic groups, non-covalently linked ligand molecules, or metal ions. For some purposes (e.g. structural alignment) only the Cα coordinates are needed.

Example of PDB format:

                       X       Y        Z       occupancy  temp. factor
ATOM  18  N   GLY  27  40.315  161.004  11.211  1.00       10.11
ATOM  19  CA  GLY  27  39.049  160.737  10.462  1.00       14.18
ATOM  20  C   GLY  27  38.729  159.239  10.784  1.00       20.75
ATOM  21  O   GLY  27  39.507  158.484  11.404  1.00       21.88

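Note: the ATOM records carry no explicit information about connectivity between atoms; for standard residues it is inferred from the residue and atom names (separate CONECT records cover ligands and other non-standard groups). The last two numbers (occupancy, temperature factor) relate to disorder of atomic positions in crystals.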
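A minimal sketch of reading the simplified ATOM lines shown on this slide. It splits on whitespace because the slide's example has no chain identifier and is not column-aligned; real PDB files use fixed column positions, so this is not a general PDB parser. The `Atom` type and `parse_atom_line` are names invented for this example.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    residue: str
    residue_number: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float


def parse_atom_line(line):
    """Parse one simplified ATOM line of the form used on the slide."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "ATOM":
        raise ValueError("not a simplified ATOM line: " + repr(line))
    return Atom(int(fields[1]), fields[2], fields[3], int(fields[4]),
                *(float(value) for value in fields[5:10]))


lines = [
    "ATOM 18 N GLY 27 40.315 161.004 11.211 1.00 10.11",
    "ATOM 19 CA GLY 27 39.049 160.737 10.462 1.00 14.18",
    "ATOM 20 C GLY 27 38.729 159.239 10.784 1.00 20.75",
    "ATOM 21 O GLY 27 39.507 158.484 11.404 1.00 21.88",
]
atoms = [parse_atom_line(line) for line in lines]

# For tasks such as structural alignment only the C-alpha atoms are kept
ca_atoms = [atom for atom in atoms if atom.name == "CA"]
print(ca_atoms)
```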

Page 71: Protein structure prediction: The holy grail of bioinformatics
Page 72: Protein structure prediction: The holy grail of bioinformatics

Protein structure: Some computational tasks

• Building a protein structure model from X-ray data

• Building a protein structure model from NMR data

• Computing the energy for a given protein structure (conformation)

• Energy minimization: Finding the structure with the minimal energy according to some empirical “force fields”.

• Simulating the protein folding process (molecular dynamics)

• Structure visualization

• Computing secondary structure from atomic coordinates

• Protein superposition, structural alignment

• Protein fold classification

• Threading: finding a fold (prototype structure) that fits to a sequence

• Docking: fitting ligands onto a protein surface by molecular dynamics or energy minimization

• Protein 3D structure prediction from sequence

Page 73: Protein structure prediction: The holy grail of bioinformatics

Viewing protein structures

When looking at a protein structure, we may ask the following types of questions:

• Is a particular residue on the inside or outside of a protein?
• Which amino acids interact with each other?
• Which amino acids are in contact with a ligand (DNA, peptide hormone, small molecule, etc.)?
• Is an observed mutation likely to disturb the protein structure?

Standard capabilities of protein structure software:
• Display of protein structures in different ways (wireframe, backbone, sticks, spacefill, ribbon)
• Highlighting of individual atoms, residues or groups of residues
• Calculation of interatomic distances
• Advanced feature: Superposition of related structures
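The interatomic distance calculation mentioned in the list above is just the Euclidean distance between two coordinate triples. As a small worked example, the sketch below uses the N and Cα coordinates of GLY 27 from the PDB example earlier in the deck; the variable names are chosen for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in the units of the input."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Coordinates of two atoms of GLY 27, copied from the PDB format example
n_gly27 = (40.315, 161.004, 11.211)
ca_gly27 = (39.049, 160.737, 10.462)

d = distance(n_gly27, ca_gly27)
print(round(d, 3))
```

The result is about 1.5 Å, a plausible length for a covalent bond between neighbouring backbone atoms.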

Page 74: Protein structure prediction: The holy grail of bioinformatics

Example: c-abl oncoprotein SH2 domain, display wireframe

Page 75: Protein structure prediction: The holy grail of bioinformatics

Example: c-abl oncoprotein SH2 domain, display sticks

Page 76: Protein structure prediction: The holy grail of bioinformatics

Example: c-abl oncoprotein SH2 domain, display backbone

Page 77: Protein structure prediction: The holy grail of bioinformatics

Example: c-abl oncoprotein SH2 domain, display spacefill

Page 78: Protein structure prediction: The holy grail of bioinformatics

Example: c-abl oncoprotein SH2 domain, display ribbons

Page 79: Protein structure prediction: The holy grail of bioinformatics

Predicting protein 3D structure

Goal: 3D structure from 1D sequence

• An existing fold: homology modeling, fold recognition

• A new fold: ab initio

Page 80: Protein structure prediction: The holy grail of bioinformatics

Homology modeling

Based on two major observations (and some simplifications):

1. The structure of a protein is uniquely defined by its amino acid sequence.

2. Similar sequences adopt similar structures. (Distantly related sequences may still fold into similar structures.)

Page 81: Protein structure prediction: The holy grail of bioinformatics

Homology modeling needs three items of input:

• The sequence of a protein with unknown 3D structure, the "target sequence."

• A 3D “template”: a structure having the highest sequence identity with the target sequence (>30% sequence identity)

• A sequence alignment between the target sequence and the template sequence
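The 30% identity criterion can be checked directly from a pairwise alignment. The sketch below counts identical residues over the aligned, gap-free columns; note that other definitions of percent identity exist (for example dividing by the length of the shorter sequence), so the denominator used here is an assumption, as is the function name. The two sequences are the 40-residue pair from the “Chameleon” slide, used only as a convenient gap-free example.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identical residues over alignment columns without gaps."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_target, aligned_template)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


target = "TEAVDAWTVEKAFKTFANDNGVDGAWTVEKAFKTFTVTEK"
template = "TEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK"

identity = percent_identity(target, template)
usable_template = identity > 30.0   # threshold quoted on the slide
print(identity, usable_template)
```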

Page 82: Protein structure prediction: The holy grail of bioinformatics

Homology Modeling: How it works

o Find template

o Align target sequence with template

o Generate model:
  - add loops
  - add sidechains

o Refine model

Page 83: Protein structure prediction: The holy grail of bioinformatics

[Rost, Protein Eng. 1999]

Two zones of homology modeling

Page 84: Protein structure prediction: The holy grail of bioinformatics

Automated Web-Based Homology Modelling

SWISS Model : http://www.expasy.org/swissmod/SWISS-MODEL.html

WHAT IF : http://www.cmbi.kun.nl/swift/servers/

The CPHModels Server : http://www.cbs.dtu.dk/services/CPHmodels/

3D Jigsaw : http://www.bmm.icnet.uk/~3djigsaw/

SDSC1 : http://cl.sdsc.edu/hm.html

EsyPred3D : http://www.fundp.ac.be/urbm/bioinfo/esypred/

Page 85: Protein structure prediction: The holy grail of bioinformatics

Fold recognition = Protein Threading

Which of the known folds is likely to be similar to the (unknown) fold of a new protein when only its amino-acid sequence is known?

Page 86: Protein structure prediction: The holy grail of bioinformatics

Protein Threading

• The goal: find the “correct” sequence-structure alignment between a target sequence and its native-like fold in PDB

• Energy function: knowledge (or statistics) based rather than physics based
– Should be able to distinguish correct structural folds from incorrect structural folds
– Should be able to distinguish correct sequence-fold alignment from incorrect sequence-fold alignments

MTYKLILN …. NGVDGEWTYTE

Page 87: Protein structure prediction: The holy grail of bioinformatics

Protein Threading

• Basic premise

• Statistics from Protein Data Bank (~2,000 structures)

• Chances for a protein to have a structural fold that already exists in PDB are quite good.

The number of unique structural (domain) folds in nature is fairly small (possibly a few thousand)

90% of new structures submitted to PDB in the past three years have similar structural folds in PDB

Page 88: Protein structure prediction: The holy grail of bioinformatics

Protein Threading

Basic components:
– Structure database
– Energy function
– Sequence-structure alignment algorithm
– Prediction reliability assessment

Page 89: Protein structure prediction: The holy grail of bioinformatics

Protein Threading – structure database

• Build a template database

Page 90: Protein structure prediction: The holy grail of bioinformatics

Process

• Threading - A protein fold recognition technique that involves incrementally replacing the sequence of a known protein structure with a query sequence of unknown structure. The new “model” structure is evaluated using a simple heuristic measure of protein fold quality. The process is repeated against all known 3D structures until an optimal fit is found.

Page 91: Protein structure prediction: The holy grail of bioinformatics

Fold recognition methods

• 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/

• Fugue http://www-cryst.bioc.cam.ac.uk/~fugue/

• HHpred http://protevo.eb.tuebingen.mpg.de/toolkit/index.php?view=hhpred

Page 92: Protein structure prediction: The holy grail of bioinformatics

ab-initio folding

Goal: Predict structure from “first principles”

Requires:
– A free energy function, sufficiently close to the “true potential”
– A method for searching the conformational space

Advantages:
– Works for novel folds
– Shows that we understand the process

Disadvantages:
– Applicable to short sequences only

Page 93: Protein structure prediction: The holy grail of bioinformatics

Rosetta [Simons et al. 1997]

http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php

Page 94: Protein structure prediction: The holy grail of bioinformatics

Qian et al. (Nature: 2007) used distributed computing* to predict the 3D structure of a protein from its amino-acid sequence. Here, their predicted structure (grey) of a protein is overlaid with the experimentally determined crystal structure (color) of that protein. The agreement between the two is excellent.

*70,000 home computers for about two years.

Page 95: Protein structure prediction: The holy grail of bioinformatics

Overall Approach

[Flowchart. Boxes: Protein Sequence; Database Searching; Multiple Sequence Alignment; Homologue in PDB? (Yes/No); Homology Modelling; Secondary Structure Prediction; Fold Recognition; Predicted Fold? (Yes/No); Sequence-Structure Alignment; Ab-initio Structure Prediction; 3-D Protein Model]
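The decision logic of the overall approach can be written down in a few lines. The arrows of the original flowchart did not survive in this transcript, so the ordering below is an assumption based on the two yes/no questions it contains (is there a homologue in PDB, and is a fold predicted by fold recognition); the function name and the returned labels are illustrative only.

```python
def choose_method(homologue_in_pdb, fold_predicted):
    """Pick a 3D prediction route from the two yes/no questions in the flowchart."""
    if homologue_in_pdb:
        # A homologue with known structure can serve as a template
        return "homology modelling"
    if fold_predicted:
        # No homologue, but fold recognition found a compatible known fold
        return "fold recognition with sequence-structure alignment"
    # Neither a homologue nor a recognizable fold
    return "ab initio structure prediction"


print(choose_method(True, False))
print(choose_method(False, True))
print(choose_method(False, False))
```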

Page 96: Protein structure prediction: The holy grail of bioinformatics

ExPASy Proteomics Server: Expert Protein Analysis System

links to lots of protein prediction resources

http://expasy.org/

Page 97: Protein structure prediction: The holy grail of bioinformatics

RMSD and RMSDmin

The root mean square deviation (RMSD) is the measure of the average distance between the backbones of superimposed proteins. In the study of globular protein conformations, one customarily measures the similarity in three-dimensional structure by the RMSD of the Cα atomic coordinates after optimal rigid body superposition.

A widely used way to compare the structures of biomolecules or solid bodies is to translate and rotate one structure with respect to the other to minimize the RMSD. This RMSDmin can be used as a distance measure between two proteins.
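A minimal sketch of the RMSD calculation for two equal-length lists of Cα coordinates. It assumes the structures have already been superimposed and that atom i in one list corresponds to atom i in the other; the optimal translation and rotation needed to obtain RMSDmin (commonly found with the Kabsch algorithm) is not implemented here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between matched (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy coordinates: the second structure is the first one shifted by 1 along z
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]

value = rmsd(structure_a, structure_b)
print(value)
```

In this toy case a pure translation would bring the two structures into exact agreement, so RMSDmin would be 0 even though the RMSD of the coordinates as given is 1.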