Protein structure prediction, Haixu Tang, School of Informatics


Upload: janel-pierce

Post on 18-Jan-2018



TRANSCRIPT

Basic operations in a cell (Central Dogma)
A gene is expressed in two steps:
1) Transcription: RNA synthesis
2) Translation: Protein synthesis

Proteins
Proteins are the major functional biomolecules in cells.

Examples of protein functions
- Catalysis: Almost all chemical reactions in a living cell are catalyzed by protein enzymes. Alcohol dehydrogenase oxidizes alcohols to aldehydes or ketones.
- Transport: Some proteins transport various substances, such as oxygen and ions. Haemoglobin carries oxygen.
- Information transfer: For example, hormones. Insulin controls the amount of sugar in the blood.

Proteins are composed of amino acids
[Figure: an amino acid, with a central carbon (C) bonded to an amino group (NH3+), a carboxylic acid group (COO-), a hydrogen (H), and a side chain (R).]
Different side chains, R, determine the chemical properties of the 20 amino acids.

20 amino acids
Glycine (G), Glutamic acid (E), Aspartic acid (D), Methionine (M), Threonine (T), Serine (S), Glutamine (Q), Asparagine (N), Tryptophan (W), Phenylalanine (F), Cysteine (C), Proline (P), Leucine (L), Isoleucine (I), Valine (V), Alanine (A), Histidine (H), Lysine (K), Tyrosine (Y), Arginine (R)
[Figure colour legend: White: Hydrophobic, Green: Hydrophilic, Red: Acidic, Blue: Basic]

Proteins are linear polymers of amino acids
[Figure: amino acids with side chains R1, R2, R3 joined by peptide bonds; forming each peptide bond releases one H2O.]
The amino acid sequence is called the primary structure, for example AAFNGGSTSDK.

Each protein has a unique structure
Amino acid sequence:
NLKTEWPELVGKSVEE
AKKVILQDKPEAQIIVL
PVGTIVTMEYRIDRVR
LFVDKLDNIAEVPRVG
[Figure: the sequence folds into a three-dimensional structure.]

Protein Structure Determination
X-ray crystallography
- Most accurate
- In vitro
- Needs crystallized proteins
- ~100K per structure
Nuclear Magnetic Resonance
- Fairly accurate
- In vivo, in solution
- No need for crystals
- Limited to small proteins

Protein Data Bank
PDB files: atom coordinates, etc. (1atn: actin/DNAse I complex)
ATOM 1 CA ACE A ATN 263
ATOM 2 C ACE A ATN 264
ATOM 3 O ACE A ATN 265
ATOM 4 N ASP A ATN 266
ATOM 5 CA ASP A ATN 267
ATOM 6 C ASP A ATN 268
ATOM 7 O ASP A ATN 269
ATOM 8 CB ASP A ATN 270

Visualizing protein structure (PDB files)

Basic structural units of proteins: Secondary structure
- α-helix
- β-sheet
Secondary structures, α-helix and β-sheet, have regular hydrogen-bonding patterns.

Three-dimensional structure of proteins
- Tertiary structure
- Quaternary structure

Hierarchical nature of protein structure
- Primary structure: the amino acid sequence
- Secondary structure: α-helix, β-sheet
- Tertiary structure: the three-dimensional structure formed by assembly of secondary structures
- Quaternary structure: the structure formed by more than one polypeptide chain

Secondary Structure Prediction
Given a protein sequence, secondary structure prediction aims to predict the state of each amino acid as H (helix), E (extended = strand), or O (other).
The quality of a secondary structure prediction is measured with a 3-state accuracy score, Q3. Q3 is the percentage of residues whose predicted state matches reality (the X-ray structure).

Early methods for Secondary Structure Prediction
- Chou and Fasman (Chou and Fasman. Prediction of protein conformation. Biochemistry, 13, 1974)
- GOR (Garnier, Osguthorpe and Robson. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol., 120, 1978)

Chou and Fasman
[Table: columns Amino Acid, α-Helix, β-Sheet, Turn; rows grouped as favors α-helix, favors β-strand, favors turn. Amino acids in row order: Ala, Cys, Leu, Met, Glu, Gln, His, Lys, Val, Ile, Phe, Tyr, Trp, Thr, Gly, Ser, Asp, Asn, Pro, Arg.]

The GOR method
For each position j in the sequence, eight residues on either side are considered.

Accuracy
Both Chou and Fasman and GOR have been assessed, and their accuracy is estimated to be Q3 = 60-65%.

Neural networks
The most successful methods for predicting secondary structure are based on neural networks.
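
The Q3 score defined under Secondary Structure Prediction applies to any of these predictors, from Chou and Fasman to neural networks. The following is a minimal sketch of that calculation, assuming the predicted and observed states are given as equal-length strings over H, E and O. The function name q3_score and the example strings are hypothetical and are not taken from the slides.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Percent of residues whose predicted state (H, E or O) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example: the states agree at 8 of the 10 positions.
observed = "HHHHOOEEEE"
predicted = "HHHOOOEEEO"
print(q3_score(predicted, observed))  # 80.0
```
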
The overall idea is that neural networks can be trained to recognize amino acid patterns in known secondary structure units, and to use these patterns to distinguish between the different types of secondary structure.
Neural networks classify input vectors, or examples, into two or more categories.

Protein 3D Structure Prediction
- In theory, a protein structure can be predicted computationally.
- A protein folds into the 3D structure that minimizes its free energy.
- The problem can be formulated as a search for the minimum-energy structure:
  - the search space is enormous, even for small proteins;
  - the number of local minima increases exponentially with the size of the protein.

Computational Methods for Protein 3D Structure Prediction
Comparative modeling
- Protein threading: structure prediction through identification of a good sequence-structure fit.
- Homology modeling: identification of homologous proteins through sequence alignment; structure prediction by placing residues into the corresponding positions of homologous structure models.

Protein Threading
Find the correct sequence-structure alignment between a target sequence and its native-like fold in the PDB.
Energy function: knowledge (or statistics) based rather than physics based.
- Should be able to distinguish correct structural folds from incorrect structural folds.
- Should be able to distinguish correct sequence-fold alignments from incorrect sequence-fold alignments.

Protein Threading
- Structure database
- Fitness function
- Sequence-structure alignment algorithm
- Prediction reliability assessment

Protein Threading: structure database
Build a template database.

Protein Threading: fitness function
Example target sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
- E_s: how well a residue fits a structural environment
- E_p: how preferable it is to put two particular residues near each other
- E_g: alignment gap penalty
Find a sequence-structure alignment that minimizes the energy function.

Protein Threading (sequence-structure alignment)
Unlike a sequence-sequence alignment, where amino acids are aligned with amino acids, a sequence-structure alignment aligns amino acids with structural environments.
A simple definition of structural environment:
- secondary structure: alpha-helix, beta-strand, loop
- solvent accessibility: 0, 10, 20, ..., 100% accessibility
- each combination of secondary structure and solvent accessibility level defines a structural environment, e.g., (alpha-helix, 30%), (loop, 80%), ...

Protein Threading: algorithm
The threading algorithm finds the sequence-structure alignment with the minimum value of the fitness function.
[Figure: links between sequence positions and fold positions.]

CASP
CASP = Critical Assessment of Structure Prediction
- First held in 1994, then every 2 years
- Teams make structure predictions from sequences alone

CASP
Two categories of predictors
- Automated: automatic servers, which must complete their analysis within 48 hours. Shows what is possible through computer analysis alone.
- Non-automated: groups spend considerable time and effort on each target, combining computer techniques with human analysis.

CASP
CASP6, 2004
- 200 prediction teams from 24 countries
- Over 30,000 predictions for 64 protein targets collected and evaluated
- A conference is held afterwards to discuss results, with many teams presenting their individual results and methodologies
- Helps to steer future work
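
Returning to the threading fitness function described earlier, the sketch below shows one way a sequence-structure alignment could be scored. It is a toy illustration under stated assumptions, not the algorithm from the slides: it uses a standard global-alignment dynamic program, keeps only the singleton term E_s and the gap penalty E_g, and omits the pairwise term E_p entirely. The hydrophobic residue set, the energy values, the 30% burial cutoff, and the names e_s and thread are all hypothetical.

```python
# Assumption: a crude set of hydrophobic residues, chosen only for this toy score.
HYDROPHOBIC = set("AVLIMFWC")


def e_s(residue, env):
    """Toy singleton energy: favourable (-1) when hydrophobicity matches burial."""
    _secondary, accessibility = env
    buried = accessibility < 30  # assumption: below 30% accessibility counts as buried
    return -1.0 if (residue in HYDROPHOBIC) == buried else 1.0


def thread(sequence, template, e_g=1.5):
    """Minimum of E_s + E_g over global alignments of residues to environments."""
    n, m = len(sequence), len(template)
    # energy[i][j]: minimum energy aligning sequence[:i] with template[:j]
    energy = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        energy[i][0] = i * e_g  # leading residues left unaligned
    for j in range(1, m + 1):
        energy[0][j] = j * e_g  # leading template positions left unaligned
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            energy[i][j] = min(
                energy[i - 1][j - 1] + e_s(sequence[i - 1], template[j - 1]),
                energy[i - 1][j] + e_g,  # gap: residue i not placed in the template
                energy[i][j - 1] + e_g,  # gap: template position j left empty
            )
    return energy[n][m]


# Hypothetical template: each position is (secondary structure, % solvent accessibility).
template = [("alpha-helix", 10), ("alpha-helix", 80), ("loop", 80)]
print(thread("LKT", template))
print(thread("KLT", template))
```

A lower energy indicates a better fit, so candidate templates (or candidate sequences for one template) can be ranked by the returned value. Including E_p would make the score depend on pairs of aligned positions, which this simple recurrence cannot express.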