Folding and Misfolding Initiation Sites in Proteins
By Chris Bystroff
Rensselaer Polytechnic Institute, Troy, New York, USA

Post on 17-Jan-2018

Folding is 2-state
Unfolded ⇌ Folded
Intermediates are not observed, but folding pathways must exist: something happens first...
Early events eliminate alternative pathways.

Protein misfolding diseases
Alzheimer's disease
Creutzfeldt-Jakob Disease (CJD)*
Scrapie*
Kuru*
Huntington's Disease
Parkinson's Disease
Type-2 diabetes
Familial Amyloid Polyneuropathy (FAP)
*Prion-linked

Misfolding diseases are diseases of the folding pathway
Predicting protein structure using homology may be easier and faster, BUT you can't predict misfolding using homology. You CAN predict misfolding by modeling the folding pathway.

Early folding events might be recorded in the database
HDFPIEGGDSPMQTIFFWSNANAKLSHGY
    CPYDNIWMQTIFFNQSAAVYSVLHLIFLT
 IDMNPQGSIEMQTIFFGYAESA
ELSPVVNFLEEMQTIFFISGFTQTANSD
      INWGSMQTIFFEEWQLMNVMDKIPS
IFNESKKKGIAMQTIFFILSGR
        PPPMQTIFFVIVNYNESKHALWCSVD
     PWMWNLMQTIFFISQQVIEIPS
           MQTIFFVFSHDEQMKLKGLKGA
Short, recurrent sequence patterns could be folding initiation sites. Nature has selected for these patterns because they speed folding.

I-sites
Non-homologous proteins, recurrent part

Sequence alignment
VIVAANRSA
VIVSAARTA
VIASAVRTA
VIVDAGRSA
VIASGVRTA
VIVAAKRTA
VIVSAVRTP
VIVSAARTA
VIVSAVRTP
VIVDAGRTA
VIVSGARTP
VIVDFGRTP
VIVSATRTP
VIVGALRTP
VIVSATRTP
VIASAARTA
VIVDAIRTP
VIVAAYRTA
VIVSAARTP
VIVDAIRTP
VIVSAVRTA
VIVAAHRTA

Sequence profile
$$P_{ij} = \frac{\sum_{k \in \mathrm{seqs}} w_k \, \delta(s_{kj}, aa_i)}{\sum_{k} w_k}$$
Here w_k is the weight of sequence k, s_kj is the residue at position j of sequence k, aa_i is amino acid type i, and δ is 1 when its two arguments match and 0 otherwise.

Sequence Profiles
Red = high prob ratio (LLR > 1)
Green = background prob ratio (LLR ≈ 0)
Blue = low prob ratio (LLR < [...]
[...] 20 ns long)

HMMSTR
Arrangement of I-sites motifs in proteins is highly non-random.
Motifs shown: helix cap, beta strand, beta turn.
The dependencies can be modeled as a Markov chain.
Is there a motif grammar?
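As a rough illustration of the Markov chain idea above, the following sketch estimates first-order transition probabilities between motif labels by counting adjacent pairs. The three annotated paths are hypothetical and exist only to make the example runnable; the function name and data layout are likewise assumptions, not part of HMMSTR.

```python
from collections import Counter, defaultdict


def transition_probabilities(paths):
    """Estimate first-order Markov transition probabilities from label paths."""
    counts = defaultdict(Counter)
    for path in paths:
        # Count each (current motif -> next motif) pair along the chain.
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    probs = {}
    for state, following_counts in counts.items():
        total = sum(following_counts.values())
        probs[state] = {s: n / total for s, n in following_counts.items()}
    return probs


# Hypothetical motif annotations along three chains (illustrative data only).
paths = [
    ["beta strand", "beta turn", "beta strand", "helix cap"],
    ["helix cap", "beta strand", "beta turn", "beta strand"],
    ["beta strand", "beta turn", "beta strand", "beta turn"],
]
probs = transition_probabilities(paths)
for state, row in probs.items():
    print(state, row)
```

If motif order were random, each row would look like the overall motif frequencies; strongly skewed rows are what a "motif grammar" would look like in this simplified picture.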
Finding the connectivity of I-sites motifs
HMM topology: aligned profiles, aligned structures

HMMSTR: Hidden Markov Model for local protein STRucture
Markov state pathways represent local structure motifs.

Support Vector Machine based on HMMSTR
282-dimensional vector: probability of each HMMSTR state.
SVM-HMMSTR outperforms SVM-pairwise.
SCOP benchmark of 54 sequence families.

What can HMMSTR tell us about misfolding?
Human prion protein fragment (X-ray structure solved in 2002).
Can we predict misfolding, given the X-ray structure?

HMMSTR secondary structure prediction
Helix 3 is known to be the location of familial prion disease mutations.
Knaus et al., NSB 8:770-4, 2001

Green Fluorescent Protein
Beta-barrel, with a central helix.
Folds slowly (k = 0.02 s⁻¹).
Aggregates in the cell if overexpressed without chaperonin.
Contains a misfolding initiation site: CFSRYPDH
Proline helix C-cap
States: U = unfolded, M = misfolded, F = folded, A = aggregated
Figure labels: misfolded piece, correctly folded

Destabilizing the misfolding initiation site
CFSLYPDH
CFSRYPDH

HMMSTR-CM
HMM State Contact Potential
G(p, q, s) represents the free energy of a motif-motif contact.

Contact Maps
Definition: Target Contact Potential Map
E(i, j) is the contact potential between every two residues i and j.

E(i, j)
Both axes: sequence
Red: favorable contact
Blue: unfavorable
Figure labels: helices, strands

Features in a contact map can be interpreted as a TOPS diagram.
Figure labels: helices, strands
Which one is right?

True Contact Map
Figure labels: T0130, ab initio prediction, X, true contact map, amphipathic, non-polar, contact energies

What does structure prediction tell us about the physics of folding?
A. If we can predict protein structures, then we know how proteins fold.
B. If we know how proteins fold, then we can predict protein structures.
Check one:

Bystroff Lab: Yu Shao, Donna Crone, Xin Yuan, Rachel van Duyne, Kwang Kim
Funding from: NSF, HHMI, RPI
HMMSTR says: Think Globally, Act Locally.
At NUS: Yuna Hou, Mong-Li Lee, Wynne Hsu
HMMSTR: Chris Bystroff, Vesteinn Thorsson, David Baker

Prediction algorithms have underlying principles
Darwin = protein evolution. Principle: Proteins that evolved from a common ancestor have the same fold. Algorithm: Detect homology.
Boltzmann = protein folding. Principle: Proteins search conformational space, minimizing the free energy. Algorithm: Fold the protein.

Pathway based on nucleation sites and unfolding
Figure labels: more polar, more non-polar
Alternative (wrong) pathway. Given the assumptions, the pathway is almost unambiguous!!

HMMSTR-CM
A rule-based approach to fragment assembly.
Uses the contact map representation.
May be easily combined with templates.

Features in a contact map can be interpreted as a TOPS diagram.
Figure labels: helices, strands, alpha-alpha corner

What are the rules for folding?
Assumptions:
1. We know how proteins unfold.
2. Folding is the reverse of unfolding.

Unfolding a protein
If we break the protein along the weakest fault lines...
...the result is protein fragments of the size we can already predict accurately.

Simplified representation
GFP is an 11-stranded anti-parallel beta barrel. The core of the protein contains non-polar (blue) and polar sidechains (red).

Are late folding peptides more antigenic?
Experiment: Ab** raised against peptides for early folding loop (hairpin 5-6) and late-folding hairpin (10-11). Ab checked for binding to GFP.
Figure labels: late-folding peptide, early-folding peptide
**Kindly provided by Dennis Metzger, AMC.

If folding is the reverse of unfolding, what are the rules?
1. More local
2. Right-handed + non-polar pairing
3. Helix crowding

CASP5 Target T0157
This is how the procedure is carried out on the contact map.
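The slides define the motif-motif contact potential G(p, q, s) and the target contact potential map E(i, j) but do not spell out how one is computed from the other. A minimal sketch, assuming E(i, j) is the expectation of G over per-residue HMM state probabilities and that s is the sequence separation |i - j|, could look like this. The names `gamma` and `max_sep`, the capping of the separation, and all toy numbers are assumptions made for illustration.

```python
def contact_potential_map(gamma, G, max_sep):
    """Build a symmetric map E[i][j] of expected contact potentials.

    gamma[i][p] : probability of HMM state p at residue i (assumed input).
    G[(p, q, s)]: assumed potential for a contact between state p (at the
                  earlier residue) and state q (at the later residue) at
                  sequence separation s; missing entries count as 0.0.
    max_sep     : separations larger than this share one bin (assumption).
    """
    n = len(gamma)
    E = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = min(j - i, max_sep)
            e = 0.0
            # Weight each state-pair potential by the joint state probability.
            for p, prob_p in enumerate(gamma[i]):
                for q, prob_q in enumerate(gamma[j]):
                    e += prob_p * prob_q * G.get((p, q, s), 0.0)
            E[i][j] = E[j][i] = e
    return E


# Toy example: 3 residues, 2 hypothetical states (0 and 1).
gamma = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
G = {(0, 0, 1): -1.0, (0, 1, 1): 0.5, (0, 1, 2): 2.0, (1, 1, 1): -0.5}
E = contact_potential_map(gamma, G, max_sep=2)
for row in E:
    print(row)
```

In this toy convention negative values are favorable contacts and positive values unfavorable, which matches the red/blue reading of the contact maps only under the assumption that the potential is a free energy.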
CASP5 Target T0157

Scoring alignments
Figure labels: alignment, template contact map, target contact map, indices i and j, target, template

T0157
Template: 1hjrA

Initial efforts at testing a pathway model
Green Fluorescent Protein (GFP) folds slowly, and is known to misfold. We modeled the pathway by:
(1) unfolding GFP using the known structure (1EMC)
(2) predicting folding/misfolding initiation sites using I-sites
(3) predicting nucleation sites using contact maps

Local anti-parallel beta-sheet units in GFP.
Contact potential map, and true contacts.
Highest propensity for local structure forms first.
2nd highest.
These two come together.

Bystroff Lab: Yu Shao, Donna Crone, Xin Yuan, Rachel van Duyne, Paul Gervasio, Kwang Kim, Dan Ditursi, Rob Klees, Karunya Ramasamy
Funding from: NSF, HHMI, RPI
HMMSTR says: Think Globally, Act Locally.

HMMSTR: Variations on a motif theme are modeled as parallel paths.
Multiple state-pathways for the helix N-cap motif.

Anti-late folding peptide (P1) binds GFP.
Early folding peptide (P2) is not immunogenic (???!)
Results from last week.
Figure labels: P2, P1, P1+P2, GFP

HMMSTR: Common sub-graphs represent common sub-structures.
These peptide segments have the same state sequence (except where they branch).

HMMSTR 3-state secondary structure prediction
74.9% correct
74.6% correct

What does structure prediction tell us about the physics of folding?
A. If we can predict protein structures, then we know how proteins fold.
B. If we know how proteins fold, then we can predict protein structures.
Check one:

Rosetta
Figure labels: sequence, Psi-Blast, I-sites, I-sites motifs, moves, Fragment Insertion Monte Carlo in torsion angle space, Bayesian knowledge-based energy function, sequence splitter, sequence assembler, 3D coordinates

Fragment insertion Monte Carlo
Moveset: backbone torsion angles
Choose fragment from moveset
Change backbone angles
Convert angles to 3D coordinates
Energy function
Accept or reject

Backbone angles are restrained in I-sites regions
Generally, about one-third of the sequence has an I-sites prediction with confidence > 0.75, and is restrained. Fragments that deviate from the paradigm (> 90° in φ or ψ) are removed from the moveset.
Figure labels: moveset, backbone torsion angles, regions of high-confidence I-sites prediction

Energy function
The energy score for a contact between secondary structures is summed using database statistics.
Figure labels: sequence-dependent features, current structure, sequence-independent features, vector representation, probabilities from the database

CASP4 predictions
31 target sequences. Ab initio prediction, i.e. sequence homolog data was ignored if present.
61% topologically correct
60% locally correct
73% secondary structure correct

T[...] (61 residues), prediction vs. true structure: topologically correct (rmsd = 5.9) but helix is mis-predicted as loop.
T[...] (66 residues), prediction vs. true structure: topologically correct (rmsd = 5.9) but loop is mis-predicted as helix.
T[...] (97 residues), prediction vs. true structure: ...contains a 53 residue stretch with max deviation = 96.
T[...], prediction vs. true structure: low rmsd (5.6) and all angles correct (mda = 84), but topologically wrong! (this is rare)
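The fragment insertion Monte Carlo loop listed in the Rosetta slides above (choose a fragment, change backbone angles, score, accept or reject) can be sketched as follows. This is a toy under stated assumptions, not Rosetta itself: the slides only say "accept or reject", so the Metropolis criterion is an assumption; the conversion to 3D coordinates is skipped and the hypothetical `toy_energy` scores torsion angles directly against an invented target; the fragment library is likewise made up.

```python
import math
import random


def fragment_insertion_mc(angles, moveset, energy, steps, temperature, rng):
    """Toy fragment-insertion Monte Carlo in torsion-angle space."""
    current = list(angles)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        # Choose a fragment from the moveset: (start index, list of (phi, psi)).
        start, fragment = rng.choice(moveset)
        trial = list(current)
        trial[start:start + len(fragment)] = fragment  # change backbone angles
        delta = energy(trial) - e_current
        # Metropolis criterion (assumed; the slides only say "accept or reject").
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_current + delta
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


# Hypothetical idealized torsion angles (degrees) for two fragment types.
HELIX = (-60.0, -45.0)
STRAND = (-120.0, 130.0)
target = [HELIX] * 3 + [STRAND] * 3  # invented target, for illustration only


def toy_energy(angles):
    """Squared torsion deviation from the target (ignores angle periodicity)."""
    return sum((phi - t_phi) ** 2 + (psi - t_psi) ** 2
               for (phi, psi), (t_phi, t_psi) in zip(angles, target))


# Three-residue helix and strand fragments at every allowed start position.
moveset = [(start, [kind] * 3) for start in range(4) for kind in (HELIX, STRAND)]
start_angles = [(180.0, 180.0)] * 6
best, e_best = fragment_insertion_mc(
    start_angles, moveset, toy_energy,
    steps=2000, temperature=1.0, rng=random.Random(0))
print(e_best, best)
```

Restraining high-confidence I-sites regions, as described on the slides, would correspond here to removing from `moveset` any fragment that overlaps a restrained position and deviates from the predicted angles.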