![Page 1: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/1.jpg)
Protein Structure Prediction
![Page 2: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/2.jpg)
Protein Structure
Amino-acid chains can fold to form 3-dimensional structures
Proteins are sequences that have a (more or less) stable 3-dimensional configuration
![Page 3: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/3.jpg)
Why Is Structure Important?
The structure a protein takes is crucial for its function
  Forms “pockets” that can recognize an enzyme substrate
  Positions the side chains of specific groups together to form areas with desired chemical/electrical properties
  Creates firm structures such as collagen, keratins, fibroins
![Page 4: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/4.jpg)
Determining Structure
X-ray and NMR methods make it possible to determine the structure of proteins and protein complexes
These methods are expensive and difficult
  Processing a single protein can take several months of work
A centralized database (PDB) contains all solved protein structures
  XYZ coordinates of atoms, within a specified precision
  ~19,000 solved structures
![Page 5: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/5.jpg)
Growth of the Protein Data Bank
![Page 6: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/6.jpg)
Structure is Sequence Dependent
Experiments show that for many proteins, the 3-dimensional structure is a function of the sequence
Force the protein to lose its structure by introducing agents that change the environment
After the sequence is put back in water, the original conformation/activity is restored
However, for complex proteins, there are cellular processes that “help” in folding
![Page 7: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/7.jpg)
Amino Acids
![Page 8: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/8.jpg)
What Forces Hold the Structure?
Structure is supported by several types of chemical bonds/forces
Hydrogen Bonds
![Page 9: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/9.jpg)
What Forces Hold the Structure?
Charge-charge interactions
  Positively charged groups prefer to be situated against negatively charged groups
![Page 10: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/10.jpg)
What Forces Hold the Structure?
Disulfide bonds
  S-S bonds between cysteine residues
  These form during folding
![Page 11: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/11.jpg)
What Forces Hold the Structure?
Hydrophobic effect
![Page 12: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/12.jpg)
Levels of structure
![Page 13: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/13.jpg)
Secondary Structure
α-helix, β-strands
![Page 14: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/14.jpg)
Hydrogen Bonds in α-Helices
![Page 15: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/15.jpg)
β-Strands Form Sheets
Parallel, anti-parallel
These sheets are held together by hydrogen bonds across strands
![Page 16: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/16.jpg)
Angular Coordinates
Secondary structures force specific angles between residues
![Page 17: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/17.jpg)
Ramachandran Plot
We can relate angles to types of structures
![Page 18: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/18.jpg)
Labeling Secondary Structure
Using both hydrogen-bond patterns and angles, we can assign secondary-structure labels from the XYZ coordinates of the amino acids
These criteria do not lead to an absolute definition of secondary structure
![Page 19: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/19.jpg)
Prediction of Secondary Structure
Input: amino-acid sequence
Output: annotation sequence of three classes:
  alpha
  beta
  other (sometimes called coil/turn)
Measure of success: percentage of residues that were correctly labeled
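The success measure above can be sketched in a few lines. This is a minimal illustration, not a standard tool: the function name `q3_accuracy`, the one-letter labels (H = alpha, E = beta, C = other) and the two example label strings are all assumptions made for the example.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted class matches the observed class."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical labels for an 11-residue protein (H = alpha, E = beta, C = other).
observed = "CHHHHCCEEEC"
predicted = "CHHHCCCEEEE"
score = q3_accuracy(predicted, observed)  # 9 of 11 residues agree
print(f"{score:.1f}% of residues correctly labeled")
```

Because every residue counts equally, a predictor can score well on this measure while still missing a whole short helix or strand.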
![Page 20: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/20.jpg)
Protein Folds: sequential, spatial and topological arrangement of secondary structures
The Globin fold
![Page 21: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/21.jpg)
Approaches for structure prediction
Homology modeling (25-30% identity as a predictor)
Fold recognition
  Remote homology
Ab initio prediction
  Heavy computations
![Page 22: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/22.jpg)
Newly Determined Structures-Fraction of New Folds
![Page 23: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/23.jpg)
Fraction of new folds (PDB new entries in 1998)
Koppensteiner et al., 2000, JMB 296:1139-1152.
![Page 24: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/24.jpg)
A Finite Number of Protein Folds
Aim: recognize fold that “matches” a given sequence
Approaches:
  PSI-BLAST, profile HMMs, etc.
  Threading
![Page 25: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/25.jpg)
Threading: Essential components
• structural template
• neighbor definition
• energy function
[Figure: the sequence ACCECADAAC placed on a structural template with positions 1-10]
Pair energies $E_{ab}$:
     A   C   D   E  ...
A   -3  -1   0   0  ...
C   -1  -4   1   2  ...
D    0   1   5   6  ...
E    0   2   6   7  ...
...
Total energy: $E = \sum_{\text{positions } i,j} E_{a_i b_j}$, where the neighbor definition determines which pairs of positions i, j are summed
Example: ACCECADAAC: -3 - 1 - 4 - 4 - 1 - 4 - 3 - 3 = -23
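The three components can be put together in a short sketch. The pair energies and the sequence ACCECADAAC are taken from the slide; the neighbor list is hypothetical, since the slide's template figure did not survive extraction. It was chosen only so that the pair types (three A-A, three C-C, two A-C) match the terms of the slide's worked sum. The function name `threading_energy` is also an assumption.

```python
# Pair energies E_ab from the slide (only residues A, C, D, E are shown there).
RESIDUES = "ACDE"
ENERGY_ROWS = [
    [-3, -1, 0, 0],
    [-1, -4, 1, 2],
    [0, 1, 5, 6],
    [0, 2, 6, 7],
]
PAIR_ENERGY = {
    (a, b): ENERGY_ROWS[i][j]
    for i, a in enumerate(RESIDUES)
    for j, b in enumerate(RESIDUES)
}


def threading_energy(sequence, neighbors):
    """Sum E_ab over all neighboring template positions (i, j), 1-based."""
    return sum(
        PAIR_ENERGY[(sequence[i - 1], sequence[j - 1])] for i, j in neighbors
    )


sequence = "ACCECADAAC"
# HYPOTHETICAL neighbor definition: pairs of template positions in contact.
neighbors = [(1, 6), (6, 8), (8, 9), (2, 3), (3, 5), (5, 10), (1, 2), (9, 10)]
total = threading_energy(sequence, neighbors)
print(total)
```

Changing the template changes only the `neighbors` list; the same sequence and energy table are then re-scored against the new contacts.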
![Page 26: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/26.jpg)
Fold recognition (threading)
Find best fold for a protein sequence:
[Figure: the sequence MAHFPGFGQSLLFGYPVYVFGD... is threaded onto each potential fold 1) ... 56) ... n), giving scores -10 ... -123 ... 20.5]
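Once every candidate fold has a score, fold recognition reduces to picking the best one. The sketch below assumes that a lower (more negative) score is better, as for an energy; the fold names are placeholders and the three scores are the ones shown in the figure.

```python
def best_fold(scores):
    """Return the fold with the lowest score (assumes lower means better)."""
    if not scores:
        raise ValueError("no candidate folds were scored")
    return min(scores, key=scores.get)


# Scores from the figure; the keys are placeholder names for folds 1, 56 and n.
scores = {"fold_1": -10.0, "fold_56": -123.0, "fold_n": 20.5}
winner = best_fold(scores)
print(winner, scores[winner])
```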
![Page 27: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/27.jpg)
GenTHREADER (Jones, 1999, JMB 287:797-815)
For each template, provide an MSA
  align the query sequence with the MSA
  assess the alignment by sequence alignment score
  assess the alignment by pairwise potentials
  assess the alignment by solvation function
  record lengths of: alignment, query, template
![Page 28: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/28.jpg)
Essentials of GenTHREADER
![Page 29: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/29.jpg)
Ab-initio Structure Recognition
Goal: Predict structure from “first principles”
Benefits:
  Works for novel folds
  Shows that we understand the process
![Page 30: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/30.jpg)
Approaches to Ab-initio Prediction
Molecular Dynamics
  Simulates the forces that govern the protein in water
  Since proteins fold naturally, this would lead to the solved structure
Problems:
  Thousands of atoms
  Huge number of time steps to reach the folded protein
Intractable problem
![Page 31: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/31.jpg)
Approaches to Ab-initio Prediction
Minimal Energy
  Assumption: the folded form is the minimal-energy conformation of the protein
Decomposition:
  Define an energy function
  Search for the 3-D conformation that minimizes the energy
![Page 32: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/32.jpg)
Energy Function
Account for the forces that act on the molecule
  Van der Waals forces
  Covalent bonds
  Hydrogen bonds
  Charges
  Hydrophobic effects
Issues:
  Estimating parameters
  How do we compute it: O((# atoms)^2)
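The quadratic cost comes from visiting every pair of atoms. The sketch below shows this for a single term only, using a 12-6 Lennard-Jones potential as a stand-in for Van der Waals forces. The parameter values `epsilon` and `sigma`, the function names and the coordinates are illustrative assumptions; a real energy function would add the other terms listed above.

```python
import math
from itertools import combinations


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy; epsilon and sigma are placeholder values."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def pairwise_energy(coords):
    """Sum the pair term over all unordered atom pairs: n*(n-1)/2 evaluations."""
    total = 0.0
    n_pairs = 0
    for p, q in combinations(coords, 2):
        total += lennard_jones(math.dist(p, q))
        n_pairs += 1
    return total, n_pairs


# Four hypothetical atoms: 4 * 3 / 2 = 6 pair evaluations.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 1.5)]
energy, n_pairs = pairwise_energy(coords)
print(n_pairs, round(energy, 4))
```

Doubling the number of atoms roughly quadruples `n_pairs`, which is why simplified energy functions and distance cutoffs are attractive.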
![Page 33: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/33.jpg)
Simplified Energy Functions
Different levels of granularity
  Residue-residue energy function (bead model)
  Partial model
    Backbone as a bead
    Side chain as a rigid body that can move with respect to the backbone
  Many other variants
![Page 34: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/34.jpg)
Search Strategy
High dimensional search problem
How do we represent partial solutions?
  Position of each atom (too detailed!)
  Position of each residue (too coarse!)
  Intermediate solutions (e.g., backbone and side chain)
![Page 35: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/35.jpg)
Search Strategy
Representation tradeoffs
X, Y, Z coordinates
  Easy to compute distances between residues
  Might represent infeasible solutions
Angles between successive residues
  Easy to ensure a “legal” protein
  Harder to compute distances
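The tradeoff is easier to see on a simplified 2D chain. In the sketch below, a conformation is stored as turn angles, so every bond automatically has the same length (the chain is always "legal"), but a residue-residue distance is only available after the whole chain has been converted to coordinates. The planar geometry, the function name and the bond length of 3.8 Å (roughly the spacing of consecutive Cα atoms) are assumptions made for illustration.

```python
import math


def chain_from_angles(turn_angles_deg, bond_length=3.8):
    """Place residues in 2D; each angle is the turn relative to the previous bond."""
    coords = [(0.0, 0.0), (bond_length, 0.0)]  # first bond lies along the x axis
    heading = 0.0
    for turn in turn_angles_deg:
        heading += math.radians(turn)
        x, y = coords[-1]
        coords.append(
            (x + bond_length * math.cos(heading), y + bond_length * math.sin(heading))
        )
    return coords


# Hypothetical 6-residue chain described by 4 turn angles.
coords = chain_from_angles([30.0, -45.0, 60.0, 90.0])
bond_lengths = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
end_to_end = math.dist(coords[0], coords[-1])  # needs the full conversion first
print([round(b, 3) for b in bond_lengths], round(end_to_end, 3))
```

With raw X, Y, Z coordinates the situation is reversed: `math.dist` gives distances immediately, but nothing stops a search move from stretching a bond.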
![Page 36: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/36.jpg)
Search Strategy
Typical approach:
  Secondary structure prediction
  Attempts at different conformations, keeping secondary structure fixed
  Finer moves, relaxing secondary structure
Use
  Greedy search
  Simulated annealing
  …
![Page 37: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/37.jpg)
Rosetta Method
Idea:
  “Structural” signatures recur within protein structures
  Use these as cues during structure search
![Page 38: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/38.jpg)
Local structure motifs
Diverging type-2 turn
Serine hairpin
Type-I hairpin
Frayed helix
Proline helix C-cap
Alpha-alpha corner
Glycine helix N-cap
I-sites Library = a catalog of local sequence-structure correlations
![Page 39: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/39.jpg)
Example: Non-polar Alpha-helix
![Page 40: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/40.jpg)
Example: Non-polar beta-strand
![Page 41: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/41.jpg)
Example: Gly alpha-C-cap Type 1
![Page 42: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/42.jpg)
Construction of I-sites library
Construct profiles (PSI-BLAST like) for each solved structure
Collect all possible segments of fixed length (len = 3, 9, 15)
Perform k-means clustering of segments
Check each cluster for a “coherent” structure (in terms of dihedral angles)
Prune incoherent structures
Iteratively refine remaining clusters by removing structurally different segments, redefining cluster membership, etc.
![Page 43: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/43.jpg)
All proteins can be constructed from fragments
Recent experiment:
For representative proteins, backbones were assembled from a library of 1000 different 5-residue fragments.
![Page 44: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/44.jpg)
Rosetta: a folding simulation program
Fragment insertion Monte Carlo
[Figure: cycle of steps: choose a fragment (from the fragment library) → change backbone torsion angles → convert to 3D → evaluate with the energy function → accept or reject]
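The cycle can be sketched as a small Monte Carlo loop. This is a toy under stated assumptions, not Rosetta's implementation: each residue is reduced to a single torsion angle, the fragment library is three made-up fragments, the "convert to 3D" step is skipped because the stand-in energy is computed directly on the angles, and the Metropolis acceptance rule with a fixed temperature is a common choice assumed here rather than taken from the slides.

```python
import math
import random


def toy_energy(angles):
    """Stand-in score: squared distance of each angle from -60 degrees."""
    return sum((a + 60.0) ** 2 for a in angles)


def fragment_insertion_mc(n_residues, fragments, energy,
                          steps=2000, temperature=50.0, seed=0):
    """Repeatedly insert library fragments; return the best conformation seen."""
    rng = random.Random(seed)
    angles = [180.0] * n_residues  # start from an extended chain
    current = energy(angles)
    best_angles, best_energy = list(angles), current
    for _ in range(steps):
        frag = rng.choice(fragments)                       # choose a fragment
        start = rng.randrange(n_residues - len(frag) + 1)  # choose where it goes
        trial = angles[:start] + list(frag) + angles[start + len(frag):]
        candidate = energy(trial)                          # evaluate
        delta = candidate - current
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            angles, current = trial, candidate
            if current < best_energy:
                best_angles, best_energy = list(angles), current
    return best_angles, best_energy


# HYPOTHETICAL 3-residue fragments (one torsion angle per residue).
fragments = [(-60.0, -60.0, -60.0), (-120.0, -120.0, -120.0), (-60.0, -90.0, 60.0)]
best_angles, best_energy = fragment_insertion_mc(12, fragments, toy_energy)
print(best_energy)
```

Accepting some uphill moves lets the search leave local minima, while tracking the best conformation seen guarantees the result is never worse than the starting chain.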
![Page 45: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/45.jpg)
Rosetta’s energy function
Sequence-dependent features
Residue-residue contact energies are derived from the database
![Page 46: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/46.jpg)
Rosetta’s energy function
Sequence-independent features
The energy score for a contact between secondary structures is summed using database statistics.
[Figure labels: current structure; vector representation; probabilities from the database]
![Page 47: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/47.jpg)
Rosetta prediction results
61% “topologically correct”
60% “locally correct”
73% secondary structure (Q3) correct
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
![Page 48: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/48.jpg)
Evaluation of partially correct predictions
Tertiary structure %correct is the fraction of the sequence that is in a 30-residue window with RMSD < 6.0Å
[Figure: tertiary structure panel, RMSD along the sequence for window sizes L=8, L=20 and L=30, with the 6.0Å cutoff marked]
mda = maximum deviation in backbone angles over an 8-residue window.
Local structure %correct is the fraction of the sequence that has mda < 90°.
[Figure: local structure panel, MDA along the sequence (L = window size), with the 90° cutoff marked]
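The local-structure measure can be sketched as follows. The interpretation is an assumption, made by analogy with the tertiary-structure definition: a residue counts as correct if it lies in at least one 8-residue window whose mda is below 90°. The function names and the example angles are hypothetical, and each residue is described by a (phi, psi) pair in degrees.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def local_percent_correct(pred, true, window=8, cutoff=90.0):
    """Percentage of residues covered by a window with mda below the cutoff."""
    n = len(true)
    if len(pred) != n or n < window:
        raise ValueError("need equal-length angle lists of at least one window")
    # Per-residue deviation: the worse of the phi error and the psi error.
    dev = [
        max(angle_diff(p[0], t[0]), angle_diff(p[1], t[1]))
        for p, t in zip(pred, true)
    ]
    covered = [False] * n
    for start in range(n - window + 1):
        mda = max(dev[start:start + window])  # maximum deviation in this window
        if mda < cutoff:
            for k in range(start, start + window):
                covered[k] = True
    return 100.0 * sum(covered) / n


# Hypothetical 10-residue example: only the last residue is badly predicted.
true_angles = [(-60.0, -45.0)] * 10
pred_angles = [(-60.0, -45.0)] * 9 + [(120.0, -45.0)]
percent = local_percent_correct(pred_angles, true_angles)
print(percent)
```

A single badly predicted residue spoils every window that contains it, which is how a long stretch can fail the 90° test because of one large deviation.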
![Page 49: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/49.jpg)
T0116 262-322 (61 residues)
prediction true structure
Topologically correct (rmsd=5.9Å) but helix is mis-predicted as loop.
![Page 50: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/50.jpg)
T0121 126-199 (66 residues)
prediction true structure
Topologically correct (rmsd=5.9Å) but loop is mis-predicted as helix.
![Page 51: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/51.jpg)
T0122 57-153 (97 residues)
...contains a 53 residue stretch with max deviation = 96°
prediction true structure
![Page 52: Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more](https://reader033.vdocuments.us/reader033/viewer/2022051821/5697bfc51a28abf838ca6d3d/html5/thumbnails/52.jpg)
T0112 153-213
Low rmsd (5.6Å) and all angles correct (mda = 84°), but topologically wrong!! (this is rare)
prediction true structure