protein structure prediction - indiana university...
![Page 1: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/1.jpg)
1
Ram Samudrala, University of Washington
Protein Structure Prediction
2
Rationale for Understanding Protein Structure and Function
Protein sequence
- large numbers of sequences, including whole genomes
Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution
?
structure determination, structure prediction
homology, rational mutagenesis, biochemical analysis, model studies
Protein structure
- three dimensional
- complicated
- mediates function
4
Protein Folding
DNA: …-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GUU-… (each codon specifies one amino acid)
protein sequence: …-L-K-E-G-V-S-K-D-…
unfolded protein: not unique, mobile, inactive, expanded, irregular
spontaneous self-organization (~1 second)
native state: unique shape, precisely ordered, stable/functional, globular/compact, helices and sheets
5
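Protein Folding Landscape
Large multi-dimensional space of changing conformations
[Figure: free energy plotted against the folding reaction, running from the unfolded state through the molten globule (J = 10^-8 s) to the native state (J = 10^-3 s), with a barrier of height ΔG*]
The jump time (J) depends exponentially on the barrier height ΔG* relative to RT.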
6
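Protein Primary Structure
twenty types of amino acids
[Figure: chemical structure of an amino acid, with a central Cα carrying a hydrogen, an amino group, a carboxyl group and a side chain R]
two amino acids join by forming a peptide bond
[Figure: polypeptide chain with the main-chain torsion angles φ and ψ and the side-chain torsion angles χ marked on each residue]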
each residue in the amino acid main chain has two degrees of freedom (φ and ψ)
the amino acid side chains can have up to four degrees of freedom (χ1-4)
7
Protein Secondary Structure
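[Figure: Ramachandran plot with φ and ψ axes each running from -180 to +180, with the β, α and L regions marked]
many φ,ψ combinations are not possible
[Figures: α helix; β sheet (anti-parallel) and β sheet (parallel), with the N and C termini labeled]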
8
Protein Tertiary and Quaternary Structures
Ribonuclease inhibitor (2bnh) Haemoglobin (1hbh)
Hemagglutinin (1hgd)
9
Methods for Determining Protein Structure
Protein sequence
- large numbers of sequences, including whole genomes
Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution
?
X-ray crystallography, NMR spectroscopy (expensive and slow)
homology, rational mutagenesis, biochemical analysis, model studies
Protein structure
- three dimensional
- complicated
- mediates function
10
A Naïve Approach
• Use first principles to produce the native conformation of a protein
– not only the correct structure, but the entire energy landscape
– it would explain the dynamic behavior of a protein
Let’s see how this could work…
• there are only 5 atom types (C, H, O, N, S), so if we could accurately model the interactions between them, we could solve the folding problem
So why is it so complicated…
• atomic interactions cannot be modeled with sufficient accuracy (and proteins are only marginally stable)
• some phenomena are highly non-linear (for example, van der Waals forces)
• large number of degrees of freedom, plus the need to model water molecules
ab initio !!!
11
Predictions Needed NOW!!!
• A pure ab initio approach will remain out of reach for a long time
• We must adopt a less purist approach
What should we do?
• use approximations
• use all available information
– vast number of sequences
– large number of structures
– functional site information
12
Methods for Predicting Protein Structure
Protein sequence
- large numbers of sequences, including whole genomes
Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution
?
comparative modeling, fold recognition, ab initio prediction
homology, rational mutagenesis, biochemical analysis, model studies
Protein structure
- three dimensional
- complicated
- mediates function
13
Overall Approach
Protein Sequence → Database Searching, Multiple Sequence Alignment, Domain Assignment, Secondary Structure and Disorder Prediction
Homologue in PDB?
- Yes: Comparative Modelling → 3-D Protein Model
- No: Fold Recognition
Predicted Fold?
- Yes: Sequence-Structure Alignment → 3-D Protein Model
- No: Ab-initio Structure Prediction
modified from http://bioinf.cs.ucl.ac.uk
14
Comparative (Homology) Modeling of Protein Structure
• Aims to produce protein models with high accuracy
• Proteins that have similar sequences (i.e., related by evolution) have similar three-dimensional structures
• A model of a protein whose structure is not known can be constructed if the structure of a related protein has been determined by experimental methods
• Similarity must be obvious and significant for good models to be built
• Need ways to build regions that are not similar between the two related proteins
• Need ways to move model closer to the native structure
15
Comparative Modeling of Protein Structure
KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **
scan, align
build initial model
construct non-conserved side chains and main chains
refine
16
Let’s Look Closer at Steps of Homology Modeling
1. Template recognition and initial alignment
2. Alignment correction
3. Backbone generation
4. Loop modeling
5. Side-chain modeling
6. Model optimization
7. Model validation
19
1. Template Recognition
Recognition of similarity between the target and template
Target: protein with unknown structure.
Template: protein with known structure.
Main difficulty: deciding which template to pick when multiple template structures are available.
A template structure can be found by searching the PDB using sequence-sequence alignment methods.
20
Two Zones of Sequence Alignment
[Figure: sequence identity (axis ticks at 50 and 100) plotted against alignment length (axis ticks at 50, 100, 150 and 200), divided into a safe homology modeling zone and a twilight zone]
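The two quantities on the axes of this figure can be read directly off a pairwise alignment. The sketch below is a minimal illustration, assuming that alignment length means the number of columns without a gap and that identity is measured over those columns (other conventions exist); the function name `alignment_stats` is hypothetical. It uses the two aligned sequences from the comparative modeling slide.

```python
def alignment_stats(seq_a, seq_b, gap="-"):
    """Return (aligned columns, identical columns, percent identity)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0    # columns where neither sequence has a gap
    identical = 0  # aligned columns holding the same residue
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1
    percent = 100.0 * identical / aligned if aligned else 0.0
    return aligned, identical, percent


# The two aligned sequences shown on the comparative modeling slide.
seq_a = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
seq_b = "KDPPAGIGAPQDN----QNIMLWNAVIP"
aligned, identical, percent = alignment_stats(seq_a, seq_b)
print(f"{identical} identical residues over {aligned} aligned positions "
      f"({percent:.1f}% identity)")
```

Where such a point falls relative to the two zones depends on the curve drawn in the figure, which is not reproduced here.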
21
3. Backbone Generation
1. Once the alignment between target and template is ready, copy the backbone coordinates of those template residues that are aligned.
2. If two aligned residues are the same, copy their side chain coordinates as well.
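The two copying rules can be sketched in a few lines. This is only an illustration under stated assumptions: residues are represented as plain dictionaries of atom name to coordinates, the backbone is taken to be the N, CA, C and O atoms, and the function name `build_initial_model` and the toy coordinates are hypothetical.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")  # assumed definition of "backbone"


def build_initial_model(target_aln, template_aln, template_residues):
    """Copy template coordinates onto the target along a pairwise alignment.

    template_residues holds one {atom_name: (x, y, z)} dict per template
    residue, in sequence order (gap characters excluded).
    """
    model = []
    t_index = 0  # position in the ungapped template
    for tgt, tpl in zip(target_aln, template_aln):
        if tpl == "-":
            if tgt != "-":
                # Insertion in the target: nothing to copy, left for loop modeling.
                model.append({"residue": tgt, "atoms": {}})
            continue
        atoms = template_residues[t_index]
        t_index += 1
        if tgt == "-":
            continue  # deletion: this template residue has no target partner
        if tgt == tpl:
            copied = dict(atoms)  # identical residues: backbone and side chain
        else:
            copied = {n: xyz for n, xyz in atoms.items() if n in BACKBONE_ATOMS}
        model.append({"residue": tgt, "atoms": copied})
    return model


# Toy data (made-up coordinates, same for every template residue).
toy_atoms = {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
             "O": (1.3, 2.4, 0.0), "CB": (2.1, -1.2, 0.0)}
template_residues = [dict(toy_atoms) for _ in range(3)]
model = build_initial_model("ALGS", "AL-T", template_residues)
for entry in model:
    print(entry["residue"], sorted(entry["atoms"]))
```

Residues with an empty atom set are the ones that the later loop and side-chain modeling steps have to fill in.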
22
4. Loop Modeling
insertion: AHYATPTTT
deletion:  AH---TPSS
Insertions and deletions occur mostly between secondary structure elements, in the loop regions. Loop conformations are difficult to predict.
Approaches to loop modeling:
- knowledge-based: searches the PDB for loops with known structure
- energy-based: an energy function is used to evaluate the quality of a loop (energy minimization or Monte Carlo)
23
4. Loop Modeling – Database Approach
Scan the database and search for protein fragments with the correct number of residues and the correct end-to-end distances
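A minimal sketch of such a database scan follows. It assumes each fragment is stored as a list of Cα coordinates, that the end-to-end distance is measured between the first and last Cα, and that a match within a tolerance (here 1.0 Å, an arbitrary choice) is acceptable; the fragment names and coordinates are made up.

```python
import math


def end_to_end_distance(coords):
    """Distance between the first and last point of a fragment."""
    return math.dist(coords[0], coords[-1])


def find_loop_candidates(fragments, n_residues, anchor_distance, tolerance=1.0):
    """Return fragment names with the right length and end-to-end distance,
    best match first."""
    hits = []
    for name, coords in fragments.items():
        if len(coords) != n_residues:
            continue  # wrong number of residues
        deviation = abs(end_to_end_distance(coords) - anchor_distance)
        if deviation <= tolerance:
            hits.append((deviation, name))
    return [name for deviation, name in sorted(hits)]


# Hypothetical fragment library (coordinates in Å).
fragments = {
    "frag_a": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (9.0, 0.0, 0.0)],
    "frag_b": [(0.0, 0.0, 0.0), (3.0, 2.0, 0.0), (5.0, 3.0, 0.0), (6.0, 0.0, 0.0)],
    "frag_c": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (6.0, 0.0, 0.0)],
}
candidates = find_loop_candidates(fragments, n_residues=4, anchor_distance=6.5)
print(candidates)
```

A real search would also check that the fragment fits the anchor residues in orientation and does not clash with the rest of the model; the sketch covers only the two criteria named on the slide.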
24
5. Side-Chain Modeling
Side chain conformations are called rotamers. In similar proteins, side chains have similar conformations.
If % identity is high, side chain conformations can be copied from template to target. If % identity is not very high, side chains are modeled using libraries of rotamers, and the different rotamers are scored with energy functions.
Problem: side chain conformations depend on the backbone conformation, which is predicted, not real.
For candidate rotamers with energies E1, E2 and E3, the selected energy is E = min (E1, E2, E3).
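The selection rule E = min (E1, E2, E3) amounts to scoring every library rotamer and keeping the lowest-energy one. The sketch below assumes a single χ1 angle per rotamer and a made-up energy table standing in for a real energy function; `pick_rotamer` and the numbers are hypothetical.

```python
def pick_rotamer(rotamers, energy):
    """Score every candidate rotamer and return (best_rotamer, lowest_energy)."""
    scored = [(energy(rotamer), rotamer) for rotamer in rotamers]
    lowest_energy, best = min(scored, key=lambda pair: pair[0])
    return best, lowest_energy


# Three candidate chi1 angles in degrees and toy energies (illustrative only).
chi1_rotamers = [-60.0, 60.0, 180.0]
toy_energies = {-60.0: 1.2, 60.0: 3.5, 180.0: 0.4}

best, e_min = pick_rotamer(chi1_rotamers, toy_energies.get)
print(best, e_min)
```

Because the energy of one side chain depends on its neighbors and on the backbone, a real procedure would repeat this choice over all residues rather than treat each one in isolation.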
25
6. Model Optimization
• Energy optimization of the entire structure.
• Since the conformation of the backbone depends on the conformations of the side chains and vice versa, an iterative approach is used: predict rotamers, shift the backbone, and repeat.
26
6. Model Optimization???
CASP5 assessors, homology modeling category:
“We are forced to draw the disappointing conclusion that, similarly to what observed in previous editions of the experiment, no model resulted to be closer to the target structure than the template to any significant extent.”
The consensus is not to refine the model, as refinement usually pulls the model away from the native structure!!
28
Historical Perspective on Comparative Modeling

| | BC | CASP1 |
|---|---|---|
| alignment | excellent | poor |
| side chain | ~ 80% | ~ 50% |
| short loops | 1.0 Å | ~ 3.0 Å |
| longer loops | 2.0 Å | > 5.0 Å |
29
Prediction for CASP4 target T128/sodm
Cα RMSD of 1.0 Å for 198 residues (PID 50%)
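The Cα RMSD quoted for each CASP4 prediction is the root-mean-square distance between matching Cα atoms of the model and the experimental structure. The sketch below assumes the two structures have already been optimally superimposed and are given as equal-length coordinate lists; it does not perform the superposition itself, and the coordinates are made up.

```python
import math


def ca_rmsd(model_ca, native_ca):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumes the structures are already superimposed.
    """
    if len(model_ca) != len(native_ca) or not model_ca:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for p, q in zip(model_ca, native_ca):
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / len(model_ca))


# Toy example: every model atom is displaced by 1 Å along y.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x, y + 1.0, z) for x, y, z in native]
print(ca_rmsd(model, native))
```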
30
Prediction for CASP4 target T122/trpa
Cα RMSD of 2.9 Å for 241 residues (PID 33%)
31
Prediction for CASP4 target T125/sp18
Cα RMSD of 4.4 Å for 137 residues (PID 24%)
32
Prediction for CASP4 target T112/dhso
Cα RMSD of 4.9 Å for 348 residues (PID 24%)
33
Prediction for CASP4 target T92/yeco
Cα RMSD of 5.6 Å for 104 residues (PID 12%)
34
Comparative Modeling at CASP - conclusions
CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity
- T128/sodm: 1.0 Å (198 residues; 50%)
- T111/eno: 1.7 Å (430 residues; 51%)
- T122/trpa: 2.9 Å (241 residues; 33%)
- T125/sp18: 4.4 Å (137 residues; 24%)
- T112/dhso: 4.9 Å (348 residues; 24%)
- T92/yeco: 5.6 Å (104 residues; 12%)

| | BC | CASP1 | CASP2 | CASP3 | CASP4 |
|---|---|---|---|---|---|
| alignment | excellent | poor | fair | fair | fair |
| side chain | ~ 80% | ~ 50% | ~ 75% | ~ 75% | ~ 75% |
| short loops | 1.0 Å | ~ 3.0 Å | ~ 1.0 Å | ~ 1.0 Å | ~ 1.0 Å |
| longer loops | 2.0 Å | > 5.0 Å | ~ 3.0 Å | ~ 2.5 Å | ~ 2.0 Å |
35
Structural Genomics Project
• Aim to solve the structure of all proteins: this is too much work experimentally!
• Solve enough structures so that the remaining structures can be inferred from those experimental structures
• The number of experimental structures needed depends on our ability to generate a model.
36
Proteins with known structures
Unknown proteins
Structural Genomics Project
37
Fold Recognition
• Goal: to find the protein with known structure that best matches a given sequence
• Since the similarity between the target and its closest template is not high, sequence-sequence alignment methods fail
• Solution: threading, a sequence-structure alignment method
38
Fold Recognition
• The number of possible protein structures/folds is limited (large number of sequences but few folds)
• Proteins that do not have similar sequences sometimes have similar three-dimensional structures
• A sequence whose structure is not known is fitted directly (or “threaded”) onto a known structure and the “goodness of fit” is evaluated using a discriminatory function
• Need ways to move model closer to the native structure
Example: NK-lysin (1nkl) and Bacteriocin T102/as48 (1e68): 3.6 Å, 5% ID
39
Fold Recognition
KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **
evaluate fit
build initial model
construct non-conserved side chains and main chains
refine
40
Steps in Threading
• Step 1: Construction of Template Library
• Step 2: Design of Scoring Function
• Step 3: Sequence-Structure Alignment
• Step 4: Template Selection and Model Construction
Only step 1 is relatively easy!
41
Target Sequence
Template: α & β structure from template structure
Steps in Threading
42
Threading – Method for Structure Prediction
• Sequence-structure alignment
– target sequence is compared to all structural templates from the database
Requires:
• Alignment method
– dynamic programming, Monte Carlo, …
• Scoring function
– yields relative score for each alternative alignment
43
Template Database
A representative set of protein structures extracted from the PDB database. It satisfies the following conditions:
1. The resolution of each representative structure should be good;
2. A good X-ray structure has higher priority than an NMR structure;
3. The sequence identity between any two representatives should be no more than 30%, in order to save computing time.
Examples:
• CATH: http://www.biochem.ucl.ac.uk/bsm/cath/
• SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
• PDB_SELECT: http://www.cmbi.kun.nl/gv/pdbsel/
44
Scoring Function for Threading
• A contact-based scoring function depends on the amino acid types of two residues and the distance between them.
• A sequence-sequence alignment scoring function does not depend on the distance between two residues.
• If the distance between two non-adjacent residues in the template is less than 8 Å, these residues make a contact.
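The contact definition can be sketched as follows. The slide does not say which atoms the distance is measured between, so this sketch assumes one representative point per residue (for example the Cα atom) and treats "non-adjacent" as a sequence separation of at least two; both choices and the function name `find_contacts` are assumptions.

```python
import math


def find_contacts(coords, cutoff=8.0, min_separation=2):
    """Return 1-based residue pairs (i, j), i < j, closer than `cutoff` Å.

    coords holds one (x, y, z) point per residue; pairs closer in sequence
    than `min_separation` are skipped as adjacent.
    """
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i + 1, j + 1))
    return pairs


# Toy chain: four residues on a straight line, 3.8 Å apart.
coords = [(3.8 * k, 0.0, 0.0) for k in range(4)]
print(find_contacts(coords))
```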
45
Scoring Function for Threading
$$S = \sum_{i,j=1}^{N} w(a_i, a_j)$$
Example, for a template with one Ala-Tyr contact and one Ile-Trp contact:
$$S = w(\mathrm{Ala}, \mathrm{Tyr}) + w(\mathrm{Ile}, \mathrm{Trp})$$
w - calculated from the frequency of amino acid contacts in PDB
a_i - amino acid type of target sequence aligned with the position i of the template
N - number of contacts
46
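Class work: calculate the score for the target sequence “ATPIIGGLPY” aligned to the template structure, which is defined by the contact matrix.
$$S = \sum_{i,j=1}^{N} w(a_i, a_j)$$
[Figure: contact matrix over template positions 1 to 10, with contacts marked by *]

Contact weights w:

| | L | G | I | Y | P | T | A |
|---|---|---|---|---|---|---|---|
| L | 0.3 | | | | | | |
| G | 0.2 | 0.4 | | | | | |
| I | 0.4 | 0.2 | 0.3 | | | | |
| Y | -0.2 | -0.1 | -0.2 | -0.4 | | | |
| P | -0.2 | 0.1 | -0.1 | -0.4 | -0.2 | | |
| T | 0 | 0.1 | -0.3 | -0.2 | -0.1 | 0.3 | |
| A | 0.2 | -0.2 | 0.5 | -0.1 | 0 | -0.1 | -0.2 |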
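The calculation asked for in the class work can be sketched as below. The weights are the values read from the slide's weight table, taken as a symmetric lower triangle over L, G, I, Y, P, T, A. The contact list is hypothetical, chosen only to make the example run, because the positions marked in the slide's contact matrix are not reproduced here. The sketch also assumes each contact pair is counted once.

```python
ORDER = "LGIYPTA"
# Lower triangle of the weight table: row r lists w(ORDER[r], ORDER[0..r]).
ROWS = [
    [0.3],
    [0.2, 0.4],
    [0.4, 0.2, 0.3],
    [-0.2, -0.1, -0.2, -0.4],
    [-0.2, 0.1, -0.1, -0.4, -0.2],
    [0.0, 0.1, -0.3, -0.2, -0.1, 0.3],
    [0.2, -0.2, 0.5, -0.1, 0.0, -0.1, -0.2],
]

# Symmetric lookup: the key is the unordered pair of amino acid types.
W = {}
for r, row in enumerate(ROWS):
    for c, value in enumerate(row):
        W[frozenset((ORDER[r], ORDER[c]))] = value


def threading_score(sequence, contact_pairs):
    """Sum w(a_i, a_j) over template contacts (i, j), 1-based positions."""
    return sum(W[frozenset((sequence[i - 1], sequence[j - 1]))]
               for i, j in contact_pairs)


sequence = "ATPIIGGLPY"
hypothetical_contacts = [(1, 4), (2, 8), (5, 10)]  # NOT the slide's contacts
score = threading_score(sequence, hypothetical_contacts)
print(round(score, 2))
```

Replacing `hypothetical_contacts` with the pairs marked in the actual contact matrix gives the answer to the exercise.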
47
Alignment Algorithms
• Dynamic programming with the “frozen approximation”: traceback in the alignment matrix is not possible when the score depends on interactions between two amino acids, so the score is approximated as:
$$S = \sum_{i,j=1}^{N} w(a_i, b_j)$$
b – amino acid type from the template, not from the target; the score of every position then no longer depends on the alignment elsewhere in the sequence.
• Monte Carlo
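Under the frozen approximation the score of placing amino acid a at template position i depends only on the template's own residues at the positions in contact with i, so it can be tabulated once and then used like a profile in ordinary dynamic programming. The sketch below computes that table; the weight function `toy_w`, its sign convention, and the template data are hypothetical placeholders, not values from the slides.

```python
def frozen_position_scores(residue_types, template_seq, contact_pairs, w):
    """scores[a][i] = sum of w(a, b_j) over template positions j in contact
    with position i, where b_j is the TEMPLATE residue at j."""
    n = len(template_seq)
    partners = [[] for _ in range(n)]
    for i, j in contact_pairs:  # 1-based template positions
        partners[i - 1].append(j - 1)
        partners[j - 1].append(i - 1)
    scores = {}
    for a in residue_types:
        scores[a] = [sum(w(a, template_seq[j]) for j in partners[i])
                     for i in range(n)]
    return scores


HYDROPHOBIC = set("AILMFVW")


def toy_w(a, b):
    """Toy weight: 1.0 when both residues are hydrophobic, else 0.0."""
    return 1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0


template_seq = "LKEV"
contacts = [(1, 4), (2, 4)]
scores = frozen_position_scores("AK", template_seq, contacts, toy_w)
print(scores["A"])
print(scores["K"])
```

Each row of `scores` is independent of how the rest of the target is aligned, which is exactly what makes a standard alignment traceback possible again.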
48
Pairwise Threading Algorithms
• Approximation Algorithm
– Interaction-Frozen Algorithm (A. Godzik et al.)
– Monte Carlo Sampling (S.H. Bryant et al.)
– Double dynamic programming (D. Jones et al.)
• Exact Algorithm
– Branch-and-bound (R.H. Lathrop and T.F. Smith)
– PROSPECT-I uses Divide-and-conquer (Y. Xu et al.)
– Linear programming by RAPTOR (J. Xu et al.)
49
Non-Pairwise Threading Algorithms
• Sequence-sequence alignment
• Sequence-profile alignment
• Sequence-HMM model alignment
– e.g. SAMT02 (K. Karplus et al.)
• Profile-sequence alignment
– e.g. PDB-Blast (A. Godzik et al.)
• Profile-profile alignment
– e.g. PROSPECT-II (Y. Xu et al.)
• Combinations of several alignments
– e.g. 3DPS (L.A. Kelley et al.), SHGU (D. Fischer)
50
• Correct bond length and bond angles
• Correct placement of functionally important sites
• Prediction of global topology, not partial alignment (minimum number of gaps)
>> 3.8 Angstroms
Threading Model Validation
51
Placement of functionally important sites in threading.
Prediction of the structure of methylglyoxal synthase based on the template of carbamoyl phosphate synthase
52
GenThreader
1. Predicts secondary structures for target sequence
2. Makes sequence profiles (PSSMs) for each template sequence
3. Uses threading scoring function to find the best matching profile
http://bioinf.cs.ucl.ac.uk/psipred
53
Threading - Conclusions
• Threading models are generally not suitable for applications such as drug design
• Function prediction is possible only if the fold family is associated with a single function
54
Overall Approach
Protein Sequence → Database Searching, Multiple Sequence Alignment, Domain Assignment, Secondary Structure Prediction, Disorder Prediction
Homologue in PDB?
- Yes: Comparative Modelling → 3-D Protein Model
- No: Fold Recognition
Predicted Fold?
- Yes: Sequence-Structure Alignment → 3-D Protein Model
- No: Ab-initio Structure Prediction
http://bioinf.cs.ucl.ac.uk
55
Ab Initio Methods
56
What is an atom?
• Classical mechanics: a solid object
• Defined by its position (x, y, z), its shape (usually a ball) and its mass
• May carry an electric charge (positive or negative), usually partial (less than an electron)
57
Atomic interactions
Torsion angles are 4-body
Angles are 3-body
Bonds are 2-body
Non-bonded pair
![Page 58: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/58.jpg)
58
Forces between atoms
Strong bonded interactions
• All chemical bonds (bond length $b$): $U = K\,(b - b_0)^2$
• Angle between chemical bonds (angle $\theta$): $U = K\,(\theta - \theta_0)^2$
• Preferred conformations for torsion angles ($\phi$): $U = K\,(1 - \cos(n\phi))$
- ω angle of the main chain
- χ angles of the sidechains (aromatic, …)
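As a rough illustration of the three bonded terms on this slide (harmonic bond, harmonic angle, periodic torsion), the sketch below evaluates each one for a single set of values. The function names and all numerical parameters are made up for illustration and are not taken from any real force field.

```python
import math

def bond_energy(b, b0, k):
    """Harmonic bond stretching: U = K (b - b0)^2."""
    return k * (b - b0) ** 2

def angle_energy(theta, theta0, k):
    """Harmonic angle bending: U = K (theta - theta0)^2, angles in radians."""
    return k * (theta - theta0) ** 2

def torsion_energy(phi, n, k):
    """Periodic torsion term: U = K (1 - cos(n * phi)), phi in radians."""
    return k * (1.0 - math.cos(n * phi))

# Hypothetical parameters chosen only to exercise the formulas.
print(bond_energy(1.60, 1.53, 300.0))
print(angle_energy(math.radians(115.0), math.radians(109.5), 50.0))
print(torsion_energy(math.radians(60.0), 3, 1.4))
```

Each term is zero at its reference geometry and grows as the bond, angle, or torsion is distorted away from it.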
![Page 59: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/59.jpg)
59
Forces between atoms: van der Waals interactions
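Lennard-Jones potential
$$E_{LJ}(r) = \varepsilon_{ij}\left[\left(\frac{R_{ij}}{r}\right)^{12} - 2\left(\frac{R_{ij}}{r}\right)^{6}\right]$$
$$R_{ij} = \frac{R_i + R_j}{2}; \qquad \varepsilon_{ij} = \sqrt{\varepsilon_i\,\varepsilon_j}$$
[Plot: $E_{LJ}$ versus $r$, showing the $1/r^{12}$ and $1/r^{6}$ terms, with $R_{ij}$ marked on the $r$ axis]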
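A minimal sketch of the Lennard-Jones term, assuming the form E(r) = ε_ij [(R_ij/r)^12 − 2 (R_ij/r)^6] with an arithmetic-mean radius and a geometric-mean well depth as combining rules. The geometric mean for ε_ij is an assumption about the slide, and the atom parameters below are hypothetical.

```python
import math

def combine(r_i, r_j, eps_i, eps_j):
    """Combining rules: arithmetic mean for R_ij, geometric mean (assumed) for eps_ij."""
    r_ij = 0.5 * (r_i + r_j)
    eps_ij = math.sqrt(eps_i * eps_j)
    return r_ij, eps_ij

def lennard_jones(r, r_ij, eps_ij):
    """E_LJ(r) = eps_ij * [(R_ij/r)^12 - 2 (R_ij/r)^6]."""
    s6 = (r_ij / r) ** 6
    return eps_ij * (s6 * s6 - 2.0 * s6)

# Hypothetical per-atom parameters (Angstrom, kcal/mol), for illustration only.
r_ij, eps_ij = combine(3.6, 4.0, 0.08, 0.125)
for r in (3.0, 3.8, 5.0, 8.0):
    print(r, lennard_jones(r, r_ij, eps_ij))
```

With this form the energy minimum sits at r = R_ij, where the energy equals −ε_ij; the 1/r^12 term dominates at short range and the 1/r^6 term at long range.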
![Page 60: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/60.jpg)
60
Forces between atoms: Electrostatics interactions
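Coulomb potential
$$E(r) = \frac{1}{4\pi\varepsilon_0\varepsilon}\,\frac{q_i q_j}{r}$$
[Diagram: charges $q_i$ and $q_j$ separated by distance $r$]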
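The Coulomb term can be evaluated directly in SI units. The sketch below assumes charges given in units of the elementary charge, distances in Ångström, and a relative dielectric constant `eps_r` (ε in the slide's formula); the function name and the choice of kJ/mol for the output are illustrative assumptions.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19   # elementary charge, C
AVOGADRO = 6.02214076e23     # 1/mol

def coulomb_energy(q_i, q_j, r_angstrom, eps_r=1.0):
    """Coulomb energy in kJ/mol: E = q_i q_j / (4 pi eps0 eps_r r)."""
    r_m = r_angstrom * 1e-10
    joules = (q_i * E_CHARGE) * (q_j * E_CHARGE) / (4.0 * math.pi * EPS0 * eps_r * r_m)
    return joules * AVOGADRO / 1000.0

# Two opposite partial charges 3 Angstrom apart, in vacuum and in a high-dielectric medium.
print(coulomb_energy(0.4, -0.4, 3.0))
print(coulomb_energy(0.4, -0.4, 3.0, eps_r=80.0))
```

Raising `eps_r` scales the interaction down by the same factor, which is one simple way the solvent enters the energy.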
![Page 61: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/61.jpg)
61
Some Common force fields in Computational Biology
ENCAD (Michael Levitt, Stanford)
AMBER (Peter Kollman, UCSF; David Case, Scripps)
CHARMM (Martin Karplus, Harvard)
OPLS (Bill Jorgensen, Yale)
MM2/MM3/MM4 (Norman Allinger, U. Georgia)
ECEPP (Harold Scheraga, Cornell)
GROMOS (Van Gunsteren, ETH, Zurich)
Michael Levitt. The birth of computational structural biology. Nature Structural Biology, 8, 392-393 (2001)
![Page 62: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/62.jpg)
62
Protein Structure Prediction
• One popular model for protein folding assumes a sequence of events:
– Hydrophobic collapse
– Local interactions stabilize secondary structures
– Secondary structures interact to form motifs
– Motifs aggregate to form tertiary structure
![Page 63: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/63.jpg)
63
Protein Structure Prediction
A physics-based approach:
- find the conformation of the protein corresponding to a thermodynamic minimum (free energy minimum)
- cannot minimize internal energy alone! Needs to include solvent
- simulate folding…a very long process!
Folding times are in the ms to second range
Folding simulations at best run 1 ns in one day…
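To make the timescale gap concrete, the small calculation below takes the slide's figure of 1 ns of simulated time per day and asks how long a folding event of a given duration would take to simulate. The function name is hypothetical.

```python
def days_to_simulate(folding_time_s, ns_per_day=1.0):
    """Wall-clock days needed to simulate folding_time_s seconds at ns_per_day."""
    return folding_time_s / (ns_per_day * 1e-9)

for folding_time in (1e-3, 1.0):  # 1 ms and 1 s, the range quoted on the slide
    days = days_to_simulate(folding_time)
    print(f"{folding_time:g} s of folding: {days:.3g} days ({days / 365.25:.3g} years)")
```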
![Page 64: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/64.jpg)
64
What is a molecular dynamics simulation?
• Simulation that shows how the atoms in the system move with time
• Typically on the nanosecond timescale
• Atoms are treated like hard balls, and their motions are described by Newton’s laws.
![Page 65: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/65.jpg)
65
Why MD simulations?
• Link physics, chemistry and biology
• Model phenomena that cannot be observed experimentally
• Understand protein folding…
• Access to thermodynamic quantities (free energies, binding energies,…)
![Page 66: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/66.jpg)
66
Characteristic protein motions
| Type of motion | Timescale | Amplitude |
|---|---|---|
| Local: bond stretching, angle bending, methyl rotation | 0.01 ps, 0.1 ps, 1 ps | < 1 Å |
| Medium scale: loop motions, SSE formation | ns – μs | 1-5 Å |
| Global: protein tumbling (water tumbling), protein folding | 20 ns (20 ps), ms – hrs | > 5 Å |
Periodic (harmonic)
Random (stochastic)
![Page 67: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/67.jpg)
67
The Ergodic Hypothesis
• Time averages = Ensemble averages
$$\langle A \rangle_{\text{time}} = \langle A \rangle_{\text{ensemble}}$$
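A toy check of the idea, under stated assumptions: a two-state system with energies 0 and ΔE, whose ensemble average energy follows from Boltzmann weights, compared with the average along one long Metropolis trajectory. The two-state model, the function names, and the use of Metropolis moves as the dynamics are illustrative choices, not part of the slides.

```python
import math
import random

def ensemble_average_energy(delta_e, kT):
    """Boltzmann-weighted mean energy of a two-state system (energies 0 and delta_e)."""
    w = math.exp(-delta_e / kT)
    return delta_e * w / (1.0 + w)

def time_average_energy(delta_e, kT, n_steps, seed=0):
    """Mean energy along one long Metropolis trajectory of the same system."""
    rng = random.Random(seed)
    state = 0  # 0 = low-energy state, 1 = high-energy state
    total = 0.0
    for _ in range(n_steps):
        proposed = 1 - state
        d = delta_e if proposed == 1 else -delta_e
        if d <= 0 or rng.random() < math.exp(-d / kT):
            state = proposed
        total += delta_e * state
    return total / n_steps

print(ensemble_average_energy(1.0, 1.0))
print(time_average_energy(1.0, 1.0, 200000))
```

For a long enough trajectory the two numbers should agree closely; a short trajectory gives a noisier time average.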
![Page 68: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/68.jpg)
68
The Folding @ Home initiative (Vijay Pande, Stanford University)
http://folding.stanford.edu/
![Page 69: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/69.jpg)
69
The Folding @ Home initiative
![Page 70: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/70.jpg)
70
Folding @ Home: Results
[Plot: predicted folding time (nanoseconds) versus experimental measurement (nanoseconds), both axes logarithmic from 1 to 100000; points labelled PPA, alpha helix, beta hairpin, villin, BBAW]
Experiments:
villin: Raleigh, et al, SUNY, Stony Brook
BBAW: Gruebele, et al, UIUC
beta hairpin: Eaton, et al, NIH
alpha helix: Eaton, et al, NIH
PPA: Gruebele, et al, UIUC
http://pande.stanford.edu/
![Page 71: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/71.jpg)
71
Protein Structure Prediction
DECOYS: Generate a large number of possible shapes (need good decoy structures)
DISCRIMINATION: Select the correct, native-like fold (need a good energy function)
![Page 72: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/72.jpg)
72
The CASP experiment
• CASP = Critical Assessment of Structure Prediction
• Started in 1994, based on an idea from John Moult (Moult, Pedersen, Judson, Fidelis, Proteins, 23:2-5 (1995))
• First run in 1994; now runs regularly every second year (CASP6 was held last December)
![Page 73: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/73.jpg)
73
The CASP experiment: how it works
1) Sequences of target proteins are made available to CASP participants in June-July of a CASP year
- the structure of the target protein is known, but not yet released in the PDB, or even accessible
2) CASP participants have between 2 weeks and 2 months over the summer of a CASP year to generate up to 5 models for each of the targets they are interested in.
3) Model structures are assessed against the experimental structure
4) CASP participants meet in December to discuss results
![Page 74: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/74.jpg)
74
CASP Statistics
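| Experiment | # of Targets | # of predictors | # of 3D models |
|---|---|---|---|
| CASP1 | 33 | 35 | 100 |
| CASP2 | 42 | 72 | 947 |
| CASP3 | 43 | 61 | 1256 |
| CASP4 | 43 | 111 | 5150 |
| CASP5 | 67 | 175 | 22909 |
| CASP6 | 87 | 166 | 28965 |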
![Page 75: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/75.jpg)
75
CASP
Three categories at CASP
- Homology (or comparative) modeling
- Fold recognition
- Ab initio prediction
CASP dynamics:
- Real deadlines; pressure: positive, or negative?
- Competition?
- Influence on science?
Venclovas, Zemla, Fidelis, Moult. Assessment of progress over the CASP experiments. Proteins, 53:585-595 (2003)
![Page 76: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/76.jpg)
76
Ab initio prediction of protein structure – concept
• Go from sequence to structure by sampling the conformational space in a reasonable manner and select a native-like conformation using a good discrimination function
• Problems: conformational space is astronomical, and it is hard to design functions that are not fooled by non-native conformations (or “decoys”)
![Page 77: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/77.jpg)
77
Ab initio prediction of protein structure
sample conformational space such that native-like conformations are found
- astronomically large number of conformations: 5 states per residue, 100 residues = $5^{100} \approx 10^{70}$
select
- hard to design functions that are not fooled by non-native conformations (“decoys”)
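The size of the conformational space quoted on this slide (5 states for each of 100 residues) is easy to verify. The helper name below is hypothetical.

```python
import math

def log10_conformations(states_per_residue, n_residues):
    """log10 of the number of conformations, states_per_residue ** n_residues."""
    return n_residues * math.log10(states_per_residue)

exponent = log10_conformations(5, 100)
print(f"5^100 is about 10^{exponent:.1f}")
print(5 ** 100)  # Python integers are exact, so the full number can be printed
```

The exponent comes out just under 70, which is where the slide's rounded figure of 10^70 comes from.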
![Page 78: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/78.jpg)
78
Sampling conformational space – continuous approaches
• Most work in the field
- Molecular dynamics
- Continuous energy minimisation (follow a valley)
- Monte Carlo simulation
- Genetic Algorithms
• Like real polypeptide folding process
• Cannot be sure if native-like conformations are sampled
[Figure: energy plot]
![Page 79: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/79.jpg)
79
Molecular dynamics
• Force = -dU/dx (slope of potential U); acceleration: m a(t) = force
• All atoms are moving so forces between atoms are complicated functions of time
• Analytical solution for x(t) and v(t) is impossible; numerical solution is trivial
• Atoms move for very short times of $10^{-15}$ seconds or 0.001 picoseconds (ps)
$$x(t+\Delta t) = x(t) + v(t)\,\Delta t + \left[4a(t) - a(t-\Delta t)\right]\frac{\Delta t^2}{6}$$
(new position = old position + old velocity term + acceleration term)
$$v(t+\Delta t) = v(t) + \left[2a(t+\Delta t) + 5a(t) - a(t-\Delta t)\right]\frac{\Delta t}{6}$$
(new velocity = old velocity + acceleration term)
$$U_{\text{kinetic}} = \tfrac{1}{2}\sum_i m_i v_i(t)^2 = \tfrac{1}{2}\,n\,k_B T$$
where n is the number of coordinates (not atoms)
• Total energy ($U_{\text{potential}} + U_{\text{kinetic}}$) must not change with time
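The position and velocity updates on this slide have the form of Beeman's algorithm. As a minimal sketch, the code below applies them to a one-dimensional harmonic oscillator (U = ½ k x²) rather than a protein, and records the total energy at each step so its conservation can be checked. The oscillator, the function name, and the way the first step is bootstrapped are assumptions made for illustration.

```python
import math

def beeman_trajectory(x0, v0, dt, n_steps, k=1.0, m=1.0):
    """Integrate a 1-D harmonic oscillator with the Beeman-style updates."""
    def accel(x):
        return -k * x / m  # a = F/m with F = -dU/dx = -k x

    x, v = x0, v0
    a = accel(x)
    a_prev = a  # no earlier step exists yet, so reuse a(0) in place of a(-dt)
    energies = []
    for _ in range(n_steps):
        x_new = x + v * dt + (4.0 * a - a_prev) * dt * dt / 6.0
        a_new = accel(x_new)
        v_new = v + (2.0 * a_new + 5.0 * a - a_prev) * dt / 6.0
        x, v, a_prev, a = x_new, v_new, a, a_new
        energies.append(0.5 * m * v * v + 0.5 * k * x * x)  # kinetic + potential
    return x, v, energies

x, v, energies = beeman_trajectory(1.0, 0.0, 0.01, 1000)
print("x(10) =", x, "exact:", math.cos(10.0))
print("energy drift:", max(energies) - min(energies))
```

For this system the exact solution is x(t) = cos(t), so the final position and the spread of the total energy give a quick check on the time step.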
![Page 80: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/80.jpg)
80
Energy minimisation
• For a given protein, the energy depends on thousands of x,y,z Cartesian atomic coordinates; reaching a deep minimum is not trivial
• With convergence, we have an accurate equilibrium conformation and a well-defined energy value
[Plot: energy versus number of steps from the starting conformation to a deep minimum, comparing steepest descent and conjugate gradient]
[Plot: energy and RMSD versus number of steps, with "give up" and "converge" outcomes]
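A minimal steepest-descent sketch on a two-variable toy energy, with the two outcomes from the plot: "converge" (the gradient becomes negligible) or "give up" (the step budget runs out). The fixed step size, the toy energy E(x, y) = (x − 1)² + 10 (y + 2)², and all names are assumptions; practical minimisers use line searches, and conjugate gradient is not shown here.

```python
def steepest_descent(grad, x0, step=0.04, tol=1e-6, max_steps=10000):
    """Follow the negative gradient until it is smaller than tol or steps run out."""
    x = list(x0)
    for i in range(max_steps):
        g = grad(x)
        norm = sum(gi * gi for gi in g) ** 0.5
        if norm < tol:
            return x, i, True            # converged
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_steps, False           # gave up

def toy_gradient(p):
    """Gradient of the toy energy E(x, y) = (x - 1)^2 + 10 (y + 2)^2."""
    x, y = p
    return [2.0 * (x - 1.0), 20.0 * (y + 2.0)]

minimum, steps, converged = steepest_descent(toy_gradient, [0.0, 0.0])
print(minimum, steps, converged)
```

The narrow direction of the valley (y) limits the usable step size, so progress along the shallow direction (x) is slow; this is the usual motivation for conjugate gradient.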
![Page 81: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/81.jpg)
81
Monte Carlo simulation
• Discrete moves in torsion or Cartesian conformational space
• Evaluate energy after every move and compare to previous energy (ΔE)
• Accept conformation based on Boltzmann probability: $P \propto \exp\left(-\frac{\Delta E}{kT}\right)$
• Many variations, including simulated annealing (starting with a high temperature so more moves are accepted initially and then cooling)
• If run for infinite time, simulation will produce a Boltzmann distribution
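The acceptance rule and the simulated-annealing variation can be sketched as follows. This assumes the Metropolis form of the rule (downhill moves always accepted, uphill moves accepted with probability exp(−ΔE/kT)), a one-dimensional double-well toy energy in place of a protein, and a geometric cooling schedule; all names and parameter values are illustrative.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    return delta_e <= 0.0 or rng.random() < math.exp(-delta_e / kT)

def simulated_annealing(energy, x0, kT_start=5.0, kT_end=0.01,
                        n_steps=20000, max_move=0.5, seed=0):
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(n_steps):
        # Geometric cooling from kT_start to kT_end (one common choice, assumed here).
        kT = kT_start * (kT_end / kT_start) ** (i / max(1, n_steps - 1))
        x_new = x + rng.uniform(-max_move, max_move)   # random trial move
        e_new = energy(x_new)
        if metropolis_accept(e_new - e, kT, rng):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def double_well(x):
    """Toy energy with a shallow well near x = +1 and a deeper one near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

print(simulated_annealing(double_well, 1.0))
```

Starting in the shallower well, the high initial temperature lets the walker cross the barrier; the run reports the lowest-energy point it visited.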
![Page 82: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/82.jpg)
82
Genetic Algorithms
• Generate an initial pool of conformations
• Perform crossover and mutation operations on this set to generate a much larger pool of conformations
• Select a subset of the fittest conformations from this large pool
• Repeat above two steps until convergence
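The four steps above can be sketched with a deliberately simple encoding: each conformation is a list of discrete per-residue states, and the toy energy counts mismatches against a hidden "native" list. The encoding, the one-point crossover, the stopping rule (no improvement for `patience` generations), and every name and parameter are assumptions for illustration only.

```python
import random

def genetic_search(energy, n_residues, n_states=5, pool_size=20,
                   n_children=200, mutation_rate=0.05,
                   patience=20, max_generations=500, seed=0):
    rng = random.Random(seed)
    # Step 1: initial pool of random conformations.
    pool = [[rng.randrange(n_states) for _ in range(n_residues)]
            for _ in range(pool_size)]
    pool.sort(key=energy)
    stale = 0
    for _ in range(max_generations):
        # Step 2: crossover and mutation produce a much larger pool.
        children = []
        for _ in range(n_children):
            a, b = rng.sample(pool, 2)
            cut = rng.randrange(1, n_residues)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randrange(n_states) if rng.random() < mutation_rate else s
                     for s in child]
            children.append(child)
        # Step 3: keep the fittest (lowest-energy) subset, parents included.
        best_before = energy(pool[0])
        pool = sorted(pool + children, key=energy)[:pool_size]
        # Step 4: repeat until the best energy stops improving.
        stale = stale + 1 if energy(pool[0]) >= best_before else 0
        if stale >= patience:
            break
    return pool[0]

NATIVE = [0, 1, 2, 3, 4] * 4   # hypothetical "native" state string, 20 residues

def toy_energy(conf):
    """Number of residues whose state differs from the hypothetical native one."""
    return sum(1 for s, t in zip(conf, NATIVE) if s != t)

best = genetic_search(toy_energy, len(NATIVE))
print(toy_energy(best), best)
```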
![Page 83: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/83.jpg)
83
Sampling conformational space – exhaustive approaches
enumerate all possible conformations
- view entire space (perfect partition function)
- computationally intractable: 5 states per residue, 100 residues = $5^{100} \approx 10^{70}$ possible conformations
select
- must use discrete state models to minimise number of conformations explored
![Page 84: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/84.jpg)
84
Scoring/energy functions
• Need a way to select native-like conformations from non-native ones
• Physics-based functions: electrostatics, van der Waals, solvation, bond/angle terms
• Knowledge-based scoring functions: derive information about atomic properties from a database of experimentally determined conformations; common parameters include pairwise atomic distances and amino acid burial/exposure.
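One common way to turn database statistics into a knowledge-based score is the inverse Boltzmann relation, score = −kT ln(P_observed / P_reference), applied per distance bin. The slides do not say which formulation is meant, so the sketch below is only an assumed example; the counts, the pseudocount, and the function name are hypothetical.

```python
import math

def knowledge_based_potential(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Per-bin score -kT * ln(P_obs / P_ref) from two histograms of equal length."""
    obs_total = sum(observed_counts) + pseudocount * len(observed_counts)
    ref_total = sum(reference_counts) + pseudocount * len(reference_counts)
    potential = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total   # pseudocount avoids log(0)
        p_ref = (ref + pseudocount) / ref_total
        potential.append(-kT * math.log(p_obs / p_ref))
    return potential

# Made-up counts of an atom pair in four distance bins versus a reference state.
observed = [5, 120, 60, 15]
reference = [50, 50, 50, 50]
print(knowledge_based_potential(observed, reference))
```

Bins that are seen more often than the reference predicts receive a negative (favourable) score, and under-represented bins a positive one.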
![Page 85: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/85.jpg)
85
Requirements for sampling methods and scoring functions
• Sampling methods must produce good decoy sets that are comprehensive and include several native-like structures
• Scoring function scores must correlate well with RMSD of conformations (the better the score/energy, the lower the RMSD)
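The second requirement can be checked on a decoy set by correlating score with RMSD. The sketch below computes a Pearson correlation and picks the best-scoring decoy; the decoy energies and RMSD values are invented for illustration, and lower energy is assumed to mean a better score.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical decoy set: (energy, RMSD to native in Angstrom).
decoys = [(-120.0, 1.8), (-95.0, 3.5), (-101.0, 2.9), (-60.0, 7.2), (-75.0, 5.1)]
energies = [e for e, _ in decoys]
rmsds = [r for _, r in decoys]

print("score/RMSD correlation:", pearson(energies, rmsds))
print("RMSD of best-scoring decoy:", min(decoys)[1])
```

A correlation close to +1 means that lower energies go with lower RMSD, which is the behaviour the slide asks of a scoring function.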
![Page 86: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/86.jpg)
86
Protein Structure
Primary (Sequence)
Secondary (Helix/Strand/Coil) and lack of structure (disorder)
Domain and Tertiary (Fold)
Quaternary (Complexes)
IVGGYTCAANSIPYQVSLNSGSHFCGGSLINSQWVVSAAHCYKSRIQVRLGEHNIDVLEGNEQFINAAKIITHPNFNGNTL...
http://bioinf.cs.ucl.ac.uk
![Page 87: Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template](https://reader031.vdocuments.us/reader031/viewer/2022030509/5ab814937f8b9ad5338c67a0/html5/thumbnails/87.jpg)
87
Computational Aspects of Structural Genomics
[Figure: points in sequence space, with regions labelled as follows]
A. sequence space
B. comparative modeling
C. fold recognition
D. ab initio prediction
E. target selection (targets)
F. analysis
(Figure idea by Steve Brenner.)