3D structure prediction
Posted on 11-Jan-2016
30-03-2006, Doctorado UAM, Ana Rojas
3D STRUCTURE PREDICTION
A Long, Long Time Ago…
Amino acids started to make complex structures and life appeared...
INTRODUCTION
3.5 billion years later…(Around today…)
• We know more protein sequences than we knew at the beginning of life (and even in the 1970s!).
• In some cases we know their function.
• And rarely do we know their structure.
INTRODUCTION
Current methods to predict protein structure
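[Figure: schema of current prediction methods by structural level (1D, 2D, 3D, 4D), with additional information feeding each level. Secondary structure: prediction from sequence (e.g. AAVLYFGREDHTLLVY split into its secondary structure segments). 2D: correlated mutations. Tertiary: ab initio methods (molecular dynamics, energy minimization) and non-ab-initio methods (homology modeling, threading), both using secondary structure predictions. Quaternary: docking and filtered docking.]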
Why Predict Secondary Structure?
Introduction
Secondary structure prediction is the first step in protein folding prediction.
Predicted secondary structure can be used to help identify protein function, by searching for similar secondary structure motifs.
Good secondary structure prediction is also useful in fold detection. The best fold recognition methods use a combination of sequence profiles and predicted secondary structure.
WHAT IS SECONDARY STRUCTURE PREDICTION?
To predict the arrangement of alpha helices, beta strands and loops of a protein from its amino acid sequence.
Introduction
Why Such a Shift?
Sequencing DNA is easy: 1-2 days.
Experimental determination of a protein structure is difficult: 1-3 years.
Small targets
PDB
PDB file 1CRN.txt
PDB RECORD DISSECTION (OVERVIEW OF DESCRIPTORS)
Protein Data Bank: PDB format
Section: Record names
Title (summary descriptive remarks): HEADER, TITLE, COMPND, SOURCE, KEYWDS, EXPDTA, AUTHOR, JRNL
Remark (bibliography, annotations): REMARK 1, 2, 3, and others
Primary structure (sequence, databases): DBREF, SEQRES, MODRES
Heterogen (non-standard groups): HET, HETNAM, FORMUL
Connectivity annotation: SSBOND, LINK, HYDBND, SLTBRG, CISPEP
Miscellaneous features: SITE
Crystallographic: CRYST1
Coordinate transformation: ORIGXn, SCALEn, MTRIXn, TVECT
Coordinate (atomic coordinate data): MODEL, ATOM, TER, HETATM, ENDMDL
Connectivity: CONECT
Bookkeeping (summary information): MASTER, END
For a complete description see: ftp:/ftp.rcsb.org/pub/pdb/doc/format_descritpions/Contests_Guide_21.txt
HEADER PLANT SEED PROTEIN 30-APR-81 1CRN 1CRND 1
COMPND CRAMBIN 1CRN 4
SOURCE ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED 1CRN 5
AUTHOR W.A.HENDRICKSON,M.M.TEETER 1CRN 6
REVDAT 5 16-APR-87 1CRND 1 HEADER 1CRND 2
REVDAT 4 04-MAR-85 1CRNC 1 REMARK 1CRNC 1
REVDAT 3 30-SEP-83 1CRNB 1 REVDAT 1CRNB 1
REVDAT 2 03-DEC-81 1CRNA 1 SHEET 1CRNB 2
REVDAT 1 28-JUL-81 1CRN 0 1CRNB 3
Filename = accession number = PDB code
1) The filename is 4 characters long (often 1 digit and 3 letters, e.g. 1CRN).
2) Be aware: 0HKY means that entry HKY does not contain coordinates.
PDB RECORD (1)
HEADER: describes the molecule and gives the deposition date
COMPND: name of the molecule
SOURCE: organism
REVDAT: revision date
HELIX 1 H1 ILE 7 PRO 19 1 3/10 CONFORMATION RES 17,19 1CRN 55
HELIX 2 H2 GLU 23 THR 30 1 DISTORTED 3/10 AT RES 30 1CRN 56
SHEET 1 S1 2 THR 1 CYS 4 0 1CRNA 4
SHEET 2 S1 2 CYS 32 ILE 35 -1 1CRN 58
TURN 1 T1 PRO 41 TYR 44 1CRN 59
CRYST1 40.960 18.650 22.520 90.00 90.77 90.00 P 21 2 1CRN 63
ORIGX1 1.000000 0.000000 0.000000 0.00000 1CRN 64
ORIGX2 0.000000 1.000000 0.000000 0.00000 1CRN 65
ORIGX3 0.000000 0.000000 1.000000 0.00000 1CRN 66
SCALE1 .024414 0.000000 -.000328 0.00000 1CRN 67
SCALE2 0.000000 .053619 0.000000 0.00000 1CRN 68
SCALE3 0.000000 0.000000 .044409 0.00000 1CRN 69
SSBOND 1 CYS 3 CYS 40 1CRN 60
SSBOND 2 CYS 4 CYS 32 1CRN 61
SSBOND 3 CYS 16 CYS 26 1CRN 62
PDB RECORD (2)
HELIX/SHEET/TURN: secondary structure elements as provided by the crystallographer (subjective)
SSBOND: disulfide bridges
CRYST1, ORIGX1, ORIGX2, ORIGX3, SCALE1, SCALE2, SCALE3: crystallographic parameters!
ATOM 1 N THR 1 17.047 14.099 3.625 1.00 13.79 1CRN 70
ATOM 2 CA THR 1 16.967 12.784 4.338 1.00 10.80 1CRN 71
ATOM 3 C THR 1 15.685 12.755 5.133 1.00 9.19 1CRN 72
ATOM 4 O THR 1 15.268 13.825 5.594 1.00 9.85 1CRN 73
ATOM 5 CB THR 1 18.170 12.703 5.337 1.00 13.02 1CRN 74
ATOM 6 OG1 THR 1 19.334 12.829 4.463 1.00 15.06 1CRN 75
ATOM 7 CG2 THR 1 18.150 11.546 6.304 1.00 14.23 1CRN 76
ATOM 8 N THR 2 15.115 11.555 5.265 1.00 7.81 1CRN 77
ATOM 9 CA THR 2 13.856 11.469 6.066 1.00 8.31 1CRN 78
ATOM 10 C THR 2 14.164 10.785 7.379 1.00 5.80 1CRN 79
ATOM 11 O THR 2 14.993 9.862 7.443 1.00 6.94 1CRN 80
ATOM 12 CB THR 2 12.732 10.711 5.261 1.00 10.32 1CRN 81
ATOM 13 OG1 THR 2 13.308 9.439 4.926 1.00 12.81 1CRN 82
ATOM 14 CG2 THR 2 12.484 11.442 3.895 1.00 11.90 1CRN 83
ATOM 15 N CYS 3 13.488 11.241 8.417 1.00 5.24 1CRN 84
ATOM 16 CA CYS 3 13.660 10.707 9.787 1.00 5.39 1CRN 85
...
ATOM 324 CG ASN 46 12.538 4.304 14.922 1.00 7.98 1CRN 393
ATOM 325 OD1 ASN 46 11.982 4.849 15.886 1.00 11.00 1CRN 394
ATOM 326 ND2 ASN 46 13.407 3.298 15.015 1.00 10.32 1CRN 395
ATOM 327 OXT ASN 46 12.703 4.973 10.746 1.00 7.86 1CRN 396
TER 328 ASN 46 1CRN 397
ATOM: one line per atom, with its unique name and its x, y, z coordinates
The TER record terminates the amino acid chain
PDB RECORD (3)
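Because PDB is a fixed-column format, ATOM records are read by column position rather than by splitting on spaces. A minimal Python sketch, assuming the column layout of current PDB-format files (record name in columns 1-6, serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54). `Atom` and `parse_atoms` are illustrative names, not part of any library; old entries such as the 1CRN excerpt above have no chain identifier, which this sketch returns as an empty string.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: List[str]) -> List[Atom]:
    """Extract ATOM/HETATM records using fixed PDB column positions."""
    atoms = []
    for line in lines:
        # Only coordinate records are parsed; TER, HEADER, etc. are skipped.
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21:22].strip(),     # column 22 (blank in old entries)
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms
```

This ignores insertion codes, alternate locations and multi-model files; a maintained parser such as Biopython's `Bio.PDB` handles those cases.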
TODAY:
Programs that take coordinate files (mostly in PDB format) and process them into a graphical display output
Some have stereo view options and support animation
Viewers
HEADER LIGAND BINDING PROTEIN 02-MAR-00 1EJE
TITLE CRYSTAL STRUCTURE OF AN FMN-BINDING PROTEIN
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: FMN-BINDING PROTEIN;
COMPND 3 CHAIN: A;
COMPND 4 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: METHANOBACTERIUM THERMOAUTOTROPHICUM;
SOURCE 5 EXPRESSION_SYSTEM_PLASMID: PET15B
KEYWDS FMN-BINDING PROTEIN, STRUCTURAL GENOMICS
EXPDTA X-RAY DIFFRACTION
AUTHOR D.CHRISTENDAT,V.SARIDAKIS,A.BOCHKAREV,C.ARROWSMITH,
AUTHOR 2 A.M.EDWARDS
REVDAT 2 15-AUG-01 1EJE 1 HEADER KEYWDS
REVDAT 1 11-OCT-00 1EJE 0
JRNL AUTH D.CHRISTENDAT,A.YEE,A.DHARAMSI,Y.KLUGER,
JRNL AUTH 2 A.SAVCHENKO,J.R.CORT,V.BOOTH,C.D.MACKERETH,
JRNL AUTH 3 V.SARIDAKIS,I.EKIEL,G.KOZLOV,K.L.MAXWELL,N.WU,
JRNL AUTH 4 L.P.MCINTOSH,K.GEHRING,M.A.KENNEDY,A.R.DAVIDSON,
JRNL AUTH 5 E.F.PAI,M.GERSTEIN,A.M.EDWARDS,C.H.ARROWSMITH
JRNL TITL STRUCTURAL PROTEOMICS OF AN ARCHAEON
JRNL REF NAT.STRUCT.BIOL. V. 7 903 2000
JRNL REFN ASTM NSBIEW US ISSN 1072-8368
REMARK 1
REMARK 2
REMARK 2 RESOLUTION. 2.2 ANGSTROMS.
REMARK 3
REMARK 3 REFINEMENT.
ATOM 1 N GLY A 1 54.915 15.553 3.252 1.00 26.12 N
ATOM 2 CA GLY A 1 54.219 16.804 3.668 1.00 23.30 C
ATOM 3 C GLY A 1 54.870 18.009 3.019 1.00 25.07 C
ATOM 4 O GLY A 1 55.848 17.853 2.295 1.00 26.88 O
ATOM 5 N SER A 2 54.330 19.202 3.252 1.00 22.48 N
ATOM 6 CA SER A 2 54.918 20.404 2.680 1.00 25.55 C
ATOM 7 C SER A 2 56.202 20.683 3.460 1.00 26.03 C
ATOM 8 O SER A 2 56.308 20.321 4.632 1.00 22.51 O
ATOM 9 CB SER A 2 53.973 21.594 2.828 1.00 27.30 C
... etc.
Main 3D structure based databases I
:mostly manual, uses CE structure similarity program to decide whether two structures are similar.
CATH: uses structure similarity program SSAP
FSSP: uses structure similarity program DALI
PDB
SCOP
The rate of new sequences is growing exponentially relative to the rate of protein structures being solved!
WHERE ARE THE STRUCTURES???
How could we fill the gap between the number of known sequences and known structures?
Structural Genomics Initiative: JCSG
How could we fill the gap between the number of known sequences and known structures?
Structural Genomics Initiative: JCSG
or
Predicting Methods
Gaeta, Italy
Russell: http://speedy.embl-heidelberg.de/gtsp/flowchart2.html
Structural prediction flowchart
Relationship between sequence and structural similarity
Chothia & Lesk, 1986
High % sequence identity => same 3D structure (for sure).
Low % sequence identity => sometimes the same structure, sometimes not; this depends on the length of the aligned region.
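Since the relationship depends on both the identity and the length of the aligned region, it is useful to compute the two together. A small Python sketch, assuming gapped alignment strings of equal length with '-' as the gap character; identity is counted over columns where neither sequence has a gap (other conventions divide by the length of the shorter sequence instead). The example reuses the template and "Alignment 1" target from the STEP 2 slide, translated to one-letter code.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> Tuple[float, int]:
    """Return (% identity, aligned length) over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0, 0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs), len(pairs)


# Template and "Alignment 1" target from the STEP 2 example, in one-letter code.
pid, length = percent_identity("FDICRLPGSAEAVC", "FNVCRTP---EAIC")
print(f"{pid:.1f}% identity over {length} aligned residues")
# prints: 63.6% identity over 11 aligned residues
```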
SIMPLIFIED PROTEIN STRUCTURE PREDICTION FLOW CHART
EXPERIMENTAL SEQUENCE => DATABASE SEARCHING => STRUCTURE HOMOLOG?
YES => HOMOLOGY MODELING
NO => SECONDARY STRUCTURE PREDICTION => FOLD PREDICTION ("THREADING")
=> FINAL STRUCTURE???
WHY HOMOLOGY MODELING?
A target sequence (without a known structure): looking for a template.
Useful to infer function.
Structure changes less than sequence during evolution!
Comparative modeling can generate models with < 2 Å r.m.s.d.
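An r.m.s.d. value is only meaningful after the model has been optimally superposed onto the reference. A minimal sketch of the standard Kabsch superposition in Python with NumPy, assuming the two (N, 3) arrays hold matched atoms (for example Cα atoms) in the same order; `kabsch_rmsd` is an illustrative name.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    H = P.T @ Q
    U, S, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum(axis=1).mean()))
```

A rotated and translated copy of a structure gives an r.m.s.d. close to zero, so the value reflects only differences in shape.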
Sometimes it’s not so easy to find a template…
…or to make a good alignment…
[Figure: two alternative alignments of a template and a target]
A GOOD ALIGNMENT IS THE CRITICAL STEP!
SOME HISTORY
First model: LACTALBUMIN.
TEMPLATE: lysozyme (the lactalbumin structure itself was only solved in 1989).
1990s: expansion of modeling.
Nowadays: with >40% sequence identity it is possible to build models comparable to low-resolution X-ray structures!
How many sequences can be modeled? Up to about a quarter of all available sequences!
MODELLING STEPS
1. Identify a suitable structural template.
2. Align and select the templates for modeling.
3. Build the model.
4. Evaluate the model.
N iterations to improve the model
STEP 1: IDENTIFYING THE STRUCTURAL TEMPLATE
Database searches using BLAST or similar algorithms (using more than one template is recommended).
When sequence identity is only 25-30%, additional detection methods are required.
STEP 2: ALIGNMENT
[Figure: two alternative alignments of a template and a target]
A GOOD ALIGNMENT IS THE CRITICAL STEP!
Sequence search methods are biased towards sequence evolution and are therefore not always optimal for modeling purposes.
STEP 2: ALIGNMENT (I)
Another example…
DVSHCIQETVESVGF---------NVIRDYVDVGEAIQEVMESYEVEIDNVIYQVKPIRNLN
If it looks good in the structure, it should be like:
DVSHCIQETVESVGF---NVI------RDYVDVGEAIQEVMESYEVEIDNVIYQVKPIRNLN
STEP 2: ALIGNMENT (II)
TEMPLATE:             PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
TARGET (ALIGNMENT 1): PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
TARGET (ALIGNMENT 2): PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
"Alignment 1" is chosen because the PROs are aligned at position 7. But the resulting 10 Angstrom gap is too big to close.
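Why a 10 Angstrom gap cannot be closed: in Alignment 1 the target's PRO and GLU are consecutive, yet they sit on template positions 7 and 11. A quick feasibility check, assuming the template Cα coordinates of the two anchor positions are known and using about 3.8 Å as the distance between consecutive Cα atoms; the fully extended chain gives an upper bound, so passing the test is necessary but not sufficient. `gap_can_close` and the coordinates are hypothetical.

```python
import math
from typing import Sequence

CA_CA_DISTANCE = 3.8  # approx. distance (Å) between consecutive Cα atoms (trans peptide)


def gap_can_close(ca_before: Sequence[float], ca_after: Sequence[float],
                  n_residues_between: int) -> bool:
    """Can n target residues bridge two template anchor Cα positions?

    With n residues in between there are n + 1 Cα-Cα steps, so a fully
    extended chain spans at most (n + 1) * 3.8 Å.
    """
    span = math.dist(ca_before, ca_after)
    return span <= (n_residues_between + 1) * CA_CA_DISTANCE


# Hypothetical anchors 10 Å apart:
print(gap_can_close((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 0))  # False: one peptide bond
print(gap_can_close((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 2))  # True: up to 11.4 Å
```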
STEP 3: MODEL BUILDING
• RIGID BODY ASSEMBLY:
Align the template structures and create a "consensus" frame (average of Cα positions in core regions).
Fit the query sequence into this frame.
Needs high sequence similarity.
Caveat: with dissimilar sequences, models are usually wrong (especially in deletion and insertion regions).
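The consensus frame is, at its core, an average of aligned coordinates. A minimal NumPy sketch, assuming the templates have already been superposed (for example with a Kabsch fit) and that their rows follow the same structure-based alignment, with `core_mask` marking the core positions; `consensus_frame` is a hypothetical name.

```python
import numpy as np


def consensus_frame(superposed: list, core_mask: list) -> np.ndarray:
    """Average Cα coordinates of already-superposed templates at core positions.

    superposed: list of (N, 3) arrays, one per template, rows in aligned order.
    core_mask: N booleans marking the core positions to keep.
    """
    coords = np.stack([np.asarray(t, dtype=float) for t in superposed])  # (T, N, 3)
    mask = np.asarray(core_mask, dtype=bool)
    return coords[:, mask, :].mean(axis=0)  # (n_core, 3)
```

Non-core positions (typically the variable loops) are left out because their template coordinates disagree too much to average meaningfully.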
STEP 3: MODEL BUILDING (I)
• SEGMENT MATCHING:
Calculates the conservation of positions in the templates, then calculates coordinates based on those.
• SATISFACTION OF SPATIAL RESTRAINTS:
Satisfies spatial restraints between the templates and the query using:
distance geometry and optimisation
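A toy illustration of satisfying spatial restraints by optimisation, assuming simple harmonic distance restraints (i, j, target distance) minimised by plain gradient descent. Real programs such as MODELLER use probability-density restraints and more elaborate optimisers; `satisfy_restraints` and `violation` are hypothetical names.

```python
import numpy as np


def satisfy_restraints(coords, restraints, steps=2000, lr=0.01):
    """Move points to satisfy distance restraints by gradient descent on
    sum((d_ij - target)**2) over all restraints (i, j, target)."""
    x = np.array(coords, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(x)
        for i, j, target in restraints:
            diff = x[i] - x[j]
            d = np.linalg.norm(diff)
            if d == 0.0:
                continue  # direction undefined; skip this restraint for this step
            g = 2.0 * (d - target) * diff / d
            grad[i] += g
            grad[j] -= g
        x -= lr * grad
    return x


def violation(x, restraints):
    """Sum of squared deviations from the target distances."""
    return sum((np.linalg.norm(x[i] - x[j]) - t) ** 2 for i, j, t in restraints)


# Three hypothetical atoms that should form a 3-4-5 triangle.
restraints = [(0, 1, 3.0), (0, 2, 4.0), (1, 2, 5.0)]
start = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
model = satisfy_restraints(start, restraints)
print(violation(model, restraints))
```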
STEP 3: MODEL BUILDING (II)
Backbone generation: EASY
Gap filling: gaps of < 3 residues are easy to fix (this size allows few conformations).
Canonical loop generation: common loops can be modeled from libraries.
Side chain generation
Ab initio loop building
Model optimisation
STEP 3: MODEL BUILDING (III)
WHAT ABOUT THE SIDE CHAINS?
Difficult! Several possible conformations!
These are restricted to certain rotamers.
What is known about rotamers?
Side chain rotamers of conserved residues are themselves conserved.
Side chain replacement therefore focuses on non-conserved regions.
There are extensive databases and libraries of rotamers.
STEP 3: MODEL BUILDING (IV)
HOW TO MODEL ROTAMERS?
Caveat: when many side chains need to be replaced… how do I choose the first?
Take the model, leave the conserved residues (and Gly/Pro) untouched, and replace the others with alanine.
Side chains are then placed in order of rotameric entropy: residues with a very narrow rotamer distribution are built first, and the others are placed later, when fewer degrees of freedom remain.
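The ordering rule can be made concrete by measuring how narrow each residue's rotamer distribution is with its Shannon entropy and placing low-entropy residues first. A minimal sketch, assuming per-residue rotamer probabilities are already available (for example from a rotamer library); the residue names and probabilities are hypothetical.

```python
import math
from typing import Dict, List


def rotamer_entropy(probs: List[float]) -> float:
    """Shannon entropy (nats) of a residue's rotamer probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def placement_order(rotamer_probs: Dict[str, List[float]]) -> List[str]:
    """Residues sorted so the narrowest (lowest-entropy) distributions come first."""
    return sorted(rotamer_probs, key=lambda res: rotamer_entropy(rotamer_probs[res]))


# Hypothetical rotamer probabilities for three positions in a model.
probs = {
    "LYS30": [0.25, 0.25, 0.25, 0.25],  # broad: many equally likely rotamers
    "VAL12": [0.90, 0.05, 0.05],        # narrow: one dominant rotamer
    "SER5": [0.50, 0.50],
}
print(placement_order(probs))  # ['VAL12', 'SER5', 'LYS30']
```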
STEP 4: MODEL EVALUATION
This step is crucial in the whole process!
Several programs evaluate the models:
Solv_Pref: computes the solvent exposure of the model. Negative values indicate structural stability.
ProSA (Sippl): based on potentials extracted from databases. Good models have low energies.
PSQS (http://www1.jcsg.org/psqs/psqs.cgi): an energy-like measure, calculated from statistical potentials of mean force describing interactions between residue pairs and between single residues and the solvent. Values approaching -0.2 are OK.
STEP 4: MODEL EVALUATION (I)
Energy-like measure
Some structural features of proteins are overrepresented or underrepresented in known protein structures:
1. Contacts between amino acids
2. Burial status of amino acids
3. Secondary structure
STEP 4: MODEL EVALUATION (III)
Evaluate models built on alternative alignments with energy-like measures?
[Figure: modeling program => model => good or bad? Score: -0.212 [d. a. f. u.] => GOOD MODEL!]
STEP 4: MODEL EVALUATION (IV)
Energy-like measure again
An energy-like measure is based on statistics and gives a hint of whether your structure is similar to a typical protein structure or not.
In the previous example, a value of -0.212 is OK, given that the average in the PDB is -0.278. But it is still statistics…
… HOWEVER
Backbones might undergo conformational changes: see the backbone bending below.
[Figure: backbone bending between 2bb2 and 1amm]
… AND
PDB is a mess!
PDB files may have missing atoms and unsolved parts of structures, may not start from residue 1, may contain several atoms with the same number or several structures with the same chain ID, and are not always consecutively numbered...
For example, to automate things with Modeller, a correct alignment is needed... otherwise:
“Alignment sequence not found in PDB file”
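Several of these problems can be caught before modeling by scanning the residue numbering. A minimal sketch, assuming the (chain, residue number) pairs have already been extracted in file order, one entry per residue; insertion codes are ignored, which is a simplification. `numbering_issues` is a hypothetical helper, not part of Modeller.

```python
from typing import List, Tuple


def numbering_issues(residues: List[Tuple[str, int]]) -> List[str]:
    """Report chains not starting at 1, jumps in numbering and duplicated numbers."""
    issues = []
    seen = set()
    last = {}  # chain -> last residue number seen
    for chain, num in residues:
        if chain not in last and num != 1:
            issues.append(f"chain {chain} starts at residue {num}, not 1")
        if (chain, num) in seen:
            issues.append(f"duplicate residue number {num} in chain {chain}")
        elif chain in last and num != last[chain] + 1:
            issues.append(f"chain {chain} jumps from {last[chain]} to {num}")
        seen.add((chain, num))
        last[chain] = num
    return issues


for problem in numbering_issues([("A", 3), ("A", 4), ("A", 7), ("A", 7)]):
    print(problem)
```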
HOMOLOGY MODELING SERVERS
SWISS-MODEL - www.expasy.ch/swissmod/SWISS-MODEL.html
An automated comparative modelling server (ExPASy, CH)
CPHmodels - www.cbs.dtu.dk/services/CPHmodels/
Server using homology modelling (BioCentrum, Denmark)
SDSC1 - cl.sdsc.edu/hm.html
Protein structure homology modeling server (San Diego, USA)
3D-JIGSAW - www.bmm.icnet.uk/servers/3djigsaw/
Automated system for 3D models for proteins (Cancer Research UK)
WHATIF - www.cmbi.kun.nl/gv/servers/WIWWWI/
WHAT IF Web interface: homology modelling, drug docking, electrostatics calculations, structure validation and visualisation.
HOMOLOGY MODELING SERVERS
BIOTECH Validation Suite - biotech.ebi.ac.uk:8400/
An evaluation suite that uses three widely available validation programs (PROCHECK, PROVE and WHAT IF)
Verify3D - www.doe-mbi.ucla.edu/Services/Verify_3D/
A tool designed to help in the refinement of crystallographic structures. It also provides a visual analysis of model quality.
Loops Database - www.bmm.icnet.uk/loop/
A table of five protein loop classes. (Cancer Research UK)
Homology Modelling vs Fold Detection
[Figure: approach chosen by % sequence identity between the target sequence and the template (axis 0, 30, 100); 25% is the "twilight zone".
Fold detection: any sequence??; model quality: fold level.
Homology modelling: >= 30-50% identity with the template; model quality: atomic level.]
The best method of determining 3D structure is to base the model you make on a known structure.
If your sequence is sufficiently similar (>30-50% identity), you could generate an all-atom model by homology modelling.
THREADING, e.g. FFAS03, GenTHREADER
FOLD RECOGNITION, e.g. HMM
FOLD DETECTION: BLAST, FASTA
Fold recognition: distant or no clear homology.
Alignment of sequences to structures, as in THREADER (Jones et al. 1992).
CAPABLE OF DETECTING VERY DISTANT HOMOLOGY (WHEN SEQUENCE-BASED METHODS FAIL)
FOLD RECOGNITION
CAPABLE OF DETECTING VERY DISTANT HOMOLOGY (WHEN SEQUENCE-BASED METHODS FAIL)
FFAS03 example
Find out the real structure with prediction methods
FIT SEQUENCES INTO STRUCTURES AND FIND THE BEST MATCH
FOR A GOOD MATCH: HYDROPHOBIC RESIDUES BURIED AND POLAR RESIDUES EXPOSED
WE ASSUME THAT THE NATIVE PROTEIN CONFORMATION REALLY IS A FREE ENERGY MINIMUM!
However, some native conformations may only be of low energy because of prosthetic groups or unusual interactions.
FOLD RECOGNITION
BIOLOGIST’s APPROACH:
If sequence 1 is similar to sequence 2, then structure 1 is similar to structure 2, and there is probably an evolutionary explanation!
PHYSICIST’s APPROACH:
Proteins form structures according to fundamental rules that we call energies or free energies!
Quoted from: Protein Structure Prediction, Huber & Torda.
WHAT IS THREADING?
To fit a structure into a sequence!
QUERY TO STRUCTURE ALIGNMENT
[Figure: optimal and suboptimal alignments of the query onto the structural elements S1-S5 (sheet, helix)]
QUERY TO STRUCTURE ALIGNMENT I
[Figure: query sequence aligned to a structure template]
ALIGNMENT (threading): segments of the query sequence are covered by template blocks!
A threading is completely determined by the starting positions of the blocks.
QUERY TO STRUCTURE ALIGNMENT RULES
[Figure: query sequence and structure template]
The blocks preserve their order.
The blocks DO NOT OVERLAP.
There are NO GAPS within the blocks!
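Because a threading is fully determined by the block start positions, these three rules reduce to a simple check on the starts. A minimal sketch, assuming 0-based start positions in the query and fixed block lengths (a block is one contiguous run, so "no gaps inside blocks" holds by construction); `is_valid_threading` is a hypothetical name.

```python
from typing import Sequence


def is_valid_threading(starts: Sequence[int], block_lengths: Sequence[int], seq_len: int) -> bool:
    """Check that blocks keep their order, do not overlap and fit in the query."""
    if len(starts) != len(block_lengths):
        return False
    next_free = 0  # first query position not covered by the previous block
    for start, length in zip(starts, block_lengths):
        if start < next_free:  # block out of order or overlapping the previous one
            return False
        next_free = start + length  # contiguous block: no gaps inside
    return next_free <= seq_len


print(is_valid_threading([0, 5, 12], [4, 5, 3], 15))  # True
print(is_valid_threading([0, 3], [4, 2], 15))         # False: blocks overlap
```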
The General Principle I
1. Library of protein structures (fold library):
• all known structures
• a representative subset (sequence similarity filters)
• structural cores with loops removed
2. Binary alignment algorithm with a scoring function (contact potential, environments, others…)
Instead of aligning a sequence to a sequence, align strings of descriptors that represent 3D structural features.
Usual dynamic programming: the score matrix relates two amino acids.
Threading dynamic programming: relates amino acids to environments in the 3D structure.
3. Method for generating models via alignments
[Figure: a query sequence ALMVWTGH... aligned to the template]
[Figure: a graph with source S and sink T, one column per template block (i = 1 to 6) and one node per position (j = 1 to 4) in each column]
Each possible threading corresponds to a path from S to T in the graph, and vice versa.
The RED path corresponds to the threading (1,2,2,3,4,4).
The GREEN path corresponds to the threading (1,4,1,4,1,4).
THE KEY IS TO FIND THE SHORTEST PATH FROM S TO T = dynamic programming!!!
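FOLD RECOGNITION
DYNAMIC PROGRAMMING:
$F(i+1, l) = \min_{j=1,\dots,L} \left\{ F(i, j) + C_{i,j,i+1,l} \right\}$
where $F(i, j)$ is the cost of the best partial path from S that places block i at position j, $C_{i,j,i+1,l}$ is the cost of the edge from block i at position j to block i+1 at position l, and L is the number of positions.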
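The recurrence translates directly into a shortest-path computation over the block/position graph. A minimal Python sketch, assuming that only costs between consecutive blocks matter (as in the recurrence), that the edges from S and to T cost nothing, and that infeasible transitions are encoded by the cost function returning `math.inf`; `thread_dp` and the example cost are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def thread_dp(n_blocks: int, n_positions: int,
              cost: Callable[[int, int, int, int], float]) -> Tuple[float, List[int]]:
    """Minimise F(i+1, l) = min_j F(i, j) + C(i, j, i+1, l); indices are 1-based."""
    INF = math.inf
    # F[i][j]: best cost of a path from S placing block i at position j.
    F = [[INF] * (n_positions + 1) for _ in range(n_blocks + 1)]
    back = [[0] * (n_positions + 1) for _ in range(n_blocks + 1)]
    for j in range(1, n_positions + 1):
        F[1][j] = 0.0  # edges from S assumed free
    for i in range(1, n_blocks):
        for l in range(1, n_positions + 1):
            for j in range(1, n_positions + 1):
                c = F[i][j] + cost(i, j, i + 1, l)
                if c < F[i + 1][l]:
                    F[i + 1][l] = c
                    back[i + 1][l] = j
    # Edges to T assumed free: take the best node in the last column.
    best_l = min(range(1, n_positions + 1), key=lambda l: F[n_blocks][l])
    path = [best_l]
    for i in range(n_blocks, 1, -1):
        path.append(back[i][path[-1]])  # position of block i-1
    path.reverse()
    return F[n_blocks][best_l], path


def shift_cost(i, j, i_next, l):
    """Hypothetical cost: 0 if the next block sits one position later, else 1."""
    return 0.0 if l == j + 1 else 1.0


print(thread_dp(3, 4, shift_cost))  # (0.0, [1, 2, 3])
```

The three nested loops give O(blocks × positions²) time, which is what makes the exhaustive search over all paths unnecessary.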
FOLD RECOGNITION
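[Figure: the RED path (1,2,2,3,4,4) from S to T across blocks i = 1 to 6 and positions j = 1 to 4, scored as the sum of its edge costs $C_{1,1,2,2} + C_{2,2,3,2} + C_{3,2,4,3} + C_{4,3,5,4} + C_{5,4,6,4}$]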
What are the scoring functions in fold recognition?
• Pair potentials
• Solvation energy
• Consistency between real and predicted secondary structure and accessibility
• Structural environments
• Sequence profiles
F.R. by Threading: essential components
Approach 0: threading 1D predictions (and accessibility) into 3D structures; compatibility based on dynamic programming
e.g. TOPITS
Threading (Rost, 1995)
The predicted 1D structure profile is aligned by dynamic programming (MaxHom) to the 1D structures assigned by DSSP.
INPUT: secondary structure prediction
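The core idea of aligning a predicted 1D string against assigned 1D strings can be illustrated with an ordinary global alignment over a three-letter alphabet (H helix, E strand, L loop). This is a simplified sketch, not MaxHom or TOPITS: it ignores accessibility and profiles, and the match, mismatch and gap scores are hypothetical.

```python
def align_ss(pred: str, assigned: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Global (Needleman-Wunsch) alignment score of two secondary-structure strings."""
    n, m = len(pred), len(assigned)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap  # leading gaps in the assigned string
    for j in range(1, m + 1):
        S[0][j] = j * gap  # leading gaps in the predicted string
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = S[i - 1][j - 1] + (match if pred[i - 1] == assigned[j - 1] else mismatch)
            S[i][j] = max(diag, S[i - 1][j] + gap, S[i][j - 1] + gap)
    return S[n][m]


# Predicted states (e.g. from PHD) against DSSP-assigned states of a template.
print(align_ss("LHHHHLLEEEL", "LHHHHLLLEEEL"))
```

Scoring every library fold this way and ranking by score is the essence of threading with 1D predictions.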
Kelley et al., 2000: http://www.bmm.icnet.uk/~3dpssm
F.R. by Threading: essential components
Approach I: sequence-structure compatibility function based on pairwise potentials, e.g. Sippl
Count pairs of each residue type at different separations d (Jones, 1992; Sippl, 1995).
This is transformed into energies (Boltzmann principle):
$E(d) = -kT \ln f(d)$, where $f(d)$ is the frequency of interactions at separation d.
[Figure: pair counts as a function of d and the derived energy E as a function of d]
Caveat: the energy depends on inter-residue interactions: how do you know the positions of the residues?
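A sketch of turning pair counts into energies. The slide writes the energy as -kT ln(frequency); Sippl-style potentials normalise the observed frequency by a reference-state frequency, and that normalisation is an assumption made here. Keys are hypothetical (residue_a, residue_b, distance_bin) tuples, and kT = 1 so energies come out in units of kT.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable


def knowledge_based_energies(observed: Iterable[Hashable], reference: Iterable[Hashable],
                             kT: float = 1.0) -> Dict[Hashable, float]:
    """E(key) = -kT * ln(f_obs(key) / f_ref(key)) for every key seen in both sets."""
    obs, ref = Counter(observed), Counter(reference)
    n_obs, n_ref = sum(obs.values()), sum(ref.values())
    energies = {}
    for key, count in obs.items():
        if ref[key] == 0:
            continue  # no reference frequency: energy undefined
        f_obs = count / n_obs
        f_ref = ref[key] / n_ref
        energies[key] = -kT * math.log(f_obs / f_ref)
    return energies


# Over-represented pairs get favourable (negative) energies.
obs = [("A", "C", 1)] * 3 + [("A", "A", 1)]
ref = [("A", "C", 1)] * 2 + [("A", "A", 1)] * 2
print(knowledge_based_energies(obs, ref))
```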
Threading: Essential components
• structural template
• neighbour definition
• energy function
Pair energy table E_ab (excerpt):
      A    C    D    E  ...
A    -3   -1    0    0  ...
C    -1   -4    1    2  ...
D     0    1    5    6  ...
E     0    2    6    7  ...
$E = \sum_{i,j} E_{a_i b_j}$, summed over neighbouring positions i, j of the template.
[Figure: the sequence ACCECADAAC threaded onto a 10-position structural template]
ACCECADAAC: -3 -1 -4 -4 -1 -4 -3 -3 = -23
FOLD RECOGNITION
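The threading energy is just a sum of table look-ups over template neighbour pairs. A minimal sketch using the excerpt of the pair-energy table above (symmetric, residues A, C, D, E only); the contact list is hypothetical and is not the one drawn in the slide's figure.

```python
from typing import List, Tuple

# Excerpt of the pair-energy table from the slide, stored once per unordered pair.
PAIR_E = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}


def pair_energy(a: str, b: str) -> int:
    """Symmetric look-up: E_ab == E_ba."""
    return PAIR_E[(a, b)] if (a, b) in PAIR_E else PAIR_E[(b, a)]


def threading_energy(seq: str, contacts: List[Tuple[int, int]]) -> int:
    """Sum E_{a_i b_j} over neighbouring template positions (1-based) i, j."""
    return sum(pair_energy(seq[i - 1], seq[j - 1]) for i, j in contacts)


# Hypothetical neighbour list for a 10-position template.
contacts = [(1, 6), (2, 5), (3, 10), (4, 7), (1, 9)]
print(threading_energy("ACCECADAAC", contacts))  # -8
```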
SCORING FUNCTIONS: STRUCTURAL ENVIRONMENTS
• There are 18 environments.
• An environment is defined by:
- the buried area of the side chain
- the fraction of the side chain exposed to polar atoms
- the local secondary structure
• Scoring matrix: probabilities of each amino acid in each environment class.
• A 3D profile matrix is created for each fold in a benchmark.
• The target sequence is aligned with the 3D profile.
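One way to reach 18 environments is 6 burial/polarity classes times 3 secondary structure states, as in the 3D-profile approach. In this sketch the class labels (B1, B2, B3, P1, P2, E) follow that scheme, but all numeric cut-offs are placeholders chosen for illustration, not the published values; `environment_index`, `profile_score` and the score table are hypothetical.

```python
from typing import Dict, Sequence, Tuple

POLARITY_CLASSES = ["B1", "B2", "B3", "P1", "P2", "E"]  # buried, partially buried, exposed
SS_CLASSES = ["helix", "strand", "coil"]


def burial_polarity_class(buried_area: float, polar_fraction: float) -> str:
    """Assign a burial/polarity class. All thresholds are placeholders."""
    if buried_area < 40.0:          # placeholder: little of the side chain buried
        return "E"
    if buried_area < 114.0:         # placeholder: partially buried
        return "P1" if polar_fraction < 0.67 else "P2"
    if polar_fraction < 0.45:       # placeholder: buried, mostly apolar neighbours
        return "B1"
    return "B2" if polar_fraction < 0.58 else "B3"


def environment_index(buried_area: float, polar_fraction: float, ss: str) -> int:
    """Map a template position to one of 6 x 3 = 18 environment classes (0-17)."""
    cls = burial_polarity_class(buried_area, polar_fraction)
    return POLARITY_CLASSES.index(cls) * len(SS_CLASSES) + SS_CLASSES.index(ss)


def profile_score(sequence: str, env_indices: Sequence[int],
                  table: Dict[Tuple[str, int], float]) -> float:
    """Sum of amino-acid/environment scores over an ungapped sequence-to-structure alignment."""
    return sum(table[(aa, env)] for aa, env in zip(sequence, env_indices))
```

A full method would replace the ungapped sum with a dynamic-programming alignment of the target against the 3D profile.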
SOLVATION ENERGY
How likely is a certain amino acid to be buried?
Calculated as the ratio of the frequency of occurrence of an amino acid type at a specific degree of residue burial to the frequency of occurrence of all other amino acid types with this degree of burial.
Degree of burial: ratio between the residue's solvent-accessible surface area and its overall surface area.
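The definition above can be sketched as counting residue types per burial bin. This assumes the degree of burial is discretised into equal-width bins and that the frequencies are simple counts from a set of known structures; the bin count and the observations are hypothetical, and real potentials usually take a log of a normalised ratio.

```python
from collections import Counter
from typing import List, Tuple


def degree_of_burial(accessible_area: float, total_area: float) -> float:
    """Ratio of solvent-accessible surface area to overall surface area."""
    return accessible_area / total_area


def burial_bin(ratio: float, n_bins: int = 5) -> int:
    """Discretise a ratio in [0, 1] into n_bins equal-width bins."""
    return min(int(ratio * n_bins), n_bins - 1)


def solvation_ratio(observations: List[Tuple[str, int]], aa: str, bin_: int) -> float:
    """Occurrences of `aa` in a burial bin relative to all other residue types in that bin."""
    counts = Counter(observations)
    in_bin = sum(c for (_, b), c in counts.items() if b == bin_)
    this = counts[(aa, bin_)]
    others = in_bin - this
    return this / others if others else float("inf")


# Hypothetical (residue type, burial bin) observations from known structures.
obs = [("L", 0)] * 6 + [("K", 0)] * 2 + [("K", 4)] * 5
print(solvation_ratio(obs, "L", 0), solvation_ratio(obs, "K", 0))
```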
F.R. by Threading: essential components
Approach II: combination of sequence-sequence and sequence-structure comparisons, e.g. Jones
GenTHREADER (Jones, 1999, JMB 287:797-815)
- For each template, provide an MSA.
- Align the query sequence with the MSA.
* Assess the alignment by the sequence alignment score.
* Assess the alignment by pairwise potentials.
* Assess the alignment by the solvation function.
* Record the lengths of the alignment, the query and the template.
Essentials of GenTHREADER
Trained on 383 pairs: in each pair the fold is shared but the sequence similarity is low.
F.R. by Threading: essential components
Approach III: PROFILE METHODS, e.g. FFAS03 (Godzik, A)
Differences between profile-based methods (Rychlewski et al., 2000)
PSI-BLAST
Multiple alignment: 5 iterations with an e-value threshold of 10^-3
Profile: preclustering with 98% cutoff; pseudocounts based on variability estimation; background amino acid frequencies
Database: NR
PDB-BLAST
Multiple alignment: same as PSI-BLAST
Profile: same as PSI-BLAST
Database: PDB
BASIC
Multiple alignment: 2 PSI-BLAST iterations with a 0.1 e-value threshold
Profile: preclustering with 97% identity cutoff; amino acid composition filter; distant homologues have smaller weights
Database: profiles of proteins from PDB
FFAS/FFAS03
Multiple alignment: same as PSI-BLAST
Profile: preclustering with 97% identity cutoff; amino acid composition filter; sequence-diversity-based weights
Database: profiles of proteins from PDB
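All of these methods start by turning a multiple alignment into a position-specific profile. A deliberately simple sketch, assuming unweighted sequences and pseudocounts proportional to background amino acid frequencies; PSI-BLAST, BASIC and FFAS each use their own weighting, clustering and pseudocount schemes, which are not reproduced here. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_log_odds(column: str, background: Dict[str, float],
                    pseudo_weight: float = 1.0) -> Dict[str, float]:
    """Log-odds scores for one alignment column; gaps and unknown letters are ignored."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    n = sum(counts.values())
    scores = {}
    for a in AMINO_ACIDS:
        # Observed counts mixed with pseudocounts proportional to the background.
        p = (counts[a] + pseudo_weight * background[a]) / (n + pseudo_weight)
        scores[a] = math.log(p / background[a])
    return scores


def build_profile(msa: List[str], background: Dict[str, float],
                  pseudo_weight: float = 1.0) -> List[Dict[str, float]]:
    """One log-odds dictionary per column of an equal-length alignment."""
    return [column_log_odds("".join(col), background, pseudo_weight) for col in zip(*msa)]


uniform = {a: 1 / 20 for a in AMINO_ACIDS}
profile = build_profile(["AC", "AC", "AD", "AC"], uniform)
print(round(profile[0]["A"], 3), round(profile[0]["C"], 3))
```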
Baker & Sali, Science 2001.
OKAY! I’VE GOT A FOLD, NOW WHAT?
COMBINING ADDITIONAL INFORMATION
CASP: Critical Assessment of Techniques for Protein Structure Prediction
[Figure: CASP rounds in 1996, 1998, 2000, 2002 and 2004; venues: Asilomar, USA and Gaeta, Italy]
LARGE SCALE BENCHMARKING PROJECTS
EVA/LiveBench
http://maple.bioc.columbia.edu/eva/
LIVEBENCH
3D PREDICTIONS: STEPS
QUERY
• Locate the protein: BLAST. Any similarity with a known protein? BLAST against PDB.
• Any domains? Experimental data, PFAM/ProDom/InterPro, BLAST. Split into domains (dom1, dom2, ...).
• YES: search and align core, loops...; homology modeling servers (SWISS-MODEL/WHATIF) => 3D model.
• NO: 1D predictions (secondary structure/accessibility, hydrophobicity, transmembrane segments) and 2D predictions (contacts...); threading servers (3DPSSM, SAMT99, ...) => model 1, model 2, model 3, model 4...
• Additional information: biology, MedLine, Swiss-Prot... (active sites, mutants, functional domains, cofactors...); multiple alignment with BLAST + ClustalW + T-COFFEE (conserved positions, correlated mutations...).
• Visualization and model comparisons: Threadlize; structural classifications: FSSP, SCOP.
• 3D model (Cα only) => canonical side chain generation: MaxSprout => full 3D model.
• Model evaluation: ProSa, Biotech suite.
WHAT ABOUT PREDICTING INTERACTIONS?
[Figure: Ras, Ral and Rho families; Ran/RCC1 (by J.A. G-Ranea)]
Azuma et al., J. Mol. Biol. 1999
Mapping of mutants (side view)
[Figure: the model superposed on the complex, showing GDP, Mg++ and residues D44, H78, D128, E157, R206, H270, H304 and H410]
Green: Km; red: kcat.
SOME EXAMPLES:
PAAD DOMAIN (Rojas et al., 2003, Protein Science)
SPOC DOMAIN (Sanchez-Pulido et al., 2004, BMC Bioinformatics)
PAAD/DAPIN/PYRIN DOMAIN:
Prediction of binding sites
Pyrin, Aim (absent in melanoma), Asc (apoptosis-associated speck-like protein containing a caspase recruitment domain) and a Death domain-like (DD)
NACHT family: PAN/NALPs/DEFCAP/PYCARD, CATERPILLER (Tschopp et al., Nature, 2003)
PAAD family: MEFV/PYRIN (Pawlowski et al., 2001, and others)
WHERE IS THE PAAD DOMAIN?
BACKGROUND
DOMAIN ARCHITECTURES
[Figure: domain architectures of NAC, NALP2, MATER, CARD4, NOD2, NAIP, COS1.5, CLAN, PYRIN, ASC, ASC2, IFI16, MNDA, AIM2 and related proteins, built from PAAD, CARD, NACHT, LRR, BIR, caspase, B-box, zinc finger, SPRY and IF120X domains; some N-terminal domains are unassigned (?)]
Sensors!
They connect different pathways!
BACKGROUND
METHODS: PAAD OF MEFV
Phylogeny:
• Hits from PSI-BLAST, FFAS and saturated BLAST
• Removal of redundancy (splicing variants): 40 sequences
• Multiple alignment: T-Coffee
• Trees (Bayes, NJ, ME)
Modeling:
• Secondary structure prediction (metaserver)
• Pairwise FFAS
• Structural neighbours (SCOP)
• JACKAL: models
• Minimization: CHARMM
• PSQS evaluation; conserved surface patches: ConSurf
[Figure: PAAD, CARD, DD and DED descending from a common ancestral domain]
ANCESTRAL DOMAIN
Secondary structure prediction: helices 1 to 6
Hydrophobic core (solvent-accessible area < 10% of the maximum solvent-accessible area)
Helix 3 does not have core residues. In DD and others, helix 3 does not pack well.
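The core criterion is a relative-accessibility threshold. A minimal sketch, assuming per-residue accessible areas have already been computed (for example with DSSP or NACCESS) and that the caller supplies maximum accessible areas per residue type; the numbers in the example are hypothetical, not a published reference scale.

```python
from typing import Dict, List, Tuple


def core_residues(residues: List[Tuple[str, str, float]], max_asa: Dict[str, float],
                  cutoff: float = 0.10) -> List[str]:
    """Residues whose accessible area is below `cutoff` of their type's maximum.

    residues: (residue id, residue name, accessible area) tuples.
    max_asa: residue name -> maximum accessible area (reference scale supplied by caller).
    """
    return [rid for rid, name, asa in residues if asa / max_asa[name] < cutoff]


# Hypothetical reference values and accessibilities.
max_asa = {"LEU": 200.0, "LYS": 240.0}
residues = [("L10", "LEU", 10.0), ("K11", "LYS", 100.0), ("L14", "LEU", 25.0)]
print(core_residues(residues, max_asa))  # ['L10']
```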
Homology modeling of the PAAD domain (MEFV from mouse)
[Figure: two views of the model showing the N and C termini, helix H3 and the hydrophobic core]
[Figure: charged and hydrophobic surface patches in the pyrin and Pan2/NALP4 models (views rotated 180°); labelled residues: ILE40, PRO41, VAL51, MET45, LYS35, LYS39, ARG49, ARG42, LYS52, LYS48, ILE42, VAL47, ALA50, PRO43, TRP44]
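[Figure: charged surface patches in the AIM2 and IFI204 models (views rotated 90° and 180°): positively charged (convex) and negatively charged (concave) surfaces; labelled residues: GLU71, GLU70, GLU67, LYS64, ASP32, LYS76, GLU53, GLU54, LYS55, GLU20, ASP19, LYS23, LYS71, ARG67]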
PAAD is a six-helix (all-alpha) bundle.
Real structure: 1PN5
September 2003
Released October 2003
Helix 3 is disordered.
Binding patches were correctly predicted.
SUMMARY
# Helix 3 is disordered in DD/DED/CARD structures: it needs a partner interaction to fold properly. Confirmed later on by NMR (1UPC, 1PN5).
# PAAD/DAPIN is a vertebrate-specific domain.
# The PAADs from MEFV genes are the ancestral ones; successive duplications of the PAAD-PYR group yielded the mammalian pool.
# Viral PAADs might mimic the IFI/AIM family.
# Sequence identity, character and conserved patches are as divergent within PAAD as between PAAD and DED/DD/CARD => this suggests specialization to avoid "cross-talking".
# The binding interface contains at least 10 hydrophobic residues. By analogy with CARD domains, electrostatic forces are also important.
SPOC DOMAIN: A NOVEL DOMAIN ASSOCIATED WITH CANCER
METHODS: Selecting regions first!
Query sequence
• BLAST to nr/uniprot90
• BLAST to ESTs and unfinished genomes, TO ENRICH THE PROFILE!
• PROFILE BUILDING: multiple alignment (T-COFFEE, MUSCLE, etc.)
• HMMER/PSI-BLAST searches in Uniprot90
METHODS: HMMER strategy/intermediate searches
[Figure: intermediate searches linking the query to known domains ("Known", "Known!!!")]
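METHODS: HMMER ANALYSES III
[Figure: domain architecture of three isoforms (iso1: 614 aa, iso2: 1183 aa, iso3: 2256 aa) with coiled-coil (PHD prediction), NLS and SPOC regions; values shown: 0.083, 0.05]
SPOC: protein-protein interaction (Sanchez-Pulido et al., 2004)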
METHODS: HMMER ANALYSES III
SPOC: protein-protein interaction
Homology modeling
[Figure: models of iso2 and RBMF_HUMAN]
WHY IS THERE FULL R/Y CONSERVATION?
The co-activator of CREB-binding protein follows this structural schema (Xu et al., Science 2001), in which a critical interaction occurs between R600 and Y640. Methylation of the R causes a transcriptional switch!!
WHERE ELSE IS THIS FOUND?
ACKNOWLEDGEMENTS
LUIS SANCHEZ, MICHAEL TRESS, FLORENCIO PAZOS, … AND THE REST OF PDG