Post on 26-Jan-2020
Computational Molecular Biology
Protein Structure and Homology Modeling
Prof. Alejandro Giorgetti, Dr. Francesco Musiani
Friday, March 1, 13
Sequence, function and structure relationships
v Life is the ability to metabolize nutrients, respond to external stimuli, grow, reproduce and evolve
v From a chemical point of view, proteins are linear hetero-polymers formed by amino acids (aa)
v Proteins assume a 3D shape that is usually responsible for their function
v The tight link between structure, function and evolutionary pressure distinguishes proteins from ordinary polymers
Protein structure
v The sequence of amino acids is called the primary structure
v Secondary structure refers to local folding
v Tertiary structure is the arrangement of secondary elements in 3D
v Quaternary structure describes the arrangement of a protein's subunits
v The peptide bond is planar, and the dihedral angle it defines (ω) is almost always 180° (trans)
Protein structure
v What is a dihedral angle?
It is the angle between two planes. In practice, given four consecutively bonded atoms, to measure the dihedral angle around the central bond you orient the system so that the two central atoms are superimposed (i.e. you look down the central bond) and measure the resulting angle between the first and the last atom.
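The same angle can also be computed directly from the four atomic positions. The minimal Python sketch below assumes Cartesian coordinates given as plain tuples and uses a common atan2 formulation of the signed dihedral (range -180° to 180°) rather than the visual superposition procedure described above; the helper names are illustrative.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the central bond p1-p2."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Central bond along z; first and last atoms on opposite sides (trans).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0
```

A peptide bond in the usual trans configuration gives a value near 180°, as stated on the previous slide.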
Protein structure
v The simplest arrangement of amino acids is the α-helix, a right-handed spiral conformation.
v Each turn rises 5.4 Å along the helix axis (the pitch), i.e. about 1.5 Å per residue.
v There are 3.6 aa per turn.
v H-bond between the C=O of residue n and the N-H of residue n+4: O(n)···H-N(n+4)
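The helix numbers above are enough for simple geometric bookkeeping. A small Python sketch, assuming an ideal α-helix (3.6 residues per turn, 5.4 Å pitch) and 0-based residue indices; the function names are illustrative.

```python
RESIDUES_PER_TURN = 3.6
PITCH = 5.4  # Å per turn along the helix axis

RISE_PER_RESIDUE = PITCH / RESIDUES_PER_TURN  # about 1.5 Å

def helix_length(n_residues):
    """Approximate axial length (Å) of an ideal α-helix with n residues."""
    return n_residues * RISE_PER_RESIDUE

def backbone_hbonds(n_residues):
    """Pairs (i, j): C=O of residue i accepts an H-bond from N-H of residue j = i + 4."""
    return [(i, i + 4) for i in range(n_residues - 4)]

print(round(helix_length(18), 2))  # 27.0 (18 residues = 5 turns)
print(backbone_hbonds(7))          # [(0, 4), (1, 5), (2, 6)]
```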
Protein structure
v The beta sheet.
v The R groups of neighboring residues in a strand point in opposite directions.
v Beta sheets can be parallel or antiparallel.
Protein structure
Ramachandran plot: the pairs of backbone dihedral angles (φ, ψ) that do not cause the atoms of a dipeptide to collide.
Protein structure
[Figure: Ramachandran plot showing the regions of the right-handed α-helix, left-handed α-helix, parallel β-sheet, antiparallel β-sheet and collagen triple helix]
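To connect (φ, ψ) values with the regions of the plot, the Python sketch below assigns a pair of angles to the nearest of a few idealized conformations, using the (φ, ψ) values commonly quoted in textbooks. This nearest-point rule is a simplification assumed here for illustration: a real assignment would use the populated regions of the plot, not single points.

```python
# Idealized (phi, psi) values in degrees, as commonly quoted in textbooks.
CANONICAL = {
    "right-handed α-helix": (-57, -47),
    "left-handed α-helix": (57, 47),
    "parallel β-sheet": (-119, 113),
    "antiparallel β-sheet": (-139, 135),
    "collagen triple helix": (-51, 153),
}

def angular_diff(a, b):
    """Smallest signed difference between two angles, in degrees."""
    return (a - b + 180) % 360 - 180

def nearest_region(phi, psi):
    """Canonical conformation closest to (phi, psi) on the periodic plot."""
    return min(
        CANONICAL,
        key=lambda name: angular_diff(phi, CANONICAL[name][0]) ** 2
        + angular_diff(psi, CANONICAL[name][1]) ** 2,
    )

print(nearest_region(-60, -45))   # right-handed α-helix
print(nearest_region(-140, 130))  # antiparallel β-sheet
```

The modulo in `angular_diff` matters because the plot is periodic: φ = 179° and φ = -179° are only 2° apart.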
Protein structure
Loops: regions without repetitive structure that connect secondary structure elements.
Protein structure
Supersecondary elements (motifs): arrangements of two or three consecutive secondary structure elements that are present in many different protein structures, even with completely different sequences.
Protein structure
Domains: portions of the polypeptide chain that fold into compact, semi-independent units. The CATH classification organizes domains hierarchically:
v Class (C): derived from the secondary structure content; assigned automatically
v Architecture (A): describes the gross orientation of secondary structures, independent of connectivity
v Topology (T): clusters structures according to their topological connections and numbers of secondary structures
v Homologous superfamily (H)
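CATH labels each domain with a four-part code C.A.T.H, one number per level listed above (for example 3.40.50.300). The Python sketch below splits such a code into its nested levels; the class names are the four top-level CATH classes, while the helper name is hypothetical.

```python
CLASS_NAMES = {
    "1": "Mainly Alpha",
    "2": "Mainly Beta",
    "3": "Alpha Beta",
    "4": "Few Secondary Structures",
}

def cath_levels(code):
    """Split a CATH code such as '3.40.50.300' into its four nested levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers: C.A.T.H")
    return {
        "class": parts[0],
        "class name": CLASS_NAMES.get(parts[0], "unknown"),
        "architecture": ".".join(parts[:2]),
        "topology": ".".join(parts[:3]),
        "homologous superfamily": code,
    }

print(cath_levels("3.40.50.300")["topology"])  # 3.40.50
```

Each level's code is a prefix of the next one, which mirrors how the hierarchy nests.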
Gly: unusual Ramachandran plot (no side chain, so a wider range of φ/ψ is allowed); often found in turns
Ala: transient interactions
Cys: very reactive; coordinates metals
Thr, Ser: phosphorylation targets: protein kinases attach a phosphate group to the side chain
Thr: β-branched; more often found in β-sheets
The problem of protein folding
What is a protein fold?
v A compact, globular folding arrangement of the polypeptide chain
v The chain folds to optimize packing of the hydrophobic residues in the interior core of the protein
Thermodynamics: ΔG = ΔH – TΔS (i.e. the stability of a given conformation)
Enthalpy: electrostatics, dispersion, van der Waals, H-bonds.
Entropy: water molecules form “ordered cages” around hydrophobic amino acids. Folding buries these residues and breaks this order, which increases the entropy of the solvent.
The free energy of folding of a protein is of the order of a few kcal/mol.
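A worked example makes the "few kcal/mol" point concrete: two large terms nearly cancel. The Python sketch below uses hypothetical ΔH and ΔS values chosen for illustration (not measured data) and assumes a two-state folded/unfolded equilibrium, K = [folded]/[unfolded] = exp(-ΔG/RT).

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol·K)

def folding_free_energy(dH, dS, T):
    """ΔG = ΔH - TΔS for folding (kcal/mol); negative means folding is favorable."""
    return dH - T * dS

def fraction_folded(dG, T):
    """Two-state model: K = [folded]/[unfolded] = exp(-ΔG/RT)."""
    K = math.exp(-dG / (R * T))
    return K / (1 + K)

# Hypothetical values: a large favorable ΔH almost cancelled by an unfavorable -TΔS.
T = 300.0
dG = folding_free_energy(dH=-50.0, dS=-0.15, T=T)
print(round(dG, 2))                      # -5.0
print(round(fraction_folded(dG, T), 4))  # 0.9998
```

Even a ΔG of only -5 kcal/mol keeps almost all molecules folded, while a mutation costing a few kcal/mol can shift the balance.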
The problem of protein folding
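v Anfinsen’s dogma: (at least for small globular proteins) the native structure is determined only by the protein's amino acid sequence
v Levinthal paradox: because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations; yet proteins fold within microseconds to seconds, so folding cannot be a random search through all of them
v Funnel theory: the free-energy landscape of a protein is funnel-shaped and biased toward the native state, so the chain can reach it along many routes without an exhaustive search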
The problem of protein folding
[Figure: free-energy balance of folding on a − 0 + scale: ΔH (internal interactions), –TΔS (hydrophobic effects), –TΔS (conformational entropy), and the resulting ΔG of folding]
Evolution of protein structure
v What if a base-substitution event occurs in a protein-coding DNA region?
A. The fine balance between the gain and loss of free energy of folding is compromised: there is no single energy minimum → the protein does NOT FOLD
B. The energy landscape of the protein changes, but there is still a global energy minimum → the protein FOLDS, with the same or a similar function (i.e. local perturbations that do not affect the general shape or topology)
The “comparative modeling” principle
Evolutionary-based methods for protein structure prediction
v Proteins that evolved from a common ancestor maintain similar core 3D structures
We can use proteins of known structure (templates) to model proteins of unknown 3D structure (targets),
starting from the target sequence.
This can be done if the templates and the target are evolutionarily related.
Why Protein Structure Prediction?
We have an experimentally determined atomic structure for only ~1% of the known protein sequences
Why Protein Structure Prediction?
[Figure: growth in the number of unique folds per year in the PDB, based on the SCOP database, from 1986 to 2007]
Why?
v We can use homology modeling to predict the structure of proteins of unknown structure…
but also…
To reconstruct missing parts of an incomplete protein structure (common in low-resolution structures or for large mobile loops)
To model a mutant of a protein of known structure
To calculate the mean structure of an NMR ensemble
Homology modeling flowchart
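[Figure: flowchart. Query sequence → Search for suitable template(s) (sequence databases) → Align sequence with template(s) (template PDB structure(s)) → Calculate model(s) → Assess results → Model(s), with a Refinement (loops) step and the steps where errors are possible marked "Possible errors"]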
Homology modeling flowchart
http://salilab.org/modeller/
How does it work?
1. Align sequence with structures
v First, the template structures must be determined:
• Simplistically, try to align the target sequence against every known structure’s sequence.
• In practice, this is too slow, so heuristics are used (e.g. BLAST).
• Profile or HMM searches are generally more sensitive in difficult cases (Modeller’s profile.build method, PSI-BLAST or HHpred).
• Threading or other web servers could also be used.
v Remember to look at:
• Sequence identity/similarity between the putative template(s) and the target
• Experimental method, resolution and completeness of the template(s)
• Other compounds bound to the template(s)
• Oligomerization state
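Sequence identity is the first number to check when choosing a template. Definitions differ in the denominator (aligned columns, length of the shorter sequence, and so on); the minimal Python sketch below assumes identity is computed over aligned columns where neither sequence has a gap.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)

print(percent_identity("AC-DE", "ACFDQ"))  # 75.0
```

Because the denominator changes the result, identities reported by different programs are not always directly comparable.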
1. Align sequence with structures
v Alignment to templates:
• Sequence-sequence: relies purely on a matrix of observed residue-residue mutation probabilities (‘align’)
• Sequence-structure: gap insertion is penalized within secondary structure elements (helices etc.) (‘align2d’)
• Other features, profile-profile, and/or user-defined (‘salign’), or use an external program
v Remember:
• An error in the alignment is always a fatal error for the whole modeling procedure!
• “One amino acid sequence plays coy; a pair of homologous sequences whisper; many aligned sequences shout out loud” (A.M. Lesk, Introduction to Bioinformatics, 2002)
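A sequence-sequence alignment is usually computed by dynamic programming. The Python sketch below is a plain Needleman-Wunsch global alignment with hypothetical toy scores (match +1, mismatch -1, linear gap -2) standing in for the substitution matrix and affine gap penalties that real programs use; it is not Modeller's own alignment implementation.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + pair_score(i, j),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

A single misplaced gap in this traceback shifts every downstream residue onto the wrong template position, which is why alignment errors are fatal for the model.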
1. Align sequence with structures
v Evaluation of sequence alignment quality
E. Krieger, S.B. Nabuurs, G. Vriend: “Homology modeling”. In Structural Bioinformatics. P.E. Bourne and H. Weissig Eds. (2003).
2. Extract spatial restraints
v Spatial restraints incorporate homology information, statistical preferences, and physical knowledge:
• Template Cα-Cα internal distances
• Backbone dihedrals (φ/ψ)
• Side-chain dihedrals given the residue type of both target and template
• Force-field stereochemistry (bond, angle, dihedral)
• Statistical potentials
• Other experimental constraints
• Etc.
3. Satisfy spatial restraints
v Satisfaction of spatial restraints:
• Represent the system at appropriate level(s) of resolution (e.g. atoms, residues, domains, proteins)
• Convert each data source into spatial restraints (e.g. a harmonic distance restraint acts like a “spring”)
• Sum all restraints into a scoring function
• Generate models that are consistent with all restraints by optimizing the scoring function (e.g. conjugate gradients, molecular dynamics, Monte Carlo)
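The restraint idea can be shown on a toy system. The Python sketch below turns hypothetical template distances between three "atoms" into harmonic springs, sums them into one scoring function, and minimizes it by plain steepest descent, a simpler stand-in for the conjugate gradients, molecular dynamics or Monte Carlo named above. Coordinates, distances and step size are illustrative assumptions.

```python
import math

def objective(coords, restraints, k=1.0):
    """Sum of harmonic distance restraints: k * (d_ij - d0)^2."""
    return sum(k * (math.dist(coords[i], coords[j]) - d0) ** 2 for i, j, d0 in restraints)

def gradient(coords, restraints, k=1.0):
    """Analytical gradient of the objective with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, d0 in restraints:
        d = max(math.dist(coords[i], coords[j]), 1e-9)  # avoid division by zero
        factor = 2.0 * k * (d - d0) / d
        for axis in range(3):
            g = factor * (coords[i][axis] - coords[j][axis])
            grad[i][axis] += g
            grad[j][axis] -= g
    return grad

def satisfy(coords, restraints, step=0.02, n_steps=10000):
    """Steepest descent: repeatedly move every atom against the gradient."""
    coords = [list(c) for c in coords]
    for _ in range(n_steps):
        grad = gradient(coords, restraints)
        for c, g in zip(coords, grad):
            for axis in range(3):
                c[axis] -= step * g[axis]
    return coords

# Three atoms and the distances (Å) a hypothetical template suggests for them.
restraints = [(0, 1, 3.0), (0, 2, 4.0), (1, 2, 5.0)]
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
model = satisfy(start, restraints)
print(f"{objective(start, restraints):.2f} -> {objective(model, restraints):.2e}")
```

In real modeling the restraints come from many sources and usually cannot all be satisfied at once, so the optimum is a compromise rather than a zero score.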
3. Satisfy spatial restraints
v All information is combined into a single objective function:
• The force field (CHARMM 22) is simply added in
• The function is optimized by conjugate gradients and simulated annealing molecular dynamics, starting from the target sequence threaded onto the template structure(s)
• Building multiple models is generally recommended
• The ‘best’ model or cluster of models is chosen by simply taking the lowest objective function score, or by using a model assessment method such as Modeller’s own DOPE or GA341, or external programs such as PROSA or DFIRE
4. Assess results
v How do we know if the model is a good one?
• Check the log file for restraint violations and the Modeller score (molpdf) (not reliable, since the scoring function is not perfect!)
• Use another assessment score on the final model
Ø Statistical potentials: GA341, DOPE, QMEAN
Ø Other programs (e.g. Prosa, Verify3D)
• Use structure assessment programs (e.g. ProCheck)
• Fit the model to some other experimental data not used in the modeling procedure
Typical assessments
[Figure: DOPE profile; Ramachandran plot (ProCheck); PROSA profile]
Structural alignment
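[Figure: structural alignment of thioredoxins from humans (red) and the fly Drosophila melanogaster (yellow)]
Root-mean-square deviation (RMSD):
$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left\lVert \mathbf{x}_{i,k} - \mathbf{x}_{j,k} \right\rVert^{2}}$$
where x_{i,k} and x_{j,k} are the coordinate vectors of atom k in structures i and j, respectively, and N is the number of atoms compared in the two structures (after optimal superposition).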
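The RMSD formula translates directly into code. This Python sketch assumes the two structures are already optimally superimposed and that atoms are paired one-to-one in the same order; finding the superposition itself (e.g. with the Kabsch algorithm) is not shown.

```python
import math

def rmsd(coords_i, coords_j):
    """RMSD between two equally long lists of (x, y, z) tuples, assumed already superimposed."""
    if len(coords_i) != len(coords_j) or not coords_i:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_i, coords_j))
    return math.sqrt(total / len(coords_i))

print(rmsd([(0, 0, 0), (0, 0, 0)], [(1, 0, 0), (0, 1, 0)]))  # 1.0
```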
Typical errors in comparative models
Model Accuracy as a Function of Target-Template Sequence Identity
Model accuracy
Applications of protein structure models
Drug design, virtual screening
Docking, binding-site detection
Mutagenesis design, functional relationships
Topology recognition, family assignment
Overall fold
Model refining
v Loop optimization
• Often, there are parts of the sequence that have no detectable template
• “Mini folding problem”: these loops must be sampled to obtain improved conformations
• Database searches are only complete for loops of 4-6 residues
• Modeller uses a conformational search with a custom energy function optimized for loop modeling (a statistical potential derived from the PDB)
Ø Fiser/Melo protocol (‘loopmodel’)
Ø Newer DOPE + GB/SA protocol (‘dope_loopmodel’)
Model refining
v Accuracy of loop models as a function of the amount of optimization
Model refining
v Fraction of loops modeled with medium accuracy (< 2 Å)
Advanced topics
v Modeller can also:
• Perform more sensitive searches for templates (sequence-profile, profile-profile, similar to PSI-BLAST)
• Incorporate ligands, RNA/DNA and water molecules into the built models
• Build structures of multi-chain proteins (homo or hetero)
• Add extra restraints to the modeling process (such as known distances, e.g. from FRET)
• Use multiple templates to build a model
v Remember:
• You don’t have to use Modeller for template search, alignment, assessment or refinement. If you know your template (e.g. from BLAST), just format the alignment for Modeller and skip straight to the model-building step!
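"Format the alignment for Modeller" means writing it in PIR format. As we understand the format described in the Modeller manual, each entry is a ">P1;code" line, a header line of ten colon-separated fields (entry type such as structureX or sequence, code, first residue, first chain, last residue, last chain, then name, source, resolution and R-factor), and the aligned sequence terminated by "*". The Python helpers below are a hypothetical sketch that leaves the optional fields empty; check the manual before relying on them.

```python
def pir_entry(code, aligned_seq, entry_type="sequence", start="", start_chain="",
              end="", end_chain="", width=75):
    """Format one PIR entry: '>P1;code', a 10-field header line, then the sequence ending in '*'."""
    # Fields: type, code, first residue, first chain, last residue, last chain,
    # then name, source, resolution, R-factor (left empty in this sketch).
    header = [entry_type, code, str(start), start_chain, str(end), end_chain, "", "", "", ""]
    body = aligned_seq + "*"
    lines = [f">P1;{code}", ":".join(header)]
    lines += [body[k:k + width] for k in range(0, len(body), width)]  # wrap long sequences
    return "\n".join(lines)

def pir_alignment(template_code, template_seq, target_code, target_seq,
                  first_res, last_res, chain):
    """Template entry (structureX) followed by the target entry (sequence)."""
    if len(template_seq) != len(target_seq):
        raise ValueError("aligned sequences must have equal length; use '-' for gaps")
    template = pir_entry(template_code, template_seq, "structureX",
                         first_res, chain, last_res, chain)
    target = pir_entry(target_code, target_seq)
    return template + "\n\n" + target + "\n"

# Hypothetical template code, residue range and target name.
print(pir_alignment("1abc", "AC-DE", "target", "ACFDQ", 1, 4, "A"))
```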
Hidden Markov Models
A dishonest croupier could use a die that has a higher probability of landing on “6” (e.g. 50%). To avoid being caught, the croupier switches between a fair die and a loaded die with a certain frequency: for example, he can change the die from fair to loaded after 20 rolls and from loaded to fair after 10 rolls.
v Likelihood evaluation: given a series of emissions X1, X2, X3, ..., what is the probability that our model emitted the observed sequence?
v Alignment: given the sequence of observed emissions, what is the sequence of hidden states that most likely generated it?
v Training: how can we optimize the statistical parameters to maximize the probabilities in 1 and 2?
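The first two questions can be answered with the forward and Viterbi algorithms. The Python sketch below encodes the dishonest-croupier HMM under stated assumptions: "switch after 20 (10) rolls" is read as a 1/20 (1/10) switching probability per roll, the loaded die gives "6" with probability 0.5 and each other face 0.1, and both dice are equally likely at the start. Training (question 3) is not shown.

```python
import math

STATES = ("F", "L")  # F = fair die, L = loaded die
START = {"F": 0.5, "L": 0.5}  # assumption: equal initial probabilities
TRANS = {
    "F": {"F": 0.95, "L": 0.05},  # leave the fair state on average every 20 rolls
    "L": {"F": 0.10, "L": 0.90},  # leave the loaded state on average every 10 rolls
}
EMIT = {
    "F": {face: 1 / 6 for face in range(1, 7)},
    "L": {face: (0.5 if face == 6 else 0.1) for face in range(1, 7)},
}

def forward(rolls):
    """Likelihood evaluation: P(rolls | model), summed over all hidden paths."""
    alpha = {s: START[s] * EMIT[s][rolls[0]] for s in STATES}
    for x in rolls[1:]:
        alpha = {t: sum(alpha[s] * TRANS[s][t] for s in STATES) * EMIT[t][x]
                 for t in STATES}
    return sum(alpha.values())

def viterbi(rolls):
    """Alignment: most probable hidden-state path (log space avoids underflow)."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}
    backpointers = []
    for x in rolls[1:]:
        new_score, pointers = {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: score[s] + math.log(TRANS[s][t]))
            new_score[t] = score[best] + math.log(TRANS[best][t]) + math.log(EMIT[t][x])
            pointers[t] = best
        score = new_score
        backpointers.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

rolls = [1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 2]
print(forward(rolls))
print(viterbi(rolls))
```

Both algorithms fill a table column by column, like the dynamic programming used for sequence alignment.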
Hidden Markov Models: Protein Structural Bioinformatics
• In structure prediction, models can best be thought of as “sequence generators” (e.g. Hidden Markov Models) or “sequence classifiers” (e.g. Neural Networks)
v Likelihood evaluation: performed using dynamic programming algorithms (similar to the ones used in sequence alignment)
v Alignment: given a model and a (query) sequence, we want to determine the probability that the sequence was generated by the model along each of the possible paths
v Training: the model is ‘trained’ on alignments of protein families
Hidden Markov Models
v Described by
ü A set of possible states: match, insert, delete.
ü A set of possible observations: frequencies of aa in each position.
ü A transition probability matrix
ü An emission probability matrix (frequencies of aa occurring in a particular state).
ü Initial state probabilities.
HBA_human  ... W G K V G A - - H A G E ...
HBB_human  ... W G K V - - - - N V D E ...
MYG_phyca  ... W G K V E A - - D V A G ...
LGB2_luplu ... W K D F N A - - N I P K ...
GLB1_glydi ... W E E I A G A D N G A G ...

Profile p_j(a) for the alignment columns starting at V (headed by the HBA_human residues; the two gap-rich columns are left out). Amino acids whose row is all zeros (C, L, M, Q, R, S, T, W, Y) are omitted:

      V     G     A     H     A     G     E
A     0     0.25  0.75  0     0.2   0.4   0
D     0     0     0     0.2   0     0.2   0
E     0     0.25  0     0     0     0     0.4
F     0.2   0     0     0     0     0     0
G     0     0.25  0.25  0     0.2   0.2   0.4
H     0     0     0     0.2   0     0     0
I     0.2   0     0     0     0.2   0     0
K     0     0     0     0     0     0     0.2
N     0     0.25  0     0.6   0     0     0
P     0     0     0     0     0     0.2   0
V     0.6   0     0     0     0.4   0     0

Each column of the profile p_j(a) contains the amino acid frequencies in the multiple sequence alignment.
[Figure: profile matrix annotated with the master sequence]
Sequence profiles are a condensed representation of alignments
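Computing p_j(a) is a count per column. The Python sketch below ignores gaps and divides by the number of residues in the column; applied to the 4th to 6th columns of the globin alignment above, it reproduces the V/F/I, G/E/N/A and A/G frequencies of the profile. Real profile tools also add sequence weights and pseudocounts, which this sketch omits.

```python
from collections import Counter

def column_profile(column, gap="-"):
    """Amino acid frequencies p_j(a) in one alignment column, ignoring gaps."""
    residues = [r for r in column if r != gap]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in sorted(counts.items())}

def profile(alignment, gap="-"):
    """One frequency dictionary per alignment column."""
    return [column_profile(col, gap) for col in zip(*alignment)]

# Columns 4-6 of the globin alignment (HBA, HBB, MYG, LGB2, GLB1).
alignment = ["VGA", "V--", "VEA", "FNA", "IAG"]
for j, p in enumerate(profile(alignment), start=1):
    print(j, p)
```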
HBA_human  ... V G A . . H A G E Y ...
HBB_human  ... V - - . . N V D E V ...
MYG_phyca  ... V E A . . D V A G H ...
LGB2_luplu ... F N A . . N I P K H ...
GLB1_glydi ... I A G a d N G A G V ...
States:        M/D M/D M/D I I M/D M/D M/D M/D M/D
[Figure: for each Match or Delete column, the amino acid emission probabilities and the position-specific transition probabilities (e.g. M→D, D→D, M→I, I→I), i.e. the probabilities for insert open, insert extend, delete open and delete extend]
HMMs include position-specific gap penalties
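Position-specific gap penalties come from counting state transitions column by column. The Python sketch below assumes match columns are marked "x" and insert columns "." in a mask string (our own convention), assigns each sequence a path of M, D and I states, and turns the counts into frequencies without pseudocounts or begin/end states. On the globin alignment above it gives, for example, p(M→D) = 0.2 after the first column and p(D→D) = 1.0 after the second.

```python
from collections import Counter, defaultdict

GAPS = "-."

def state_paths(alignment, mask):
    """Per-sequence list of (state, position); mask marks match ('x') and insert ('.') columns."""
    paths = []
    for seq in alignment:
        path, k = [], 0
        for residue, column in zip(seq, mask):
            if column == "x":
                k += 1  # next match position
                path.append(("D" if residue in GAPS else "M", k))
            elif residue not in GAPS:
                path.append(("I", k))  # insertion after match position k
        paths.append(path)
    return paths

def transition_probs(alignment, mask):
    """Position-specific transition frequencies, keyed by (state, position)."""
    counts = defaultdict(Counter)
    for path in state_paths(alignment, mask):
        for (state, k), (next_state, _) in zip(path, path[1:]):
            counts[(state, k)][next_state] += 1
    return {key: {s: n / sum(c.values()) for s, n in c.items()} for key, c in counts.items()}

alignment = ["VGA..HAGEY", "V--..NVDEV", "VEA..DVAGH", "FNA..NIPKH", "IAGadNGAGV"]
mask = "xxx..xxxxx"
probs = transition_probs(alignment, mask)
print(probs[("M", 1)])  # {'M': 0.8, 'D': 0.2}
print(probs[("M", 3)])  # {'M': 0.75, 'I': 0.25}
```

Gap opening is cheap where the family already shows gaps and expensive elsewhere, which is what "position-specific" means here.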
[Figure: profile HMM p for the globin alignment, drawn as a chain of match (M) states, each with an associated delete (D) and insert (I) state]
Profile HMM can be represented as states connected by transitions
Probability that a sequence is emitted by an HMM rather than by a random model?
The probability of emitting the sequence x1, ..., xL along a given path through an HMM is P(x1, ..., xL | path).
This probability is the product of the amino acid emission probabilities for each state on the path and the transition probabilities between consecutive states.
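The product along a path is usually computed as a sum of logarithms and compared with a random model. The Python sketch below assumes a random model that emits each residue independently with background frequencies, treats states whose names start with "D" as silent, and leaves out begin/end transitions; the state names M1/M2 and all probabilities are toy values.

```python
import math

def path_log_odds(sequence, path, emissions, transitions, background):
    """log[P(sequence, path | HMM) / P(sequence | random model)] for one given path."""
    residues = iter(sequence)
    score = 0.0
    for i, state in enumerate(path):
        if i > 0:
            score += math.log(transitions[(path[i - 1], state)])
        if not state.startswith("D"):  # delete states emit nothing
            x = next(residues, None)
            if x is None:
                raise ValueError("path emits more residues than the sequence contains")
            score += math.log(emissions[state][x] / background[x])
    if next(residues, None) is not None:
        raise ValueError("path emits fewer residues than the sequence contains")
    return score

# Toy model: two match states, uniform background over the 20 amino acids.
emissions = {"M1": {"V": 0.6, "F": 0.2, "I": 0.2}, "M2": {"A": 0.75, "G": 0.25}}
transitions = {("M1", "M2"): 0.8}
background = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}
score = path_log_odds("VA", ["M1", "M2"], emissions, transitions, background)
print(f"{math.exp(score):.1f}")  # 144.0 = (0.6/0.05) * 0.8 * (0.75/0.05)
```

A positive log-odds means the sequence is better explained by the HMM than by the random model.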
[Figure: the profile HMM p stored as matrices: emission probabilities p_i(a) for each state and transition probabilities p_i(X→Y) (e.g. M→D, D→D, M→I, I→I)]
[Figure: alignment of two profile HMMs, p and q: a path through pairs of states (M, I, D) of HMM p and HMM q co-emits the sequence x1 ... x6]
Söding, J. (2005) Bioinformatics 21, 951-960.
Find the path through the two HMMs that maximizes the probability of co-emitting the same sequence.
Include a null model and maximize the “log-sum-of-odds score”.
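The basic ingredient of HMM-HMM comparison is a score for aligning one column of HMM p with one column of HMM q. As we understand it from the cited Söding (2005) paper, this is the log-sum-of-odds log Σ_a p(a) q(a) / f(a), where f(a) are the null-model (background) frequencies: the odds that both columns emit the same amino acid. The Python sketch below computes only this column score, not the full dynamic-programming alignment.

```python
import math

def column_score(p_col, q_col, background):
    """log sum_a p(a) q(a) / f(a): log-odds that two profile columns co-emit the same residue."""
    total = sum(p_col.get(a, 0.0) * q_col.get(a, 0.0) / f for a, f in background.items())
    return math.log(total) if total > 0 else float("-inf")

background = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}  # assumed uniform null model
print(round(column_score({"A": 1.0}, {"A": 1.0}, background), 3))  # 2.996 = log(20)
```

Two identical, fully conserved columns score high, while columns that share no amino acid can never co-emit and score minus infinity.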
Exercise
v Target:
• Human thioesterase 8: interacts with the HIV-1 Nef protein.
v Procedure:
• Search for templates using HHpred
• Prepare the Modeller input files
• Build the models
• Evaluate the model structure
v Materials and Methods:
• UniProt
• Modeller (http://salilab.org/modeller/)
• Modeller manual
• ProCheck web server (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html)
• Prosa web server (https://prosa.services.came.sbg.ac.at/prosa.php)
Profile method (fold recognition)
Principle: find a compatible fold.
From the structure we can analyze positions in terms of:
- Presence in a secondary structure element
- Percentage of solvent exposure
- Hydrophobic or polar environment
For each aa we can calculate its frequency in secondary structure elements, at the surface of the protein, in a hydrophobic environment, ...
Each aa is substituted by a letter (property): each structure is thus converted into a property sequence, not an aa sequence.
The PDB becomes a ‘property sequence’ database, so we just have to align the target against ‘property sequences’.
>Target Sequence XY
MSTLYEKLGGTTAVDLAVAAVA GAPAHKRDVLNQ
Build a model of the target protein based on each template structure.
Rank the models according to SCORE or ENERGY.
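A toy version of the profile method: convert each template into a property sequence, score the target against it, and rank the templates. The Python sketch below makes strong simplifying assumptions, all hypothetical: only two property letters (b = buried, e = exposed, with a 25% relative-accessibility cutoff), a fixed set of hydrophobic residues, a +1/-1 compatibility score, and an ungapped alignment of equal-length sequences. Real methods use richer environment classes, statistically derived scores and gapped alignments.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic residues

def property_sequence(rel_accessibility, cutoff=0.25):
    """Convert per-residue relative solvent accessibility into letters: b = buried, e = exposed."""
    return "".join("b" if acc < cutoff else "e" for acc in rel_accessibility)

def compatibility(target, props):
    """Ungapped score: +1 if a hydrophobic residue sits in a buried position
    or a polar residue in an exposed one, -1 otherwise."""
    if len(target) != len(props):
        raise ValueError("this ungapped sketch needs sequences of equal length")
    return sum(1 if (aa in HYDROPHOBIC) == (p == "b") else -1 for aa, p in zip(target, props))

def rank_templates(target, templates):
    """Template names sorted from most to least compatible with the target."""
    scores = {name: compatibility(target, property_sequence(acc)) for name, acc in templates.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical templates given as per-residue relative accessibilities.
templates = {"t1": [0.1, 0.8, 0.05], "t2": [0.9, 0.1, 0.9]}
print(rank_templates("LKV", templates))  # ['t1', 't2']
```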
Fold recognition
v Threading methods
Ø Statistical potentials
Ø Programs:
• Threader, mGenTHREADER
• Several approximations: the frozen approximation is used to accelerate calculations
• In the past used for remote homology assessment
• Now used in automatic projects for the structural prediction of the entire human genome
New folds
v ‘Ab initio modeling’ or de novo prediction
Ø Folding by statistical approaches: ‘very’ coarse-grained
Ø Force fields
Ø Fragment assemblies
Ø Structures with common structural motifs or supersecondary structures
Ø The relationship between local sequence and local structure is highly degenerate
Ø Programs: FRAGFOLD and Rosetta
Ø These approaches were a real breakthrough in the field
Ø New folds, difficult crystal structures, difficult modeling, protein design: see articles by David Baker.
Fragment Assembly
MSSPQAPEDGQGCGDRGDPPGDLRSVLVTTV
ROSETTA: 9-aa fragments; choose the 25 closest sequences; simulated annealing of dihedral angles
FRAGFOLD: supersecondary structure elements plus tri-, tetra- and pentapeptides; each fragment is energetically evaluated (statistical potential); random combination of fragments; optimization and assembly (statistical potential) by simulated annealing
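The common core of these methods is fragment insertion combined with simulated annealing. The Python sketch below is a toy illustration of that loop, not the Rosetta or FRAGFOLD implementation: the fragment library, the "native" angles and the energy (number of residues more than 30° off target) are all hypothetical stand-ins for real fragment libraries and statistical potentials. Moves replace the (φ, ψ) angles of one window with a library fragment and are accepted by the Metropolis criterion while the temperature decreases geometrically.

```python
import math
import random

def assemble(n_residues, library, energy, n_steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Fragment-insertion Monte Carlo with simulated annealing; returns (best conformation, energy)."""
    rng = random.Random(seed)
    conf = [(-60.0, -45.0)] * n_residues  # assumed start: all-helical
    current = energy(conf)
    best, best_e = list(conf), current
    windows = sorted(library)
    for step in range(n_steps):
        t = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))  # geometric cooling
        start = rng.choice(windows)
        fragment = rng.choice(library[start])
        trial = conf[:start] + list(fragment) + conf[start + len(fragment):]
        trial_e = energy(trial)
        # Metropolis criterion: always accept downhill moves, sometimes uphill ones.
        if trial_e <= current or rng.random() < math.exp((current - trial_e) / t):
            conf, current = trial, trial_e
            if current < best_e:
                best, best_e = list(conf), current
    return best, best_e

HELIX = [(-60.0, -45.0)] * 3
STRAND = [(-120.0, 130.0)] * 3
TARGET = HELIX + STRAND  # hypothetical "native" dihedral angles
LIBRARY = {0: [HELIX, STRAND], 3: [HELIX, STRAND]}  # hypothetical 3-residue fragments

def toy_energy(conf):
    """Number of residues more than 30° away from TARGET in phi or psi."""
    return sum(1 for (phi, psi), (tphi, tpsi) in zip(conf, TARGET)
               if abs(phi - tphi) > 30 or abs(psi - tpsi) > 30)

best, e = assemble(6, LIBRARY, toy_energy)
print(e)
```

Accepting occasional uphill moves at high temperature lets the search escape local minima; cooling then freezes it into a low-energy assembly.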