lecture 8: protein modeling · •case study –modelling the drug binding site of herg. why...

73
1 Lecture 8: Protein Modeling Junmei Wang Department of Pharmacology, University of Texas Southwestern Medical Center at Dallas [email protected]

Upload: others

Post on 09-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

1

Lecture 8: Protein Modeling

Junmei Wang

Department of Pharmacology, University of Texas

Southwestern Medical Center at Dallas

[email protected]

Page 2: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

2

Project 2: MD Simulations

1. Select a protein system

Protein Class PDB Code -logkd Resolution

Neuraminidase 2QWG 8.4 1.8

DHFR 1DHF 7.4 2.3

L-arabinose 1ABE 6.52 1.7

Thrombin 1A5G 10.15 2.06

Human oxresin receptor 1 4ZJ8 ~10 2.75

Page 3: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

3

Project 2 MD Simulations -Continued

2. Select top hits from autodock-vina screening • Top 2 and bottom 1

3. Prepare ligand structures and residue topologies• Add hydrogen with adt• Generate Gaussian gcrt file with antechamber• Run G09 to calculate electrostatic potentials (ESP)• Run Antechamber to assign RESP charges• An alternative is run antechamber to assign am1-

bcc charge

Page 4: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

4

Project 2 MD Simulations -Continued

4. Prepare topology files for minimization and MD simulations• xleap• tleap

5. Run Minimization and MD simulations using a delicate scheme• Minimization with main chain restrained using a set

of gradually reduced restraint force constants• MD simulation with main chain restrained using a

set of gradually reduced restraint force constants• Heat systems up using a set of temperatures• Equilibrium phase• Sampling phase

Page 5: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

5

Project 2 MD Simulations -Continued

6. Analyze MD snapshots• Average structure• RMSD ~ simulation time plots • Quasi-harmonic analysis • MD movie

Page 6: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

6

Project 3: Binding Free Energy Calculations With MM-PB/GBSA

1. Select a protein system

Protein Class PDB Code -logkd Resolution

Neuraminidase 2QWG 8.4 1.8

DHFR 1DHF 7.4 2.3

L-arabinose 1ABE 6.52 1.7

Thrombin 1A5G 10.15 2.06

Human oxresin receptor 1 4ZJ8 ~10 2.75

Page 7: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

7

Project 3: Binding Free Energy Calculations With MM-PB/GBSA

2. Prepare topologies for energy calculations with implicit solvent• Xleap• Tleap

3. Run mmpbsa.py to do the calculation• Input file• Output files

4. Analyze the MM-PB/GBSA results

Page 8: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Protein Modeling

Page 9: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Contents

• Introduce the process of homology modelling.

• Summarise the methods for predicting the structure from sequence.

• Describe the individual steps involved in creating and optimising a protein homology model.

• Outline the methods available to evaluate the quality of homology models.

• Case Study – Modelling the Drug binding site of hERG.

Page 10: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Why Homology Model?

• Solving protein structures is not trivial.

• There are currently ~1.8 million known protein coding sequences.

• But only ~120,000 protein structures in the PDB.

• Even so, many of these structures are duplicates.

• For Membrane Proteins structural data is even more sparse:

• There are currently 2829 membrane protein structures

RSCB Protein Data Bank (PDB)

Statistics (22/07/16)

Method Totals

X-ray 117790

NMR 11469

EM 1089

Other 198

Total 120642

www.rscb.org

Page 11: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Amino Acid Residues

• Proteins are made up of amino acids, which are interconnected by peptide bonds.

• There are 20 naturally occurring amino acids.

• Amino acids may be subdivided by their individual properties.

Page 12: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

DSSRRQYQEKYKQVEQYMSFHKLPADFRQKIHDYYEHRYQGKMFDEDSILGELNGPLREEIVNFNCR

KLVASMPLFANADPNFVTAMLTKLKFEVFQPGDYIIREGTIGKKMYFIQHGVVSVLTGNKEMKLSDG

SYFGEICLLTRGRRTASVRADTYCRLYSLSVDNFNEVLEEYPMMRRAFETYVAIDRLDRIGKKNSIL

From Sequence to Structure

SecondaryStructure

TertiaryStructure

QuaternaryStructure

Primary Structure – Amino Acid Sequence

What information can we get from a Sequence of amino acids?

Page 13: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Secondary Structure Prediction

• The Secondary Structure of Proteins is Defined by the DSSP algorithm.

• Amino acids classified as either α-helix (H), β-strand (S) or loop (C).

• It is possible to extract structural information from amino acid sequence.

• These prediction methods were initially proposed by Chou & Fasman in 1978.

• They used a statistical method based on 15 known crystal structures.

• Recent developments and an increase in structural information has improved these methods and they are currently ~80% accurate.

PSI-Pred: http://bioinf.cs.ucl.ac.uk/psipred/

JPred: http://www.compbio.dundee.ac.uk/~www-jpred/

Page 14: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Transmembrane Helix Prediction

• The amino acids at the centre of transmembrane helices are generally hydrophobic in nature.

• Analysis of Hydropathicity can be used to predict the number of membrane spanning helices.

• The analysis for the G-protein coupled receptor to the right suggests it has 7 TM helices.

• The example used the Kyte & Doolittle scale.

Hydropathy Plot

http://expasy.org/tools/protscale.html

Page 15: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

BLAST

• How to find an appropriate template Structure for homology modelling…

• Basic Local Alignment Search Tool

• Used to search protein databases:

• e.g. Non-redundant (nr) & SwissProtto find similar sequences.

• Protein Data Bank (PDB) to find structures with similar sequences.

• PSI- & PHI-blast are more advanced Blast methods.

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

Page 16: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

The Importance of Resolution

• In X-ray crystallography it is not always possible to flawlessly resolve the crystal density of the protein of interest.

• This results in a lower resolution structure.

• The lower the resolution the more likely the structure is wrong.

• The resolution of the template structure also reflects in the quality of the homology model.

high

low4 Å

2 Å

3 Å

1 Å

Page 17: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Sequence Alignment

Aligns the sequence(s) of interest to that of the template structure(s):

• Emboss may be used for two sequence, to generate a pairwise alignment & a percentage identity – ideally an identity of >50%:

http://www.ebi.ac.uk/emboss/align/

• T-Coffee, Clustal & MUSCLE are popular methods for multiple sequence alignment. All may be found at :

http://www.ebi.ac.uk/

• ESPRIPT is useful for formatting to creating black & white figures:

http://espript.ibcp.fr/

• Promals3d: Nick Grishin at UTSW

http://prodata.swmed.edu/promals3d/promals3d.php

Page 18: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Automated Homology Modelling

If you are lazy there are servers that do the modelling for you!

• Swiss Model : http://swissmodel.expasy.org//SWISS-MODEL.html

• Robetta : http://robetta.bakerlab.org/

• 3D Jigsaw : http://www.bmm.icnet.uk/servers/3djigsaw/

• Phyre : http://www.sbg.bio.ic.ac.uk/phyre/

• EsyPred3D : http://www.fundp.ac.be/sciences/biologie/urbm/bioinfo/esypred/

• CPHmodels : http://www.cbs.dtu.dk/services/CPHmodels/

Eva-CM performs continuous and automated analysis of comparative protein structure modeling servershttp://pdg.cnb.uam.es/eva/doc/intro_cm.html

Page 19: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Modeller

from modeller import *

from modeller.automodel import *

log.verbose()

env = environ()

env.io.atom_files_directory = './'

a = automodel(

env,

alnfile = 'herg.ali',

knowns = '1q5o',

sequence = 'herg'

)

a.starting_model= 1

a.ending_model = 1

a.make()

>P1;1q5o

structureX: 1q5o : 443 : A : 644 : A ::::

DSSRRQYQEKYKQVEQYMSFHKLPADFRQKIHDYYEHRYQ-GKMFDEDSILGELNGPLRE

EIVNFNCRKLVASMPLFANADPNFVTAMLTKLKFEVFQPGDYIIREGTIGKKMYFIQHGV

VSVLTKGNKEMKLSDGSYFGEICLL--TRGRRTASVRADTYCRLYSLSVDNFNEVLEEYP

MMRRAFETVAIDRLDRIGKKNSIL.*

>P1;herg

sequence: herg : 1 :::::::

YSGTARYHTQMLRVREFIRFHQIPNPLRQRLEEYFQHAWSYTNGIDMNAVLKGFPECLQA

DICLHLNRSLLQHCKPFRGATKGCLRALAMKFKTTHAPPGDTLVHAGDLLTALYFISRGS

IEILRGDVVVAILGKNDIFGEPLNLYARPGKSNGDVRALTYCDLHKIHRDDLLEVLDMYP

EFSDHFWSSLEITFNLRDTN-MIP.*

• Well regarded program for Homology/Comparative Modelling.

• Current Version 9.17 https://salilab.org/modeller/

• Requires an Input file, Sequence alignment & Template structure.

ATOM 1 N ASP A 443 -15.943 41.425 44.702 1.00 44.68

ATOM 2 CA ASP A 443 -15.424 42.618 45.447 1.00 43.15

ATOM 3 C ASP A 443 -14.310 43.306 44.686 1.00 41.81

ATOM 4 O ASP A 443 -14.298 44.528 44.539 1.00 42.61

etc...

Input File (*.py) Template Structure (*.pdb)

Sequence Alignment (*.ali)

Page 20: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

How Does it Work?

EnergyMinimisation

Amino acid Substitution

Template Structure Initial Model (*.ini) Output Model(s) (*.B999*)

Valine Glutamine Change inRotamer

Page 21: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Modeller : Output

• .log : log output from the run.

• .B* : model generated in the PDB format.

• .D* : progress of optimisation.

• .V* : violation profile.

• .ini : initial model that is generated.

• .rsr : restraints in user format.

• .sch : schedule file for the optimisation process.

Page 22: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Modeller Features & Restraints

• Secondary Structure.Regions of the protein may be forced to be α-helical or β-strand.

• Distance restraints.The distance between atoms may be restrained.

• Symmetry.Protein multimers can be restrained so that all monomers are identical.

• Disulphide Bridges.Two cysteine residues in the model can be forced to make a cystine bond.

• Ligands.Ions, waters and small molecules may be included from the template.

• Loop Refinement.Regions without secondary structure often require further refinement.

Page 23: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

An Iterative Process

Page 24: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Structural Convergence

• The catalytic triad of Serine, Aspartate and Histidine is found in certain protease enzymes. (a) Subtilisin (b) Chymotrypsin.

• However, the overall structure of the enzyme is often different.

• This is also important when considering ligand binding sites.

Page 25: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Modelling Ligand Interactions

• Small molecules, waters and ions can be retained from the template structure.

• It is possible to search for homologues based on the ligands they bind.

• Experimental data, especially mutagenesis is very useful when modelling ligand binding sites.

• Although the key residues may often remain, the overall structure of the protein may vary radically.

• The presence of the ligand is also likely to alter the conformation of the protein.

1ATN

1E4G

ATP Binding Site

Page 26: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Conformational States

• The backbone structure of the model will be almost identical to that of the template.

• Therefore the conformational state of the template will be retained in the resultant homology model.

• This is important when considering the open or closed conformation of a channel…

• … or the Apo versus bound state of a ligand binding site.

Closed

Open

Page 27: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Loop Modeling

Page 28: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Loop Modelling

Issues with Loop Modelling

• As loops are less restrained by hydrogen bonding networks they often have increased flexibility and therefore are less well defined.

• In addition the increased mobility make looped regions more difficult to structurally resolve.

• Proteins are often poorly conserved in loop regions.

• There are usually residue insertions or deletions within loops.

• Proline and Glycine resides are often found in loops – we’ll come back to this when discussing Model evaluation protocols.

Page 29: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Loop Modelling

• There are two main methods for modelling loops:

1. Knowledge based: A PDB search for fragments that match the sequence to be modelled (Levitt, Holm, Baker etc.).

2. Ab initio: A first principles approach to predict the fold of the loop, followed by minimisation steps.

• Many of the newer loop prediction methods use a combination of the two methods.

• These approaches are being developed into methods for computationally predicting the tertiary structure of proteins. eg Rosetta.

• But this is computationally expensive.

• Modeller creates an energy function to evaluate the loop’s quality.

• The function is then minimised by Monte Carlo (sampling), Conjugate Gradients (CG) or molecular dynamics (MD) techniques.

Page 30: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Loops – the Rosetta Method

Find fragments (10 per amino acid) with the same sequence and secondary structure profile as the query sequence.

Combine them using a Monte Carlo scheme to build the loop.

David Baker et al.

Page 31: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Predicting Sidechain Conformations

1. Networks of side chain contacts are important for retaining protein structure.

2. Sidechains may adopt a variety of different conformations, but this is dependent on the residue type.

For example a threonine generally adopts 3 conformations, whilst a lysine may adopt up to 81.

3. This is dependent backbone conformation of the residue.

4. The different residue conformations are known as rotamers.

5. Where a residue is conserved it is best to keep the side chain rotamer from the template than predict a new one.

6. Rotamer prediction accuracy is high for buried residues, but much lower for surface residues:– Side chains at the surface are more flexible.– Hydrophobic packing in the core is easier to handle than the electrostatic

interactions with water molecules. (cytoplasmic proteins)

7. Most successful method is SCWRL by Dunbrack et al.: http://dunbrack.fccc.edu/SCWRL3.php

Page 32: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Model Refinement

Page 33: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Refinement

Energy minimization Molecular dynamics

– Big errors like atom clashes can be removed, but force fields are not perfect and small errors will also be introduced – keep minimization to a minimum or matters will only get worse.

Page 34: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Error Recovery

If errors are introduced in the model, they normally can NOT be recovered at a later step

– The alignment can not make up for a bad choice of template.

– Loop modeling can not make up for a poor alignment.

If errors are discovered, the step where they were introduced should be redone.

Page 35: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Model Validation

Page 36: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Validation

1. Stereochemical checks on bond lengths, angles and atomic contacts Most programs will get the bond lengths and angles right

2. Ensures that the backbone conformation of the model is normal.

3. Evaluate the Ramachandran PlotThe Ramachandran plot of the model usually looks pretty much like the Ramachandran plot of the template

4. Check the inside/outside distributions of polar and apolar residues

5. Check if validate the known biological/biochemical data

– Active site residues

– Modification sites

– Interaction sites

Page 37: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Model Evaluation With Modeller

1. For every model, Modeller creates an objective function energy term, which is reported in the second line of the model PDB file (.B*).

This is not an absolute measure but can be used to rank models calculated from the same alignment. The lower the value the better.

2. DOPE scoring

3. A Cα-RMSD (Root Mean Standard Deviation) between the template structure and models can also be used to compare the final model to its template.

A good Cα-RMSD will be less than 2Å.

4. Modeller is good on the whole, but sometimes struggles with residues found in loops.

Page 38: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Ramachandran Plot

α-helix

β-strand

PsiDihedral

Angle

Phi Dihedral Angle

left-handedhelix

Peptidedihedralangles

Page 39: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Ramachandran Plot

• The results of the ramachandran plot will be very similar to that of the template.

• A Good template is therefore key!

• Most residues are mainly found on the left-hand side of the plot.

• Glycine is found more randomly within plot (orange), due to its small sidechain (H) preventing clashes with its backbone.

• Proline can only adopt a Phi angle of ~-60° (green) due to its sidechain.

• This also restricts the conformational space of the pre-proline residue.

N

Page 40: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Structure Validation

ProCheck: http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html

http://services.mbi.ucla.edu/PROCHECK/

WhatIf server :

http://swift.cmbi.kun.nl/WIWWWI/

ProQ :

http://www.sbc.su.se/~bjorn/ProQ

Biotech Validation Suite:

http://biotech.embl-ebi.ac.uk:8400/

RAMPAGE:

http://mordred.bioc.cam.ac.uk/~rapper/rampage.php

Page 41: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

+----------<<< P R O C H E C K S U M M A R Y >>>----------+

| |

| mgirk .pdb 2.5 104 residues |

| |

*| Ramachandran plot: 91.7% core 7.6% allow 0.3% gener 0.4% disall |

| |

*| All Ramachandrans: 15 labelled residues Backbone |

*| Chi1-chi2 plots: 6 labelled residues Sidechain |

| Main-chain params: 6 better 0 inside 0 worse |

| Side-chain params: 5 better 0 inside 0 worse |

| |

*| Residue properties: Max.deviation: 16.1 Bad contacts: 10 |

*| Bond len/angle: 8.0 Morris et al class: 1 1 3 |

| |

| G-factors Dihedrals: 0.10 Covalent: 0.29 Overall: 0.16 |

| |

| M/c bond lengths: 99.1% within limits 0.9% highlighted |

*| M/c bond angles: 98.1% within limits 1.9% highlighted |

| Planar groups: 100.0% within limits 0.0% highlighted |

| |

+----------------------------------------------------------------------------+

+ May be worth investigating further. * Worth investigating further.

PROCHECK

Page 42: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Validation – ProQ Server

ProQ is a neural network based predictor that based on a number of structural features predicts the quality of a protein model.

ProQ is optimized to find correct models in contrast to other methods which are optimized to find native structures.

Arne Elofssons group: http://www.sbc.su.se/~bjorn/ProQ/

Page 43: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

CASP

• Critical Assessment of Structure Prediction.

• A Biennial competition that has run since 1994.

• The next competition will be in 2008 (CASP8)

• http://predictioncenter.org/

• Its goal is to advance the methods for predicting protein structure from sequence.

• Protein structures yet to be published are used as blind targets for the prediction methods, with only sequence information released.

• Competitors may use Homology Modelling, Fold recognition or Ab Initio structural prediction methods to propose the structure of the protein.

Page 44: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Summary

• Homology Modelling is a valuable tool for structural biologists. It is important to take time when constructing a model.

• There are five main stages:

1. Identify an appropriate template structure(s).

2. Create a Sequence alignment.

3. Perform the homology modelling.

4. Analyse and Evaluate the quality of the model.

5. Refinement.

• Successful homology modelling depends on the following:

– Template quality

– Alignment (add biological information)

– Modelling program/procedure (use more than one)

• Always validate your final model!

Page 45: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Modeller: G5G8

Page 46: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

46

Protein Modeling With Modeller: An Example

Alignment 1. Promals3d Alignment2. Generate alignment file for Modeller

Python script for running Modeller 1. Model_generation.py

Model selection 1. DOPE score2. GA341 score

Page 47: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Rosetta: PDZ3

Page 48: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

48

Rosetta Protein Design (Rosetta 3.1 and up)

https://www.rosettacommons.org/

Basic procedure 1. Generate profile for the protein to be designed.

(make_fragments.sh)2. Idealize protein structure (design_ideal.sh)3. Fixed back bond design (design_fix.sh)4. Flexible back bond design (design_flex.sh)

Major parameters that control flexible backbone protein design

1. Residue Definition File2. Extended rotamer libraries : ex1, ex2, ex3

Page 49: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

49

Rosetta Protein Design: An Example

CommandRun_1be9.bat

Residue definition file1be9.resfile

Flag fileFlag

Model selection

Blue: X-Ray

Red: 1be9_0555

Score: -223.848

Magenta: 1be9_0278

Score: -215.518

Page 50: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

50

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Plot of CMR of 240 PDZ Sequences

GEEDIPREPRRIVIHRGSTGLGFNIIGGEDGEGIFISFILAGGPADLSGE

LRKGDQILSVNGVDLRNASHEQAAIALKNAGQTVTIIAQYKPEE

1 11 21 31 41

23-70

34-73

60-77

60-74

21-77

34-77

70-74

51-70

45-51

25-34

Page 51: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

51

Distribution of Rosetta Scores

Rosetta Score Distribution

0

5

10

15

20

25

30

35

-237.5 -232.5 -227.5 -222.5 -217.5 -212.5 -207.5 -202.5 -197.5 -192.5 -187.5 -182.5 -177.5 -172.5

Rosetta Score

Perc

en

t (%

)

Page 52: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

52

Statistical Coupling Analysis of 240 Rosetta Sequences

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

Page 53: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Modeling Reaction

Page 54: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

A Hybrid QM/MM Approach

The development of hybrid QM/MM approaches is guided by the general idea that

large chemical systems may be partitioned into an electronically important region

which requires a quantum chemical treatment and a remainder which only acts in a

perturbative fashion and thus admits a classical description.

Page 55: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

The QM/MM Modelling Approach

• Couple quantum mechanics and molecular mechanics approaches

• QM treatment of the active site

– reacting centre

– excited state processes (e.g. spectroscopy)

– problem structures (e.g. complex transition metal centre)

• Classical MM treatment of environment

– enzyme structure

– zeolite framework

– explicit solvent molecules

– bulky organometallic ligands

Page 56: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

QM/MM Methods

• Construct a Hamiltonian for the system consisting of a QM region and an MM region

/QM MM QM MMH H H H

• QM and MM regions interact mechanically and electronically (electrostatics, polarization)

• If bonds cross boundary between QM and MM region:

– Cap bonds of QM region with link atoms

– Use frozen or hybrid orbitals to terminate QM bonds

Page 57: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

The Simplest Hybrid QM/MM Model Hamiltonian for the molecular system in the Born-Oppenheimer approximation:

esChExternalofEffect

nuclei

i

esch

k ik

kielectrons

i

esch

k ik

knuclei

ij ij

jinuclei

i

nuclei

j

electrons

ij ij

electrons

iij

jelectrons

i

electrons

i

nuclei

ij ij

jinuclei

i

nuclei

j

electrons

ij ij

electrons

iij

jelectrons

i

electrons

i

R

QZ

R

Q

R

ZZ

rR

ZH

R

ZZ

rR

ZH

arg

argarg2

2

1

2

1

1

2

1

The main drawbacks of this simple QM/MM model are: • it is impossible to optimize the position of the QM part relative

to the external charges because QM nuclei will collapse on the negatively charged external charges.

• some MM atoms possess no charge and so would be invisible to the QM atoms

• the van der Waals terms on the MM atoms often provide the only difference in the interactions of one atom type versus another, i.e. chloride and bromide ions both have unit negative charge and only differ in their van der Waals terms.

“Standard” QM Hamiltonian

Page 58: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

A Hybrid QM/MM Model

So, it is quite reasonable to attribute the van der Waals parameters (as it is in the MM method) to every QM atom and the Hamiltonian describing the interaction between the QM and MM atoms can have a form:

The van der Waals term models also electronic repulsion and dispersion interactions, which do not exist between QM and MM atoms because MM atoms possess no explicit electrons.

A. Warshel, M. Levitt // Theoretical Studies of EnzymicReactions: Dielectric, Electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. // J.Mol.Biol. 103(1976), 227-49

nuclei

i

atomsMM

j ij

ij

ij

ijnuclei

i

atomsMM

j ij

jielectrons

i

atomsMM

j ij

j

MMQMR

B

R

A

R

QZ

r

QH

612/ˆ

Page 59: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

The Hybrid QM/MM Model

Now we can construct a “real” hybrid QM/MM Hamiltonian:

A “standard” MM force field can be used to determine the MMenergy. For example, AMBER-like force field has a form:

nuclei

i

atomsMM

j ij

ij

ij

ijnuclei

i

atomsMM

j ij

jielectrons

i

atomsMM

j ij

j

MMQMR

B

R

A

R

QZ

r

QH

612/ˆ

MMMMQMQM HHHH ˆˆˆˆ/

nonbonded ij

ji

ij

ij

ij

ij

dihedralsanglesbonds

bMM

R

qq

R

B

R

An

VKRRKH

6120

2

0 ))cos(1(2

)()(

nuclei

ij ij

jinuclei

i

nuclei

j

electrons

ij ij

electrons

iij

jelectrons

i

electrons

i

QM

R

ZZ

rR

ZH

1

2

1 2

Page 60: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Choice of QM method

... is a compromise between computational efficiency and practicality and the desired chemical accuracy.

The main advantage of semi-empirical QM methods is that their computational efficiency is orders of magnitude greater than either the density functional or ab initio methods

OP

OO

O

NOO

O

FF

F

NOO

Calculation times (in time units)

1800

1

36228

1

RHF/6-31G*

PM3

Page 61: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Hints for running QM/MM calculations

Choosing the QM region

There are no good universal rules here

One might want to have as large a QM region as possible

However, having more than 80-100 atoms in the QM region will lead to simulations that are very expensive.

for many features of conformational analysis, a good MM force field may be better than a semi-empirical or DFTB quantum description.

Page 62: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Hints for running QM/MM calculationsChoosing the QM region

Page 63: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

• At present all parts of the QM simulation are parallel except the density matrix build and the matrix diagonalisation.

• For small QM systems these two operations do not take a large percentage of time and so acceptable scaling can be seen to around 8 cpus.

• However, for large QM systems the matrix diagonalization time will dominate and so the scaling will not be as good.

Hints for running QM/MM calculationsParallel Simulations

Page 64: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

• Boundary through space (solute (QM) + solvent (MM))1. Unpolarized interaction: Solute (QM) + Solvent (MM) 2. Polarized QM/unpolarized MM3. Fully polarized interactions

• Boundary through bondLink atoms

Two Scenarios of Hybrid QM/MM

Page 65: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

• EVB attempts to combine empirical potential energy functions with valence bond ideas to describe chemical reactions efficiently and accurately.

• EVB starts with a NN potential energy matrix:

N diabatic states (diagonal)N (N-1) couplings (off-diagonal)

• Each diabatic state looks like a configuration in a standard non-reactive force field.

• Off-diagonal coupling elements: interaction between each diabatic state and the N-1 remaining states.

• Diagonalize V adiabatic states. The minimal value is the ground state.

Empirical valence bond: a method related to QM/MM

2

*

2

2

*

1

HOHHO

HOHHO

ba

ba

Page 66: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Basic Idea – to be continued

If two diabatic states …

Secular Equation

Overlap integral is neglected

2/1

2

12

2

22112211

222121

121211

2221

1211

2)(

2

1

0

VVV

VVV

EVESV

ESVEV

VV

VVV

Page 67: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Amber features new and significantly improved QM/MM support

The QM/MM facility supports gas phase, implicit solvent (GB) and periodic boundary (PME) simulations

Compared to earlier versions, the QM/MM implementation offers improved accuracy, energy conservation, and performance.

Amber QM/MM

Page 68: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Amber QM/MM Example

Page 69: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Amber QM/MM Example

Sample output

Page 70: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Summary of MMMS Couse

MM&MS Curriculum:

https://Mulan.swmed.edu/mmms/molecular_modelling.html

Page 71: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Application of MMMS in Biomedical Research

• Structure refinement

• Protein function

1. Mutagenesis

2. Dynamics

• Drug design

Page 72: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Basic Approaches of Free Energy Calculations

Zinc Database

(20 million cpds)

HTS Docking

(autodock)

Structure Refinement

(Minimization/MD)

Promising Hits

Binding Free Energy

Prediction

(MM/PB/GB-SA)

Candidates

to bioassay

Page 73: Lecture 8: Protein Modeling · •Case Study –Modelling the Drug binding site of hERG. Why Homology Model? • Solving protein structures is not trivial. • There are currently

Thank You For Your Attention

• Ajax Accounts (to be active until the end of 2016)

• Project Summary/Slides (in 4 weeks)

• TACC

• Computer for modeling