1
Molecular Modeling Methods & Ab Initio Protein Structure Prediction
By Haiyan Jiang
Oct. 16, 2006
2
About me
2003, Ph.D. in Computational Chemistry, University of Science and Technology of China
Research: New algorithms in molecular structure optimization
2004-2006, Postdoc, Computational Biology, Dalhousie University
Research: Protein loop structure and the evolution of protein domains
3
Publications
Haiyan Jiang, Christian Blouin, Ab Initio Construction of All-atom Loop
Conformations, Journal of Molecular Modeling, 2006, 12, 221-228.
Ferhan Siddiqi, Jennifer R. Bourque, Haiyan Jiang, Marieke Gardner, Martin
St. Maurice, Christian Blouin, and Stephen L. Bearne, Perturbing the
Hydrophobic Pocket of Mandelate Racemase to Probe Phenyl Motion During
Catalysis, Biochemistry, 2005, 44, 9013-9021. (Responsible for building the
simulation model and performing molecular dynamics study)
Yuhong Xiang, Haiyan Jiang, Wensheng Cai, and Xueguang Shao, An Efficient Method Based on Lattice Construction and the Genetic Algorithm for Optimization of Large Lennard-Jones Clusters, Journal of Physical Chemistry A, 2004, 108, 3586-3592.
Xueguang Shao, Haiyan Jiang, Wensheng Cai, Parallel Random Tunneling Algorithm for Structural Optimization of Lennard-Jones Clusters up to N=330, Journal of Chemical Information and Computer Sciences, 2004, 44, 193-199.
4
Publications
Haiyan Jiang, Wensheng Cai, Xueguang Shao, New Lowest Energy Sequence
of Marks’ Decahedral Lennard-Jones Clusters Containing up to 10000 atoms,
Journal of Physical Chemistry A, 2003, 107, 4238-4243.
Wensheng Cai, Haiyan Jiang, Xueguang Shao, Global Optimization of Lennard-Jones
Clusters by a Parallel Fast Annealing Evolutionary Algorithm, Journal of
Chemical Information and Computer Sciences, 2002, 42, 1099-1103.
Haiyan Jiang, Wensheng Cai, Xueguang Shao, A Random Tunneling Algorithm
for Structural Optimization Problem, Physical Chemistry Chemical Physics,
2002, 4, 4782-4788.
Xueguang Shao, Haiyan Jiang, Wensheng Cai, Advances in Biomolecular
Computing, Progress in Chemistry (in Chinese), 2002, 14, 37-46.
Haiyan Jiang, Longjiu Cheng, Wensheng Cai, Xueguang Shao, The Geometry
Optimization of Argon Atom Clusters Using a Parallel Genetic Algorithm,
Computers and Applied Chemistry (in Chinese), 2002, 19, 9-12.
5
Unpublished work
Haiyan Jiang, Christian Blouin, The Emergence of Protein Novel Fold and
Insertions: A Large Scale Structure-based Phylogenetic Study of Insertions in
SCOP Families, Protein Science, 2006. (under review)
6
Contents
Molecular modeling methods and applications in ab initio protein
structure prediction
Potential energy function
Energy Minimization
Monte Carlo
Molecular Dynamics
Ab initio protein loop modeling
Challenge
Recent progress
CLOOP
7
Molecular Modeling Methods
Molecular modeling methods are the theoretical methods and
computational techniques used to simulate the behavior of
molecules and molecular systems
Molecular Forcefields
Conformational Search methods
Energy Minimization
Molecular Dynamics
Monte Carlo simulation
Genetic Algorithm
8
Ab Initio Protein Structure Prediction
Ab initio protein structure prediction methods build protein 3D
structures from sequence based on physical principles.
Importance
Ab initio methods are important even though they are
computationally demanding.
Because they predict protein structure from physical models,
they are an indispensable complement to knowledge-based approaches.
E.g., knowledge-based approaches fail under the following conditions:
Structural homologues are not available
A new, as yet undiscovered fold may exist
9
Applications of MM in Ab Initio PSP
Basic idea
Anfinsen’s theory: Protein native structure corresponds to the
state with the lowest free energy of the protein-solvent system.
General procedures
Potential function
Evaluate the energy of protein conformation
Select native structure
Conformational search algorithm
To produce new conformations
Search the potential energy surface and locate the global minimum
(native conformation)
10
Protein Folding Funnel
[Figure: protein folding funnel. Labels: local minima; global minimum (native structure)]
11
Potential Functions for PSP
Potential function
Physics-based energy function
Empirical all-atom forcefields: CHARMM, AMBER, ECEPP-3,
GROMOS, OPLS
Parameterization: Quantum mechanical calculations, experimental
data
Simplified potential: UNRES (united residue)
Solvation energy
Implicit solvation model: Generalized Born (GB) model, surface
area based model
Explicit solvation model: TIP3P (computationally expensive)
12
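General Form of All-atom Forcefields
$$
V_{total} = \sum_{bonds} K_r (r - r_0)^2
 + \sum_{angles} K_\theta (\theta - \theta_0)^2
 + \sum_{dihedrals} K_\phi \left[ 1 + \cos(n\phi - \gamma) \right]
 + \sum_{Hbonds} \left[ \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right]
 + \sum_{vdW\ pairs} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right]
 + \sum_{electrostatic\ pairs} \frac{q_i q_j}{\varepsilon\, r_{ij}}
$$
Terms, in order: bond stretching term, angle bending term, dihedral term, H-bonding term, van der Waals term, electrostatic term.
[Figure: schematic of each term, labelled with r, Θ, Φ, the O···H hydrogen bond, and the + / − charges]
The non-bonded pair terms are the most time-demanding part.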
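To make the cost of the pairwise terms concrete, here is a minimal Python sketch of the van der Waals and electrostatic sums from the general force field form. It is an illustration only: the function name `nonbonded_energy`, the use of a single shared `A` and `B` for every pair, and the unit-free `dielectric` parameter are assumptions, not part of any real force field.

```python
import itertools
import math


def nonbonded_energy(coords, charges, A, B, dielectric=1.0):
    """Sum the 12-6 van der Waals and Coulomb terms over all atom pairs.

    coords  : list of (x, y, z) tuples
    charges : list of partial charges q_i
    A, B    : one repulsion/attraction coefficient shared by every pair
              (a simplification; real force fields tabulate A_ij and B_ij)
    """
    e_vdw = 0.0
    e_elec = 0.0
    # The loop visits N(N-1)/2 pairs, which is why these terms dominate
    # the cost of an energy evaluation for large systems.
    for i, j in itertools.combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        e_vdw += A / r**12 - B / r**6
        e_elec += charges[i] * charges[j] / (dielectric * r)
    return e_vdw, e_elec


# Two oppositely charged atoms one distance unit apart (arbitrary units).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
e_vdw, e_elec = nonbonded_energy(coords, [1.0, -1.0], A=1.0, B=2.0)
print(e_vdw, e_elec)
```

The bonded terms scale linearly with the number of atoms, whereas this double loop scales quadratically, which is the usual motivation for cutoffs and neighbor lists in production codes.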
13
Search Potential Energy Surface
We are interested in minimum points on Potential Energy Surface (PES)
Conformational search techniques
Energy Minimization
Monte Carlo
Molecular Dynamics
Others: Genetic Algorithm, Simulated Annealing
14
Energy Minimization
Methods
First-order minimization: Steepest descent, Conjugate gradient
minimization
Second derivative methods: Newton-Raphson method
Quasi-Newton methods: L-BFGS
Local minimum: minimization converges to a nearby local minimum, not necessarily the global one
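The point about local minima can be illustrated with a small steepest descent sketch in Python. The one-dimensional double-well `energy` function, the fixed `step` size, and the starting points are hypothetical choices made for illustration; a real minimizer would act on all atomic coordinates and use a line search.

```python
def energy(x):
    # Hypothetical double well: local minimum near x = +0.96,
    # global minimum near x = -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    # Analytical derivative of energy(x).
    return 4.0 * x * (x * x - 1.0) + 0.3


def steepest_descent(x, step=0.01, tol=1e-8, max_iter=100_000):
    """Follow the negative gradient until it (almost) vanishes."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


x_right = steepest_descent(1.5)    # ends in the local minimum
x_left = steepest_descent(-1.5)    # ends in the global minimum
print(x_right, energy(x_right))
print(x_left, energy(x_left))
```

Both runs converge, but only the one started on the left reaches the lower-energy well, which is why minimization alone is not a conformational search method.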
15
Monte Carlo
In molecular simulations, ‘Monte Carlo’ is an importance sampling technique.
1. Make a random move and produce a new conformation
2. Calculate the energy change ΔE for the new conformation
3. Accept or reject the move based on the Metropolis criterion
$$P = \exp\left(-\frac{\Delta E}{kT}\right)$$ (Boltzmann factor)
If ΔE < 0, then P > 1: accept the new conformation.
Otherwise: if P > rand(0,1), accept; else reject.
16
Monte Carlo
Monte Carlo (MC) algorithm
Generate initial structure R and calculate E(R);
Modify structure R to R’ and calculate E(R’);
Calculate ΔE = E(R’) − E(R);
IF ΔE < 0, then R ← R’;
ELSE
    Generate random number RAND = rand(0,1);
    IF exp(−ΔE/kT) > RAND, then R ← R’;
    ENDIF
ENDIF
Repeat for N steps;
Monte Carlo Minimization (MCM) algorithm
Parallel Replica Exchange Monte Carlo algorithm
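The Metropolis Monte Carlo loop above can be sketched in Python as follows. This is a toy illustration under stated assumptions: the "structure" is a single number, and `double_well`, `random_step`, the temperature `kT`, and the step count are hypothetical stand-ins for a real energy function and move set.

```python
import math
import random


def metropolis_mc(energy, propose, r0, n_steps, kT, rng):
    """Run n_steps of Metropolis Monte Carlo starting from structure r0."""
    r, e = r0, energy(r0)
    best_r, best_e = r, e
    for _ in range(n_steps):
        r_new = propose(r, rng)      # modify R to R'
        e_new = energy(r_new)
        delta_e = e_new - e          # dE = E(R') - E(R)
        # Downhill moves are always accepted; uphill moves are accepted
        # with probability exp(-dE / kT).
        if delta_e < 0 or math.exp(-delta_e / kT) > rng.random():
            r, e = r_new, e_new
            if e < best_e:
                best_r, best_e = r, e
    return best_r, best_e


def double_well(x):
    # Hypothetical 1D energy: local minimum near +1, global minimum near -1.
    return (x * x - 1.0) ** 2 + 0.3 * x


def random_step(x, rng):
    # Random move: shift the coordinate by up to 0.5 in either direction.
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = metropolis_mc(double_well, random_step, r0=1.0,
                               n_steps=20000, kT=0.5,
                               rng=random.Random(0))
print(best_x, best_e)
```

Because uphill moves are sometimes accepted, the walk started in the local minimum can cross the barrier and visit the lower well, which a pure minimizer cannot do.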
17
Molecular Dynamics
Molecular Dynamics (MD)
MD simulates the movements of all the particles in a molecular system by
iteratively solving Newton’s equations of motion.
MC is like viewing many frozen butterflies in a museum; MD is like watching the butterfly fly.
18
Molecular Dynamics
Algorithm
For atom i, Newton’s equation of motion is given by
$$\mathbf{F}_i = m_i \mathbf{a}_i \qquad (1)$$
$$\mathbf{F}_i(t) = m_i \frac{d^2 \mathbf{r}_i(t)}{dt^2} \qquad (2)$$
Here, r_i and m_i represent the position and mass of atom i and F_i(t) is
the force on atom i at time t. F_i(t) can also be expressed as the
negative gradient of the potential energy
$$\mathbf{F}_i = -\nabla_i V \qquad (3)$$
V is the potential energy. Newton’s equation of motion can then relate
the derivative of the potential energy to the changes in position as a
function of time.
$$-\nabla_i V = m_i \frac{d^2 \mathbf{r}_i(t)}{dt^2} \qquad (4)$$
19
Molecular Dynamics
Algorithm (continued)
To obtain the trajectory of each atom, numerous numerical algorithms
have been developed for integrating the equations of motion (e.g. the Verlet algorithm
and the leap-frog algorithm).
Verlet algorithm
The algorithm uses the positions and accelerations at time t, and the positions
from the previous step, r(t − δt), to calculate the new positions r(t + δt):
$$\mathbf{r}(t + \delta t) = 2\,\mathbf{r}(t) - \mathbf{r}(t - \delta t) + \delta t^2\, \mathbf{a}(t)$$
Selection of time step
The time step is approximately one order of magnitude smaller than the period of the fastest
motion.
Hydrogen vibration ~ 10 fs (1 fs = 10^-15 s), so time step = 1 fs
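As a rough illustration of the Verlet update, the sketch below integrates a one-dimensional harmonic oscillator in Python. The function name `verlet`, the Taylor-expansion bootstrap for the first previous position, and the oscillator test case are assumptions chosen for the example; an MD code applies the same update to every atomic coordinate with forces from the force field.

```python
import math


def verlet(accel, r0, v0, dt, n_steps):
    """Integrate r'' = accel(r) with the position Verlet scheme."""
    # Verlet needs r(t - dt); estimate it once from a Taylor expansion.
    r_prev = r0 - v0 * dt + 0.5 * accel(r0) * dt * dt
    r = r0
    trajectory = [r0]
    for _ in range(n_steps):
        # r(t + dt) = 2 r(t) - r(t - dt) + dt^2 a(t)
        r_next = 2.0 * r - r_prev + accel(r) * dt * dt
        r_prev, r = r, r_next
        trajectory.append(r)
    return trajectory


# Harmonic oscillator with unit mass and force constant: a(r) = -r.
# The exact solution for r0 = 1, v0 = 0 is r(t) = cos(t).
dt = 0.01
traj = verlet(lambda r: -r, r0=1.0, v0=0.0, dt=dt, n_steps=628)
print(traj[-1], math.cos(628 * dt))
```

With a time step about two orders of magnitude below the oscillation period the numerical and exact positions agree closely; increasing `dt` towards the period makes the integration inaccurate and eventually unstable, which is the reason for the time step rule above.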
20
Molecular Dynamics
MD Software
CHARMM (Chemistry at HARvard Molecular Mechanics) is a program for
macromolecular simulations, including energy minimization, molecular
dynamics and Monte Carlo simulations.
NAMD is a parallel molecular dynamics code designed for high-performance
simulation of large biomolecular systems.
http://www.ks.uiuc.edu/Research/namd/
Application in PSP
Advantage: Deterministic; Provide details of the folding process
Limitation: Protein folding takes place on the millisecond time scale, which is at
the limit of accessible simulation times.
It is still difficult to simulate the whole folding process of a protein using
the conventional MD method.
21
Time Scales of Protein Motions and MD
[Figure: time axis from 10^-15 s (fs) through 10^-12 s (ps), 10^-9 s (ns), 10^-6 s (μs) and 10^-3 s (ms) to 10^0 s, with the MD time scale marked. Motions placed along the axis: bond stretching, elastic vibrations of proteins, α-helix folding, β-hairpin folding, protein folding]
22
MD is fun!
A small protein folding movie: simulated with NAMD/VMD
23
Other Conformational Search Algorithms
Global optimization algorithms
“Optimization” refers to trying to find the global energy minimum
of a potential surface.
Genetic Algorithm (GA)
Simulated Annealing (SA)
Tabu Search (TS)
Ant Colony Optimization (ACO)
A model system: Lennard-Jones clusters
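For readers unfamiliar with the model system, a Lennard-Jones cluster energy can be sketched in a few lines of Python. The sketch assumes the standard reduced-unit pair potential 4ε[(σ/r)^12 − (σ/r)^6]; the function name `lj_energy` and the example geometries are illustrative choices, not taken from the cited papers.

```python
import itertools
import math


def lj_energy(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy of a cluster, summed over all atom pairs."""
    total = 0.0
    for a, b in itertools.combinations(coords, 2):
        sr6 = (sigma / math.dist(a, b)) ** 6
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total


# The pair potential has its minimum (-epsilon) at r = 2^(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
# Equilateral triangle with all three pair distances equal to r_min.
trimer = dimer + [(r_min / 2.0, r_min * math.sqrt(3.0) / 2.0, 0.0)]
print(lj_energy(dimer), lj_energy(trimer))
```

A global optimization algorithm treats `lj_energy` as the objective and searches over the 3N coordinates; the number of local minima grows very quickly with cluster size, which is what makes these clusters a demanding benchmark.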
24
Applications of MM methods in PSP
Application in PSP
Combination of several conformational search techniques
Recent developments
Simplified force field: united residue force field
Segment assembly
Secondary structure prediction is quite reliable, so conformations can be
produced by assembling the segments
Ab initio PSP software
Rosetta is a five-stage fragment insertion Metropolis Monte Carlo method
ASTRO-FOLD is a combination of the deterministic αBB global optimization
algorithm and a Molecular Dynamics approach in torsion angle space
LINUS uses a Metropolis Monte Carlo algorithm and a simplified physics-based
force field
25
ASTRO-FOLD
26
References
Hardin C, et al. Ab initio protein structure prediction. Curr Opin
Struct Biol. 2002, 12(2):176-81.
Floudas CA, et al. Advances in protein structure prediction and de
novo protein design: A review. Chemical Engineering Science, 2006,
61: 966-988.
Klepeis JL, Floudas CA. ASTRO-FOLD: a combinatorial and global
optimization framework for ab initio prediction of three dimensional
structures of proteins from the amino acid sequence. Biophysical
Journal, 2003, 85: 2119-2146.
27
Ab Initio Protein Loop Prediction
Protein loop
Protein loops are polypeptides
connecting more rigid structural
elements of proteins like helices and strands.
Challenge in Loop Structure Prediction
Loops are important to protein folding and protein function even
though they are small, usually <20 residues
Loops exhibit greater structural variability than helices and strands
Loop prediction is often a limiting factor for fold recognition methods
28
Ab Initio Protein Loop Prediction
Ab initio methods have recently received increased attention in the prediction of protein loop structures
Potential energy function
Molecular mechanics force fields usually perform better than statistical potentials in protein loop modeling.
Recent progress
Dihedral angle sampling
Clustering
Select representative structures from ensembles
29
Ab Initio Loop Prediction Methods
Loopy
Random tweak
Colony energy
Fiser’s method
MM methods:
Physical energy function
Energy Minimization + MD + SA
Forrest & Woolf
Predicts membrane protein loops
MM methods: MC + MD
Review:
Floudas C.A. et al., Advances in protein structure prediction and de novo protein
design: A review, Chemical Engineering Science, 2006, vol. 61, 966-988.
30
CLOOP: Ab Initio Loop Modeling Method
CLOOP builds an all-atom ensemble of protein loop conformations (it
is not a true protein loop prediction method)
Paper
Haiyan Jiang, Christian Blouin, Ab Initio Construction of All-atom Loop
Conformations, Journal of Molecular Modeling, 2006, 12, 221-228.
CLOOP methods
Energy function: CHARMM
Dihedral sampling
Potential smoothing technique
The designed minimization (DM) strategy
Divided loop conformation construction
31
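The Energy Function of the CHARMM Forcefield
CHARMM
$$E_{CHARMM} = E_{bonds} + E_{UB} + E_{angle} + E_{dihe} + E_{imp} + E_{vdw} + E_{elec}$$
$$E_{bonds} = \sum_{bonds} k_b (b - b_0)^2$$
$$E_{UB} = \sum_{UB} k_{UB} (S - S_0)^2$$
$$E_{angle} = \sum_{angle} k_\theta (\theta - \theta_0)^2$$
$$E_{dihe} = \sum_{dihe} k_\phi \left( 1 + \cos(n\phi - \delta) \right)$$
$$E_{imp} = \sum_{imp} k_{imp} (\omega - \omega_0)^2$$
$$E_{vdw} = \sum_{nonbond} \varepsilon_{ij} \left[ \left( \frac{R_{min,ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{R_{min,ij}}{r_{ij}} \right)^{6} \right]$$
$$E_{elec} = \sum_{nonbond} \frac{q_i q_j}{4 \pi \varepsilon_0 r_{ij}}$$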
32
CLOOP
Dihedral sampling
Loop main-chain dihedrals φ and ψ are generated by sampling main-chain
dihedral angles from a restrained φ/ψ set.
The restrained dihedral range has 11 pairs of φ/ψ dihedral sub-ranges.
It was obtained by adding a 100 degree variation to each
state of the 11-state φ/ψ set developed by Moult and James for loop
modeling.
Side chain conformations are built randomly.
33
CLOOP
Potential smoothing technique
A soft core potential provided in the CHARMM software package
was applied to smooth non-bonded interactions
$$
E_{nonbond} =
\begin{cases}
E^{CHARMM}_{nonbond}(r), & r \ge r_{soft} \\
k\,(r_{soft} - r) + E^{CHARMM}_{nonbond}(r_{soft}), & r < r_{soft}
\end{cases}
$$
r_soft is the switching distance for the soft core potential
r is the distance between the two interacting atoms
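The effect of the soft core can be seen in a short Python sketch. It assumes that below the switching distance the potential continues linearly from its value at r_soft, and it uses a CHARMM-style 12-6 term as the underlying potential; the function names and the values of `k` and `r_soft` are hypothetical and are not the parameters used in CLOOP or CHARMM.

```python
def lj(r, epsilon=1.0, r_min=1.0):
    """CHARMM-style 12-6 term: eps * [(Rmin/r)^12 - 2 (Rmin/r)^6]."""
    x6 = (r_min / r) ** 6
    return epsilon * (x6 * x6 - 2.0 * x6)


def soft_core(r, r_soft, k, potential=lj):
    """Use the full potential beyond r_soft and a linear ramp below it."""
    if r >= r_soft:
        return potential(r)
    # Below the switching distance the energy grows only linearly,
    # so overlapping atoms no longer produce huge energies and forces.
    return k * (r_soft - r) + potential(r_soft)


for r in (0.1, 0.5, 0.8, 1.0, 1.5):
    print(r, lj(r), soft_core(r, r_soft=0.8, k=10.0))
```

At a separation of 0.1 the unmodified term is of the order of 10^12, while the softened one stays of order 10, so randomly built conformations with steric clashes can still be relaxed by minimization.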
34
CLOOP
The designed minimization (DM) strategy
Minimization methods:
steepest descent, conjugate gradient, and adopted basis
Newton-Raphson minimization
Two stages:
1. The internal energy terms of the loop conformations (bond, angle,
dihedral, and improper) are minimized.
2. The candidates are further minimized with the full CHARMM
energy function, including the van der Waals and electrostatic energy
terms.
35
CLOOP
Divided loop conformation construction
Generate position of middle residue
Build initial conformation of main chain with dihedral sampling
Build side chain conformation
Run DM and produce closed loop conformation
36
CLOOP
Performance of CLOOP
CLOOP was applied to construct the conformations of 4, 8, and 12 residue long loops in Fiser’s loop test set. The average main-chain root mean square deviations (RMSD) obtained in 1000 trials for the 10 different loops of each size are 0.33, 1.27 and 2.77 Å, respectively.
The performance of CLOOP was investigated in two ways: calculating the loop energy with a buffer region, and calculating it for the loop only. The buffer region extended up to 10 Å around the loop atoms. In energy minimization, only the loop atoms were allowed to move and all non-loop atoms, including those in the buffer region, were fixed.
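For reference, the RMSD between a built loop and the native loop can be computed as in the Python sketch below. It assumes both coordinate sets list the same main-chain atoms in the same order and are already expressed in the same frame (for example, the frame of the fixed non-loop atoms); no superposition is performed, and the function name `rmsd` and the example coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two hypothetical three-atom fragments, coordinates in angstroms.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 0.0, 0.0)]
print(rmsd(native, model))
```

Averaging this quantity over the best conformation found for each test loop yields a summary of the kind reported above.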
37
Loop Conformations built by CLOOP
a. 1gpr_123-126 b. 135l_84-91 c. 1pmy_77-88
38
Performance of CLOOP
39
Conclusion
CLOOP can be applied to build a good all-atom conformation
ensemble of loops with size up to 12 residues.
Good efficiency: CLOOP is faster than RAPPER.
The contribution of the protein to which a loop is attached (i.e.
the ‘buffer region’) facilitates the discrimination of near-optimal
loop structures.
The soft core potential and the DM strategy are effective
techniques for building loop conformations.
40
Thanks!