Protein Structure Refinement: a dynamic approach
Biophysical Chemistry, University of Groningen
The Netherlands
Hao Fan
• Background:
    Why do we need refinement?
    Molecular dynamics
    Test sets of proteins
• Tests of refinement protocols:
    Explicit water
    Implicit water
• Ongoing projects
Overview
* Structure is key to understanding protein function
Background :: Why do we need refinement?
* Predicting protein structure from sequence
The gap is growing!
Friedberg et al., Curr. Opin. Struct. Biol. 14, 2004
Moult, Curr. Opin. Struct. Biol. 15, 2005
CASP6 T0266
CASP6 T0201
* Large progress in knowledge-based methods
* Refinement is required for critical applications (e.g. protein-ligand and protein-protein interactions)
Background :: Why do we need refinement?
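* Newton's equations of motion
$$\frac{d^2 \mathbf{r}_i(t)}{dt^2} = \frac{1}{m_i}\,\mathbf{f}_i(t)$$
$$\mathbf{f}_i(t) = -\frac{\partial V_{total}(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)}{\partial \mathbf{r}_i}$$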
Background :: Molecular dynamics
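Newton's equations of motion are integrated numerically in small time steps. A minimal sketch, assuming the velocity Verlet scheme (the slides do not say which integrator was used) and a toy one-dimensional harmonic potential; `velocity_verlet`, `harmonic_force`, and `total_energy` are hypothetical names introduced only for this illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def velocity_verlet(
    positions: Vector,
    velocities: Vector,
    masses: Vector,
    force: Callable[[Vector], Vector],
    dt: float,
    n_steps: int,
) -> Tuple[Vector, Vector]:
    """Integrate m_i d^2 r_i/dt^2 = f_i(t) for n_steps (one coordinate per particle)."""
    r = list(positions)
    v = list(velocities)
    f = force(r)
    for _ in range(n_steps):
        # Half-step velocity update using the current forces.
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
        # Full-step position update.
        r = [ri + dt * vi for ri, vi in zip(r, v)]
        # Forces at the new positions, f_i = -dV/dr_i.
        f = force(r)
        # Second half-step velocity update using the new forces.
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
    return r, v


def harmonic_force(r: Vector, k: float = 1.0) -> Vector:
    """Force for the toy potential V = 1/2 * k * sum(r_i^2) (assumption, not a force field)."""
    return [-k * ri for ri in r]


def total_energy(r: Vector, v: Vector, masses: Vector, k: float = 1.0) -> float:
    """Kinetic plus potential energy for the toy harmonic system."""
    kinetic = sum(0.5 * m * vi * vi for m, vi in zip(masses, v))
    potential = sum(0.5 * k * ri * ri for ri in r)
    return kinetic + potential
```

With a small enough time step the total energy of the toy oscillator stays close to its starting value, which is the usual sanity check for an MD integrator.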
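* Boltzmann distribution of energy states
$$\rho(E) \propto \exp\left(-\frac{E}{k_B T}\right)$$
$$\langle A \rangle = \sum_i \rho_i A_i \propto \sum_i A_i \exp\left(-\frac{E_i}{k_B T}\right)$$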
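The Boltzmann-weighted ensemble average can be illustrated on a small set of discrete states. A minimal sketch, assuming energies in kJ/mol and normalised weights; the function names are hypothetical.

```python
import math
from typing import List, Sequence

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def boltzmann_weights(energies: Sequence[float], temperature: float) -> List[float]:
    """Normalised weights proportional to exp(-E_i / (k_B T))."""
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift energies for numerical stability
    factors = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(factors)
    return [f / partition for f in factors]


def ensemble_average(
    values: Sequence[float], energies: Sequence[float], temperature: float
) -> float:
    """<A> = sum_i rho_i A_i with Boltzmann weights rho_i."""
    weights = boltzmann_weights(energies, temperature)
    return sum(w * a for w, a in zip(weights, values))
```

In an MD trajectory the states are sampled with these weights already, so the same average reduces to a plain mean over frames.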
GROMOS force field
Background :: Molecular dynamics
* Non-bonded interactions:
    Electrostatic
    van der Waals
* Bonded interactions:
    Bonds
    Angles
    Dihedrals
* Semi ab initio models (Rosetta)
Test set: 15 proteins used by Baker et al. to test Rosetta; 4 models for each protein
Model quality: only main-chain and Cβ atoms; most models > 4 Å backbone RMSD
* Homology models (Modeller & Nest)
Test set: 30 small proteins (crystal structures, resolution > 1.6 Å)
Model quality: all atoms; most models < 4 Å backbone RMSD
Background :: Test sets of proteins
Refining protein structures: can MD help?
System: periodic
Nonbonded: cut-off
Electrostatics: reaction field
pH: 7
Temperature: 300 K
$$V_{total} = V_{intra} + V_{inter}$$
$$V_{intra} = V_{bond} + V_{angle} + V_{dihe}$$
$$V_{inter} = V_{elst} + V_{vdw}$$
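The non-bonded part of the potential (electrostatics with a reaction field, plus van der Waals, both truncated at a cut-off) can be sketched for a single atom pair. This is an illustration only: the reaction-field expression is the form commonly documented for MD packages, and the cut-off of 1.4 nm and reaction-field permittivity of 78 are assumptions, since the slide gives no numerical values.

```python
COULOMB_FACTOR = 138.935458  # 1/(4 pi eps0) in kJ mol^-1 nm e^-2


def lennard_jones(r: float, c6: float, c12: float) -> float:
    """Van der Waals term in the C6/C12 form: C12/r^12 - C6/r^6."""
    return c12 / r**12 - c6 / r**6


def coulomb_reaction_field(
    r: float, qi: float, qj: float, r_cut: float, eps_rf: float, eps_r: float = 1.0
) -> float:
    """Coulomb interaction with a reaction-field correction beyond r_cut."""
    k_rf = (eps_rf - eps_r) / ((2.0 * eps_rf + eps_r) * r_cut**3)
    c_rf = 1.0 / r_cut + k_rf * r_cut**2  # makes the potential vanish at r_cut
    return COULOMB_FACTOR * qi * qj / eps_r * (1.0 / r + k_rf * r**2 - c_rf)


def pair_energy(
    r: float,
    qi: float,
    qj: float,
    c6: float,
    c12: float,
    r_cut: float = 1.4,   # nm, assumed value
    eps_rf: float = 78.0,  # assumed reaction-field permittivity
) -> float:
    """V_inter for one pair: V_elst + V_vdw inside the cut-off, zero outside."""
    if r >= r_cut:
        return 0.0
    return coulomb_reaction_field(r, qi, qj, r_cut, eps_rf) + lennard_jones(r, c6, c12)
```

The constant `c_rf` shifts the electrostatic term so that it goes to zero exactly at the cut-off, which avoids a jump in energy when a pair crosses it.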
Explicit water
• Force field validation
• Refinement in brute-force MD
• Mimicking chaperone in refinement
    (I) Solvent
    (II) Chaperone cage
Force field validation
Is the GROMOS force field accurate enough? Stability of native protein structures
[Figure panels: X-ray, NMR]
Short summary
• Most protein core regions remain near-native within the simulation time
• Large fluctuations occur in flexible regions (e.g. loops)
• NMR-derived structures are less stable than X-ray-derived structures
• GROMOS 43a1 can be used for protein structure refinement
Force field validation
Example: mercury detoxification protein 1afi
Starting model: good quality
[Figure panels: NMR structure; Rosetta model; Refinement (MD, 100 ns)]
Brute-force MD
RMSD 0.16 nm
RMSD 0.26 nm
Example: mercury detoxification protein 1afi
Starting model: good quality
Brute-force MD
[Figure panels: Rosetta model; Refinement (MD, 5 ns); Refinement (MD, 400 ns)]
Example: mercury detoxification protein 1afi
Starting model: bad quality
Brute-force MD
RMSD 0.87 nm
RMSD 0.76 nm
RMSD 0.70 nm
Example: mercury detoxification protein 1afi
Starting model: bad quality
Brute-force MD
• For structures close to the native conformation, MD refinement can be very effective (10-100 ns).
• For grossly misfolded structures, current timescales are probably too short.
Short summary
Molecular chaperones facilitate the folding of a wide range of proteins in vivo.
Brute-force MD
Mimicking chaperone
18 nm
GroEL/ES cycle
Mechanisms of assisting folding
• Prevent aggregation
• Reaction cycle: iterative binding and folding
• Unfold non-native polypeptide
    Hydrophobic interaction with GroEL
    Mechanical stress from GroEL upon ATP binding
• Refold non-native polypeptide
    Hydrophilic cavity favors burial of hydrophobic surfaces
    Confinement limits conformational space
Mimicking chaperone
Increase partial charges (5 ns)
Decrease partial charges (5 ns)
Mimicking chaperone (I): oscillating solvent environment
3. Fan and Mark, Protein Science (2004) 13, 992-999.
SPC water partial charges: O -0.82 e; H +0.41 e (each)
5 cycles
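The oscillating-solvent protocol (alternating 5 ns phases with increased and decreased water partial charges, repeated for 5 cycles) can be laid out as a simple schedule. A minimal sketch: the scaling factors 1.2 and 0.8 are placeholders, because the slide does not state by how much the charges are changed, and `oscillating_schedule` and `scaled_spc_charges` are hypothetical helper names.

```python
from typing import Dict, List, NamedTuple

# SPC water partial charges in units of e, as given on the slide.
SPC_CHARGES: Dict[str, float] = {"O": -0.82, "H": 0.41}


class Phase(NamedTuple):
    start_ns: float
    end_ns: float
    label: str
    charge_scale: float


def oscillating_schedule(
    n_cycles: int = 5,
    phase_ns: float = 5.0,
    scale_high: float = 1.2,  # placeholder value, not from the slides
    scale_low: float = 0.8,   # placeholder value, not from the slides
) -> List[Phase]:
    """Alternate increased-charge and decreased-charge phases of equal length."""
    schedule: List[Phase] = []
    t = 0.0
    for _ in range(n_cycles):
        schedule.append(Phase(t, t + phase_ns, "increased charges", scale_high))
        t += phase_ns
        schedule.append(Phase(t, t + phase_ns, "decreased charges", scale_low))
        t += phase_ns
    return schedule


def scaled_spc_charges(scale: float) -> Dict[str, float]:
    """Scale all SPC partial charges by the same factor; the molecule stays neutral."""
    return {atom: q * scale for atom, q in SPC_CHARGES.items()}
```

With the defaults the schedule has ten phases and covers 50 ns in total, the refinement length quoted for the examples that follow.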
[Figure panels: X-ray structure; Rosetta model; Refinement (50 ns)]
Mimicking chaperone (I)
Example: vaccinia virus DNA topoisomerase I, 1vcc
Starting model: bad quality
RMSD 0.59 nm; RMSD 0.59 nm
Mimicking chaperone (I)
[Figure panels: Rosetta model; Refinement (50 ns); NMR structure]
Mimicking chaperone (I)
Example: S1 RNA binding domain, 1sro
Starting model: bad quality
RMSD 0.58 nm (residues 12-76: 0.45 nm)
RMSD 0.87 nm
Mimicking chaperone (I)
• Spherical folding cavity
• Channels for water flux
Cage atom types: nonpolar CH2; polar NH, CO; repulsive CY
Mimicking chaperone (II): iterative annealing + spatial confinement
Unfold: binding
6.0 nm, 1.5 nm
Mimicking chaperone (II)
5 ns
Refold: release
Control: refold in explicit water
Protocol I, Protocol II
Mimicking chaperone (II)
10 ns
[Figure panels: X-ray structure; Rosetta model; Refinement (150 ns); N termini labelled]
Mimicking chaperone (II)
Example: vaccinia virus DNA topoisomerase I, 1vcc
Starting model: bad quality
RMSD 0.59 nm; RMSD 0.49 nm
Mimicking chaperone (II)
[Figure panels: NMR structure; Rosetta model; Refinement (150 ns); N termini labelled]
Mimicking chaperone (II)
Example: S1 RNA binding domain, 1sro
Starting model: bad quality
RMSD 0.87 nm; RMSD 0.42 nm
Mimicking chaperone (II)
Mimicking chaperone (II)
• Repetitive changes in environment induce protein unfolding and refolding
• Spatial confinement facilitates refolding
• Current protocols seem to favor β-sheet formation
• An improved protocol may contribute to protein folding and refinement
Short summary
Mimicking chaperone
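$$V_{total} = V_{int} + V_{ext}$$
$$\mathbf{f}_i^{ext} = -\frac{\partial V_{mean}}{\partial \mathbf{r}_i} + \mathbf{f}_i^{stoch} - m_i \gamma_i \mathbf{v}_i$$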
Implicit water
• Comparison of GB models
• Refinement in brute-force SD+GB/SA
• Implicit chaperone
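$$\Delta G_{sol} = \Delta G_{pol} + \Delta G_{np} = \Delta G_{pol} + \sum_i \sigma_i A_i$$
• Poisson equation
$$\nabla \cdot \left[\varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r})\right] = -4\pi \rho(\mathbf{r})$$
$$\Delta G_{pol} = \frac{1}{2} \int \rho(\mathbf{r})\, \phi_{reac}(\mathbf{r})\, dV; \qquad \phi_{reac} = \phi_{sol} - \phi_{vac}$$
• Born equation
$$\Delta G_{born} = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\frac{q^2}{a}$$
$$\Delta G_{pol} = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\left[\sum_{i=1}^{N}\frac{q_i^2}{a_i} + \sum_{i=1}^{N}\sum_{j \neq i}^{N}\frac{q_i q_j}{r_{ij}}\right]$$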
Implicit water
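$$\Delta G_{pol} = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{q_i q_j}{f_{GB}}; \qquad f_{GB} = \sqrt{r_{ij}^2 + \alpha_i \alpha_j \exp\left(-\frac{r_{ij}^2}{4 \alpha_i \alpha_j}\right)}$$
$$\Delta G_{pol,i} = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\frac{q_i^2}{\alpha_i}; \qquad \frac{1}{\alpha_i} = \frac{1}{R_i} - \frac{1}{4\pi}\int_{in,\, r > R_i}\frac{dV}{r^4}$$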
"Pair-wise" approximation: f_GB
f_GB is nearly as accurate as the numerical PB solution if α_i is correct!
Effective Born radius: α_i
α_i can be solved numerically or analytically
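The pairwise GB sum is easy to evaluate once the effective Born radii are known. A minimal sketch, assuming the f_GB interpolation function in its usual Still form, coordinates in nm, charges in e, and a solvent permittivity of 78.5 (an assumed value); the effective Born radii are taken as given inputs rather than computed.

```python
import math
from typing import Sequence, Tuple

COULOMB_FACTOR = 138.935458  # 1/(4 pi eps0) in kJ mol^-1 nm e^-2
Point = Tuple[float, float, float]


def f_gb(r_ij: float, alpha_i: float, alpha_j: float) -> float:
    """f_GB = sqrt(r^2 + a_i a_j exp(-r^2 / (4 a_i a_j)))."""
    aa = alpha_i * alpha_j
    return math.sqrt(r_ij**2 + aa * math.exp(-(r_ij**2) / (4.0 * aa)))


def gb_polar_energy(
    coords: Sequence[Point],
    charges: Sequence[float],
    born_radii: Sequence[float],
    eps_solvent: float = 78.5,  # assumed permittivity of water
) -> float:
    """Polar solvation energy from the double sum over all i, j (self terms included)."""
    prefactor = -0.5 * COULOMB_FACTOR * (1.0 - 1.0 / eps_solvent)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            r_ij = math.dist(coords[i], coords[j])
            total += charges[i] * charges[j] / f_gb(r_ij, born_radii[i], born_radii[j])
    return prefactor * total


def born_energy(q: float, a: float, eps_solvent: float = 78.5) -> float:
    """Born solvation energy of a single ion of charge q and radius a."""
    return -0.5 * COULOMB_FACTOR * (1.0 - 1.0 / eps_solvent) * q * q / a
```

For a single ion f_GB reduces to the Born radius, so the GB sum reproduces the Born equation; at large separations f_GB tends to r_ij and the pair terms become a screened Coulomb sum.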
GB models
Implicit water
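• Still model (Qiu et al.)
[Equation: α_i from an empirical sum with parameters P1-P4: a self term in R_vdw,i and P1, plus P2 V_j/r_ij^4 over stretch neighbours, P3 V_j/r_ij^4 over bend neighbours, and P4 V_j CCF/r_ij^4 over non-bonded atoms]
    No explicit volume integration (V_j/r_ij^4)
    Connection-dependent scaling factors
• HCT model (Hawkins et al.)
$$\frac{1}{\alpha_i} = \frac{1}{R_i} - \frac{1}{2}\sum_{j}\left[\frac{1}{L_{ij}} - \frac{1}{U_{ij}} + \frac{r_{ij}}{4}\left(\frac{1}{U_{ij}^2} - \frac{1}{L_{ij}^2}\right) + \frac{1}{2 r_{ij}}\ln\frac{L_{ij}}{U_{ij}} + \frac{S_j^2 R_j^2}{4 r_{ij}}\left(\frac{1}{L_{ij}^2} - \frac{1}{U_{ij}^2}\right)\right]$$
    Two-sphere integral
    Atom-type-dependent scaling factors
• AGB model (Gallicchio et al.)
$$\frac{1}{\alpha_i} = \frac{1}{R_i} - \frac{1}{4\pi}\sum_{j \neq i} S_{ij} Q_{ij}$$
[Equation: definition of the scaling factors S_ij in terms of atomic volumes V_j]
    Two-sphere integral
    Local-geometry-dependent scaling factors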
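The two-sphere integral used by HCT-type models has a closed form, so effective Born radii can be computed analytically. A minimal sketch of that pairwise descreening expression as it is commonly written; the limits L_ij = max(R_i, r_ij - S_j R_j) and U_ij = r_ij + S_j R_j are assumptions taken from the usual formulation, and the special case of atom i buried inside sphere j is left out.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def hct_pair_integral(r_ij: float, radius_i: float, scaled_radius_j: float) -> float:
    """Descreening of atom i by a sphere of radius S_j * R_j at distance r_ij."""
    if r_ij + scaled_radius_j <= radius_i:
        return 0.0  # sphere j lies entirely inside atom i: no contribution
    lower = max(radius_i, r_ij - scaled_radius_j)  # L_ij
    upper = r_ij + scaled_radius_j                 # U_ij
    return 0.5 * (
        1.0 / lower
        - 1.0 / upper
        + 0.25 * r_ij * (1.0 / upper**2 - 1.0 / lower**2)
        + 0.5 / r_ij * math.log(lower / upper)
        + scaled_radius_j**2 / (4.0 * r_ij) * (1.0 / lower**2 - 1.0 / upper**2)
    )


def hct_born_radius(
    i: int,
    coords: Sequence[Point],
    radii: Sequence[float],
    scale_factors: Sequence[float],
) -> float:
    """Effective Born radius of atom i: 1/alpha_i = 1/R_i - sum_j (pair integral)."""
    inv_alpha = 1.0 / radii[i]
    for j in range(len(radii)):
        if j == i:
            continue
        r_ij = math.dist(coords[i], coords[j])
        inv_alpha -= hct_pair_integral(r_ij, radii[i], scale_factors[j] * radii[j])
    return 1.0 / inv_alpha
```

For a small sphere far from atom i the pair integral tends to V_j / (4π r_ij^4), which is the kind of V_j/r_ij^4 term that appears without explicit integration in the Still-type sum.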
Implicit water
mAGB model
Zhu et al., J. Phys. Chem. 109, 2005
Which GB model is the best?
Energetics; Peptide folding; Protein dynamics
Test-set: ten proteins
GB model comparison
Description of proteins
Protein    1afi  1ail  1ctf  1lea  1pgb  1shg  1tuc  1ubi  2bby  2ci2
Reso. (Å)  n/a   1.9   1.7   n/a   1.9   1.8   1.8   1.8   n/a   2.0
Nres       72    70    68    72    56    57    61    76    69    63
Nα         21    60    38    39    14    3     3     12    35    13
Nβ         21    0     18    6     30    28    25    23    4     22
Ncharge    3     2     -2    2     -4    1     0     0     3     -1
Analysis of structure deviation
• Positional root mean square deviation (RMSD)
• Deviation of structural properties:
    Rg: radius of gyration
    SASA: solvent-accessible surface area
    NHB: main-chain hydrogen bonds
    NC: side-chain contacts
• Statistical analysis: two-way analysis of variance (ANOVA)
• Key factors:
    Starting velocities, simulation time scale, friction coefficient of solvent
$$DEV = \frac{Value_{simu}}{Value_{exp}} - 1$$
Null hypothesis H0: μ1 = μ2 = μ…
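The deviation measure DEV = Value_simu / Value_exp - 1 can be checked against the hydrogen-bond table that appears later in this transcript. A minimal sketch; the function names are hypothetical, and the numbers are the "exp.", "magb" and "still" NHB rows for the ten test proteins.

```python
from typing import Sequence


def relative_deviation(simulated: float, experimental: float) -> float:
    """DEV = Value_simu / Value_exp - 1 for one protein."""
    return simulated / experimental - 1.0


def mean_relative_deviation(
    simulated: Sequence[float], experimental: Sequence[float]
) -> float:
    """Average DEV over a set of proteins."""
    devs = [relative_deviation(s, e) for s, e in zip(simulated, experimental)]
    return sum(devs) / len(devs)


# Main-chain hydrogen bonds in secondary structures (1afi ... 2ci2).
NHB_EXP = [29, 49, 41, 25, 31, 19, 16, 22, 21, 17]
NHB_MAGB = [19, 40, 34, 20, 14, 15, 14, 19, 16, 12]
NHB_STILL = [18, 37, 28, 20, 15, 13, 10, 18, 12, 11]

print(round(mean_relative_deviation(NHB_MAGB, NHB_EXP), 3))   # -0.245
print(round(mean_relative_deviation(NHB_STILL, NHB_EXP), 3))  # -0.331
```

Both values agree with the "ave. dev." column of that table, which suggests the column is the per-protein DEV averaged over the ten proteins.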
GB model comparison
Analysis of RMSD
Mean RMSD (Still ~ HCT ~ mAGB), in nm:
    0.24 ~ 0.27 ~ 0.23
    0.18 ~ 0.19 ~ 0.16
ANOVA: p-values of 26% and 32% under H0
Not significant
GB model comparison
Analysis of Rg
Mean deviation of Rg (Still ~ HCT ~ mAGB):
    0.012 ~ -0.016 ~ 0.001
ANOVA: p-value of 0.5% under H0
Significant
GB model comparison
Short summary
• All three GB/SA models provide a reasonable representation of solvent effects
• The mAGB model shows statistically significant advantages in preserving native hydrogen bonds and the radius of gyration
• These results highlight the importance of statistical analysis in MD simulations
GB model comparison
• Brute-force SD+GB/SA refinement
• Implicit chaperone
• Replica-exchange MD
Ongoing projects
Spatial quadrature + mobile cavity
Replica Si (i=1, … , M) at temperature Ti
Exchange replicas with Metropolis criterion
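The Metropolis criterion for swapping two replicas depends only on their potential energies and temperatures. A minimal sketch of the standard acceptance rule, assuming energies in kJ/mol; the function names and the injectable random-number source are choices made for this illustration.

```python
import math
import random
from typing import Callable

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def exchange_probability(
    energy_i: float, energy_j: float, temp_i: float, temp_j: float
) -> float:
    """P = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]) for replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_j - energy_i)
    if delta <= 0.0:
        return 1.0  # the swap is always accepted
    return math.exp(-delta)


def attempt_exchange(
    energy_i: float,
    energy_j: float,
    temp_i: float,
    temp_j: float,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Accept or reject a swap by comparing a uniform random number with P."""
    return rng() < exchange_probability(energy_i, energy_j, temp_i, temp_j)
```

A swap is always accepted when the colder replica currently holds the higher energy; otherwise it is accepted with a probability that falls off with the energy gap and the temperature spacing.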
Brute-force SD+GB/SA
Reparameterize the Still and mAGB models on 4 other proteins: Monte Carlo simulated annealing approach
Backbone RMSD in secondary structures (nm)
Proteins   1afi  1ail  1ctf  1lea  1pgb  1shg  1tuc  1ubi  2bby  2ci2  ave
magb       0.18  0.16  0.13  0.22  0.27  0.13  0.11  0.10  0.12  0.16  0.16
still      0.16  0.20  0.18  0.11  0.29  0.11  0.18  0.09  0.20  0.27  0.18
magb_new   0.14  0.19  0.21  0.13  0.25  0.12  0.10  0.08  0.12  0.19  0.15
still_new  0.19  0.15  0.14  0.19  0.25  0.11  0.16  0.07  0.17  0.12  0.16
Protein dynamics: a comparison to previous study
ANOVA
Not Significant
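A two-way ANOVA on the backbone RMSD table can be sketched with the models as one factor and the proteins as the other. This is an assumption about the design: the slides do not say which factors or replicates entered the ANOVA, and the sketch stops at the F statistics because the Python standard library has no F distribution for computing p-values.

```python
from typing import Dict, List


def two_way_anova(table: List[List[float]]) -> Dict[str, float]:
    """Two-way ANOVA without replication; rows = models, columns = proteins."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(table[r][c] for r in range(n_rows)) / n_rows for c in range(n_cols)]
    # Sums of squares for the two factors, the total, and the residual.
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols,
        "ss_error": ss_error, "ss_total": ss_total,
        "df_rows": df_rows, "df_cols": df_cols, "df_error": df_error,
        "f_rows": (ss_rows / df_rows) / ms_error,
        "f_cols": (ss_cols / df_cols) / ms_error,
    }


# Backbone RMSD in secondary structures (nm): magb, still, magb_new, still_new.
RMSD_TABLE = [
    [0.18, 0.16, 0.13, 0.22, 0.27, 0.13, 0.11, 0.10, 0.12, 0.16],
    [0.16, 0.20, 0.18, 0.11, 0.29, 0.11, 0.18, 0.09, 0.20, 0.27],
    [0.14, 0.19, 0.21, 0.13, 0.25, 0.12, 0.10, 0.08, 0.12, 0.19],
    [0.19, 0.15, 0.14, 0.19, 0.25, 0.11, 0.16, 0.07, 0.17, 0.12],
]

result = two_way_anova(RMSD_TABLE)
print(result["f_rows"], result["df_rows"], result["df_error"])
```

The F statistic for the model factor (`f_rows`, with 3 and 27 degrees of freedom here) would then be compared with a tabulated critical value to decide whether H0 can be rejected.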
NHB in secondary structures
Proteins   1afi  1ail  1ctf  1lea  1pgb  1shg  1tuc  1ubi  2bby  2ci2  ave. dev.
exp.       29    49    41    25    31    19    16    22    21    17
magb       19    40    34    20    14    15    14    19    16    12    -0.245
still      18    37    28    20    15    13    10    18    12    11    -0.331
magb_new   21    39    30    22    22    15    14    19    16    13    -0.214
still_new  18    43    33    21    23    15    13    20    16    14    -0.199
Rg (nm)
Proteins   1afi  1ail  1ctf  1lea  1pgb  1shg  1tuc  1ubi  2bby  2ci2  ave. dev.
exp.       1.09  1.28  1.12  1.14  1.06  1.03  1.09  1.17  1.20  1.12
magb       1.14  1.28  1.12  1.15  1.05  1.06  1.08  1.17  1.17  1.09  0.001
still      1.13  1.30  1.13  1.15  1.05  1.06  1.12  1.18  1.18  1.13  0.012
magb_new   1.13  1.30  1.10  1.11  1.04  1.05  1.06  1.18  1.16  1.09  -0.005
still_new  1.10  1.31  1.13  1.12  1.04  1.05  1.06  1.15  1.19  1.09  -0.004
Significant
Brute-force SD+GB/SA
Backbone RMSD (bbrmsd, nm)
Protein  Initial  MD                  Still_new           mAGB_new
                  min   peak  max     min   peak  max     min   peak  max
1aoy     0.59     0.63  0.71  0.78    0.60  0.69  0.73    0.63  0.74  0.78
1stu     0.62     0.65  1.00  1.00    0.61  0.75  0.81    0.52  0.56  0.68
1vif     0.62     0.63  0.78  0.82    0.61  0.65  0.75    0.66  0.73  0.79
1sro     0.45     0.40  0.44  0.50    0.36  0.47  0.55    0.38  0.56  0.62
1tuc     0.59     0.61  0.71  0.73    0.61  0.68  0.73    0.62  0.69  0.78
1sap     0.35     0.33  0.58  0.62    0.28  0.39  0.48    0.35  0.52  0.56
1afi     0.26     0.16  0.20  0.33    0.22  0.29  0.36    0.19  0.23  0.29
1vcc     0.59     0.57  0.68  0.73    0.52  0.62  0.66    0.55  0.59  0.69
2bby     0.53     0.47  0.67  0.71    0.53  0.73  0.79    0.53  0.63  0.68
2fmr     0.44     0.43  0.58  0.64    0.39  0.43  0.53    0.37  0.43  0.51
1a1z     0.55     0.41  0.46  0.54    0.43  0.47  0.64    0.36  0.40  0.57
1ail     0.54     0.53  0.61  0.79    0.45  0.53  0.59    0.54  0.59  0.65
1coo     0.58     0.57  0.81  0.87    0.57  0.71  0.75    0.55  0.75  0.84
1lea     0.40     0.41  0.50  0.54    0.41  0.53  0.66    0.41  0.62  0.67
2ezh     0.34     0.32  0.39  0.45    0.32  0.43  0.52    0.32  0.37  0.49
Brute-force SD+GB/SA
Short comments
• No significant improvement from GB models on the semi ab initio models
• Homology models of better quality?
• Combined with other advanced sampling techniques?
Brute-force SD+GB/SA
Outlook to refinement
• Can we refine most protein models? Currently no …
• Methods that could be helpful:
    Application of advanced sampling methods
    Combination of simplified and atomic models
    Combination of knowledge-based and physical potentials
A final solution for refinement, and even folding, may lie ahead!
University of Groningen
Ruud Scheek
Utrecht University
Johan Kemmink
Alan Mark: my Ph.D. supervisor
NMR refinement
Columbia University
Barry Honig
Jiang Zhu
GB models & structure refinement
University of Groningen
MD group
Xavier Periole
Tsjerk Wassenaar
Alessandra Villa
REMD
Statistical analysis
Force field development
University of Groningen
MD group
Siewert-Jan Marrink
Alex de Vries
Membrane/protein
Max-Planck-Institute
Berk Hess
Stockholm University
Erik Lindahl
BioMaDe
George Robillard
Xiaoqin Wang
Gromacs
Hydrophobin