
Protein Structure Refinement: a dynamic approach

Biophysical Chemistry, University of Groningen

The Netherlands

Hao Fan

Overview

• Background:

Why do we need refinement?

Molecular dynamics

Test sets of proteins

• Tests of refinement protocols

Explicit water

Implicit water

Ongoing projects

* Structure is key to knowing functions of proteins

Background :: Why do we need refinement?

* Predict protein structure from sequence

The gap is growing!

Friedberg et al., Curr. Opin. Struct. Biol. 14, 2004

Moult, Curr. Opin. Struct. Biol. 15, 2005

CASP6 T0266

CASP6 T0201

* Large progress in knowledge-based methods

* Refinement is required for critical applications (e.g. protein-ligand and protein-protein interactions)

Background :: Why do we need refinement?

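* Newton's equations of motion

$$\frac{d^2 \mathbf{r}_i(t)}{dt^2} = \frac{1}{m_i}\,\mathbf{f}_i(t)$$

$$\mathbf{f}_i(t) = -\frac{\partial V_{total}(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)}{\partial \mathbf{r}_i}$$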
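A minimal sketch of how Newton's equations of motion are stepped forward in time, here for a single particle in one dimension. The velocity Verlet scheme, the harmonic test potential, and the names `velocity_verlet`, `harmonic_force`, and `K` are illustrative assumptions; the slides do not say which integrator was used.

```python
import math

K = 1.0  # hypothetical spring constant of the test potential V(x) = K x^2 / 2


def harmonic_force(x):
    """Force f = -dV/dx for the harmonic test potential."""
    return -K * x


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate d2x/dt2 = f(x) / m with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half kick with the old force
        x = x + dt * v_half               # drift to the new position
        f = force(x)                      # force at the new position
        v = v_half + 0.5 * dt * f / mass  # half kick with the new force
    return x, v


def total_energy(x, v, mass=1.0):
    """Kinetic plus potential energy of the test oscillator."""
    return 0.5 * mass * v * v + 0.5 * K * x * x


# Start at x = 1 with zero velocity and integrate for t = 10 time units.
x_end, v_end = velocity_verlet(1.0, 0.0, harmonic_force, 1.0, 0.01, 1000)
```

For this test case the total energy should stay close to its starting value of 0.5, which is a quick sanity check on any integrator and time step.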

Background :: Molecular dynamics

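* Boltzmann distribution of energy states

$$\rho(E) \propto \exp\left(-\frac{E}{k_B T}\right)$$

$$\langle A \rangle = \frac{\sum_i A_i \exp(-E_i / k_B T)}{\sum_i \exp(-E_i / k_B T)}$$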
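The Boltzmann-weighted average of a property A over discrete energy states can be sketched as follows. The function name `boltzmann_average` and the choice of kJ/mol units are assumptions made for illustration only.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def boltzmann_average(values, energies, temperature):
    """Return <A> = sum_i A_i exp(-E_i / kT) / sum_i exp(-E_i / kT)."""
    beta = 1.0 / (KB * temperature)
    e_min = min(energies)  # shift energies so the largest weight is 1
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(weights)
    return sum(a * w for a, w in zip(values, weights)) / partition
```

Shifting all energies by the minimum leaves the average unchanged and avoids overflow when energies are large and negative.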

GROMOS force field

Background :: Molecular dynamics

* Non-bonded interactions:

Electrostatic

van der Waals

* Bonded interactions: bonds, angles, dihedrals

* Semi ab initio models (Rosetta)

Test set: 15 proteins used by Baker et al. to test Rosetta; 4 models for each protein

Model quality: only main-chain and Cβ atoms; most models > 4 Å backbone RMSD

* Homology models (Modeller & Nest)

Test set: 30 small proteins (crystal structures, resolution > 1.6 Å)

Model quality: all atoms; most models < 4 Å backbone RMSD

Background :: Test sets of proteins

Refine protein structures: can MD help?

System: periodic

Nonbonded: cut-off

Electrostatics: reaction-Field

pH: 7

Temperature: 300 K

$$V_{total} = V_{intra} + V_{inter}$$

$$V_{intra} = V_{bond} + V_{angle} + V_{dihe}$$

$$V_{inter} = V_{elst} + V_{vdw}$$

Explicit water

• Force field validation

• Refinement in brute-force MD

• Mimicking chaperone in refinement

(I) Solvent

(II) Chaperone cage

Force field validation

Is the GROMOS force field precise enough? Stability of protein native structures

X-ray NMR

Short summary

• Most protein core-regions remain near-native within the simulation time

• Large fluctuations happen in the flexible region (e.g. loops)

• NMR-derived structures are less stable than X-ray-derived structures

• GROMOS 43a1 can be used for protein structure refinement

Force field validation

Example: mercury detoxification protein 1afi

Starting model: good quality

Panels: NMR structure; Rosetta model; refinement (MD, 100 ns)

Brute-force MD

RMSD 0.16 nm

RMSD 0.26 nm


Starting model: good quality

Brute-force MD

Panels: Rosetta model; refinement (MD, 5 ns); refinement (MD, 400 ns)


Starting model: bad quality

Brute-force MD

RMSD 0.87 nm

RMSD 0.76 nm

RMSD 0.70 nm


Starting model: bad quality

Brute-force MD

• For structures close to the native conformation, MD refinement can be very effective (10-100 ns).

• For grossly misfolded structures, current timescales are probably too short.

Short summary

Molecular chaperones facilitate the folding of a wide range of proteins in vivo.

Brute-force MD

Mimicking chaperone

18 nm

GroEL/ES cycle

Mechanisms of assisting folding

• Prevent aggregation

• Reaction cycle: iterative binding and folding

• Unfold non-native polypeptide

Hydrophobic interaction with GroEL

Mechanical stress from GroEL upon ATP binding

• Refold non-native polypeptide

Hydrophilic cavity favors burial of hydrophobic surfaces

Confinement limits conformational space

Mimicking chaperone

Increase partial charges (5 ns)

Decrease partial charges (5 ns)

Mimicking chaperone (I): oscillating solvent environment

3. Fan and Mark, Protein Science (2004) 13, 992-999.

SPC water partial charges: O -0.82 e; H +0.41 e (each)

5 cycles
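The oscillating-solvent protocol (5 ns with increased water partial charges, 5 ns with decreased charges, repeated for 5 cycles) can be written down as a simple schedule. The SPC charges come from the slide; the scale factors `high_scale` and `low_scale` are hypothetical placeholders, since the slides do not state by how much the charges were changed.

```python
SPC_CHARGES = {"O": -0.82, "H": 0.41}  # SPC partial charges in units of e


def scaled_charges(scale):
    """Scale both SPC charges by the same factor so water stays neutral."""
    return {atom: charge * scale for atom, charge in SPC_CHARGES.items()}


def oscillation_schedule(n_cycles=5, phase_ns=5.0, high_scale=1.2, low_scale=0.8):
    """Build the list of simulation phases for the oscillating solvent.

    high_scale and low_scale are hypothetical values for illustration.
    """
    schedule = []
    start = 0.0
    for cycle in range(1, n_cycles + 1):
        for label, scale in (("increase", high_scale), ("decrease", low_scale)):
            schedule.append({
                "cycle": cycle,
                "phase": label,
                "start_ns": start,
                "end_ns": start + phase_ns,
                "charges": scaled_charges(scale),
            })
            start += phase_ns
    return schedule


schedule = oscillation_schedule()
```

With the default settings the schedule has ten phases and ends at 50 ns, which matches the 50 ns refinement length quoted for this protocol.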

Example: vaccinia virus DNA topoisomerase I, 1vcc

Starting model: bad quality

Panels: X-ray structure; Rosetta model; refinement (50 ns)

RMSD 0.59 nm; RMSD 0.59 nm


Example: S1 RNA binding domain, 1sro

Starting model: bad quality

Panels: NMR structure; Rosetta model; refinement (50 ns)

RMSD 0.58 nm (residues 12-76: 0.45 nm)

RMSD 0.87 nm


• Spherical folding cavity

• Channels for water flux

Nonpolar: CH2; Polar: NH, CO; Repulsive: CY

Mimicking chaperone (II): iterative annealing + spatial confinement

Unfold: binding

6.0 nm; 1.5 nm

5 ns

Refold: release

Control: refold in explicit water

Protocol I; Protocol II

10 ns

Example: vaccinia virus DNA topoisomerase I, 1vcc

Starting model: bad quality

Panels: X-ray structure; Rosetta model; refinement (150 ns)

Example: S1 RNA binding domain, 1sro

Starting model: bad quality

Panels: NMR structure; Rosetta model; refinement (150 ns)

• Repetitive changes in environment induce protein unfolding and refolding

• Spatial confinement facilitates refolding

• Current protocols seem to favor β-sheet formation

• Improved protocols may contribute to protein folding and refinement

Short summary

Mimicking chaperone

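$$V_{total} = V_{int} + V_{ext}$$

$$\mathbf{f}_i^{ext} = -\frac{\partial V_{mean}}{\partial \mathbf{r}_i} + \mathbf{f}_i^{stoch} - m_i \gamma_i \mathbf{v}_i$$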
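A minimal sketch of one stochastic dynamics (SD) step for a single particle in one dimension, combining a systematic force with friction and a random force. The splitting used here (a force kick followed by an exact friction-plus-noise update of the velocity) is one common choice and an assumption on my part; it is not necessarily the integrator used for the simulations in this talk. The names `sd_step`, `gamma`, and `kT` are illustrative.

```python
import math
import random


def sd_step(x, v, force, mass, gamma, kT, dt, rng):
    """Advance position and velocity by one stochastic dynamics step.

    gamma is the friction coefficient and kT the thermal energy.
    """
    v = v + dt * force(x) / mass              # kick from the systematic force
    decay = math.exp(-gamma * dt)             # velocity damping by friction
    sigma = math.sqrt(kT / mass * (1.0 - decay * decay))  # noise amplitude
    v = decay * v + sigma * rng.gauss(0.0, 1.0)           # friction + random force
    x = x + dt * v                            # drift
    return x, v


def run_sd(x, v, force, mass, gamma, kT, dt, n_steps, seed=0):
    """Run n_steps of stochastic dynamics with a seeded random generator."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        x, v = sd_step(x, v, force, mass, gamma, kT, dt, rng)
    return x, v
```

With no systematic force and kT = 0 the velocity simply decays as exp(-gamma t), which gives a quick way to check the friction term in isolation.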

Implicit water

• Comparison of GB models

• Refinement in brute-force SD+GB/SA

• Implicit chaperone

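$$\Delta G_{sol} = \Delta G_{pol} + \Delta G_{np} = \Delta G_{pol} + \sum_i \sigma_i A_i$$

• Poisson equation

$$\nabla \cdot \left[\varepsilon(\mathbf{r})\, \nabla \phi(\mathbf{r})\right] = -4\pi \rho(\mathbf{r})$$

$$\Delta G_{pol} = \frac{1}{2} \int \rho(\mathbf{r})\, \phi_{reac}(\mathbf{r})\, dV; \quad \phi_{reac} = \phi_{sol} - \phi_{vac}$$

• Born equation

$$\Delta G_{born} = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\frac{q^2}{a}$$

$$\Delta G_{pol} = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\sum_{i=1}^{N}\frac{q_i^2}{a_i} - \frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\sum_{i=1}^{N}\sum_{j \neq i}^{N}\frac{q_i q_j}{r_{ij}}$$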

Implicit water

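$$\Delta G_{pol} = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{q_i q_j}{f_{GB}}; \quad f_{GB} = \left[r_{ij}^2 + \alpha_i \alpha_j \exp\left(-r_{ij}^2 / (4 \alpha_i \alpha_j)\right)\right]^{1/2}$$

$$\alpha_i = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\frac{1}{\Delta G_{pol,i}}; \quad \frac{1}{\alpha_i} = \frac{1}{R_i} - \frac{1}{4\pi}\int_{in,\, r > R_i}\frac{1}{r^4}\, dV$$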

"Pair-wise" approximation: f_GB

f_GB is nearly as accurate as the numerical PB solution if α_i is correct!

Effective Born radius: α_i

α_i can be solved numerically or analytically

GB models

Implicit water
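A minimal sketch of the pairwise GB polarization energy, assuming the effective Born radii are already known. The electrostatic conversion constant (for nm, e, and kJ/mol), the water dielectric of 78, and the function names are assumptions for illustration; the slide formula is written without a unit conversion factor.

```python
import math

COULOMB = 138.935  # electric conversion factor in kJ mol^-1 nm e^-2 (unit choice assumed)


def f_gb(r, alpha_i, alpha_j):
    """Pairwise GB function: sqrt(r^2 + a_i a_j exp(-r^2 / (4 a_i a_j)))."""
    product = alpha_i * alpha_j
    return math.sqrt(r * r + product * math.exp(-r * r / (4.0 * product)))


def gb_polar_energy(coords, charges, born_radii, eps=78.0):
    """Sum q_i q_j / f_GB over all pairs, including the i == j self terms.

    eps = 78 is an assumed dielectric constant for water.
    """
    prefactor = -0.5 * COULOMB * (1.0 - 1.0 / eps)
    total = 0.0
    for i, (pos_i, q_i, a_i) in enumerate(zip(coords, charges, born_radii)):
        for j, (pos_j, q_j, a_j) in enumerate(zip(coords, charges, born_radii)):
            r = 0.0 if i == j else math.dist(pos_i, pos_j)
            total += q_i * q_j / f_gb(r, a_i, a_j)
    return prefactor * total
```

Two limits are worth checking: for a single ion the expression reduces to the Born equation, and for two well-separated charges f_GB tends to the plain distance r.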

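• Still model (Qiu et al.)

$$\frac{1}{\alpha_i} = \frac{1}{R_{vdw,i} + P_1} - \frac{1}{4\pi}\left[\sum_{j}^{stretch}\frac{P_2 V_j}{r_{ij}^4} + \sum_{j}^{bend}\frac{P_3 V_j}{r_{ij}^4} + \sum_{j}^{non\text{-}bonded}\frac{P_4 V_j\, CCF}{r_{ij}^4}\right]$$

No explicit volume integration (V_j / r_ij^4)

Connection-dependent scaling factors

• HCT model (Hawkins et al.)

$$\frac{1}{\alpha_i} = \frac{1}{R_i} - \frac{1}{2}\sum_{j}\left[\frac{1}{L_{ij}} - \frac{1}{U_{ij}} + \frac{r_{ij}}{4}\left(\frac{1}{U_{ij}^2} - \frac{1}{L_{ij}^2}\right) + \frac{1}{2 r_{ij}}\ln\frac{L_{ij}}{U_{ij}} + \frac{S_j^2 R_j^2}{4 r_{ij}}\left(\frac{1}{L_{ij}^2} - \frac{1}{U_{ij}^2}\right)\right]$$

Two-sphere integral

Atom-type-dependent scaling factors

• AGB model (Gallicchio et al.)

$$\frac{1}{\alpha_i} = \frac{1}{R_i} - \frac{1}{4\pi}\sum_{j \neq i} S_{ji} Q_{ji}; \quad S_{ji} = S_j + \frac{1}{2}\frac{V_{ij}}{V_j}$$

Two-sphere integral

Local-geometry-dependent scaling factors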

Implicit water

Implicit water: mAGB model

Zhu et al, J. Phys. Chem. 109, 2005

Which GB model is best?

Energetics; peptide folding; protein dynamics

Test-set: ten proteins

GB model comparison

Description of proteins

Protein    1afi  1ail  1ctf  1lea  1pgb  1shg  1tuc  1ubi  2bby  2ci2
Reso. (Å)  -     1.9   1.7   -     1.9   1.8   1.8   1.8   -     2.0
Nres       72    70    68    72    56    57    61    76    69    63
Nα         21    60    38    39    14    3     3     12    35    13
Nβ         21    0     18    6     30    28    25    23    4     22
Ncharge    3     2     -2    2     -4    1     0     0     3     -1

Analysis of structure deviation

• Positional root mean square deviation (RMSD)

• Deviation of structural properties:

Rg: Radius of gyration

SASA: Solvent accessible surface area

NHB: Main-chain hydrogen bonds

NC: Sidechain contacts

• Statistical analysis: two-way analysis of variance (ANOVA)

• Key factors:

Starting velocities, Simulation time scale, Friction coefficient of solvent
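Two of the structural measures listed above, the positional RMSD and the radius of gyration, can be sketched in a few lines. This version assumes the two structures have already been superimposed (no fitting is performed here) and that coordinates are given as (x, y, z) tuples; the function names are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Positional RMSD between two equally sized, pre-superimposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration about the center of mass."""
    if masses is None:
        masses = [1.0] * len(coords)  # equal weights if no masses are given
    total_mass = sum(masses)
    center = [
        sum(m * atom[k] for m, atom in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted = sum(
        m * sum((atom[k] - center[k]) ** 2 for k in range(3))
        for m, atom in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)
```

In practice the RMSD is usually reported after a least-squares superposition of the two structures, which would have to be done before calling `rmsd`.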

$$DEV = \frac{Value_{simu}}{Value_{exp}} - 1$$

Null hypothesis H0: μ1 = μ2 = μ…
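A minimal sketch of the F statistic for a two-way ANOVA without replication, with proteins as rows and GB models as columns. The table layout and the function name `two_way_anova_f` are assumptions; the slides do not describe how the ANOVA was set up in detail.

```python
def two_way_anova_f(table):
    """F statistic for the column factor in a two-way ANOVA without replication.

    table[i][j] holds the value for row i (e.g. protein) and column j (e.g. model).
    Returns (f_statistic, df_columns, df_error).
    """
    n_rows = len(table)
    n_cols = len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]

    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual after both factors

    df_cols = n_cols - 1
    df_error = (n_rows - 1) * (n_cols - 1)
    f_statistic = (ss_cols / df_cols) / (ss_error / df_error)
    return f_statistic, df_cols, df_error
```

Turning the F statistic into a p-value requires the F distribution with (df_columns, df_error) degrees of freedom, for example the survival function `scipy.stats.f.sf` if SciPy is available.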

GB model comparison

Analysis of RMSD

ANOVA

p-values under H0: 26% and 32% (not significant)

Mean RMSD (Still ~ HCT ~ mAGB)

0.24 ~ 0.27 ~ 0.23

0.18 ~ 0.19 ~ 0.16

GB model comparison

Analysis of Rg

Mean deviation of Rg ( Still ~ HCT ~ mAGB )

0.012 ~ -0.016 ~ 0.001

ANOVA

p-value under H0: 0.5% (significant)

GB model comparison

Short summary

• All three GB/SA models provide a reasonable representation of solvent effects

• The mAGB model shows statistically significant advantages in preserving native hydrogen bonds and radius of gyration

• The results highlight the importance of statistical analysis in MD simulations

GB model comparison

Ongoing projects

• Brute-force SD+GB/SA refinement

• Implicit chaperone

Spatial quadrature + mobile cavity

• Replica-exchange MD

Replica S_i (i = 1, ..., M) at temperature T_i

Exchange replicas with Metropolis criterion
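The Metropolis criterion for swapping two replicas can be sketched as below, using the standard replica-exchange acceptance probability min(1, exp[(β_i - β_j)(E_i - E_j)]). The function names and the kJ/mol energy units are assumptions for illustration.

```python
import math
import random

KB = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def exchange_probability(energy_i, energy_j, temp_i, temp_j):
    """Acceptance probability for swapping replicas i and j."""
    beta_i = 1.0 / (KB * temp_i)
    beta_j = 1.0 / (KB * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(energy_i, energy_j, temp_i, temp_j, rng):
    """Return True if the swap is accepted by the Metropolis test."""
    return rng.random() < exchange_probability(energy_i, energy_j, temp_i, temp_j)


# Example: the colder replica currently holds the higher-energy configuration.
accepted = attempt_exchange(-100.0, -200.0, 300.0, 350.0, random.Random(0))
```

A swap that moves the lower-energy configuration to the lower temperature is always accepted; the reverse swap is accepted only with a Boltzmann-like probability.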

Brute-force SD+GB/SA

Reparameterize Still and mAGB models on 4 other proteins: Monte-Carlo simulated annealing approach

Backbone RMSD in secondary structures

Proteins   1afi  1ail  1ctf  1lea  1pgb  1shg  1tuc  1ubi  2bby  2ci2  ave
magb       0.18  0.16  0.13  0.22  0.27  0.13  0.11  0.10  0.12  0.16  0.16
still      0.16  0.20  0.18  0.11  0.29  0.11  0.18  0.09  0.20  0.27  0.18
magb_new   0.14  0.19  0.21  0.13  0.25  0.12  0.10  0.08  0.12  0.19  0.15
still_new  0.19  0.15  0.14  0.19  0.25  0.11  0.16  0.07  0.17  0.12  0.16

Protein dynamics: a comparison to previous study

ANOVA

Not Significant

NHB in secondary structures

Proteins   1afi  1ail  1ctf  1lea  1pgb  1shg  1tuc  1ubi  2bby  2ci2  ave. dev.
exp.       29    49    41    25    31    19    16    22    21    17
magb       19    40    34    20    14    15    14    19    16    12    -0.245
still      18    37    28    20    15    13    10    18    12    11    -0.331
magb_new   21    39    30    22    22    15    14    19    16    13    -0.214
still_new  18    43    33    21    23    15    13    20    16    14    -0.199

Rg

Proteins   1afi  1ail  1ctf  1lea  1pgb  1shg  1tuc  1ubi  2bby  2ci2  ave. dev.
exp.       1.09  1.28  1.12  1.14  1.06  1.03  1.09  1.17  1.20  1.12
magb       1.14  1.28  1.12  1.15  1.05  1.06  1.08  1.17  1.17  1.09  0.001
still      1.13  1.30  1.13  1.15  1.05  1.06  1.12  1.18  1.18  1.13  0.012
magb_new   1.13  1.30  1.10  1.11  1.04  1.05  1.06  1.18  1.16  1.09  -0.005
still_new  1.10  1.31  1.13  1.12  1.04  1.05  1.06  1.15  1.19  1.09  -0.004

Significant
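As a worked check of the "ave. dev." column, the sketch below recomputes the mean relative deviation of Rg from the tabulated values, assuming the deviation of each protein is Value_simu / Value_exp - 1 and the column is the plain average over the ten proteins. Because the inputs are rounded to two decimals, agreement is only expected to within a few thousandths.

```python
# Rg values copied from the table (order: 1afi 1ail 1ctf 1lea 1pgb 1shg 1tuc 1ubi 2bby 2ci2)
RG_EXP = [1.09, 1.28, 1.12, 1.14, 1.06, 1.03, 1.09, 1.17, 1.20, 1.12]
RG_SIM = {
    "magb":      [1.14, 1.28, 1.12, 1.15, 1.05, 1.06, 1.08, 1.17, 1.17, 1.09],
    "still":     [1.13, 1.30, 1.13, 1.15, 1.05, 1.06, 1.12, 1.18, 1.18, 1.13],
    "magb_new":  [1.13, 1.30, 1.10, 1.11, 1.04, 1.05, 1.06, 1.18, 1.16, 1.09],
    "still_new": [1.10, 1.31, 1.13, 1.12, 1.04, 1.05, 1.06, 1.15, 1.19, 1.09],
}


def mean_relative_deviation(simulated, experimental):
    """Average of (simulated / experimental - 1) over all proteins."""
    deviations = [s / e - 1.0 for s, e in zip(simulated, experimental)]
    return sum(deviations) / len(deviations)


mean_dev = {
    name: mean_relative_deviation(values, RG_EXP)
    for name, values in RG_SIM.items()
}
```

The same helper applies unchanged to the hydrogen-bond counts in the NHB table.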


bbrmsd  Initial  MD                Still_new         mAGB_new
                 min   peak  max   min   peak  max   min   peak  max
1aoy    0.59     0.63  0.71  0.78  0.60  0.69  0.73  0.63  0.74  0.78
1stu    0.62     0.65  1.00  1.00  0.61  0.75  0.81  0.52  0.56  0.68
1vif    0.62     0.63  0.78  0.82  0.61  0.65  0.75  0.66  0.73  0.79
1sro    0.45     0.40  0.44  0.50  0.36  0.47  0.55  0.38  0.56  0.62
1tuc    0.59     0.61  0.71  0.73  0.61  0.68  0.73  0.62  0.69  0.78
1sap    0.35     0.33  0.58  0.62  0.28  0.39  0.48  0.35  0.52  0.56
1afi    0.26     0.16  0.20  0.33  0.22  0.29  0.36  0.19  0.23  0.29
1vcc    0.59     0.57  0.68  0.73  0.52  0.62  0.66  0.55  0.59  0.69
2bby    0.53     0.47  0.67  0.71  0.53  0.73  0.79  0.53  0.63  0.68
2fmr    0.44     0.43  0.58  0.64  0.39  0.43  0.53  0.37  0.43  0.51
1a1z    0.55     0.41  0.46  0.54  0.43  0.47  0.64  0.36  0.40  0.57
1ail    0.54     0.53  0.61  0.79  0.45  0.53  0.59  0.54  0.59  0.65
1coo    0.58     0.57  0.81  0.87  0.57  0.71  0.75  0.55  0.75  0.84
1lea    0.40     0.41  0.50  0.54  0.41  0.53  0.66  0.41  0.62  0.67
2ezh    0.34     0.32  0.39  0.45  0.32  0.43  0.52  0.32  0.37  0.49


Short comments

• No significant improvement from GB models on the semi ab initio models

• Homology models of better quality?

• Combined with other advanced sampling techniques?


Outlook to refinement

• Can we refine most protein models?

Currently no …

• Methods that could be helpful:

Application of advanced sampling methods

Combination of simplified and atomic models

Combination of knowledge-based and physical potentials

A final solution for refinement, and even folding, may lie ahead!

University of Groningen

Ruud Scheek

Utrecht University

Johan Kemmink

Alan Mark: my Ph.D. supervisor

NMR refinement

Columbia University

Barry Honig

Jiang Zhu

GB models &

Structure refinement


MD group

Xavier Periole

Tsjerk Wassenaar

Alessandra Villa

REMD

Statistical analysis

Force field development


MD group

Siewert-Jan Marrink

Alex de Vries Membrane/protein

Max-Planck-Institute

Berk Hess

Stockholm University

Erik Lindahl

BioMaDe

George Robillard

Xiaoqin Wang

Gromacs

Hydrophobin
