conformational sampling dragos horvath laboratoire d’infochimie – umr 7177...

Post on 13-Jan-2016

225 Views

Category:

Documents

2 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Conformational Sampling

Dragos HorvathDragos Horvath

Laboratoire d’InfoChimie – UMR 7177Laboratoire d’InfoChimie – UMR 7177

horvath@chimie.u-strasbg.frhorvath@chimie.u-strasbg.fr

• Presentation Outline

– The Basics: Molecules have Geometries!

• Intramolecular energy calculation: the Empirical Force Field

– Sampling Methods: a brief overview

– Molecular Casino: MonteCarlo Simulations

– Really Difficult Problems: Darwinism, God’s

Will and Massively Parallel Computing

3

Stable conformation :Minimum of energy

Unstable conformation :High potential energy

Two degrees of freedom

Different « conformations »or « geometries »

of a molecule

• The POTENTIAL ENERGY calculation is based on the EMPIRICAL FORCE FIELD APPROACH – Quantum chemical calculations are too time-consuming: atoms

and their interactions are approximated as “classical” objects – Atoms need to be “parameterized” in function of their chemical

environment: a C atom in an alkane does not carry a same partial charge as a carbonyl C=O!

– Covalent bonds are modeled as harmonic springs. The energy required to stretch or compress a bond by b with respect to its natural length b is expressed as Kbb2

– Valence angle bending modeled by harmonic potential K– Atoms that are not directly bonded or do not form an angle

interact “through space” by means of non-bonded interactions.• Van der Waals interactions• Electrostatics interactions – based on partial charges• Continuum Solvent models

5

Non-bonded interactions :

Coulomb :

Van der Waals :

Desolvation & Hydrophobic Term:

j,icoulomb

d**E

04

6

12

j,i

ji

j,i

jiVdW

dB*B

dA*AE

-

a1

a2

+

jikd

VQVQkE hphob

h

ji

ijjisolvSolv ,4

,

22

6

Global energy :

nbtorsbmolecule EEEEE

Torsional correction terms :

))3cos(1(* ttors kE

E=f(Geometry)

Torsions : the gateway to conformational sampling

- Energy Profile with respect to a torsion....

Torsions : the gateway to conformational sampling

- Energy Surface with respect to two torsions....

Torsions : the gateway to conformational sampling

- Alternative Contour Plot representation

The Ramachandran Plot

http://en.wikipedia.org/wiki/Ramachandran_plot

Key points on the energy surface...

1=0 36 72 108 144 180 216 252 288 324°

2 =

0 60 120 180 240 300

Computing a 2D torsion plot... Not that easy!

E

low

high

?

• Energy Minimization is only [the easy] part of the problem– Given a starting geometry, deterministic algorithms allow the

discovery of the adjacent local minimum– Descent methods follow the local gradient

)(:

,,,...,,,,,)(

,,,...,,,,,

22111

22111

currcurrnew

NNN

NNN

XEsXXyiterativel

z

E

y

E

x

E

y

E

x

E

z

E

y

E

x

EXE

zyxyxzyxXgeometrymolecular

E

X

Bad news: most molecules have more than 2 torsions...

- No visualization of the energy hypersurface is possible!

• Why care for conformational sampling?– Because experimental properties of a molecule are given by the

Boltzmann Average of properties of populated conformers

Boltzmann’s probability distribution:

Tk

EEenergyofconformerP

B

exp~

Boltzmann Averaging:

conformerspopulated

conformerPropertyconformerPopertyObservedPr )()(

Objective : finding the most probable

solutions

That is, the relevant minima

Energy

Geometry

The Challenge…

“Well”-docked(folded) zone

“Misdocked”(folded) conformers

“Misdocked”(folded) conformers

E

E#PDB

PDB

Absolu

te E

nergy

Absolu

te E

nergy

Minim

um

Minim

um

Native-like:

Native-like:

one local clash

one local clashEnergy=f(Geometry)

defined by the Empirical Force Field

Publisher’s Force Field:« Nice H bond »

My Force Field:« Bad Contact »

Microstates contributing to

macroscopic property

• Presentation Outline

– The Basics: Molecules have Geometries!

• Intramolecular energy calculation: the Empirical Force Field

– Sampling Methods: a brief overview

– Molecular Casino: MonteCarlo Simulations

– Really Difficult Problems: Darwinism, God’s

Will and Massively Parallel Computing

• Sampling methods– Systematic <3…4 torsions– Molecular Dynamics

• Solve Newton’s motion equations, given the atomic forces calculated by the force field: simulate “Brownian motion”

– Stochastic sampling:• Monte Carlo simulations• Genetic Algorithms

18

• Presentation Outline

– The Basics: Molecules have Geometries!

• Intramolecular energy calculation: the Empirical Force Field

– Sampling Methods: a brief overview

– Molecular Casino: Monte Carlo Simulations

– Really Difficult Problems: Darwinism, God’s

Will and Massively Parallel Computing

• The Monte Carlo Approach: win an Energy Optimum by Playing Dice!– Take a random geometry– Randomly choose a torsional axis– Apply a Random rotation around that axis– Recalculate the Energy of the thereof resulting

geometry• If lower – or, at least, not too (!) high, accept: make

new conformer new “default” geometry”• Otherwise, reject – restore ancient geometry

– Loop until no further energy drop is observed

21

• Presentation Outline

– The Basics: Molecules have Geometries!

• Intramolecular energy calculation: the Empirical Force Field

– Sampling Methods: a brief overview

– Molecular Casino: MonteCarlo Simulations

– Really Difficult Problems: Darwinism, God’s

Will and Massively Parallel Computing

23

Data representation :

« individual »or

« chromosome »=

list of itstorsional angles

Population of individuals :

… … ...... … … n…

… … ...... … … n…

… … ...... … … n…

… … ...... … … n…

nn-1…

• Genetic Algorithm– Applying a Darwinian Evolution Scenario to a population of

vectors (“chromosomes”) encoding the solution to a problem– Solution Quality is the “Fitness” score, and the fittest survive…

24

Generation of new offspring :

Crossover :… n…i+1i

…’

n…

’i+

1

’i

parent1 :

parent2 :

Mutation :… n…i+1iWild type :

…’

n

’i+1i…

… ni+1’i…’

child1 :

child2 :

… ni+1’i…mutant :

25

intermediate population...

n

... n

... n

... n

... n

... n

... n

... n

random

... n

... n

... n

... n

initial population

sorted

final population...

n

... n

... n

... n

sorted

Evolution of the average fitness,Evolution of the fitness of the best the algorithm converges

selection threshold

energies

Population Diversity Control is a Key Issue> Discarding of redundant chromosomes (requires a metricdefining how similar two encoded solutions are!)

> Multiple ‘Island’ models – parallel simulations occasionally swapping solutions

or God??

Genetic Algorithms: Chance, Selection & the CoinFlipper’s bet!

• Any problem admitting a vector as a solution may be coded by a “chromosome” and left in the hands of Darwin…

• I bet (1M€) I can find a person who won a coin-flipping challenge 10 times in a row, at his/her first attempt!!– In order to fulfill my promise, I need a total of 1024 coin flips

to happen,• 1024/10=102 pretendents, each with a chance of (1/2)10 to score 10

successive winning coin flips: ~90% chance to loose 1M€!

• If you read “Darwin’s Dangerous Idea” by D.C.Dennett, you are not allowed to bet !!

Selection is the Key!

1024 candidates / 512 flips

512 candidates / 256 flips

28

Hybrid strategies: (1) Selective Chromosome Initialization:

- Knowledge-based: favoring locally stable torsions…

polycycle : torsion nr. 1

0

0,05

0,1

0,15

0,2

0,25

0,3

0,35

0,4

0 100 200 300

angle

pro

bab

ilit

é

polycycle : torsion nr. 3

0

0,05

0,1

0,15

0,2

0,25

0,3

0,35

0,4

0 100 200 300

anglep

rob

abil

itie

s

biased torsion probabilities thanks to learning

biased torsion probabilities wrt local Hamiltonian

- ‘Traditionalism’: favoring torsion values seen in previously visited samples

29

Evolution stalled in local minimum,

Mutations will not help!

Add a constraint term forcing 1 to adopt ‘mutant’ value ’1

Gradient optimization, following the new energy

landscape…‘Lamarckian’ move towards

next optimum

Process in parallel to main GAstream in order to avoid halting evolution!

Hybrid Strategies (2): Directed or ‘Lamarckian’ Mutations

Hybrid Heuristics (3) The Taboo Search Dilemma

Evolved Solution

Evolved Solution

“Taboo”Phase space region

??

Search for Optimal Sampling Setups in the Strategy Parameter Space…

p1 p2 p3 p4 p5 p6 p14 p15

Population management

Population size

Number of parallel process

Migration rate between ‘islands’

Evolution management

Crossover rate

Mutation rate

One/two point crossover rate

Selection pressure

Dissimilarity limit

Maximal age

Convergence management

Apocalypse (population reset) frequency

Elitism

Global stop condition

CPUtimeTk

ETkFitness

b

ib .expln._

minimafound

323-fold repeat

Postprocessing…

Run 1

Run 2

Runn

Global Base of

Diverse Conformers

Base of diverse conformers[sampled at current setup]

µ-Fitness!!

Meta-algorithm defines parameter setup

News??

« Tabus »« Tradition »

Meta-GA picksnext set of

configurations

yes

GAMEOVER

no

DirectedMutations

The Island Model

GRID 5000-based ‘Planetary’ Model

If (free node)DEPLOY

Island Model

- Executables- Molecule File- Constraint Files- Seeds List- Taboo List- Operational Pars

-Stablest Chromosomes-Sampling Success Score

Solution Merger& Clusterer

Conformer & Cluster Database

‘Panspermia’ policy center‘recent’ clusters: seeds

‘old’ clusters: taboo

Sampling Success vs.Operational Pars

Stop:max. ‘Mission Nr.’

no new clusters sinceN ‘missions’

www.grid5000.fr

Operational ParsSelector

• Ab initio folding of Trp cage 1L2YTrp cage 1L2Y: native structure (reproducibly) found and ranked as most stable. D&C Planetary model: 20 nodes for 24 hours

PDB PDB

• Ab initio folding of the Villin headpiece 1VIIVillin headpiece 1VII: helical parts are seen to fold in a matter of days (40 nodes) – although not properly oriented.

PDB PDB

• Good news for the -hairpin of ChignolinChignolin: out of the top 10 best ranked conformers, 8 are native-like

• Number one is not – but in this case, that may not be a problem

PDB PDB

#1,#5#1,#5

• However, proper folding of 1LE1 could be achieved (though not reproducibly!) with previous force field versions – is the current setup too helix-specific?

• The 1LE1 -sheet is not the absolute energy minimum according to the current setup!

PDB PDB

• Docking simulations in presence of flexible loops, such as the hinge region of Casein Kinase 2 (3BQC)Casein Kinase 2 (3BQC)

– pose of ligand emodin and loop geometry are correctly predicted (3BQC not in FF training set).

Flexible hinge region

PDB, #1

PDB, #1

• Furthermore, a crystallographic water molecule can be simultaneously docked, being considered as another ligand – and is correctly placed.

Flexible hinge region

Water location converges

Water location converges

towards experimental

towards experimental

position

position

• Docking into GPCRs: (1) Turkey 1-Adrenergic Receptor – Cyanopindolol complex 2VT4, 190 degrees of freedom (ligand and side chains) – 30 days/20 nodes**. Ligand RMS =0.48 A (best pose)

** total run time required to visit ~40000 phase space cells** total run time required to visit ~40000 phase space cells

• Conclusions– Conformational Sampling is the Key Element for Understanding

of Molecular Behavior– It may range from very simple to extremely difficult, to impossible– If you don’t do it well, better don’t do it at all: empirical methods

based on molecular topology only may be more accurate than 3D models based on wrong – or too few – conformations

– Two main sources of errors: A.) wrong calculated energy-geometry landscape (poor Force Field parameterization) and B.) – insufficient sampling!

– Docking is just a specific case of conformational sampling, involving at least two molecules: a binding “site” and one or more “ligands”

– You will often hear that the knowledge of the “bioactive” conformer is paramount to understand binding. This is necessary, but sometimes not sufficient. Note: the “bioactive” conformer may sometimes be quite unstable and almost never populated in the free state.

top related