
Modeling molecular dynamics from simulations

Nina Singhal Hinrichs

Departments of Computer Science and Statistics

University of Chicago

January 28, 2009

Motivation

• Proteins are essential parts of living organisms
– enzymes, cell signaling, membrane transport . . .

• Composed of a chain of amino acids

• Fold to a unique 3-dimensional structure

• Misfolding can cause diseases
– Alzheimer's, mad cow, Huntington's . . .

• How do proteins fold?

Molecular dynamics

• Represent atoms of molecule and solvent

• Model forces on atoms

• Integrate laws of motion

• Integration time step is small compared with the timescales of the motions of interest
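The slide lists the steps of molecular dynamics without naming an integrator. A minimal sketch, assuming the velocity Verlet scheme (a common choice, not necessarily the one used in the simulations discussed here) and a toy one-dimensional harmonic force standing in for a real force field:

```python
import numpy as np

def velocity_verlet(x0, v0, force, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = F(x) with the velocity Verlet scheme.

    Returns positions at every step, shape (n_steps + 1, *x0.shape)."""
    x = np.asarray(x0, dtype=float)
    v = np.asarray(v0, dtype=float)
    f = force(x)
    traj = [x.copy()]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new positions
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        traj.append(x.copy())
    return np.array(traj)

# Toy system (an assumption): one particle in a harmonic well, F(x) = -x, m = 1.
# Its period is 2*pi, so dt = 0.01 is small compared with the motion's timescale.
traj = velocity_verlet(x0=[1.0], v0=[0.0], force=lambda x: -x, mass=1.0, dt=0.01, n_steps=628)
```

With a time step this small relative to the period, the particle returns close to its starting point after one oscillation; a real simulation replaces `force` with the force field acting on all atoms and solvent.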

Folding@Home: Distributed computing for biomolecular simulation

• Perform multiple simulations in parallel

• Total simulation times – hundreds of microseconds (hundreds of CPU-years)

• Very powerful computational resource
– ~200 teraflops sustained performance
– >1,000,000 total CPUs; 200,000 active

Challenge: How to analyze?

• Enormous datasets
– Describe dynamics in microscopic detail

• Questions we want to answer
– Rate of folding, mechanism of folding . . .

• How can we extract these properties from our data?

Outline

• Markovian state model for molecular motion
– Model description, uses, examples

• New algorithms for building these models
– Defining states and transition probabilities

• New methods for dealing with finite sampling
– Model complexity, uncertainty analysis, targeted sampling

Chemical intuition

Chemical reactions often exhibit stochastic behavior

n-butane

Chandler, Journal of Chemical Physics (1977)

[Figure: network of states 1 to 5]

Markovian state model

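Define states in the conformation space

Define transition probabilities, or edges, between states:

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}$$

[Figure: network of states 1 to 5 connected by transition edges]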

Uses of the model

• Populations of states over time

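• Eigenvalues and eigenvectors – timescales and structure of the slowest conformational changes

• Kinetic properties – virtually any kinetic property, e.g. mean first passage times (MFPT) and Pfold

• Mechanistic properties – most likely paths and probabilities of transitions, computed with graph algorithms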

Chodera et al., Multiscale Modeling and Simulation (2006)

[Figure: state populations p versus time t]
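To make "populations of states over time" and "eigenvalues" concrete, here is a minimal sketch on a hypothetical 3-state transition matrix (the numbers are invented for illustration). It assumes the row-vector convention p(t + τ) = p(t) P and uses the standard implied-timescale relation t_k = -τ / ln λ_k to turn eigenvalues into relaxation times.

```python
import numpy as np

# Hypothetical 3-state transition matrix at lag time tau (each row sums to 1).
P = np.array([[0.90, 0.08, 0.02],
              [0.05, 0.90, 0.05],
              [0.02, 0.08, 0.90]])
tau = 1.0  # lag time, arbitrary units

def propagate(p0, P, n):
    """Populations after n lag times: p(n * tau) = p(0) P^n (row-vector convention)."""
    return p0 @ np.linalg.matrix_power(P, n)

def implied_timescales(P, tau):
    """Relaxation timescales t_k = -tau / ln(lambda_k) for all eigenvalues except lambda_1 = 1."""
    evals = np.sort(np.linalg.eigvals(P).real)[::-1]   # largest first
    return -tau / np.log(evals[1:])

p0 = np.array([1.0, 0.0, 0.0])           # start entirely in state 1
populations = propagate(p0, P, 50)       # populations after 50 lag times
timescales = implied_timescales(P, tau)  # slowest relaxation first
```

This particular matrix satisfies detailed balance, so its eigenvalues are real (1, 0.88 and 0.82); for a general non-reversible matrix, complex eigenvalues would need separate handling.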

Example models

alanine peptide: Chodera et al., Multiscale Modeling and Simulation (2006)

lipid vesicle fusion: Kasson et al., PNAS (2006)

alpha helix: Sorin and Pande, Biophysical Journal (2005)

villin headpiece: Jayachandran et al., Journal of Structural Biology (2006)

Computational and statistical challenges

• Building Markovian state model
– Defining states that are Markovian
– Calculating the transition probabilities

• Refining Markovian state model
– Finding the best model
– Determining model uncertainty
– Designing new simulations

[Figure: network of states 1 to 5 and its transition matrix with entries p11 ... pNN]

Automatic state decomposition

• Challenge: Find appropriate states

• Using individual conformations as states does not scale

• Group conformations into discrete states

• Structural clustering alone is insufficient

• Basic algorithm – combine structural and kinetic similarity

J. D. Chodera*, N. Singhal*, V. S. Pande, K. A. Dill, and W. C. Swope. Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics. Journal of Chemical Physics, 126, 155101 (2007). (*These authors contributed equally to this work)


Comparison of structural and kinetic clustering

[Figure: structural clustering vs. kinetic clustering of trpzip2]

trpzip2: Cochran et al., PNAS 98:5578, 2001.

State decomposition – splitting

Cluster conformations by root mean square distance (RMSD)
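The splitting step clusters conformations by RMSD. The slides do not name a clustering algorithm; as a minimal sketch, assuming greedy k-centers clustering and RMSD computed after optimal superposition with the Kabsch algorithm (both are stand-in choices, and `rmsd` and `k_centers` are hypothetical helper names):

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two conformations (n_atoms x 3 arrays) after optimal
    superposition with the Kabsch algorithm."""
    a = a - a.mean(axis=0)                  # remove translation
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)       # SVD of the 3 x 3 correlation matrix
    if np.linalg.det(u @ vt) < 0:           # optimal rotation must not be a reflection
        s[-1] = -s[-1]
    sq = (a ** 2).sum() + (b ** 2).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(sq, 0.0) / len(a)))

def k_centers(confs, k, seed=0):
    """Greedy k-centers: each new center is the conformation farthest (by RMSD)
    from all existing centers. Returns center indices and state assignments."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(confs)))]
    dist = np.array([rmsd(c, confs[centers[0]]) for c in confs])
    assign = np.zeros(len(confs), dtype=int)
    for j in range(1, k):
        new = int(np.argmax(dist))
        centers.append(new)
        d_new = np.array([rmsd(c, confs[new]) for c in confs])
        assign[d_new < dist] = j            # reassign points now closest to the new center
        dist = np.minimum(dist, d_new)
    return centers, assign
```

k-centers needs only one RMSD pass per new center, which keeps it cheap on large datasets; it groups structurally similar conformations but, as the slides note, says nothing yet about kinetics.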

State decomposition – lumping

Group states that interconvert quickly
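The lumping step only says to group states that interconvert quickly. The sketch below is a simplified greedy stand-in, not the procedure of the cited paper: starting from a matrix of observed transition counts, it repeatedly merges the pair of groups with the largest fraction of their transitions going between each other. The function name `lump_states` and the scoring rule are assumptions made for illustration.

```python
import numpy as np

def lump_states(counts, n_macro):
    """Greedily merge states until n_macro groups remain.

    At each step, merge the pair of groups with the largest fraction of their
    outgoing transitions going to each other, i.e. the pair that interconverts fastest.
    """
    counts = np.asarray(counts, dtype=float)
    groups = [[i] for i in range(counts.shape[0])]
    while len(groups) > n_macro:
        best_score, best_pair = -1.0, None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                ga, gb = groups[a], groups[b]
                between = counts[np.ix_(ga, gb)].sum() + counts[np.ix_(gb, ga)].sum()
                outgoing = counts[ga].sum() + counts[gb].sum()
                score = between / outgoing if outgoing > 0 else 0.0
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        a, b = best_pair
        groups[a] = groups[a] + groups[b]   # merge group b into group a
        del groups[b]                        # b > a, so index a is unaffected
    return groups
```

For example, with two pairs of states that exchange often internally but rarely with each other, lumping to two groups recovers the two pairs.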

State decomposition – resplitting

Re-cluster conformations within each lumped state
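Resplitting runs the structural clustering again, but only among conformations that already share a lumped state. A minimal sketch, assuming each conformation is summarized by a feature vector (for example a pair of backbone angles) and clustered with greedy k-centers under Euclidean distance; `resplit` and `k_centers_labels` are hypothetical names, and Euclidean features stand in for RMSD:

```python
import numpy as np

def k_centers_labels(x, k):
    """Greedy k-centers on feature vectors x (n x d), Euclidean distance."""
    dist = np.linalg.norm(x - x[0], axis=1)     # first center: row 0
    labels = np.zeros(len(x), dtype=int)
    for j in range(1, min(k, len(x))):
        new = int(np.argmax(dist))
        d_new = np.linalg.norm(x - x[new], axis=1)
        labels[d_new < dist] = j
        dist = np.minimum(dist, d_new)
    return labels

def resplit(x, macro_labels, k_per_state):
    """Re-cluster conformations separately inside each lumped (macro)state, so no
    new microstate straddles a macrostate boundary."""
    micro = np.empty(len(x), dtype=int)
    next_id = 0
    for m in np.unique(macro_labels):
        idx = np.flatnonzero(macro_labels == m)
        _, sub = np.unique(k_centers_labels(x[idx], k_per_state), return_inverse=True)
        micro[idx] = sub.ravel() + next_id      # give each macrostate its own label range
        next_id += int(sub.max()) + 1
    return micro
```

Restricting the clustering to one lumped state at a time is what keeps the kinetic information from the lumping step: new boundaries can only subdivide a metastable state, never merge across two.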

Blocked alanine peptide

[Figure: conformation space of blocked alanine peptide partitioned into states 1 to 6]

Chodera et al., Multiscale Modeling and Simulation (2006)

Automatic state decomposition of alanine peptide

Black state sits on top of multiple other states!

Benefit of automatic algorithm

These conformations had an unusual peptide bond

Stability of decomposition

TrpZip peptide

N. Singhal, C. D. Snow, and V. S. Pande. Using path sampling to build better Markovian state models: Predicting the folding rate and mechanism of a trp zipper beta hairpin. Journal of Chemical Physics, 121(1), 415-425 (2004).

Transition probabilities

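[Figure: network of states 1 to 5]

Discretize trajectories into a series of states, e.g. 1 2 2 3 4 3 5

Count the number of transitions between all pairs of states (transition counts):

$$\begin{pmatrix} z_{11} & z_{12} & \cdots & z_{1N} \\ z_{21} & z_{22} & \cdots & z_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ z_{N1} & z_{N2} & \cdots & z_{NN} \end{pmatrix}$$

Normalize each row to obtain the transition probabilities, $p_{ij} = z_{ij} / \sum_k z_{ik}$:

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}$$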
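The count-then-normalize procedure can be sketched as follows. The function names, the `lag` parameter and the 0-based state labels are assumptions for illustration; the example trajectory is the slide's sequence 1 2 2 3 4 3 5 shifted to 0-based labels.

```python
import numpy as np

def count_transitions(dtrajs, n_states, lag=1):
    """Count transitions i -> j between states observed `lag` steps apart (lag >= 1)."""
    z = np.zeros((n_states, n_states))
    for traj in dtrajs:
        traj = np.asarray(traj)
        np.add.at(z, (traj[:-lag], traj[lag:]), 1)   # accumulates repeated (i, j) pairs
    return z

def transition_matrix(z):
    """Row-normalize counts: p_ij = z_ij / sum_k z_ik (rows with no counts stay zero)."""
    rows = z.sum(axis=1, keepdims=True)
    return np.divide(z, rows, out=np.zeros_like(z), where=rows > 0)

# The slide's discretized trajectory 1 2 2 3 4 3 5, written with 0-based labels.
z = count_transitions([[0, 1, 1, 2, 3, 2, 4]], n_states=5)
P = transition_matrix(z)
```

`np.add.at` is used instead of `z[i, j] += 1` with fancy indexing because it accumulates repeated index pairs correctly. The last state in this short trajectory has no outgoing transitions, so its row stays zero; with real data, every state should have enough outgoing counts for its row to be meaningful.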


Model selection

• Challenge: How many states should we have?

– Models with more states are more Markovian

– But models with more states have more parameters to estimate from the same data

• How do we evaluate this tradeoff?

N. S. Hinrichs and V. S. Pande. Bayesian metrics for validating and improving Markovian state models for molecular dynamics simulations. (In preparation)


Hidden Markov Model formulation

• Formulate model selection as a hidden Markov model (HMM) structure-scoring problem

• Compare different discretizations of the continuous conformation space

• Benefits of Bayesian scores
– Naturally handle the tradeoff between model complexity and amount of data
– Avoid over-fitting of parameters

[Figure: hidden Markov model with hidden states emitting observations]
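To show how a Bayesian score trades model complexity against data, here is a much simpler relative of the HMM score: the marginal likelihood of an observed state sequence under a plain Markov chain, with an independent Dirichlet(α) prior on each row of the transition matrix (α = 1 is an assumed uniform prior). This is not the score used in the cited work; in particular, it cannot by itself compare different discretizations, since they produce different observation sequences, which is why the slides use the HMM formulation.

```python
import numpy as np
from math import lgamma

def log_marginal_likelihood(z, alpha=1.0):
    """log P(observed transitions | Markov chain model), integrating each row's
    transition probabilities over a Dirichlet(alpha, ..., alpha) prior."""
    z = np.asarray(z, dtype=float)
    n = z.shape[1]
    total = 0.0
    for row in z:
        # Dirichlet-multinomial evidence for the transitions out of one state
        total += lgamma(n * alpha) - lgamma(n * alpha + row.sum())
        total += sum(lgamma(alpha + c) - lgamma(alpha) for c in row)
    return total
```

Because the parameters are integrated out rather than fitted, a model with more states pays for every extra row and column it must spread probability over, and so it wins only when the data support it.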

Alanine peptide results

Score of hidden Markov models for different lag times

The last model scores worse at shorter lag times but is preferred at longer lag times

No previous evaluation methods could distinguish these models

Uncertainty analysis

Goal: Once we have the states, what is the uncertainty in the model?

Both estimates are reasonable, but they give different transition probabilities

and therefore different mean first passage times (MFPT), Pfold values, eigenvalues, eigenvectors . . .

N. Singhal and V. S. Pande. Error analysis and efficient sampling in Markovian state models for protein folding. Journal of Chemical Physics, 123, 204909-204921 (2005).

N. S. Hinrichs and V. S. Pande. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. Journal of Chemical Physics, 126, 244101 (2007).

[Figure: two networks of states 1 to 5 with different transition probabilities]

Uncertainty caused by finite sampling


Transition probabilities

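Recall that we calculate transition probabilities by counting:

$$p_{ij} = \frac{z_{ij}}{\sum_k z_{ik}}$$

Instead of a single point estimate, we can describe the full distribution of transition probabilities.

Bayes' rule:

$$P(\mathbf{p}_{i*} \mid \text{counts}) \propto P(\text{counts} \mid \mathbf{p}_{i*})\, P(\mathbf{p}_{i*})$$

[Figure: distribution of p_ij for a state i with 70 transitions to j and 30 to k, compared with 700 to j and 300 to k]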
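A minimal sketch of the figure's point, assuming a Dirichlet prior on each row (uniform, pseudocount 1, an assumption), so that the posterior for row i is Dirichlet(counts + 1). Each marginal p_ij is then a Beta distribution whose mean and variance have closed forms; the helper name `posterior_mean_std` is hypothetical.

```python
import numpy as np

def posterior_mean_std(counts, prior=1.0):
    """Mean and standard deviation of each p_ij under a Dirichlet(counts + prior) posterior."""
    a = np.asarray(counts, dtype=float) + prior
    a0 = a.sum()
    mean = a / a0
    var = mean * (1.0 - mean) / (a0 + 1.0)   # marginal Beta variance
    return mean, np.sqrt(var)

# Row i from the slide: 70 transitions to j and 30 to k, versus ten times as many.
m_small, s_small = posterior_mean_std([70, 30])
m_large, s_large = posterior_mean_std([700, 300])
```

Both rows give essentially the same point estimate (about 0.7), but ten times as many counts shrink the standard deviation of p_ij by roughly a factor of sqrt(10), which is the narrowing shown in the figure.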

Sampling approach

Possible solution to get the distribution of eigenvalues: repeatedly sample transition probabilities [p_ij] from their distribution and solve for the eigenvalue of each sample

Problem:
• sampling can be expensive
• solving for the eigenvalues of each sample can be expensive
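The sampling approach can be sketched as below, assuming independent Dirichlet(z_i + 1) posteriors for the rows (an assumed uniform prior) and taking the second-largest eigenvalue as the quantity of interest. `sample_eigenvalue` is a hypothetical name. Each sample requires a full eigendecomposition, which is exactly the cost the slide points out.

```python
import numpy as np

def sample_eigenvalue(z, n_samples=2000, prior=1.0, seed=0):
    """Sample transition matrices row by row from Dirichlet(z_i + prior) posteriors
    and return the second-largest eigenvalue (by real part) of each sample."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    out = np.empty(n_samples)
    for s in range(n_samples):
        P = np.vstack([rng.dirichlet(row + prior) for row in z])   # one sampled matrix
        ev = np.sort(np.linalg.eigvals(P).real)[::-1]              # full solve per sample
        out[s] = ev[1]
    return out

samples = sample_eigenvalue([[900, 100], [100, 900]])
```

Sampled matrices need not satisfy detailed balance, so their eigenvalues can be complex in general; sorting by real part is a simplification that is harmless for well-separated, nearly reversible spectra such as this 2-state example, whose second eigenvalue is 1 - p12 - p21.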

Closed-form solution

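Idea: trade the exact distribution for an efficient approximation

Taylor series expansion of the eigenvalue λ in the transition probabilities:

$$\lambda(P + \Delta P) \approx \lambda(P) + \frac{\partial \lambda}{\partial p_{11}}\,\Delta p_{11} + \frac{\partial \lambda}{\partial p_{12}}\,\Delta p_{12} + \cdots + \frac{\partial \lambda}{\partial p_{NN}}\,\Delta p_{NN}$$

The derivatives are efficient to calculate using adjoint systems

Multivariate normal approximation of $\mathbf{p}_{i*}$

Closed-form normal distribution for λ

Eigenvalue equation:

$$\det(P - \lambda I) = 0$$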
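A minimal sketch of the linearized (closed-form) estimate. Two assumptions differ from the slide: the derivatives come from the standard eigenvalue-perturbation formula ∂λ/∂p_ij = u_i v_j / (u · v), with u and v the left and right eigenvectors, rather than from adjoint systems (both give the same first-order sensitivities for a simple eigenvalue); and each row's covariance is taken from a Dirichlet(z_i + 1) posterior, which a multivariate normal approximation with the same mean and covariance would share. `eigenvalue_variance_linear` is a hypothetical name.

```python
import numpy as np

def eigenvalue_variance_linear(z, k=1, prior=1.0):
    """First-order estimate of the k-th largest eigenvalue of P and its variance,
    with Dirichlet(z_i + prior) posteriors on the rows (k = 0 is the stationary one)."""
    a = np.asarray(z, dtype=float) + prior
    a0 = a.sum(axis=1)
    P = a / a0[:, None]                            # posterior-mean transition matrix
    w, vr = np.linalg.eig(P)
    idx = np.argsort(-w.real)[k]
    lam = w[idx].real
    v = vr[:, idx].real                            # right eigenvector: P v = lam v
    wl, vl = np.linalg.eig(P.T)
    u = vl[:, np.argmin(np.abs(wl - lam))].real    # left eigenvector: u^T P = lam u^T
    grad = np.outer(u, v) / (u @ v)                # d lam / d p_ij = u_i v_j / (u . v)
    var = 0.0
    for i in range(P.shape[0]):                    # rows are independent, so variances add
        m = P[i]
        cov = (np.diag(m) - np.outer(m, m)) / (a0[i] + 1.0)   # Dirichlet covariance of row i
        var += grad[i] @ cov @ grad[i]
    return lam, var

lam, var = eigenvalue_variance_linear([[900, 100], [100, 900]])
```

One eigendecomposition of P (and of its transpose) replaces thousands of per-sample solves, which is where the speedup of the closed-form approach comes from. For the 2-state example, the result matches the exact first-order answer Var(p12) + Var(p21).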

Uncertainty results

5000 trajectories from each state

Running times (6 states):
Sampling-based: 40 seconds
Closed-form: < 0.01 seconds

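Alanine system transition counts (rows: from state; columns: to state):

4926  62  0  0  5  7
18  4978  4  0  0  0
0  3  4823  158  13  3
0  0  226  4604  1  169
0  0  0  1  4788  211
002151534380

[Figure: alanine conformation space partitioned into states 1 to 6]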

Running times (87 states):
Sampling-based: 3600 seconds
Closed-form: < 0.07 seconds

Sampling strategies

Problem: Simulations are expensive. Even with Folding@Home, we run simulations for months.

How can we allocate our resources intelligently?

Common approaches:
• equilibrium sampling – sample conformations according to their equilibrium distribution
• even sampling – sample equally from each state

New sequential approaches

N. Singhal and V. S. Pande. Error analysis and efficient sampling in Markovian state models for protein folding. Journal of Chemical Physics, 123, 204909-204921 (2005).

N. S. Hinrichs and V. S. Pande. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. Journal of Chemical Physics, 126, 244101 (2007).


Adaptive sampling

Goal: Reduce uncertainty of eigenvalue

Uncertainty analysis decomposes by transitions from each state

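$$\operatorname{Var}(\lambda) \approx \sum_{i=1}^{N} \begin{pmatrix} \frac{\partial \lambda}{\partial p_{i1}} & \cdots & \frac{\partial \lambda}{\partial p_{iN}} \end{pmatrix} \operatorname{Cov}(\mathbf{p}_{i*}) \begin{pmatrix} \frac{\partial \lambda}{\partial p_{i1}} \\ \vdots \\ \frac{\partial \lambda}{\partial p_{iN}} \end{pmatrix}$$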

The variance depends on both the uncertainty in the transition probabilities and the sensitivity of λ to them
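Because the variance is a sum of per-state terms, one can estimate how much new simulations started from each state would shrink it and sample where the payoff is largest. The sketch below is a simplified version of that idea, not the exact criterion of the cited work. It assumes a Dirichlet(z_i + 1) posterior per row, takes the sensitivities ∂λ/∂p_ij as an input (for example u_i v_j / (u · v) from the left and right eigenvectors), and assumes that extra transitions leave each row's mean unchanged. `pick_state_to_sample` and `n_new` are hypothetical names.

```python
import numpy as np

def pick_state_to_sample(z, grad, n_new=10, prior=1.0):
    """Return the state whose n_new extra transitions are expected to reduce Var(lambda)
    the most, plus the estimated reduction for every state.

    Var(lambda) ~ sum_i g_i^T Cov(p_i*) g_i, and row i's Dirichlet covariance scales as
    1 / (a0_i + 1), where a0_i is that row's total pseudocount.
    """
    a = np.asarray(z, dtype=float) + prior
    a0 = a.sum(axis=1)
    reduction = np.empty(len(a))
    for i in range(len(a)):
        m = a[i] / a0[i]
        q = grad[i] @ (np.diag(m) - np.outer(m, m)) @ grad[i]   # sensitivity x spread of row i
        reduction[i] = q / (a0[i] + 1.0) - q / (a0[i] + n_new + 1.0)
    return int(np.argmax(reduction)), reduction

# Two states with equal sensitivities; state 1 has only 10 observed transitions.
best, reduction = pick_state_to_sample([[900, 100], [9, 1]], np.array([[1.0, -1.0], [-1.0, 1.0]]))
```

With equal sensitivities, the poorly sampled state wins; in general, a well-sampled state can still be chosen if λ is much more sensitive to its transition probabilities.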

Adaptive sampling – alanine

On the 6-state alanine system, select trajectories randomly under each of the 3 sampling strategies

Transition counts (rows: from state; columns: to state):

4926  62  0  0  5  7
18  4978  4  0  0  0
0  3  4823  158  13  3
0  0  226  4604  1  169
0  0  0  1  4788  211
002151534380

Adaptive sampling – villin

• Benefits
– Very quickly reduce the variance
– Reduce the total number of simulations
– Need less computational power
– Can study more complex systems

Villin headpiece: Jayachandran et al., Journal of Chemical Physics (2006)

2454 states

Summary

• Markovian state models are convenient methods to describe molecular motion

• Automatic state decomposition
– Scalable to large systems

• Model selection
– Evaluates the tradeoff between model complexity and amount of data

• Uncertainty analysis
– Efficient and decomposable

• Adaptive sampling
– Reduces the number of simulations

Acknowledgements

• Vijay Pande – Stanford University adviser

• Bill Swope, Jed Pitera – IBM collaborators

• John Chodera – state decomposition work
