
Page 1: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009

Modeling molecular dynamics from simulations

Nina Singhal Hinrichs

Departments of Computer Science and Statistics

University of Chicago

January 28, 2009

Page 2

Motivation

• Proteins are essential parts of living organisms
– enzymes, cell signaling, membrane transport . . .
• Composed of a chain of amino acids
• Fold to a unique 3-dimensional structure
• Misfolding can cause diseases
– Alzheimer’s, mad cow, Huntington’s . . .

• How do proteins fold?

Page 3

Molecular dynamics

• Represent atoms of molecule and solvent

• Model forces on atoms

• Integrate laws of motion

• Small integration time step compared to motion timescales
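The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the simulation code behind the talk: it uses the velocity Verlet integrator (a common choice, assumed here) on a hypothetical one-particle harmonic system, and the names `velocity_verlet` and `harmonic_force` are invented for the example.

```python
import numpy as np

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the velocity Verlet scheme."""
    x = np.asarray(x, dtype=float).copy()
    v = np.asarray(v, dtype=float).copy()
    f = force(x)
    traj = [x.copy()]
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass   # half-step velocity update
        x += dt * v                # full-step position update
        f = force(x)               # forces at the new positions
        v += 0.5 * dt * f / mass   # second half-step velocity update
        traj.append(x.copy())
    return np.array(traj), v

# Toy force model (assumption): a single particle on a spring, F = -k x.
k = 1.0

def harmonic_force(x):
    return -k * x

# The time step (0.01) is small compared to the oscillation period (2*pi).
traj, v_final = velocity_verlet(x=[1.0], v=[0.0], force=harmonic_force,
                                mass=1.0, dt=0.01, n_steps=1000)
energy = 0.5 * v_final[0] ** 2 + 0.5 * k * traj[-1, 0] ** 2
print("final position:", traj[-1, 0], "total energy:", energy)
```

With a time step that is small relative to the fastest motion, the total energy stays close to its initial value; a larger step would make the integration drift or blow up, which is why the step must be small compared to motion timescales.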

Page 4

Folding@Home: Distributed computing for biomolecular simulation

• Perform multiple simulations in parallel

• Total simulation times – hundreds of microseconds (hundreds of CPU-years)

Very powerful computational resource
– ~200 Teraflops sustained performance
– >1,000,000 total CPUs; 200,000 active

Page 5

Challenge: How to analyze?

• Enormous datasets
– Describe dynamics in microscopic detail
• Questions we want to answer
– Rate of folding, mechanism of folding . . .

• How can we extract these properties from our data?

Page 6

Outline

• Markovian state model for molecular motion
– Model description, uses, examples
• New algorithms for building these models
– Defining states and transition probabilities
• New methods for dealing with finite sampling
– Model complexity, uncertainty analysis, targeted sampling

Page 7

Chemical intuition

Chemical reactions often exhibit stochastic behavior

n-butane

Chandler, Journal of Chemical Physics (1977)

Page 8

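Markovian state model

Define states in the conformation space

Define transition probabilities, or edges, between states

[Figure: graph of five states, labeled 1 to 5, connected by edges]

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}$$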

Page 9

Uses of the model

• Populations of states over time

• Eigenvalues and eigenvectors – conformational changes

• Kinetic properties – virtually any kinetic property

• Mechanistic properties – most likely path, probability of transitions as graph algorithms
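The first two uses can be illustrated with a small example. This is a sketch under assumptions: the 3-state matrix `P` is hypothetical, populations are treated as a row vector updated by p(t+1) = p(t) P, and the implied-timescale relation t_i = -1 / ln(lambda_i) (in units of the lag time) is the standard one for Markov models rather than something stated on the slide.

```python
import numpy as np

# Hypothetical 3-state row-stochastic transition matrix (each row sums to 1).
P = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])

def propagate(p0, P, n_steps):
    """Return the state populations after each step, using p(t+1) = p(t) P."""
    pops = [np.asarray(p0, dtype=float)]
    for _ in range(n_steps):
        pops.append(pops[-1] @ P)
    return np.array(pops)

# Start with all population in state 0 and watch it relax.
pops = propagate([1.0, 0.0, 0.0], P, n_steps=500)

# Left eigenvectors of P are the (right) eigenvectors of P transposed.
evals, evecs = np.linalg.eig(P.T)
order = np.argsort(-evals.real)
evals = evals.real[order]

# The eigenvector with eigenvalue 1 is the stationary distribution.
stationary = evecs[:, order[0]].real
stationary /= stationary.sum()

# Slower processes have eigenvalues closer to 1 and longer timescales.
timescales = -1.0 / np.log(evals[1:])

print("long-time populations:", pops[-1])
print("stationary distribution:", stationary)
print("eigenvalues:", evals, "implied timescales:", timescales)
```

The long-time populations and the leading left eigenvector agree, and the remaining eigenvectors describe which states exchange population on each timescale.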

Chodera et al., Multiscale Modeling and Simulation (2006)

[Figure: state populations p plotted against time t]

Page 10

Example models

Chodera et al., Multiscale Modeling and Simulation (2006)

Kasson et al., PNAS (2006)

lipid vesicle fusion

alanine peptide

Sorin and Pande, Biophysical Journal (2005)

Jayachandran et al., Journal of Structural Biology (2006)

villin headpiece

alpha helix

Page 11

Computational and statistical challenges

• Building Markovian state model
– Defining states that are Markovian
– Calculating the transition probabilities

• Refining Markovian state model
– Finding the best model
– Determining model uncertainty
– Designing new simulations

[Figure: graph of five states, labeled 1 to 5, next to the transition probability matrix with entries p11, p12, ..., p1N; p21, p22, ...; pN1, ..., pNN]

Page 12

Automatic state decomposition

• Challenge: Find appropriate states
• Using individual conformations as states does not scale
• Group conformations into discrete states
• Structural clustering is insufficient
• Basic algorithm – combine structural and kinetic similarity

J. D. Chodera*, N. Singhal*, V. S. Pande, K. A. Dill, and W. C. Swope. Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics. Journal of Chemical Physics, 126, 155101 (2007). (*These authors contributed equally to this work)


Page 13

Comparison of structural and kinetic clustering

[Figure: structural clustering compared with kinetic clustering for trpzip2]

trpzip2: Cochran et al., PNAS 98:5578, 2001.

Page 14

State decomposition – splitting

Cluster conformations by root mean square distance (RMSD)

Page 15

State decomposition – lumping

Group states that interconvert quickly
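A toy version of the lumping step is sketched below. It is an assumption-laden stand-in, not the algorithm from the cited paper: it greedily merges the pair of states with the largest two-way transition probability until a target number of states remains. The function names and the 4-state count matrix are hypothetical, and every state is assumed to have at least one observed transition.

```python
import numpy as np

def lump_counts(counts, labels):
    """Sum a microstate count matrix into blocks defined by macrostate labels."""
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    n_macro = labels.max() + 1
    # Membership matrix: M[a, i] = 1 if microstate i belongs to macrostate a.
    M = np.zeros((n_macro, counts.shape[0]))
    M[labels, np.arange(counts.shape[0])] = 1.0
    return M @ counts @ M.T

def greedy_lump(counts, n_macro):
    """Merge the fastest-interconverting pair until n_macro states remain."""
    counts = np.asarray(counts, dtype=float)
    labels = np.arange(counts.shape[0])
    while labels.max() + 1 > n_macro:
        C = lump_counts(counts, labels)
        P = C / C.sum(axis=1, keepdims=True)
        rate = np.minimum(P, P.T)        # exchange must be fast in both directions
        np.fill_diagonal(rate, -1.0)     # never merge a state with itself
        a, b = np.unravel_index(np.argmax(rate), rate.shape)
        a, b = min(a, b), max(a, b)
        labels[labels == b] = a          # merge state b into state a
        labels[labels > b] -= 1          # keep labels contiguous
    return labels

# Hypothetical counts: states 0/1 and states 2/3 exchange quickly,
# while transitions between the two pairs are rare.
counts = np.array([[50, 45, 5, 0],
                   [45, 50, 0, 5],
                   [5, 0, 50, 45],
                   [0, 5, 45, 50]])
labels = greedy_lump(counts, n_macro=2)
lumped = lump_counts(counts, labels)
print(labels)
print(lumped)
```

After lumping, most counts sit on the diagonal of the lumped matrix, which is the signature of long-lived (metastable) states.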

Page 16

State decomposition – resplitting

Cluster conformations, restricted to each state

Page 17

Blocked alanine peptide

[Figure: conformation plot with axis ticks at -60 and 60 on both axes, divided into six states labeled 1 to 6]

Chodera et al., Multiscale Modeling and Simulation (2006)

Page 18

Automatic state decomposition of alanine peptide

Black state sits on top of multiple other states!

Benefit of automatic algorithm

These conformations had an unusual peptide bond

Page 19

Stability of decomposition

Page 20

TrpZip peptide

Page 21

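Transition probabilities

N. Singhal, C. D. Snow, and V. S. Pande. Using path sampling to build better Markovian state models: Predicting the folding rate and mechanism of a trp zipper beta hairpin. Journal of Chemical Physics, 121(1), 415-425 (2004).

[Figure: graph of five states, labeled 1 to 5]

Discretize trajectories into series of states

1223435

Count number of transitions between all pairs of states

transition counts

$$Z = \begin{pmatrix} z_{11} & z_{12} & \cdots & z_{1N} \\ z_{21} & z_{22} & \cdots & z_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ z_{N1} & z_{N2} & \cdots & z_{NN} \end{pmatrix}$$

normalize

transition probabilities

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}$$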
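The counting and normalizing steps can be sketched with the example trajectory 1223435 from this slide. The sketch assumes a lag of one step and maps the slide's states 1 to 5 onto array indices 0 to 4; the function names are invented for illustration.

```python
import numpy as np

def count_transitions(states, n_states, lag=1):
    """Count observed transitions i -> j separated by `lag` steps."""
    z = np.zeros((n_states, n_states))
    for i, j in zip(states[:-lag], states[lag:]):
        z[i, j] += 1
    return z

def normalize_rows(z):
    """Turn counts into probabilities: p_ij = z_ij / sum_k z_ik."""
    row_sums = z.sum(axis=1, keepdims=True)
    # Rows with no observed transitions are left as zeros.
    return np.divide(z, row_sums, out=np.zeros_like(z), where=row_sums > 0)

# The slide's discretized trajectory, shifted to 0-based state indices.
traj = [int(c) - 1 for c in "1223435"]
z = count_transitions(traj, n_states=5)
p = normalize_rows(z)
print(z)
print(p)
```

State 5 is only visited at the very end of this short trajectory, so it has no outgoing counts and its row of probabilities cannot be estimated; this is a small-scale version of the finite-sampling problem discussed later in the talk.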


Page 22

Model selection

• Challenge: How many states should we have?

– More states are more Markovian

– More states have more parameters

• How do we evaluate this tradeoff?

N. S. Hinrichs and V. S. Pande. Bayesian metrics for validating and improving Markovian state models for molecular dynamics simulations. (In preparation)


Page 23

Hidden Markov Model formulation

• Formulate the problem as a Hidden Markov Model structure scoring question

• Different discretizations of continuous space

• Benefits of Bayesian scores
– Naturally handles the tradeoff between complexity of model and amount of data
– Avoids over-fitting of parameters

[Figure: Hidden Markov Model with hidden states and observations]

Page 24

Alanine peptide results

Score of Hidden Markov models for different lag times

Last model is worse at shorter times but preferred at longer times

No previous evaluation methods could distinguish these models

Page 25

Uncertainty analysis

Goal: Once we have the states, what is the uncertainty in the model?

Both are reasonable but give different transition probabilities

Different MFPT, Pfold, eigenvalues, eigenvectors ...

N. Singhal and V. S. Pande. Error analysis and efficient sampling in Markovian state models for protein folding. Journal of Chemical Physics, 123, 204909-204921 (2005).

N. S. Hinrichs and V. S. Pande. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. Journal of Chemical Physics, 126, 244101 (2007).

[Figure: two five-state models, each with states labeled 1 to 5]

Uncertainty caused by finite sampling


Page 26

Transition probabilities

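Recall that we calculate transition probabilities by counting:

$$p_{ij} = \frac{z_{ij}}{\sum_k z_{ik}}$$

Instead of getting a single value, we can talk about the distribution of transition probabilities

Bayes’ Rule:

$$P(p_{i*} \mid \text{counts}) \propto P(\text{counts} \mid p_{i*}) \, P(p_{i*})$$

[Figure: state i with 70 and 30 observed transitions to states j and k, compared with 700 and 300 observed transitions; sketch of the resulting distributions of p_ij]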
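The 70/30 versus 700/300 comparison can be made concrete. The sketch assumes a symmetric Dirichlet prior on each row p_i* (the slide does not state which prior was used), in which case the posterior is again Dirichlet with the counts added to the prior parameters. `dirichlet_posterior` is a name invented for this example.

```python
import numpy as np

def dirichlet_posterior(counts, prior=1.0):
    """Posterior mean and variance of one row p_i* given its transition counts.

    Assumes a symmetric Dirichlet(prior) prior, so the posterior is
    Dirichlet(counts + prior).
    """
    alpha = np.asarray(counts, dtype=float) + prior
    a0 = alpha.sum()
    mean = alpha / a0
    var = alpha * (a0 - alpha) / (a0 ** 2 * (a0 + 1.0))
    return mean, var

mean_small, var_small = dirichlet_posterior([70, 30])
mean_large, var_large = dirichlet_posterior([700, 300])

print("70/30:   mean", mean_small[0], "std", np.sqrt(var_small[0]))
print("700/300: mean", mean_large[0], "std", np.sqrt(var_large[0]))
```

Both count sets give nearly the same expected transition probability, but ten times more data shrinks the variance by roughly a factor of ten, so the distribution of p_ij is much narrower.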

Page 27

Sampling approach

Possible solution to get distribution of eigenvalues:

[Figure: repeatedly sample a transition matrix [pij], then solve for its eigenvalue]

Problem:
• sampling can be expensive
• solving per sample can be expensive
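A minimal sketch of the sampling approach follows. It assumes each row of the transition matrix has an independent Dirichlet posterior with a symmetric prior (an assumption, as is the hypothetical 2-state count matrix), draws a matrix, and solves for the second-largest eigenvalue each time.

```python
import numpy as np

def sample_eigenvalues(counts, n_samples=2000, prior=1.0, which=1, seed=0):
    """Sample transition matrices and return one eigenvalue per sample.

    `which` selects the eigenvalue after sorting by real part in descending
    order (0 is the stationary eigenvalue 1, 1 is the next largest).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    out = np.empty(n_samples)
    for s in range(n_samples):
        # Draw each row independently from its Dirichlet posterior.
        P = np.vstack([rng.dirichlet(row + prior) for row in counts])
        # Solve the eigenvalue problem for this sample (real part only).
        evals = np.sort(np.linalg.eigvals(P).real)[::-1]
        out[s] = evals[which]
    return out

# Hypothetical transition counts for a 2-state model.
counts = np.array([[90, 10],
                   [20, 80]])
lam2 = sample_eigenvalues(counts)
print("mean:", lam2.mean(), "std:", lam2.std())
```

Every sample costs one set of Dirichlet draws plus one eigenvalue solve, which is the expense the slide points to: the cost grows with both the number of samples and the number of states.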

Page 28

Closed-form solution

Idea: trade exact distribution for efficient approximation

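Eigenvalue equation:

$$\det(P - \lambda I) = 0$$

Multivariate normal approximation of p_i*

Taylor series expansion:

$$\lambda \approx \bar{\lambda} + \frac{\partial \lambda}{\partial p_{11}} \Delta p_{11} + \frac{\partial \lambda}{\partial p_{12}} \Delta p_{12} + \cdots + \frac{\partial \lambda}{\partial p_{NN}} \Delta p_{NN}$$

where Δp_ij is the deviation of p_ij from its expected value and the partial derivatives are efficient to calculate using adjoint systems

Closed-form normal distribution for λ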
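The closed-form idea can be sketched as a first-order (delta-method) calculation. Several choices here are assumptions rather than the talk's implementation: the derivatives are obtained from left and right eigenvectors (one standard route to eigenvalue sensitivities; the slide's adjoint-system calculation is not shown in the transcript), each row is given a Dirichlet posterior with a symmetric prior, and the count matrix is hypothetical.

```python
import numpy as np

def eigenvalue_sensitivity(P, which=1):
    """Return (lam, S) with S[i, j] = d lam / d p_ij for a simple real eigenvalue."""
    evals, right = np.linalg.eig(P)
    order = np.argsort(-evals.real)
    lam = evals[order[which]].real
    v = right[:, order[which]].real            # right eigenvector: P v = lam v
    evals_l, left = np.linalg.eig(P.T)
    u = left[:, np.argmin(np.abs(evals_l - lam))].real  # left eigenvector
    return lam, np.outer(u, v) / (u @ v)

def dirichlet_cov(alpha):
    """Covariance matrix of a Dirichlet(alpha) distribution."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    return (np.diag(alpha) * a0 - np.outer(alpha, alpha)) / (a0 ** 2 * (a0 + 1.0))

def eigenvalue_variance(counts, prior=1.0, which=1):
    """First-order variance of an eigenvalue, as a sum of per-state terms."""
    alpha = np.asarray(counts, dtype=float) + prior
    P_mean = alpha / alpha.sum(axis=1, keepdims=True)
    lam, S = eigenvalue_sensitivity(P_mean, which)
    per_state = np.array([S[i] @ dirichlet_cov(alpha[i]) @ S[i]
                          for i in range(len(alpha))])
    return lam, per_state.sum(), per_state

# Hypothetical transition counts for a 2-state model.
lam, var, per_state = eigenvalue_variance([[90, 10], [20, 80]])
print("eigenvalue:", lam, "variance:", var, "per-state terms:", per_state)
```

Only one eigenvalue problem is solved, at the expected transition matrix, instead of one per sample. The result is a mean and a variance for λ, which together define the normal approximation.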

Page 29

Uncertainty results

5000 trajectories from each state

Alanine System Transition Counts

4926 62 0 0 5 7
18 4978 4 0 0 0
0 3 4823 158 13 3
0 0 226 4604 1 169
0 0 0 1 4788 211
002151534380

[Figure: six alanine peptide states labeled 1 to 6]

Running times (6 states)
• Sampling-based: 40 seconds
• Closed-form: < 0.01 seconds

Running times (87 states)
• Sampling-based: 3600 seconds
• Closed-form: < 0.07 seconds

Page 30

Sampling strategies

Problem: Simulations are expensive. Even with Folding@Home, we run simulations for months

How to intelligently allocate our resources?

Common approaches:
• equilibrium sampling – sample each conformation from its equilibrium distribution
• even sampling – sample equally from each state

New sequential approaches

N. Singhal and V. S. Pande. Error analysis and efficient sampling in Markovian state models for protein folding. Journal of Chemical Physics, 123, 204909-204921 (2005).

N. S. Hinrichs and V. S. Pande. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. Journal of Chemical Physics, 126, 244101 (2007).


Page 31

Adaptive sampling

Goal: Reduce uncertainty of eigenvalue

Uncertainty analysis decomposes by transitions from each state

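$$\mathrm{Var}(\lambda) \approx \sum_{i=1}^{N} \left( \frac{\partial \lambda}{\partial p_{i*}} \right)^{T} \mathrm{Cov}(p_{i*}) \left( \frac{\partial \lambda}{\partial p_{i*}} \right), \qquad \frac{\partial \lambda}{\partial p_{i*}} = \left( \frac{\partial \lambda}{\partial p_{i1}}, \ldots, \frac{\partial \lambda}{\partial p_{iN}} \right)^{T}$$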

Variance depends on both uncertainty of and sensitivity to transition probabilities
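One way to turn this decomposition into a sampling rule is sketched below. It is a simplified, assumed criterion rather than the procedure from the cited papers: if state i's mean transition probabilities and sensitivities are held fixed, its Dirichlet covariance (and so its variance contribution) scales like 1 / (w_i + 1), where w_i is the row's Dirichlet total (counts plus prior). The function name and the numbers are hypothetical.

```python
import numpy as np

def choose_next_state(per_state_var, row_totals):
    """Pick the state whose next trajectory should reduce Var(lambda) most.

    Assumption: with means and sensitivities held fixed, adding one sample
    to state i rescales its contribution c_i by (w_i + 1) / (w_i + 2).
    """
    c = np.asarray(per_state_var, dtype=float)
    w = np.asarray(row_totals, dtype=float)
    expected_after = c * (w + 1.0) / (w + 2.0)
    reduction = c - expected_after
    return int(np.argmax(reduction)), reduction

# Equal sampling so far: the state contributing more variance is chosen.
state, reduction = choose_next_state([9.3e-4, 1.59e-3], [102, 102])
print("sample next from state", state)

# Equal contributions: the poorly sampled state is chosen.
state2, _ = choose_next_state([1e-3, 1e-3], [10, 1000])
print("sample next from state", state2)
```

The rule favors states that are both influential (large sensitivity) and poorly sampled (large uncertainty). After each batch of new trajectories, the per-state terms would be recomputed before choosing again.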

Page 32

Adaptive sampling – alanine

On 6-state alanine system, select trajectories randomly for 3 sampling strategies

Transition Counts

4926 62 0 0 5 7
18 4978 4 0 0 0
0 3 4823 158 13 3
0 0 226 4604 1 169
0 0 0 1 4788 211
002151534380

Page 33

Adaptive sampling – villin

• Benefits
– Very quickly reduce the variance
– Reduce the total number of simulations
– Need less computational power
– Can study more complex systems

Villin Headpiece

Jayachandran, et al., Journal of Chemical Physics (2006)

2454 states

Page 34

Summary

• Markovian state models are convenient methods to describe molecular motion

• Automatic state decomposition
– Scalable to large size systems

• Model selection
– Evaluate tradeoff between model complexity and amount of data

• Uncertainty analysis
– Efficient and decomposable

• Adaptive sampling
– Reduce number of simulations

Page 35

Acknowledgements

• Vijay Pande – Stanford University adviser

• Bill Swope, Jed Pitera – IBM collaborators

• John Chodera – state decomposition work