Modeling molecular dynamics from simulations
Nina Singhal Hinrichs
Departments of Computer Science and Statistics
University of Chicago
January 28, 2009
Motivation
• Proteins are essential parts of living organisms
– enzymes, cell signaling, membrane transport . . .
• Composed of a chain of amino acids
• Fold to a unique 3-dimensional structure
• Misfolding can cause diseases
– Alzheimer’s, Mad cow, Huntington’s . . .
• How do proteins fold?
Molecular dynamics
• Represent atoms of molecule and solvent
• Model forces on atoms
• Integrate laws of motion
• Small integration time step compared to motion timescales
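A minimal sketch of the integration step, assuming the velocity Verlet scheme (a common choice; the slides do not name the integrator) and a toy one-particle harmonic system. The function name `velocity_verlet` and all parameters are illustrative.

```python
import numpy as np

def velocity_verlet(x0, v0, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the velocity Verlet scheme."""
    x = np.array(x0, dtype=float)
    v = np.array(v0, dtype=float)
    f = force(x)
    positions = [x.copy()]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new positions
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        positions.append(x.copy())
    return np.array(positions), v

# Toy system (hypothetical): one particle on a harmonic spring, F = -k x.
k, mass = 1.0, 1.0
period = 2.0 * np.pi * np.sqrt(mass / k)
dt = period / 1000.0                       # time step far below the motion timescale
traj, v_final = velocity_verlet([1.0], [0.0], lambda x: -k * x, mass, dt, 1000)
```

With the time step set to a thousandth of the oscillation period, the particle returns close to its starting position after one period; a much larger step would drift or become unstable.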
Folding@Home: Distributed computing for biomolecular simulation
• Perform multiple simulations in parallel
• Total simulation times – hundreds of microseconds (hundreds of CPU-years)
• Very powerful computational resource
– ~200 Teraflops sustained performance
– >1,000,000 total CPUs; 200,000 active
Challenge: How to analyze?
• Enormous datasets
– Describe dynamics in microscopic detail
• Questions we want to answer
– Rate of folding, mechanism of folding . . .
• How can we extract these properties from our data?
Outline
• Markovian state model for molecular motion
– Model description, uses, examples
• New algorithms for building these models
– Defining states and transition probabilities
• New methods for dealing with finite sampling
– Model complexity, uncertainty analysis, targeted sampling
Chemical intuition
Chemical reactions often exhibit stochastic behavior
n-butane
Chandler, Journal of Chemical Physics (1977)
[Figure: diagram of states numbered 1 to 5]
Markovian state model
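Define states in the conformation space
Define transition probabilities, or edges, between states
$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & & \vdots \\ \vdots & & \ddots & \\ p_{N1} & \cdots & & p_{NN} \end{pmatrix}$$
[Figure: diagram of states numbered 1 to 5]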
Uses of the model
• Populations of states over time
• Eigenvalues and eigenvectors – conformational changes
• Kinetic properties – virtually any kinetic property
• Mechanistic properties – most likely path, probability of transitions as graph algorithms
Chodera et al., Multiscale Modeling and Simulation (2006)
[Figure: state populations p plotted against time t]
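A small sketch of the first two uses, assuming a hypothetical 3-state row-stochastic matrix `P` and an arbitrary lag time; none of these numbers come from the talk. Populations are propagated as p(t + lag) = p(t) P, and each non-unit eigenvalue is converted to a relaxation timescale.

```python
import numpy as np

# Hypothetical 3-state row-stochastic transition matrix for one lag time.
P = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])
lag_time = 1.0  # assumed lag time, arbitrary units

def propagate(p0, P, n_steps):
    """Return state populations after 0..n_steps applications of P."""
    pops = [np.asarray(p0, dtype=float)]
    for _ in range(n_steps):
        pops.append(pops[-1] @ P)   # p(t + lag) = p(t) P
    return np.array(pops)

# Start with all population in the first state and watch it relax.
pops = propagate([1.0, 0.0, 0.0], P, 200)

# The largest eigenvalue is 1 (stationary distribution); the others set
# relaxation timescales t_k = -lag_time / ln(lambda_k).
eigvals = np.sort(np.linalg.eigvals(P).real)[::-1]
timescales = -lag_time / np.log(eigvals[1:])
```

The eigenvectors belonging to the slow eigenvalues indicate which states exchange population in each conformational change.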
Example models
• alanine peptide – Chodera et al., Multiscale Modeling and Simulation (2006)
• lipid vesicle fusion – Kasson et al., PNAS (2006)
• alpha helix – Sorin and Pande, Biophysical Journal (2005)
• villin headpiece – Jayachandran et al., Journal of Structural Biology (2006)
• Building Markovian state model
– Defining states that are Markovian
– Calculating the transition probabilities
• Refining Markovian state model
– Finding the best model
– Determining model uncertainty
– Designing new simulations
Computational and statistical challenges
[Figure: five-state diagram (states 1 to 5) and its transition matrix with entries p11, p12, ..., p1N; p21, p22, ...; pN1, ..., pNN]
• Challenge: Find appropriate states
• Using individual conformations as states does not scale
• Group conformations into discrete states
• Structural clustering is insufficient
• Basic algorithm – combine structural and kinetic similarity
Automatic state decomposition
J. D. Chodera*, N. Singhal*, V. S. Pande, K. A. Dill, and W. C. Swope. Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics. Journal of Chemical Physics, 126, 155101 (2007). (*These authors contributed equally to this work)
Comparison of structural and kinetic clustering
[Figure: trpzip2 conformations under structural clustering and under kinetic clustering]
trpzip2: Cochran et al., PNAS 98:5578 (2001)
State decomposition – splitting
Cluster conformations by root mean square distance (RMSD)
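A minimal sketch of the splitting step, assuming greedy k-centers clustering and conformations that are already superimposed. The slides only say that conformations are clustered by RMSD, so the clustering algorithm, the names `rmsd` and `k_centers`, and the toy data are all illustrative assumptions.

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two conformations given as (n_atoms, 3) arrays.
    Assumes the conformations are already superimposed."""
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))

def k_centers(confs, n_clusters):
    """Greedy k-centers: each new center is the conformation farthest
    from all centers chosen so far."""
    centers = [0]                                    # start from the first conformation
    dist = np.array([rmsd(c, confs[0]) for c in confs])
    labels = np.zeros(len(confs), dtype=int)
    for k in range(1, n_clusters):
        new_center = int(np.argmax(dist))
        centers.append(new_center)
        d_new = np.array([rmsd(c, confs[new_center]) for c in confs])
        closer = d_new < dist                        # points nearer to the new center
        labels[closer] = k
        dist = np.minimum(dist, d_new)
    return centers, labels

# Hypothetical data: 3-atom "conformations" scattered around two distinct shapes.
rng = np.random.default_rng(0)
shape_a = np.zeros((3, 3))
shape_b = np.full((3, 3), 5.0)
confs = np.array([shape_a + 0.1 * rng.standard_normal((3, 3)) for _ in range(20)] +
                 [shape_b + 0.1 * rng.standard_normal((3, 3)) for _ in range(20)])
centers, labels = k_centers(confs, 2)
```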
State decomposition – lumping
Group states that interconvert quickly
State decomposition – resplitting
Cluster conformations, restricted to each state
Blocked alanine peptide
[Figure: blocked alanine peptide map with axis ticks at -60 and 60, divided into states 1 to 6]
Chodera et al., Multiscale Modeling and Simulation (2006)
Automatic state decomposition of alanine peptide
Black state sits on top of multiple other states!
Benefit of automatic algorithm
These conformations had an unusual peptide bond
Stability of decomposition
TrpZip peptide
N. Singhal, C. D. Snow, and V. S. Pande. Using path sampling to build better Markovian state models: Predicting the folding rate and mechanism of a trp zipper beta hairpin. Journal of Chemical Physics, 121(1), 415-425 (2004).
Transition probabilities
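[Figure: trajectory through a diagram of states numbered 1 to 5]
Discretize trajectories into series of states, e.g. 1 2 2 3 4 3 5
Count number of transitions between all pairs of states (transition counts)
$$Z = \begin{pmatrix} z_{11} & z_{12} & \cdots & z_{1N} \\ z_{21} & z_{22} & & \vdots \\ \vdots & & \ddots & \\ z_{N1} & \cdots & & z_{NN} \end{pmatrix}$$
Normalize each row to obtain the transition probabilities
$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & & \vdots \\ \vdots & & \ddots & \\ p_{N1} & \cdots & & p_{NN} \end{pmatrix}$$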
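A short sketch of the counting and normalizing steps, reading the slide's example sequence 1223435 as the state series 1, 2, 2, 3, 4, 3, 5 (an assumed reading). The function names and the one-step lag are illustrative choices.

```python
import numpy as np

def count_transitions(trajectories, n_states, lag=1):
    """Count transitions i -> j between states `lag` steps apart."""
    Z = np.zeros((n_states, n_states))
    for traj in trajectories:
        for a, b in zip(traj[:-lag], traj[lag:]):
            Z[a, b] += 1
    return Z

def row_normalize(Z):
    """p_ij = z_ij / sum_k z_ik; rows with no counts are left as zeros."""
    totals = Z.sum(axis=1, keepdims=True)
    return np.divide(Z, totals, out=np.zeros_like(Z), where=totals > 0)

# State series 1 2 2 3 4 3 5, shifted to 0-based indices.
traj = [s - 1 for s in [1, 2, 2, 3, 4, 3, 5]]
Z = count_transitions([traj], n_states=5)
P = row_normalize(Z)
```

State 5 is only ever entered in this single short trajectory, so its row has no counts; real models pool many trajectories so that every state has outgoing transitions.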
Model selection
• Challenge: How many states should we have?
– More states are more Markovian
– More states have more parameters
• How do we evaluate this tradeoff?
N. S. Hinrichs and V. S. Pande. Bayesian metrics for validating and improving Markovian state models for molecular dynamics simulations. (In preparation)
Hidden Markov Model formulation
• Formulate the problem as a Hidden Markov Model structure scoring question
• Different discretizations of continuous space
• Benefits of Bayesian scores
– Naturally handles tradeoff between complexity of model and amount of data
– Avoids over-fitting of parameters
[Figure: hidden Markov model diagram with rows labeled States and Observations]
Alanine peptide results
Score of Hidden Markov models for different lag times
Last model is worse at shorter times but preferred at longer times
No previous evaluation methods could distinguish these models
Uncertainty analysis
Goal: Once we have the states, what is the uncertainty in the model?
Both are reasonable but give different transition probabilities
Different MFPT, Pfold, eigenvalues, eigenvectors ...
N. Singhal and V. S. Pande. Error analysis and efficient sampling in Markovian state models for protein folding. Journal of Chemical Physics, 123, 204909-204921 (2005).
N. S. Hinrichs and V. S. Pande. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. Journal of Chemical Physics, 126, 244101 (2007).
[Figure: two versions of the five-state model (states 1 to 5)]
Uncertainty caused by finite sampling
Transition probabilities
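Recall that we calculate transition probabilities by counting:
$$p_{ij} = \frac{z_{ij}}{\sum_k z_{ik}}$$
Instead of getting a single value, we can talk about the distribution of transition probabilities
Bayes’ Rule:
$$P(p_{i*} \mid \text{counts}) \propto P(\text{counts} \mid p_{i*}) \, P(p_{i*})$$
[Figure: distribution of pij for a state i with 70 and 30 observed transitions to its neighbors j and k, compared with 700 and 300 observed transitions]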
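A sketch of how the two count sets in the figure differ in uncertainty, assuming a multinomial likelihood with a uniform Dirichlet prior, so that the posterior over a row of transition probabilities is Dirichlet(counts + 1). The prior and the function name are assumptions, not taken from the slides.

```python
import numpy as np

def posterior_mean_and_std(counts, prior=1.0):
    """Mean and standard deviation of each p_ij under a Dirichlet posterior."""
    alpha = np.asarray(counts, dtype=float) + prior   # posterior parameters
    total = alpha.sum()
    mean = alpha / total
    var = mean * (1.0 - mean) / (total + 1.0)         # Dirichlet marginal variance
    return mean, np.sqrt(var)

mean_small, std_small = posterior_mean_and_std([70, 30])
mean_large, std_large = posterior_mean_and_std([700, 300])
```

Both count sets give nearly the same mean probability, but ten times the data narrows the distribution by roughly a factor of three (about the square root of ten).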
Sampling approach
Possible solution to get distribution of eigenvalues:
• Sample transition matrices [pij] and solve for the eigenvalue of each sample
Problem:
• sampling can be expensive
• solving per sample can be expensive
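A minimal sketch of the sampling approach, assuming each row of the transition matrix is drawn independently from a Dirichlet posterior with a uniform prior, and tracking the second-largest eigenvalue. The count matrix `Z` is hypothetical.

```python
import numpy as np

def sample_second_eigenvalue(Z, n_samples=1000, prior=1.0, seed=0):
    """Draw transition matrices from the posterior and solve each for
    its second-largest eigenvalue (by magnitude)."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    samples = np.empty(n_samples)
    for s in range(n_samples):
        # Each row of P is drawn independently from its Dirichlet posterior.
        P = np.vstack([rng.dirichlet(row + prior) for row in Z])
        eigvals = np.linalg.eigvals(P)                # one eigen-solve per sample
        order = np.argsort(-np.abs(eigvals))          # index 0 is the eigenvalue 1
        samples[s] = eigvals[order[1]].real           # real part kept if complex
    return samples

# Hypothetical 3-state transition counts.
Z = np.array([[90, 10, 0],
              [5, 90, 5],
              [0, 10, 90]])
samples = sample_second_eigenvalue(Z, n_samples=500)
```

Every sample requires a full eigenvalue solve, which is the cost the slide points to.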
Closed-form solution
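Idea: trade exact distribution for efficient approximation
Eigenvalue equation:
$$\det(A) = 0, \qquad A = P - \lambda I$$
Taylor series expansion:
$$\lambda \approx \bar{\lambda} + \left.\frac{\partial \lambda}{\partial p_{11}}\right|_{\bar{A}} \Delta p_{11} + \left.\frac{\partial \lambda}{\partial p_{12}}\right|_{\bar{A}} \Delta p_{12} + \cdots + \left.\frac{\partial \lambda}{\partial p_{NN}}\right|_{\bar{A}} \Delta p_{NN}$$
• efficient to calculate using adjoint systems
Multivariate normal approximation of $p_{i*}$
Closed-form normal distribution for $\lambda$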
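A sketch of a first-order variance estimate for one eigenvalue. It swaps in the standard left/right eigenvector sensitivity formula (d lambda / d p_ij = u_i v_j with u . v = 1) in place of the adjoint-system calculation named on the slide, and assumes independent Dirichlet posteriors per row with a uniform prior. The function name and the count matrix are hypothetical.

```python
import numpy as np

def eigenvalue_variance(Z, prior=1.0, index=1):
    """First-order (delta-method) variance of one eigenvalue of P.

    Returns the eigenvalue at the posterior-mean matrix, its approximate
    variance, and the contribution of each row (state) to that variance.
    """
    alpha = np.asarray(Z, dtype=float) + prior
    totals = alpha.sum(axis=1)
    P_mean = alpha / totals[:, None]

    # Right eigenvectors are columns of V; left eigenvectors are rows of inv(V).
    eigvals, V = np.linalg.eig(P_mean)
    order = np.argsort(-np.abs(eigvals))
    k = order[index]
    U = np.linalg.inv(V)

    # Sensitivity d(lambda_k)/d(p_ij) = u_i v_j, since U V = I gives u . v = 1.
    sens = np.real(np.outer(U[k, :], V[:, k]))

    # Row i contributes s_i^T Cov_i s_i with the Dirichlet covariance
    # Cov_i = (diag(m) - m m^T) / (total_i + 1), m = mean of row i.
    mean_s = np.sum(P_mean * sens, axis=1)
    mean_s2 = np.sum(P_mean * sens ** 2, axis=1)
    row_var = (mean_s2 - mean_s ** 2) / (totals + 1.0)
    return eigvals[k].real, row_var.sum(), row_var

# Hypothetical 3-state transition counts.
Z = np.array([[90, 10, 0],
              [5, 90, 5],
              [0, 10, 90]])
lam, var, rows = eigenvalue_variance(Z)
```

No sampling loop is needed: one eigen-decomposition gives the mean, the variance, and a per-state breakdown of that variance.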
Uncertainty results
Alanine System Transition Counts (5000 trajectories from each state; row = starting state, column = ending state)
4926 62 0 0 5 7
18 4978 4 0 0 0
0 3 4823 158 13 3
0 0 226 4604 1 169
0 0 0 1 4788 211
002151534380
[Figure: six-state decomposition of the alanine system, states 1 to 6]
Running times (6 states)
• Sampling-based: 40 seconds
• Closed-form: < 0.01 seconds
Running times (87 states)
• Sampling-based: 3600 seconds
• Closed-form: < 0.07 seconds
Sampling strategies
Problem: Simulations are expensive. Even with Folding@Home, we run simulations for months
How to intelligently allocate our resources?
Common approaches:
• equilibrium sampling – sample each conformation from its equilibrium distribution
• even sampling – sample equally from each state
New sequential approaches
N. Singhal and V. S. Pande. Error analysis and efficient sampling in Markovian state models for protein folding. Journal of Chemical Physics, 123, 204909-204921 (2005).
N. S. Hinrichs and V. S. Pande. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. Journal of Chemical Physics, 126, 244101 (2007).
Adaptive sampling
Goal: Reduce uncertainty of eigenvalue
Uncertainty analysis decomposes by transitions from each state
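$$\lambda \approx \bar{\lambda} + \left( \left.\frac{\partial \lambda}{\partial p_{11}}\right|_{\bar{A}} \Delta p_{11} + \cdots + \left.\frac{\partial \lambda}{\partial p_{1N}}\right|_{\bar{A}} \Delta p_{1N} \right) + \left( \left.\frac{\partial \lambda}{\partial p_{21}}\right|_{\bar{A}} \Delta p_{21} + \cdots + \left.\frac{\partial \lambda}{\partial p_{2N}}\right|_{\bar{A}} \Delta p_{2N} \right) + \cdots + \left( \left.\frac{\partial \lambda}{\partial p_{N1}}\right|_{\bar{A}} \Delta p_{N1} + \cdots + \left.\frac{\partial \lambda}{\partial p_{NN}}\right|_{\bar{A}} \Delta p_{NN} \right)$$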
Variance depends on both uncertainty of and sensitivity to transition probabilities
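A sketch of one way to choose where to simulate next, assuming each state's variance contribution scales as 1 / (n_i + 1) in the number of observed transitions n_i, and that one extra sample leaves the mean probabilities and sensitivities unchanged. The selection rule, the function name, and the numbers are illustrative assumptions.

```python
import numpy as np

def choose_next_state(row_var, row_totals):
    """Pick the state whose next sample is expected to reduce variance most.

    row_var[i]    : current variance contribution of state i
    row_totals[i] : effective number of transitions observed from state i
    """
    row_var = np.asarray(row_var, dtype=float)
    n = np.asarray(row_totals, dtype=float)
    predicted = row_var * (n + 1.0) / (n + 2.0)   # contribution after one more sample
    reduction = row_var - predicted
    return int(np.argmax(reduction)), reduction

# Hypothetical contributions: states 0 and 2 are equally uncertain,
# but state 2 has been sampled half as often.
state, reduction = choose_next_state([3e-4, 1e-6, 3e-4], [100, 1000, 50])
```

Repeating this choice after each batch of new trajectories gives a sequential scheme: states that are both uncertain and influential receive the samples, while well-determined or insensitive states receive few.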
Adaptive sampling – alanine
On 6-state alanine system, select trajectories randomly for 3 sampling strategies
Transition Counts (row = starting state, column = ending state)
4926 62 0 0 5 7
18 4978 4 0 0 0
0 3 4823 158 13 3
0 0 226 4604 1 169
0 0 0 1 4788 211
002151534380
Adaptive sampling – villin
• Benefits
– Very quickly reduce the variance
– Reduce the total number of simulations
– Need less computational power
– Can study more complex systems
Villin headpiece (2454 states): Jayachandran et al., Journal of Chemical Physics (2006)
Summary
• Markovian state models are convenient methods to describe molecular motion
• Automatic state decomposition
– Scalable to large size systems
• Model selection
– Evaluate tradeoff between model complexity and amount of data
• Uncertainty analysis
– Efficient and decomposable
• Adaptive sampling
– Reduce number of simulations
Acknowledgements
• Vijay Pande – Stanford University adviser
• Bill Swope, Jed Pitera – IBM collaborators
• John Chodera – state decomposition work