Geometric and Markov Chain Based Methods for Data Analysis

Danielle Middlebrooks

Advisors: Maria Cameron, Kasso Okoudjou

Applied Mathematics, Statistics and Scientific Computation
University of Maryland, College Park

January 22, 2017

1 Introduction

2 Part 1: Diffusion Maps; Transition Path Theory; Allostery of BirA Protein

3 Part 2: Point Cloud Discretization for Committor Functions; Multiscale Geometric Methods

4 Conclusion

Introduction

Networks are a popular tool for data representation and organization.

In general, a network consists of a set of nodes together with a set of edges.

New tools need to be developed for the analysis of complex networks.

We are particularly interested in applications arising from chemical physics.

Motivation

Suppose a protein has two metastable states. We are interested in a way to quantify the transition between the two states.

Approach:

Goal

Develop computational techniques for data analysis and the study of transition processes between metastable sets of states.

Outline

We discuss existing computational tools for network analysis and describe ongoing projects.

Part 1:

Diffusion Maps (Coifman and Lafon, 2006)

Transition Path Theory (E and Vanden-Eijnden, 2006)

The study of allostery in BirA protein (with S. Matysiak and G. Custer)

Part 2:

Computation of the committor function on point clouds (Lai and Lu, 2017)

Multiscale Geometric Methods (Maggioni and collaborators, 2011)

Part 1: Diffusion Maps

Motivation

Diffusion maps are a methodology for finding hidden geometric structure in large high-dimensional data sets.

Idea: The diffusion map embeds the data into a Euclidean space of lower dimension so that the Euclidean distance between embedded points approximates the diffusion distance in the original space.

Motivation

Consider a discrete-time Markov chain. Suppose we take a random walk on our data, jumping between data points.

The probability of jumping between two data points $x_i$ and $x_j$ in one step of the random walk is given by the stochastic matrix $P_{ij} = p(x_i, x_j)$.

The probability of transition from $x_i$ to $x_j$ in $t$ jumps is $p_t(x_i, x_j)$, the $(i, j)$ entry of $P^t$. The idea is that taking larger powers of $P$ integrates the local geometry and thus reveals relevant geometric structures of the data set at different scales.

Diffusion Distance to Diffusion Map

First, define a positive symmetric kernel $k(x_i, x_j) = h(\|x_i - x_j\|/\varepsilon)$ with scaling parameter $\varepsilon$. Often, this kernel is of the form

$$k(x_i, x_j) = e^{-\|x_i - x_j\|^2/\varepsilon} \qquad (1)$$

This kernel is converted into a stochastic matrix $P$. The diffusion distance is defined by

$$D_t(x_i, x_j)^2 = \|p_t(x_i, \cdot) - p_t(x_j, \cdot)\|^2_{L^2}, \quad t = 1, 2, 3, \ldots \qquad (2)$$

where $p_t(x_i, \cdot)$ are the rows of the matrix $P^t$.
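The construction of $P$ and the distance in (2) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the kernel is turned into a stochastic matrix by plain row normalization, the norm is the unweighted Euclidean norm of the row difference, and the function names, the toy data set and the value of `eps` are all hypothetical.

```python
import numpy as np

def markov_matrix(X, eps):
    """Row-stochastic matrix built from the Gaussian kernel exp(-|xi - xj|^2 / eps).
    Assumption: simple row normalization of the kernel matrix."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    return K / K.sum(axis=1, keepdims=True)

def diffusion_distance_sq(P, t, i, j):
    """Squared Euclidean distance between rows i and j of P^t."""
    Pt = np.linalg.matrix_power(P, t)
    diff = Pt[i] - Pt[j]
    return float(diff @ diff)

# Toy data (illustrative): two well-separated clusters of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(3.0, 0.1, size=(20, 2))])
P = markov_matrix(X, eps=0.5)
d_same = diffusion_distance_sq(P, 2, 0, 1)    # two points in the same cluster
d_diff = diffusion_distance_sq(P, 2, 0, 25)   # points in different clusters
```

Points in the same cluster have nearly identical rows of $P^t$, so their diffusion distance is small, while points in different clusters remain far apart.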

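Diffusion Distance to Diffusion Map

Up to some relative precision $\delta$, we have

$$D_t(x_i, x_j)^2 = \sum_{l=1}^{s} \lambda_l^{2t} \left(\psi_l(x_i) - \psi_l(x_j)\right)^2 \quad \text{with } s = s(\delta, t) \qquad (3)$$

where $\lambda_l$ and $\psi_l$ are the eigenvalues and eigenvectors of the matrix $P$. Thus the family of diffusion maps $\{\Psi_t\}_{t \in \mathbb{N}}$ is given by

$$x_i \mapsto \Psi_t(x_i) = \begin{pmatrix} \lambda_1^t \psi_1(x_i) \\ \lambda_2^t \psi_2(x_i) \\ \vdots \\ \lambda_s^t \psi_s(x_i) \end{pmatrix} \qquad (4)$$

The components of $\Psi_t(x_i)$ are the diffusion coordinates of $x_i$. The map $\Psi_t$ embeds the data set into an $s$-dimensional Euclidean space.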
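A possible way to compute the diffusion coordinates is sketched below. Several choices here are assumptions rather than statements from the slides: the kernel is row-normalized, the eigenvectors are obtained from the symmetric conjugate of $P$, they are scaled by $1/\sqrt{\pi}$ with $\pi$ the stationary distribution, and the trivial pair ($\lambda = 1$, constant eigenvector) is dropped because it cancels in every difference. With that scaling, keeping all nontrivial eigenpairs makes the Euclidean distance between embedded points equal to the diffusion distance measured in the $1/\pi$-weighted norm. The function name and test data are hypothetical.

```python
import numpy as np

def diffusion_map(X, eps, t, s):
    """Return (coords, P, pi): the first s nontrivial diffusion coordinates
    at integer time t, the row-stochastic matrix P, and its stationary
    distribution pi."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    d = K.sum(axis=1)
    pi = d / d.sum()
    P = K / d[:, None]
    # Symmetric matrix with the same eigenvalues as P.
    M = K / np.sqrt(np.outer(d, d))
    lam, V = np.linalg.eigh(M)
    order = np.argsort(lam)[::-1]          # eigenvalues in decreasing order
    lam, V = lam[order], V[:, order]
    psi = V / np.sqrt(pi)[:, None]         # right eigenvectors of P
    # Skip index 0: eigenvalue 1 with a constant eigenvector.
    coords = psi[:, 1:s + 1] * lam[1:s + 1] ** t
    return coords, P, pi

# Illustrative data: 30 random points in the plane.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
coords, P, pi = diffusion_map(X, eps=1.0, t=3, s=2)
```

In practice $s$ is chosen much smaller than the number of points, since $\lambda_l^{t}$ decays quickly and the discarded terms fall below the precision $\delta$.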

Key Points

The diffusion maps method is robust to noise perturbation.

It allows for geometric analysis at different scales.

The accuracy of the results depends heavily on the user-defined parameter $\varepsilon$.

Part 1: Transition Path Theory

Markov Jump Processes

[Figure: network with nodes labeled A, B, and 1 to 26]

Consider a Markov jump process (MJP) on a discrete state space $S$ with infinitesimal generator $L = (L_{i,j})_{i,j \in S}$:

$$L_{i,j} \ge 0, \quad \forall i, j \in S,\ i \ne j$$

$$L_{i,i} = -\sum_{j \ne i} L_{i,j}, \quad \forall i \in S$$

Assume $L$ is irreducible and the MJP is ergodic with respect to the equilibrium probability distribution $\pi = (\pi_i)_{i \in S}$, which satisfies

$$\pi^T L = 0, \qquad \sum_{i \in S} \pi_i = 1$$
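As a small illustration of these definitions, the sketch below builds a generator from a table of off-diagonal jump rates and solves $\pi^T L = 0$ together with the normalization constraint. The three-state rate table and the function names are made up for the example; solving the stacked system by least squares is one convenient choice among several.

```python
import numpy as np

def generator_from_rates(R):
    """Generator L with off-diagonal entries R[i][j] >= 0 and diagonal
    entries chosen so that every row sums to zero."""
    L = np.array(R, dtype=float)
    np.fill_diagonal(L, 0.0)
    np.fill_diagonal(L, -L.sum(axis=1))
    return L

def stationary_distribution(L):
    """Solve pi^T L = 0 with sum(pi) = 1 by stacking both conditions."""
    n = L.shape[0]
    A = np.vstack([L.T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical jump rates for a three-state chain.
R = [[0.0, 2.0, 1.0],
     [1.0, 0.0, 3.0],
     [4.0, 1.0, 0.0]]
L = generator_from_rates(R)
pi = stationary_distribution(L)
```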

What is Transition Path Theory?

Transition Path Theory (TPT) is a framework for analyzing the statistical properties of reactive trajectories, the trajectories along which transitions occur between two disjoint subsets of states.

TPT can be applied to:

Physics

Chemistry

Biology

Social Sciences

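Key Concepts of Transition Path Theory

For simplicity we consider the time-reversible case. The committor function $q = (q_i)_{i \in S}$ gives the probability that a trajectory starting at state $i$ reaches set $B$ before set $A$. It satisfies

$$\sum_{j \in S} L_{i,j} q_j = 0 \ \text{ if } i \in S \setminus (A \cup B), \qquad q_i = 0 \ \text{ if } i \in A, \qquad q_i = 1 \ \text{ if } i \in B$$

The probability current of reactive trajectories is given by

$$f_{i,j} = \begin{cases} (1 - q_i)\,\pi_i L_{i,j}\, q_j, & \text{if } i \ne j \\ 0, & \text{otherwise} \end{cases}$$

Reactive current (using detailed balance, $\pi_i L_{i,j} = \pi_j L_{j,i}$):

$$F_{i,j} = f_{i,j} - f_{j,i} = \pi_i L_{i,j}(q_j - q_i)$$

Transition rate:

$$\nu_{AB} = \sum_{(i,j) \in \text{cut}} F_{i,j}$$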
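The quantities on this slide can be computed on a toy example as follows. The sketch assumes a reversible chain given by the random walk on a five-node path graph with $A = \{0\}$ and $B = \{4\}$; the graph, the function names and the choice of cut (all edges leaving $A$) are illustrative. The current is evaluated as $\pi_i L_{i,j}(q_j - q_i)$, which is what $f_{i,j} - f_{j,i}$ reduces to under detailed balance.

```python
import numpy as np

def committor(L, A, B):
    """Solve sum_j L_ij q_j = 0 on S minus (A union B), with q = 0 on A and q = 1 on B."""
    n = L.shape[0]
    A, B = list(A), list(B)
    free = [i for i in range(n) if i not in A and i not in B]
    q = np.zeros(n)
    q[B] = 1.0
    # Known boundary values (q = 1 on B) are moved to the right-hand side.
    rhs = -L[np.ix_(free, B)].sum(axis=1)
    q[free] = np.linalg.solve(L[np.ix_(free, free)], rhs)
    return q

def reactive_current(L, pi, q):
    """F_ij = pi_i L_ij (q_j - q_i) for a reversible chain."""
    return pi[:, None] * L * (q[None, :] - q[:, None])

def transition_rate(F, A):
    """Sum the current over the cut made of all edges leaving A."""
    A = list(A)
    rest = [j for j in range(F.shape[0]) if j not in A]
    return float(F[np.ix_(A, rest)].sum())

# Random walk on the path graph 0 - 1 - 2 - 3 - 4 with unit edge weights.
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
deg = W.sum(axis=1)
L = W / deg[:, None] - np.eye(5)   # generator of the walk
pi = deg / deg.sum()               # stationary distribution (detailed balance holds)

q = committor(L, A=[0], B=[4])
F = reactive_current(L, pi, q)
nu_AB = transition_rate(F, A=[0])
```

On this graph the committor is linear in the node index, and the same current flows through every edge of the path, so any cut between $A$ and $B$ gives the same rate.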

Example

Examples of reactive trajectories are shown by red arrows.

Values of the committor are expressed by color: states with $q = 0$ are green and states with $q = 1$ are blue. The values of the committor strictly increase along reactive trajectories.

Key Points

TPT gives quantitative characteristics of transition processes in Markov chains between any two disjoint subsets of states.

TPT can find transition rates even in Markov chains that are time-irreversible.

Its limitations lie in the ability to solve the linear system for the committor function.

Part 1: Allostery of BirA Protein

Analysis of Allostery in BirA Protein

Collaboration with Prof. S. Matysiak and G. Custer.

Goal: Study the information transfer between two regions in the BirA protein.

Figure: BirA Protein. Wood, Zachary A., et al.

Consider two sets:

Ligand Binding Region (residues 212-223 and 116-124)

Dimer-Interface Region (residues 193-196 and 140-146)

Calculating Edge Weights

Idea: Model the BirA protein as a network and determine signal propagation between the two regions using TPT.

Ribeiro et al. [7] define edge weights as

$$\omega_{ij} = \begin{cases} \omega_b, & \text{if } i \text{ and } j \text{ are covalently bound} \\ \chi_{ij}, & \text{otherwise} \end{cases}$$

where $\omega_b$ is a predefined value and $\chi_{ij}$ is calculated from the average interaction energy between all pairs of noncovalently bound residues:

$$\chi_{ij} \equiv 0.5\left\{1 - \frac{\varepsilon_{ij} - \varepsilon_{av}}{5\,\varepsilon_{rmsd}}\right\}$$

For our analysis, the weights were calculated by taking the absolute value of the interaction energy between pairs of residues.

Committor Function via Transition Path Theory

Wild-type apo BirA protein

Finding Pathways from A to B

Algorithm (Finding Pathways)

For all nodes i in A:
    sort edges i → j according to the current they carry
    find all jmax with current > min threshold and add i and jmax to path
    print path(path, jmax)

    sort edges jmax → k according to the current they carry
    find all kmax with current > min threshold and add kmax to the path
    print path(path, kmax)

    continue until path reaches set B or current is less than min threshold
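One way to read the algorithm above is as a depth-first enumeration: starting from each node in A, follow every outgoing edge whose reactive current exceeds the threshold, strongest edge first, and stop when the path reaches B or no admissible edge remains. The sketch below implements that reading. It is an interpretation, not the code used for the BirA analysis; the function name, the choice to return only paths that reach B, and the four-node current matrix are hypothetical.

```python
def find_pathways(F, A, B, min_threshold):
    """Enumerate paths from A to B that only use edges whose reactive
    current F[i][j] exceeds min_threshold, strongest edges first."""
    n = len(F)
    targets = set(B)
    paths = []

    def extend(path):
        i = path[-1]
        if i in targets:
            paths.append(list(path))
            return
        # Outgoing edges sorted by the current they carry, largest first.
        candidates = sorted(
            ((F[i][j], j) for j in range(n)
             if j not in path and F[i][j] > min_threshold),
            reverse=True,
        )
        for _, j in candidates:
            extend(path + [j])
        # A path with no admissible outgoing edge is abandoned here.

    for start in A:
        extend([start])
    return paths

# Hypothetical current matrix: a strong route 0 -> 1 -> 3 and a weak route 0 -> 2 -> 3.
F = [[0.0, 0.3, 0.1, 0.0],
     [0.0, 0.0, 0.0, 0.3],
     [0.0, 0.0, 0.0, 0.1],
     [0.0, 0.0, 0.0, 0.0]]
strong_and_weak = find_pathways(F, A=[0], B=[3], min_threshold=0.05)
strong_only = find_pathways(F, A=[0], B=[3], min_threshold=0.2)
```

Raising the threshold prunes the weak route, which is how the threshold controls how many pathways are reported.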

Results for Reactive Current

Wild-type apo BirA protein

Results for Reactive Current

Wild-type apo BirA protein without covalent bonds

Questions for Further Analysis

What is the best way to define the weights of edges?

How do mutations affect energy process?

Should covalent bonds be taken into account?

Part 2: Point Cloud Discretization for Committor Functions

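Point Cloud Discretization of q

Consider the stochastic process governed by the overdamped Langevin equation

$$dX_t = -\nabla U(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t$$

Again, $q$ solves the following PDE with Dirichlet boundary conditions given on $A$ and $B$:

$$Lq = 0 \ \text{ in } \Omega \setminus (A \cup B), \qquad q = 0 \ \text{ in } A, \qquad q = 1 \ \text{ in } B$$

with

$$L = -\beta^{-1}\Delta + \nabla U \cdot \nabla$$

The transition rate is

$$\nu_{AB} = \int_{\Sigma} J_R(x) \cdot n(x)\,d\sigma(x)$$

with $J_R = Z^{-1}\exp(-\beta U)\nabla q$.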
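A point cloud for such a process is typically produced by simulating the dynamics. The sketch below integrates the overdamped Langevin equation with the Euler-Maruyama scheme. The scheme, the one-dimensional double-well potential, and the step size and temperature are assumptions chosen for illustration; the slides do not say how the samples were generated.

```python
import numpy as np

def euler_maruyama(grad_U, x0, beta, dt, n_steps, rng):
    """Integrate dX = -grad U(X) dt + sqrt(2 / beta) dW with step size dt."""
    x = np.array(x0, dtype=float)
    traj = np.empty((n_steps + 1, x.size))
    traj[0] = x
    noise_scale = np.sqrt(2.0 * dt / beta)
    for k in range(n_steps):
        x = x - grad_U(x) * dt + noise_scale * rng.standard_normal(x.size)
        traj[k + 1] = x
    return traj

def grad_double_well(x):
    """Gradient of the illustrative potential U(x) = (x^2 - 1)^2."""
    return 4.0 * x * (x ** 2 - 1.0)

rng = np.random.default_rng(0)
traj = euler_maruyama(grad_double_well, x0=[-1.0], beta=3.0,
                      dt=1e-3, n_steps=2000, rng=rng)
```

The rows of `traj` form the point cloud; in higher dimensions the same function applies unchanged as long as `grad_U` returns an array of the same shape as `x0`.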

Motivation

Solving the PDE is non-trivial due to the curse of dimensionality.

Main Idea

Transitions between regions A and B lie on a low-dimensional manifold, so q can be approximated on a point cloud rather than on the entire configuration space.

Goal

Directly solve the committor equation based on a point cloud discretization (Local Mesh Method).

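Local Mesh Method

Rongjie Lai and Jianfeng Lu

Figure: Projection of KNN with first ring

At a point $p_i$, project its $K$-nearest neighborhood onto the tangent plane at $p_i$.

Approximate stiffness matrix:

$$A_{ij} = \sum_{F \in R(i)} \int_F e^{-\beta U}\, \nabla_F \eta_i \cdot \nabla_F \eta_j$$

Symmetrized stiffness matrix:

$$S_{ij} = \begin{cases} \max(A_{ij}, A_{ji}) & \text{if } A_{ij} \le 0 \text{ and } A_{ji} \le 0 \\ \min(A_{ij}, A_{ji}) & \text{if } A_{ij} \ge 0 \text{ and } A_{ji} \ge 0 \\ \min(A_{ij}, A_{ji}) & \text{if } A_{ij} \cdot A_{ji} < 0 \\ -\sum_{k \ne i} S_{ik} & \text{if } i = j \end{cases}$$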
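The symmetrization rule can be written compactly: take the maximum when both entries are non-positive and the minimum in the two remaining cases, then set the diagonal so that each row sums to zero. The sketch below applies the rule to a small dense matrix; the function name and the example matrix are hypothetical, and assembling $A$ itself from the local meshes is not shown.

```python
import numpy as np

def symmetrize_stiffness(A):
    """Apply the entrywise symmetrization rule to a square matrix A and
    set the diagonal to minus the sum of the off-diagonal row entries."""
    A = np.asarray(A, dtype=float)
    At = A.T
    both_nonpositive = (A <= 0) & (At <= 0)
    # max for two non-positive entries, min otherwise (both >= 0 or mixed signs)
    S = np.where(both_nonpositive, np.maximum(A, At), np.minimum(A, At))
    np.fill_diagonal(S, 0.0)
    np.fill_diagonal(S, -S.sum(axis=1))
    return S

# Illustrative non-symmetric matrix covering all three off-diagonal cases.
A = [[ 5.0, -1.0, 2.0,  3.0],
     [-2.0,  5.0, 4.0,  0.0],
     [ 1.0,  6.0, 5.0, -1.0],
     [-3.0,  0.0, 2.0,  5.0]]
S = symmetrize_stiffness(A)
```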

Rugged Mueller Potential Function

Potential $U_{rm}(x, y) = U(x, y) + \gamma \sin(2k\pi x)\sin(2k\pi y)$, with $U$ the original Mueller potential.

Consider the sets $A = \{U(x, y) < -120\} \cap \{y > 0.75\}$ and $B = \{U(x, y) < -82\} \cap \{y < 0.35\}$.
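For reference, the potential and the two sets can be evaluated as below. The coefficients are the standard Mueller (Mueller-Brown) parameters. The values `gamma=9.0` and `k=5` are placeholders chosen for illustration because the slide does not state them, and the sets are evaluated with the original potential $U$, which is one reading of the definition above.

```python
import numpy as np

# Standard Mueller potential coefficients (four Gaussian-like terms).
_AMP = np.array([-200.0, -100.0, -170.0, 15.0])
_a = np.array([-1.0, -1.0, -6.5, 0.7])
_b = np.array([0.0, 0.0, 11.0, 0.6])
_c = np.array([-10.0, -10.0, -6.5, 0.7])
_x0 = np.array([1.0, 0.0, -0.5, -1.0])
_y0 = np.array([0.0, 0.5, 1.5, 1.0])

def mueller(x, y):
    """Original Mueller potential U(x, y) at a single point."""
    dx, dy = x - _x0, y - _y0
    return float(np.sum(_AMP * np.exp(_a * dx**2 + _b * dx * dy + _c * dy**2)))

def rugged_mueller(x, y, gamma=9.0, k=5):
    """U_rm = U + gamma sin(2 k pi x) sin(2 k pi y); gamma and k are assumed values."""
    return mueller(x, y) + gamma * np.sin(2 * k * np.pi * x) * np.sin(2 * k * np.pi * y)

def in_A(x, y):
    return mueller(x, y) < -120.0 and y > 0.75

def in_B(x, y):
    return mueller(x, y) < -82.0 and y < 0.35

# Approximate locations of the two deepest minima of the original potential.
u_min_A = mueller(-0.558, 1.442)
u_min_B = mueller(0.623, 0.028)
```

With these definitions the deepest minimum lies in $A$ and the second-deepest in $B$, so the committor describes transitions between the two main wells.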

Committor Function and Reactive Current

Figure: Committor function via FEM

Figure: Reactive current


Results of Committor Function

Figure: Finite Element Method

Figure: Diffusion Maps Method

Figure: Local Mesh Method


Comparison of Results

Figure: Committor function via FEM and DM

Figure: Committor function via FEM and LM


“Umbrella” Potential Function

Potential $U = 10u^2(u^2 - 1) + 0.07x^2 + y^2 + \frac{15}{19}\sqrt{19}\, z^2$, with $u = \frac{x^2 + z^2}{4} + y$.

Consider sets containing two local minima along the canopy of the umbrella.


Result of Committor Function

Figure: Committor function via Local Mesh method


Key Points

The local mesh method provides a tool to analyze the stochastic system in the framework of TPT.

This method is sensitive to the number of points used.

We would like to first learn the manifold in order to get a better arrangement of points.




Motivation: Principal Component Analysis

A main assumption about high-dimensional data sets is that they can often be modeled as low-dimensional sets embedded in a high-dimensional space.

Question: How do we estimate the intrinsic dimension of high-dimensional data sets?

Represent the data as a matrix $X \in \mathbb{R}^{D \times n}$ and compute its singular value decomposition

$$X = U \Sigma V^T$$

$U$: orthogonal $D \times D$ matrix
$\Sigma$: diagonal $D \times n$ matrix of singular values
$V$: orthogonal $n \times n$ matrix



Motivation: Principal Component Analysis

If the data lie on a k-dimensional hyperplane (through the origin, for example after centering), only k singular values are nonzero and the first k columns of U span the desired hyperplane.
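The linear case can be checked numerically. The sketch below builds synthetic data on a k-dimensional subspace of R^D, computes the SVD of the D x n data matrix, and counts the singular values above a rounding-error tolerance. The sizes D = 20, n = 200, k = 3 and all variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n, k = 20, 200, 3

# Orthonormal basis of a random k-dimensional subspace of R^D.
basis, _ = np.linalg.qr(rng.standard_normal((D, k)))
# Columns of X are data points lying exactly on that subspace.
X = basis @ rng.standard_normal((k, n))

# Full SVD: U is D x D, s holds min(D, n) singular values, Vt is n x n.
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Count singular values that are nonzero up to floating-point rounding.
tol = s[0] * max(D, n) * np.finfo(float).eps
k_est = int(np.sum(s > tol))

# The first k_est columns of U should span the same subspace as `basis`;
# compare the two orthogonal projectors.
P_true = basis @ basis.T
P_est = U[:, :k_est] @ U[:, :k_est].T
```

With noise added to X all D singular values become nonzero, so in practice one looks for a gap in the spectrum rather than exact zeros.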

What if the data is non-linear?



Multiscale SVD (MSVD)

Problem Description

Consider data $\{x_i\}_{i=1}^n$ sampled from a manifold of dimension $k$ embedded in $\mathbb{R}^D$ with $k \ll D$. Our observations will be $\{\bar{x}_i\}_{i=1}^n$ with $\bar{x}_i = x_i + \sigma\eta_i$, where the $\sigma\eta_i$ are drawn i.i.d. from a noise random variable $N$.


Algorithm for Estimating Intrinsic Dimension

Algorithm (MSVD)

Noisy data: $X_n$

Fix: $z \in \{x_1, \dots, x_n\}$

For each $z$, $r > 0$, $i = 1, \dots, D$, compute:

$X_{n,z,r} = X_n \cap B_z(r)$

$\mathrm{cov}(X_{n,z,r}) = \frac{1}{n} \sum_{i=1}^{n} (x_i - E_n[X]) \otimes (x_i - E_n[X])$

$\lambda_i^{(z,r)} = \lambda_i(\mathrm{cov}(X_{n,z,r}))$

Observe $k$ large eigenvalues and $D - k$ smaller noise eigenvalues.

$R_{\min} \leftarrow$ smallest scale for which $(\lambda_{k+1}^{(z,r)})^{1/2}$ is decreasing.

$R_{\max} \leftarrow$ largest scale $> R_{\min}$ for which $(\lambda_{k+1}^{(z,r)})^{1/2}$ is nondecreasing.

$k \leftarrow$ largest $i$ such that, for $r \in (R_{\min}, R_{\max})$, $(\lambda_i^{(z,r)})^{1/2}$ is linear and $(\lambda_{i+1}^{(z,r)})^{1/2}$ is quadratic in $r$.
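The core computation of the algorithm, the square roots of the local covariance eigenvalues as a function of the scale r, can be sketched as follows. This is an illustrative reading of the steps above, not the reference implementation: the covariance is centered on the local mean and normalized by the number of points inside the ball (an assumption about how the 1/n and E_n[X] in the formula are meant), the selection of R_min, R_max and k is left to visual inspection of the curves, and the name `multiscale_singular_values` is hypothetical.

```python
import numpy as np

def multiscale_singular_values(X, z, radii):
    """Square roots of local covariance eigenvalues around z, one row per radius.

    X     : (n, D) array, one data point per row
    z     : (D,) center of the balls B_z(r)
    radii : sequence of scales r
    Returns an array of shape (len(radii), D), sorted in decreasing order per row.
    Rows stay NaN when fewer than two points fall inside the ball.
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    dist = np.linalg.norm(X - z, axis=1)
    out = np.full((len(radii), X.shape[1]), np.nan)
    for j, r in enumerate(radii):
        local = X[dist <= r]                      # X_n intersected with B_z(r)
        if local.shape[0] < 2:
            continue
        centered = local - local.mean(axis=0)
        cov = centered.T @ centered / local.shape[0]
        eig = np.linalg.eigvalsh(cov)[::-1]       # decreasing order
        out[j] = np.sqrt(np.clip(eig, 0.0, None))
    return out

# Hypothetical noiseless test: a flat 2D patch embedded in R^5.
rng = np.random.default_rng(0)
plane = np.zeros((2000, 5))
plane[:, :2] = rng.uniform(-1.0, 1.0, size=(2000, 2))
radii = np.array([0.3, 0.5])
sv = multiscale_singular_values(plane, np.zeros(5), radii)
```

For this flat patch the first two values grow roughly linearly with r and the remaining three stay at zero, which is the behavior the algorithm looks for when it reads off k. Here the center is taken at the origin for convenience rather than at one of the data points.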


Example

$S^9(1000, 100, 0.1)$: 1000 points uniformly sampled on a 9-dimensional unit sphere, embedded in 100 dimensions with Gaussian noise of standard deviation 0.1 in every direction.
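A data set of this kind can be generated as sketched below. The sketch assumes that the sphere sits in the first 10 coordinates of R^100 and that the noise is independent Gaussian with standard deviation 0.1 in each of the 100 coordinates. The function name `sample_noisy_sphere` and the seed are hypothetical.

```python
import numpy as np

def sample_noisy_sphere(k=9, n=1000, D=100, sigma=0.1, seed=0):
    """Sample n points uniformly on the unit k-sphere, embed in R^D, add noise.

    Returns (clean, noisy), both of shape (n, D). Requires D >= k + 1.
    """
    rng = np.random.default_rng(seed)
    # Normalizing standard Gaussian vectors in R^(k+1) gives uniform points on S^k.
    g = rng.standard_normal((n, k + 1))
    pts = g / np.linalg.norm(g, axis=1, keepdims=True)
    clean = np.zeros((n, D))
    clean[:, :k + 1] = pts
    # Isotropic Gaussian noise with standard deviation sigma in every coordinate.
    noisy = clean + sigma * rng.standard_normal((n, D))
    return clean, noisy

clean, noisy = sample_noisy_sphere()
```

The `noisy` array can then be passed to a multiscale covariance analysis, with the `clean` array kept only as a reference for checking results.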


Key Points

While PCA fails due to the nonlinearity of the data, this method introduces a multiscale geometric approach to estimating the intrinsic dimension of nonlinear data.

It is able to distinguish the underlying k-dimensional structure of the data from the effects of noise and curvature.

Depending on the data set, finding a “good” range of radii may be difficult.




Conclusion

Networks are increasingly used to understand and analyze complex systems in a variety of fields.

In many applications, it is of interest to study the transitions that occur between two groups of nodes. This can be analyzed by popular methods such as diffusion maps and transition path theory.

Often, one can take advantage of the fact that complex high-dimensional systems have a low intrinsic dimension. PCA and, more recently, MSVD are methods that help us determine this dimension.

While these methods are extremely popular, new tools still need to be developed for high-dimensional data analysis.


Research Plans

Continue the analysis of the allostery of BirA protein.

Develop method for learning manifold from point clouds.

Compute other functions on point clouds by solving other types of PDEs.


References

1 E, Weinan, and Vanden-Eijnden, Eric. “Towards a Theory of Transition Paths.” Journal of Statistical Physics 123.3 (2006).

2 Coifman, Ronald R., and Lafon, Stephane. “Diffusion maps.” Applied and Computational Harmonic Analysis 21.1 (2006): 5-30.

3 Little, Anna V., Maggioni, Mauro, and Rosasco, Lorenzo. “Multiscale geometric methods for estimating intrinsic dimension.” Proc. SampTA 4.2 (2011).

4 Lai, Rongjie, and Lu, Jianfeng. “Point cloud discretization of Fokker-Planck operators for committor functions.” arXiv preprint arXiv:1703.09359 (2017).

5 Kim, Sang Beom, et al. “Systematic characterization of protein folding pathways using diffusion maps: Application to Trp-cage miniprotein.” The Journal of Chemical Physics 142.8 (2015): 02B6131.

6 Metzner, Philipp, Schutte, Christof, and Vanden-Eijnden, Eric. “Transition path theory for Markov jump processes.” Multiscale Modeling & Simulation 7.3 (2009): 1192-1219.

7 Ribeiro, Andre AST, and Ortiz, Vanessa. “Determination of signaling pathways in proteins through network theory: importance of the topology.” Journal of Chemical Theory and Computation 10.4 (2014): 1762-1769.


Acknowledgements

Thank you to my advisors and committee members

National Science Foundation COMBINE-NRT Program under Grant No. DGE-1632976