Probabilistic programming and the protein folding problem

Thomas Hamelryck

Dpt. of Biology and Dpt. of Computer Science (DIKU), University of Copenhagen, Denmark

AI Seminar, KU, Sep. 2019


TRANSCRIPT

Page 1: Probabilistic programming and · Probabilistic Programming is an emerging paradigm on par with Deep Learning and Big Data Analytics We conjecture it will soon be possible to perform

Probabilistic programming and the protein folding problem

Thomas Hamelryck

Dpt. of Biology and Dpt. of Computer Science (DIKU)

University of Copenhagen, Denmark

AI Seminar, KU, Sep. 2019

Page 2: The cell - 50% protein

Picture: A bacterial cell by David Goodsell

Page 3: Protein structure

Pictures: Protein structures by David Goodsell

Page 4: Protein sequence determines fold

ANERGHPKLAAFGVGHSAQWPNTSVHGFDZPLQACVTSSHKLRTNNMKLW

Christian Anfinsen (1916-1995)

Page 5: It’s complicated

Pictures: PHA-L, 231 amino acids (PDB: 1FAT, Hamelryck et al., 1996)

Page 6: Protein folding and dihedral angles
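A dihedral (torsion) angle is the angle between the two planes spanned by four consecutive points, such as backbone atoms. The slide's figure is not part of this transcript, so the following is only a minimal sketch in plain Python. The function name `dihedral_degrees` and the example coordinates are illustrative assumptions, not taken from the talk.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3-D points (hypothetical helper)."""
    b0 = _sub(p0, p1)  # points from p1 back to p0
    b1 = _sub(p2, p1)  # central bond p1 -> p2
    b2 = _sub(p3, p2)  # points from p2 on to p3

    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)

    # Components of b0 and b2 perpendicular to the central bond.
    k0 = _dot(b0, b1)
    k2 = _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])

    # atan2 keeps the sign of the angle and is stable near 0 and 180 degrees.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Example with made-up coordinates: four points forming a right-angle twist.
angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(f"dihedral = {angle:.1f} degrees")
```

Using `atan2` on the two projected components, rather than `acos` on a single dot product, is a common design choice because it returns a signed angle over the full range of -180 to 180 degrees.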

Page 7: 1997: Rosetta

Simons et al., J. Mol. Biol., 1997

Pictures: Kaufman et al., Biochemistry, 2010 and Wikipedia

Page 8: 2011: EVfold

Marks et al., PLoS ONE, 2011

Page 9: 2018: DeepMind’s AlphaFold

● “Unprecedented progress in the ability of computational methods to predict protein structure.”, CASP13
● AlphaFold: 25/43 most accurate prediction
  ○ Second best: 3/43

Pictures: AlphaFold

Page 10: 2018: AlQuraishi’s end-to-end prediction

● Mohammed AlQuraishi, Cell Systems, 2019 (bioRxiv, 2018)

Page 11: Procrustes problems

Theseus slaying Procrustes.

Pictures: Wikipedia / ancientgreecereloaded.com

Page 12: Point estimates versus distributions

● Epistemic randomness
  ○ Randomness due to modelling
  ○ Measurement noise
  ○ Limited data
  ○ Poor model
  ○ Estimation procedures
● Aleatory randomness
  ○ Intrinsic randomness

An "ensemble" of 44 crystal structures of lysozyme.

Picture: Wikipedia

Page 13: The Bayesian calculus

Thomas Bayes (1701-1761)

Page 14: The Bayesian calculus

Pierre-Simon Laplace (1749-1827)
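Any formula shown on these two slides did not survive in the transcript. For reference, Bayes' theorem in its standard form is given here, with θ standing for the unknown parameters and D for the observed data. These symbols are chosen for this note and are not taken from the slides.

```latex
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
\qquad
p(D) = \int p(D \mid \theta)\, p(\theta)\, \mathrm{d}\theta
```

The posterior p(θ | D) is proportional to likelihood times prior. The normalising integral p(D) is what sampling and variational methods are typically used to sidestep.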

Page 15: The theory that would not die

Page 16: Bayes in the 21st century

Picture: Olivier Grisel's talk on ML

Page 17: Automatic inference

Sampling: Hamiltonian Monte Carlo / NUTS (2011)

Optimisation: Variational inference

Pictures: Mathieu Lê / John Winn, Microsoft Research Cambridge (adapted)

Page 18: Deep probabilistic programming

Pyro

Page 19: Bayesian linear model

Figure: y plotted against x.

Picture (right): Wikipedia

Page 20: Bayesian linear model in PyMC3
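The PyMC3 code shown on this slide is not part of the transcript. As a stand-in, and not a reconstruction of it, here is a minimal sketch of a Bayesian linear model in plain Python, with no PyMC3. It assumes a known noise standard deviation and independent Normal priors on intercept and slope, so the posterior is available in closed form. The function name `linear_posterior`, the priors, and the data are all assumptions made for illustration.

```python
def linear_posterior(xs, ys, noise_sd=1.0, prior_sd=10.0):
    """Posterior mean and covariance of (intercept a, slope b).

    Assumed model for this sketch:
        y_i ~ Normal(a + b * x_i, noise_sd), with noise_sd known
        a, b ~ Normal(0, prior_sd), independent
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    inv_noise = 1.0 / noise_sd ** 2
    inv_prior = 1.0 / prior_sd ** 2

    # Posterior precision = prior precision + X^T X / noise variance,
    # where each row of X is (1, x_i).
    p00 = inv_prior + n * inv_noise
    p01 = sx * inv_noise
    p11 = inv_prior + sxx * inv_noise

    # Posterior covariance = inverse of the 2x2 precision matrix.
    det = p00 * p11 - p01 * p01
    c00, c01, c11 = p11 / det, -p01 / det, p00 / det

    # Posterior mean = covariance @ (X^T y / noise variance).
    r0, r1 = sy * inv_noise, sxy * inv_noise
    mean_a = c00 * r0 + c01 * r1
    mean_b = c01 * r0 + c11 * r1
    return (mean_a, mean_b), ((c00, c01), (c01, c11))


# Made-up data scattered around y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
(mean_a, mean_b), cov = linear_posterior(xs, ys, noise_sd=0.5)
print("posterior mean (a, b):", mean_a, mean_b)
print("posterior sd   (a, b):", cov[0][0] ** 0.5, cov[1][1] ** 0.5)
```

The result is a full distribution over the line's parameters, not a single fitted line. A probabilistic programming system would obtain the same kind of posterior by automatic inference, without the conjugacy assumption used here.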

Page 21: How I got involved

References: TorusDBN, PNAS, 2008, 2014; EvoTorusDBN, Mol. Biol. Evol., 2017

Page 22: Generative models of local structure

● TorusDBN, Boomsma et al., PNAS, 2008, 2014
● EvoTorusDBN, Golden et al., Mol. Biol. Evol., 2017

Page 23: Bayesian end-to-end prediction

ANERGHPKLAAFGVGHSAQWPNTSVHGFDZPLQACVTSSHKLRTNNMKLWERHKLNCASPTWKHERGHPKL

Page 24: Protein structure superposition

● Minimize the root mean square deviation (RMSD) of the atomic positions
  ○ Kabsch method
  ○ Acta Crystallographica, 1976
  ○ Singular value decomposition
● Problems due to unrealistic assumptions
  ○ Assumes atoms have equal variance
  ○ Assumes atoms are uncorrelated
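The RMSD-minimising superposition described above can be sketched with NumPy as follows. This is a minimal illustration of the Kabsch method via singular value decomposition, assuming two structures given as N x 3 coordinate arrays with atoms already paired. The function name `kabsch_superpose` and the toy coordinates are assumptions, not code from the talk.

```python
import numpy as np


def kabsch_superpose(P, Q):
    """Superpose P onto Q (both N x 3) by a rigid rotation and translation.

    Returns the transformed copy of P, the 3 x 3 rotation matrix, and the RMSD.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)

    # Remove translation by centring both coordinate sets.
    P_centroid = P.mean(axis=0)
    Q_centroid = Q.mean(axis=0)
    Pc = P - P_centroid
    Qc = Q - Q_centroid

    # 3 x 3 cross-covariance matrix and its SVD.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) and not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Rotate the centred P, then move it onto Q's centroid.
    P_fit = Pc @ R.T + Q_centroid
    rmsd = np.sqrt(np.mean(np.sum((P_fit - Q) ** 2, axis=1)))
    return P_fit, R, rmsd


# Toy example: Q is a made-up 4-atom structure, P is a rotated and shifted copy.
Q = np.array([[0.0, 0.0, 0.0],
              [1.5, 0.0, 0.0],
              [1.5, 1.5, 0.0],
              [1.5, 1.5, 1.5]])
theta = np.radians(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
P = Q @ rot_z.T + np.array([3.0, -1.0, 2.0])

P_fit, R, rmsd = kabsch_superpose(P, Q)
print("RMSD after superposition:", rmsd)
```

Every atom contributes to the objective with the same weight and independently of the others, which is exactly the equal-variance, uncorrelated-atoms assumption listed above.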


Page 26: Theseus maximum likelihood model

● Douglas Theobald, PNAS, 2006
  ○ Based on Mardia-Dryden model, 1989

Page 27: Bayesian Theseus-PPL model

● Implemented in Pyro / PyTorch
● Priors and likelihood
  ○ Student’s t for M and likelihood
  ○ Normal prior for T
  ○ HalfNormal for diag(U)
  ○ Quaternion prior for R
● MAP estimation
  ○ Gradient descent
● Bayesian posterior
  ○ Hamiltonian Monte Carlo/NUTS
  ○ Variational inference
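The MAP estimation step listed above can be illustrated on a deliberately tiny problem. The sketch below is not the Theseus-PPL model and uses no Pyro: it estimates a single 1-D translation T under a Normal prior and a Student's t likelihood by plain gradient descent on the negative log posterior. The function name `map_translation`, the data, and all hyperparameters (`nu`, `scale`, `prior_sd`, `lr`, `steps`) are assumptions made for illustration.

```python
def map_translation(displacements, nu=3.0, scale=0.1, prior_sd=10.0,
                    lr=1e-3, steps=2000):
    """MAP estimate of a 1-D translation T by gradient descent (toy sketch).

    Assumed model:
        d_i ~ StudentT(nu, loc=T, scale=scale)
        T   ~ Normal(0, prior_sd)
    """
    t = 0.0
    for _ in range(steps):
        # Gradient of the negative log posterior with respect to T.
        grad = t / prior_sd ** 2  # contribution of the Normal prior
        for d in displacements:
            r = d - t
            # Student's t term: bounded influence for large residuals r.
            grad -= (nu + 1.0) * r / (nu * scale ** 2 + r * r)
        t -= lr * grad
    return t


# Made-up displacements: five consistent values near 2.0 and one gross outlier.
data = [1.9, 1.95, 2.0, 2.05, 2.1, 10.0]
t_map = map_translation(data)
t_mean = sum(data) / len(data)
print("MAP translation (Student's t):", t_map)
print("plain mean, for comparison:   ", t_mean)
```

With the heavy-tailed likelihood the outlier has little pull on the estimate, whereas the plain mean is dragged towards it. A fixed step size is used here for simplicity; in a PyTorch-based setting an optimiser with automatic differentiation would normally take its place.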

Page 28: Quaternions and rotations
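A unit quaternion (w, x, y, z) encodes a 3-D rotation, and any non-zero 4-vector can be normalised into one, which is one reason quaternions are a convenient parameterisation for rotations. The slide's content is not in the transcript, so this is only a minimal sketch of the standard quaternion-to-rotation-matrix conversion in plain Python. The function names and the example rotation are illustrative assumptions.

```python
import math


def quaternion_to_rotation_matrix(q):
    """Convert a quaternion (w, x, y, z) into a 3 x 3 rotation matrix.

    The quaternion is normalised first, so any non-zero 4-vector is accepted.
    """
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("the zero quaternion does not define a rotation")
    w, x, y, z = w / norm, x / norm, y / norm, z / norm

    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(R, v):
    """Apply the rotation matrix R to a 3-vector v."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


# Example: a quarter turn about the z axis, built from axis and angle.
half_angle = math.radians(90.0) / 2.0
q = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
R = quaternion_to_rotation_matrix(q)
print(rotate(R, (1.0, 0.0, 0.0)))
```

Because normalisation is built in, an unconstrained 4-vector can be sampled or optimised freely and still map to a valid rotation, unlike the nine entries of a rotation matrix, which must satisfy orthogonality constraints.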

Page 29: First example: 2CPD

Page 30: First example: 2CPD

Page 31: Second example: 2YS9

Page 32: Second example: 2YS9

Page 33: Towards a deep generative model

Page 34: Conclusions

● Probabilistic Programming is an emerging paradigm on par with Deep Learning and Big Data Analytics
● We conjecture it will soon be possible to perform Bayesian end-to-end protein structure prediction using deep probabilistic programming
● Theseus-PPL provides a suitable likelihood function
  ○ Rotation and translation invariant
  ○ Robust w.r.t. prediction errors
  ○ Low dimensional and on the “right space”
● Protein structure prediction can serve as a paradigm problem for deep probabilistic programming

Page 35:

Jotun Hein, Michael Golden, Oxford

Kanti Mardia, Leeds

Douglas Theobald, Brandeis

Ahmad Salim Al-Sibahi, DIKU

Lys Sanz Moreta, DIKU

Fritz Henglein, DIKU

Page 36: Funding & reference

“A probabilistic programming approach to protein structure superposition”, 16th IEEE-CIBCB conference, Tuscany, Italy, July 2019