machine learning algorithms for structural and functional ......casp14: deepmind’s alphafold2 orf8...

29
Jian Peng Department of Computer Science College of Medicine, Institute of Genomic Biology University of Illinois at Urbana-Champaign Machine learning algorithms for structural and functional genomics

Upload: others

Post on 23-Aug-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Jian PengDepartment of Computer Science

College of Medicine, Institute of Genomic BiologyUniversity of Illinois at Urbana-Champaign

Machine learning algorithms for structural and functional genomics

Page 2: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

The protein folding problem

Englander, Mayne, PNAS, 2014

Initial structure

Intermediate states

Native structure

Ener

gy

Page 3: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Structure provides insights on function

ATP

Substrate

Protein Kinase

Images from rcsb.org

Page 4: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Protein structure prediction

Amino acid sequence 3D structure

2018

Successful prediction algorithms Progress in recent CASPs

From CASP13 evaluation by Abriata and Dal Peraro

Page 5: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Structure prediction

Page 6: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Current status: Protein structure predictionsequence

Alignment of homologs

PDBCo-evolutionary analysis

Deep learning models

Residue-residue constraints

BackboneOptimization

Template-based modeling Contact-assisted modeling

Fragment assembly

Page 7: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Exploiting co-evolution for contact prediction

Multiple sequence alignment

Local residue preference

Residue contact

Evolutionary and structural constraints

Page 8: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Learning couplings from protein alignment

AQKLYLTHIDAEVDGDADTLYMTKIHHQFQGDADRLFITEVKQVFEGDADRLYMTKIHHTFEGDADKLYCTLIHNSFDGDADRLYMTKIHHEFEGDADTLYLTMIHQKFQADTDTLYITHIDETFQGDADTLYLTQIRNKFQGDTSRMYITKIGQEFEGDADRLYMTKIHHEFEGDADRLYITHIHHSFEGDADRLYMTKIHHEFEGD

Multiple sequence alignment

Amino acid i

Capture independent sites

Single potentials

Local preference

or equivalently

Page 9: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

AQKLYLTHIDAEVDGDADTLYMTKIHHQFQGDADRLFITEVKQVFEGDADRLYMTKIHHTFEGDADKLYCTLIHNSFDGDADRLYMTKIHHEFEGDADTLYLTMIHQKFQADTDTLYITHIDETFQGDADTLYLTQIRNKFQGDTSRMYITKIGQEFEGDADRLYMTKIHHEFEGDADRLYITHIHHSFEGDADRLYMTKIHHEFEGD

Multiple sequence alignment

Amino acid i Amino acid j

Co-evolution

Pairwise potentials

Co-evolution strength

Capture pairwise interactions

Single potentials

Local preference

Markov random fieldIsing (Potts) model

Undirected graphical model

Learning couplings from protein alignment

Page 10: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Partitionfunction

Singletonpotentials

Pairwisepotentials

Learning with Markov Random Fields

Local AA preference Pairwise AA couplings

Learning algorithms:• Mean fields approximation: EVFold, DirectInfo• Gaussian approximation: PSICOV• Pseudolikelihood: GREMLIN, CCMpred

Balakrishnan, et al. Proteins 2008; Marks, et al. Plos one 2011; Morcos, et al PNAS, 2011; Jones, et al. Bioinformatics, 2011. Seemayer, et al. Bioinformatics 2014

Page 11: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Deep convolutional NNs recognize image patterns

Lecun, Bengio, Hinton. Nature 2015

Page 12: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

sequence alignment

distance contact mapevolutionary couplings

Deep convolutional NNs recognize coevolutionary patterns

Liu, Palmedo, et al. Cell Systems 2018

Page 13: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

DeepContact for contact prediction

Liu, Palmedo, et al. Cell Systems 2018

Page 14: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Deep learning improves coevolution-based contact prediction

Ranked at the top in CASP12 in 2016 in Z-score rankingon par with two other deep-learning based methods (RaptorX-Contact and (Deep) MetaPSICOV) on other metrics

Page 15: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Why is deep learning effective?

Deep neural network learns contact patterns

Page 16: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Inter-residue distance and angles

Recent developments go beyond contact prediction

Xu. PNAS 2019; Senior, et al. Nature 2020; Yang et al. PNAS 2020

Page 17: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

CASP14: DeepMind’s AlphaFold 2

ORF8 ORF3a

Blue: PredictedGreen: Actual

Page 18: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Function Prediction

Page 19: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Antibody Fluorescent protein CRISPR/Cas9

Binding affinity Fluorescence Specificity

Optimization of protein function

Sequence Function

Page 20: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Sequence Machine learning model Function

…DNGVDGEWTYDDATKTFTVTE 1.0 …DNGCDGEWTYDDATKTFTVTE 0.2…DNGVWGEWTYDDATKTFTVTE 3.9…DNGVWGEWTYDDATKTFTFTE 5.4…DNGVMGEWTYDDATKTFTDTE 0.1

Sequence Fitness Low-order

High-order

Need to differentiate function levels of closely related sequences

Need to model non-additive effect (epistasis)

Need to generalize to unseen sequences/mutations

Image from Schmiedel et al., 2019

Sequence-to-function modeling

?

Page 21: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Successful sequence-to-function models

protein sequence protein sequence substrate representation

?

Page 22: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Idea: unsupervised representation learning using language models using unlabeled data

Too general, not specific/sensitive to a single or few changes in the sequence

G K L P V

V

Trained on Pfam / UniProt database with unlabeled data

Hid

den

i

Hid

den

i+1

Hid

den

L-1

Hid

den

L

Challenge: labeled data are expensive to get

Rives, Goyal, et al. bioRxiv 2019; Rao, Bhattacharya, et al. NeurIPS 2019; Yang et al. Nature Methods 2019; Elnaggar, et al. arxiv 2020

Page 23: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

homologous sequences

evolutionary selection +++++++

Survived?

Not enough for fitting language models but enough for getting good features

Another idea: Learning from natural evolution

Image adopted from Hopf, et al. Nature Biotech 2017

Page 24: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Evolutionary couplings

Epistasis

Co-evolution correlates with function

Hopf, et al. Nature Biotech 2017; Riesselman, et al. Nature Methods 2018; Luo, et al. RECOMB 2020

Page 25: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

ECNet: integrating evolutionary contexts for protein function prediction

Specific and sensitive to mutations

Capturing global semantics

Luo, et al. RECOMB 2020

Page 26: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Evaluation on single-mutation datasets

Page 27: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Generalization to high-order mutations

Test on high-order (4~11) GFP variants High-order (3~15) resistant TEM

Page 28: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Engineering of inhibitor-resistant TEM1 beta-lactamase

Experimental validation

Variants construction & fitness measurement (ampicillin resistance)

ECNet

Fraction of improved predictions

Page 29: Machine learning algorithms for structural and functional ......CASP14: DeepMind’s AlphaFold2 ORF8 ORF3a Blue: Predicted Green: Actual Function Prediction Antibody Fluorescent protein

Thank you & acknowledgements

Students: Sheng Wang, Yunan Luo, Hoon Cho, Nate Russell, Wesley Qian, Yang Liu, Yufeng Su, Palash Sashittal,

Collaborators: Bonnie Berger, Vik Khurana, Trey Ideker, Mikko Taipale, Jianzhu Ma, Jianyang Zeng, Qiang Liu, Mohammed El-Kebir, Huimin Zhao, Martin Burke, David Heckerman, ChengXiang Zhai, Jiawei Han, Aditya Parameswaran, Jadwiga Bienkowska, Lenore Cowen, Alex Schwing, Jonathan Carlson, Sumaiya Nazeen, Dina Zielinski