![Page 1: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/1.jpg)
Protein structure prediction using metagenome sequence data
Alex ChuCS 371 - Prof. Ron Dror
January 23, 2018
![Page 2: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/2.jpg)
4,752 contain at least one experimentally solved 3D structure
4,886 can be modeled to some extent using comparative homology modeling
5,211 with no known structure and no structurally characterized homologs...
14,849 Pfam protein families
![Page 3: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/3.jpg)
Ab initio/de novo structure prediction is not very good… so use evolutionary couplings
Balakrishnan et al, Proteins 2011
![Page 4: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/4.jpg)
ApproachUsed to calculate a simple covariance matrix, but too many false positives. (Positions that covary, but are not structurally linked.)
GREMLIN is one technique that mitigates this error (learns a probabilistic graphical model from a multiple sequence alignment)
![Page 5: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/5.jpg)
ApproachCombine GREMLIN with existing de novo prediction software from Rosetta
… but still limited by the amount of sequence data available.
![Page 6: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/6.jpg)
~2 billion partial and full-length proteins from ~5000 metagenomes from the Integrated Microbial Genomes database
Use metagenomics data!
Wooley et al, PLOS Comput Biol 2010
![Page 7: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/7.jpg)
Nf: a metric that describes how amenable a protein family is to this method, by relating the length of the protein, the number of sequences in the family, and the diversity of the sequences
How useful was this extra data?
![Page 8: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/8.jpg)
How do we assess the quality of a predicted structure?
It turns out structure prediction convergence is a good measure for the quality of the prediction
![Page 9: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/9.jpg)
FindingsMetagenomic data allowed prediction 33% of unmodeled Pfams (compared with 16% without metagenome data)
612 new Pfams predicted
137 of these are novel folds
![Page 10: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/10.jpg)
StrengthsLeveraged advances over the last ~10 years in high-throughput sequencing (especially in metagenomics), and in evolutionary coupling analysis.
This methods has generated one of the largest advances ever in structural genomics (quality predictions generated for >600 new families and >100 novel folds discovered), with promises of more as more sequence data becomes available.
![Page 11: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/11.jpg)
LimitationsThe algorithm doesn’t explicitly model the presence of membrane bilayers or endogenous ligands.
How generalizable is this method? Even for families with Nf > 64, prediction calculations converge just over half of the time.
Bacterial genomes and proteins are highly overrepresented in metagenomics studies.
![Page 12: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/12.jpg)
Limitations
We still can’t predict structure for over half of the families with unknown structure. (But is this even a fair criticism?)
![Page 13: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/13.jpg)
Potential Next StepsCan this be used at all to improve current homology modeling methods?
Can we improve the method to predict more structures with less sequences (i.e. at lower Nf values)?
Can we predict or incorporate functional sites into the predictions?
![Page 14: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/14.jpg)
Questions?
![Page 15: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/15.jpg)
Accurate De Novo Predic0on of Protein Contact Map by Ultra-
Deep Learning Model ShengWang,SiqiSun,ZhenLi,RenyuZhang,JinboXu
AnkitBaghel
CS37101/23/18
![Page 16: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/16.jpg)
Crea0ng Protein Contact Maps
![Page 17: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/17.jpg)
Protein Contact Maps
hBp://www.bioinformaJcs.org/cmview/screenshots.html
![Page 18: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/18.jpg)
Exis0ng Contact Predic0on Methods
• EvoluJonaryCouplingAnalysis(ECA)• PSICOV• plmDCA• Gremlin• CCMpred
• SupervisedMachineLearning• MetaPSICOV• SVMSEQ• CMAPpro• PconsC2• PhyCMAP• CoinDCA-NN• CMAPpro
**NotanexhausJvelist.
![Page 19: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/19.jpg)
ECA Driven Contact Predic0on Uses Correla0ons Between Residues
![Page 20: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/20.jpg)
Supervised Machine Learning Incorporates More Context
PconsC2ContactPredic5on
![Page 21: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/21.jpg)
Crea0ng Protein Contact Maps
Deep Residual Learning
![Page 22: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/22.jpg)
![Page 23: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/23.jpg)
Deep Residual Learning for RaptorX Contact Predic0on
![Page 24: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/24.jpg)
Protein Features for the First Residual Network
PosiJon-SpecificScoringMatrix(PSSM)
(Lx20Matrix)
PredictedSecondaryStructure(3x3Matrix)
PredictedSolventAccessibility(3x3Matrix)+ +
ConcatenatedtocreateaLx26matrixforuseinthefirstresidualnetwork.
1DResidualNetwork
MatrixConcatenaJon
LearnedSequenJalFeatures(LxnMatrix)
![Page 25: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/25.jpg)
Protein Features for Second Residual Network
LearnedSequenJalFeatures(LxnMatrix)
Conversionto2D
PairwiseFeatures(LxLx3n) MutualinformaJonEvoluJonaryCouplinginformaJonprovided
byCCMpred
Pair-wisecontactpotenJal
+ ++
Merge
2DResidualNetwork
LogisJcRegression
PredictedContactMap(LxLMatrix)
![Page 26: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/26.jpg)
Deep Residual Learning
Accuracy of Predicted Contact Maps
![Page 27: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/27.jpg)
Training Set
• SubsetofPDB25• Allproteinshavelessthan25%sequenceidenJtywithanyotherprotein• 6767proteins• Containsonly~100membraneproteins
![Page 28: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/28.jpg)
Test Set
• 150Pfamfamilies• 105CASP11testproteins• 76hardCAMEOtestproteinsfrom2015• 398membraneproteins
• 400residuesatmost• Atmost40%sequenceidenJty
![Page 29: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/29.jpg)
Contact Predic0on via Deep Residual Learning Has High Accuracy
Shortsequencedistancebetweentworesiduesisintherange[6,11]Mediumsequencedistancebetweentworesiduesisintherange[12,23]Longsequencedistancebetweentworesiduesisintherange≥24
AccuracyisdefinedasthepercentofthetopL/kpredictedcontactsthatcorrespondtonaJvecontactswhereListhelengthoftheprotein.RaptorXContactPredic5on(bo:omrow)hashigheraccuracythanthecomparedmethodsforallsequencedistanceranges.
![Page 30: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/30.jpg)
Contact Predic0on via Deep Residual Learning Has High Accuracy
Shortsequencedistancebetweentworesiduesisintherange[6,11]Mediumsequencedistancebetweentworesiduesisintherange[12,23]Longsequencedistancebetweentworesiduesisintherange≥24
AccuracyisdefinedasthepercentofthetopL/kpredictedcontactsthatcorrespondtonaJvecontactswhereListhelengthoftheprotein.RaptorXContactPredic5on(bo:omrow)hashigheraccuracythanthecomparedmethodsforallsequencedistanceranges.
![Page 31: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/31.jpg)
Contact Predic0on via Deep Residual Learning Has High Accuracy
Shortsequencedistancebetweentworesiduesisintherange[6,11]Mediumsequencedistancebetweentworesiduesisintherange[12,23]Longsequencedistancebetweentworesiduesisintherange≥24
AccuracyisdefinedasthepercentofthetopL/kpredictedcontactsthatcorrespondtonaJvecontactswhereListhelengthoftheprotein.RaptorXContactPredic5on(bo:omrow)hashigheraccuracythanthecomparedmethodsforallsequencedistanceranges.
![Page 32: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/32.jpg)
Contact Predic0on via Deep Residual Learning Has High Accuracy
AccuracyisdefinedasthepercentofthetopL/kpredictedcontactsthatcorrespondtonaJvecontactswhereListhelengthoftheprotein.RaptorXContactPredic5on(bo:omrow)hashigheraccuracythanthecomparedmethodsforallsequencedistanceranges.
Shortsequencedistancebetweentworesiduesisintherange[6,11]Mediumsequencedistancebetweentworesiduesisintherange[12,23]Longsequencedistancebetweentworesiduesisintherange≥24
![Page 33: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/33.jpg)
Increased Performance Due to Deep Residual Learning is Independent of Available Homologous Informa0on
![Page 34: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/34.jpg)
Accuracy of Predicted Contact Maps
Contact-Assisted Protein Folding Results
![Page 35: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/35.jpg)
Contact-Assisted Protein Folding Benefits from Deep Residual Learning
![Page 36: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/36.jpg)
Learned Features Are Not Template-Based
![Page 37: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/37.jpg)
Contact-Assisted Protein Folding Results
CAMEO Blind Tests
![Page 38: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/38.jpg)
CAMEO Blind Tests of Contact Predic0on
![Page 39: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/39.jpg)
CAMEO Blind Tests of Contact Predic0on
![Page 40: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/40.jpg)
CAMEO Blind Tests
Strengths/Weaknesses
![Page 41: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/41.jpg)
Strengths • VerythoroughincomparingitspredicJonsagainstdifferenttypesofproteinsandpredicJonapproaches.
• Usesanon-redundanttrainingset.
• Considersallresiduepairsforcontactsimultaneously.
• BlindtesJngthroughCAMEO.
• Performssurprisinglywellonmembraneproteins.
![Page 42: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/42.jpg)
Weaknesses • Useofextensivehiddenlayersmakeslearnedfeaturesdifficulttodescribe.
• DoesnotquanJfyitsfalse-posiJverate.
• Isnotasuniqueanapproachasimplied(seePConsC2).
• DoesnotcompareitsmethodofcontactmappredicJontothatofPConsC2.
• Testedmembraneproteinswereconstrained
![Page 43: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/43.jpg)
Supervised Machine Learning Incorporates More Context
PconsC2ContactPredic5on
![Page 44: Protein structure prediction using metagenome sequence dataProtein structure prediction using metagenome sequence data Alex Chu CS 371 - Prof. Ron Dror January 23, 2018. 4,752 contain](https://reader035.vdocuments.us/reader035/viewer/2022063023/5ff085e33f2b7c51f85110bf/html5/thumbnails/44.jpg)
Test Set
• 150Pfamfamilies• 105CASP11testproteins• 76hardCAMEOtestproteinsfrom2015• 398membraneproteins
• 400residuesatmost• Atmost40%sequenceidenJty