Probabilistic Ensembles for Improved Inference in Protein-Structure Determination
DESCRIPTION
Probabilistic Ensembles for Improved Inference in Protein-Structure Determination. Ameet Soni* and Jude Shavlik, Dept. of Computer Sciences, Dept. of Biostatistics and Medical Informatics. Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011.
TRANSCRIPT
![Page 1: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/1.jpg)
Probabilistic Ensembles for Improved Inference in Protein-Structure Determination
Ameet Soni* and Jude Shavlik
Dept. of Computer Sciences
Dept. of Biostatistics and Medical Informatics
Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011
![Page 2: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/2.jpg)
Protein Structure Determination
Proteins are essential to most cellular functions: structural support, catalysis/enzymatic activity, and cell signaling
Protein structures determine function
X-ray crystallography is the main technique for determining structures
![Page 3: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/3.jpg)
Task Overview
Given: a protein sequence and an electron-density map (EDM) of the protein
Do: automatically produce a protein structure that contains all atoms and is physically feasible
SAVRVGLAIM...
![Page 4: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/4.jpg)
Challenges & Related Work
[Figure: resolution axis from 1 Å to 4 Å, marking the ranges covered by ARP/wARP, TEXTAL & RESOLVE, and our method, ACMI]
Resolution is a property of the protein
Higher resolution: better quality
![Page 5: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/5.jpg)
Outline
Protein Structures
Prior Work on ACMI
Probabilistic Ensembles in ACMI (PEA)
Experiments and Results
![Page 6: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/6.jpg)
![Page 7: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/7.jpg)
Our Technique: ACMI
Phase 1: Perform Local Match. Output: prior probability of each amino acid’s (AA’s) location
Phase 2: Apply Global Constraints. Output: posterior probability of each AA’s location
Phase 3: Sample Structure. Output: all-atom protein structures
[Figure labels: $b_{k-1}$, $b_k$, $b_{k+1}$; 1…M]
![Page 8: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/8.jpg)
Results [DiMaio, Kondrashov, Bitto, Soni, Bingman, Phillips, and Shavlik, Bioinformatics 2007]
![Page 9: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/9.jpg)
ACMI Outline
Phase 1: Perform Local Match (prior probability of each AA’s location)
Phase 2: Apply Global Constraints (posterior probability of each AA’s location)
Phase 3: Sample Structure (all-atom protein structures)
![Page 10: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/10.jpg)
Phase 2 – Probabilistic Model
ACMI models the probability of all possible traces using a pairwise Markov Random Field (MRF)
[Figure: MRF with one node per amino acid: ALA1, GLY2, LYS3, LEU4, SER5]
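For reference, the general form of a pairwise MRF can be written as follows. This is only the textbook factorization, not ACMI's specific potentials: the symbols $\psi_i$, $\psi_{ij}$, $E$, and $Z$ are notation assumed here, with $b_i$ standing for the location of amino acid $i$.

$$P(b_1, \dots, b_N) = \frac{1}{Z} \prod_{i=1}^{N} \psi_i(b_i) \prod_{(i,j) \in E} \psi_{ij}(b_i, b_j)$$

Here $\psi_i$ scores a single amino acid's location, $\psi_{ij}$ scores a pair of locations joined by an edge in $E$, and $Z$ normalizes the product into a probability.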
![Page 11: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/11.jpg)
Probabilistic Model
Number of nodes: ~1,000
Number of edges: ~1,000,000
![Page 12: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/12.jpg)
Approximate Inference
The best structure is intractable to calculate, i.e., we cannot infer the underlying structure analytically
Phase 2 uses Loopy Belief Propagation (BP) to approximate the solution: a local, message-passing scheme that distributes evidence between nodes
![Page 13: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/13.jpg)
Loopy Belief Propagation
[Figure: node LYS31 sends message $m_{LYS31 \to LEU32}$ to node LEU32; node distributions $p_{LYS31}$ and $p_{LEU32}$]
![Page 14: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/14.jpg)
Loopy Belief Propagation
[Figure: node LEU32 sends message $m_{LEU32 \to LYS31}$ back to node LYS31; node distributions $p_{LYS31}$ and $p_{LEU32}$]
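The message-passing idea can be illustrated with a minimal sketch of loopy BP on a toy discrete pairwise MRF. This is not ACMI's implementation: the function name `loopy_bp`, the two-state nodes, and every potential value are assumptions made up for illustration (the slides cite ~1,000,000 possible outputs per amino acid).

```python
import math

def loopy_bp(unary, pairwise, iters=50):
    """Synchronous loopy BP on a discrete pairwise MRF (illustrative sketch).

    unary:    {node: [phi(x) for each state x]}
    pairwise: {(i, j): table}, table[xi][xj] = psi(xi, xj), one entry per edge
    Returns {node: normalized belief over that node's states}.
    """
    # Neighbor lists for the undirected graph.
    nbrs = {n: [] for n in unary}
    for (i, j) in pairwise:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, j, xi, xj):
        # Look up the edge potential regardless of storage direction.
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    # Every directed message m[i -> j] starts uniform over j's states.
    msgs = {(i, j): [1.0] * len(unary[j]) for i in unary for j in nbrs[i]}

    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in range(len(unary[j])):
                total = 0.0
                for xi in range(len(unary[i])):
                    # Product of messages into i from all neighbors except j.
                    incoming = math.prod(
                        msgs[(k, i)][xi] for k in nbrs[i] if k != j
                    )
                    total += unary[i][xi] * psi(i, j, xi, xj) * incoming
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        msgs = new

    # Belief at each node: local evidence times all incoming messages.
    beliefs = {}
    for i in unary:
        b = [
            unary[i][x] * math.prod(msgs[(k, i)][x] for k in nbrs[i])
            for x in range(len(unary[i]))
        ]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs

# Toy example: two neighboring residues, two hypothetical locations each.
unary = {"LYS31": [0.9, 0.1], "LEU32": [0.5, 0.5]}
pairwise = {("LYS31", "LEU32"): [[0.8, 0.2], [0.2, 0.8]]}
beliefs = loopy_bp(unary, pairwise)
print(beliefs["LEU32"])
```

On this two-node chain the graph has no loops, so the beliefs match the exact marginals; on a graph with cycles the same updates give only an approximation.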
![Page 15: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/15.jpg)
Shortcomings of Phase 2
Inference is very difficult: ~1,000,000 possible outputs for one amino acid; ~250-1250 amino acids in one protein; evidence is noisy; $O(N^2)$ constraints
Approximate solutions leave room for improvement
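A rough count shows the scale of the problem. Assuming, purely for illustration, that each of $N$ amino acids independently takes one of $S \approx 10^6$ outputs, the number of joint configurations is $S^N$:

$$S^N \approx (10^6)^{250} = 10^{1500} \quad \text{up to} \quad (10^6)^{1250} = 10^{7500}$$

Enumerating a space of this size is out of reach, which is consistent with the reliance on approximate inference.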
![Page 16: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/16.jpg)
![Page 17: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/17.jpg)
Ensemble Methods
Ensembles: the use of multiple models to improve predictive performance
Tend to outperform the best single model [Dietterich ’00], e.g., the Netflix Prize
![Page 18: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/18.jpg)
Phase 2: Standard ACMI
[Figure: a single protocol runs inference on the MRF and outputs $P(b_k)$]
![Page 19: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/19.jpg)
Phase 2: Ensemble ACMI
[Figure: Protocol 1, Protocol 2, …, Protocol C each run inference on the MRF and output $P_1(b_k)$, $P_2(b_k)$, …, $P_C(b_k)$]
![Page 20: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/20.jpg)
Probabilistic Ensembles in ACMI (PEA)
New ensemble framework (PEA): run inference multiple times, under different conditions
Output: multiple, diverse estimates of each amino acid’s location
Phase 2 now has several probability distributions for each amino acid. So what?
![Page 21: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/21.jpg)
ACMI Outline
Phase 1: Perform Local Match (prior probability of each AA’s location)
Phase 2: Apply Global Constraints (posterior probability of each AA’s location)
Phase 3: Sample Structure (all-atom protein structures)
![Page 22: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/22.jpg)
Backbone Step (Prior work)
Place next backbone atom
(1) Sample $b_k$ from the empirical Cα-Cα-Cα pseudoangle distribution
[Figure: candidate positions $b'_k$ (marked ?) following $b_{k-2}$ and $b_{k-1}$]
![Page 23: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/23.jpg)
Backbone Step (Prior work)
Place next backbone atom
(2) Weight each sample by its Phase 2 computed marginal
[Figure: candidates $b'_k$ following $b_{k-2}$ and $b_{k-1}$, with weights 0.25, 0.20, 0.15, …]
![Page 24: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/24.jpg)
Backbone Step (Prior work)
Place next backbone atom
(3) Select $b_k$ with probability proportional to sample weight
[Figure: candidates $b'_k$ following $b_{k-2}$ and $b_{k-1}$, with weights 0.25, 0.20, 0.15, …]
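Steps (2) and (3) of the backbone step can be sketched as below. This is a hedged illustration, not the authors' code: the function names, the 3-D candidate coordinates, and the use of only three candidates are assumptions, and step (1), sampling from the pseudoangle distribution, is taken as already done.

```python
import random

def candidate_weights(candidates, marginal):
    """Step 2: weight each candidate b'_k by its Phase 2 marginal, then normalize."""
    raw = [marginal(c) for c in candidates]
    total = sum(raw)
    return [w / total for w in raw]

def select_position(candidates, marginal, rng=random):
    """Step 3: pick b_k with probability proportional to its weight."""
    weights = candidate_weights(candidates, marginal)
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical candidates (3-D coordinates), assumed to come from step 1,
# paired with the example weights 0.25, 0.20, 0.15 shown on the slides.
marginals = {
    (1.0, 0.0, 0.0): 0.25,
    (0.0, 1.0, 0.0): 0.20,
    (0.0, 0.0, 1.0): 0.15,
}
candidates = list(marginals)
weights = candidate_weights(candidates, marginals.get)
chosen = select_position(candidates, marginals.get, rng=random.Random(0))
print(weights, chosen)
```

Passing a seeded `random.Random` instance keeps the selection reproducible; with the default module-level generator each run may choose a different candidate.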
![Page 25: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/25.jpg)
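Backbone Step for PEA
[Figure: for a candidate $b'_k$ following $b_{k-2}$ and $b_{k-1}$, the component marginals $P_1(b'_k)$, $P_2(b'_k)$, …, $P_C(b'_k)$ (here 0.23, 0.15, 0.04) feed an aggregator that outputs the weight $w(b'_k)$]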
![Page 26: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/26.jpg)
Backbone Step for PEA: Average
[Figure: the AVG aggregator averages the component marginals 0.23, 0.15, 0.04 of $b'_k$, giving $w(b'_k) = 0.14$]
![Page 27: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/27.jpg)
Backbone Step for PEA: Maximum
[Figure: the MAX aggregator takes the largest of the component marginals 0.23, 0.15, 0.04 of $b'_k$, giving $w(b'_k) = 0.23$]
![Page 28: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/28.jpg)
Backbone Step for PEA: Sample
[Figure: the SAMP aggregator outputs $w(b'_k) = 0.15$, one of the component marginals 0.23, 0.15, 0.04 of $b'_k$]
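The three aggregators can be sketched with the slide values 0.23, 0.15, and 0.04. The function names are hypothetical, and the SAMP rule used here (pick one component uniformly at random and use its value) is an assumption: the slides show only that SAMP returns one of the component values.

```python
import random

def agg_avg(probs):
    """AVG: mean of the component marginals."""
    return sum(probs) / len(probs)

def agg_max(probs):
    """MAX: largest component marginal."""
    return max(probs)

def agg_samp(probs, rng=random):
    """SAMP: value of one component chosen uniformly at random (assumed rule)."""
    return rng.choice(probs)

# Component marginals for one candidate b'_k, taken from the slides.
component_marginals = [0.23, 0.15, 0.04]

w_avg = agg_avg(component_marginals)
w_max = agg_max(component_marginals)
w_samp = agg_samp(component_marginals, rng=random.Random(0))
print(round(w_avg, 2), w_max, w_samp)
```

Whichever aggregator is used, the resulting $w(b'_k)$ takes the place of the single Phase 2 marginal as the candidate's weight in the backbone step.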
![Page 29: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/29.jpg)
Review: Previous work on ACMI
[Figure: Phase 2 runs one protocol to compute $P(b_k)$; Phase 3 weights candidate positions following $b_{k-2}$ and $b_{k-1}$ (0.25, 0.20, 0.15, …)]
![Page 30: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/30.jpg)
Review: PEA
[Figure: Phase 2 runs several protocols; an aggregator (AGG) combines their outputs; Phase 3 weights candidate positions following $b_{k-2}$ and $b_{k-1}$ (0.14, 0.26, 0.05, …)]
![Page 31: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/31.jpg)
![Page 32: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/32.jpg)
Experimental Methodology
PEA (Probabilistic Ensembles in ACMI): 4 ensemble components; aggregators: AVG, MAX, SAMP
ACMI:
ORIG – standard ACMI (prior work)
EXT – run inference 4 times as long
BEST – test best of 4 PEA components
![Page 33: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/33.jpg)
Phase 2 Results
*p-value < 0.01
![Page 34: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/34.jpg)
Protein Structure Results
[Figure panels: Correctness, Completeness]
*p-value < 0.05
![Page 35: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/35.jpg)
Protein Structure Results
![Page 36: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/36.jpg)
Impact of Ensemble Size
![Page 37: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/37.jpg)
Conclusions
ACMI is the state-of-the-art method for determining protein structures in poor-resolution images
Probabilistic Ensembles in ACMI (PEA) improves approximate inference and produces better protein structures
Future work: a general solution for inference; larger ensemble sizes
![Page 38: Probabilistic Ensembles for Improved Inference in Protein -Structure Determination](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816361550346895dd43126/html5/thumbnails/38.jpg)
Acknowledgements
Phillips Laboratory at UW-Madison
UW Center for Eukaryotic Structural Genomics (CESG)
NLM R01-LM008796
NLM Training Grant T15-LM007359
NIH Protein Structure Initiative Grant GM074901
Thank you!