Post on 09-Jul-2018
Contact map guided ab initio structure prediction
S M Golam Mortuza
Postdoctoral Research Fellow
I-TASSER Workshop 2017
North Carolina A&T State University, Greensboro, NC
Outline
• Ab initio structure prediction: QUARK
• Contact map prediction: NeBcon
• Contact guided ab initio structure prediction: C-QUARK
• Ab initio GPCR structure prediction: GPCR-AIM
3/20/2017 2
Ab initio structure prediction
• In the absence of homologous templates, I-TASSER-based models are often less useful for biomedical studies because of their lower accuracy
• Ab initio protein folding methods assemble protein structures without using templates
• Ab initio structure modeling represents the most challenging problem in structure prediction
QUARK: Fragment generation and distance profile
Xu et al., Proteins-Structure Function and Bioinformatics, 81(2), pp. 229-239 (2012)
QUARK: Energy Function
$$E_{tot} = E_{prm} + w_1 E_{prs} + w_2 E_{ev} + w_3 E_{hb} + w_4 E_{sa} + w_5 E_{dh} + w_6 E_{dp} + w_7 E_{rg} + w_8 E_{dab} + w_9 E_{hp} + w_{10} E_{bp}$$
1. Backbone atomic pair-wise potential ($E_{prm}$)
2. Side-chain center pair-wise potential ($E_{prs}$)
3. Excluded volume ($E_{ev}$)
4. Hydrogen bonding ($E_{hb}$)
5. Solvent accessibility ($E_{sa}$)
6. Backbone torsion potential ($E_{dh}$)
7. Fragment-based distance profile ($E_{dp}$)
8. Radius of gyration ($E_{rg}$)
9. Strand-helix-strand packing ($E_{dab}$)
10. Helix packing ($E_{hp}$)
11. Strand packing ($E_{bp}$)
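The energy function above is a weighted sum of eleven terms, with the first term carrying an implicit weight of 1. A minimal Python sketch of that combination follows; the function name `total_energy` and the dictionary layout are assumptions made for illustration, and the actual QUARK weights and per-term calculations are not given on the slide.

```python
# Term suffixes follow the slide (E_prm, E_prs, ...). How each term is computed,
# and the real weight values, are not given here; callers supply both.
TERMS = ["prm", "prs", "ev", "hb", "sa", "dh", "dp", "rg", "dab", "hp", "bp"]


def total_energy(terms, weights):
    """Return E_tot = E_prm + sum_k w_k * E_k for the ten weighted terms.

    terms:   dict mapping each name in TERMS to its energy value
    weights: dict mapping every name except "prm" to its weight w_k
    """
    energy = terms["prm"]  # first term has an implicit weight of 1
    for name in TERMS[1:]:
        energy += weights[name] * terms[name]
    return energy
```

For example, with every term equal to 1.0 and every weight equal to 0.5 (made-up numbers), the total is 1 + 10 × 0.5 = 6.0.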
Problems with Metropolis Monte Carlo
1. The simulation can get trapped in a local energy basin
2. Increasing T helps the simulation cross local energy barriers, but at high T it can no longer resolve low-energy regions
[Figure: energy E versus conformation X; two panels: Low Temperature and High Temperature]
$$p_{accept} \sim e^{-dE/T}$$
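The acceptance rule above can be sketched in a few lines of Python. This is the standard Metropolis criterion (moves that lower the energy are always accepted, so the probability is capped at 1); the function names and the one-dimensional move are assumptions for illustration, not QUARK code.

```python
import math
import random


def acceptance_probability(d_e, temperature):
    """Metropolis acceptance probability min(1, exp(-dE / T))."""
    if d_e <= 0:
        return 1.0  # downhill moves are always accepted
    return math.exp(-d_e / temperature)


def metropolis_step(x, energy, temperature, rng, step=0.5):
    """Propose a random change to x and accept or reject it."""
    x_new = x + rng.uniform(-step, step)
    d_e = energy(x_new) - energy(x)
    if rng.random() < acceptance_probability(d_e, temperature):
        return x_new
    return x
```

At a fixed uphill dE, raising T raises the acceptance probability, which is why a hot simulation crosses barriers easily but does not settle into low-energy regions.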
Replica Exchange Monte Carlo
[Flowchart: three replicas run in parallel at temperatures T1, T2 and T3. Each replica k: Initial Random Configuration → Make Random Change → Calculate dE → accept with $p_{accept} = e^{-dE/T_k}$]
Swap between replicas i and j:
$$P_{swap}^{i,j} = e^{(E_i - E_j)\left(\frac{1}{T_i} - \frac{1}{T_j}\right)}$$
Temperature range (L is the protein length): $T_{max} = 2.4 + 0.016L$, $T_{min} = 0.6 + 0.00067L$
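The replica-exchange scheme above can be sketched on a toy problem. In this Python sketch the temperature limits and the swap rule come from the slide; the geometric spacing of intermediate temperatures, the one-dimensional `toy_energy` landscape, the move size, and all function names are assumptions for illustration and are not taken from QUARK.

```python
import math
import random


def temperature_range(length):
    """(Tmin, Tmax) from the slide, as functions of protein length L."""
    return 0.6 + 0.00067 * length, 2.4 + 0.016 * length


def temperature_ladder(length, n_replicas):
    """Temperatures from Tmin to Tmax; geometric spacing is an assumption."""
    t_min, t_max = temperature_range(length)
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


def swap_probability(e_i, e_j, t_i, t_j):
    """min(1, exp[(E_i - E_j)(1/T_i - 1/T_j)])."""
    exponent = (e_i - e_j) * (1.0 / t_i - 1.0 / t_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def toy_energy(x):
    """Hypothetical rugged 1D landscape standing in for a real force field."""
    return 0.1 * x * x + 2.0 * math.sin(3.0 * x)


def remc(length=100, n_replicas=4, n_cycles=200, steps_per_cycle=20, seed=0):
    """Run replica-exchange Monte Carlo; return the ladder and lowest energy seen."""
    rng = random.Random(seed)
    temps = temperature_ladder(length, n_replicas)
    xs = [rng.uniform(-10.0, 10.0) for _ in temps]  # initial random configurations
    es = [toy_energy(x) for x in xs]
    best = min(es)
    for _ in range(n_cycles):
        # Independent Metropolis sampling inside each replica.
        for k, t in enumerate(temps):
            for _ in range(steps_per_cycle):
                x_new = xs[k] + rng.gauss(0.0, 0.5)  # make random change
                e_new = toy_energy(x_new)
                d_e = e_new - es[k]                  # calculate dE
                if d_e <= 0 or rng.random() < math.exp(-d_e / t):
                    xs[k], es[k] = x_new, e_new
                    best = min(best, e_new)
        # Attempt swaps between neighbouring temperatures.
        for k in range(n_replicas - 1):
            p = swap_probability(es[k], es[k + 1], temps[k], temps[k + 1])
            if rng.random() < p:
                xs[k], xs[k + 1] = xs[k + 1], xs[k]
                es[k], es[k + 1] = es[k + 1], es[k]
    return temps, best
```

When the colder replica holds the higher energy, the exponent is positive and the swap is always accepted, so low-energy conformations found at high temperature migrate toward the cold end of the ladder.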
Benchmark Results: QUARK vs. Rosetta
Data set: 51 small proteins (70-100 AA) and 94 medium proteins (100-150 AA)
RMSD: QUARK models are better than Rosetta models for 96/145 targets (p-value: 1.51×10^-4)
TM-score: QUARK models are better than Rosetta models for 95/145 targets (p-value: 2.87×10^-7)
Xu et al., Proteins-Structure Function and Bioinformatics, 80(7), pp. 1715-1735 (2012)
Benchmark Results: QUARK vs. Rosetta
Values are for the first cluster center model; values in parentheses are for the best in the top five cluster center models.
Data set | Method | RMSD (Å) | TM-score
51 small proteins (70-100 residues) | Rosetta | 10.1 (8.5) | 0.350 (0.393)
51 small proteins (70-100 residues) | QUARK | 9.1 (7.7) | 0.404 (0.441)
94 medium proteins (100-150 residues) | Rosetta | 13.0 (11.5) | 0.317 (0.346)
94 medium proteins (100-150 residues) | QUARK | 12.5 (10.7) | 0.334 (0.374)
Xu et al., Proteins-Structure Function and Bioinformatics, 80(7), pp. 1715-1735 (2012)
Benchmark Results: QUARK vs. Rosetta
[Figure legend: Red: Native; Blue: Rosetta; Green: QUARK]
Xu et al., Proteins-Structure Function and Bioinformatics, 80(7), pp. 1715-1735(2012)
QUARK in CASP Experiments
CASP9 group | Z | CASP10 group | Z | CASP11 group | Z
QUARK | 31.6 | QUARK | 17.1 | QUARK | 33.5
Multicom-Refine | 22.4 | TASSER-VMT | 13.9 | RBO_Aleph | 29.6
Chunk-TASSER | 20.7 | Pcons-net | 13.7 | Multicom-con | 21.4
RaptorX | 19.8 | PMS | 11.7 | RaptorX-FM | 17.6
Baker-Rosetta | 19.0 | RaptorX-Roll | 11.3 | myprotein-me | 15.9
Jiang_Assembly | 14.7 | HHpred-thread | 10.9 | TASSER-VMT | 15.8
Gws | 13.9 | Multicom-clust | 10.6 | Baker-Rosetta | 15.7
BioSerf | 13.6 | RBO-MBS | 9.1 | Seok-server | 15.6
SAM-T08-server | 12.7 | MUFold_CRF | 8.8 | FUSION | 15.5
Seok-server | 12.6 | Baker-Rosetta | 8.1 | nns | 15.4
Here, the Z-score (Z) represents the significance of the structure predictions by each group compared to the average performance
QUARK modeling of T0837-D1 (128 AA) in CASP11
Assessor's comment: T0837-D1_499_1 represents the FM model with the biggest improvement over PDB templates in the CASP11 experiment
QUARK fragments RMSD ~ 0.1-2.6 Å
Why does Zhang-Server perform better than QUARK in CASP experiments?
• Models built by QUARK are compared with threading templates by LOMETS
• The templates are then re-ranked by their similarity to the QUARK models before they are subjected to the I-TASSER structure-assembly simulations.
Zhang et al., Proteins, 84, pp.76-86 (2015)
Limitations in current methods
• Can only fold small proteins (<150 residues)
• Can only fold beta-proteins with simple topology
R0014 CASP10
Contact maps in ab initio protein structure prediction
• Sequence-based contact map prediction can be useful for 3D structure folding of larger proteins that have complicated topologies
• Incorrectly predicted contacts can be harmful to 3D structure construction
• Contact prediction needs an accuracy of at least 22% to have a positive effect on ab initio structure prediction
Basic information on contact maps
• Residues are in contact if the distance between 𝐶𝛼 or 𝐶𝛽 atoms of the residues is < 8 Å
• Contact classification:
  • Short range: Sequence separation 6-11 residues
  • Medium range: Sequence separation 12-24 residues
  • Long range: Sequence separation >24 residues
[Figure: example sequence TTSQKHRDFVAEPGEKPVGSLAGIGEVLGKKLEERG with positions 1, 7, 13 and 26 marked, illustrating short-, medium- and long-range separations]
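The contact definition and range classes above translate directly into code. The following Python sketch assumes one representative atom per residue (Cα or Cβ) given as (x, y, z) coordinates in Å; the function names and the input layout are illustrative assumptions.

```python
import math

CONTACT_CUTOFF = 8.0  # Å, distance threshold from the slide


def contact_range(i, j):
    """Classify a residue pair by sequence separation |i - j|."""
    separation = abs(i - j)
    if separation < 6:
        return None      # closer than the short-range class; not counted
    if separation <= 11:
        return "short"
    if separation <= 24:
        return "medium"
    return "long"


def contact_map(coords, cutoff=CONTACT_CUTOFF):
    """Return (i, j, range_label) for every pair closer than the cutoff.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    """
    contacts = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            label = contact_range(i, j)
            if label is not None and math.dist(coords[i], coords[j]) < cutoff:
                contacts.append((i, j, label))
    return contacts
```

With the positions marked in the figure, pairs (1, 7), (1, 13) and (1, 26) have separations of 6, 12 and 25 and so fall in the short, medium and long classes respectively.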
Programs for predicting contact maps
• Machine Learning:
  o BETACON
  o SVMcon
  o SVMSEQ
• Coevolution:
  o PSICOV
  o CCMpred
  o mfDCA
  o Gremlin
• Meta:
  o STRUCTCH
  o MetaPSICOV
  o PconsC2, PconsC31
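Naïve Bayes Classifier (NBC)

$$X_{ij} = (X_{ij}^{1}, X_{ij}^{2}, \cdots, X_{ij}^{N})$$

$X_{ij}^{m}$ is the confidence score for the ith and jth residues to be in contact as predicted by the mth contact predictor, with N predictors in total.

Class labels: C = 0 (in contact), C = 1 (not in contact)

Bayes' theorem:

$$P(C \mid X_{ij}) = \frac{P(C)\,P(X_{ij} \mid C)}{P(X_{ij})}$$

Under the "naïve" assumption, the confidence scores from different contact predictors are independent of each other:

$$P(C \mid X_{ij}) = \frac{P(C)\prod_{m=1}^{N} P(X_{ij}^{m} \mid C)}{P(X_{ij})} = \frac{P(C)\prod_{m=1}^{N} P(X_{ij}^{m} \mid C)}{P(0)\prod_{m=1}^{N} P(X_{ij}^{m} \mid 0) + P(1)\prod_{m=1}^{N} P(X_{ij}^{m} \mid 1)}$$

For the "in contact" class:

$$P(0 \mid X_{ij}) = \frac{P(0)\prod_{m=1}^{N} P(X_{ij}^{m} \mid 0)}{P(0)\prod_{m=1}^{N} P(X_{ij}^{m} \mid 0) + P(1)\prod_{m=1}^{N} P(X_{ij}^{m} \mid 1)}$$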
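The posterior above can be computed once the priors P(C) and the per-predictor likelihoods P(X^m | C) are known. The Python sketch below estimates those likelihoods from binned confidence scores on labelled training pairs; the binning, the pseudocount, and the function names are assumptions made for illustration and may differ from how NeBcon estimates them. Class labels follow the slide: 0 = in contact, 1 = not in contact.

```python
def bin_index(score, n_bins=10):
    """Map a confidence score in [0, 1] to a histogram bin."""
    return min(int(score * n_bins), n_bins - 1)


def fit_likelihoods(scores, labels, n_predictors, n_bins=10, pseudo=1.0):
    """Estimate P(C) and P(bin of predictor m | C) from training pairs.

    scores: list of tuples, one confidence score per predictor
    labels: list of class labels (0 = in contact, 1 = not in contact)
    """
    counts = {c: [[pseudo] * n_bins for _ in range(n_predictors)] for c in (0, 1)}
    n_class = {0: 0, 1: 0}
    for x, c in zip(scores, labels):
        n_class[c] += 1
        for m, s in enumerate(x):
            counts[c][m][bin_index(s, n_bins)] += 1
    total = n_class[0] + n_class[1]
    prior = {c: n_class[c] / total for c in (0, 1)}
    like = {c: [[v / sum(row) for v in row] for row in counts[c]] for c in (0, 1)}
    return prior, like


def posterior_contact(x, prior, like, n_bins=10):
    """P(0 | X_ij): product of per-predictor likelihoods, then normalise."""
    joint = {}
    for c in (0, 1):
        p = prior[c]
        for m, s in enumerate(x):
            p *= like[c][m][bin_index(s, n_bins)]  # naive independence
        joint[c] = p
    return joint[0] / (joint[0] + joint[1])
```

The denominator is the sum over both classes, so the two posteriors P(0 | X) and P(1 | X) add up to 1 without ever computing P(X) directly.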
Contact prediction accuracy comparison
Accuracy of the prediction: Acc = Ncorr/NT
• Ncorr = number of correctly predicted contacts in the contact map
• NT = number of predicted contacts in the contact map
[Figure: bar chart of accuracy, 50 easy targets, top L/5 long range. Bar values: 0.406, 0.341, 0.288, 0.406, 0.432, 0.364, 0.459, 0.709, 0.798]
[Figure: bar chart of accuracy, 48 hard targets, top L/5 long range. Bar values: 0.198, 0.167, 0.181, 0.134, 0.119, 0.094, 0.242, 0.312, 0.451]
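The accuracy measure Acc = Ncorr/NT for the top L/5 long-range predictions can be sketched as follows. The input layout (predicted contacts as (i, j, confidence) triples, native contacts as a set of (i, j) pairs with i < j) and the function name are assumptions for illustration.

```python
def top_contact_accuracy(predicted, native, length, fraction=5, min_sep=25):
    """Acc = Ncorr / NT over the top L/fraction long-range predictions.

    predicted: list of (i, j, confidence) triples
    native:    set of (i, j) pairs with i < j that are true contacts
    length:    protein length L
    min_sep:   25 corresponds to the long-range class (separation > 24)
    """
    long_range = [p for p in predicted if abs(p[0] - p[1]) >= min_sep]
    long_range.sort(key=lambda p: p[2], reverse=True)  # most confident first
    top = long_range[: max(1, length // fraction)]
    if not top:
        return 0.0
    n_corr = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in native)
    return n_corr / len(top)
```

For a protein of length 10 the top L/5 list holds two predictions, so one correct contact among them gives Acc = 0.5.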
Contact prediction accuracy comparison (all ranges)
Method | Short (6-11) | Medium (12-24) | Long (>24)
BETACON | 0.540 (1×10^-9) | 0.430 (3×10^-10) | 0.310 (2×10^-12)
SVMSEQ | 0.475 (2×10^-12) | 0.393 (2×10^-12) | 0.236 (2×10^-12)
SVMcon | 0.564 (4×10^-9) | 0.455 (1×10^-8) | 0.255 (2×10^-12)
PSICOV | 0.204 (2×10^-12) | 0.246 (2×10^-12) | 0.262 (2×10^-12)
CCMpred | 0.206 (2×10^-12) | 0.238 (2×10^-12) | 0.227 (2×10^-12)
FreeContact | 0.234 (2×10^-12) | 0.278 (2×10^-12) | 0.278 (2×10^-12)
STRUCTCH | 0.605 (3×10^-4) | 0.487 (4×10^-5) | 0.353 (2×10^-12)
MetaPSICOV | 0.576 (5×10^-6) | 0.572 (5×10^-1) | 0.515 (2×10^-7)
NeBcon | 0.651 | 0.574 | 0.628
Contact prediction accuracy comparison (long range)
MetaPSICOV vs. NBC: average ACC 0.515 vs. 0.546 (P-value = 0.03)
NeBcon vs. NBC: average ACC 0.628 vs. 0.546 (P-value = 3.5×10^-8)
He et al., Bioinformatics (2017)
Diversity of contact maps
$$H = -\sum_{i=1}^{100} p_i \log_2 p_i$$
$p_i$ is the fraction of the top-L contacts in the ith cell, where L is the length of the protein
$H_{min} = 0$: all contacts are accumulated in one cell
$H_{max} = 6.64$ ($= \log_2 100$): all contacts are evenly distributed when L > 100
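The diversity measure above is a Shannon entropy over 100 cells of the contact map. The Python sketch below assumes the cells come from a 10×10 grid laid over the L×L map; that layout is a guess for illustration, since the slide only states that there are 100 cells. The function name and input format are likewise hypothetical.

```python
import math


def contact_map_entropy(contacts, length, grid=10):
    """H = -sum_i p_i log2 p_i over grid*grid cells of the contact map.

    contacts: list of (i, j) residue index pairs (0-based), e.g. the top-L
              predicted contacts
    length:   protein length L
    """
    counts = [0] * (grid * grid)
    for i, j in contacts:
        row = min(i * grid // length, grid - 1)
        col = min(j * grid // length, grid - 1)
        counts[row * grid + col] += 1
    total = sum(counts)
    if total == 0:
        return 0.0
    # Empty cells contribute nothing, since p log p -> 0 as p -> 0.
    return -sum((c / total) * math.log2(c / total) for c in counts if c)
```

If every contact lands in the same cell the entropy is 0, and with one contact in each of the 100 cells it reaches log2(100) ≈ 6.64, matching the two limits stated above.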
Diversity of contact maps
Method | Long | All
BETACON | 2.656 (8.4×10^-16) | 3.912 (6.9×10^-25)
SVMSEQ | 3.540 (4.9×10^-7) | 4.146 (5.6×10^-13)
SVMcon | 3.289 (1.5×10^-16) | 3.962 (1.2×10^-24)
PSICOV | 3.505 (6.2×10^-2) | 3.959 (1.23×10^-2)
CCMpred | 4.415 (6.9×10^-9) | 5.016 (1.1×10^-6)
FreeContact | 4.478 (4.5×10^-10) | 4.977 (5.0×10^-6)
STRUCTCH | 3.477 (2.6×10^-8) | 4.072 (7.7×10^-17)
MetaPSICOV | 3.552 (4.0×10^-5) | 4.217 (9.7×10^-6)
NeBcon | 3.665 (6.5×10^-5) | 4.273 (3.3×10^-9)
Native | 3.815 | 4.473
C-QUARK: Contact map guided ab initio structure prediction
[Figure labels: NeBcon; Knowledge-based potentials]
C-QUARK in CASP12
Group | Z
C-QUARK | 65.1
Baker-Rosetta | 60.3
GOAL | 49.9
RaptorX | 44.2
ToyPred_email | 40.4
Multicom-Novl | 19.4
Seok-server | 9.2
IntFOLD4 | 9.1
FFAS-3D | 8.4
FALCON_TOPO | 6.3
Here, the Z-score (Z) represents the significance of the structure predictions by each group compared to the average performance
References
• Xu, D., and Zhang, Y., "Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field," Proteins-Structure Function and Bioinformatics, 80(7), pp. 1715-1735 (2012)
• Xu, D., and Zhang, Y., "Toward optimal fragment generation for ab initio protein structure assembly," Proteins-Structure Function and Bioinformatics, 81(2), pp. 229-239 (2012)
• Zhang et al., "Integration of QUARK and I-TASSER for Ab Initio Protein Structure Prediction in CASP11," Proteins, 84, pp. 76-86 (2015)
• He, B., Mortuza, S.M., Shen, H., Wang, Y., Zhang, Y., "NeBcon: Protein contact map prediction using neural network training coupled with naïve Bayes classifiers," Bioinformatics (2017) (In press)
• He, B., Mortuza, S.M., Wang, Y., Zhang, Y., "NeBcon used to improve structure prediction," (2017) (In preparation)
• Wu, H., Zhang, C., Zhang, Y., "Assemble atomic structure of G protein-coupled receptors from primary sequences," (2017) (In preparation)
Thank You!!
http://zhanglab.ccmb.med.umich.edu/C-QUARK/
http://zhanglab.ccmb.med.umich.edu/QUARK/
http://zhanglab.ccmb.med.umich.edu/GPCR-AIM/
http://zhanglab.ccmb.med.umich.edu/NeBcon/