Computational Prediction of Protein-DNA Interactions
Xide Xia Advisor: Dr. Mohammed AlQuraishi
Position Weight Matrix (PWM)
PWMs are often represented graphically as sequence logos.
Phase 1. Select a Model
Protein and DNA
Atom pairs: {protein atom, DNA atom}, 27 * 37 = 999
3DNA: base mutation ➔ input feature vector
Feature vector length: 999 ➔ N * 999 ➔ 4 * N * 999
Output vector P: PWM
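The feature sizes above (999, then N * 999) can be illustrated with a minimal Python sketch. It assumes that N is the number of distance bins (the results slides report Nbins and a bin width of 1.3 Å) and that each feature counts atom pairs of one type within one distance bin. The function name `pair_features`, the input format, and the assignment of 27 to protein atom types and 37 to DNA atom types are hypothetical; the slides do not state them.

```python
# Assumption: 27 protein atom types and 37 DNA atom types (the slide only gives 27 * 37 = 999).
N_PROTEIN_TYPES = 27
N_DNA_TYPES = 37
N_PAIRS = N_PROTEIN_TYPES * N_DNA_TYPES  # 999


def pair_features(contacts, n_bins, bin_width=1.3):
    """Count atom pairs per (distance bin, pair type).

    contacts: iterable of (protein_type, dna_type, distance_in_angstrom),
    with integer type indices. Returns a list of length n_bins * 999.
    """
    features = [0] * (n_bins * N_PAIRS)
    for protein_type, dna_type, distance in contacts:
        if distance < 0:
            continue  # ignore invalid distances
        bin_index = int(distance // bin_width)
        if bin_index >= n_bins:
            continue  # pair is farther apart than the last bin
        pair_index = protein_type * N_DNA_TYPES + dna_type
        features[bin_index * N_PAIRS + pair_index] += 1
    return features


# One such vector per candidate base (A, C, G, T) gives 4 * N * 999 values per position.
example = pair_features([(0, 0, 0.5), (1, 2, 1.4), (0, 0, 10.0)], n_bins=3)
```

In this reading, mutating the base with 3DNA and recomputing the contacts would produce one vector per base, which matches the 4 * N * 999 size on the slide.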
Kullback–Leibler divergence (KLD)
Example 1: PWM (P) vs. prediction (Q), KLD = 1.3
Example 2: PWM (P) vs. prediction (Q), KLD = 4.4
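As a rough illustration of how a KLD between a true PWM P and a prediction Q could be computed, here is a minimal Python sketch. It assumes each PWM is a list of columns holding probabilities for A, C, G, T, that the divergence is summed over positions, and that the natural logarithm is used; the slides do not specify these details, and the `eps` floor is an added safeguard against zeros in Q.

```python
import math


def kl_divergence(p_pwm, q_pwm, eps=1e-9):
    """Sum over positions of KL(P_i || Q_i), with the convention 0 * log(0) = 0."""
    total = 0.0
    for p_col, q_col in zip(p_pwm, q_pwm):
        for p, q in zip(p_col, q_col):
            if p > 0:
                # eps keeps the ratio finite when the prediction assigns zero probability
                total += p * math.log(p / max(q, eps))
    return total


true_pwm = [[1.0, 0.0, 0.0, 0.0]]          # position always "A"
uniform_pwm = [[0.25, 0.25, 0.25, 0.25]]   # uninformative prediction
kld = kl_divergence(true_pwm, uniform_pwm)  # equals log(4) for this pair
```

A perfect prediction gives a KLD of 0, and the value grows as Q moves away from P, so lower is better.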
Define: X (length = 999 * N)
Features: 4 * (999 * N) ➔ custom model (similar to multinomial logistic regression) ➔ prediction Q
Goal: minimize KL_div(P, Q) + |X|
cvx_begin
    variable X
    Q = exp(X * phi)
    penalty = KL_div(P, Q)
    minimize (penalty + |X|)
cvx_end
CVX (a Matlab-based modeling system for convex optimization)
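The objective in the CVX pseudocode can be sketched in plain Python to show what is being minimized. This is only an illustration under assumptions: the prediction is normalized with a softmax over the four bases (the slide writes only exp(X * phi)), the penalty is an L1 norm weighted by a hypothetical `lam` (a Lambda value appears on the later results slide), and the names `predict_column` and `objective` are not from the slides. It evaluates the objective and does not solve the optimization.

```python
import math


def predict_column(phi, x):
    """phi: four feature vectors, one per base (A, C, G, T); x: weight vector.

    Returns Q for one PWM position as a softmax over the four scores.
    """
    scores = [sum(f * w for f, w in zip(row, x)) for row in phi]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def objective(p_col, phi, x, lam):
    """KL(P || Q) for one position plus an L1 penalty on the weights."""
    q_col = predict_column(phi, x)
    kld = sum(p * math.log(p / q) for p, q in zip(p_col, q_col) if p > 0)
    return kld + lam * sum(abs(w) for w in x)


phi = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]  # toy features, 2 per base
q_zero = predict_column(phi, [0.0, 0.0])                # uniform when all weights are 0
q_shift = predict_column(phi, [math.log(2), 0.0])       # first base gets double weight
```

With all weights at zero the prediction is uniform, which is why the penalty term pulls the model toward uninformative PWMs as `lam` grows.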
| Data | PWM | Protein Sequence | Protein Structure | DNA Structure |
|---|---|---|---|---|
| Phase 1 | ✓ | ✓ | ✓ | ✓ |

Data (size = 700)

Result (binwidth = 1.3 Å)

| | Nbins=2 | Nbins=3 | Nbins=4 | Nbins=5 |
|---|---|---|---|---|
| KLD | 2.5421 | 2.4352 | 2.5435 | 2.5641 |
Phase 2. Train Model on More Structure Data
3D-footprint
PDB name & Protein ID
PWM

| Data | PWM | Protein Sequence | Protein Structure | DNA Structure |
|---|---|---|---|---|
| Phase 1 | ✓ | ✓ | ✓ | ✓ |
| Phase 2 | ✓ | ✓ | ✓ | ✓ |

(size: 1200 * 10 ≈ 12,000)
Fit the PWM along the DNA strand
Best structural location of PWM
Scores shown: -12.876 and -0.8429
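One way to fit a PWM along a DNA strand is to slide it over the sequence and score every window. The Python sketch below assumes the score is a log-likelihood of the window under the PWM and that the best location is the highest-scoring offset; the slides do not say how their scores were computed, so `log_likelihood`, `best_offset`, and the `eps` floor are hypothetical. The reverse strand is not considered here.

```python
import math

BASES = "ACGT"


def log_likelihood(pwm, site, eps=1e-6):
    """Sum of log probabilities of each base in `site` under the PWM columns."""
    return sum(math.log(max(col[BASES.index(base)], eps))
               for col, base in zip(pwm, site))


def best_offset(pwm, dna):
    """Return (score, offset) of the highest-scoring window of the PWM on `dna`."""
    width = len(pwm)
    if len(dna) < width:
        raise ValueError("DNA sequence is shorter than the PWM")
    scored = [(log_likelihood(pwm, dna[i:i + width]), i)
              for i in range(len(dna) - width + 1)]
    return max(scored)


toy_pwm = [[0.7, 0.1, 0.1, 0.1],   # position 1 prefers A
           [0.1, 0.1, 0.1, 0.7]]   # position 2 prefers T
score, offset = best_offset(toy_pwm, "GGATCC")
```

For the toy input the window "AT" at offset 2 scores highest; windows that match the PWM poorly receive much more negative scores.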
New Result
Predicting Protein-DNA Interactions based on Structures (binwidth = 1.3 Å)

| Lambda | Old data | 0.1 | 0.05 | 0.01 | 0.005 | 0.0025 |
|---|---|---|---|---|---|---|
| Nbin=3 | 2.4352 | 1.9854 | 1.9145 | 1.5917 | 1.4569 | 1.3290 |
| Nbin=4 | 2.5435 | 1.9602 | 1.8704 | 1.4684 | 1.3089 | 1.1848 |
Phase 3. Train Model on Sequence

| Data | PWM | Protein Sequence | Protein Structure | DNA Structure |
|---|---|---|---|---|
| Phase 1 | ✓ | ✓ | ✓ | ✓ |
| Phase 2 | ✓ | ✓ | ✓ | ✓ |
| Phase 3 | ✓ | ✓ | ✗ | ✗ |

CIS-BP
PWM
Protein Sequence
Many such databases are available, including CIS-BP, UniPROBE, and JASPAR.
(size: 5000 * 10 ≈ 50,000)
HHalign
Modeller
Green: template. Red: new structure.
>1bdmA
structureX: 1bdm.pdb
MKAPVRVAVTGAAGQIGYSLLFRIAAGEMLGKDQPVILQLLEIPQAMKALEGVVMELEDCAFPLLAGLEATDDPDVAFKDADYALLVGAAPRLQVNGKIFTEQGRALAEVAKKDVKVLVVGNPANTNALIAYKNAPGLNPRNFTAMTRLDHNRAKAQLAKKTGTGVDRIRRMTVWGNHSSIMFPDLFHAEVDGRPALELVDMEWYEKVFIPTVAQRGAAIIQARGASSAASAANAAIEHIRDWALGTPEGDWVSMAVPSQGEYGIPEGIVYSFPVTAKDGAYRVVEGLEINEFARKRMEITELLDEMEQVKAL--GLI*
>TvLDH
sequence: TvLDH
MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKAAFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTNCEIAMLHAKNLKPENFSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEGKTQKVVDVLDHDYVFDTFFKKIGHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDKEGKIHVVEGFKVNDWLREKLDFAQGG*
Method 1 (Naive)
1. Among all the output structures of HHalign, select every template whose probability of being a true positive is higher than 0.9. Use them as inputs to Modeller.
2. Among all the output structures of Modeller, select the one with the highest model score as the "simulated" structure of the input protein sequence.

Method 2 (similar to the EM algorithm)
1. Among all the output structures of HHalign, select every template whose probability of being a true positive is higher than 0.9. Use them as inputs to Modeller.
2. Among all the output structures of Modeller, select the one whose prediction under the proposed model has the minimal KL divergence from the true PWM.
3. Run the model on the selected structures and adjust its parameters to the optimum.
4. Repeat steps 2-3 until the model always selects the same structures (convergence).

Flowchart: new constructed structures ➔ new model ➔ same structures selected? N: repeat. Y: converged.
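The two selection methods can be sketched in Python as follows. This is a simplified illustration, not the project's code: the dictionary keys `prob` and `score`, the function names, and the `fit` and `kld` callbacks are hypothetical stand-ins for HHalign output, Modeller output, model training, and the KL divergence between the model's prediction and the true PWM. The starting model is assumed to come from an earlier training phase.

```python
def filter_templates(hits, min_prob=0.9):
    """Step 1 of both methods: keep templates with probability above the cutoff."""
    return [hit for hit in hits if hit["prob"] > min_prob]


def select_naive(models):
    """Method 1, step 2: take the structure with the highest model score."""
    return max(models, key=lambda m: m["score"])


def select_em_like(candidates, pwms, model, fit, kld, max_iter=50):
    """Method 2: alternate between picking structures and refitting the model.

    candidates: {protein_id: [structure, ...]}, pwms: {protein_id: true PWM},
    fit(pairs) -> new model, kld(model, structure, pwm) -> float.
    """
    chosen = None
    for _ in range(max_iter):
        # Step 2: per protein, pick the structure with minimal KL divergence.
        new_chosen = {pid: min(structs, key=lambda s: kld(model, s, pwms[pid]))
                      for pid, structs in candidates.items()}
        if new_chosen == chosen:
            break  # Step 4: the same structures were selected again
        chosen = new_chosen
        # Step 3: refit the model on the currently selected structures.
        model = fit([(chosen[pid], pwms[pid]) for pid in chosen])
    return model, chosen


# Toy stand-ins: a "structure" and a "PWM" are numbers, the model is an offset.
toy_fit = lambda pairs: sum(pwm - s for s, pwm in pairs) / len(pairs)
toy_kld = lambda model, s, pwm: abs(s + model - pwm)
final_model, final_choice = select_em_like(
    {"a": [0.0, 1.0, 2.0], "b": [5.0, 6.0]}, {"a": 1.2, "b": 6.1},
    model=0.0, fit=toy_fit, kld=toy_kld)
```

The `max_iter` cap is an added safeguard, since nothing in the procedure guarantees that the selection stops changing.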
Application
• Predict how interactions change when disease-causing mutations occur.
• Mutations in proteins (trans)
• Mutations in DNA (cis)
Acknowledgements
• Dr. Mohammed AlQuraishi
• Prof. Peter Sorger
• Saroja Somasundaram
• Samuel Cho
• All LSP/HiTS Lab members
Thank you!