dat dc 11 hiv machine learning
![Page 1: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/1.jpg)
Genetic predictors of antibody-specific neutralization of HIV
Machine learning for HIV vaccine development
![Page 2: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/2.jpg)
The Problem
Identify the binding site (epitope) of an antibody that is capable of neutralizing HIV
![Page 3: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/3.jpg)
Image: Wikipedia.com
![Page 4: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/4.jpg)
Image: Burton et al., 2012
![Page 5: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/5.jpg)
The Observation
• Virus strains with variable genetic sequences are neutralized/not neutralized by specific antibodies to varying degrees
The Assumption
• Genetic variation is causative of this observed variation in neutralization
![Page 6: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/6.jpg)
HIV Genetic & Functional Variation
Image: bnaber.org HIV Strain
![Page 7: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/7.jpg)
The Approach
1) Model neutralization/non-neutralization as a function of genetic features (classification)
2) Perform feature selection to identify the most predictive genetic features
3) Plug selected features into secondary predictive model to validate selection
4) Test hypothesis against a) existing literature b) laboratory test methods
![Page 8: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/8.jpg)
Feature Vectorization
• Position/residue pairs
  – Ex: 789=K
• Potential N-Linked Glycosylation Sites
  – Regex (N[^P][ST])
  – Ex: 197=PNGS
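The two feature types above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `vectorize` and the `start` parameter are hypothetical, and the sketch assumes an ungapped sequence whose first residue sits at reference position `start`. The regex is the one from the slide, wrapped in a lookahead so that overlapping sequons are all reported.

```python
import re

# Slide regex N[^P][ST], wrapped in a lookahead so overlapping matches are found.
PNGS_PATTERN = re.compile(r"(?=N[^P][ST])")


def vectorize(sequence, start=1):
    """Return the set of binary feature names present in one sequence.

    Assumes `sequence` is ungapped and its first residue is at position `start`
    (both are assumptions made for this sketch).
    """
    features = set()
    # Position/residue pairs, e.g. "789=K".
    for offset, residue in enumerate(sequence):
        features.add(f"{start + offset}={residue}")
    # Potential N-linked glycosylation sites, e.g. "197=PNGS".
    for match in PNGS_PATTERN.finditer(sequence):
        features.add(f"{start + match.start()}=PNGS")
    return features


example = vectorize("NKTNPS", start=197)
print(sorted(example))
```

In the example string, `NKT` at position 197 matches the pattern, while `NPS` at position 200 does not because proline is excluded in the middle position.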
![Page 9: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/9.jpg)
Naïve Bayes (... first swing & a miss ...)
ROC AUC: 0.887
Log Loss: 3.77
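A reasonable ROC AUC alongside a poor log loss is a common pattern when a classifier ranks samples well but outputs overconfident probabilities. The slides do not say whether that is what happened here, so the sketch below is only one possible illustration. It uses made-up toy numbers and plain-Python versions of both metrics to show that pushing probabilities toward 0 and 1 leaves the AUC unchanged while the log loss grows.

```python
import math


def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels (natural log)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p) if y == 1 else math.log(1 - p)
    return total / len(y_true)


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
calibrated = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
# Same ranking as above, but pushed toward the extremes.
overconfident = [0.999999, 0.99999, 0.00001, 0.9999, 0.000001, 0.0000001]

for name, probs in [("calibrated", calibrated), ("overconfident", overconfident)]:
    print(name, round(roc_auc(y_true, probs), 3), round(log_loss(y_true, probs), 3))
```

Because AUC depends only on the ordering of the scores, both probability lists score the same on it; log loss penalizes the two confident mistakes heavily.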
![Page 10: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/10.jpg)
Expected predictive features
![Page 11: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/11.jpg)
Feature Selection
• Trimming data set
• Decision tree
• Random forest
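As a rough illustration of how a tree-based method scores features, the sketch below ranks binary features by the Gini impurity decrease of a single split on each one. This is a simplified stand-in (a one-level "stump" per feature), not the decision tree or random forest used in the talk, and the toy samples and labels are invented for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def rank_features(samples, labels):
    """Rank features by impurity decrease when splitting on presence/absence.

    samples: list of sets of feature names (e.g. {"789=K", "197=PNGS"})
    labels:  list of 0/1 outcomes (1 = neutralized)
    """
    parent = gini(labels)
    n = len(labels)
    scores = {}
    for feature in set().union(*samples):
        with_f = [y for s, y in zip(samples, labels) if feature in s]
        without_f = [y for s, y in zip(samples, labels) if feature not in s]
        # Size-weighted impurity of the two child nodes.
        child = (len(with_f) * gini(with_f) + len(without_f) * gini(without_f)) / n
        scores[feature] = parent - child
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: residue 789 separates the classes, residue 100 does not.
samples = [{"789=K", "100=A"}, {"789=K", "100=G"}, {"789=E", "100=A"}, {"789=E", "100=G"}]
labels = [1, 1, 0, 0]
for feature, score in rank_features(samples, labels):
    print(feature, score)
```

A full decision tree repeats this split search recursively, and a random forest averages the resulting importances over many trees built on resampled data.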
![Page 12: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/12.jpg)
Feature Selection
![Page 13: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/13.jpg)
Feature Selection
Decision tree w/ ROC AUC
![Page 14: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/14.jpg)
Validation with Logistic Regression
![Page 15: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/15.jpg)
Analysis Across 4 Antibodies

| Antibody | Most predictive features supported by literature | Model MCC | Literature MCC (Gnanakaran et al.) |
|---|---|---|---|
| 2F5 | 1) 789=K 2) 791=A | 0.83 | 0.81 |
| PG9 | 1) 197=PNGS 198=V vs. 198=1 | 0.53 | 0.43 |
| VRC01 | 1) 561=R 2) 564=G 3) 587=E 4) 359=N | 0.51 | N/A |
| 2G12 | 1) 363=PNGS 2) 408=N, 411=S * 3) 479=PNGS | 0.66 | N/A |
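Assuming MCC here stands for the Matthews correlation coefficient, it can be computed from the confusion matrix as follows. The function below is a plain-Python sketch of the standard formula; the slides do not show how the reported values were calculated.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1).

    Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

Unlike accuracy, MCC stays informative when the classes are imbalanced, which makes it a convenient single number for comparing results across antibodies with different proportions of neutralized strains.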
![Page 16: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/16.jpg)
Future Directions
• More sophisticated feature vectorization
  – Chemically similar amino acids
  – Pairwise features
  – Small chunks of sequence (n-grams)
  – Structural modeling
• Better feature selection
  – Minimum Redundancy Max Relevance (mRMR)
  – Correct for cross-clade correlations
• Regression model
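The n-gram idea in the list above could look something like the following. This is a hypothetical sketch: the function name, the `position=chunk` naming scheme, and the default chunk length of 3 are all assumptions, chosen to mirror the single-residue feature names used earlier in the deck.

```python
def ngram_features(sequence, n=3, start=1):
    """Return position-anchored n-gram features such as "197=NKT".

    Assumes an ungapped sequence whose first residue is at position `start`.
    """
    return {
        f"{start + i}={sequence[i:i + n]}"
        for i in range(len(sequence) - n + 1)
    }


print(sorted(ngram_features("NKTS", n=3, start=197)))
```

Longer chunks capture local context that single residues miss, at the cost of a much larger and sparser feature space.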
![Page 17: Dat DC 11 HIV Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a64d1e1a28ab6e368b5f97/html5/thumbnails/17.jpg)
Sources
• Bnaber database: http://www.bnaber.org/
• Burton, Dennis R., et al. "Broadly neutralizing antibodies present new prospects to counter highly antigenically diverse viruses." Science 337.6091 (2012): 183-186.
• Chuang, Gwo-Yu, et al. "Residue-level prediction of HIV-1 antibody epitopes based on neutralization of diverse viral strains." Journal of virology 87.18 (2013): 10047-10058.
• Gnanakaran, S., et al. "Genetic signatures in the envelope glycoproteins of HIV-1 that associate with broadly neutralizing antibodies." PLoS Comput Biol 6.10 (2010): e1000955.
• Hepler, N. Lance, et al. "IDEPI: Rapid Prediction of HIV-1 Antibody Epitopes and Other Phenotypic Features from Sequence Data Using a Flexible Machine Learning Platform." PLoS Comput Biol 10.9 (2014): e1003842.
• LANL HIV database CATNAP tool: http://www.hiv.lanl.gov/components/sequence/HIV/neutralization/user.comp
• Libbrecht, Maxwell W., and William Stafford Noble. "Machine learning applications in genetics and genomics." Nature Reviews Genetics 16.6 (2015): 321-332.
• Pillai, Satish K., et al. "Semen-specific genetic characteristics of human immunodeficiency virus type 1 env." Journal of virology 79.3 (2005): 1734-1742.
• West, Anthony P., et al. "Computational analysis of anti–HIV-1 antibody neutralization panel data to identify potential functional epitope residues." Proceedings of the National Academy of Sciences 110.26 (2013): 10598-10603.