![Page 1: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/1.jpg)
Assoc. Vice-President for Health Sciences & Chief Knowledge Officer
Director for Precision Health and Cancer Informatics (Cancer Ctr)
Associate Director, BIO5 Institute
Director, Center for Biomedical Informatics and Biostatistics
The Future of Data Science and Bioinformatics-Anchored Advances in Human Health
Yves A. Lussier, M.D.
Professor of Medicine
Yves @ email.Arizona.edu
Fellow, IGSB, The University of Chicago & Argonne Laboratory
Lead Investigator, Beagle Supercomputing, CI, Un of Chicago & Argonne Lab
orcid.org/0000-0001-9854-1005
Scopus Author ID: 7004101223
ResearcherID: N-4891-2017
@LussierY
![Page 2: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/2.jpg)
Disclosures: overlapping Interests
• Intellectual property
• Boards at for-profit orgs (inactive)
• Boards at non-profit orgs
• Institutions with former research trainees
![Page 3: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/3.jpg)
Preview of take home points
• Personal dynamic transcriptomes can be interpreted affordably
– challenged in vitro to unveil personal genomic response to environment
• Personal intergenic and non-coding genetic interactions are druggable
– as they determine one’s human transcriptome
![Page 4: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/4.jpg)
Plan
• Precision Medicine Initiative (Bipartisan / White House)
• Recent developments
– Dynamics of personal transcriptome & interpretation (GE)
– Mechanisms of non coding and intergenic SNPs unveiled through genetic interactions (GG)
• Route to application
![Page 5: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/5.jpg)
Your Personal Nutrition Test Results
Biochemistry
Genetics
Transcriptome (mRNA)
?! ?!
![Page 6: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/6.jpg)
Problem: unproductive assumptions for discovery of transcriptome biomarkers in common diseases
• 30,000 NIH “biomarker” grants in 25 yrs (> $2.5 billion/year) [1]
o unproductive: only 12 FDA-approved cancer biomarkers (2012-2017)
o limited success in clinical practice
• Conventional transcriptome biomarker discovery designed for an average patient:
o single biomolecule assumed concordantly altered across patients
o patient-specific biomarker signal remains undetected
[1] What is a biomarker? Research investments and lack of clinical integration necessitate a review of biomarker terminology and validation schema, Scand J Clin Lab Invest Suppl, 2010. 242: p. 6-14.
![Page 7: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/7.jpg)
Insights from mice Genome by Environment interactions (GE)
• Wells A, Barrington W, Threadgill D, Dearth S, Campagna S, Saxton A, Voy B. Gene, Sex and Diet Interact to Control the Tissue Metabolome. The FASEB Journal. 2016 Apr 1;30(1 Supplement):127-2.
• Barrington WT. Individual Variation in Diet Response: Health Effects of Dietary Interventions on Mice with Diverse Genetic Backgrounds. North Carolina State University; 2016.
• Wyatt, Brantley S., et al. "Sex- and Strain-dependent Effects of Bisphenol: A Consumption in Juvenile Mice." Journal of Diabetes & Metabolism 7.8 (2016).
![Page 8: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/8.jpg)
Problem : interpreting personal ’omics
![Page 9: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/9.jpg)
Problem: drugs for non-coding genome?
• 3% of the genome is protein-coding
– vast majority of drugs target proteins
• 97% of disease-associated polymorphisms
– yet, "It's all druggable"
Nat Genet. 2017 Jan 31;49(2):169. doi: 10.1038/ng.3788
• siRNA
• CRISPR/Cas9
![Page 10: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/10.jpg)
![Page 11: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/11.jpg)
![Page 12: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/12.jpg)
White House (WH) Announces: University of Arizona's (UA) Lussier Group Involvement in National Precision Medicine Initiative
https://uanews.arizona.edu/story/white-house-announces-ua-s-involvement-in-national-precision-medicine-initiative
Precision Medicine Initiative (PMI) Summit White House, 2/25/2016
• UA Drs. Ojo & Lussier funded to recruit 150,000 subjects for DNA Sequencing
• UA/WH Lussier group is launching new precision medicine initiatives
• System-wide dissemination of an on-demand "case-based reasoning" system that intelligently searches and analyzes entire databases of electronic medical records. This will give clinicians the power to develop an individualized and effective treatment plan for unusual or complex clinical conditions, grounded on practice-based evidence.
• Expand the clinical utility of its open-source, patient-centric analytic methods to aid physicians in interpreting the dynamic disease-associated gene expression changes (dynamic transcriptome) arising from patients' own DNA blueprint
• Development of genetic assays to predict an individual's response to therapy and prevention of adverse reactions, termed "pharmacogenomics".
![Page 13: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/13.jpg)
Convergent pathway deregulation of diseases with distinct causes (genetic, epigenetic, environmental)
- coagulopathies can be inherited (Mendelian)
- or acquired (GE)
In other words, convergent phenotypes are attributable to the dysregulation of distinct gene products within the same pathway
Scalar theory & diseases of molecular pathway deregulation
[Figure: property emergence across scales. Molecular cause (inherited nonsense or missense protein; acquired low expression); transcriptome (deregulated coagulation pathway); clinical phenotype (coagulopathy).]
![Page 14: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/14.jpg)
Hemophilia A
Hemophilia B
Nonsense and missense mutations in hemophilia A*
Disruption of the Metabolic Pathways of Inherited Coagulopathies
*Am J Hum Genet. 1988;42(5):718–725
![Page 15: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/15.jpg)
Genome/epigenome environment regulation of "bleeding time" trait
Acquired (environmental) coagulopathies: decreased expression of pathway genes in absence of mutations
Coagulopathy secondary to chronic liver disease
(e.g. cirrhosis of alcohol abuse)
![Page 16: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/16.jpg)
Model for Network Medicine
Barabasi et al, Nature Reviews Genetics, 12, 56-68, 2011.
Gaps between networks of mechanisms at different scales
![Page 17: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/17.jpg)
Summative effect of diverse genetics = similar transcriptome
[Figure: one disease identity linked across ontologies by "is a" relations (semantics, SNOMED)]
• Clinical system (SNOMED): 281833003 Hematological system (body structure)
• Disease (SNOMED): Factor VII deficiency (disorder) 37193007; Hereditary factor XI deficiency disease (disorder) 49762007; Afibrinogenemia (disorder) 278504009
• Genetic (OMIM): 227500 + F7, F7 DEFICIENCY; 264900 + F11, F11 DEFICIENCY; 202400 # Fibrinogen, AFIBRINOGENEMIA, CONGENITAL; 134820 + FIBRINOGEN, A ALPHA POLYPEPTIDE; 134830 + FIBRINOGEN, B BETA POLYPEPTIDE
• Pathway (KEGG): hsa04610 Complement and coagulation cascades
![Page 18: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/18.jpg)
Background: single-subject transcriptome analyses of altered pathways
1. One transcriptome against a reference cohort
• Pathifier (PNAS 2013;110:6388); IndividPath (Brief Bioinform 2016;17:78); iPAS (Bioinform 2014;30:i422)
• Limitation: requires a heterogenic cohort; may lead to false positives and false negatives due to heterogeneity and distinct environments
2. Two paired transcriptomes (e.g. before & after treatment, over time for a disease, control tissue vs affected tissue)
• N-of-1-pathways methods (Lussier Group). Wilcoxon: J Am Med Inform Assoc 2014;21:1015; Mahalanobis Distance: Bioinformatics 2015;31:i293; ClusterT: Statistical Methods in Medical Research 2017; MixEnrich: BMC Medical Genomics 2017;10:27; kMEn: J Biomed Inform 2017;66:32.
• Advantage:
o well controlled biologically
o statistically powerful comparison to an isogenic baseline, just as in studies of cell lines or isogenic animal models (e.g. mouse)
3. Multiple paired isogenic measures (>2) (e.g. time series replication, etc.)
• Timevector (Bioinformatics 2017)
• Advantage: more powerful than two paired transcriptomes
• Limitation: limited availability of clinical tissue
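A minimal sketch of the paired, single-subject idea in item 2: the genes of one pathway are compared between two samples from the same person with a Wilcoxon signed-rank test. This illustrates the general test only, not the published N-of-1-pathways implementation; the function name, the toy expression values, and the use of a normal approximation without tie correction are all assumptions made for this example.

```python
import math

def wilcoxon_signed_rank(baseline, case):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    baseline, case: expression values of the same genes in two samples
    from one subject. Returns (W+, p-value). No tie correction is applied
    to the variance, so the p-value is approximate for small gene sets.
    """
    diffs = [c - b for b, c in zip(baseline, case) if c != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of N(0, 1)
    return w_plus, p_value

# Hypothetical log2 expression of ten pathway genes in paired samples.
normal = [5.0] * 10
tumor = [5.0 + 0.5 * (k + 1) for k in range(10)]
w_plus, p_value = wilcoxon_signed_rank(normal, tumor)
print(w_plus, p_value)
```

Because both samples come from the same subject, each gene serves as its own control, which is the isogenic-baseline advantage listed above.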
![Page 19: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/19.jpg)
Transcriptome analysis
[Figure: two designs. Cohort: case vs. control gene expression, yielding a common gene / pathway signature. Individual: paired case / control samples, yielding an individual pathway signature and a common pathway signature.]
![Page 20: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/20.jpg)
Understanding personal disease mechanisms
Design the right tool
![Page 21: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/21.jpg)
II. N-of-1-pathways Mahalanobis Distance (MD)
Deviations from Equality
Schissler G, (…) Piegorsch W, Lussier Y. Bioinformatics J, 2015 Jun 15;31(12):i293-302.
Gene-level Differential Expression
![Page 22: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/22.jpg)
III. N-of-1-pathways k-means clustering and enrichment (kMEn)
Li Q, (…), Zhang HH, Lussier YA. kMEn: Analyzing noisy and bidirectional transcriptional pathway responses in single subjects. Journal of biomedical informatics. 2017 Feb 28;66:32-41.
![Page 23: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/23.jpg)
Single-subject (SS) pathway-level studies emerging cross-subject pathway signal
• Hypothesis: pathway-level signal emerges from heterogeneous dysregulated genes in each patient (responsive genes), as they coordinate to alter a multi-gene function (e.g. pathway)
• Pathway biomarker framework:
o Identify responsive genes (red & blue below) and altered pathways in each subject (single-subject studies)
o Followed by cross-subjects pathway-level statistics
Figure: Three SS studies. Same altered pathway in each patient, discoverable in each single subject study
Patient 1 = SS study 1 Patient 2 = SS study 2 Patient 3 = SS study 3
Simulation parameters:
• 20% responsive genes
• 50% up-regulated genes
![Page 24: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/24.jpg)
Conventional cohort-based analyses (heterogenic)
Cohort P-values, effect size
![Page 25: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/25.jpg)
Single-subject study (powerful isogenic conditions)
Single subject P-values, effect size (e.g., kMEn)
![Page 26: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/26.jpg)
Single-subject studies followed by meta-analysis across studies
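One way to picture the cross-subject step is to combine the per-patient p-values obtained for the same pathway. The slide does not say which meta-analysis is used; the sketch below assumes Fisher's combined probability test purely for illustration, and the function name and p-values are hypothetical.

```python
import math

def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under H0.

    p_values must be strictly positive. Returns (X, combined p-value).
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total

# Hypothetical single-subject p-values for one pathway in three patients.
statistic, combined_p = fisher_combine([0.05, 0.05, 0.05])
print(statistic, combined_p)
```

Three individually borderline single-subject results combine into stronger cross-subject evidence, which is the intuition behind running single-subject studies first and aggregating afterwards.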
![Page 27: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/27.jpg)
Evaluation (PSB 2018)
Geneset size: 200
N = 30 patients
Legend: Black = SS anchored discovery; Red = Conventional discovery
[Figure: precision (y-axis) vs. recall (x-axis)]
![Page 28: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/28.jpg)
Problem: Accurately predicting the biologic and statistical significance of the deregulation of a pathway from one normal tissue and one cancer tissue of an individual patient’s transcriptome
"N-of-1-pathways" unveils personal deregulated mechanisms from a single pair of RNA-Seq samples: towards precision medicine
J Am Med Inform Assoc. 2014 Nov-Dec;21(6):1015-25
TCGA lung adenocarcinoma dataset
![Page 29: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/29.jpg)
Patient individual comparison to external GS(heterogeneity index of dysregulated pathways)
J Am Med Inform Assoc. 2014 Nov-Dec;21(6):1015-25
![Page 30: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/30.jpg)
A genome-by-environment interaction classifier for precision medicine: personal transcriptome response to rhinovirus identifies children prone to asthma exacerbations
Gardeux V, Berghout J, (…), Martinez FD*, Lussier YA*. J Am Med Inform Assoc,2017 ocx069
![Page 31: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/31.jpg)
![Page 32: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/32.jpg)
Asthma results
[Figure: Virogram assay + classifier prediction; clinical phenotype]
![Page 33: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/33.jpg)
Comparison of Gene Set Analytic Methods Applied to Single Subjects
[Figure: comparison matrix for single-subject analysis. Methods: DEG Enrich / GSEA; N-of-1-pathways MD; N-of-1-pathways Wilcoxon; FAIME; DEG / DESeq. Criteria: pathway testing; pathway visualization; clinically relevant; unpaired samples; paired samples; transcript-level interpretation.]
![Page 34: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/34.jpg)
II. N-of-1-pathways Mahalanobis Distance (MD)
Deviations from Equality
Dynamic changes of RNA-sequencing expression for precision medicine: N-of-1-pathways Mahalanobis distance within pathways of single subjects predicts breast cancer survival. Bioinformatics J, 2015 Jun 15;31(12):i293-302.
![Page 35: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/35.jpg)
N-of-1-pathways Mahalanobis Distance Producing a Clinically Relevant Metric (CRM)
Gene-level Differential Expression
Pathway-level Dysregulation (CRM): interpreted as (adjusted) pathway log fold-change
[Figure: Pathway Bivariate Gene Expression. x-axis: Normal, N (Log2 Expression); y-axis: Tumor, T (Log2 Expression); labeled points Gene j and Gene m, with distance d_j.]
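To make the notion of a statistical distance in the bivariate (Normal, Tumor) expression plane concrete, here is a simplified sketch that computes the Mahalanobis distance of each gene's point from the pathway centroid. It does not reproduce the published N-of-1-pathways MD statistic or its clinically relevant metric; the function name and the expression values are assumptions for illustration.

```python
import math

def mahalanobis_2d(points):
    """Mahalanobis distance of each (normal, tumor) point from the centroid,
    using the 2x2 sample covariance of the points themselves."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = v^T S^-1 v, with S^-1 = (1/det) * [[syy, -sxy], [-sxy, sxx]]
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(d2, 0.0)))
    return distances

# Hypothetical log2 expression (normal, tumor) for five genes of one pathway.
pathway_genes = [(5.0, 5.2), (6.1, 7.9), (4.3, 4.1), (7.2, 7.0), (5.5, 8.3)]
distances = mahalanobis_2d(pathway_genes)
print(distances)
```

Unlike a plain Euclidean distance, this measure accounts for the correlation between the two samples, so a gene is flagged only when it departs from the pathway's overall normal-versus-tumor trend.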
![Page 36: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/36.jpg)
Analysis of aggregated cell-cell statistical distances within pathways unveils therapeutic-resistance mechanisms in circulating tumor cells.
Schissler AG, Li Q, (…) Billheimer D, Li H, Piegorsch WW, Lussier YA*. Bioinformatics, 2016 Jun 15;32(12):i80-i89
![Page 37: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/37.jpg)
Case study: Drug resistance in CTCs
scRNA-seq data of CTCs from prostate cancer patients (Miyamoto et al., Science, 2015)
• 13 patients; 77 CTC RNA-seq; 1 to 12 per patient
• EZT-Naïve (41 CTCs)
• EZT-Resistant (36 CTCs)
Knowledge base: Pathway Interaction Database (PID) (Schaefer et al., Nucleic Acids Research, 2009)
• 187 pathways w/ at least 15 gene products and high curation confidence
• Signaling pathways often implicated in cancer
![Page 38: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/38.jpg)
Single-cell statistics for precision medicine in circulating tumor cells (CTCs)
Bioinformatics. 2016 Jun 15;32(12):i80-i89
Examples: 2 distinct subjects
![Page 39: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/39.jpg)
Comparison of single-cell RNA-seq analysis methods
[Figure: comparison matrix. Methods: scLatent Variable Model (1); Aggregated cell-cell distances; SCDE (2) + Enrichment. Criteria: cell-centric stats & viz; small # of cells; 2-sample DEP tests; 2-sample DEG tests; identify cell subpopulations.]
(1) Buettner et al., Nature Biotech, 2015
(2) Kharchenko et al., Nature Methods, 2014
![Page 40: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/40.jpg)
(1) Center for Biomedical Informatics and Biostatistics, (2) BIO5 Institute, (3) Department of Medicine, (4) Graduate Interdisciplinary Program in Statistics, (5) Department of Mathematics, (6) Cancer Center, The University of Arizona; (7) Institute for Genomics and Systems Biology, The University of Chicago. §Equal contribution
Visualization: Single Patient Deregulated Pathways
Isogenic single-subject method MixEnrich is more accurate than heterogenic cohort-based methods using larger samples (FDR 5%)
Cohort-based methods were performed across 3, 6 and 12 patients (Pt). The gold standard was built using paired head and neck adenocarcinoma vs non-tumor tissue in 45 subjects
![Page 41: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/41.jpg)
Haiquan Li, Ikbel Achour, Lisa Bastarache, Joanne Berghout, Vincent Gardeux, Jianrong Li, Younghee Lee, Lorenzo Pesce, Xinan Yang, Kenneth S. Ramos, Ian Foster, Joshua C. Denny, Jason H. Moore, and Yves A. Lussier
Integrative genomics analyses unveil downstream biological effectors of disease-specific polymorphisms buried in intergenic regions
Nature Partner Journals (npj) Genomic Medicine, Article number: 16006 (2016)
![Page 42: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/42.jpg)
What is the central dogma of non-coding (dark) DNA?
[Figure: Intragenic SNP, RNA, Protein, Pathway; Intergenic SNP with "Downstream Biological Mechanisms?"]
![Page 43: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/43.jpg)
The Encyclopedia of DNA Elements: Measure + Annotation
[Figure: intergenic SNP shown with regulatory elements and their assays: enhancer (DNaseI-seq); transcription factor binding sites, TFBS (ChIP-seq); suppressor; chromatin interaction with an enhancer (ChIA-PET)]
![Page 44: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/44.jpg)
GWAS studies: SNPs in regulatory elements
[Figure: intergenic SNP shown with regulatory elements and their assays: enhancer (DNaseI-seq); transcription factor binding sites, TFBS (ChIP-seq); suppressor; chromatin interaction with an enhancer (ChIA-PET)]
![Page 45: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/45.jpg)
Our Hypothesis
[Figure: intergenic (noncoding) SNPs and intragenic SNPs associated to the same disease; shared biological mechanisms?]
![Page 46: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/46.jpg)
SNP1: Disease1 GWAS association
SNP2: Disease2 GWAS association
Shared biological mechanisms between disease risk loci?
![Page 47: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/47.jpg)
Genome-wide expression Quantitative Trait Loci (eQTL)
Cheung et al. Natural variation in human gene expression assessed in lymphoblastoid cells. Nature Genetics 33, 422-425 (2003)
[Figure: gene expression (y-axis) in a population sample, grouped by genotype at a C/G polymorphism (CC, CG, GG)]
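The eQTL idea, expression level varying with genotype at a polymorphism across a population sample, can be sketched as a simple regression of expression on allele dosage. The additive coding (CC = 0, CG = 1, GG = 2), the function name, and the data below are assumptions for illustration; real eQTL analyses add covariates and multiple-testing control.

```python
import math

def eqtl_regression(dosages, expression):
    """Least-squares fit of expression on allele dosage.

    dosages: number of copies of the alternate allele per individual (0, 1, 2).
    Returns (slope, intercept, Pearson r).
    """
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or expression")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical individuals: genotypes CC, CC, CG, CG, GG, GG coded as G-allele counts.
dosages = [0, 0, 1, 1, 2, 2]
expression = [2.0 + 0.5 * d for d in dosages]  # toy data with a perfect additive effect
slope, intercept, r = eqtl_regression(dosages, expression)
print(slope, intercept, r)
```

A non-zero slope means each additional copy of the allele shifts expression, which is what the genotype-grouped plot on the slide conveys.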
![Page 48: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/48.jpg)
• The transcription of a gene is governed by DNA-binding transcription factors (TFs) that switch the gene on or off and whose action is modulated by promoters and enhancers.
• Cis effect eQTL: an intragenic polymorphism might have a clear effect on the expression of a nearby gene (e.g. mutation in a promoter region)
• Trans effect eQTL: an intergenic (noncoding) polymorphism may affect the transcription (expression) of distant genes (even on other chromosomes, e.g. distal enhancer)
Causal mechanisms for eQTL
In pursuit of design principles of regulatory sequences. Nature Reviews Genetics. 2014;15(7)
![Page 49: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/49.jpg)
[Figure: SNP1 (Disease1 GWAS association) and SNP2 (Disease2 GWAS association), each linked by eQTL association to mRNAs (mRNAx, mRNAz); the SNPs are compared for (1) shared mRNAs or (2) similar function]
![Page 50: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/50.jpg)
[Figure: Intergenic SNP1 and an intragenic or intergenic SNP, each linked by eQTL to RNAs (RNAx, RNAy); shared mRNAs?]
![Page 51: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/51.jpg)
Methods: Significance of shared mRNA over-representation between two SNPs
[Figure: SNP1 and SNP2 linked to mRNAs (mRNAv, mRNAx, mRNAy, mRNAz)]
• Conservative permutation resampling keeping node degree of eQTL-associated mRNAs to GWAS-associated SNPs constant
• Hypergeometric distribution considered anticonservative
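The two options named on this slide can be contrasted in a small sketch: a hypergeometric tail probability for the number of mRNAs shared by two SNPs, and a permutation test that rewires the SNP-mRNA network while keeping every node degree constant. Edge swapping is one common way to preserve degrees; whether the authors used it is not stated, and all function names, the toy network, and the universe size are assumptions.

```python
import math
import random

def hypergeom_tail(overlap, n1, n2, universe):
    """P(X >= overlap) when n2 mRNAs are drawn from `universe`, n1 of them SNP1 targets."""
    total = math.comb(universe, n2)
    return sum(math.comb(n1, k) * math.comb(universe - n1, n2 - k)
               for k in range(overlap, min(n1, n2) + 1)) / total

def degree_preserving_shuffle(edges, n_swaps, rng):
    """Rewire a bipartite (SNP, mRNA) edge list by edge swaps; all degrees stay fixed."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (s1, m1), (s2, m2) = edges[i], edges[j]
        a, b = (s1, m2), (s2, m1)
        # Skip swaps that change nothing or would create a duplicate edge.
        if s1 == s2 or m1 == m2 or a in present or b in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {a, b}
        edges[i], edges[j] = a, b
    return edges

def shared_mrnas(edges, snp1, snp2):
    return len({m for s, m in edges if s == snp1} & {m for s, m in edges if s == snp2})

def permutation_p(edges, snp1, snp2, n_perm=200, seed=0):
    rng = random.Random(seed)
    observed = shared_mrnas(edges, snp1, snp2)
    hits = sum(
        shared_mrnas(degree_preserving_shuffle(edges, 10 * len(edges), rng), snp1, snp2) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical eQTL network: (GWAS SNP, eQTL-associated mRNA) pairs.
edges = [("SNP1", "m1"), ("SNP1", "m2"), ("SNP1", "m3"),
         ("SNP2", "m1"), ("SNP2", "m2"), ("SNP2", "m4"),
         ("SNP3", "m5"), ("SNP3", "m6"), ("SNP3", "m1"),
         ("SNP4", "m7"), ("SNP4", "m8"), ("SNP4", "m2")]
observed, perm_p = permutation_p(edges, "SNP1", "SNP2")
hyper_p = hypergeom_tail(observed, 3, 3, universe=8)
print(observed, perm_p, hyper_p)
```

The hypergeometric model treats every mRNA as equally likely to be drawn, so highly connected mRNAs inflate apparent overlap; holding node degrees fixed in the null model is what makes the permutation approach the more conservative choice described above.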
![Page 52: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/52.jpg)
[Figure: Intragenic SNP, RNA, Protein, Pathway; Intergenic SNP linked by eQTL to RNA; downstream biological mechanism?]
![Page 53: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/53.jpg)
[Figure: an intergenic SNP and an intragenic or intergenic SNP, each linked by eQTL to RNA and Protein, then through Gene Ontology to Pathway i and Pathway k; identity or similarity between the pathways?]
![Page 54: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/54.jpg)
Methods: from gene expression to pathway imputation using ontologies
Gene Ontology
- gene sets annotated to biological processes
- organized as a directed acyclic graph
![Page 55: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/55.jpg)
Methods: Reducing Network size with Information Theoretic Semantic Similarity
SNP1 SNP2
mRNA
mRNA
mRNA
mRNA
mRNA
Share biological mechanisms?
GO GO
![Page 56: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/56.jpg)
Methods: 1st level - Semantic Similarity in DAG Ontologies (e.g. GO)
– Common theoretical bases
• Information content of a concept c: $-\log p(c)$
• Shared information content between a and b: $-\log p(ms(a,b))$
• p(c) is the occurrence frequency of c and all its descendants in the dataset
• ms: minimal subsumer, the common ancestor with minimal descendants
• $sim_{Lin}(a,b) = \dfrac{-2 \log p(ms(a,b))}{-\log p(a) - \log p(b)}$
Example DAG (counts in parentheses): a, b, c; d (2); e (3); f (5); ms(a,b) is d
Lin D. An information-theoretic definition of similarity; Proc 15th Int Conf Machine Learning (ICML'98); 1998; 296–304.
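Lin's similarity can be computed directly from occurrence counts. The sketch below assumes the counts for a, b, and their minimal subsumer are already known; the function names and the handling of the root-only case are choices made for this illustration, and the numbers in the usage note are hypothetical.

```python
import math

def information_content(count, total):
    """IC(c) = -log p(c), with p(c) = count of c and its descendants / total."""
    return -math.log(count / total)

def sim_lin(count_a, count_b, count_ms, total):
    """Lin similarity: 2 * IC(ms(a, b)) / (IC(a) + IC(b))."""
    ic_a = information_content(count_a, total)
    ic_b = information_content(count_b, total)
    ic_ms = information_content(count_ms, total)
    denominator = ic_a + ic_b
    if denominator == 0:
        # Both concepts are the root (p = 1); treated as identical here,
        # a convention assumed for this sketch.
        return 1.0
    return 2 * ic_ms / denominator
```

For example, `sim_lin(10, 20, 40, 100)` scores two concepts whose minimal subsumer covers 40 of 100 annotations. The score is 1 when a, b, and their minimal subsumer coincide, and 0 when the only common ancestor is the root.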
![Page 57: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/57.jpg)
Similarity score between two concepts (Lin et al.)
SimLin(a, b) for a. Double-stranded DNA binding and b. Single-stranded DNA binding
[Figure: Gene Ontology subtree under "Binding" showing a, b, and their minimal subsumer; counts shown beside concepts: 82,678; 19; 2,034; 33]
Note: the number beside each concept is the number of occurrences of the concept and all its descendants in a studied GOA file
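The counts described in the note (a concept plus all its descendants) can be derived from direct annotation counts by walking the ontology graph. The sketch below assumes the ontology is given as a parent-to-children mapping; the term names echo the slide, but the structure and counts are hypothetical and not taken from any GOA file.

```python
def descendants_inclusive(term, children):
    """Return the term and every term reachable below it in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a DAG node can be reached by several paths
        seen.add(current)
        stack.extend(children.get(current, ()))
    return seen

def cumulative_counts(children, direct_counts):
    """Occurrences of each term plus all of its descendants."""
    terms = set(children) | set(direct_counts)
    for kids in children.values():
        terms.update(kids)
    return {
        term: sum(direct_counts.get(d, 0)
                  for d in descendants_inclusive(term, children))
        for term in terms
    }

# Hypothetical toy ontology and direct annotation counts.
toy_children = {
    "binding": ["DNA binding", "protein binding"],
    "DNA binding": ["double-stranded DNA binding",
                    "single-stranded DNA binding"],
}
toy_direct = {
    "binding": 5,
    "DNA binding": 3,
    "double-stranded DNA binding": 2,
    "single-stranded DNA binding": 4,
    "protein binding": 6,
}
toy_totals = cumulative_counts(toy_children, toy_direct)
```

Collecting descendants into a set before summing means a term with two parents is counted once per ancestor, not once per path.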
![Page 58: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/58.jpg)
Methods: 2nd level - similarity between two groups of GO terms associated to mRNAs*
mRNA A (geneset A), mRNA B (geneset B)
– Eliminates comparison of irrelevant concepts
– Standardized (maximum=1; using Lin’s equation)
P: the set of paired-up concepts
t: the threshold value to filter out noise
|A|+|B|: the total number of concepts in groups A and B
* Tao Y, … Lussier Y, Information theory applied to the sparse gene ontology annotation network to predict novel gene function. Bioinformatics. 2007 Jul 1;23(13):i529-38.
![Page 59: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/59.jpg)
Approach
Methods: 3rd level - nested information theoretic similarity: SNP (mRNAs[GO])
• SNP s1 is associated with a set of mRNAs G(s1),
• |G(s1)| is the cardinality of the set G(s1),
• GENE_ITS is the information theoretic biological similarity of two mRNAs*
• SNP_ITS provides a score that ranges from 0 to 1; a value of 1 indicates two SNPs with common GO–MFs or GO–BPs, and a value of 0 corresponds to two SNPs with unrelated GO–BPs or GO–MFs.
* Bioinformatics. 2007 Jul 1;23(13):i529-38.
Li (…) Lussier. Nature partner journals, Genomic Medicine; 16006 (2016)
![Page 60: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/60.jpg)
Methods: Reducing Network size with GO Information Theoretic Semantic Similarity (ITS)
SNP1 SNP2
mRNA
mRNA
mRNA
mRNA
mRNA
GO GO GO_ITS=0.7
GO_ITS=1
![Page 61: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/61.jpg)
Methods: Reducing Network size with mRNA Information Theoretic Semantic Similarity (ITS)
SNP1 SNP2
mRNA
mRNA
mRNA
mRNA
mRNA mRNA_ITS=0.7
mRNA_ITS=0.7
![Page 62: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/62.jpg)
Methods: Reducing network size with 3 nested similarities: SNP_ITS{ mRNA_ITS [GO_ITS] }
SNP1 SNP2 SNP_ITS=0.88
Li (…) Lussier. Nature partner journals, Genomic Medicine; 16006 (2016)
![Page 63: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/63.jpg)
Intergenic SNP1
Intragenic or intergenic SNP2
RNA
Protein
Pathwayz
RNA
eQTL
Gene ontology
same disease
GWAS1 GWAS2
Pathwayi identity or similarity
eQTL
Gene ontology
identity
Protein
![Page 64: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/64.jpg)
Methods: Assessing the pvalues of SNP_ITS{ mRNA_ITS [GO_ITS] }
Li (…) Lussier. Nature partner journals, Genomic Medicine; 16006 (2016)
• Scale-free permutation resampling: the node degree of each GO term, each mRNA, and each SNP remains constant in each permutation, simply with different molecules connected to each other.
• Theoretical network over-representation/enrichment calculations were shown to be anticonservative.
• ~20,000,000 core hours of high-throughput computations were conducted
• Beagle GLOBUS GRID computing
• Cray XE6 Supercomputer of the Computation Institute at the Argonne National Laboratory (http://beagle.ci.uchicago.edu/).
• peak performance of 151 teraflops generated by 17,424 compute cores
![Page 65: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/65.jpg)
Methods: big data / multiscale permutation resampling was performed for imputing complex disease mechanisms in our scale-free network. Theoretical distributions are anticonservative
[Figure: Q-Q plot (hexbin). x-axis: theoretical distribution (Fisher’s Exact; FDR), 0.0 to 1.0; y-axis: multi-scale permutation resampling (FDR), 0.0 to 1.0]
![Page 66: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/66.jpg)
Approach: Input
~2 Million Lead Pairs
1092 intergenic SNPs
1266 intragenic SNPs
associated to 467 diseases
6301 mRNAs eQTLs
GWAS
![Page 67: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/67.jpg)
Datasets
GWAS (NHGRI): SNPs to disease
eQTLs (LCLs, Liver): SNPs to mRNAs
Gene Ontology: Gene/mRNA Molecular Function (MF), Gene/mRNA Biological Process (BP)
Protein-Protein Interaction (PPI, STRING)
ENCODE: ChIP-seq, ChIA-PET, Regulome-DB
![Page 68: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/68.jpg)
Approach: Input, excludes SNPs in Linkage Disequilibrium (LD)
Pairwise Analysis
~2 Million Lead Pairs (LD r² < 0.8)
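The LD filter on lead pairs can be illustrated with the standard r² statistic. The sketch below is an assumption-laden example, not the pipeline used in the study: it assumes phased haplotypes coded 0/1 at two biallelic SNPs, and the function names and the 0.8 default simply mirror the threshold on the slide.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased 0/1 haplotypes.

    r^2 = D^2 / (pA (1 - pA) pB (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(x * y for x, y in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        # A monomorphic SNP carries no LD information; reported as 0 here.
        return 0.0
    return d * d / denominator

def keep_pair(hap_a, hap_b, threshold=0.8):
    """Keep a SNP pair for pairwise analysis only if r^2 is below threshold."""
    return ld_r2(hap_a, hap_b) < threshold
```

Two SNPs that always travel together give r² = 1 and are excluded, so a shared mechanism cannot be an artifact of the pair tagging the same haplotype.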
![Page 69: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/69.jpg)
Approach overview
Pairwise Analysis, Prioritization, Shared mechanisms
~2 Million Lead Pairs
Biological Knowledge (Gene Ontology)
![Page 70: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/70.jpg)
Approach overview
Pairwise Analysis, Prioritization, Shared mechanisms
Validation: Shared genetics & molecular mechanisms
~2 Million Lead Pairs
Biological Knowledge (Gene Ontology)
1- Genetic Interaction validation
2- ENCODE validation
![Page 71: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/71.jpg)
Result 1.2 – Prioritized Lead SNP-pairs?
at FDR<0.05: Count of distinct Lead SNPs among 5011 Lead SNP pairs sharing similar mechanisms
![Page 72: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/72.jpg)
Result 1.4 – Circos plot of SNP-pairs prioritized within the same disease, showing SNPs sharing mechanisms across chromosomes!
![Page 73: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/73.jpg)
Result 1.5 – SNP-SNP network
![Page 74: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/74.jpg)
Result 1.5 – SNP-SNP network across diseases within class
![Page 75: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/75.jpg)
Result 1.6 – Prioritized Lead SNP-pairs?
at FDR<0.05: Same mechanisms shared among prioritized SNP pairs associated with same disease
![Page 76: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/76.jpg)
Result 2.1 – Prioritized Lead SNP-pairs for Rheumatoid Arthritis (RA)
![Page 77: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/77.jpg)
Result 2.3 – Example of Lead SNPs sharing mechanism in RA
![Page 78: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/78.jpg)
Result 2.4 – Disease specific enrichment?
Contingency table, odds ratio, p-value
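A 2x2 enrichment test of this kind can be sketched as below. The slide does not name the test used, so Fisher's exact test is an assumption here, and the table layout (prioritized vs. not prioritized pairs, same disease vs. different diseases) and the counts in the usage note are hypothetical. Elsewhere the deck notes that theoretical distributions are anticonservative in this network, so this only illustrates the contingency-table arithmetic.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment in cell a.

    With all margins fixed, cell a follows a hypergeometric
    distribution; the p-value is P(X >= a).
    """
    row1 = a + b          # e.g. prioritized pairs
    col1 = a + c          # e.g. pairs associated to the same disease
    n = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(comb(row1, k) * comb(n - row1, col1 - k)
                     for k in range(a, upper + 1))
    return favourable / comb(n, col1)
```

For the hypothetical table `[[3, 1], [1, 3]]`, `odds_ratio(3, 1, 1, 3)` is 9.0 and `fisher_exact_greater(3, 1, 1, 3)` is 17/70.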
![Page 79: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/79.jpg)
Result 2.5 – Disease-specific enrichment?
> Prioritized SNP pairs sharing similar mechanisms are more likely to be associated to the same disease than to unrelated pathologies
![Page 80: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/80.jpg)
Genetic Validation
Result 3 – Genetic Interaction validation in RA PheWAS
![Page 81: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/81.jpg)
| Disease | Prioritized SNP pairs | SNPs with synergistic effects | Entropy P-value |
| --- | --- | --- | --- |
| Alzheimer’s | rs4509693 (chr10, inter)–rs753129 (chr4, inter); rs7081208* (chr10, FRMD4A)–rs9331888* (chr8, CLU, MIR6843) | rs4509693–rs753129–rs7081208* | 0.046 |
| Bladder cancer | rs8102137 (chr19, inter)–rs1014971 (chr22, inter) | rs8102137–rs1014971 | 0.039 |
Result 3 – Genetic Interaction validation in GWAS
Non-additive genetic interaction of prioritized inter–inter and inter–intra Lead SNP pairs validated in independent GWAS studies
![Page 82: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/82.jpg)
Intragenic
Intergenic SNP
Enhancer Transcription Factor binding sites (TFBS)
Suppressor Enhancer
RNA
Protein
Pathway
Gene
Intergenic SNP
Result 4 – ENCODE validation
![Page 83: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/83.jpg)
Result 4 – ENCODE validation datasets
Legend:
TF = transcription factor
PPI = protein-protein interaction network (STRING)
ChIP-seq = combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing (seq) to identify the binding sites of DNA-associated proteins.
ChIA-PET = Chromatin Interaction Analysis by Paired-End Tag Sequencing, a technique that incorporates chromatin immunoprecipitation (ChIP)-based enrichment, chromatin proximity ligation, Paired-End Tags (PET), and high-throughput sequencing to determine de novo long-range chromatin interactions genome-wide.
![Page 84: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/84.jpg)
Result 4 – ENCODE validation, any prediction of SNP-pair
> Prioritized Lead SNPs are enriched in similar regulatory elements
![Page 85: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/85.jpg)
Conclusion
Intergenic and intragenic SNPs associated to the same disease:
1- most likely affect:
> expression of the same mRNAs
> mRNAs involved in similar biological pathways
> mRNAs governed by similar regulatory mechanisms (e.g. chromatin interactions)
2- as many as 40% could display synergistic and antagonistic genetic interactions with SNPs of the same disease or those of another one
3- provide druggable targets downstream of noncoding intergenic SNPs for novel common disease risk prevention and treatment
![Page 86: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/86.jpg)
Take home message
• Therapies for disease risks found in the intergenic genome
– by extending the central dogma of molecular biology using Bois’ information theory
– pathologic mechanisms identified at the convergence of eQTL signals
– validations of convergent signal
  • overrepresentation of drug bank targets
  • overrepresentation of ENCODE mechanisms
  • diseases sharing intergenic mechanisms found comorbid in EHRs
  • predicts novel genetic interactions in Alzheimer’s and in bladder cancer confirmed in GWAS
• Genome-by-environment interaction (G×E) assays & interpretation:
– enables single-subject studies
– smaller classifier sizes
– democratizing costs: pathway-level signal using qPCRs and representative transcript
![Page 87: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/87.jpg)
Funding
BEAGLE Cray Super Computer
R01-LM010685 K22-LM008308 LM009012 LM010098 LM010685
CTSA - UL1TR000050 CTSA - UL1TR000445
UL1RR024975
NCI P30CA023074 1S10RR029030-01
![Page 88: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/88.jpg)
Jianrong Li 李建荣, MSc
Ikbel Achour, PhD
Haiquan Li 李海泉, PhD
Xinan Yang, PhD
Younghee Lee, PhD
Students
Vincent Gardeux, PhD Joanne Berghout, PhD
Yves Lussier, MD Grant Schissler Qike Li 李奇科 Nima Pouladi
Collaborators
J Moore, PhD
I Foster, PhD
WW Piegorsch, PhD
D Billheimer, PhD
JL Chen, MD
JC Denny, PhD
Colleen Kenost, EdD
![Page 89: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/89.jpg)
Joanne Berghout, PhD Haiquan Li, PhD
Grant Schissler, MS Qike Li, MS
Ikbel Achour, PhD Don Saner, MSc Jianrong Li, MSc
Colleen Kenost, MSc
![Page 90: The Future of Data Science and Bioinformatics -Anchored ...allergen.ca/wp-content/uploads/Lussier-slides.pdf · discovery of transcriptome biomarkers in common diseases • 30,000](https://reader034.vdocuments.us/reader034/viewer/2022050110/5f4797f7b1a02522a22279ad/html5/thumbnails/90.jpg)
Jianrong Li, MSc
Ikbel Achour, PhD
Haiquan Li, PhD
Joanne Berghout, PhD
Yves A. Lussier, MD
Nima Pouladi, MD, PhD
Qike Li, MS
A. Grant Schissler, MS
Colleen Kenost, EdD
@LussierY @UA_CB2 #Genetics #Genomics #Bioinformatics #PrecisionMedicine
[email protected] post-doc fellows & Research Assistant Professors