FINDING CONSISTENT SUBNETWORKS ACROSS MICROARRAY DATASETS
Fan Qi
GS5002 Journal Club
OUTLINE
Introduction
Methodology
Results & Discussions
Conclusions
INTRODUCTION
Identify differential gene expression
• Identify genes that are significantly differentially expressed with respect to a phenotype
Importance:
• Testing the effectiveness of a treatment
• Biological insights into diseases
• Developing new treatments
• Disease prophylaxis
• Any others?
CURRENT METHODS
Individual genes
• Search for individually differentially expressed genes: fold change, t-test, SAM
Gene pathway detection
• Look at a set of genes instead of individual genes: Bayesian learning and Boolean network learning
Gene classes
• Add existing biological knowledge: over-representation analysis (ORA), functional class scoring (FCS), GSEA, NEA, ErmineJ
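The individual-gene methods rank each gene on its own. A minimal sketch of two of them, assuming expression values are on a positive, non-log scale: the log2 fold change of group means, and Welch's unequal-variance t-statistic (the slide only says "t-test"; Welch's variant is our assumption). The function names and the gene-to-values dictionary layout are illustrative, not from [1] or [2].

```python
import math
from statistics import mean, variance


def log2_fold_change(case, control):
    """log2 ratio of mean expression between two groups (positive values assumed)."""
    return math.log2(mean(case) / mean(control))


def welch_t(case, control):
    """Welch's t-statistic for one gene; assumes non-zero sample variance."""
    se = math.sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


def rank_genes(expr, case_idx, control_idx):
    """Rank genes by |t|. expr maps gene name -> list of per-sample values."""
    scores = {}
    for gene, values in expr.items():
        case = [values[i] for i in case_idx]
        control = [values[i] for i in control_idx]
        scores[gene] = welch_t(case, control)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


expr = {
    "G1": [8.0, 9.0, 10.0, 2.0, 3.0, 4.0],
    "G2": [5.0, 5.5, 6.0, 5.0, 6.0, 5.5],
    "G3": [1.0, 2.0, 1.5, 6.0, 7.0, 6.5],
}
print(rank_genes(expr, [0, 1, 2], [3, 4, 5]))  # ['G3', 'G1', 'G2']
```

Taking the top 10, 50 or 100 genes of such a ranking gives the kind of DEG lists whose cross-dataset overlap is compared in the next slide.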
CHALLENGE
Different results from different datasets of the SAME disease!
Zhang M [1] demonstrated this inconsistency for SAM:
Dataset          DEGs      POG    nPOG
Prostate cancer  Top 10    0.3    0.3
                 Top 50    0.14   0.14
                 Top 100   0.15   0.15
Lung cancer      Top 10    0.00   0.00
                 Top 50    0.20   0.19
                 Top 100   0.31   0.30
DMD              Top 10    0.20   0.20
                 Top 50    0.42   0.42
                 Top 100   0.54   0.54
Reconstructed from Table 1 in [1]
Inconsistency among datasets
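POG (percentage of overlapping genes) compares the DEG lists found in two datasets. A simplified sketch, assuming POG is the fraction of the first list's genes that also appear in the second list, optionally requiring the same direction of change. The nPOG correction for chance overlap described in [1] is not reproduced here, and the gene names are hypothetical.

```python
def pog(top_a, top_b, direction_a=None, direction_b=None):
    """Fraction of genes in top_a that also occur in top_b.

    If both direction maps (gene -> +1 up / -1 down) are given, a shared gene
    only counts when it changes in the same direction in both datasets.
    Note the measure is asymmetric: it divides by len(top_a).
    """
    shared = set(top_a) & set(top_b)
    if direction_a is not None and direction_b is not None:
        shared = {g for g in shared if direction_a[g] == direction_b[g]}
    return len(shared) / len(top_a)


a = ["G1", "G2", "G3", "G4"]
b = ["G2", "G4", "G7", "G9"]
print(pog(a, b))  # 0.5
```

A POG of 0.00 for the lung cancer top 10 means none of the ten top-ranked genes from one dataset reappear in the other's top 10.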
NEW APPROACH
SNet [2]
• Proposed in 2011
• Uses gene-gene relationships in the analysis
Gene-gene relationships: activates vs. inhibits
Gene subnetwork: genes are vertices, relationships are edges
(From Fig 1 in [2])
[Figure: example subnetwork with genes RHOA, VAV, PIK3R2, ARHGEF1, RAC1 and IQGAP1; partially adapted from Fig 2 in [2]]
METHODOLOGY
Input: gene expression profiles of patients labeled with phenotype
• Obtained from microarray experiments
Third-party information: gene pathway information, gene reaction information
Attributes of a subnetwork: size, score
Output: a set of significant subnetworks
Pipeline: Subnetwork Extraction → Subnetwork Scoring → Subnetwork Significance
METHODOLOGY – STEP 1
[Figure: patients grouped by phenotype (P1, P2, P3, ...), each patient with a ranked list of genes]
METHODOLOGY – STEP 1
For each patient, only the top-ranked genes are kept
Repeat for every phenotype group
METHODOLOGY – STEP 1
Select one phenotype as $d$ (e.g. P1) and the others as $\neg d$
Select the genes that occur in at least $\beta\%$ of the patients in $d$, with $\beta = 50$; these form the gene list $G_L$
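Step 1 up to $G_L$ can be sketched as below. The slides do not give the cutoff for "top genes", so the fraction `alpha` (default 0.1) is a hypothetical parameter; `beta = 50` follows the slide. The data layout (patient → gene → expression value) and all names are illustrative assumptions, not SNet's actual code.

```python
import math


def top_genes(expression, alpha):
    """Genes in the top `alpha` fraction of one patient's ranked gene list."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    keep = max(1, math.ceil(alpha * len(ranked)))
    return set(ranked[:keep])


def select_gl(patients, phenotype, d, alpha=0.1, beta=50):
    """Return (G_L, counts): genes kept in >= beta% of the patients labelled d.

    counts[g] is the number of d patients in which gene g is highly expressed.
    alpha is a hypothetical cutoff; the slides leave it unspecified.
    """
    d_patients = [p for p in patients if phenotype[p] == d]
    counts = {}
    for p in d_patients:
        for g in top_genes(patients[p], alpha):
            counts[g] = counts.get(g, 0) + 1
    threshold = beta / 100 * len(d_patients)
    return {g for g, c in counts.items() if c >= threshold}, counts


patients = {
    "p1": {"A": 9, "B": 1, "C": 5, "D": 2},
    "p2": {"A": 8, "B": 7, "C": 1, "D": 0},
    "p3": {"A": 1, "B": 2, "C": 9, "D": 8},
}
phenotype = {"p1": "d", "p2": "d", "p3": "other"}
gl, counts = select_gl(patients, phenotype, "d", alpha=0.5, beta=100)
print(gl)  # {'A'}
```

Returning the per-gene counts as well is a design choice for the sketch: they are reused when scoring subnetworks in step 2.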
METHODOLOGY – STEP 1
Partition $G_L$ into multiple pathways to generate subnetworks
[Figure: gene list $G_L$ split into pathway subnetworks over genes $a_1, \dots, a_7$]
Result: a list of subnetworks with respect to phenotype $d$
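One plausible reading of "partition into pathways" is: for each pathway, keep only the edges whose two genes are both in $G_L$, and treat each connected component as a subnetwork. The sketch below does that with an iterative depth-first search. The pathway format (name → list of gene pairs) is a hypothetical stand-in for the third-party pathway data, and the activates/inhibits edge types are ignored here.

```python
def subnetworks(gl, pathways):
    """Split G_L into connected subnetworks, one pathway at a time.

    pathways: hypothetical dict, pathway name -> list of (gene_a, gene_b) edges.
    Returns a list of (pathway name, frozenset of genes); every component has
    at least two genes because only genes touching a kept edge are included.
    """
    result = []
    for name, edges in pathways.items():
        adj = {}
        for a, b in edges:
            if a in gl and b in gl:  # keep only edges inside G_L
                adj.setdefault(a, set()).add(b)
                adj.setdefault(b, set()).add(a)
        seen = set()
        for start in adj:
            if start in seen:
                continue
            component, stack = set(), [start]
            while stack:  # iterative depth-first search
                g = stack.pop()
                if g in component:
                    continue
                component.add(g)
                stack.extend(adj[g] - component)
            seen |= component
            result.append((name, frozenset(component)))
    return result


gl = {"a1", "a2", "a3", "a4", "a5"}
pathways = {"P": [("a1", "a2"), ("a2", "a3"), ("a4", "a5"), ("a5", "a9")]}
print(subnetworks(gl, pathways))  # two subnetworks: {a1, a2, a3} and {a4, a5}
```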
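METHODOLOGY – STEP 2
For each subnetwork $sp$ in the list for phenotype $d$ and each patient $p$, compute the overall expression level:
$$SNet_{sp,p} = \sum_{g_i \in sp,\ g_i \text{ highly expressed in } p} \frac{n_{g_i,d}}{|d|}$$
where $g_i$ is a gene in $sp$ that is highly expressed in $p$, $n_{g_i,d}$ is the number of patients in $d$ who have $g_i$ highly expressed, and $|d|$ is the total number of patients in $d$.
For the patients in $d$ (patients $1, \dots, n$) and in $\neg d$ (patients $n+1, \dots, m$), compute a t-test between
$$S_{sp,d} = \langle SNet_{sp,1}, SNet_{sp,2}, \dots, SNet_{sp,n} \rangle$$
$$S_{sp,\neg d} = \langle SNet_{sp,n+1}, SNet_{sp,n+2}, \dots, SNet_{sp,m} \rangle$$
and assign the resulting t-score $t_{sp}$ to each subnetwork.
[Figure: example subnetwork over genes $a_1, \dots, a_7$ for phenotype P1 ($d$)]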
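A minimal sketch of step 2, under stated assumptions: each patient's set of highly expressed genes is already known (e.g. from step 1), `n_d` holds $n_{g_i,d}$, and the t-test is Welch's unequal-variance statistic. The variant of t-test used in [2] is not stated on the slide, so that choice is ours; all names are illustrative.

```python
import math
from statistics import mean, variance


def subnetwork_score(sp, highly_expressed_in_p, n_d, size_d):
    """SNet_{sp,p}: sum of n_{g,d} / |d| over genes of sp highly expressed in p."""
    return sum(n_d.get(g, 0) / size_d for g in sp if g in highly_expressed_in_p)


def welch_t(x, y):
    """Welch's t-statistic; assumes each group has >= 2 values and some spread."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def t_score(sp, top_sets, phenotype, d, n_d):
    """t_sp: compare the score vectors S_{sp,d} and S_{sp,not d}."""
    size_d = sum(1 for p in phenotype if phenotype[p] == d)
    s_d = [subnetwork_score(sp, top_sets[p], n_d, size_d)
           for p in phenotype if phenotype[p] == d]
    s_not_d = [subnetwork_score(sp, top_sets[p], n_d, size_d)
               for p in phenotype if phenotype[p] != d]
    return welch_t(s_d, s_not_d)


sp = {"A", "B"}
top_sets = {"p1": {"A", "B"}, "p2": {"A"}, "p3": {"B"}, "p4": set()}
phenotype = {"p1": "d", "p2": "d", "p3": "n", "p4": "n"}
n_d = {"A": 2, "B": 1}  # A is highly expressed in p1 and p2, B only in p1
print(t_score(sp, top_sets, phenotype, "d", n_d))  # about 2.83
```

Here $S_{sp,d} = \langle 1.5, 1.0 \rangle$ and $S_{sp,\neg d} = \langle 0.5, 0 \rangle$, so the subnetwork scores clearly higher in $d$.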
METHODOLOGY – STEP 3
A. Randomly swap the phenotype labels of the patients and recreate the subnetworks and their t-scores (steps 1-2)
B. Repeat A for 1,000 permutations
• This forms a 2-D histogram (subnetwork size, t-score)
C. Estimate the nominal p-value of each subnetwork from this histogram
D. Select the subnetworks whose p-value is significant
• Null hypothesis: the subnetwork is not significant
(Fig 5 in the original paper [2])
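A sketch of step 3, assuming the 2-D histogram is binned by exact subnetwork size and the p-value is one-sided (the fraction of null t-scores of the same size that reach the observed score). `compute_scores` is a hypothetical callback standing in for rerunning steps 1-2 on shuffled labels; returning 1.0 when no null subnetwork of that size exists is our conservative choice, not something stated in [2].

```python
import random


def permutation_null(labels, compute_scores, n_perm=1000, seed=0):
    """Collect (size, t-score) pairs from label-shuffled reruns of steps 1-2.

    labels: patient -> phenotype. compute_scores(labels) is a hypothetical
    callback that rebuilds the subnetworks and returns (size, t) pairs.
    """
    rng = random.Random(seed)
    values = list(labels.values())
    null = []
    for _ in range(n_perm):
        shuffled = dict(zip(labels, rng.sample(values, len(values))))
        null.extend(compute_scores(shuffled))
    return null


def nominal_p(size, t, null):
    """Fraction of same-size null t-scores that are >= the observed t."""
    same_size = [nt for ns, nt in null if ns == size]
    if not same_size:
        return 1.0  # no null evidence for this size: treat as not significant
    return sum(1 for nt in same_size if nt >= t) / len(same_size)


null = [(3, 1.0), (3, 2.0), (3, 3.0), (3, 4.0), (5, 10.0)]
print(nominal_p(3, 2.5, null))  # 0.5
```

Binning by size matters because larger subnetworks can accumulate larger scores by chance, so each subnetwork is compared only against null subnetworks of its own size.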
RESULTS AND DISCUSSIONS
Datasets:
• Leukemia: Golub vs. Armstrong
• ALL: Ross vs. Yeoh
• DMD: Haslett vs. Pescatori
• Lung: Bhattacharjee vs. Garber
Performance comparison:
• Subnetwork overlap (with GSEA)
• Gene overlap (GSEA, SAM, t-test)
Other comparisons: network size, gene validity against the t-test
RESULTS AND DISCUSSIONS
Subnetwork overlap
Disease    Dataset 1      Dataset 2   SNet (%)   GSEA (%)   SNet (#)   GSEA (#)
Leukemia   Golub          Armstrong   83.33%     0%         20         0
ALL        Ross           Yeoh        47.63%     23.1%      10         6
DMD        Haslett        Pescatori   58.33%     55.6%      7          10
Lung       Bhattacharjee  Garber      90.90%     0%         9          0
Synthesized from Tables 1 and 2 in [2]. Higher is better.
RESULTS AND DISCUSSIONS
Gene overlap
Disease    SNet     GSEA    T-test (p < 0.05)   T-test (top)   SAM (p < 0.05)   SAM (top)
Leukemia   91.30%   2.38%   73.01%              14.29%         49.96%           22.62%
ALL        93.01%   4.0%    60.20%              57.33%         81.25%           49.33%
DMD        69.23%   28.9%   49.60%              20.00%         76.98%           42.22%
Lung       51.18%   4.0%    65.61%              26.16%         65.61%           24.62%
Synthesized from Tables 3, 4 and 5 in [2]. Higher is better.
RESULTS AND DISCUSSIONS
Size of subnetworks
Disease T-Test SNet
Size of Network 2 3 4 5 5 6 7 >8
Leukemia 84 8 1 0 0 2 3 2 1
Subtype 75 5 1 1 1 1 0 1 6
DMD 45 3 1 0 0 1 0 0 5
Lung 65 3 2 1 0 5 3 0 1
Reconstructed from Table 6 in [2]
RESULTS AND DISCUSSIONS
Validity: compare the genes in EACH subnetwork with those selected by the t-test
• The share of each subnetwork's genes that also appear in the t-test list is mostly around 70%-100%
• Selected results (the full tables are too large to present)
Subnetwork name          Percentage   Subnetwork name      Percentage
Leukaemia_B Cell-VAV1    81.82%       SNET_CTNNB1          100%
Leukaemia_UBC            100%         SNET_TNFSF10         60%
Leukaemia_RAC1           57.15%       SNET_PYGM            60%
DMD_RHOA                 75%          DMD_ACTB             83.33%
DMD_SDC3                 88.89%       Leukaemia_POU2F2     75.00%
MLLBCR_ACAA1             28.67%       BCR_T_RASA1          44.44%
MLLBCR_BLNK              72.73%       BCR_ABL1             75.00%
SNET_NOTCH3              100%         DMD_CALM1            80%
Selected from Tables 7, 8, 9 and 10 in [2]
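The validity percentage above can be read as a simple set ratio. A minimal sketch, assuming it is the share of a subnetwork's genes that also occur in the t-test gene list; the gene names below are hypothetical.

```python
def validity(subnetwork_genes, ttest_genes):
    """Percentage of a subnetwork's genes that also appear in the t-test list."""
    genes = set(subnetwork_genes)
    return 100.0 * len(genes & set(ttest_genes)) / len(genes)


print(validity({"A", "B", "C", "D"}, {"A", "B", "C", "X"}))  # 75.0
```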
CONCLUSIONS
• Traditional methods have an inconsistency problem across different datasets of the same disease
• SNet uses biological knowledge to narrow this gap: gene-to-gene relationships and gene pathway knowledge
• SNet shows better results than established algorithms: its results are more consistent across datasets
REFERENCES
[1] Zhang M, Zhang L, Zou J, Yao C, Xiao H, Liu Q, Wang J, Wang D, Wang C, Guo Z: Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes.
[2] Soh D, Dong D, Guo Y, Wong L: Finding consistent disease subnetworks across microarray datasets.
THANK YOU!!