Pathway-based omics data classification
Bioinformatics - 2016/2017
Goals
• Classification with pathways: groups of genes that are involved in the same biological functions
• Identify relations among pathways
• Build a graph of interactions between pathways and miRNAs
Data
• Breast Cancer (BC)
  • 151 patients
  • RNA (20501 genes)
  • miRNA (1046)
  • 4 classes: LumA - 55, LumB - 59, Basal - 24, Her2 - 13
• Glioma
  • 167 patients
  • RNA (12042 genes)
  • miRNA (534)
  • 4 classes: Proneural - 52, Classical - 37, Mesenchymal - 54, Neural - 24
Pathways by MSigDB → KEGG, Reactome, Biocarta, C6…
Introduction
[Pipeline diagram: Discriminant Fuzzy Patterns, Enrichment Analysis, Classification (linear SVM), Permutation Test and Interaction Graph, with Genes, miRNAs and MSigDB as inputs.]
First step
• BC dataset → Training Set (70%) and Test Set (30%)
• Glioma dataset → Training Set (75%) and Test Set (25%)
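The split above can be sketched as follows. This is a minimal illustration, assuming a stratified split (each class divided separately); the slides do not say whether the split was stratified, and `stratified_split` is a hypothetical helper. The class sizes are the breast cancer counts from the Data slide.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for _, indices in sorted(by_class.items()):
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Breast cancer class sizes from the Data slide (151 patients in total).
bc_labels = ["LumA"] * 55 + ["LumB"] * 59 + ["Basal"] * 24 + ["Her2"] * 13
train_idx, test_idx = stratified_split(bc_labels, train_fraction=0.70)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the small classes (for example Her2 with 13 patients) represented in both sets.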
Feature selection
• Discriminant fuzzy pattern (DFP)
  • Too many features → identify discriminant genes
• Enrichment → grouping genes in pathways (MSigDB)
  • Identify which pathways are significantly represented by the genes selected with the DFP algorithm
Discriminant Fuzzy Pattern - Grid search (BC)
• Skip factor → 0, 1, 2, 3
  • Factor used to skip outliers. Lower value → more values skipped (0: don't skip)
• Zeta → 0.35, 0.4, 0.45, 0.5
  • Threshold used in the membership functions to label the float values with a discrete value
• piVal → 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8
  • Percentage of values of a class used to determine the fuzzy patterns
• Overlapping → 1, 2
  • Determines the number of discrete labels
• Genes after DFP → 578 with Skip Factor 1, Zeta 0.4, piVal 0.65 and Overlapping 1
Discriminant Fuzzy Pattern - Grid search (Glioma)
• Skip factor → 1, 2, 3
• Zeta → 0.35, 0.4, 0.45, 0.5
• piVal → 0.6, 0.65, 0.7
• Overlapping → 1, 2
• Genes after DFP → 635 with Skip Factor 1, Zeta 0.35, piVal 0.65 and Overlapping 1
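To give a sense of the size of the two search spaces, the grids listed above can be enumerated as in the sketch below. The dictionary keys and the `iter_grid` helper are hypothetical names chosen for illustration; the slides do not state how each configuration was scored, so only the enumeration is shown.

```python
from itertools import product

# Parameter values copied from the two grid-search slides.
bc_grid = {
    "skip_factor": [0, 1, 2, 3],
    "zeta": [0.35, 0.4, 0.45, 0.5],
    "pi_val": [0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8],
    "overlapping": [1, 2],
}
glioma_grid = {
    "skip_factor": [1, 2, 3],
    "zeta": [0.35, 0.4, 0.45, 0.5],
    "pi_val": [0.6, 0.65, 0.7],
    "overlapping": [1, 2],
}


def iter_grid(grid):
    """Yield one dict per combination of parameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


bc_combos = list(iter_grid(bc_grid))
glioma_combos = list(iter_grid(glioma_grid))
print(len(bc_combos), len(glioma_combos))  # 288 72
```

Each yielded dictionary would then be passed to a DFP run and the number or quality of the selected genes recorded.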
Enrichment
• Breast Cancer
  • Number of pathways selected through enrichment: 1585
  • Number of pathways with more than 10 genes: 859
• Glioma
  • Number of pathways by enrichment: 1612 (p-value 0.0001)
  • First 1000 pathways with more than 10 genes and lowest p-value
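The slides do not name the statistical test behind the enrichment step. A common choice for this kind of over-representation analysis is the hypergeometric test, sketched below under that assumption. The function name, the pathway size of 50 and the overlap of 10 are hypothetical; 20501 and 578 are the breast cancer gene count and the number of DFP-selected genes reported above.

```python
from math import comb


def hypergeom_enrichment_pvalue(universe_size, pathway_size, selected_size, overlap):
    """P(X >= overlap) when selected_size genes are drawn without replacement
    from a universe that contains pathway_size pathway genes."""
    max_overlap = min(pathway_size, selected_size)
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, selected_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical pathway of 50 genes, 10 of which are among the 578 selected genes.
p_value = hypergeom_enrichment_pvalue(
    universe_size=20501, pathway_size=50, selected_size=578, overlap=10
)
print(p_value)
```

A pathway would be kept when its p-value falls below the chosen threshold.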
Classification with SVM
• Linear SVM
• Two-level cross-validation
  • 3 outer folds
  • 2 inner folds
• C: 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6
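The two-level cross-validation above can be sketched with scikit-learn as follows. This is an illustration, not the authors' code: the data are synthetic stand-ins generated with `make_classification`, the iteration cap on the SVM is an added safeguard, and the shuffling seeds are arbitrary. Only the fold counts, the linear kernel and the C grid come from the slide.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the expression matrix of one pathway (hypothetical data).
X, y = make_classification(
    n_samples=120, n_features=20, n_informative=8, n_classes=4,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

# C grid from the slide: 1e-5, 1e-4, ..., 1e6.
c_grid = [10.0 ** exponent for exponent in range(-5, 7)]

inner_cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Inner loop selects C; outer loop estimates the accuracy of the tuned model.
# max_iter is an assumed cap that keeps the largest C values from running long.
tuned_svm = GridSearchCV(
    SVC(kernel="linear", max_iter=100_000), {"C": c_grid}, cv=inner_cv
)
outer_scores = cross_val_score(tuned_svm, X, y, cv=outer_cv)
print(outer_scores.mean())
```

Keeping the choice of C inside the inner folds means the outer accuracy is not biased by the tuning.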
First step of classification
[Diagram: for each pathway (1, 2, 3, …, i), the patients × genes matrix of that pathway feeds its own linear SVM (Linear SVM 1, 2, 3, …, i), and each SVM outputs class probabilities for every patient.]
Size vs Pathways Accuracy (BC)
Correlation: 0.028
Size vs Pathways Accuracy (Glioma)
Correlation: 0.342
Pathways after permutation test
• 1000 permutation tests on the pathways with the best accuracies
• Breast cancer
  • Number of pathways that passed the permutation test: 36
  • Lowest accuracy 77.9%
  • Highest accuracy 84.6%
• Glioma
  • Number of pathways that passed the permutation test: 278
  • Lowest accuracy 80%
  • Highest accuracy 88%
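A label-permutation test of the kind mentioned above can be sketched as follows. The `evaluate` callable stands for "train on the given labels and return an accuracy"; here it is a toy one-feature classifier rather than the pathway SVM, and all names and data are hypothetical. The slides do not give the significance threshold, so none is applied.

```python
import random
from statistics import mean


def permutation_p_value(evaluate, labels, n_permutations=1000, seed=0):
    """Compare the observed score with scores obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = evaluate(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if evaluate(shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_permutations + 1)


# Toy data: one feature that separates two classes (hypothetical values).
feature = [0.1, 0.3, 0.2, 0.4, 1.9, 2.1, 2.2, 1.8, 0.25, 2.0]
true_labels = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]


def toy_accuracy(labels):
    """Refit a nearest-class-mean classifier on `labels` and score it."""
    mean0 = mean(x for x, lab in zip(feature, labels) if lab == 0)
    mean1 = mean(x for x, lab in zip(feature, labels) if lab == 1)
    predictions = [int(abs(x - mean1) < abs(x - mean0)) for x in feature]
    return sum(p == lab for p, lab in zip(predictions, labels)) / len(labels)


observed, p_value = permutation_p_value(toy_accuracy, true_labels)
print(observed, p_value)
```

A pathway passes the test when shuffled labels rarely reach the accuracy obtained with the true labels.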
BC Pathways
• ACEVEDO_FGFR1_TARGETS_IN_PROSTATE_CANCER_MODEL_UP
• DEBIASI_APOPTOSIS_BY_REOVIRUS_INFECTION_DN
• DELACROIX_RARG_BOUND_MEF
• ENK_UV_RESPONSE_EPIDERMIS_UP
• ENK_UV_RESPONSE_KERATINOCYTE_DN
• FARMER_BREAST_CANCER_APOCRINE_VS_BASAL
• GO_CELLULAR_RESPONSE_TO_LIPID
• GO_CIRCULATORY_SYSTEM_PROCESS
• GO_GLAND_DEVELOPMENT
• GO_REGIONALIZATION
• GO_REGULATION_OF_CELL_CYCLE_PHASE_TRANSITION
• GO_REGULATION_OF_PROTEIN_SERINE_THREONINE_KINASE_ACTIVITY
• GO_RESPONSE_TO_ALCOHOL
• GO_RESPONSE_TO_ESTROGEN
• GO_RESPONSE_TO_STEROID_HORMONE
• GO_UROGENITAL_SYSTEM_DEVELOPMENT
• GSE1460_NAIVE_CD4_TCELL_ADULT_BLOOD_VS_THYMIC_STROMAL_CELL_DN
• GSE21927_SPLEEN_VS_4T1_TUMOR_MONOCYTE_BALBC_DN
• GSE23502_WT_VS_HDC_KO_MYELOID_DERIVED_SUPPRESSOR_CELL_COLON_TUMOR_DN
• GSE26351_WNT_VS_BMP_PATHWAY_STIM_HEMATOPOIETIC_PROGENITORS_UP
• HALLMARK_ESTROGEN_RESPONSE_LATE
• LEI_MYB_TARGETS
• LIU_PROSTATE_CANCER_DN
• MODULE_18
• MODULE_255
• MODULE_52
• NFE2L2.V2
• SATO_SILENCED_BY_METHYLATION_IN_PANCREATIC_CANCER_1
• SHEN_SMARCA2_TARGETS_DN
• SMID_BREAST_CANCER_RELAPSE_IN_BONE_DN
• V$ALPHACP1_01
• V$TEF1_Q6
• V$ZIC2_01
• VANTVEER_BREAST_CANCER_ESR1_DN
• VECCHI_GASTRIC_CANCER_EARLY_DN
• ZHANG_BREAST_CANCER_PROGENITORS_UP
Glioma Pathways
• MEISSNER_NPC_HCP_WITH_H3K4ME2
• YYCATTCAWW_UNKNOWN
• RIGGI_EWING_SARCOMA_PROGENITOR_UP
• DEURIG_T_CELL_PROLYMPHOCYTIC_LEUKEMIA_DN
• MODULE_169
• GO_REGULATION_OF_MEMBRANE_POTENTIAL
• GSE24574_BCL6_LOW_TFH_VS_NAIVE_CD4_TCELL_UP
• GSE25677_MPL_VS_R848_STIM_BCELL_DN
• REACTOME_AXON_GUIDANCE
• MODULE_19
• HELLER_HDAC_TARGETS_SILENCED_BY_METHYLATION_UP
• GO_ACTIN_BINDING
• GSE3982_EOSINOPHIL_VS_BASOPHIL_UP
• GSE3982_MAC_VS_TH2_UP
• V$TATA_C
• GO_REGULATION_OF_ANATOMICAL_STRUCTURE_SIZE
• MODULE_52
• SCHAEFFER_PROSTATE_DEVELOPMENT_48HR_UP
• DAVICIONI_TARGETS_OF_PAX_FOXO1_FUSIONS_UP
• DEURIG_T_CELL_PROLYMPHOCYTIC_LEUKEMIA_UP
• GSE21063_CTRL_VS_ANTI_IGM_STIM_BCELL_NFATC1_KO_16H_UP
• KAECH_NAIVE_VS_MEMORY_CD8_TCELL_DN
• SANSOM_APC_TARGETS_DN
• GO_SINGLE_ORGANISM_CELL_ADHESION
• HOLLMANN_APOPTOSIS_VIA_CD40_DN
• GSE22025_TGFB1_VS_TGFB1_AND_PROGESTERONE_TREATED_CD4_TCELL_DN
• GO_CELL_SUBSTRATE_JUNCTION
• GSE3982_NEUTROPHIL_VS_EFF_MEMORY_CD4_TCELL_UP
• HIRSCH_CELLULAR_TRANSFORMATION_SIGNATURE_UP
• GSE21927_SPLENIC_C26GM_TUMOROUS_VS_BONE_MARROW_MONOCYTES_DN
• GSE3982_BASOPHIL_VS_CENT_MEMORY_CD4_TCELL_UP
• MCBRYAN_PUBERTAL_BREAST_4_5WK_UP
• GO_CELL_CELL_JUNCTION
• GSE13411_NAIVE_BCELL_VS_PLASMA_CELL_UP
• GO_AXON
• GO_REGULATION_OF_INTRACELLULAR_PROTEIN_TRANSPORT
• GO_TELENCEPHALON_DEVELOPMENT
• GSE13484_UNSTIM_VS_12H_YF17D_VACCINE_STIM_PBMC_DN
• LEF1_UP.V1_DN
• CASORELLI_ACUTE_PROMYELOCYTIC_LEUKEMIA_UP
• GO_ACTIVATION_OF_IMMUNE_RESPONSE
• GO_EPITHELIAL_CELL_DIFFERENTIATION
• GO_POSITIVE_REGULATION_OF_CELL_ADHESION
• GSE15735_2H_VS_12H_HDAC_INHIBITOR_TREATED_CD4_TCELL_UP
• MODULE_8
• BLALOCK_ALZHEIMERS_DISEASE_INCIPIENT_UP
• GO_DENDRITE
• GSE3982_CENT_MEMORY_CD4_TCELL_VS_TH2_UP
• KIM_WT1_TARGETS_UP
• GO_REGULATION_OF_NEURON_PROJECTION_DEVELOPMENT
Graph (1)
• Build an interaction graph between pathway and miRNA communities
• We first compute interactions between pathways
  • Interaction Score matrix
• We then add miRNAs, connecting them to pathways
  • Correlation matrix between miRNAs and genes
  • Fisher's exact test
• We add edges between miRNAs
  • Weighted network projection
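The weighted network projection in the last step can be sketched as follows. The slides do not specify the weighting scheme; this example assumes the simplest one, where two miRNAs are linked with a weight equal to the number of pathways they share. All names and data are hypothetical.

```python
from itertools import combinations


def weighted_projection(mirna_to_pathways):
    """Project a miRNA-pathway bipartite graph onto the miRNA side.

    Edge weight = number of pathways shared by the two miRNAs (assumed scheme).
    """
    weights = {}
    for mirna_a, mirna_b in combinations(sorted(mirna_to_pathways), 2):
        shared = mirna_to_pathways[mirna_a] & mirna_to_pathways[mirna_b]
        if shared:
            weights[(mirna_a, mirna_b)] = len(shared)
    return weights


# Hypothetical miRNA-pathway links (for example those kept by Fisher's exact test).
links = {
    "miR-A": {"PATHWAY_1", "PATHWAY_2"},
    "miR-B": {"PATHWAY_2", "PATHWAY_3"},
    "miR-C": {"PATHWAY_1", "PATHWAY_2", "PATHWAY_3"},
    "miR-D": {"PATHWAY_4"},
}
projection = weighted_projection(links)
print(projection)
# {('miR-A', 'miR-B'): 1, ('miR-A', 'miR-C'): 2, ('miR-B', 'miR-C'): 2}
```

The resulting weighted miRNA graph is the kind of input a community detection method can then work on.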
Graph (2)
• Group miRNAs in communities
  • Walktrap algorithm
• We replace miRNAs with nodes representing miRNA communities
• We finally identify communities in the whole interaction graph
Interaction Score
Relations among pathways: interaction score (IS)
$$IS = \frac{|M_x - M_y|}{S_x + S_y}$$
M and S are, respectively, the mean and standard deviation of the two pathways x and y
We apply a cutoff on the resulting interaction matrix
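As a worked illustration of the interaction score, the sketch below computes it for every pair of pathways. The slides do not say which values the mean and standard deviation are taken over; the example assumes one numeric value per patient for each pathway, and all names and numbers are hypothetical. The cutoff is left out because its value and direction are not given.

```python
from itertools import combinations
from statistics import mean, stdev


def interaction_score(values_x, values_y):
    """|M_x - M_y| / (S_x + S_y), with M the mean and S the standard deviation."""
    return abs(mean(values_x) - mean(values_y)) / (stdev(values_x) + stdev(values_y))


# Hypothetical per-patient values for three pathways.
pathway_values = {
    "PATHWAY_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "PATHWAY_B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "PATHWAY_C": [1.5, 2.5, 2.0, 3.0, 3.5],
}

# Upper triangle of the interaction matrix, keyed by pathway pair.
scores = {
    (x, y): interaction_score(pathway_values[x], pathway_values[y])
    for x, y in combinations(sorted(pathway_values), 2)
}
for pair, score in scores.items():
    print(pair, round(score, 3))
```

The score is symmetric in x and y, so only one triangle of the matrix needs to be stored.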
miRNA and pathways interaction
• We evaluate the Pearson correlation between the miRNA and all the genes in the pathway. We then apply a cutoff to select strong correlations.
• Then, for each miRNA and pathway, we use Fisher's exact test to determine whether the miRNA is significantly linked to the pathway (i.e. we check whether there is a significant number of genes in common).
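The two steps above can be sketched as follows. The layout of the 2x2 table, the use of the absolute correlation, the one-sided alternative and the cutoff of 0.7 are assumptions made for illustration; the slides do not give them. The profiles are toy data and the helper names are hypothetical.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on [[a, b], [c, d]]: P(top-left >= a)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / total


def mirna_pathway_p_value(mirna_profile, gene_profiles, pathway_genes, cutoff):
    """Test whether genes correlated with the miRNA are over-represented in the pathway."""
    correlated = {
        gene for gene, profile in gene_profiles.items()
        if abs(pearson(mirna_profile, profile)) >= cutoff
    }
    all_genes = set(gene_profiles)
    in_pathway = all_genes & set(pathway_genes)
    a = len(correlated & in_pathway)              # in pathway, correlated
    b = len(in_pathway - correlated)              # in pathway, not correlated
    c = len(correlated - in_pathway)              # outside pathway, correlated
    d = len(all_genes - in_pathway - correlated)  # outside pathway, not correlated
    return fisher_exact_greater(a, b, c, d)


# Toy expression profiles over five patients (hypothetical values).
mirna = [1.0, 2.0, 3.0, 4.0, 5.0]
genes = {
    "g1": [2.0, 4.0, 6.0, 8.0, 10.0],
    "g2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "g3": [3.0, 1.0, 4.0, 1.0, 5.0],
    "g4": [2.0, 2.0, 3.0, 2.0, 2.0],
}
p_value = mirna_pathway_p_value(mirna, genes, pathway_genes={"g1", "g2"}, cutoff=0.7)
print(p_value)
```

With so few genes the p-value cannot be small; on real data the same table is built from thousands of genes.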
The Goal
The Goal (2)
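Final classification
[Diagram: two stacking schemes. In the first, a patients × genes matrix per pathway feeds one linear SVM per pathway; the resulting class probabilities per patient are stacked ("Stacking with pathway") and passed to an SVM that outputs the class of each patient. In the second, patients × feature matrices built from miRNAs and their connected pathways feed linear SVMs; their class probabilities are stacked ("Stacking with pathway and miRNAs") and passed to an SVM that outputs the class of each patient.]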
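The stacking scheme in the diagram can be sketched with scikit-learn as below. This is an illustration under several assumptions: the data are synthetic, the three pathways are hypothetical column groups, and the training meta-features are out-of-fold probabilities from 3-fold cross-validation, a detail the slides do not specify. The miRNA branch would add further blocks of columns in the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a patients x genes matrix with four classes.
X, y = make_classification(
    n_samples=160, n_features=30, n_informative=10, n_classes=4,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
# Hypothetical pathways: each one is a list of gene (column) indices.
pathways = {
    "PATHWAY_A": list(range(0, 10)),
    "PATHWAY_B": list(range(10, 20)),
    "PATHWAY_C": list(range(20, 30)),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

train_blocks, test_blocks = [], []
for columns in pathways.values():
    svm = SVC(kernel="linear", probability=True, random_state=0)
    # Out-of-fold probabilities, so the meta-classifier never sees
    # predictions made on samples the base SVM was trained on.
    train_blocks.append(
        cross_val_predict(svm, X_train[:, columns], y_train, cv=3, method="predict_proba")
    )
    svm.fit(X_train[:, columns], y_train)
    test_blocks.append(svm.predict_proba(X_test[:, columns]))

# One block of class probabilities per pathway, side by side.
meta_train = np.hstack(train_blocks)
meta_test = np.hstack(test_blocks)

meta_svm = SVC(kernel="linear").fit(meta_train, y_train)
accuracy = meta_svm.score(meta_test, y_test)
print(meta_train.shape, accuracy)
```

Each patient ends up described by 4 probabilities per pathway, and the final SVM assigns the class from that vector.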
The End