Pathway based OMICs data classification

Pathway based OMICs data classification Bioinformatics - 2016/2017

Upload: luca-vitale

Post on 22-Jan-2018


Page 1: Pathway based OMICs data classification

Pathway based OMICs data classification

Bioinformatics - 2016/2017

Page 2: Pathway based OMICs data classification

Goals

• Classification with pathways - groups of genes that are involved in the same biological functions

• Identify relations among pathways

• Build a graph of interactions between pathways and miRNAs

Page 3: Pathway based OMICs data classification

Data

• Breast Cancer (BC)
  • 151 patients
  • RNA (20501 genes)
  • miRNA (1046)
  • 4 classes
    • LumA - 55
    • LumB - 59
    • Basal - 24
    • Her2 - 13

Pathways by MSigDB → KEGG, Reactome, Biocarta, C6…

• Glioma
  • 167 patients
  • RNA (12042 genes)
  • miRNA (534)
  • 4 classes
    • Proneural - 52
    • Classical - 37
    • Mesenchymal - 54
    • Neural - 24

Page 4: Pathway based OMICs data classification

Introduction

[Figure: workflow diagram. Inputs: Genes, miRNAs, MSigDB. Stages: Discriminant Fuzzy Patterns, Enrichment Analysis, Classification (linear SVM), Permutation Test, Interaction Graph.]

Page 5: Pathway based OMICs data classification

First step

• BC dataset → Training set (70%) and test set (30%)

• Glioma dataset → Training set (75%) and test set (25%)
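A minimal sketch of the 70/30 split for the BC dataset. The slides do not say whether the split was stratified by class; stratification is an assumption here, and `stratified_split` is a hypothetical helper, not code from the project. Only the class sizes come from the data slide.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), drawing test_fraction of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class sizes taken from the BC data slide; the label vector is a placeholder.
bc_labels = ["LumA"] * 55 + ["LumB"] * 59 + ["Basal"] * 24 + ["Her2"] * 13
train_idx, test_idx = stratified_split(bc_labels, test_fraction=0.30)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps small groups such as Her2 (13 patients) represented in both sets, which a plain random split does not guarantee.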

Page 6: Pathway based OMICs data classification

Feature selection

• Discriminant fuzzy pattern (DFP)
  • Too many features → Identify discriminant genes

• Enrichment → Grouping genes in pathways (MSigDB)
  • Identify which pathways are significantly represented by the genes selected with the DFP algorithm

Page 7: Pathway based OMICs data classification

Discriminant Fuzzy Pattern - Grid search (BC)

• Skip factor → 0, 1, 2, 3
  • Factor to skip outliers. Lower value → more values skipped (0: don't skip)

• Zeta → 0.35, 0.4, 0.45, 0.5
  • Threshold used in the membership functions to label the float values with a discrete value

• piVal → 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8
  • Percentage of values of a class to determine the fuzzy patterns

• Overlapping → 1, 2
  • Determines the number of discrete labels

• Genes after DFP → 578 with Skip factor 1, Zeta 0.4, piVal 0.65 and Overlapping 1
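To give a sense of the size of this search, the BC grid can be enumerated as below. The sketch only lists the parameter combinations. It does not run DFP, and the slides do not state the criterion used to pick the final setting. The dictionary keys and `iter_grid` are illustrative names.

```python
from itertools import product

# Candidate values copied from the BC grid-search slide.
bc_grid = {
    "skip_factor": [0, 1, 2, 3],
    "zeta": [0.35, 0.4, 0.45, 0.5],
    "pi_val": [0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8],
    "overlapping": [1, 2],
}


def iter_grid(grid):
    """Yield one dict per combination of parameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


combinations = list(iter_grid(bc_grid))
print(len(combinations))  # 4 * 4 * 9 * 2 = 288 settings

# The setting reported on the slide is one point of this grid.
reported = {"skip_factor": 1, "zeta": 0.4, "pi_val": 0.65, "overlapping": 1}
print(reported in combinations)
```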

Page 8: Pathway based OMICs data classification

Discriminant Fuzzy Pattern - Grid search (Glioma)

• Skip factor → 1, 2, 3

• Zeta → 0.35, 0.4, 0.45, 0.5

• piVal → 0.6, 0.65, 0.7

• Overlapping → 1, 2

• Genes after DFP → 635 with Skip factor 1, Zeta 0.35, piVal 0.65 and Overlapping 1

Page 9: Pathway based OMICs data classification

Enrichment

• Breast Cancer
  • Number of pathways selected through enrichment: 1585
  • Number of pathways with more than 10 genes: 859

• Glioma
  • Number of pathways by enrichment: 1612 (p-value 0.0001)
  • First 1000 pathways with more than 10 genes and lowest p-value
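The filtering described above (keep pathways with more than 10 genes, optionally keep only the best-ranked ones by p-value) could look like the sketch below. The input format, the function name and the pathway names are made up for illustration. Whether "more than 10 genes" refers to the pathway size or to the overlap with the DFP genes is not stated on the slide, so `n_genes` is left generic.

```python
def select_pathways(enrichment, min_genes=10, p_cutoff=None, top_n=None):
    """Filter enrichment results.

    enrichment maps a pathway name to a (p_value, n_genes) pair.
    Pathways need strictly more than min_genes genes; if p_cutoff is given,
    the p-value must not exceed it. Results are ordered by ascending p-value.
    """
    kept = [
        (name, p_value)
        for name, (p_value, n_genes) in enrichment.items()
        if n_genes > min_genes and (p_cutoff is None or p_value <= p_cutoff)
    ]
    kept.sort(key=lambda item: item[1])
    if top_n is not None:
        kept = kept[:top_n]
    return [name for name, _ in kept]


# Toy enrichment table with hypothetical pathway names.
toy = {
    "PATHWAY_A": (1e-6, 25),
    "PATHWAY_B": (5e-5, 8),    # dropped: 10 genes or fewer
    "PATHWAY_C": (2e-5, 40),
    "PATHWAY_D": (3e-3, 60),   # dropped by the p-value cut-off
}
print(select_pathways(toy, p_cutoff=1e-4, top_n=1000))
```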

Page 10: Pathway based OMICs data classification

Classification with SVM

• Linear SVM

• Two-level cross-validation
  • 3 outer folds
  • 2 inner folds

• C: 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6
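One way to set up this two-level cross-validation with scikit-learn is sketched below. The data are synthetic and stand in for a single pathway's expression matrix. Using `SVC(kernel="linear")`, stratified folds and accuracy as the score are assumptions, since the slides give only the fold counts and the C grid.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for one pathway: 90 patients x 30 genes, 4 classes.
X, y = make_classification(
    n_samples=90, n_features=30, n_informative=10,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

c_grid = [10.0 ** k for k in range(-5, 7)]  # 1e-5 ... 1e6, as on the slide

inner_cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Inner loop: choose C. Outer loop: estimate accuracy of the tuned model.
tuned_svm = GridSearchCV(SVC(kernel="linear"), {"C": c_grid}, cv=inner_cv)
outer_scores = cross_val_score(tuned_svm, X, y, cv=outer_cv)
print(outer_scores.mean())
```

Keeping the C search inside the inner folds means the outer accuracy is never computed on data that influenced the choice of C.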

Page 11: Pathway based OMICs data classification

First step of classification

[Figure: each pathway (1, 2, 3, ..., i) contributes a Patients x genes matrix, which is passed to its own linear SVM (Linear SVM 1, 2, 3, ..., i); each SVM outputs the class probabilities for the patients.]

Page 12: Pathway based OMICs data classification

Size vs Pathways Accuracy (BC)

Correlation: 0.028

Page 13: Pathway based OMICs data classification

Size vs Pathways Accuracy (Glioma)

Correlation: 0.342
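The two size-versus-accuracy slides each report a single correlation value. Assuming it is a Pearson coefficient (the slides do not name the type), it could be computed as in this sketch. The sizes and accuracies below are made up and do not reproduce the reported values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical pathway sizes (number of genes) and per-pathway accuracies.
sizes = [12, 25, 40, 80, 150]
accuracies = [0.70, 0.74, 0.72, 0.78, 0.75]
print(pearson(sizes, accuracies))
```

A value close to zero, as reported for BC, would mean that larger pathways do not systematically classify better.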

Page 14: Pathway based OMICs data classification

Pathways after permutation test

• 1000 permutation tests on the pathways with best accuracies

• Breast cancer
  • Number of pathways that passed permutation test: 36
  • Lowest accuracy 77.9%
  • Highest accuracy 84.6%

• Glioma
  • Number of pathways that passed permutation test: 278
  • Lowest accuracy 80%
  • Highest accuracy 88%
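A label-permutation test of the kind described above can be sketched as follows. The scoring function is a toy threshold rule standing in for the cross-validated SVM accuracy. In the real pipeline the classifier would be retrained on every shuffled label vector. The significance level used to decide that a pathway "passed" is not given on the slides.

```python
import random


def permutation_p_value(score_fn, features, labels, n_permutations=1000, seed=0):
    """Return (observed score, p-value) from shuffling the class labels."""
    rng = random.Random(seed)
    observed = score_fn(features, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if score_fn(features, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


def threshold_accuracy(features, labels):
    """Toy stand-in for a classifier's accuracy: split values at their mean."""
    cut = sum(features) / len(features)
    predictions = [1 if value > cut else 0 for value in features]
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


# Made-up one-dimensional data with two well separated classes.
features = [0.1, 0.3, 0.2, 0.4, 1.1, 1.3, 0.9, 1.2, 1.0, 0.2, 1.4, 0.3]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0]
observed, p_value = permutation_p_value(threshold_accuracy, features, labels)
print(observed, p_value)
```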

Page 15: Pathway based OMICs data classification

• ACEVEDO_FGFR1_TARGETS_IN_PROSTATE_CANCER_MODEL_UP

• DEBIASI_APOPTOSIS_BY_REOVIRUS_INFECTION_DN

• DELACROIX_RARG_BOUND_MEF

• ENK_UV_RESPONSE_EPIDERMIS_UP

• ENK_UV_RESPONSE_KERATINOCYTE_DN

• FARMER_BREAST_CANCER_APOCRINE_VS_BASAL

• GO_CELLULAR_RESPONSE_TO_LIPID

• GO_CIRCULATORY_SYSTEM_PROCESS

• GO_GLAND_DEVELOPMENT

• GO_REGIONALIZATION

• GO_REGULATION_OF_CELL_CYCLE_PHASE_TRANSITION

• GO_REGULATION_OF_PROTEIN_SERINE_THREONINE_KINASE_ACTIVITY

• GO_RESPONSE_TO_ALCOHOL

• GO_RESPONSE_TO_ESTROGEN

• GO_RESPONSE_TO_STEROID_HORMONE

• GO_UROGENITAL_SYSTEM_DEVELOPMENT

• GSE1460_NAIVE_CD4_TCELL_ADULT_BLOOD_VS_THYMIC_STROMAL_CELL_DN

• GSE21927_SPLEEN_VS_4T1_TUMOR_MONOCYTE_BALBC_DN

• GSE23502_WT_VS_HDC_KO_MYELOID_DERIVED_SUPPRESSOR_CELL_COLON_TUMOR_DN

• GSE26351_WNT_VS_BMP_PATHWAY_STIM_HEMATOPOIETIC_PROGENITORS_UP

• HALLMARK_ESTROGEN_RESPONSE_LATE

• LEI_MYB_TARGETS

• LIU_PROSTATE_CANCER_DN

• MODULE_18

• MODULE_255

• MODULE_52

• NFE2L2.V2

• SATO_SILENCED_BY_METHYLATION_IN_PANCREATIC_CANCER_1

• SHEN_SMARCA2_TARGETS_DN

• SMID_BREAST_CANCER_RELAPSE_IN_BONE_DN

• V$ALPHACP1_01

• V$TEF1_Q6

• V$ZIC2_01

• VANTVEER_BREAST_CANCER_ESR1_DN

• VECCHI_GASTRIC_CANCER_EARLY_DN

• ZHANG_BREAST_CANCER_PROGENITORS_UP

BC Pathways

Page 16: Pathway based OMICs data classification

Glioma Pathways

• MEISSNER_NPC_HCP_WITH_H3K4ME2

• YYCATTCAWW_UNKNOWN

• RIGGI_EWING_SARCOMA_PROGENITOR_UP

• DEURIG_T_CELL_PROLYMPHOCYTIC_LEUKEMIA_DN

• MODULE_169

• GO_REGULATION_OF_MEMBRANE_POTENTIAL

• GSE24574_BCL6_LOW_TFH_VS_NAIVE_CD4_TCELL_UP

• GSE25677_MPL_VS_R848_STIM_BCELL_DN

• REACTOME_AXON_GUIDANCE

• MODULE_19

• HELLER_HDAC_TARGETS_SILENCED_BY_METHYLATION_UP

• GO_ACTIN_BINDING

• GSE3982_EOSINOPHIL_VS_BASOPHIL_UP

• GSE3982_MAC_VS_TH2_UP

• V$TATA_C

• GO_REGULATION_OF_ANATOMICAL_STRUCTURE_SIZE

• MODULE_52

• SCHAEFFER_PROSTATE_DEVELOPMENT_48HR_UP

• DAVICIONI_TARGETS_OF_PAX_FOXO1_FUSIONS_UP

• DEURIG_T_CELL_PROLYMPHOCYTIC_LEUKEMIA_UP

• GSE21063_CTRL_VS_ANTI_IGM_STIM_BCELL_NFATC1_KO_16H_UP

• KAECH_NAIVE_VS_MEMORY_CD8_TCELL_DN

• SANSOM_APC_TARGETS_DN

• GO_SINGLE_ORGANISM_CELL_ADHESION

• HOLLMANN_APOPTOSIS_VIA_CD40_DN

Page 17: Pathway based OMICs data classification

• GSE22025_TGFB1_VS_TGFB1_AND_PROGESTERONE_TREATED_CD4_TCELL_DN

• GO_CELL_SUBSTRATE_JUNCTION

• GSE3982_NEUTROPHIL_VS_EFF_MEMORY_CD4_TCELL_UP

• HIRSCH_CELLULAR_TRANSFORMATION_SIGNATURE_UP

• GSE21927_SPLENIC_C26GM_TUMOROUS_VS_BONE_MARROW_MONOCYTES_DN

• GSE3982_BASOPHIL_VS_CENT_MEMORY_CD4_TCELL_UP

• MCBRYAN_PUBERTAL_BREAST_4_5WK_UP

• GO_CELL_CELL_JUNCTION

• GSE13411_NAIVE_BCELL_VS_PLASMA_CELL_UP

• GO_AXON

• GO_REGULATION_OF_INTRACELLULAR_PROTEIN_TRANSPORT

• GO_TELENCEPHALON_DEVELOPMENT

• GSE13484_UNSTIM_VS_12H_YF17D_VACCINE_STIM_PBMC_DN

• LEF1_UP.V1_DN

• CASORELLI_ACUTE_PROMYELOCYTIC_LEUKEMIA_UP

• GO_ACTIVATION_OF_IMMUNE_RESPONSE

• GO_EPITHELIAL_CELL_DIFFERENTIATION

• GO_POSITIVE_REGULATION_OF_CELL_ADHESION

• GSE15735_2H_VS_12H_HDAC_INHIBITOR_TREATED_CD4_TCELL_UP

• MODULE_8

• BLALOCK_ALZHEIMERS_DISEASE_INCIPIENT_UP

• GO_DENDRITE

• GSE3982_CENT_MEMORY_CD4_TCELL_VS_TH2_UP

• KIM_WT1_TARGETS_UP

• GO_REGULATION_OF_NEURON_PROJECTION_DEVELOPMENT

Glioma Pathways

Page 18: Pathway based OMICs data classification

Graph - (1)

• Build interaction graph between pathway and miRNA communities

• We first compute interactions between pathways
  • Interaction Score matrix

• We then add miRNAs, connecting them to pathways
  • Correlation matrix between miRNAs and genes
  • Fisher's exact test

• We add edges between miRNAs
  • Weighted network projection
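The last step above, a weighted projection of the miRNA-pathway links onto the miRNAs, might be sketched like this. The weight used here is simply the number of pathways two miRNAs share. The slides do not say which weighting scheme was applied, so this is one plausible choice. Node names are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_mirnas(mirna_pathway_edges):
    """Project a bipartite miRNA-pathway graph onto its miRNA nodes.

    Returns {(mirna_a, mirna_b): number of shared pathways}, with each
    pair stored once in sorted order.
    """
    pathway_to_mirnas = defaultdict(set)
    for mirna, pathway in mirna_pathway_edges:
        pathway_to_mirnas[pathway].add(mirna)

    weights = defaultdict(int)
    for mirnas in pathway_to_mirnas.values():
        for a, b in combinations(sorted(mirnas), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical significant miRNA-pathway links.
edges = [
    ("miR-a", "P1"), ("miR-b", "P1"),
    ("miR-a", "P2"), ("miR-b", "P2"), ("miR-c", "P2"),
]
projection = project_onto_mirnas(edges)
print(projection)
```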

Page 19: Pathway based OMICs data classification

Graph - (2)

• Group miRNAs in communities
  • Walktrap algorithm

• We replace miRNAs with nodes representing miRNA communities

• We finally identify communities in the whole interaction graph

Page 20: Pathway based OMICs data classification

Interaction Score

Relations among pathways: interaction score (IS)

$$ IS = \frac{|M_x - M_y|}{S_x + S_y} $$

M and S are respectively the mean and standard deviation of the two pathways x and y

We apply a cut-off on the resulting interaction matrix
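A small sketch of the interaction score and of the pairwise matrix it produces. It assumes each pathway is summarised by a vector of values (for example one value per patient) and uses the sample standard deviation. Neither detail is given on the slide. The cut-off value and its direction are likewise placeholders.

```python
from statistics import mean, stdev


def interaction_score(values_x, values_y):
    """IS = |M_x - M_y| / (S_x + S_y) for two pathways' value vectors."""
    return abs(mean(values_x) - mean(values_y)) / (stdev(values_x) + stdev(values_y))


def interaction_matrix(pathway_values):
    """Score every unordered pair of pathways once."""
    names = sorted(pathway_values)
    return {
        (a, b): interaction_score(pathway_values[a], pathway_values[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def apply_cutoff(matrix, cutoff):
    """Keep pairs scoring at or above cutoff (direction is an assumption)."""
    return {pair: score for pair, score in matrix.items() if score >= cutoff}


# Toy values: P1 and P2 overlap heavily, P3 is shifted away from both.
toy = {"P1": [1.0, 2.0, 3.0], "P2": [1.5, 2.5, 3.5], "P3": [4.0, 5.0, 6.0]}
scores = interaction_matrix(toy)
print(scores)
print(apply_cutoff(scores, cutoff=1.0))
```

For P1 and P3 the means are 2 and 5 and both standard deviations are 1, so IS = 3 / 2 = 1.5.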

Page 21: Pathway based OMICs data classification

miRNA and Pathways interaction

• We evaluate the Pearson correlation between the miRNA and all the genes in the pathway. We then apply a cut-off to select strong correlations.

• Then for each miRNA and pathway we use Fisher's exact test to determine if the miRNA is significantly linked to the pathway (i.e. we check if there is a significant number of genes in common)
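The two steps above can be sketched as follows, taking the miRNA-gene correlations as already computed. The sketch assumes that the test compares the genes strongly correlated with the miRNA against the pathway's genes, with all measured genes as the background, and that "strong" means a large absolute correlation. These choices and the cut-off value are illustrative, not taken from the slides. The one-sided Fisher's exact test is written out as a hypergeometric tail.

```python
from math import comb


def fisher_right_tail(overlap, n_correlated, n_pathway, n_universe):
    """One-sided Fisher's exact test: P(X >= overlap) under the null."""
    upper = min(n_correlated, n_pathway)
    total = comb(n_universe, n_correlated)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_correlated - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def mirna_pathway_p_value(gene_correlations, pathway_genes, cutoff):
    """p-value for the overlap between strongly correlated genes and a pathway.

    gene_correlations maps every measured gene to its correlation with
    one miRNA; pathway_genes is a set of gene names.
    """
    universe = set(gene_correlations)
    correlated = {g for g, r in gene_correlations.items() if abs(r) >= cutoff}
    in_pathway = pathway_genes & universe
    overlap = len(correlated & in_pathway)
    return fisher_right_tail(overlap, len(correlated), len(in_pathway), len(universe))


# Toy example: 10 genes, 3 of them strongly correlated and all 3 in the pathway.
correlations = {f"g{i}": (0.9 if i < 3 else 0.1) for i in range(10)}
pathway = {"g0", "g1", "g2"}
p_value = mirna_pathway_p_value(correlations, pathway, cutoff=0.8)
print(p_value)  # 1 / C(10, 3) = 1/120
```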

Page 22: Pathway based OMICs data classification

The Goal

Page 23: Pathway based OMICs data classification

The Goal (2)

Page 24: Pathway based OMICs data classification

Final classification

[Figure: two stacking schemes. (1) Patients x genes matrices, one per pathway, feed linear SVMs whose class probabilities are stacked ("Stacking with pathway") and passed to an SVM that outputs the class for each patient. (2) The same scheme using miRNAs and their connected pathways as inputs ("Stacking with pathway and miRNAs"), again ending in an SVM that outputs the class for each patient.]
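The stacking scheme in the diagram could be approximated with scikit-learn as below. The data are synthetic, the "pathways" are arbitrary column blocks, and generating the first-level probabilities out-of-fold with `cross_val_predict` is an assumption. The slides show only that per-pathway class probabilities are fed to a final SVM.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Synthetic stand-in: 120 patients x 30 genes, 4 classes.
X, y = make_classification(
    n_samples=120, n_features=30, n_informative=12,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Hypothetical pathways: each one is just a block of gene columns here.
pathway_columns = [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))]

# Level 1: one linear SVM per pathway, giving out-of-fold class probabilities.
level_one = [
    cross_val_predict(
        SVC(kernel="linear", probability=True, random_state=0),
        X[:, cols], y, cv=3, method="predict_proba",
    )
    for cols in pathway_columns
]

# Level 2: concatenate the probabilities and train the stacking SVM on them.
stacked_features = np.hstack(level_one)
meta_svm = SVC(kernel="linear").fit(stacked_features, y)
predicted_classes = meta_svm.predict(stacked_features)
print(stacked_features.shape, predicted_classes[:10])
```

For the second scheme in the diagram, each column block would also include the miRNAs connected to that pathway. Held-out patients should be used to judge the final accuracy, since the predictions printed here are on the training data.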

Page 25: Pathway based OMICs data classification

The End