Inference of patient-specific pathway activities from multi

Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics, 2010

C. J. Vaske et al.

May 22, 2013

Presented by: Rami Eitan

Complex Genomic Rearrangements

• Cancer tissues undergo molecular changes

• Varied genomic data are available:
  – copy number variations
  – mutations, gene expression

• Stratification of cancers can improve:
  – diagnosis
  – prognosis
  – risk assessment
  – response to treatment

Complex Genomic Rearrangements

• Genetic alterations differ between patients

• The affected pathways, however, are often shared

Pathways

• What is a pathway?

Figure: The p53 pathway

Pathways

• A set of interactions between entities, logically grouped together around a biological process

• Entities include protein-coding genes, small molecules, complexes, gene families, and abstract processes

• Available databases: Reactome, KEGG, NCI PID
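To make the definition concrete, here is a minimal sketch of a pathway as a typed set of entities plus signed interactions. The class name, the entity-type labels, and the toy p53 fragment are our own illustration (a simplified textbook picture of MDM2/p53/p21), not the NCI PID encoding.

```python
from dataclasses import dataclass, field

# Entity kinds from the slide; the string labels themselves are hypothetical.
ENTITY_TYPES = {"protein", "small_molecule", "complex", "family", "abstract"}


@dataclass
class Pathway:
    name: str
    entities: dict = field(default_factory=dict)      # entity name -> entity kind
    interactions: list = field(default_factory=list)  # (source, target, sign)

    def add_entity(self, name, kind):
        if kind not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {kind}")
        self.entities[name] = kind

    def add_interaction(self, source, target, sign):
        # sign = +1 for an activating edge, -1 for an inhibiting edge
        if source not in self.entities or target not in self.entities:
            raise KeyError("both endpoints must be added as entities first")
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        self.interactions.append((source, target, sign))

    def parents(self, target):
        """(source, sign) pairs for every interaction pointing at target."""
        return [(s, sign) for s, t, sign in self.interactions if t == target]


# Toy fragment of the p53 pathway (simplified, for illustration only).
p53 = Pathway("toy_p53")
for gene in ("TP53", "MDM2", "CDKN1A"):
    p53.add_entity(gene, "protein")
p53.add_interaction("MDM2", "TP53", -1)    # MDM2 inhibits p53
p53.add_interaction("TP53", "MDM2", +1)    # p53 induces MDM2 (feedback loop)
p53.add_interaction("TP53", "CDKN1A", +1)  # p53 induces p21
print(p53.parents("TP53"))
```

The signed edge list is the piece that later model slides build on: a child node's parents and their signs are exactly what a factor needs.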

Motivation

• Integrative analysis of cancer genome data
  – Copy number variations, gene expression

• Leverage pathway information to find frequently occurring pathway perturbations
  – NCI Pathway Interaction Database, KEGG, etc.

Observed Data

Figure: Gene expression

Figure: Copy number

Motivation

• Pathway information describes how genes are expected to behave

Input

• Infer integrated pathway activities (IPAs)

• Produce a matrix A, where A_ij is the inferred activity of entity i in patient j

PARADIGM

Factor graph

• A factor graph is a probabilistic graphical model

• It has two kinds of nodes: variables and factors

Figure: A simple factor graph
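In a factor graph, the joint distribution is the product of all factors, divided by a normalization constant Z. The sketch below computes exact marginals by enumerating every assignment; the function name and the two example factors are hypothetical. Enumeration is only feasible for a handful of variables, so pathway-sized graphs need an approximate inference method (for example, belief propagation).

```python
import itertools


def marginals(variables, domains, factors):
    """Brute-force marginals of a small discrete factor graph.

    variables: list of variable names
    domains:   name -> list of possible values
    factors:   list of (scope, fn); fn takes the values of the variables in
               scope (in order) and returns a non-negative weight.
    Returns (marginals, Z), where Z is the sum of the unnormalized joint.
    """
    totals = {v: {x: 0.0 for x in domains[v]} for v in variables}
    z = 0.0
    for values in itertools.product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        weight = 1.0
        for scope, fn in factors:
            weight *= fn(*(assignment[v] for v in scope))
        z += weight
        for v in variables:
            totals[v][assignment[v]] += weight
    return {v: {x: w / z for x, w in totals[v].items()} for v in variables}, z


# Two binary variables A and B, one unary factor on A and one pairwise factor.
def prior_a(a):
    return 1.0 if a == 0 else 3.0


def agree(a, b):
    return 2.0 if a == b else 1.0


m, z = marginals(["A", "B"], {"A": [0, 1], "B": [0, 1]},
                 [(("A",), prior_a), (("A", "B"), agree)])
print(m, z)
```

Here the unnormalized weights are 2, 1, 3 and 6, so Z = 12 and P(A = 1) = 9/12.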

PARADIGM Model

• Factor graph representation of the various entities corresponding to a single gene
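As we read the paper, each gene is modeled as a chain of hidden variables along the central dogma (genomic copy, mRNA, protein, active protein). Copy-number measurements attach to the genomic node, and expression measurements attach to the mRNA node. The sketch below builds only this structure, listing the variables and the scope of each factor. The variable naming scheme and the function are hypothetical.

```python
# Hidden per-gene variables, ordered along the central dogma (our reading of the paper).
HIDDEN_CHAIN = ("genome", "mRNA", "protein", "active")
# Observed measurements and the hidden variable each one informs.
OBSERVATIONS = (("copy_number", "genome"), ("expression", "mRNA"))


def gene_factor_graph(gene):
    """Return (variables, factor_scopes) for one gene's sub-graph.

    One factor links each consecutive pair of hidden variables; one more factor
    links each observation to the hidden variable it measures.
    """
    hidden = [f"{gene}:{h}" for h in HIDDEN_CHAIN]
    observed = [f"{gene}:{o}" for o, _ in OBSERVATIONS]
    scopes = [(hidden[k], hidden[k + 1]) for k in range(len(hidden) - 1)]
    scopes += [(f"{gene}:{h}", f"{gene}:{o}") for o, h in OBSERVATIONS]
    return hidden + observed, scopes


variables, scopes = gene_factor_graph("TP53")
for scope in scopes:
    print(scope)
```

In a full pathway, interaction factors would connect the "active" node of one gene to the hidden nodes of its targets. This sketch leaves those out.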

PARADIGM Model: Gene Interactions

PARADIGM Model:

• A factor graph for a pathway

Model Specification

• Convert an NCI pathway into a factor graph
  – First, convert the NCI pathway into a Bayesian network

• Directed network

• Each variable takes a value of -1 (deactivation), 0 (normal), or 1 (activation)
  – mRNA: overexpression indicates activation
  – Copy number variations: more than two copies indicate activation

• Probability distribution of each node
  – Edges are labeled as positive or negative interactions
  – The value of a child node is set by weighted votes from its parents
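One plausible reading of "weighted votes from its parents" is shown below. Each parent's state is multiplied by its edge sign (+1 activating, -1 inhibiting) and by an optional weight, and the child's expected state is the sign of the sum. The function name, the unit default weights, and the rule that a tie yields 0 are our assumptions. Any special handling the paper applies to complexes or gene families is not modeled here.

```python
def expected_child_state(parent_states, signs, weights=None):
    """Expected state of a child node from signed, weighted parent votes.

    parent_states: parent values in {-1, 0, 1}
    signs:         +1 for a positive edge, -1 for a negative edge
    weights:       optional per-edge weights (default 1.0 each; an assumption)
    Returns the sign of the weighted vote: -1, 0 or 1 (0 on a tie).
    """
    if weights is None:
        weights = [1.0] * len(parent_states)
    total = sum(w * s * x for w, s, x in zip(weights, signs, parent_states))
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0


# An active activator and an active inhibitor cancel out...
print(expected_child_state([1, 1], [1, -1]))              # 0
# ...unless the activator's edge carries more weight.
print(expected_child_state([1, 1], [1, -1], [2.0, 1.0]))  # 1
```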

Model Specification

• Converting the Bayesian network to a factor graph
  – Assign a factor to each group of variables consisting of a node and its parents

• Z: normalization constant

• ε = 0.001
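The factor equation itself is missing from this transcript. A common reading, and the one this sketch assumes, works as follows. Each factor gives weight 1 − ε when the child agrees with its parents' vote and ε to each disagreeing state. The joint is P(x) = (1/Z) ∏_i φ_i(x_i, x_parents(i)), with ε = 0.001 as on the slide. Giving root nodes no factor (a uniform prior) is also our assumption, and all function names are hypothetical.

```python
import itertools

EPSILON = 0.001          # value from the slide
STATES = (-1, 0, 1)      # deactivated, normal, activated


def expected_state(parent_states, signs):
    """Sign of the signed parent votes (unit weights assumed)."""
    total = sum(s * x for s, x in zip(signs, parent_states))
    return (total > 0) - (total < 0)


def factor(child, parent_states, signs):
    """Assumed factor: 1 - eps if the child matches the parents' vote, else eps."""
    return 1 - EPSILON if child == expected_state(parent_states, signs) else EPSILON


def joint_distribution(nodes, parents):
    """P(x) = (1/Z) * prod_i phi_i, by enumeration over a tiny signed network.

    nodes:   ordered list of node names
    parents: node -> list of (parent, sign); nodes without parents get no factor
    Returns ({assignment tuple: probability}, Z).
    """
    weights = {}
    for values in itertools.product(STATES, repeat=len(nodes)):
        x = dict(zip(nodes, values))
        w = 1.0
        for node in nodes:
            if parents.get(node):
                states = [x[p] for p, _ in parents[node]]
                signs = [s for _, s in parents[node]]
                w *= factor(x[node], states, signs)
        weights[values] = w
    z = sum(weights.values())
    return {k: v / z for k, v in weights.items()}, z


# A activates B: for each of A's 3 states, B contributes 0.999 + 0.001 + 0.001.
probs, z = joint_distribution(["A", "B"], {"B": [("A", +1)]})
print(z, probs[(1, 1)], probs[(1, -1)])
```

Because ε is small, almost all probability mass sits on assignments where every child agrees with its parents. ε keeps the other assignments possible, so observed data can override the pathway's expectations.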

Inference

• Observed variables: copy number variations, gene expression

• Unobserved variables: protein, protein activity, overall pathway activity state

• Models are learned with the EM algorithm:
  – E step: infer the probabilities of the unobserved variables
  – M step: update the parameters to maximize the likelihood given those probabilities
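The E/M alternation can be illustrated on a much smaller, hypothetical model; this is not PARADIGM's actual parameterization. Each sample has one hidden gene state in {-1, 0, 1} and two discretized measurements (copy number, expression) that are assumed conditionally independent given that state. EM learns the prior over the hidden state and one emission table per measurement. The function name, the random initialization, and the synthetic data are our own.

```python
import math
import random

STATES = (-1, 0, 1)  # deactivated, normal, activated


def em_latent_gene_state(data, n_iter=50, seed=0):
    """Toy EM for a hidden state h with two observed channels (cn, expr).

    data: list of (cn, expr) pairs with values in STATES.
    Returns (prior, emission, history): prior P(h), emission[c][h][o] = P(o | h)
    for channel c, and the data log-likelihood before each M step.
    """
    rng = random.Random(seed)
    prior = {h: 1.0 / len(STATES) for h in STATES}
    emission = []
    for _ in range(2):  # random initialization breaks the symmetry between states
        table = {}
        for h in STATES:
            w = [rng.random() + 0.1 for _ in STATES]
            table[h] = {o: wi / sum(w) for o, wi in zip(STATES, w)}
        emission.append(table)

    history = []
    for _ in range(n_iter):
        # E step: posterior P(h | cn, expr) for each sample.
        posteriors, log_lik = [], 0.0
        for cn, expr in data:
            joint = {h: prior[h] * emission[0][h][cn] * emission[1][h][expr]
                     for h in STATES}
            z = sum(joint.values())
            log_lik += math.log(z)
            posteriors.append({h: joint[h] / z for h in STATES})
        history.append(log_lik)

        # M step: maximum-likelihood parameters from the expected counts.
        for h in STATES:
            prior[h] = sum(p[h] for p in posteriors) / len(data)
        for c in range(2):
            for h in STATES:
                counts = {o: 1e-12 for o in STATES}  # tiny pseudo-count keeps values > 0
                for p, obs in zip(posteriors, data):
                    counts[obs[c]] += p[h]
                total = sum(counts.values())
                emission[c][h] = {o: counts[o] / total for o in STATES}
    return prior, emission, history


# Synthetic data: measurements usually, but not always, reflect the hidden state.
gen = random.Random(1)
data = []
for _ in range(300):
    h = gen.choice(STATES)
    cn = h if gen.random() < 0.8 else gen.choice(STATES)
    expr = h if gen.random() < 0.7 else gen.choice(STATES)
    data.append((cn, expr))

prior, emission, history = em_latent_gene_state(data)
print(history[0], history[-1])
```

The main property to check is that the log-likelihood never decreases from one iteration to the next, which is what EM guarantees.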

Expectation Maximization

Figure: EM algorithm

Log-likelihood Ratio Test

• Test statistic for assessing entity i's activity given data D
  – The probabilities can be obtained by performing inference on the factor graph
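The test statistic's formula is not in this transcript. We assume it has the form L(i, a) = log [P(D | x_i = a) / P(D | x_i ≠ a)]. By Bayes' rule, this likelihood ratio equals the posterior odds of x_i = a divided by its prior odds, so it can be computed from two runs of inference on the factor graph: one with the data clamped (posterior marginals) and one without (prior marginals). The function name and the example numbers are hypothetical.

```python
import math


def log_likelihood_ratio(posterior, prior, a):
    """L(i, a) = log P(D | x_i = a) / P(D | x_i != a), via Bayes' rule.

    posterior: P(x_i = v | D) for each state v (inference with data clamped)
    prior:     P(x_i = v) for each state v (inference without data)
    Both probabilities for state a must lie strictly between 0 and 1.
    """
    post_odds = posterior[a] / (1 - posterior[a])
    prior_odds = prior[a] / (1 - prior[a])
    return math.log(post_odds / prior_odds)


prior = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}
posterior = {-1: 0.1, 0: 0.2, 1: 0.7}
print(log_likelihood_ratio(posterior, prior, 1))  # positive: data support activation
```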

Significance assessment

• Permute the labels of the observed data

• "Within" permutation: choose random genes from the same pathway

• "Any" permutation: choose any random genes

• 1000 permutations of each type are used to determine the null distribution
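The two permutation schemes and the resulting empirical p-value can be sketched as follows. Several details here are our assumptions rather than something the slides state: genes are drawn without replacement, the p-value is one-sided with the usual +1 correction, and the pathway score used in the usage example is a toy stand-in for an IPA-based score. All names are hypothetical.

```python
import random


def permuted_observations(observations, pathway_genes, all_genes, mode, rng):
    """Reassign observed data to the pathway's genes from randomly chosen donors.

    observations:  gene -> observed data (e.g. copy number, expression)
    pathway_genes: ordered list of the pathway's genes
    mode "within": donors are drawn from the same pathway
    mode "any":    donors are drawn from all genes
    """
    if mode == "within":
        pool = list(pathway_genes)
    elif mode == "any":
        pool = list(all_genes)
    else:
        raise ValueError("mode must be 'within' or 'any'")
    donors = rng.sample(pool, len(pathway_genes))  # without replacement
    return {gene: observations[donor] for gene, donor in zip(pathway_genes, donors)}


def empirical_p_value(observed_score, null_scores):
    """One-sided empirical p-value with a +1 correction."""
    exceed = sum(1 for s in null_scores if s >= observed_score)
    return (1 + exceed) / (1 + len(null_scores))


def toy_score(obs):
    # Hypothetical stand-in for a pathway score derived from inferred IPAs.
    return sum(obs.values())


rng = random.Random(0)
observations = {"TP53": 2.1, "MDM2": 1.8, "CDKN1A": 1.5, "GAPDH": 0.1, "ACTB": -0.2}
pathway = ["TP53", "MDM2", "CDKN1A"]
null = [toy_score(permuted_observations(observations, pathway, list(observations), "any", rng))
        for _ in range(1000)]
p = empirical_p_value(toy_score({g: observations[g] for g in pathway}), null)
print(p)
```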

Decoy paths

• Create decoy pathways by replacing genes with random genes

• Keep the same structure

• All complexes and abstract processes remain unchanged
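Below is a minimal sketch of decoy construction under the rules above. Every gene entity is swapped for a distinct random gene that is not already in the pathway. Complexes and abstract processes keep their names, and the edge list (including signs) is rewired through the same renaming, so the topology is unchanged. The data layout and function name are hypothetical. The sketch also does not touch the member genes inside complexes, because the slide leaves that detail open.

```python
import random


def make_decoy(entities, edges, genome, rng):
    """Build a decoy pathway with the same wiring as the real one.

    entities: name -> kind ("gene", "complex", "abstract", ...)
    edges:    list of (source, target, sign)
    genome:   candidate replacement genes
    """
    genes = [name for name, kind in entities.items() if kind == "gene"]
    candidates = [g for g in genome if g not in entities]
    replacements = dict(zip(genes, rng.sample(candidates, len(genes))))

    def rename(name):
        # Non-gene entities map to themselves, so they stay unchanged.
        return replacements.get(name, name)

    decoy_entities = {rename(n): kind for n, kind in entities.items()}
    decoy_edges = [(rename(s), rename(t), sign) for s, t, sign in edges]
    return decoy_entities, decoy_edges


entities = {"TP53": "gene", "MDM2": "gene", "apoptosis": "abstract"}
edges = [("MDM2", "TP53", -1), ("TP53", "apoptosis", 1)]
genome = ["G1", "G2", "G3", "TP53", "MDM2"]
decoy_entities, decoy_edges = make_decoy(entities, edges, genome, random.Random(0))
print(decoy_entities, decoy_edges)
```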

Log-likelihood Ratio Test

• Aggregating over the multiple values entity i can take
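As we understand the paper, the three per-state ratios collapse into one integrated pathway activity (IPA). The IPA is L(i, 1) if activation has the largest ratio, −L(i, −1) if deactivation does, and 0 otherwise. The sketch below encodes that rule. Treating ties as 0 follows from the strict inequalities, and the function name is hypothetical.

```python
def integrated_pathway_activity(llr):
    """Collapse one entity's per-state log-likelihood ratios into a signed IPA.

    llr: {-1: L(i,-1), 0: L(i,0), 1: L(i,1)}
    Positive if activation is most supported, negative if deactivation is,
    0 if the normal state wins or there is a tie.
    """
    if llr[1] > llr[-1] and llr[1] > llr[0]:
        return llr[1]
    if llr[-1] > llr[1] and llr[-1] > llr[0]:
        return -llr[-1]
    return 0.0


print(integrated_pathway_activity({-1: -1.0, 0: -0.5, 1: 2.0}))  # 2.0
print(integrated_pathway_activity({-1: 1.5, 0: -0.2, 1: -2.0}))  # -1.5
```

Filling one such value per entity and patient yields the matrix A described on the "Input" slide.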

Dataset

• Breast cancer copy number and gene expression data

• TCGA glioblastoma (GBM) copy number and gene expression data

• Pathways from the NCI Pathway Interaction Database (PID)

Results - breast cancer

• Breast cancer dataset:
  – 56,172 IPAs (7%) found to be significantly higher
  – 497 significant entities per patient on average
  – 103 of 127 pathways had at least one entity altered in 20% or more of the patients

Results - GBM

• GBM dataset:
  – 141,682 IPAs (9%) found to be significantly higher
  – 616 significant entities per patient on average
  – 110 of 127 pathways had at least one entity altered in 20% or more of the patients

EM Convergence

• Original data vs. permuted data

Red: real data; green: permuted data

Results - decoy paths

Distinguishing decoy from real pathways

Figure: PARADIGM vs. SPIA, false-positive rate

Results - decoy paths

• Distinguishing decoy from real pathways
  – Breast cancer AUC: PARADIGM 0.669, SPIA 0.602
  – GBM AUC: PARADIGM 0.642, SPIA 0.604
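An AUC like the ones above measures how well a per-pathway score ranks real pathways above decoys. The sketch uses the Mann-Whitney formulation: the AUC is the probability that a randomly chosen real pathway outscores a randomly chosen decoy, with ties counted as one half. What score each method assigns to a pathway is not specified here, so the inputs in the example are hypothetical.

```python
def auc(real_scores, decoy_scores):
    """Area under the ROC curve via pairwise comparisons (ties count half)."""
    wins = 0.0
    for r in real_scores:
        for d in decoy_scores:
            if r > d:
                wins += 1.0
            elif r == d:
                wins += 0.5
    return wins / (len(real_scores) * len(decoy_scores))


print(auc([3, 4], [1, 2]))  # 1.0: every real pathway outranks every decoy
print(auc([1, 3], [2]))     # 0.5: no better than chance
```

On this scale, 0.5 means chance, so PARADIGM's 0.669 and 0.642 are modest but consistent improvements over SPIA.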

Top PARADIGM Pathways of Breast Cancer

Top PARADIGM Pathways of Glioblastoma

Glioblastoma Subtypes

Survival Rates for Each Subtype

Results - Patient vs permutation

Figure: Patient vs. permuted IPAs

Results - Patient vs permutation

Figure: Patient vs. permuted IPAs. Source: Broad Institute/Dana-Farber Cancer Institute/Harvard Medical School

Summary

• PARADIGM integrates different types of data, including gene expression, copy number variation, and a pathway database, to infer pathway activities for individual cancer patients.
  – A factor graph model represents the pathways and models the datasets
  – Pathway activities inferred by PARADIGM can be used to identify cancer subtypes

Questions

Discussion

• Can the method be successfully extended to more types of observed data?

• Instead of using the pathways as is, can this method be used to find new pathways and interactions?