Inference of patient-specific pathway activities from multi-dimensional cancer genomics data

using PARADIGM. Bioinformatics, 2010

C.J.Vaske et al.

May 22, 2013

Presented by: Rami Eitan

Complex Genomic Rearrangements

• Cancer tissues undergo molecular changes

• Varied genomic data are available
  – copy number variations
  – mutations, gene expression

• Stratification of cancers can improve:
  – diagnosis
  – prognosis
  – risk assessment
  – response to treatment

Complex Genomic Rearrangements

• Genetic alterations differ between patients

• The affected pathways are often shared across patients

Pathways

• What is a pathway?

Figure: The p53 pathway

Pathways

• A set of interactions between entities, logically grouped together around a biological process.

• Entities: protein-coding genes, small molecules, complexes, gene families, abstract processes

• Available databases: Reactome, KEGG, NCI

Motivation

• Integrative analysis of cancer genome data
  – Copy number variations, gene expression

• Leverage pathway information to find frequently occurring pathway perturbations
  – NCI Pathway Interaction Database, KEGG, etc.

Observed Data

Figure: Gene expression

Figure: Copy number

Motivation

• Pathway information describes how genes are expected to behave

Input

• Infer integrated pathway activities (IPAs)

• Produce a matrix A, where A_ij is the inferred activity of entity i in patient j

PARADIGM

Factor graph

• A factor graph is a probabilistic graphical model.

• It consists of variables and factors.

Figure: A simple factor graph
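As a minimal sketch of what a factor graph encodes, the Python below defines two hypothetical binary variables, one unary factor and one pairwise factor, and computes a marginal by brute-force enumeration. The variable names and factor values are made up for illustration; inference on a real pathway would use message passing rather than enumeration.

```python
from itertools import product

# Hypothetical toy factor graph: two binary variables A and B.
variables = {"A": [0, 1], "B": [0, 1]}

def f_a(a):
    # Unary factor: A prefers state 1.
    return 0.7 if a == 1 else 0.3

def f_ab(a, b):
    # Pairwise factor: A and B prefer to agree.
    return 0.9 if a == b else 0.1

def joint_unnormalized(assign):
    # The joint distribution is proportional to the product of all factors.
    return f_a(assign["A"]) * f_ab(assign["A"], assign["B"])

def marginal(name):
    """Marginal of one variable, summing the factor product over all assignments."""
    names = list(variables)
    totals = {v: 0.0 for v in variables[name]}
    for values in product(*(variables[n] for n in names)):
        assign = dict(zip(names, values))
        totals[assign[name]] += joint_unnormalized(assign)
    z = sum(totals.values())  # normalization constant
    return {v: t / z for v, t in totals.items()}

marg_b = marginal("B")
```

Enumeration grows exponentially with the number of variables, which is why it only serves as an illustration here.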

PARADIGM Model

• Factor graph representation of the various entities corresponding to a single gene

PARADIGM Model: Gene Interactions

PARADIGM Model:

• A factor graph for a pathway

Model Specification

• Convert an NCI pathway into a factor graph
  – NCI pathway to Bayesian network

• Directed network

• Each variable takes values of -1 (deactivation), 0 (normal), 1 (activation)
  – mRNA: overexpression for activation
  – Copy number variations: more than two copies for activation

• Probability distribution of each node
  – Labeled edges for positive/negative interactions
  – Set the value of the child node as weighted votes from its parents
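The weighted-vote rule can be sketched as follows. This is an illustrative reading of the bullet above, not the published implementation: the function name, the equal default weights, and the choice to return 0 on a tie are all assumptions.

```python
def expected_state(parent_states, edge_signs, weights=None):
    """Return -1, 0, or 1 for a child node from the signed, weighted votes of its parents.

    parent_states: list of parent values in {-1, 0, 1}
    edge_signs:    list of +1 (positive interaction) or -1 (negative interaction)
    weights:       optional per-edge vote weights (assumed equal if omitted)
    """
    if weights is None:
        weights = [1.0] * len(parent_states)
    vote = sum(w * s * x for w, s, x in zip(weights, edge_signs, parent_states))
    if vote > 0:
        return 1
    if vote < 0:
        return -1
    return 0  # tie or no active parents: assumed to mean "normal"

# Two activated parents outvote one deactivated parent on positive edges.
example = expected_state([1, 1, -1], [1, 1, 1])
```

An activated parent on a negative edge votes for deactivation, so `expected_state([1], [-1])` yields -1.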

Model Specification

• Converting the Bayesian network to a factor graph
  – Assign a factor to each group of variables consisting of a node and its parents

• Z: normalization constant

• ε = 0.001
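A small sketch of how Z and ε could enter the model, under two assumptions that are not spelled out in the text above: the joint probability is the product of all factors divided by Z, and each factor gives weight 1 - ε to the state voted by the parents and ε/2 to each of the two other states. The two-node pathway and all function names are hypothetical.

```python
from itertools import product

EPSILON = 0.001  # value given on the slide
STATES = (-1, 0, 1)

def sign(v):
    return (v > 0) - (v < 0)

def factor(child, parents, edge_signs, eps=EPSILON):
    """Assumed form: 1 - eps if the child equals the state voted by its parents, eps / 2 otherwise."""
    expected = sign(sum(s * p for s, p in zip(edge_signs, parents)))
    return 1.0 - eps if child == expected else eps / 2.0

def unnormalized(a, b):
    # Hypothetical two-node pathway: A activates B; A has no parents (flat factor of 1).
    return 1.0 * factor(b, [a], [+1])

# Z sums the factor product over every joint assignment.
Z = sum(unnormalized(a, b) for a, b in product(STATES, repeat=2))

def joint(a, b):
    return unnormalized(a, b) / Z
```

With a small ε, assignments that violate the pathway logic keep a nonzero but very low probability, so noisy data cannot make the model inconsistent.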

Inference

• Observed variables: copy number variations, gene expression

• Unobserved variables: protein, protein activity, overall pathway activity state

• Learn model parameters with the EM algorithm
  – E step: infer the probabilities of the unobserved variables
  – M step: change the parameters to maximize the likelihood given those probabilities
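The E and M steps can be illustrated on a deliberately simplified model: one hidden three-state variable per sample with a fixed uniform prior, and two discretized observations (expression and copy number) that are independent given the hidden state. This is a sketch under those assumptions, not PARADIGM's factor graph; all names are hypothetical.

```python
import math

STATES = (-1, 0, 1)
PRIOR = {h: 1.0 / 3.0 for h in STATES}  # fixed; uniform prior is an assumption

def init_emission(match=0.6):
    # emission[h][d] = P(observed d | hidden h); start biased toward agreement.
    other = (1.0 - match) / 2.0
    return {h: {d: (match if d == h else other) for d in STATES} for h in STATES}

def e_step(samples, em_expr, em_cnv):
    """Posterior over the hidden state for each (expression, copy number) sample."""
    posteriors, loglik = [], 0.0
    for expr, cnv in samples:
        joint = {h: PRIOR[h] * em_expr[h][expr] * em_cnv[h][cnv] for h in STATES}
        total = sum(joint.values())
        loglik += math.log(total)
        posteriors.append({h: joint[h] / total for h in STATES})
    return posteriors, loglik

def m_step(samples, posteriors, column):
    """Re-estimate one emission table from expected counts."""
    table = {h: {d: 0.0 for d in STATES} for h in STATES}
    for sample, post in zip(samples, posteriors):
        for h in STATES:
            table[h][sample[column]] += post[h]
    for h in STATES:
        norm = sum(table[h].values())
        for d in STATES:
            table[h][d] /= norm
    return table

def run_em(samples, iterations=20):
    em_expr, em_cnv = init_emission(), init_emission()
    history = []  # log-likelihood before each parameter update
    for _ in range(iterations):
        posteriors, loglik = e_step(samples, em_expr, em_cnv)
        history.append(loglik)
        em_expr = m_step(samples, posteriors, 0)
        em_cnv = m_step(samples, posteriors, 1)
    return em_expr, em_cnv, history
```

Each iteration cannot decrease the data log-likelihood, so `history` is a convenient convergence check.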

Expectation Maximization

Figure: EM algorithm

Log-likelihood Ratio Test

• Test statistic for assessing entity i’s activity given data D
  – The probabilities can be obtained by performing inference on the factor graph
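One way to obtain such a statistic from inference output, assuming it is the log ratio of P(D | x_i = a) to P(D | x_i ≠ a), is to apply Bayes' rule: that ratio equals the posterior odds of state a divided by its prior odds. The sketch below takes both distributions as plain dictionaries; the function name and the input layout are hypothetical.

```python
import math

def log_likelihood_ratio(posterior, prior, state):
    """log of P(D | x = state) / P(D | x != state), computed via Bayes' rule.

    posterior: dict state -> P(x = state | D), from inference with the data attached
    prior:     dict state -> P(x = state), from inference without the data
    Both probabilities for `state` must lie strictly between 0 and 1.
    """
    post_odds = posterior[state] / (1.0 - posterior[state])
    prior_odds = prior[state] / (1.0 - prior[state])
    return math.log(post_odds) - math.log(prior_odds)

# Hypothetical inference output for one entity in one patient.
posterior = {-1: 0.1, 0: 0.2, 1: 0.7}
prior = {-1: 1.0 / 3.0, 0: 1.0 / 3.0, 1: 1.0 / 3.0}
llr_active = log_likelihood_ratio(posterior, prior, 1)
```

A positive value means the data make the state more likely than it was a priori; a value of zero means the data are uninformative about that state.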

Significance assessment

• Permute the labels of the observed data

• ’Within’ permutation: choosing random genes from the same pathway

• ’Any’ permutation: choosing any random genes

• 1000 permutations of each type are used to determine the null distribution
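The two permutation schemes can be sketched as below. The data layout (a dict from gene to observed value), the use of sampling with replacement, and the +1 correction in the p-value are assumptions made for illustration.

```python
import random

def permuted_samples(gene_data, pathway_genes, mode, n_perm=1000, seed=0):
    """Yield permuted data for one pathway.

    gene_data:     dict gene -> observed value (hypothetical layout)
    pathway_genes: genes belonging to the pathway
    mode:          'within' draws data from genes of the same pathway,
                   'any' draws data from all genes in the data set
    """
    rng = random.Random(seed)
    pool = list(pathway_genes) if mode == "within" else list(gene_data)
    for _ in range(n_perm):
        # Each pathway gene receives the data of a randomly chosen gene from the pool.
        yield {g: gene_data[rng.choice(pool)] for g in pathway_genes}

def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as the observed one (with +1 correction)."""
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)
```

In use, each permuted sample would be scored with the same inference procedure as the real data, and the resulting scores passed to `empirical_p_value` as the null distribution.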

Decoy paths

• Create decoy pathways by replacing genes with random genes

• Maintain the same structure

• All complexes and abstract processes remain the same
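Decoy construction as described above might look like this. The pathway layout (a dict of typed nodes plus signed edges) is hypothetical, and drawing replacements only from genes outside the pathway is an assumption.

```python
import random

def make_decoy(pathway, all_genes, seed=0):
    """Return a decoy with the same edges, where each gene is swapped for a random gene.

    pathway:   dict with 'nodes' (name -> type) and 'edges' (list of (src, dst, sign))
    all_genes: pool of gene names to draw replacements from; it must contain at least
               as many genes outside the pathway as the pathway has genes
    """
    rng = random.Random(seed)
    genes = [n for n, kind in pathway["nodes"].items() if kind == "gene"]
    candidates = [g for g in all_genes if g not in pathway["nodes"]]
    replacements = dict(zip(genes, rng.sample(candidates, len(genes))))

    def rename(node):
        # Complexes and abstract processes are not in `replacements` and keep their names.
        return replacements.get(node, node)

    return {
        "nodes": {rename(n): kind for n, kind in pathway["nodes"].items()},
        "edges": [(rename(s), rename(d), sign) for s, d, sign in pathway["edges"]],
    }
```

Because only node names change, the decoy has the same topology and edge signs as the real pathway, so any difference in score comes from the gene assignment alone.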

Log-likelihood Ratio Test

• Aggregating over the multiple values entity i can take
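A possible aggregation rule, assumed here for illustration: report the activation score with a positive sign when activation scores highest, the deactivation score with a negative sign when deactivation scores highest, and zero otherwise. The function name and input layout are hypothetical.

```python
def integrated_pathway_activity(llr):
    """Collapse per-state log-likelihood ratios into one signed score.

    llr: dict mapping each state (-1, 0, 1) to its log-likelihood ratio.
    """
    up, normal, down = llr[1], llr[0], llr[-1]
    if up > down and up > normal:
        return up       # activation is the best-supported state
    if down > up and down > normal:
        return -down    # deactivation is the best-supported state
    return 0.0          # normal state wins, or there is a tie

ipa_example = integrated_pathway_activity({1: 2.0, 0: 0.5, -1: -1.0})
```

Applying this to every entity i and patient j fills a signed activity matrix with one row per entity and one column per patient.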

Dataset

• Breast cancer copy number and gene expression data

• TCGA Glioblastoma copy number and gene expression data

• Pathways from the NCI Pathway Interaction Database (PID)

Results - breast cancer

• Breast cancer dataset:
  – 56172 IPAs (7%) found to be significantly higher
  – 497 significant entities per patient on average
  – 103 out of 127 pathways had at least one entity altered in 20% or more of the patients

Results - GBM

• GBM dataset:
  – 141682 IPAs (9%) found to be significantly higher
  – 616 significant entities per patient on average
  – 110 out of 127 pathways had at least one entity altered in 20% or more of the patients

EM Convergence

• Original data vs. permuted data

Red: real data. Green: permuted data.

Results - decoy paths

Distinguishing decoy from real pathways

Figure: PARADIGM vs. SPIA: false positive rate

Results - decoy paths

• Distinguishing decoy from real pathways

• Breast cancer AUC:
  – PARADIGM: 0.669
  – SPIA: 0.602

• GBM AUC:
  – PARADIGM: 0.642
  – SPIA: 0.604
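For reference, an AUC of this kind can be computed directly from per-pathway scores: it is the probability that a randomly chosen real pathway scores higher than a randomly chosen decoy. The sketch below assumes each pathway has already been reduced to a single score; how that score is defined is not specified here.

```python
def auc(real_scores, decoy_scores):
    """Probability that a random real pathway outscores a random decoy (ties count half)."""
    wins = 0.0
    for r in real_scores:
        for d in decoy_scores:
            if r > d:
                wins += 1.0
            elif r == d:
                wins += 0.5
    return wins / (len(real_scores) * len(decoy_scores))

# Hypothetical scores: three of the four real/decoy pairs are ranked correctly.
auc_example = auc([0.9, 0.4], [0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of real pathways from decoys.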

Top PARADIGM Pathways of Breast Cancer

Top PARADIGM Pathways of Glioblastoma

Glioblastoma Subtypes

Survival Rates for Each Subtype

Results - Patient vs permutation

Figure: Patient vs. permuted IPAs

Results - Patient vs permutation

Figure: Patient vs. permuted IPAs. Source: Broad Institute/Dana-Farber Cancer Institute/Harvard Medical School

Summary

• PARADIGM integrates different types of data, including gene expression, copy number variation, and pathway databases, in order to infer pathway activities for individual cancer patients.
  – Factor graph model for representing pathways and modeling datasets
  – Pathway activities inferred by PARADIGM can be used to identify cancer subtypes

Questions

Discussion

• Can the method be successfully expanded to more types of observed data?

• Instead of using the pathways as is, can this method be used to find new pathways and interactions?
