Keynote lecture for NetBio SIG 2012
Predicting the impact of mutations using pathway-guided integrative genomics
Network Biology SIG, ISMB 2012
Josh Stuart, UC Santa Cruz
July 12, 2012
Overview of pathway-guided approach
Integrate many data sources to gain accurate view of how genes are functioning in pathways
Predict the functional consequences of mutations by quantifying the effect on the surrounding pathway
Use pathway signatures to implicate mutations in novel genes to (re-)focus targeting
Identify critical “Achilles Heels” in the pathways that distinguish a particular sub-type
Flood of Data Analysis Challenges
Expression
DNA Methylation
Structural Variation
Exome Sequences
Copy Number Alterations
Multiple, Possibly Conflicting Signals
This is What it Does to You
Genomics, Functional Genomics, Metabolomics, Epigenomics =
Analysis of disease samples like automotive repair
(or detective work or other sleuthing)
Patient Sample 1, Patient Sample 2, Patient Sample 3, …, Patient Sample N
Sleuths use as much knowledge as possible.
Much Cell Machinery Known: Gene circuitry now available.
Curated and/or Collected:
Reactome
KEGG
Biocarta
NCI-PID
Pathway Commons
…
Integration key to interpret gene function
Expression not always an indicator of activity
Downstream effects often provide clues
TF, inference: TF is OFF (high expression but inactive)
TF, inference: TF is ON (low expression but active)
TF, inference: TF is ON (expression reflects activity)
Expression of 3 transcription factors:
high high low
Integration key to interpret gene function
Need multiple data modalities to get it right.
TF
Expression -> TF ON
BUT, targets are amplified
Lowers our belief in active TF because explained away by cis evidence.
Copy Number -> TF OFF
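The "explained away" reasoning on this slide can be illustrated with a tiny hand-built Bayesian network. This is only a sketch: the three binary variables, the noisy-OR form, and every probability below are assumptions chosen for illustration, not values from the talk or from PARADIGM.

```python
from itertools import product

# All numbers are illustrative assumptions, not values from the talk.
P_TF_ON = 0.5       # prior belief that the TF is active
P_AMPLIFIED = 0.2   # prior belief that the targets are amplified (cis evidence)
LEAK = 0.05         # chance of high target expression with no known cause
P_TF_CAUSE = 0.8    # chance an active TF alone drives high target expression
P_AMP_CAUSE = 0.8   # chance amplification alone drives high target expression


def p_high_expression(tf_on, amplified):
    """Noisy-OR: targets are highly expressed if any cause fires."""
    p_not_high = 1.0 - LEAK
    if tf_on:
        p_not_high *= 1.0 - P_TF_CAUSE
    if amplified:
        p_not_high *= 1.0 - P_AMP_CAUSE
    return 1.0 - p_not_high


def posterior_tf_on(amplified=None):
    """P(TF on | targets highly expressed), optionally also given copy number."""
    numerator = denominator = 0.0
    for tf_on, amp in product([False, True], repeat=2):
        if amplified is not None and amp != amplified:
            continue  # inconsistent with the observed copy-number state
        joint = (
            (P_TF_ON if tf_on else 1.0 - P_TF_ON)
            * (P_AMPLIFIED if amp else 1.0 - P_AMPLIFIED)
            * p_high_expression(tf_on, amp)
        )
        denominator += joint
        if tf_on:
            numerator += joint
    return numerator / denominator


expression_only = posterior_tf_on()
with_amplification = posterior_tf_on(amplified=True)
print(f"P(TF on | high targets)            = {expression_only:.3f}")
print(f"P(TF on | high targets, amplified) = {with_amplification:.3f}")
```

Observing the amplification lowers the posterior that the TF is active, because the copy-number gain already accounts for the high target expression.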
Probabilistic Graphical Models: A Language for Integrative Genomics
Generalize HMMs, Kalman Filters, Regression, Boolean Nets, etc.
Language of probability ties together multiple aspects of gene function & regulation
Enable data-driven discovery of biological mechanisms
Foundation: J. Pearl, D. Heckerman, E. Horvitz, G. Cooper, R. Shachter, D. Koller, N. Friedman, M. Jordan, …
Bioinformatics: D. Pe’er, A. Hartemink, E. Segal, E. Schadt, …
Nir Friedman, Science (2004) - Review
Integration Approach: Detailed models of gene expression and interaction
MDM2
TP53
Two Parts:
1. Gene Level Model (central dogma)
2. Interaction Model (regulation)
PARADIGM Gene Model to Integrate Data
Vaske et al. 2010. Bioinformatics
1. Central Dogma-Like Gene Model of Activity
2. Interactions that connect to specific points in gene regulation map
Charlie Vaske, Steve Benz
Integrated Pathway Analysis for Cancer
Integrated dataset for downstream analysis
Inferred activities reflect neighborhood of influence around a gene.
Can boost signal for survival analysis and mutation impact
Multimodal Data
CNV
mRNA
meth
Pathway Model of Cancer
Cohort Inferred Activities
…
TCGA Ovarian Cancer Inferred Pathway Activities
Heatmap axes: Patient Samples (247), Pathway Concepts (867)
TCGA Network. 2011. Nature (led by Paul Spellman)
Ovarian: FOXM1 pathway altered in majority of serous ovarian tumors
FOXM1 Transcription Network
Heatmap axes: Patient Samples (247), Pathway Concepts (867)
TCGA Network. 2011. Nature (led by Paul Spellman)
FOXM1 central to cross-talk between DNA repair and cell proliferation in Ovarian Cancer
TCGA Network. 2011. Nature (led by Paul Spellman)
Pathway signatures of mutations to reveal therapeutic candidates
Mutated genes are the focus of many targeted approaches.
Some patients with “right” mutation don’t respond. Why?
Many cancers have one of several “novel” mutations. Can these be targeted with current approaches?
Pathway-motivated approaches:
Identify gain-of-function from loss-of-function.
Compare novel signatures
Sam Ng
ISMB Oral, Poster
PARADIGM-Shift Predicting the Impact of Mutations On Genetic Pathways
FG
Inference using all neighbors
FG
Inference using downstream neighbors
FG
Inference using upstream neighbors
SHIFT
Sam Ng, ECCB 2012
High Inferred Activity
Low Inferred Activity
mutated gene
FG
PARADIGM-Shift Overview
Sam Ng, ECCB 2012
1. Identify Local Neighborhood of the focus gene (FG)
2a. Regulators Run
2b. Targets Run
3. Calculate Difference between the two FG inferences: P-Shift Score (example shown: LOF)
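The difference step in the overview can be sketched as below. The sign convention (targets run minus regulators run, so a negative score reads as loss-of-function) is an assumption, chosen because the RB1 loss-of-function example later in the talk has a negative signal score; the `threshold` parameter and the function names are hypothetical.

```python
def p_shift(targets_run, regulators_run):
    """Difference between the two inferred focus-gene activities.

    Assumed convention: activity inferred from downstream targets minus
    activity inferred from upstream regulators.
    """
    return targets_run - regulators_run


def call_impact(score, threshold=1.0):
    """Label a shift score; the threshold is a hypothetical cut-off."""
    if score <= -threshold:
        return "LOF"
    if score >= threshold:
        return "GOF"
    return "neutral"


# Toy inferred activities for one focus gene in three samples:
# (targets run, regulators run)
samples = {
    "sample_1": (-1.5, 0.5),   # targets say "off", regulators say "on"
    "sample_2": (1.2, -0.3),   # targets say "on", regulators say "off"
    "sample_3": (0.1, 0.0),    # both runs agree
}
for name, (t_run, r_run) in samples.items():
    score = p_shift(t_run, r_run)
    print(name, round(score, 2), call_impact(score))
```

A mutated gene whose regulators predict activity but whose targets behave as if it were silent gets a strongly negative score under this convention.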
P-Shift Predicts RB1 Loss-of-Function in GBM
Shift Score
PARADIGM downstream
PARADIGM upstream
Expression
Mutation
RB1
Sam Ng, ECCB 2012
RB1 Network (GBM)
Neighbor Gene Key
Activity
Expression
RB1 Mutation
Focus Gene Key
P-Shift
Expression
Mutation
T-Run
R-Run
Sam Ng, ECCB 2012
RB1 Discrepancy Scores distinguish mutated vs non-mutated samples
Signal Score (t-statistic) = -5.78
Sam Ng
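A signal score of this kind can be computed as a two-sample t-statistic over per-sample shift scores. The slide does not say which variant was used, so the sketch below assumes Welch's unequal-variance form; the input numbers are made up.

```python
from math import sqrt
from statistics import mean, variance


def signal_score(mutated_scores, non_mutated_scores):
    """Welch t-statistic (assumed variant): mutated minus non-mutated."""
    se = sqrt(
        variance(mutated_scores) / len(mutated_scores)
        + variance(non_mutated_scores) / len(non_mutated_scores)
    )
    return (mean(mutated_scores) - mean(non_mutated_scores)) / se


# Made-up shift scores: mutated samples shifted towards loss-of-function.
mutated = [-1.2, -0.8, -1.0]
non_mutated = [0.1, -0.1, 0.0, 0.2]
print(signal_score(mutated, non_mutated))
```

A negative value means the mutated samples have lower shift scores than the non-mutated ones, which is the direction reported for RB1.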
RB1 discrepancy distinction is significant
Given the same network topology, how likely would we call a gain/loss of function?
Background model: permute gene labels in our dataset
Compare observed signal score to signal scores (SS) obtained from background model
Observed SS
Background SS
Sam Ng
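The background model can be sketched as a label-permutation test. The real method would re-run the pathway inference for every permutation; here a toy shift score (mean of target genes minus mean of regulator genes) stands in for it, and all function names, the Welch t-statistic, and the p-value formula are assumptions.

```python
import random
from math import sqrt
from statistics import mean, variance


def t_stat(a, b):
    """Welch t-statistic (assumed variant)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def signal_score(data, regulators, targets, mutated):
    """Toy stand-in for the shift: targets mean minus regulators mean per sample."""
    n = len(mutated)
    shift = [
        mean(data[g][i] for g in targets) - mean(data[g][i] for g in regulators)
        for i in range(n)
    ]
    mut = [s for s, m in zip(shift, mutated) if m]
    wt = [s for s, m in zip(shift, mutated) if not m]
    return t_stat(mut, wt)


def background_scores(data, regulators, targets, mutated, n_perm=200, seed=0):
    """Permute gene labels (same topology, shuffled data) and re-score."""
    rng = random.Random(seed)
    genes = list(data)
    scores = []
    for _ in range(n_perm):
        shuffled = genes[:]
        rng.shuffle(shuffled)
        permuted = {g: data[s] for g, s in zip(genes, shuffled)}
        scores.append(signal_score(permuted, regulators, targets, mutated))
    return scores


def empirical_p(observed, background):
    """Fraction of background scores at least as extreme as the observed one."""
    hits = sum(abs(b) >= abs(observed) for b in background)
    return (hits + 1) / (len(background) + 1)
```

Calling `empirical_p(signal_score(...), background_scores(...))` with the same regulators, targets, and mutation flags compares the observed SS against the permuted distribution.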
TP53 Network
Sam Ng
Gain-of-Function (LUSC)
NFE2L2
P-Shift Score
PARADIGM downstream
PARADIGM upstream
Expression
Mutation
Sam Ng
NFE2L2 Network (LUSC)
Sam Ng
Neighbor Gene Key
Activity
Expression
RB1 Mutation
Focus Gene Key
P-Shift
Expression
Mutation
T-Run
R-Run
NFE2L2
Discrepancy scores are sensitive
RB1: Signal Score (t-statistic) = -5.78
NFE2L2: Signal Score (t-statistic) = 4.985
TP53: Signal Score (t-statistic) = -10.94
Observed SS
Background SS
Sam Ng
Expect passenger mutations to lack shifts
Is the discrepancy specific?
Negative control: calculate scores for “passenger” mutations
Passengers: insignificant by MutSig (p > 0.10), well-represented in our pathways
Discrepancy of these “neutral” mutations should be close to what’s expected by chance (from permuted)
Sam Ng
Discrepancies of Passenger Mutations are NOT distinctive
Sam Ng
PARADIGM-Shift gives orthogonal view of the importance of mutations in LUSC
Enables probing into infrequent events
Can detect non-coding mutation impact (pseudo FPs)
Can detect presence of pathway compensation for those seemingly functional mutations (pseudo FPs)
Extend beyond mutations
Limited to genes w/ pathway representation
Figure axis: Pathway Discrepancy (LUSC)
NFE2L2 (n=29)
CDKN2A (n=30)
MET (n=7) (gefitinib resistance)
HIF3A (n=7)
TBC1D4 (n=9) (AKT signaling)
MAP2K6 (n=5)
EIF4G1 (n=20)
GLI2 (n=10) (SHH signaling)
AR (n=8)
Sam Ng
PARADIGM in TCGA patient BRCA tumors
Christina Yau, Buck Inst.
PATHMARK: Identify Pathway-based “markers” that underlie sub-types
Identify sub-pathways that distinguish patient sub-types (e.g. mutant vs. non-mutant, response to drug, etc.)
Predict mutation impact on pathway “neighborhood”
Identify master control points for drug targeting.
Predict outcomes with quantitative simulations.
Insight from contrast
Ted Goldstein, Sam Ng
PathMark: Differential Subnetworks from a “SuperPathway”
Pathway Activities
Pathway Activities
Ted Goldstein Sam Ng
PathMark: Differential Subnetworks from a “SuperPathway”
SuperPathway Activities
SuperPathway Activities
Pathway Signature
Ted Goldstein Sam Ng
PIL: Pathway-informed Learning
Traditional methods treat each gene as a separate feature
Use features reflecting overall pathway activity
A smaller number of features is now fed to the predictor
Artem Sokolov
ISMB poster
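The move from per-gene features to pathway-level features can be sketched as follows. The talk does not spell out how pathway activity is summarised for the predictor, so the per-pathway mean used here is an assumed stand-in, and the gene and pathway names are hypothetical.

```python
from statistics import mean


def pathway_features(expression, pathways):
    """Collapse one sample's gene-level values into one value per pathway.

    expression: dict gene -> measured value for one sample
    pathways:   dict pathway name -> list of member genes
    The mean is an assumed summary, not the method from the talk.
    """
    features = {}
    for name, genes in pathways.items():
        measured = [expression[g] for g in genes if g in expression]
        if measured:  # skip pathways with no measured member genes
            features[name] = mean(measured)
    return features


# Hypothetical sample and pathway definitions.
sample = {"GENE_A": 1.0, "GENE_B": 3.0, "GENE_C": 5.0}
pathways = {
    "pathway_1": ["GENE_A", "GENE_B"],
    "pathway_2": ["GENE_C", "GENE_X"],   # GENE_X was not measured
    "pathway_3": ["GENE_Y"],             # no measured members, dropped
}
print(pathway_features(sample, pathways))
```

The predictor then sees one feature per pathway rather than one per gene, which is the reduction in feature count the slide refers to.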
Basal vs. Luminal
Recursive feature elimination: we train an SVM, drop the least important half of the features, and recurse
Shown: the number of times each feature survived the elimination across 100 random splits of the data
Artem Sokolov
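The halving-and-counting procedure can be sketched without external libraries. Note the swap: instead of an SVM, this sketch ranks features by a simple standardized class-mean difference, so the recursion is only structural here (an SVM would be re-trained on the surviving features each round). Function names, the `keep` parameter, and the 80/20 split are assumptions.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def importance(X, y, j):
    """Stand-in for |SVM weight|: standardized class-mean difference of feature j."""
    pos = [row[j] for row, label in zip(X, y) if label == 1]
    neg = [row[j] for row, label in zip(X, y) if label == 0]
    spread = pstdev(pos + neg) or 1.0
    return abs(mean(pos) - mean(neg)) / spread


def rfe_halving(X, y, keep=2):
    """Drop the least important half of the features until `keep` remain."""
    features = list(range(len(X[0])))
    while len(features) > keep:
        ranked = sorted(features, key=lambda j: importance(X, y, j), reverse=True)
        features = ranked[: max(keep, len(features) // 2)]
    return set(features)


def survival_counts(X, y, n_splits=100, train_frac=0.8, seed=0):
    """Count how often each feature survives elimination across random splits."""
    rng = random.Random(seed)
    counts = Counter()
    indices = list(range(len(X)))
    for _ in range(n_splits):
        train = rng.sample(indices, int(train_frac * len(indices)))
        if len({y[i] for i in train}) < 2:
            continue  # need both classes to score features
        counts.update(rfe_halving([X[i] for i in train], [y[i] for i in train]))
    return counts
```

Features with counts near `n_splits` are the ones that survive regardless of which samples land in the training split.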
Methotrexate Sensitivity
Non sub-type specific drug
Artem Sokolov
Pathway involving the target of the drug.
One large highly-connected component (size and connectivity significant according to permutation analysis)
Higher activity in ER-
Lower activity in ER-
Triple Negative Breast Pathway Markers Identified from 50 Cell Lines
980 pathway concepts, 1048 interactions
Characterized by several “hubs”:
HIF1A/ARNT
IL23/JAK2/TYK2
Myc/Max
P53 tetramer
ER
FOXA1
Sam Ng, Ted Goldstein
Identify master controllers using SPIA (signaling pathway impact analysis)
Impact factor:
Google PageRank for Networks
Determines effect of a given pathway on each node
Calculates perturbation factor for each node in the network
Takes into account regulatory logic of interactions.
Yulia Newton (NetBio SIG Poster)
Google’s PageRank-Like
Slight Trick: Run SPIA in reverse
Reverse edges in Super Pathway
High scoring genes now those at the “top” of the pathway
PageRank finds highly referenced
Reverse to find highly referencing
Yulia Newton
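The reversal trick can be illustrated with plain PageRank on a toy regulatory graph. This is not SPIA: it ignores the regulatory logic (activation vs. inhibition) and any expression evidence, and the node names, damping factor, and iteration count are assumptions.

```python
def pagerank(edges, damping=0.85, iters=100):
    """Power-iteration PageRank on a directed edge list (source, target)."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for m in out[n]:
                    new[m] += share
            else:  # dangling node: spread its rank uniformly
                share = damping * rank[n] / len(nodes)
                for m in nodes:
                    new[m] += share
        rank = new
    return rank


def reverse(edges):
    """Flip every edge so rank flows from targets back to regulators."""
    return [(dst, src) for src, dst in edges]


# Hypothetical pathway: edges point from regulator to target.
pathway = [
    ("MASTER", "TF1"), ("MASTER", "TF2"),
    ("TF1", "GENE_A"), ("TF2", "GENE_A"), ("TF1", "GENE_B"),
]
forward = pagerank(pathway)
backward = pagerank(reverse(pathway))
print(max(forward, key=forward.get))    # most "referenced" node
print(max(backward, key=backward.get))  # most "referencing" node
```

On the forward graph the top node is a heavily regulated target; on the reversed graph the top node is the upstream regulator, which is the sense in which reversed scoring surfaces genes at the "top" of the pathway.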
Master Controller Analysis on Breast Cell Lines
Basal, Luminal
Yulia Newton
Master regulators predict response to drugs:
PLK3 predicted as a target for basal breast
Up, Down
• DNA damage network is upregulated in basal breast cancers
• Basal breast cancers are sensitive to PLK inhibitors
Basal
Claudin-low
Luminal
GSK-PLKi
Heiser et al. 2011 PNAS
Ng, Goldstein
• HDAC Network is down-regulated in basal breast cancer cell lines
• Basal/CL breast cancers are resistant to HDAC inhibitors
VORINOSTAT (HDAC inhibitor)
HDAC inhibitors predicted for luminal breast
Heiser et al. 2011 PNAS Ng, Goldstein
Connect genomic alterations to downstream expression/activity
• What circuitry connects mutations to transcriptional changes?
– Mutations -> general (epi-)genomic perturbation
– Expression -> activity
• Mutation/perturbation and expression/activity treated as heat diffusing on a network
– HotNet, Vandin F, Upfal E, B.J. Raphael, 2008.
– HotNet used in ovarian to implicate Notch pathway
• Find subnetworks that link genetic to mRNA and protein-level changes.
?
Evan Paull
ISMB Oral, Poster
HotLink
Genomic Perturbations (Mutations, Methylation, Focal Copy Number)
Gene Activity (Expression, RPPA, PARADIGM)
1. Add heat 2. Diffuse heat 3. Cut out linkers
Linking Sub-Pathway
Evan Paull
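The three steps (add heat, diffuse heat, cut out linkers) can be sketched on a toy network. This is an assumed stand-in rather than the HotLink algorithm itself: diffusion is a random walk with restart, heat is diffused once from the perturbed genes and once from the active genes, and a node's linker score is the smaller of its two heats. All names and parameters are hypothetical.

```python
def build_neighbors(edges):
    """Undirected adjacency sets from an edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def diffuse(neighbors, heat, restart=0.5, iters=50):
    """Random walk with restart: heat leaks to neighbors, sources are replenished."""
    current = {n: heat.get(n, 0.0) for n in neighbors}
    for _ in range(iters):
        nxt = {n: restart * heat.get(n, 0.0) for n in neighbors}
        for n, nbrs in neighbors.items():
            share = (1.0 - restart) * current[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        current = nxt
    return current


def linker_ranking(edges, perturbed, active):
    """Rank non-seed nodes by heat received from BOTH ends of the network."""
    neighbors = build_neighbors(edges)
    from_perturbed = diffuse(neighbors, {g: 1.0 for g in perturbed})  # 1. add, 2. diffuse
    from_active = diffuse(neighbors, {g: 1.0 for g in active})
    score = {
        n: min(from_perturbed[n], from_active[n])
        for n in neighbors
        if n not in perturbed and n not in active
    }
    return sorted(score, key=score.get, reverse=True)  # 3. cut out top linkers


# Hypothetical network: MUT - L1 - L2 - ACT is the linking path, X is a side branch.
edges = [("MUT", "X"), ("MUT", "L1"), ("L1", "L2"), ("L2", "ACT")]
print(linker_ranking(edges, perturbed={"MUT"}, active={"ACT"}))
```

Nodes on the path between the perturbed and the active gene rank above the side branch, because only they receive appreciable heat from both diffusions.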
HotLink “Double Diffusion” Overview
HotLink Causal Paths
Significance of Interlinking Network
Double diffusion better than single
Basal-LumA HotLink
Basal LumA
TCGA Interlinking Network
Basal-LumA HotLink Map
MAP, PI3K, AKT
TP53, RB1
MYC, FOXM1
Basal LumA
AKT/PI3K
MAPK8, MAPK14 (p38-alpha) identified as mediators.
TP53, RB1
Basal LumA
MYC Neighborhood
Basal LumA
Double feedback loop involving TP53, RB1, CDK4, FOXM1, and MYC
AKT signaling
Basal LumA
Defining Pathway Signatures for Mutations and Sub-Types
Build a signature for every mutation and tumor/clinical event.
Correlate every signature to each other.
Reveals common molecular similarities between different divisions of patient subgroups
Mutations in novel genes may “phenocopy” mutations in known genes
Ted Goldstein
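Correlating every signature with every other one can be sketched as an all-pairs Pearson correlation. The talk does not name the correlation measure, so Pearson is an assumption, and the signature names and values below are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlate_signatures(signatures):
    """All-pairs correlation; keys are name pairs in sorted order."""
    return {
        (a, b): pearson(signatures[a], signatures[b])
        for a, b in combinations(sorted(signatures), 2)
    }


# Made-up signatures: one value per pathway concept.
signatures = {
    "KNOWN_GENE_mut": [1.0, 2.0, 3.0, 4.0],
    "NOVEL_GENE_mut": [2.0, 4.0, 6.0, 8.0],
    "SUBTYPE_X": [4.0, 3.0, 2.0, 1.0],
}
for pair, r in correlate_signatures(signatures).items():
    print(pair, round(r, 3))
```

A novel-gene signature that correlates strongly with a known gene's signature is a candidate "phenocopy" in the sense used on the slide.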
Mutation Association to Pathways
What pathway activities is a mutation’s presence associated with?
Can we classify mutations based on these associations?
(Note: CRC figure below; soon for BRCA)
Figure axes: Mutations, PARADIGM Signatures
Highlighted across the slide builds:
APC and other Wnt
TGFB Pathway mutations
PIK3CA, RTK pathway, KRAS
Evidence for AHNAK2 acting PIK3CA-like?
Ted Goldstein
Summary
• Modeling information flow on known pathways gives view of gene activity.
• Patient stratification into pathway-based subtypes
• Sub-networks provide pathway-based signatures of sub-types and mutations.
• Loss- and gain-of-function predicted from pathway neighbors for even rare mutations.
• Identify interlinking genes associated with mutations to implicate additional targets even in LOF cases
• E.g. Target MYC-related pathways in certain TP53-deficient cells?
• Current work: Use pathways to now simulate the effects of specific knock-downs.
UCSC Integrative Genomics Group
James Durbin
Chris Szeto
Sam Ng
Ted Goldstein
Dan Carlin
Evan Paull
Artem Sokolov
Chris Wong
Marcos Woehrmann
Yulia Newton
Please See Posters!
Acknowledgments
UCSC Cancer Genomics
• Kyle Ellrott
• Brian Craft
• Chris Wilks
• Amie Radenbaugh
• Mia Grifford
• Sofie Salama
• Steve Benz
UCSC Genome Browser Staff
• Mark Diekhans
• Melissa Cline
• Jorge Garcia
• Erich Weiler
Buck Institute for Aging
• Christina Yau
• Sean Mooney
• Janita Thusberg
Collaborators
• Joe Gray, LBL
• Laura Heiser, LBL
• Eric Collisson, UCSF
• Nuria Lopez-Bigas, UPF
• Abel Gonzalez, UPF
Funding Agencies
• NCI/NIH
• SU2C
• NHGRI
• AACR
• UCSF Comprehensive Cancer Center
• QB3
Broad Institute
• Gaddy Getz
• Mike Noble
• Daniel DeCara
Jing Zhu
David Haussler
Chris Benz
UCSC Cancer Browser: genome-cancer.ucsc.edu
supplemental
325 Genes, 2213 Interactions
“Backbone” of 43 genes, 90 connections
Master regulators: Cell-Cell Signaling, AKT1, Cyclin D1
Major PARADIGM hubs included: MYC, FOXM1, FOXA1, HIF1A/ARNT
Signaling through beta-catenin explains MYC activity in basals:
- deletions in CDKN2A de-repress CTNNB1 in basals, or
- lower expression of Cyclin D1 de-represses CTNNB1
Top 20 basal master controllers
Yulia Newton
RNAi vs. Master Controller (after recurring runs)
Yulia Newton
RPS6KA3 - p90 S6 kinase
High-scoring after iterative runs.
Basals differentially sensitive to RNAi
Inhibitors available.
Figure axes: Basal vs Luminal RNAi Growth, Master Controller Score
Labeled genes: RPS6KA3, AKT1, RAF1, PDPK1, AKT2