reduced google matrix approach for exploring biological networks · 2018-10-23 · reduced google...
TRANSCRIPT
Andrei Zinovyev
Institut Curie - INSERM U900 - PSL Research University / Mines ParisTech
Computational Systems Biology of Cancer
Reduced Google Matrix
approach for exploring
biological networks
Biological networks
• Representation of cellular biochemistry at various levels of
granularity
RECON2.2 – close to complete
reconstruction of human
metabolic network (>10k reactions)
Mathematical modeling of large networks
of chemical reactions for biology?
• Chemical kinetics formalism
• Lack of quantitative parameters
• Flux balance analysis – just stoichiometry
• Modularisation and abstraction
• Approaches based on “thermodynamic”
thinking – extracting macrovariables (e.g.,
formalism of invariant manifolds)
• Approaches based on assumptions on
parameter distribution: asymptotology of
chemical reaction networks (Gorban,
Radulescu, Zinovyev, Chem Eng Sci, 2009)
Cell metabolism
(including DNA
metabolism)
Signaling networks
(curation, e.g. SIGNOR
or ACSN)
Transcriptional
networks (in principle measurable)
“Influence” networks
“Interaction” networks
From Wikipedia
Atlas of Cancer Signaling Network: http://acsn.curie.fr (Kuperstein et al, Oncogenesis, 2015)
Google Maps API
Inna Kuperstein
Using ACSN map for visualization of
expression data
Patchy coloring
difficult to interpret
Smooth coloring
easily interpretable
Data
“Smoothing”
• Network defines functional proximity between genes/proteins:
– in standard approach the functional distance is binary (in the same pathway or
not) or discrete (number of steps)
– similarity function on the network graph:
how easy to get from A to B without knowing the map
• «Guilty by association» principle
– Determining «active network regions»
– «Neighbourhood» gene/protein sets
• Network propagation
– Propagation of «influence»:
local and distant
– Undirected and directed networks:
analogy with heat diffusion or
random walks on graphs
Use of large biological networks
in biological data analysis
Initial perturbation Stationary state
Network propagation ends up in a smooth score
distribution function defined on the interaction graph
[Figure: example graph (nodes A to K) shown twice, with a smooth and a non-smooth score distribution; color scale from high score to low score]
Network smoothing or spectral graph analysis
(Fourier transformation on graphs) (Rapaport, Zinovyev, Barillot, Vert, BMC Bioinformatics, 2007)
Function on graph = slow, smooth component + fast, high-frequency component
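A minimal sketch of this decomposition, assuming the graph Fourier basis is given by the eigenvectors of the combinatorial Laplacian L = D - A. The toy path graph, the node values f, and the cutoff k are hypothetical and are not taken from the slides.

```python
import numpy as np

# Hypothetical toy graph (not from the slides): a path 0-1-2-3-4-5.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Combinatorial graph Laplacian L = D - A.
L = np.diag(A.sum(axis=1)) - A

# Eigenvectors of L act as Fourier modes on the graph.
# np.linalg.eigh returns eigenvalues in ascending order,
# so the first columns are the slowest (smoothest) modes.
eigvals, eigvecs = np.linalg.eigh(L)

# Arbitrary function defined on the nodes (assumption).
f = np.array([1.0, 0.8, 1.1, 0.2, 0.4, 0.1])

coeffs = eigvecs.T @ f          # graph Fourier transform of f
k = 2                           # number of low-frequency modes kept (assumption)
smooth = eigvecs[:, :k] @ coeffs[:k]   # slow, smooth component
rough = eigvecs[:, k:] @ coeffs[k:]    # fast, high-frequency component
```

The two components add back up to f, and the quadratic form x @ L @ x (a standard roughness measure on graphs) is never larger for the smooth component than for f itself.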
Franck Rapaport
Classifier smooth on biological graph (Rapaport et al., BMC Bioinformatics, 2007)
[Figure: samples with 200 Gy irradiation and with no irradiation, at time points 0 h, 3 h, 5 h and 7 h. Panels: “classical” SVM; SVM done in the reduced subspace of smooth functions (first 20% of Laplacian eigenvalues)]
Data from
Marie Dutreix
(Institut Curie)
DeDaL: Cytoscape plugin for constructing
data-driven network layouts http://bioinfo-out.curie.fr/projects/dedal/, Czerwinska et al, BMC Sys Biol, 2015
Tissue-specific gene
expression data +
Network smoothing +
Non-linear dimension
reduction (manifold learning)
Urszula Czerwinska
Network diffusion
vs random walk with restart
[Figure: the example graph (nodes A to K) shown in the initial state and in the stationary state, for (simple) diffusion and for random walk with restart (parameter a)]
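The contrast between the two schemes can be sketched as follows, assuming a column-normalized transition matrix W and a restart rule x <- a W x + (1 - a) x0. The toy graph, the value a = 0.7, and the function names are hypothetical and are not taken from the slides.

```python
import numpy as np

# Hypothetical toy undirected graph (not from the slides):
# a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Column-stochastic transition matrix: W[i, j] = probability of a step j -> i.
W = A / A.sum(axis=0)

def diffuse(x0, steps=2000):
    """Simple diffusion: repeated random-walk steps, no memory of x0."""
    x = x0.copy()
    for _ in range(steps):
        x = W @ x
    return x

def rwr(x0, a=0.7, steps=2000):
    """Random walk with restart: with probability 1 - a, jump back to x0."""
    x = x0.copy()
    for _ in range(steps):
        x = a * (W @ x) + (1.0 - a) * x0
    return x

e0 = np.eye(n)[0]   # perturbation starting at node 0
e4 = np.eye(n)[4]   # perturbation starting at node 4
```

On this graph, simple diffusion reaches the same stationary state (proportional to node degree) from either starting node, whereas the stationary state of the random walk with restart still depends on where the perturbation started.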
Random walk with restart (RWR): Google matrix
[Figure: example directed graph with nodes A to K]
G_ij = a S_ij + (1 - a)/N
G X = λ X
PageRank = “stationary” eigenvector corresponding to λ = 1,
probability of visiting the node after infinite time
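A minimal sketch of the Google matrix and of PageRank computed by power iteration. It assumes S is the column-normalized adjacency matrix with the columns of dangling nodes replaced by 1/N. The toy graph, the value a = 0.85 (a conventional choice, not stated on the slide), and the function names are hypothetical.

```python
import numpy as np

def google_matrix(A, a=0.85):
    """Build G = a*S + (1 - a)/N, where A[i, j] = 1 means a link j -> i."""
    n = A.shape[0]
    out_degree = A.sum(axis=0)
    safe = np.where(out_degree > 0, out_degree, 1.0)
    # Column-stochastic S; columns of dangling nodes (no outgoing links)
    # are replaced by the uniform value 1/N.
    S = np.where(out_degree > 0, A / safe, 1.0 / n)
    return a * S + (1.0 - a) / n

def pagerank(G, tol=1e-12, max_iter=10_000):
    """Power iteration towards the eigenvector of G with eigenvalue 1."""
    n = G.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_next = G @ p
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical 4-node directed graph: 0->1, 0->2, 1->2, 2->0, 2->3.
# Node 3 has no outgoing links (dangling node).
A = np.zeros((4, 4))
for src, dst in [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]:
    A[dst, src] = 1.0

G = google_matrix(A)
p = pagerank(G)
```

Every column of G sums to 1, so the iteration preserves total probability and p can be read as the long-time probability of finding the random surfer on each node.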
Analysing somatic mutations in cancer
TCGA datasets
Problem of mutation data analysis
(predicting survival)
[Figure labels: random prediction; patients; genes]
Problem of mutation data analysis
(clustering with NMF)
The role of the total number of mutations
(mutational load)
Overlap between
mutations is very small
Tumors are very
different in mutational load
NSQN method (Network Smoothing + Quantile
Normalization) (Hofree et al, 2013)
[Figure, two slides: example graph (nodes A to K) with two nodes marked as mutated]
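A minimal sketch of the two NSQN steps, assuming that smoothing is the stationary state of a random walk with restart applied to each patient's binary mutation profile, and that quantile normalization maps every patient onto the same reference distribution. The toy network, the mutation matrix, the value a = 0.5, and the function names are hypothetical; this is not the implementation of Hofree et al.

```python
import numpy as np

def network_smooth(M, A, a=0.5):
    """Smooth each row of M (patients x genes) over the network A.

    Uses the closed-form stationary state of x <- a W x + (1 - a) x0.
    """
    n = A.shape[0]
    degree = A.sum(axis=0)
    W = A / np.where(degree > 0, degree, 1.0)   # column-normalized adjacency
    K = (1.0 - a) * np.linalg.inv(np.eye(n) - a * W)
    return M @ K.T

def quantile_normalize(X):
    """Give every row of X the same value distribution (ties broken by order)."""
    ranks = np.argsort(np.argsort(X, axis=1), axis=1)
    reference = np.sort(X, axis=1).mean(axis=0)  # mean of the sorted rows
    return reference[ranks]

# Hypothetical gene network with 6 genes and no isolated nodes.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]
n_genes = 6
A = np.zeros((n_genes, n_genes))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Two hypothetical patients: one mutation versus three mutations.
M = np.array([[1, 0, 0, 0, 0, 0],
              [0, 1, 0, 1, 1, 0]], dtype=float)

smoothed = network_smooth(M, A)
nsqn = quantile_normalize(smoothed)
```

In this sketch the smoothed profiles still sum to the original number of mutations per patient, so the difference in mutational load survives smoothing; it is the quantile normalization step that puts the patients on a common scale.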
Testing NSQN for survival prediction in cancer
(Le Morvan, Zinovyev, Vert, PLoS Comp Biol, 2017)
• TCGA data on 8 cancer types (LUAD, SKCM, GBM, BRCA, KIRC,
HNSC, LUSC, OV)
• Benefit from NSQN only for 2 cancer types (LUAD, SKCM)
• Quantile normalization is an essential step! (NS = NSQN without QN)
• Considering first neighbours (SimpNSQN=NSQN with k=1) is enough!
[Figure reference level: random prediction]
Marine Le Morvan
Instead of network propagation + QN:
NetNorm, equilibrating the number of mutations to k
Hubs mark mutated
“functions”
Orphan nodes
are most probably
passenger mutations
NetNorm
normalizes the mutation
matrix using
“guilty by association”
principle
k=4
proxy
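A rough sketch of the equilibration idea described on this slide, under stated assumptions: patients with more than k mutations lose the mutations located in the least connected genes, and patients with fewer than k mutations receive “proxy” mutations in the non-mutated genes that have the most mutated neighbours. The function name netnorm_like, the tie-breaking rule, and the toy network are hypothetical; the published method is available at https://github.com/marineLM/NetNorM.

```python
import numpy as np

def netnorm_like(m, A, k):
    """Return a binary profile with exactly k mutations (assumes k <= number of genes).

    m: binary mutation vector of one patient; A: symmetric 0/1 adjacency matrix.
    """
    m = np.asarray(m).astype(int).copy()
    degree = A.sum(axis=0)
    n_mut = int(m.sum())
    if n_mut > k:
        # Remove mutations from the mutated genes with the lowest degree.
        mutated = np.flatnonzero(m)
        order = np.argsort(degree[mutated], kind="stable")
        m[mutated[order[: n_mut - k]]] = 0
    elif n_mut < k:
        # Add proxy mutations to non-mutated genes with most mutated neighbours.
        unmutated = np.flatnonzero(m == 0)
        mutated_neighbours = A[unmutated] @ m
        order = np.argsort(-mutated_neighbours, kind="stable")
        m[unmutated[order[: k - n_mut]]] = 1
    return m

# Hypothetical network: hub 0 linked to 1, 2, 3, then a chain 3-4-5.
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

few = netnorm_like([0, 1, 1, 0, 0, 0], A, k=3)    # 2 mutations -> 3
many = netnorm_like([1, 1, 0, 1, 1, 1], A, k=3)   # 5 mutations -> 3
```

In the first call the hub (gene 0) becomes the proxy mutation because two of its neighbours are mutated; in the second call the two least connected mutated genes (1 and 5) are dropped.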
NetNorm performs better than NSQN
(when both work)
[Figure panels: using only mutations; adding clinical info. Reference level: random prediction]
Does the network structure really play a role?
NetNorm benefits more from the real network structure than NSQN
[Figure reference level: random prediction]
Application of Google Matrix approach
to signalling network (SIGNOR) (Lages, Shepelyansky, Zinovyev, PLoS One, 2018)
3k nodes, 7k edges
Reduced Google matrix (refer to Frahm & Shepelyansky, arXiv, 2016)
[Figure: example graph (nodes A to K) with direct links and indirect “hidden” links (> 0.01)]
Dima Shepelyansky
José Lages
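A minimal sketch of the reduced Google matrix for a chosen subset of nodes, assuming the block formula G_R = G_rr + G_rs (1 - G_ss)^(-1) G_sr, where r indexes the selected nodes and s the rest of the network. The random toy network, the value a = 0.85, and the function names are hypothetical. The further split of the indirect term into a projector part and a “hidden” part, as done in the cited work, is not reproduced here.

```python
import numpy as np

def google_matrix(A, a=0.85):
    """G = a*S + (1 - a)/N, where A[i, j] = 1 means a link j -> i."""
    n = A.shape[0]
    out_degree = A.sum(axis=0)
    safe = np.where(out_degree > 0, out_degree, 1.0)
    S = np.where(out_degree > 0, A / safe, 1.0 / n)
    return a * S + (1.0 - a) / n

def stationary(G):
    """Eigenvector of G for its largest eigenvalue, normalized to sum 1."""
    w, v = np.linalg.eig(G)
    p = np.real(v[:, np.argmax(np.real(w))])
    return p / p.sum()

def reduced_google_matrix(G, reduced):
    """Return (G_R, direct part G_rr, total indirect part) for the given nodes."""
    n = G.shape[0]
    r = np.asarray(reduced)
    s = np.setdiff1d(np.arange(n), r)
    G_rr = G[np.ix_(r, r)]
    G_rs = G[np.ix_(r, s)]
    G_sr = G[np.ix_(s, r)]
    G_ss = G[np.ix_(s, s)]
    # All paths that leave the subset, wander through the rest, and come back.
    indirect = G_rs @ np.linalg.solve(np.eye(len(s)) - G_ss, G_sr)
    return G_rr + indirect, G_rr, indirect

# Hypothetical random directed network with 30 nodes.
rng = np.random.default_rng(0)
n = 30
A = (rng.random((n, n)) < 0.1).astype(float)
np.fill_diagonal(A, 0.0)

G = google_matrix(A)
reduced = [0, 1, 2, 3, 4]
G_R, G_direct, G_indirect = reduced_google_matrix(G, reduced)
p_global = stationary(G)
p_reduced = stationary(G_R)
```

G_R is again column-stochastic, and its stationary vector matches the global PageRank restricted to the selected nodes (after renormalization), which is why the reduced matrix can be read as an effective network for the subset.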
Type of data: cell-specific
transcriptional regulation network
direct
indirect “hidden”
Signaling network (SN)
(SIGNOR database)
Transcription regulation
network (TRN), reconstructed
from systematic Chip-Seq
experiments
Reduced Google matrix analytically
quantifies the global effect of
transcriptional feedback
Comparing two TRN networks:
e.g., “normal” vs “cancer” (Lages, Shepelyansky, Zinovyev, PLoS One, 2018)
[Figure: direct and indirect “hidden” links in TRN1 (“normal”, normal B-lymphocytes) and TRN2 (“cancer”, leukemia cell line); annotations: “PageRank goes down”, “PageRank improves”, “indirect signaling rewiring”]
Change of PageRank and CheiRank in cancer
CheiRank changes were 3 times larger and more
biologically interpretable (they affect the genes associated with leukemia)
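For readers unfamiliar with CheiRank: it is commonly defined as the PageRank of the same network with all link directions inverted, so it rewards nodes with many outgoing links. A minimal sketch under that definition follows; the toy graph, the value a = 0.85, and the function names are hypothetical.

```python
import numpy as np

def google_matrix(A, a=0.85):
    """G = a*S + (1 - a)/N, where A[i, j] = 1 means a link j -> i."""
    n = A.shape[0]
    out_degree = A.sum(axis=0)
    safe = np.where(out_degree > 0, out_degree, 1.0)
    S = np.where(out_degree > 0, A / safe, 1.0 / n)
    return a * S + (1.0 - a) / n

def stationary(G, n_iter=1000):
    """Power iteration for the eigenvector with eigenvalue 1."""
    p = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(n_iter):
        p = G @ p
    return p

# Hypothetical graph: node 0 sends links to 1, 2, 3;
# nodes 1, 2, 3 all point to node 4; node 4 points back to 0.
links = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 4), (3, 4), (4, 0)]
A = np.zeros((5, 5))
for src, dst in links:
    A[dst, src] = 1.0

pagerank = stationary(google_matrix(A))      # follows links forward
cheirank = stationary(google_matrix(A.T))    # follows inverted links
```

Here node 4, which collects incoming links, has the highest PageRank, while node 0, which sends the most outgoing links, has the highest CheiRank.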
Genes of a proliferative signature
resulting from pan-cancer transcriptomic analysis
More genes are connected into the network
Emergence of a new “hidden” hub BUB1
Connection to PCNA (DNA replication and DNA repair)
Many cell cycle proteins improve in PageRank (AURK)
Connection between STIL (mitotic spindle checkpoint regulator) and CCNA2, CCNE1
WikiProteins project: studying the protein
network embedded in Wikipedia
A sample of Wikipedia
network of pages at
three steps from
“Transhumanism” article
(5k nodes, 23k links)
What is transhumanism?
http://allthingsgraphed.com/2015/09/16/what-is-transhumanism-wikipedia/
“Semantic field” of Wikipedia
Direct links
Inferring directed network of proteins from
Wikipedia using reduced Google Matrix (ongoing work…)
~10000 wiki pages devoted to proteins
5000 proteins with described interactions, 16000 (2013) and
18000 (2017) direct connections
Wiki proteins hairball
The rest of the Wikipedia
defines a context of hyperlinks
(model of external world)
This network is embedded
in the global Wikipedia network
Reduced Google matrix
allows finding “hidden”
functional interactions between
proteins
Comparing protein network of direct links and network
of hidden links (same density) in Wikipedia (2013)
[Figure panels: direct links; hidden links; connectivity distribution for direct links; connectivity distribution for hidden links]
“Hidden” protein communities (2013)
Clustering the hidden network with MCL algorithm
Community labels: Immune system; Cell cycle; Glucagon metabolism; GTPase signaling; Potassium ion transport; Transcription factors; Apoptosis and inflammation; Coagulation; Keratins; Peroxisome; Nuclear receptors; Hormone receptors
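A compact sketch of the Markov Cluster (MCL) idea mentioned above: alternate expansion (squaring the random-walk matrix) and inflation (elementwise power followed by column renormalization) until flow concentrates inside clusters. The added self-loops, the inflation value 2.0, the iteration count, the threshold, and the toy graph are assumptions of this sketch, not the settings used for the Wikipedia protein network.

```python
import numpy as np

def mcl(A, inflation=2.0, n_iter=50, threshold=1e-6):
    """Return clusters as sorted lists of node indices."""
    n = A.shape[0]
    M = A + np.eye(n)           # self-loops, a common MCL convention
    M = M / M.sum(axis=0)       # column-stochastic random-walk matrix
    for _ in range(n_iter):
        M = M @ M               # expansion: two-step random walk
        M = M ** inflation      # inflation: strengthen strong flows
        M = M / M.sum(axis=0)   # renormalize columns
    # Each row that still carries flow defines a cluster: the columns it attracts.
    clusters = {frozenset(np.flatnonzero(row > threshold).tolist()) for row in M}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)

# Hypothetical graph: two triangles joined by a single bridge edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

clusters = mcl(A)
```

A higher inflation value generally yields smaller, more numerous clusters, which is the main knob to tune when applying this kind of clustering to a dense “hairball” network.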
“Hidden” protein communities (2017)
Community labels: Immune system; Cell cycle (G2M); ?; Apoptosis; Potassium ion transport; GTPase signaling; Coagulation; SH2/3 signaling; DNA repair; NFkB; Cell-cell junctions
Dynamics of the largest hidden
communities from 2013 to 2017 (20 largest)
[Figure axes: 2013 communities; 2017 communities]
Example: Coagulation-related community
Highest local PageRank: Thrombin
Thrombin
Antithrombin
Factor XII
Factor VIII
Transthyretin
Factor VII
Factor IX
Protein S
Protein C
Factor X
Factor V
P-selectin glycoprotein ligand-1
ADAMTS13
Tissue factor
Osteocalcin
Heparin cofactor II
Gamma-glutamyl carboxylase
Annexin A5
Apolipoprotein H
ITGA2B
GAS6
Fibrinogen alpha chain
Fibrinogen beta chain
Matrix gla protein
Carboxypeptidase B2
ITIH2
FGL2
ADAM22
Liver
Fresh frozen plasma
Fimbrin
SOS1
Epigen
CDC42
RAC1
RHOA
PAK1
Rac3
Kalirin
RAC2
ARHGEF7
WNK1
RhoG
Rnd1
ANLN
RALB
Dock2
Synergin gamma
EXOC7
Rnd3
RhoD
Rnd2
RhoH
RAPGEF2
Dock7
Dock4
RCC2
FMNL2
Dock3
SRGAP2
Birth of a new super-large hidden
community in 2017
Myoglobin
Hidden network
Direct interactions explaining
hidden connections
Conclusions
• Network propagation is a powerful tool for the joint analysis of
molecular biology (medical) data and biological networks
• First-neighbourhood relations seem to be sufficient in practical
applications
• The Google Matrix approach highlights creative elements and detects
indirect rewiring events
• Wikiprotein hidden communities give an idea of the dynamic
hotspots of interest in molecular biology
References
1. Rapaport et al., BMC Bioinformatics 8:35, 2007.
2. Czerwinska et al., BMC Systems Biology 9:46, 2015.
3. Lages et al., PLoS One 13(1):e0190812, 2018.
4. Le Morvan et al., PLoS Comp Biol 13(6):e1005573, 2017.
https://github.com/marineLM/NetNorM
http://bioinfo-out.curie.fr/projects/dedal
Acknowledgements
Mutation data
analysis
Data-driven
network layouts
ACSN
Network smoothing
Funding from Agilent Thought Leader Award
CNRS ApliGoogle project
ACI IMPBIO KernelChip
Jean-Philippe Vert
Ecole de Mines
Emmanuel Barillot
Institut Curie
Franck Rapaport
Memorial Sloan Kettering
Laurence Calzone Urszula Czerwinska
Institut Curie
Marine Le Morvan
Ecole de Mines
Inna Kuperstein
Institut Curie
Reduced Google matrix
Dima
Shepelyansky
Université Paul
Sabatier
José Lages
Institut UTINAM
Klaus Frahm
Université Paul
Sabatier