reduced google matrix approach for exploring biological networks · 2018-10-23 · reduced google...

Andrei Zinovyev

Institut Curie - INSERM U900 - PSL Research University / Mines ParisTech

Computational Systems Biology of Cancer

Reduced Google Matrix approach for exploring biological networks

Biological networks

• Representation of cellular biochemistry at various levels of granularity

RECON2.2 – close-to-complete reconstruction of the human metabolic network (>10k reactions)

Mathematical modeling of large networks of chemical reactions for biology?

• Chemical kinetics formalism

• Lack of quantitative parameters

• Flux balance analysis – just stoichiometry

• Modularisation and abstraction

• Approaches based on “thermodynamic” thinking – extracting macrovariables (e.g., formalism of invariant manifolds)

• Approaches based on assumptions about the parameter distribution: asymptotology of chemical reaction networks (Gorban, Radulescu, Zinovyev, Chem Eng Sci, 2009)

Cell metabolism (including DNA metabolism)

Signaling networks (curation, e.g. SIGNOR or ACSN)

Transcriptional networks (in principle measurable)


“Influence” networks vs “Interaction” networks

From Wikipedia

Atlas of Cancer Signaling Network: http://acsn.curie.fr (Kuperstein et al, Oncogenesis, 2015)

Google Maps API

Inna Kuperstein


Using the ACSN map to visualize expression data

Patchy coloring: difficult to interpret

Smooth coloring: easily interpretable

Data “smoothing”

• Network defines functional proximity between genes/proteins:

– in the standard approach, functional distance is binary (in the same pathway or not) or discrete (number of steps)

– similarity function on the network graph: how easy it is to get from A to B without knowing the map

• «Guilt by association» principle

– Determining «active network regions»

– «Neighbourhood» gene/protein sets

• Network propagation

– Propagation of «influence»: local and distant

– Undirected and directed networks: analogy with heat diffusion or random walks on graphs

Use of large biological networks in biological data analysis
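The heat-diffusion analogy for network propagation can be sketched in a few lines. This is a minimal sketch, assuming an undirected network given as a symmetric NumPy adjacency matrix and using the combinatorial Laplacian L = D - A (the slides do not say which Laplacian is used); the four-node path graph and the function names are hypothetical.

```python
import numpy as np

def laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial graph Laplacian L = D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj

def heat_diffusion(adj: np.ndarray, x0: np.ndarray, t: float) -> np.ndarray:
    """Diffuse an initial score vector x0 for time t: x(t) = exp(-t L) x0."""
    lam, vecs = np.linalg.eigh(laplacian(adj))  # L is symmetric, so eigh is valid
    kernel = vecs @ np.diag(np.exp(-t * lam)) @ vecs.T
    return kernel @ x0

# Hypothetical toy network: path A - B - C - D, initial perturbation on A
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x0 = np.array([1.0, 0.0, 0.0, 0.0])

for t in (0.0, 1.0, 100.0):
    print(t, np.round(heat_diffusion(adj, x0, t), 3))
```

The diffusion time t acts as the local-versus-distant knob: for small t the score stays near A and decays along the path, while for large t it spreads evenly over the connected component.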

Initial perturbation / Stationary state

Network propagation results in a smooth score distribution, i.e. a function defined on the nodes of the interaction graph

[Figure: example graph with nodes A to K; smooth vs non-smooth score distribution, colored from high score to low score]

Network smoothing, or spectral graph analysis (Fourier transform on graphs) (Rapaport, Zinovyev, Barillot, Vert, BMC Bioinformatics, 2007)

Function on graph = slow, smooth component + fast, high-frequency component
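The decomposition into a smooth and a high-frequency component can be illustrated with the graph Fourier transform. This is a sketch under stated assumptions: the eigenvectors of the combinatorial Laplacian serve as the Fourier basis, and "smooth" means the span of the eigenvectors with the smallest eigenvalues. The cycle graph, the score vector and the helper names are hypothetical.

```python
import numpy as np

def graph_fourier_split(adj: np.ndarray, f: np.ndarray, n_smooth: int):
    """Split a node function f into a smooth part (projection on the n_smooth
    Laplacian eigenvectors with the smallest eigenvalues) and a high-frequency rest."""
    L = np.diag(adj.sum(axis=1)) - adj
    lam, U = np.linalg.eigh(L)            # eigenvalues returned in ascending order
    coeffs = U.T @ f                       # graph Fourier coefficients of f
    smooth = U[:, :n_smooth] @ coeffs[:n_smooth]
    rough = U[:, n_smooth:] @ coeffs[n_smooth:]
    return smooth, rough

def dirichlet_energy(adj: np.ndarray, f: np.ndarray) -> float:
    """f^T L f = sum over edges (i, j) of (f_i - f_j)^2; low values mean smooth."""
    L = np.diag(adj.sum(axis=1)) - adj
    return float(f @ L @ f)

# Hypothetical toy example: a 6-node cycle and a noisy score vector
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
f = np.array([1.0, 0.8, 0.9, -0.2, 0.1, -0.1])
smooth, rough = graph_fourier_split(adj, f, n_smooth=3)
print(dirichlet_energy(adj, f), dirichlet_energy(adj, smooth))
```

Since the two parts live in orthogonal Laplacian eigenspaces, the energy of f is the sum of their energies, so the smooth part is never rougher than f itself.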

Franck Rapaport

Classifier smooth on the biological graph (Rapaport et al., BMC Bioinformatics, 2007)

[Figure: samples after 200 Gy irradiation and without irradiation, at time points 0 h, 3 h, 5 h and 7 h]

“Classical” SVM vs SVM trained in the reduced subspace of smooth functions (first 20% of Laplacian eigenvalues)
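The idea of training in the subspace of smooth functions can be sketched as follows. This is not the published method. It makes three assumptions: "first 20% of Laplacian eigenvalues" is read as the eigenvectors for the smallest 20% of eigenvalues; a closed-form ridge least-squares classifier without intercept stands in for the SVM; and the data, graph and function names are hypothetical.

```python
import numpy as np

def smooth_basis(adj: np.ndarray, fraction: float = 0.2) -> np.ndarray:
    """Laplacian eigenvectors for the smallest `fraction` of eigenvalues."""
    L = np.diag(adj.sum(axis=1)) - adj
    lam, U = np.linalg.eigh(L)                    # ascending eigenvalues
    k = max(1, int(round(fraction * len(lam))))
    return U[:, :k]

def fit_smooth_linear_classifier(X, y, adj, fraction=0.2, ridge=1.0):
    """Ridge least-squares classifier (a stand-in for the SVM) trained on
    expression profiles projected onto the smooth subspace.
    X: samples x genes, y: labels in {-1, +1}. Returns gene-space weights."""
    U = smooth_basis(adj, fraction)
    Z = X @ U                                      # samples x k reduced features
    k = Z.shape[1]
    beta = np.linalg.solve(Z.T @ Z + ridge * np.eye(k), Z.T @ y)
    return U @ beta                                # weights = smooth function on the graph

# Hypothetical toy data: 10 genes on a path graph, 30 random samples
rng = np.random.default_rng(0)
n_genes = 10
adj = np.zeros((n_genes, n_genes))
for i in range(n_genes - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
X = rng.normal(size=(30, n_genes))
y = np.where(X[:, :5].sum(axis=1) > 0, 1.0, -1.0)   # label driven by one region of the graph
w = fit_smooth_linear_classifier(X, y, adj)
predictions = np.sign(X @ w)
```

Because the weights are built from smooth eigenvectors, neighbouring genes get similar weights. This is what makes such a classifier easier to read on the network than a "classical" one.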

Data from Marie Dutreix (Institut Curie)

DeDaL: Cytoscape plugin for constructing data-driven network layouts, http://bioinfo-out.curie.fr/projects/dedal/ (Czerwinska et al., BMC Sys Biol, 2015)

Tissue-specific gene expression data + network smoothing + non-linear dimension reduction (manifold learning)

Urszula Czerwinska


Network diffusion vs random walk with restart

(Simple) diffusion / Random walk with restart: initial state and stationary state

[Figure: graph with nodes A to K in the initial state and in the stationary state, for simple diffusion and for random walk with restart]
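The difference between the two propagation schemes shows up in their stationary states. In this minimal sketch, simple diffusion is modelled as a lazy random walk, so it does not oscillate on bipartite graphs. The restart probability 1 - alpha with alpha = 0.7 is an assumed value, and the toy graph is hypothetical.

```python
import numpy as np

def transition_matrix(adj: np.ndarray) -> np.ndarray:
    """Column-stochastic random-walk matrix: W[i, j] = probability of moving j -> i."""
    return adj / adj.sum(axis=0, keepdims=True)

def simple_diffusion(W, p0, n_steps=1000):
    """Lazy random walk (stay put with probability 1/2, which avoids oscillations);
    its stationary state does not depend on p0."""
    p = p0.copy()
    for _ in range(n_steps):
        p = 0.5 * p + 0.5 * (W @ p)
    return p

def random_walk_with_restart(W, p0, alpha=0.7, n_steps=1000):
    """Follow a link with probability alpha, jump back to p0 with probability 1 - alpha."""
    p = p0.copy()
    for _ in range(n_steps):
        p = alpha * (W @ p) + (1 - alpha) * p0
    return p

# Hypothetical toy network: triangle A-B-C with a tail C-D-E
adj = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0
W = transition_matrix(adj)
start_a, start_e = np.eye(5)[0], np.eye(5)[4]
print(simple_diffusion(W, start_a), simple_diffusion(W, start_e))
print(random_walk_with_restart(W, start_a), random_walk_with_restart(W, start_e))
```

Simple diffusion forgets the initial perturbation: both starts converge to the degree-proportional distribution. Random walk with restart keeps its stationary state centred on the perturbed node, which is why it is the scheme used to smooth patient-specific data.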

Random walk with restart (RWR): Google matrix


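$G_{ij} = \alpha S_{ij} + (1-\alpha)/N$, where $S$ is the column-stochastic random-walk matrix, $\alpha$ is the probability of following a link ($1-\alpha$ is the restart probability) and $N$ is the number of nodes

$G X = \lambda X$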

PageRank = the “stationary” eigenvector corresponding to $\lambda = 1$: the probability of visiting the node after infinite time
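The Google matrix and its PageRank vector can be computed by power iteration. This is a minimal sketch. It assumes a directed edge list (source, target), the common damping value alpha = 0.85, and the usual convention that nodes without out-links jump uniformly. The toy network and helper names are hypothetical.

```python
import numpy as np

def google_matrix(edges, n, alpha=0.85):
    """G_ij = alpha * S_ij + (1 - alpha) / N for a directed edge list [(src, dst), ...].
    S is column-stochastic; columns of nodes without out-links are set to 1/N."""
    A = np.zeros((n, n))
    for src, dst in edges:
        A[dst, src] = 1.0                      # column = source node, row = target node
    out_deg = A.sum(axis=0)
    S = np.where(out_deg > 0, A / np.maximum(out_deg, 1.0), 1.0 / n)
    return alpha * S + (1 - alpha) / n

def pagerank(G, tol=1e-12, max_iter=1000):
    """Power iteration for the eigenvector of G with eigenvalue 1, normalised to sum 1."""
    p = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(max_iter):
        p_next = G @ p
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical toy network: nodes 1, 2, 3 all point to node 0, and 0 points to 1
G = google_matrix([(1, 0), (2, 0), (3, 0), (0, 1)], n=4)
p = pagerank(G)
print(np.round(p, 3))
```

Power iteration converges at a rate set by alpha, because every eigenvalue of G other than 1 has modulus at most alpha. This is why it scales to large networks where a full eigendecomposition would not.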

Analysing somatic mutations in cancer: TCGA datasets

Problem of mutation data analysis (predicting survival)

[Figure: patients × genes mutation matrix; prediction performance compared with random prediction]

Problem of mutation data analysis (clustering with NMF)

The role of the total number of mutations (mutational load)

Overlap between the mutations of different tumors is very small

Tumors differ greatly in mutational load

NSQN method (Network Smoothing + Quantile Normalization) (Hofree et al., 2013)

[Figure: graph with nodes A to K, two of which are mutated]
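The two NSQN steps, network smoothing followed by quantile normalization across patients, can be sketched as below. This is a sketch under assumptions, not the published code. Smoothing is written as a random walk with restart on the row-normalised adjacency matrix with an assumed alpha = 0.7, quantile ties are broken arbitrarily, and the toy data and function names are hypothetical.

```python
import numpy as np

def network_smooth(M, adj, alpha=0.7, n_iter=200):
    """Propagate each patient's binary mutation profile (rows of M) over the gene
    network with a random walk with restart: F <- alpha * F P + (1 - alpha) * M,
    where P is the row-normalised adjacency matrix."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)  # guard: isolated genes lose their mass
    P = adj / deg
    F = M.astype(float)
    for _ in range(n_iter):
        F = alpha * (F @ P) + (1 - alpha) * M
    return F

def quantile_normalize_rows(F):
    """Give every patient (row) the same score distribution: the mean of the sorted rows."""
    order = np.argsort(F, axis=1)
    reference = np.sort(F, axis=1).mean(axis=0)
    Q = np.empty_like(F)
    rows = np.arange(F.shape[0])[:, None]
    Q[rows, order] = reference                # k-th smallest value of each row -> reference[k]
    return Q

def nsqn(M, adj, alpha=0.7):
    return quantile_normalize_rows(network_smooth(M, adj, alpha))

# Hypothetical toy data: 5 genes on a path, patient 0 has 1 mutation, patient 1 has 3
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
M = np.array([[1, 0, 0, 0, 0],
              [0, 1, 1, 0, 1]])
F = network_smooth(M, adj)
Q = nsqn(M, adj)
```

Smoothing preserves each patient's total mutation mass, so the difference in mutational load survives it. Quantile normalization is the step that removes that difference, which fits the observation that QN is essential.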

Testing NSQN for survival prediction in cancer (Le Morvan, Zinovyev, Vert, PLoS Comp Biol, 2017)

• TCGA data on 8 cancer types (LUAD, SKCM, GBM, BRCA, KIRC, HNSC, LUSC, OV)

• Benefit from NSQN for only 2 cancer types (LUAD, SKCM)

• Quantile normalization is an essential step! (NS = NSQN without QN)

• Considering first neighbours (SimpNSQN = NSQN with k=1) is enough!

[Figure: performance compared with random prediction]

Marine Le Morvan

Instead of network propagation + QN: NetNorm, which equalizes the number of mutations per patient to k

Hubs mark mutated “functions”

Orphan nodes are most probably passenger mutations

NetNorm normalizes the mutation matrix using the “guilt by association” principle

[Figure: example with k = 4 and proxy mutations]
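The normalization idea on this slide can be sketched as follows: hubs are kept, orphans are dropped, and proxies are added. This is a simplified sketch, not the NetNorM implementation (see https://github.com/marineLM/NetNorM for that). The ranking rules, "keep the highest-degree mutated genes" and "add the non-mutated genes with the most mutated neighbours, ties broken by degree", are assumptions, and all names and data are hypothetical.

```python
import numpy as np

def netnorm_sketch(M, adj, k):
    """Simplified NetNorM-style normalisation: every patient ends up with exactly k 'mutations'.
    M: binary patients x genes matrix; adj: symmetric gene-gene adjacency matrix."""
    degree = adj.sum(axis=1)
    out = np.zeros_like(M)
    all_genes = np.arange(M.shape[1])
    for p in range(M.shape[0]):
        mutated = np.flatnonzero(M[p])
        if len(mutated) >= k:
            # Too many mutations: keep the k mutated genes with the highest degree;
            # poorly connected ("orphan") genes are dropped first as likely passengers.
            keep = mutated[np.argsort(-degree[mutated], kind="stable")[:k]]
        else:
            # Too few: add "proxy" genes with the most mutated neighbours,
            # ties broken by degree (assumed tie-breaking rule).
            n_mut_neighbours = adj[:, mutated].sum(axis=1)
            candidates = np.setdiff1d(all_genes, mutated)
            order = np.lexsort((-degree[candidates], -n_mut_neighbours[candidates]))
            keep = np.concatenate([mutated, candidates[order[:k - len(mutated)]]])
        out[p, keep] = 1
    return out

# Hypothetical toy network: hub gene 0 linked to genes 1-4; gene 5 is an orphan
adj = np.zeros((6, 6))
for leaf in range(1, 5):
    adj[0, leaf] = adj[leaf, 0] = 1.0
M = np.array([[1, 1, 1, 0, 0, 1],   # 4 mutations -> reduced to k
              [0, 1, 0, 0, 0, 0]])  # 1 mutation  -> completed with a proxy
print(netnorm_sketch(M, adj, k=2))
```

Unlike NSQN, the output stays binary and every patient ends with the same number of events. This matches the "equalizing the number of mutations to k" idea above.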

NetNorm performs better than NSQN (when both work)

[Figure: panels “Using only mutations” and “Adding clinical info”, compared with random prediction]

Does the network structure really play a role?

NetNorm benefits more from the real network structure than NSQN

[Figure: performance compared with random prediction]
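Testing whether the real network structure matters requires a randomized network as a control. The slides do not say how the networks were randomized. As an assumption, this sketch uses a standard null model, degree-preserving double edge swaps, which keeps every gene's degree but scrambles who is connected to whom. The toy network and helper names are hypothetical.

```python
import random
from collections import Counter

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomise an undirected network by double edge swaps (a-b, c-d) -> (a-d, c-b),
    which keep every node's degree; self-loops and duplicate edges are rejected."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    swaps, attempts = 0, 0
    while swaps < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue                                   # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                                   # would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        swaps += 1
    return edges

def degrees(edges):
    return Counter(node for edge in edges for node in edge)

# Hypothetical toy network (ring of 8 genes plus two chords)
network = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
randomized = degree_preserving_shuffle(network, n_swaps=20)
```

Because degrees are preserved, a method that only exploits hub status (e.g. ranking genes by degree) would perform the same on both networks. A gain on the real network therefore points to the specific wiring.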

Application of the Google matrix approach to a signalling network (SIGNOR) (Lages, Shepelyansky, Zinovyev, PLoS One, 2018)

3k nodes, 7k edges


Reduced Google matrix (see Frahm & Shepelyansky, arXiv, 2016)

[Figure: graph with nodes A to K; direct links vs indirect “hidden” links (> 0.01)]
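The reduced Google matrix for a chosen subset of nodes r can be written as $G_R = G_{rr} + G_{rs}(1 - G_{ss})^{-1} G_{sr}$, where s denotes all other nodes. The sketch below computes it from a full Google matrix and returns the indirect part $G_R - G_{rr}$, i.e. the contribution of paths through the rest of the network. Frahm and Shepelyansky further split this indirect part into a projector component and the "hidden" component $G_{qr}$. That split is not implemented here, and the random toy graph is hypothetical.

```python
import numpy as np

def reduced_google_matrix(G, r):
    """Reduced Google matrix G_R = G_rr + G_rs (1 - G_ss)^{-1} G_sr for the nodes r.
    Returns G_R and the indirect part G_R - G_rr (paths through the rest of the network)."""
    n = G.shape[0]
    r = np.asarray(r)
    s = np.setdiff1d(np.arange(n), r)
    G_rr, G_rs = G[np.ix_(r, r)], G[np.ix_(r, s)]
    G_sr, G_ss = G[np.ix_(s, r)], G[np.ix_(s, s)]
    # Solve (1 - G_ss) X = G_sr instead of forming the inverse explicitly
    through_rest = np.linalg.solve(np.eye(len(s)) - G_ss, G_sr)
    G_R = G_rr + G_rs @ through_rest
    return G_R, G_R - G_rr

# Hypothetical toy Google matrix: random directed graph on 12 nodes, alpha = 0.85
rng = np.random.default_rng(1)
n, alpha = 12, 0.85
A = (rng.random((n, n)) < 0.25).astype(float)
np.fill_diagonal(A, 0.0)
out_deg = A.sum(axis=0)
S = np.where(out_deg > 0, A / np.maximum(out_deg, 1.0), 1.0 / n)
G = alpha * S + (1 - alpha) / n
r = [0, 3, 5, 7]
G_R, G_indirect = reduced_google_matrix(G, r)
```

$G_R$ is itself column-stochastic, and its stationary vector equals the global PageRank restricted to r (up to normalization). This is why the small matrix faithfully summarizes the effect of the whole network on the selected nodes.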

Dima Shepelyansky, José Lages

Type of data: cell-specific transcriptional regulation network

[Figure: direct and indirect “hidden” links]

Signaling network (SN) (SIGNOR database)

Transcription regulation network (TRN), reconstructed from systematic ChIP-seq experiments

Reduced Google matrix analytically quantifies the global effect of transcriptional feedback


Comparing two TRN networks: e.g., “normal” vs “cancer” (Lages, Shepelyansky, Zinovyev, PLoS One, 2018)

[Figure: direct links and indirect signaling rewiring (node labels B, E, G) between TRN1 (“normal”, normal B-lymphocytes) and TRN2 (“cancer”, leukemia cell line); PageRank goes down for some nodes and improves for others]

Change of PageRank and CheiRank in cancer

CheiRank changes were 3 times larger and more biologically interpretable (they involve genes associated with leukemia)
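PageRank and CheiRank can be compared between two networks defined on the same nodes. CheiRank is the PageRank of the network with every link inverted, so it ranks nodes by their outgoing rather than incoming influence. In this minimal sketch, the damping value alpha = 0.85, the use of log2 fold changes to measure "change", and the toy regulatory networks are all assumptions.

```python
import numpy as np

def google_matrix(edges, n, alpha=0.85):
    A = np.zeros((n, n))
    for src, dst in edges:
        A[dst, src] = 1.0                          # column = source, row = target
    out_deg = A.sum(axis=0)
    S = np.where(out_deg > 0, A / np.maximum(out_deg, 1.0), 1.0 / n)
    return alpha * S + (1 - alpha) / n

def stationary(G):
    """Eigenvector of G for eigenvalue 1, normalised to sum 1."""
    vals, vecs = np.linalg.eig(G)
    p = np.real(vecs[:, np.argmax(np.real(vals))])
    return p / p.sum()

def pagerank_cheirank(edges, n):
    """PageRank ranks nodes by incoming influence; CheiRank is the PageRank of the
    network with every link inverted, so it ranks nodes by outgoing influence."""
    pr = stationary(google_matrix(edges, n))
    cr = stationary(google_matrix([(dst, src) for src, dst in edges], n))
    return pr, cr

def rank_changes(edges_normal, edges_cancer, n):
    """log2 fold changes of PageRank and CheiRank between two networks on the same nodes."""
    pr0, cr0 = pagerank_cheirank(edges_normal, n)
    pr1, cr1 = pagerank_cheirank(edges_cancer, n)
    return np.log2(pr1 / pr0), np.log2(cr1 / cr0)

# Hypothetical toy regulatory networks: in "cancer", regulator 0 gains two targets
normal = [(0, 1), (1, 2), (2, 3), (3, 0)]
cancer = normal + [(0, 2), (0, 3)]
d_pr, d_cr = rank_changes(normal, cancer, n=4)
```

In a regulatory network, a transcription factor that gains many targets changes its CheiRank more than its PageRank. This gives one intuition for why CheiRank can be the more sensitive readout of rewiring.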

Genes of a proliferative signature derived from pan-cancer transcriptomic analysis

More genes are connected into the network

Emergence of a new “hidden” hub, BUB1

Connection to PCNA (DNA replication and DNA repair)

Many cell cycle proteins improve their PageRank (AURK)

Connection between STIL (mitotic spindle checkpoint regulator) and CCNA2, CCNE1

WikiProteins project: studying the protein network embedded in Wikipedia

A sample of the Wikipedia network of pages within three steps of the “Transhumanism” article (5k nodes, 23k links)

What is transhumanism? http://allthingsgraphed.com/2015/09/16/what-is-transhumanism-wikipedia/

“Semantic field” of Wikipedia

Direct links

Inferring a directed network of proteins from Wikipedia using the reduced Google matrix (ongoing work…)

~10000 Wikipedia pages devoted to proteins

5000 proteins with described interactions: 16000 (2013) and 18000 (2017) direct connections

Wiki proteins hairball

The rest of Wikipedia defines a context of hyperlinks (a model of the external world)

This network is embedded in the global Wikipedia network

The reduced Google matrix allows finding “hidden” functional interactions between proteins

Comparing the protein network of direct links and the network of hidden links (same density) in Wikipedia (2013)

Direct links vs hidden links


[Figure: connectivity distributions of the direct-link network and of the hidden-link network]
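A hidden-link network with the same density as the direct one can be built by keeping the strongest hidden links, as many as there are direct links. Its connectivity distribution is then compared with that of the direct network. This is a sketch under assumptions. The input is a matrix of hidden-link weights (for example from a reduced Google matrix), entry [i, j] is read as a link j -> i, and the toy matrix and helper names are hypothetical.

```python
import numpy as np
from collections import Counter

def top_hidden_links(hidden, n_links):
    """Keep the n_links strongest off-diagonal entries of a hidden-link matrix.
    Entry [i, j] is read as a link j -> i (column = source, as in a Google matrix)."""
    H = np.array(hidden, dtype=float)
    np.fill_diagonal(H, -np.inf)                        # self-links are never selected
    strongest = np.argsort(H, axis=None)[::-1][:n_links]
    rows, cols = np.unravel_index(strongest, H.shape)
    return [(int(j), int(i)) for i, j in zip(rows, cols)]   # (source, target) pairs

def degree_distribution(links):
    """Map degree -> number of nodes with that degree (nodes appearing in at least one link)."""
    degree = Counter(node for link in links for node in link)
    return Counter(degree.values())

# Hypothetical toy hidden-link matrix for 3 proteins; the direct network has 2 links
hidden = np.array([[0.90, 0.10, 0.50],
                   [0.30, 0.80, 0.05],
                   [0.20, 0.40, 0.70]])
direct_links = [(0, 1), (1, 2)]
hidden_links = top_hidden_links(hidden, n_links=len(direct_links))   # same density
print(degree_distribution(direct_links), degree_distribution(hidden_links))
```

Fixing the number of links makes the two degree distributions directly comparable. Any difference then reflects how the links are arranged, not how many there are.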

“Hidden” protein communities (2013)

Clustering the hidden network with the MCL (Markov clustering) algorithm

[Figure: hidden communities labeled Immune system; Cell cycle; Glucagon metabolism; GTPase signaling; Potassium ion transport; Transcription factors; Apoptosis and inflammation; Coagulation; Keratins; Peroxisome; Nuclear receptors; Hormone receptors]
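The MCL clustering step alternates expansion (matrix powers let random-walk flow spread) and inflation (element-wise powers strengthen strong flows and weaken weak ones) until the flow matrix converges. This is a minimal dense-matrix sketch of the standard algorithm. It assumes an undirected, symmetrized hidden network, and the parameter defaults (expansion 2, inflation 2, pruning threshold) are common choices, not the values used in this work.

```python
import numpy as np

def mcl(adj, inflation=2.0, expansion=2, max_iter=100, tol=1e-8, prune=1e-6):
    """Markov clustering of an undirected graph given as a dense adjacency matrix.
    Returns a list of clusters (frozensets of node indices)."""
    M = np.array(adj, dtype=float) + np.eye(len(adj))   # self-loops, standard MCL practice
    M /= M.sum(axis=0, keepdims=True)                    # column-stochastic flow matrix
    for _ in range(max_iter):
        prev = M
        M = np.linalg.matrix_power(M, expansion)         # expansion: flow spreads along paths
        M = M ** inflation                                # inflation: strong flows reinforced
        M /= M.sum(axis=0, keepdims=True)
        M[M < prune] = 0.0                               # drop negligible flows
        M /= M.sum(axis=0, keepdims=True)
        if np.abs(M - prev).max() < tol:
            break
    # At convergence only attractor rows are non-zero; each lists one cluster's members
    clusters = []
    for i in range(len(M)):
        members = frozenset(np.flatnonzero(M[i] > prune).tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters

# Hypothetical toy hidden network: two triangles joined by a single edge (2-3)
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
print(mcl(adj))
```

The inflation parameter controls granularity: higher values give more, smaller communities. For networks the size of the Wikipedia protein graph, a sparse-matrix implementation would be needed instead of this dense one.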

“Hidden” protein communities (2017)

[Figure: hidden communities labeled Immune system; Cell cycle (G2M); ?; Apoptosis; Potassium ion transport; GTPase signaling; Coagulation; SH2/3 signaling; DNA repair; NFkB; Cell-cell junctions]

Dynamics of the largest hidden communities from 2013 to 2017 (20 largest)

[Figure: correspondence between 2013 communities and 2017 communities]
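Following communities from one snapshot to the next requires matching them. In this sketch, each 2013 community is matched to the 2017 community with the highest Jaccard overlap. The Jaccard criterion is an assumption (the slides do not state how the correspondence was computed), and the member names are placeholders, not the real communities.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two member sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_communities(old, new):
    """For each community in `old`, return (index of best-matching community in `new`, Jaccard score)."""
    matches = []
    for community in old:
        if not new:
            matches.append((None, 0.0))
            continue
        scores = [jaccard(community, other) for other in new]
        best = max(range(len(new)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches

# Hypothetical toy communities (placeholder protein names, not the real 2013/2017 results)
communities_2013 = [{"P1", "P2", "P3"}, {"P4", "P5"}]
communities_2017 = [{"P4", "P5", "P6"}, {"P1", "P2"}]
for (best, score), community in zip(match_communities(communities_2013, communities_2017),
                                    communities_2013):
    print(sorted(community), "->", best, round(score, 2))
```

A 2017 community with a low best score against every 2013 community is a candidate "newborn" community. A 2013 community whose best match is weak has dissolved or been absorbed.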

Example: Coagulation-related community

Highest local PageRank: Thrombin

Thrombin

Antithrombin

Factor XII

Factor VIII

Transthyretin

Factor VII

Factor IX

Protein S

Protein C

Factor X

Factor V

P-selectin glycoprotein ligand-1

ADAMTS13

Tissue factor

Osteocalcin

Heparin cofactor II

Gamma-glutamyl carboxylase

Annexin A5

Apolipoprotein H

ITGA2B

GAS6

Fibrinogen alpha chain

Fibrinogen beta chain

Matrix gla protein

Carboxypeptidase B2

ITIH2

FGL2

ADAM22

Liver

Fresh frozen plasma

Fimbrin

SOS1

Epigen

CDC42

RAC1

RHOA

PAK1

Rac3

Kalirin

RAC2

ARHGEF7

WNK1

RhoG

Rnd1

ANLN

RALB

Dock2

Synergin gamma

EXOC7

Rnd3

RhoD

Rnd2

RhoH

RAPGEF2

Dock7

Dock4

RCC2

FMNL2

Dock3

SRGAP2

Birth of a new super-large hidden community in 2017

Myoglobin

Hidden network

Direct interactions explaining

hidden connections

Conclusions

References

1. Rapaport et al., BMC Bioinformatics 8:35, 2007.

2. Czerwinska et al., BMC Systems Biology 9:46, 2015.

3. Lages et al., PLoS One 13(1):e0190812, 2018.

4. Le Morvan et al., PLoS Comp Biol 13(6):e1005573, 2017.

https://github.com/marineLM/NetNorM

http://bioinfo-out.curie.fr/projects/dedal

Network propagation is a powerful tool for the joint analysis of molecular biology (medical) data and biological networks

First-neighbourhood relations seem to be sufficient in practical applications

The Google matrix approach highlights creative elements and detects indirect rewiring events

WikiProteins hidden communities give an idea of the dynamic hotspots of interest in molecular biology


Acknowledgements

Mutation data analysis

Data-driven network layouts

ACSN

Network smoothing

Funding from Agilent Thought Leader Award

CNRS ApliGoogle project

ACI IMPBIO KernelChip

Jean-Philippe Vert

Ecole des Mines

Emmanuel Barillot

Institut Curie

Franck Rapaport

Memorial Sloan Kettering

Laurence Calzone, Urszula Czerwinska

Institut Curie

Marine Le Morvan

Ecole des Mines

Inna Kuperstein

Institut Curie

Reduced Google matrix

Dima Shepelyansky

Université Paul Sabatier

José Lages

Institut UTINAM

Klaus Frahm

Université Paul Sabatier
