
BAYESIAN INFERENCE OF SIGNALING NETWORK TOPOLOGY IN A CANCER CELL LINE

Steven M. Hill, Yiling Lu, Jennifer Molina, Laura M. Heiser, Paul T. Spellman, Terence P. Speed, Joe W. Gray, Gordon B. Mills and Sach Mukherjee

Discussion Leader: Ashwin
Scribe: Matthew
Computational Network Biology, BMI 826/Computer Sciences 838
https://compnetbiocourse.discovery.wisc.edu

Problem Overview

• Data-driven characterization of signaling networks, specific to a context of interest (such as a cell line or tissue type)

• Requires the ability to probe post-translational modification states in multiple proteins through time and across samples.

• Large-scale proteomic analyses are challenging

• Data are subject to intrinsic and experimental noise

• A trade-off must be found between “fit-to-data” and “model parsimony”

Approach and its Motivation

• Learn network structure through a Dynamic Bayesian Network (DBN)

• What?
  • Nodes represent proteins at different time points
  • Edges represent probabilistic dependencies between the nodes
  • DBNs can model feedback loops, which are not allowed in static BNs

• Why?
  • Due to computational constraints, statistical inference approaches are often limited to exploring only a few hypothesized networks, whereas this approach can explore a large number of candidate networks

Fig.1: Data-driven characterization of signaling networks. Reverse-phase protein arrays interrogate signaling dynamics in samples of interest. Network structure is inferred using DBNs, with primary phospho-proteomic data integrated with existing biology, using informative priors objectively weighted by an empirical Bayes approach. Edge probabilities then allow the generation and prioritization of hypotheses for experimental validation

Method: Preview

• Biological signaling information is incorporated by assigning prior distributions to links

• Priors assigned based on frequency of links appearing in high-scoring topologies

• Bayesian model averaging is carried out

• Maximal marginal likelihood analysis

• A closed-form score for network topologies is obtained using the ‘g-prior’

Assumptions

• Specific context: breast cancer cell line MDA-MB-468

• First-order Markov and stationarity assumptions

• Dependence may be sparse, with each node at time t depending only on a subset of nodes at time t-1

• Edges are directed forward in time, hence no cycle-checking is required

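Joint Probability Distribution in a Bayesian Network

The joint probability distribution can be factorized into a product of conditional distributions:

$$P(X \mid G, \{\theta_i\}, \{\psi_i\}) = \prod_{i=1}^{p} \Big( P(X_i^1 \mid \psi_i) \prod_{t=2}^{T} P(X_i^t \mid X_{\pi_G(i)}^{t-1}, \theta_i) \Big)$$

where

• $X = (X^1, \ldots, X^T)$ is all data

• $\pi_G(i) \subseteq \{1, \ldots, p\}$ is an index set for the parents of protein i according to graph G

• $X_{\pi_G(i)}^t = \{ X_j^t \mid j \in \pi_G(i) \}$ is data for the parents of protein i at time t

• $\theta_i$ are parameters for the conditional distribution of $X_i^t$

• $\psi_i$ are parameters for $X_i^1$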

Regression model

• The conditionals are taken to be Gaussian and describe the dependence of child nodes on parents and can be thought of as regression models

• For ease of notation, $X_i^+ = (X_i^2, X_i^3, \ldots, X_i^T)$ and $X_i^- = (X_i^1, X_i^2, \ldots, X_i^{T-1})$
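The lagged notation can be illustrated with a short sketch. The function name `lag_split` and the toy values are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def lag_split(series):
    """Split one protein's time series (X^1, ..., X^T) into (X^-, X^+).

    X^- holds time points 1..T-1 (the predictors' time index),
    X^+ holds time points 2..T (the response's time index).
    """
    series = np.asarray(series, dtype=float)
    x_minus = series[:-1]
    x_plus = series[1:]
    return x_minus, x_plus

# Toy series with T = 4 time points (made-up values).
x_minus, x_plus = lag_split([0.1, 0.4, 0.9, 1.3])
print(x_minus, x_plus)
```

Under the first-order Markov assumption, the child's values in X^+ are regressed on the parents' values in X^-.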

Parameters

• A node may depend on products of its parents as well as on the parents themselves.

• For example, if $\pi_G(i) = \{1, 2, 3\}$, then the mean of $X_i^t$ is a linear combination of the three parents $X_j^{t-1}$, the three possible pairwise products $X_j^{t-1} X_k^{t-1}$, and the product of all parents $X_1^{t-1} X_2^{t-1} X_3^{t-1}$.

• For each protein i, let $B_i$ denote an $n \times (2^{|\pi_G(i)|} - 1)$ design matrix, whose columns correspond to the parents, all pairwise products of parents, ..., and the product of all parents.

• The $2^{|\pi_G(i)|} - 1$ regression coefficients, forming a vector $\beta_i$, together with the variance $\sigma_i^2$, constitute the parameters $\theta_i$.
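A minimal sketch of how such a design matrix could be assembled, with one column per non-empty subset of parents. The function name `design_matrix` and the random toy data are assumptions for illustration.

```python
from itertools import combinations

import numpy as np

def design_matrix(x_prev, parents):
    """Build the n x (2^|parents| - 1) design matrix for one child node.

    x_prev  : (n, p) array of protein levels at the previous time point
    parents : list of column indices acting as parents of the child
    Each column is the elementwise product of one non-empty subset of
    parents (singletons first, then pairs, ..., then all parents).
    """
    x_prev = np.asarray(x_prev, dtype=float)
    cols = []
    for size in range(1, len(parents) + 1):
        for subset in combinations(parents, size):
            cols.append(np.prod(x_prev[:, list(subset)], axis=1))
    if not cols:  # no parents: empty design matrix
        return np.empty((x_prev.shape[0], 0))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x_prev = rng.normal(size=(7, 5))      # toy data: n = 7 samples, p = 5 proteins
B = design_matrix(x_prev, [0, 1, 2])  # three parents
print(B.shape)  # (7, 7): 3 parents + 3 pairwise products + 1 triple product
```

The column count doubles with every added parent, which is why the number of parents per node has to stay small relative to n.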

Marginal Likelihood

• Integrating out the parameters under their prior yields a closed-form expression for the marginal likelihood.

• The expression requires the inverse of $B_i^T B_i$, which may be ill-conditioned or singular, especially when $n \ll 2^{|\pi_G(i)|} - 1$

• Thus the in-degree $|\pi_G(i)|$ is assumed to be bounded by a maximum $d_{\max}$

• Ridge regularization is applied for nodes with large $|\pi_G(i)|$ by adding a $+\alpha I$ term to $B_i^T B_i$
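The closed-form score can be sketched as follows. This assumes the standard Zellner g-prior result with g = n and a reference prior on the variance, which gives a score proportional to (1 + n)^(-k/2) (y'y - n/(n+1) y'B(B'B)^(-1)B'y)^(-n/2) for k columns. The exact constants used in the paper may differ, and the function name and toy data are illustrative.

```python
import numpy as np

def log_marginal_likelihood(B, y, alpha=0.0):
    """Log marginal likelihood of response y (up to an additive constant).

    B     : (n, k) design matrix for one candidate parent set
    y     : (n,) response vector (the child's X^+)
    alpha : optional ridge term added to B^T B for ill-conditioned cases
    Assumes a g-prior on the coefficients with g = n.
    """
    B = np.asarray(B, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = B.shape
    yty = y @ y
    if k == 0:  # empty parent set: no regression term
        return -0.5 * n * np.log(yty)
    gram = B.T @ B + alpha * np.eye(k)
    fitted = B @ np.linalg.solve(gram, B.T @ y)
    shrunk_resid = yty - (n / (n + 1.0)) * (y @ fitted)
    return -0.5 * k * np.log(1.0 + n) - 0.5 * n * np.log(shrunk_resid)

# Toy check: the response depends on column 0 only.
rng = np.random.default_rng(1)
n = 50
x_prev = rng.normal(size=(n, 2))
y = 2.0 * x_prev[:, 0] + 0.1 * rng.normal(size=n)
score_true = log_marginal_likelihood(x_prev[:, [0]], y)
score_false = log_marginal_likelihood(x_prev[:, [1]], y)
print(score_true, score_false)
```

The factor (1 + n)^(-k/2) penalizes larger parent sets, which is how the score trades fit against parsimony.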

Network Priors

• Priors are assumed to be of the form

$P(G) \propto \exp(\lambda f(G))$,

where λ is a strength parameter and f(G) scores each network according to existing biological knowledge from the literature

• λ is chosen by empirical Bayes, i.e. by maximizing the marginal likelihood P(data | λ)
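A toy sketch of the prior and the empirical Bayes choice of λ for a single child node. The specific form of f used here (minus the number of proposed edges absent from a prior edge set), the grid of λ values and the log-likelihood numbers are all assumptions for illustration.

```python
import numpy as np

def log_prior(parent_sets, child, prior_edges, lam):
    """Normalised log prior over candidate parent sets of one child.

    Assumed form: f = -(number of edges j -> child not in prior_edges).
    """
    f = np.array(
        [-sum((j, child) not in prior_edges for j in ps) for ps in parent_sets],
        dtype=float,
    )
    unnorm = lam * f
    return unnorm - np.logaddexp.reduce(unnorm)

def log_evidence(log_liks, parent_sets, child, prior_edges, lam):
    """log P(data | lambda): likelihoods averaged over the prior."""
    lp = log_prior(parent_sets, child, prior_edges, lam)
    return np.logaddexp.reduce(np.asarray(log_liks, dtype=float) + lp)

# Toy setting: child node 2, candidate parents 0 and 1,
# prior knowledge supports only the edge 0 -> 2.
parent_sets = [(), (0,), (1,), (0, 1)]
prior_edges = {(0, 2)}
log_liks = [-10.0, -2.0, -9.0, -3.0]  # made-up log marginal likelihoods

grid = [0.0, 0.5, 1.0, 2.0, 5.0]
best_lam = max(
    grid, key=lambda lam: log_evidence(log_liks, parent_sets, 2, prior_edges, lam)
)
print(best_lam)
```

With λ = 0 the prior is flat. Larger λ concentrates prior mass on parent sets that agree with the prior edge set, so the selected λ reflects how well the data support the prior knowledge.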

Posterior Probabilities of Edges

• The posterior probability of an edge e is given by:

$P(e \mid X) = \sum_{G} \mathbb{1}[e \in G] \, P(G \mid X)$

• where P(G | X) is the posterior distribution over graphs

• Instead of averaging over full graphs, subsets of potential parents are scored for each node and averaged over
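The per-node averaging can be sketched as an exhaustive enumeration of parent sets up to a maximum in-degree. The function name `edge_posteriors`, the callable `log_score` and the toy scores are assumptions for illustration. In practice `log_score` would return the log marginal likelihood plus the log prior for a parent set.

```python
from itertools import combinations

import numpy as np

def edge_posteriors(candidates, log_score, d_max):
    """Posterior probability of each edge j -> child by model averaging.

    candidates : list of potential parent indices
    log_score  : callable mapping a parent tuple to its unnormalised
                 log posterior (log marginal likelihood + log prior)
    d_max      : maximum number of parents considered
    """
    parent_sets = [
        ps for d in range(d_max + 1) for ps in combinations(candidates, d)
    ]
    log_post = np.array([log_score(ps) for ps in parent_sets], dtype=float)
    post = np.exp(log_post - np.logaddexp.reduce(log_post))  # normalise
    return {
        j: float(sum(p for ps, p in zip(parent_sets, post) if j in ps))
        for j in candidates
    }

# Toy scores: unnormalised posterior weights 1, 6, 1, 2.
toy = {
    (): np.log(1.0),
    (0,): np.log(6.0),
    (1,): np.log(1.0),
    (0, 1): np.log(2.0),
}
probs = edge_posteriors([0, 1], toy.__getitem__, d_max=2)
print(probs)  # approximately {0: 0.8, 1: 0.3}
```

Because edges only point forward in time, each child's parent set can be scored independently, so this loop can run in parallel across nodes.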

Results

Simulation study. Average ROC curves. True-positive rate (for network edges) plotted against false-positive rate across a range of edge probability thresholds. Simulated data were generated from known graph structures by ancestral sampling. Graph structures were created to be in only partial agreement with the network prior (Supplementary Fig. S3). Results shown are averages obtained from 25 iterations. See text for full details of simulation and for description of methods shown. (For ‘Lasso’, curve produced by thresholding absolute regression coefficients, while marker ‘X’ is single graph obtained by taking non-zero coefficients to be edges)

Results

Table 1. Synthetic yeast network study. Inference methods assessed on time-series gene expression data generated from a synthetically constructed gene regulatory network in yeast (Cantone et al., 2009). Results shown are area under the ROC curve (AUC). See text for description of methods shown. [Results for the regimes using a network prior are mean AUC ± SD over 25 prior network structures. Prior networks were generated to be in partial agreement with the true, underlying network structure (see text for details).]

Results

Fig. 4. Validation of predictions by targeted inhibition in breast cancer cell line MDA-MB-468. (a) MAPK-STAT3 crosstalk. Network inference (Fig. 3a) predicted an unexpected link between phospho-MAPK (MAPKp) and STAT3p(S727) in the breast cancer cell line MDA-MB-468. The hypothesis of MAPK-STAT3 crosstalk was tested by MEK inhibition: this successfully reduced MAPK phosphorylation and resulted in a corresponding decrease in STAT3p(S727). (b) AKTp → p70S6Kp, AKT-MAPK crosstalk and AKT-JNK/JUN crosstalk. AKTp is linked to p70S6Kp, MEKp and cJUNp. In line with these model predictions, use of an AKT inhibitor reduced both p70S6K and MEK phosphorylation and increased JNK phosphorylation. (RPPA data; MEK inhibitor GSK1120212 and AKT inhibitor GSK690693B at 0 uM, 0.625 uM, 2.5 uM and 10 uM; measurements taken 0, 5, 15, 30, 60, 90, 120 and 180 min after EGF stimulation; average values over 3 replicates shown, error bars indicate SEM)

Discussion

• Advantages:
  • Efficient and fast for moderate-sized datasets
  • Takes advantage of existing information in the form of priors
  • Exact calculation of posterior edge probabilities
  • Variable selection approach factorizes the problem and enables computations to be parallelized

• Limitations:
  • The parameter prior used (g-prior) can suffer from matrix ill-conditioning
  • Assumes homogeneity of parameters and network structure through time, which can be unrealistic
  • Does not take into account ‘latent variables’
