![Page 1: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/1.jpg)
Big Challenges with Big Data in Life Sciences
Shankar Subramaniam, UC San Diego
![Page 2: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/2.jpg)
The Digital Human
![Page 3: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/3.jpg)
A Super-Moore’s Law
Adapted from Lincoln Stein 2012
![Page 4: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/4.jpg)
The Phenotypic Readout
![Page 5: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/5.jpg)
Data to Networks to Biology
![Page 6: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/6.jpg)
NETWORK RECONSTRUCTION
• Data-driven network reconstruction of biological systems
– Derive relationships between input/output data
– Represent the relationships as a network
Inverse Problem: Data-driven Network Reconstruction
Experiments/Measurements
![Page 7: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/7.jpg)
Network Reconstructions: Reverse Engineering of Biological Networks
• Reverse engineering of biological networks:
- Structural identification: ascertain the network structure or topology.
- Identification of dynamics: determine the details of the interactions.
• Main approaches:
- Statistical methods
- Simulation methods
- Optimization methods
- Regression techniques
- Clustering
![Page 8: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/8.jpg)
Network Reconstruction of Dynamic Biological Systems: Doubly Penalized LASSO
Behrang Asadi*, Mano R. Maurya*,
Daniel Tartakovsky, Shankar Subramaniam
Department of Bioengineering
University of California, San Diego
• NSF grants STC-0939370, DBI-0641037, and DBI-0835541
• NIH grant 5 R33 HL087375-02
* Equal effort
![Page 9: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/9.jpg)
APPLICATION
Phosphoprotein signaling and cytokine measurements in RAW 264.7 macrophage cells.
![Page 10: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/10.jpg)
MOTIVATION FOR THE NOVEL METHOD
• Various methods
– Regression-based approaches (least squares) with statistical significance testing of coefficients
– Dimensionality reduction to handle correlation: PCR and PLS
– Optimization/shrinkage (penalty)-based approach: LASSO
– Partial-correlation and probabilistic/Bayesian model-based approaches
• Different methods have distinct advantages and disadvantages
‒ Can we benefit by combining the methods?
‒ Compensate for the disadvantages
• A novel method: Doubly Penalized Least Absolute Shrinkage and Selection Operator (DPLASSO)
‒ Incorporates both statistical significance testing and shrinkage
![Page 11: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/11.jpg)
LINEAR REGRESSION
Goal: build a model based on a linear relationship

X: input data (m samples by n inputs), zero mean, unit standard deviation
y: output data (m samples by 1 output column), zero mean
b: model coefficients; these translate into the edges of the network
e: normal random noise with zero mean
Ordinary least squares solution:

b_hat = argmin_b { (y - Xb)^T (y - Xb) },  so  b_hat = (X^T X)^(-1) X^T y

with the static model  y = Xb + e,  e ~ N(0, sigma^2)

Formulation for dynamic systems:

dy/dt = Xb + e(t),  e(t) ~ N(0, sigma^2)
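As a concrete illustration of the closed-form estimate b_hat = (X^T X)^(-1) X^T y, here is a minimal pure-Python sketch (no external libraries); the toy inputs and outputs are invented for the example and are not from the slides' datasets.

```python
# Sketch: ordinary least squares via the normal equations, b = (X^T X)^(-1) X^T y,
# solved with Gaussian elimination instead of an explicit matrix inverse.

def ols(X, y):
    """Solve min_b ||y - X b||^2 for small, well-conditioned problems."""
    m, n = len(X), len(X[0])
    # Form A = X^T X and c = X^T y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    c = [sum(X[k][i] * y[k] for k in range(m)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented system [A | c].
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for j in range(col, n):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    # Back-substitution.
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, n))) / A[i][i]
    return b

X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [2 * x1 - x2 for x1, x2 in X]   # noise-free toy output y = 2*x1 - x2
print(ols(X, y))                    # recovers [2.0, -1.0] up to round-off
```

With noise-free data the true coefficients are recovered exactly; with the noisy data of the slides, b_hat is only an estimate, which is why the significance testing below is needed.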
![Page 12: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/12.jpg)
STATISTICAL SIGNIFICANCE TESTING
• Most coefficients are non-zero, a mathematical artifact
• Perform statistical significance testing
• Compute the standard deviation of the coefficients
• Form the ratio of each coefficient to its standard deviation
• A coefficient is significant (different from zero) if the ratio exceeds a critical value
• The edges in the network graph represent the significant coefficients
Ratio:  r_ij,k = b_ij,k / sigma_b,ij,k,  with sigma_b = sqrt(diag(cov(b_hat))) computed from the residuals y - y_hat

Significant if:  |r_ij| > tinv(1 - alpha/2, v),  v = DOF*,  1 - alpha = confidence level

For least squares:  sigma_b = RMSE_LS * sqrt(diag((X^T X)^(-1))),  v = m - n - 1

RMSE_LS = sqrt( sum_{i=1..m} (y_i - y_p,i)^2 / m )

* Krämer, Nicole, and Masashi Sugiyama. "The degrees of freedom of partial least squares regression." Journal of the American Statistical Association 106.494 (2011): 697-705.
![Page 13: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/13.jpg)
• Partial least squares finds the direction in the X space that explains the maximum variance direction in the Y space
• PLS regression is used when the number of observations per variable is low and/or collinearity exists among the X values
• Requires an iterative algorithm: NIPALS, SIMPLS, etc.
• Statistical significance testing is iterative
CORRELATED INPUTS: PLS

X = T P^T + E
Y = U Q^T + F
Y = X B_hat + B_hat_0
* H. WOLD, (1975), Soft modelling by latent variables; the non-linear iterative partial least squares approach, in Perspectives in Probability and Statistics, Papers in Honour of M. S. Bartlett, J. Gani, ed., Academic Press, London.
![Page 14: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/14.jpg)
LASSO
• Shrinkage version of ordinary least squares, subject to an L1 penalty constraint (the sum of the absolute values of the coefficients must be less than a threshold)
• b_hat^0 denotes the full least-squares estimate
• 0 < t < 1 causes the shrinkage
The LASSO estimator is then defined as:
* Tibshirani, R.: ‘Regression shrinkage and selection via the Lasso’, J. Roy. Stat. Soc. B Met., 1996, 58, (1), pp. 267–288
(b_hat_0, b_hat) = argmin { sum_{i=1..N} ( y_i - b_0 - sum_j b_j x_ij )^2 }   (cost function)

subject to  sum_j |b_j| <= t * sum_j |b_hat_j^0|   (L1 constraint)
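LASSO has no closed-form solution, but the penalized (Lagrangian) form of the constraint above can be solved by cyclic coordinate descent with soft-thresholding. The sketch below is a minimal pure-Python illustration, not the solver used in the talk; the identity-matrix toy data is invented, and the penalty lam plays the role of the constraint threshold t for some pairing of the two.

```python
# Sketch: LASSO in penalized form, minimizing (1/2)*||y - Xb||^2 + lam*||b||_1
# by cyclic coordinate descent with soft-thresholding.

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; the scalar LASSO solution."""
    return (z - lam) if z > lam else (z + lam) if z < -lam else 0.0

def lasso(X, y, lam, n_iter=200):
    m, n = len(X), len(X[0])
    b = [0.0] * n
    for _ in range(n_iter):
        for j in range(n):
            # Partial residual with feature j removed from the fit.
            r = [y[i] - sum(X[i][k] * b[k] for k in range(n) if k != j)
                 for i in range(m)]
            rho = sum(X[i][j] * r[i] for i in range(m))
            norm = sum(X[i][j] ** 2 for i in range(m))
            b[j] = soft_threshold(rho, lam) / norm
    return b

X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, -2.0]
print(lasso(X, y, 0.0))   # lam = 0 reduces to least squares: [3.0, -2.0]
print(lasso(X, y, 1.0))   # shrinkage pulls both toward zero: [2.0, -1.0]
```

Small true coefficients are driven exactly to zero as lam grows, which is why LASSO performs variable selection as well as shrinkage.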
![Page 15: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/15.jpg)
Noise and Missing Data
– A more systematic comparison is needed with respect to:
1. Noise: level, type
2. Size (dimension)
3. Level of missing data
4. Collinearity or dependency among input channels
5. Missing data
6. Nonlinearity between inputs/outputs and nonlinear dependency
7. Time-series inputs/outputs and dynamic structure
![Page 16: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/16.jpg)
METHODS
• Linear Matrix Inequalities (LMI)*
– Convert a nonlinear optimization problem into a linear optimization problem.
– Congruence transformation: the error bound (Y - Xb)^T (Y - Xb) <= e^2 * I is rewritten, via the Schur complement, as the LMI

  [ e*I_m        Y - Xb ]
  [ (Y - Xb)^T   e*I_p  ]  >= 0

  and e is minimized over B subject to this constraint.
– Pre-existing knowledge of the system (e.g. a_13 > 0, a_21 < 0) can be added in the form of LMI constraints of the type v_i^T B u_j + u_j^T B^T v_i >= 0, where v_r and u_r are indicator vectors (entry 1 at the constrained position, 0 elsewhere).
– Threshold the coefficients:  b_tilde_ij = b_hat_ij / sqrt( |b_hat_i| * |b_hat_j| )

* [Cosentino, C., et al., IET Systems Biology, 2007. 1(3): p. 164-173]
![Page 17: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/17.jpg)
METRICS
• Metrics for comparing the methods
o Reconstruction from 80% of the dataset, with 20% held out for validation
o RMSE on the test set, plus the number and identity of the significant predictors, as the basic metrics to evaluate the performance of each method
1. Fractional error in estimating the parameters
2. Sensitivity, specificity, G, accuracy
b_frac,j = mean( | 1 - b_method,j / b_true,j | )

(Parameters smaller than 10% of the standard deviation of all parameter values were set to 0 when generating the synthetic data.)

Accuracy = (TN + TP) / (TN + TP + FN + FP)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)

TP: True Positive, FP: False Positive, TN: True Negative, FN: False Negative
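The confusion-matrix metrics above translate directly into code; a minimal sketch with invented counts (the G metric, the geometric mean of sensitivity and specificity, is included since later slides report it):

```python
# Sketch: accuracy, sensitivity, specificity, and G from confusion counts.

def metrics(tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)          # fraction of true edges recovered
    specificity = tn / (tn + fp)          # fraction of absent edges kept absent
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    g = (sensitivity * specificity) ** 0.5  # geometric mean of the two rates
    return sensitivity, specificity, accuracy, g

# Invented counts for illustration: 8 true edges found, 5 missed,
# 85 absent edges correctly rejected, 2 spurious edges.
print(metrics(8, 2, 85, 5))
```

A method that predicts every possible edge scores perfect sensitivity but poor specificity; G and accuracy penalize that imbalance, which is the point of reporting all four.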
![Page 18: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/18.jpg)
RESULTS: DATA SETS
• Two data sets for benchmarking:
1. First set: experimental data measured on macrophage cells (phosphoprotein (PP) vs. cytokine)*
2. Second set: synthetic data generated in Matlab. We build the model using 80% of the data set (the training set) and use the remaining 20% (the test set) to validate the model.
* [Pradervand, S., M.R. Maurya, and S. Subramaniam, Genome Biology, 2006. 7(2): p. R11].
![Page 19: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/19.jpg)
RESULTS: PP-Cytokine Data Set
• Schematic representation of phosphoprotein (PP) vs. cytokine signaling
- Signals were transmitted through 22 recorded signaling proteins as well as other, unmeasured pathways.
- Only the measured pathways contributed to the analysis.
Schematic graphs from:[Pradervand, S., M.R. Maurya, and S. Subramaniam, Genome Biology, 2006. 7(2): p. R11].
![Page 20: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/20.jpg)
PP-CYTOKINE DATASET
Measurements of phosphoproteins in response to LPS
Courtesy: AfCS
![Page 21: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/21.jpg)
Measurements of cytokines in response to LPS
~ 250 such datasets
![Page 22: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/22.jpg)
RESULTS: COMPARISON
• Comparison on synthetic noisy data
• The methods are applied to synthetic data with 22 inputs and 1 output. About one third of the true input coefficients are set to zero to test whether the methods identify them as insignificant.
• Effect of noise level: four outputs with 5, 10, 20, and 40% noise levels, respectively, are generated from the noise-free (true) output.
• Effect of noise type: three outputs with white, t-distributed, and uniform noise, respectively, are generated from the noise-free (true) output.
![Page 23: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/23.jpg)
RESULTS: COMPARISON
• Variability between realizations of data with white noise
PCR, LASSO, and LMI are used to identify significant predictors for 1000 input-output pairs.
Histograms of the coefficients of the three significant predictors common to the three methods:
| Method | Statistic | Predictor 1 | Predictor 10 | Predictor 11 |
|---|---|---|---|---|
| | True value | -3.40 | 5.82 | -6.95 |
| PCR | Mean | -3.81 | 4.73 | -6.06 |
| PCR | Std. | 0.33 | 0.32 | 0.32 |
| PCR | Frac. err. in mean | 0.12 | 0.19 | 0.13 |
| LASSO | Mean | -2.82 | 4.48 | -5.62 |
| LASSO | Std. | 0.34 | 0.32 | 0.33 |
| LASSO | Frac. err. in mean | 0.17 | 0.23 | 0.19 |
| LMI | Mean | -3.70 | 4.74 | -6.34 |
| LMI | Std. | 0.34 | 0.32 | 0.34 |
| LMI | Frac. err. in mean | 0.09 | 0.18 | 0.09 |
Mean and standard deviation of the coefficient histograms computed with PCR, LASSO, and LMI.
![Page 24: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/24.jpg)
RESULTS: COMPARISON
• Comparison of the outcomes of the different methods on the real data
• The methods identified distinct, partially overlapping sets of predictors for each output.
• Graphical illustration of PCR, LASSO, and LMI in the detection of significant predictors for output IL-6 in the PP/cytokine experimental dataset:
• Only the PCR method detects the true input cAMP.
• Zone I provides validation: it highlights the predictors common to all the methods.
![Page 25: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/25.jpg)
RESULTS: SUMMARY
• Comparison with respect to different noise types:
– LASSO is the most robust method across noise types.
• Missing data (RMSE):
– LASSO shows less deviation and is more robust.
• Collinearity:
– PCR shows less deviation against noise level, and better accuracy and G with increasing noise level.
![Page 26: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/26.jpg)
A COMPARISON (Asadi, et al., 2012)
| Criterion (score definition) | PCR | LASSO | LMI |
|---|---|---|---|
| Increasing noise, RMSE. Score = (average RMSE across noise levels for LS) / (average RMSE across noise levels for the chosen method) | 0.68 (degrades gradually with noise level) | 0.56 | 0.94 |
| Standard deviation and error in mean of coefficients. Score = 1 - average( fractional error in mean(10,12,20) + std(10,12,20)/\|true coefficients\| ) | 0.53 | 0.47 | 0.55 |
| Accuracy/G. Score = average accuracy across noise levels for the chosen method (white noise) | 0.70 | 0.87 | 0.91 (at high noise all similar) |
| Fractional error in estimating the parameters. Score = 1 - average fractional error across noise levels (white noise) | 0.81 | 0.55 | 0.78 |
| Noise types: fractional error in estimating the parameters. Score = 1 - average fractional error across noise levels and noise types (20% noise level) | 0.80 | 0.56 | 0.79 |
| Noise types: accuracy and G. Score = average accuracy across noise levels and noise types | 0.71 | 0.87 | 0.91 |
| Dimension ratio/size: fractional error in estimating the parameters. Score = 1 - average fractional error across noise levels and ratios (m/n = 100/25, 100/50, 400/100) | 0.77 | 0.53 | 0.75 |
| Dimension ratio/size: accuracy and G. Score = average accuracy across white-noise levels and ratios (m/n = 100/25, 100/50, 400/100) | 0.66 | 0.83 | 0.90 |
![Page 27: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/27.jpg)
DPLASSO
Doubly Penalized Least Absolute Shrinkage and Selection Operator
![Page 28: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/28.jpg)
OUR APPROACH: DPLASSO

Model:  y = X b_hat + ε
Layer 1: PLS with statistical significance testing over the coefficients B: {b_1, b_2, b_3, …} produces a binary weight vector W: {0, 1, 0, 1, …}
Layer 2: LASSO, using the weights W, selects the final coefficient set B
Output: the reconstructed network
![Page 29: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/29.jpg)
• Our approach, DPLASSO, includes two parameter-selection layers:
• Layer 1 (supervisory layer):
– Partial least squares (PLS)
– Statistical significance testing
• Layer 2 (lower layer):
– LASSO with extra weights on the less informative model parameters derived in layer 1
– Retain the significant predictors and set the remaining small coefficients to zero
DPLASSO WORKFLOW

b_hat_j = argmin { (y_j - X b_j)^T (y_j - X b_j) }

subject to  sum_{i=1..p} w_ij |b_ij| <= t * sum_{i=1..p} w_ij |b_hat_ij^LS|

w_ij = 0 if b_ij is PLS-significant, 1 otherwise
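The two layers can be sketched in a few lines of pure Python: layer 1 is represented here by a given boolean mask (an assumed stand-in for the PLS significance test), and layer 2 is a coordinate-descent LASSO in which only the non-significant coefficients carry the L1 penalty (w_ij = 0 if significant, 1 otherwise). The toy data is invented for illustration.

```python
# Sketch of DPLASSO's lower layer: weighted LASSO where PLS-significant
# coefficients (weight 0) escape shrinkage and the rest (weight 1) are penalized.

def soft_threshold(z, lam):
    return (z - lam) if z > lam else (z + lam) if z < -lam else 0.0

def dplasso(X, y, significant, lam, n_iter=200):
    m, n = len(X), len(X[0])
    b = [0.0] * n
    for _ in range(n_iter):
        for j in range(n):
            r = [y[i] - sum(X[i][k] * b[k] for k in range(n) if k != j)
                 for i in range(m)]
            rho = sum(X[i][j] * r[i] for i in range(m))
            norm = sum(X[i][j] ** 2 for i in range(m))
            w = 0.0 if significant[j] else 1.0   # layer-1 weight
            b[j] = soft_threshold(rho, lam * w) / norm
    return b

X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, -2.0]
# First coefficient deemed PLS-significant, second not: only the second shrinks.
print(dplasso(X, y, [True, False], 1.0))   # [3.0, -1.0]
```

Compare with plain LASSO on the same data, which would shrink both coefficients; the supervisory layer protects the coefficients PLS already trusts.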
![Page 30: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/30.jpg)
DPLASSO: EXTENDED VERSION
• Smooth weights
• Layer 1:
– Continuous significance score η (versus binary):

  η_i = |r_i| - tinv(1 - alpha/2, v_PLS),  v = DOF,  1 - alpha = confidence level

– Mapping function (logistic significance score), with tuning parameter γ:

  s(η_i) = 1 / (1 + e^(-γ·η_i))

• Layer 2:
– Continuous weight vector (versus the binary weight vector):

  w_i(η_i) = 1 - s(η_i)

– Significant coefficients: 0.5 < s(η_i) <= 1, so 0 <= w_i < 0.5
– Insignificant coefficients: 0 <= s(η_i) <= 0.5, so 0.5 <= w_i <= 1

(Plot: significance score η on the horizontal axis; s(η), the significance score, and w(η), the weight function, from 0 to 1 on the vertical axis.)
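The logistic mapping is easy to sanity-check numerically; a minimal sketch (the score values passed in are invented):

```python
import math

# Sketch of the extended DPLASSO weighting: a continuous significance score eta
# (t-ratio minus the critical value) is mapped through a logistic function, and
# the LASSO weight w = 1 - s decreases as significance grows. gamma is the
# tuning parameter from the slide.

def weight(eta, gamma=1.0):
    s = 1.0 / (1.0 + math.exp(-gamma * eta))   # logistic significance score
    return 1.0 - s                              # strongly significant -> weight near 0

print(weight(5.0), weight(0.0), weight(-5.0))
```

At η = 0 (borderline significance) the weight is exactly 0.5; large positive η drives the weight, and hence the penalty on that coefficient, toward zero, while a larger γ sharpens the transition toward the binary scheme.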
![Page 31: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/31.jpg)
APPLICATIONS
1. Synthetic (random) networks: datasets generated in Matlab
2. Biological dataset: Saccharomyces cerevisiae cell-cycle model
![Page 32: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/32.jpg)
SYNTHETIC (RANDOM) NETWORKS

Datasets generated in Matlab using:

• A linear dynamic system:  dy/dt = Xb + e(t),  e(t) ~ N(0, sigma^2)
• Dominant poles/eigenvalues (λ) in the range [-2, 0]
• Lyapunov stability (informal definition from Wikipedia: if all solutions of the dynamical system that start out near an equilibrium point x_e stay near x_e forever, then the system is "Lyapunov stable")
• Zero-input/excited-state release condition
• 5% measurement (white) noise
![Page 33: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/33.jpg)
METRICS
• Two metrics to evaluate the performance of DPLASSO:
1. Sensitivity, specificity, G (the geometric mean of sensitivity and specificity), and accuracy
2. The root-mean-squared error (RMSE) of prediction
TP: True Positive, FP: False Positive, TN: True Negative, FN: False Negative
RMSE = sqrt( (1/m) * sum_{i=1..m} (y_i - y_p,i)^2 )

Accuracy = (TN + TP) / (TN + TP + FN + FP)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Precision = TP / (TP + FP)
![Page 34: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/34.jpg)
TUNING

• Tuning the shrinkage parameter for DPLASSO:
The shrinkage parameter at the LASSO level (threshold t) is chosen via k-fold cross-validation (k = 10) on the associated dataset.

Validation error versus selection threshold t for DPLASSO on the synthetic data set.

Rule of thumb after cross-validation: the optimal tuning parameter roughly matches the network connectivity.
Example: the optimal value of the tuning parameter for a network with 65% connectivity is roughly equal to 0.65.
![Page 35: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/35.jpg)
PERFORMANCE COMPARISON: ACCURACY
(Four accuracy-surface panels comparing LASSO, DPLASSO, and PLS at network densities of 5%, 10%, 20%, and 50%; network size 20, 10 Monte Carlo runs, 5% noise.)

• PLS shows better performance.
• DPLASSO provides a good compromise between LASSO and PLS in terms of accuracy for different network densities.
![Page 36: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/36.jpg)
PERFORMANCE COMPARISON: SENSITIVITY
(Four sensitivity-surface panels comparing LASSO, DPLASSO, and PLS at network densities of 5%, 10%, 20%, and 50%; network size 20, 10 Monte Carlo runs, 5% noise.)

• LASSO has better performance.
• DPLASSO provides a good compromise between LASSO and PLS in terms of sensitivity for different network densities.
![Page 37: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/37.jpg)
PERFORMANCE COMPARISON: SPECIFICITY
(Four specificity-surface panels comparing LASSO, DPLASSO, and PLS at network densities of 5%, 10%, 20%, and 50%; network size 20, 10 Monte Carlo runs, 5% noise.)

• DPLASSO provides a good compromise between LASSO and PLS in terms of specificity for different network densities.
![Page 38: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/38.jpg)
PERFORMANCE COMPARISON: NETWORK-SIZE
• DPLASSO provides a good compromise between LASSO and PLS in terms of accuracy for different network sizes.
• DPLASSO provides a good compromise between LASSO and PLS in terms of sensitivity (not shown) for different network sizes.

(Three accuracy-surface panels comparing LASSO, DPLASSO, and PLS for network sizes 10, 20, and 50, i.e. 100, 400, and 2500 potential connections.)
![Page 39: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/39.jpg)
ROC CURVE vs. DYNAMICS AND WEIGHTINGS
• DPLASSO exhibits better performance for networks with slow dynamics.
• The parameter γ in DPLASSO can be adjusted to improve performance for fast dynamic networks.

(Two ROC panels, sensitivity versus specificity, for LASSO, DPLASSO, and PLS: one sweeping λ, one sweeping γ; density 20%, 10 Monte Carlo runs, size 50.)
![Page 40: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/40.jpg)
YEAST CELL DIVISION

Experimental dataset generated via a well-known nonlinear model of the cell division cycle of fission yeast.* The model is dynamic, with 9 state variables.
* Novak, Bela, et al. "Mathematical model of the cell division cycle of fission yeast." Chaos: An Interdisciplinary Journal of Nonlinear Science 11.1 (2001): 277-286.
![Page 41: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/41.jpg)
CELL DIVISION CYCLE
True Network (Cell Division Cycle)
PLS DPLASSO LASSO
Missing in DPLASSO!
![Page 42: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/42.jpg)
RECONSTRUCTION PERFORMANCE

Case Study II: cell division cycle, averaged over γ values

| Method | Accuracy | Sensitivity | Specificity | SD RMSE/Mean |
|---|---|---|---|---|
| LASSO | 0.31 | 0.92 | 0.16 | 0.14 |
| DPLASSO | 0.56 | 0.73 | 0.52 | 0.08 |
| PLS | 0.60 | 0.67 | 0.63 | 0.09 |

Case Study I: 10 Monte Carlo simulations, size 20, averaged over different γ, λ, network densities, and Monte Carlo sample datasets

| Method | Accuracy | Sensitivity | Specificity | SD RMSE/Mean |
|---|---|---|---|---|
| LASSO | 0.39 | 0.90 | 0.05 | 0.06 |
| DPLASSO | 0.52 | 0.90 | 0.34 | 0.07 |
| PLS | 0.59 | 0.80 | 0.20 | 0.07 |
![Page 43: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/43.jpg)
CONCLUSION
• A novel method, the Doubly Penalized Least Absolute Shrinkage and Selection Operator (DPLASSO), to reconstruct dynamic biological networks
– Based on integrating significance testing of coefficients with optimization
– A smoothing function trades off between PLS and LASSO
• Simulation results on synthetic datasets
– DPLASSO provides a good compromise between PLS and LASSO in terms of accuracy and sensitivity for:
  • Different network densities
  • Different network sizes
• On the biological dataset
– DPLASSO is best in terms of sensitivity
– DPLASSO is a good compromise between LASSO and PLS in terms of accuracy, specificity, and lift
![Page 44: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/44.jpg)
Information Theory Methods
Farzaneh Farangmehr
![Page 45: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/45.jpg)
Mutual Information
• Mutual information is a metric of how much information one variable provides for predicting the behavior of another.
• The higher the mutual information, the more similar the two profiles.
• For two discrete random variables X = {x1, …, xn} and Y = {y1, …, ym}:
– p(xi, yj) is the joint probability of xi and yj
– p(xi) and p(yj) are the marginal probabilities of xi and yj
I(X; Y) = sum_{j=1..m} sum_{i=1..n} p(x_i, y_j) * log[ p(x_i, y_j) / (p(x_i) p(y_j)) ]
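For discrete profiles, the formula can be evaluated directly from empirical counts; a minimal pure-Python sketch on invented 0/1 profiles:

```python
import math
from collections import Counter

# Sketch: empirical mutual information of two discrete variables from paired
# samples, I(X;Y) = sum_ij p(x_i, y_j) * log( p(x_i, y_j) / (p(x_i) p(y_j)) ),
# with probabilities replaced by sample frequencies.

def mutual_information(xs, ys):
    n = len(xs)
    pxy = Counter(zip(xs, ys))          # joint counts
    px, py = Counter(xs), Counter(ys)   # marginal counts
    # (c/n) * log( (c/n) / ((px/n)*(py/n)) ) simplifies to (c/n)*log(c*n/(px*py)).
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

xs = [0, 0, 1, 1]
print(mutual_information(xs, xs))            # identical profiles: I = H(X) = ln 2
print(mutual_information(xs, [0, 1, 0, 1]))  # empirically independent: I = 0
```

Identical profiles give the maximum possible value, the entropy of the profile, matching the slide's point that higher mutual information means more similar profiles.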
![Page 46: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/46.jpg)
Information-theoretical approach: Shannon theory
• Hartley's conceptual framework relates the information of a random variable to its probability.
• Shannon defined the "entropy" H of a random variable X, given a random sample {x1, …, xn}, in terms of its probability distribution:

  H(X) = sum_{i=1..n} P(x_i) I(x_i) = -sum_{i=1..n} P(x_i) log[P(x_i)]

• Entropy is a good measure of randomness or uncertainty.
• Shannon defines "mutual information" as the amount of information about a random variable X that can be obtained by observing another random variable Y:

  I(X, Y) = H(X) + H(Y) - H(X, Y) = H(Y) - H(Y|X) = H(X) - H(X|Y) = I(Y, X)
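The entropy identity above can be checked numerically: computing H(X), H(Y), and the joint entropy H(X,Y) from empirical frequencies gives the mutual information. A small sketch on invented samples:

```python
import math
from collections import Counter

# Numerical check of I(X,Y) = H(X) + H(Y) - H(X,Y) on a toy sample,
# with entropy computed from empirical probabilities.

def entropy(samples):
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())

xs = [0, 0, 1, 1, 1, 0]
ys = [0, 1, 1, 1, 0, 0]
h_x, h_y = entropy(xs), entropy(ys)
h_xy = entropy(list(zip(xs, ys)))   # joint entropy via paired samples
mi = h_x + h_y - h_xy
print(mi)   # nonnegative; zero only when the samples look independent
```

Since H(X,Y) <= H(X) + H(Y), the result is always nonnegative, consistent with mutual information being an uncertainty reduction.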
![Page 47: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/47.jpg)
Mutual information networks

X = {x1, …, xi},  Y = {y1, …, yj}

• The ultimate goal is to find the best model that maps X to Y.
- General definition: Y = f(X) + U. In linear cases, Y = [A]X + U, where the matrix [A] defines the linear dependency of inputs and outputs.
• Information theory maps inputs to outputs (in both linear and nonlinear models) by using the mutual information:

  I(X; Y) = sum_{j=1..m} sum_{i=1..n} p(x_i, y_j) * log[ p(x_i, y_j) / (p(x_i) p(y_j)) ]
![Page 48: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/48.jpg)
Mutual information networks
• The framework of network reconstruction using information theory has two stages: (1) mutual-information measurements, and (2) selection of a proper threshold.
• Mutual information networks rely on the measurement of the mutual information matrix (MIM), a square matrix whose elements MIM_ij = I(X_i; Y_j) are the mutual information between X_i and Y_j.
• Choosing a proper threshold is a non-trivial problem. The usual way is to permute the expression measurements many times and recalculate the distribution of the mutual information for each permutation. The distributions are then averaged, and a good choice for the threshold is the largest mutual-information value in the averaged permuted distribution.
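The permutation recipe can be sketched compactly: shuffling one profile destroys any real X-Y association, so the mutual information of permuted data estimates what "noise" looks like. The sketch below simplifies the slide's averaged-distribution procedure to taking the maximum over permutations, and the profiles are invented:

```python
import math
import random
from collections import Counter

# Sketch: permutation-based MI threshold. Shuffle one profile to break the
# dependence, recompute MI many times, and use the largest permuted value
# as the significance threshold.

def mutual_information(xs, ys):
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mi_threshold(xs, ys, n_perm=200, seed=0):
    rng = random.Random(seed)   # fixed seed for reproducibility
    ys = list(ys)
    best = 0.0
    for _ in range(n_perm):
        rng.shuffle(ys)         # destroys the X-Y association
        best = max(best, mutual_information(xs, ys))
    return best

xs = [0] * 10 + [1] * 10
print(mi_threshold(xs, xs))     # noise floor for this sample size
```

Edges whose MI exceeds this threshold are kept; everything below it is indistinguishable from the permuted (independent) case.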
![Page 49: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/49.jpg)
Mutual information networks: Data Processing Inequality (DPI)

If genes g1 and g3 interact only through a third gene g2, then

  I(g1, g3) <= min[ I(g1, g2), I(g2, g3) ]

Independence null hypothesis:  p(g_i, g_j) = p(g_i) p(g_j)
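In practice the DPI is used as a pruning rule: in every fully connected triple of genes, the weakest of the three edges is assumed to be an indirect interaction and is removed. A minimal sketch on an invented three-gene chain:

```python
# Sketch of DPI-based pruning: for each fully connected triple (a, b, c),
# the edge a-b is dropped when I(a,b) < min(I(a,c), I(c,b)), i.e. when the
# a-b dependence can be explained as flowing through c.

def dpi_prune(mi):
    """mi: dict mapping frozenset({a, b}) -> mutual information. Returns kept edges."""
    nodes = sorted({n for e in mi for n in e})
    drop = set()
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            for c in nodes:
                if c in (a, b):
                    continue
                e_ab = frozenset({a, b})
                e_ac, e_bc = frozenset({a, c}), frozenset({b, c})
                if e_ab in mi and e_ac in mi and e_bc in mi:
                    if mi[e_ab] < min(mi[e_ac], mi[e_bc]):
                        drop.add(e_ab)   # weakest edge of the triangle: indirect
    return {e: v for e, v in mi.items() if e not in drop}

# Chain g1 -> g2 -> g3: the indirect g1-g3 edge is the weakest and is removed.
mi = {frozenset({"g1", "g2"}): 0.9,
      frozenset({"g2", "g3"}): 0.8,
      frozenset({"g1", "g3"}): 0.3}
print(sorted(tuple(sorted(e)) for e in dpi_prune(mi)))
# -> [('g1', 'g2'), ('g2', 'g3')]
```

This is the second step of ARACNe described on the next slide: threshold first, then apply the DPI to strip indirect connections.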
![Page 50: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/50.jpg)
ARACNe algorithm
• ARACNe stands for "Algorithm for the Reconstruction of Accurate Cellular NEtworks" [25].
• ARACNe identifies candidate interactions by estimating the pairwise mutual information of gene expression profiles, I(gi, gj), and then filters the MIs using an appropriate threshold, I0, computed for a specific p-value, p0. In a second step, ARACNe removes the vast majority of indirect connections using the Data Processing Inequality (DPI).
![Page 51: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/51.jpg)
Protein-Cytokine Network in Macrophage Activation
![Page 52: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/52.jpg)
Application to Protein-Cytokine Network Reconstruction

• The release of immune-regulatory cytokines during the inflammatory response is mediated by a complex signaling network [45].
• Current knowledge does not provide a complete picture of these signaling components.
• 22 signaling proteins responsible for cytokine release:
cAMP, AKT, ERK1, ERK2, Ezr/Rdx, GSK3A, GSK3B, JNK lg, JNK sh, MSN, p38, p40Phox, NFkB p65, PKCd, PKCmu2, RSK, Rps6, SMAD2, STAT1a, STAT1b, STAT3, STAT5
• 7 released cytokines (as signal receivers):
G-CSF, IL-1a, IL-6, IL-10, MIP-1a, RANTES, TNFa
• We developed an information-theoretic model that derives the responses of the seven cytokines from the activation of the twenty-two signaling phosphoproteins in RAW 264.7 macrophages.
• This model captured most of the known signaling components involved in cytokine release and was able to reasonably predict potentially important novel signaling components.
![Page 53: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/53.jpg)
Protein-Cytokine Network Reconstruction: MI Estimation using KDE

- Given a random sample {x1, …, xn} of a univariate random variable X with an unknown density f, a kernel density estimator (KDE) estimates the shape of the function as:

  f_h(x) = (1/n) sum_{i=1..n} k_h(x - x_i) = (1/(n h)) sum_{i=1..n} k( (x - x_i)/h )

- Assuming Gaussian kernels:

  f_h(x) = 1/(n h sqrt(2π)) * sum_{i=1..n} exp( -(x - x_i)^2 / (2 h^2) )

- Bivariate kernel density of two random variables X and Y, given two random samples {x1, …, xn} and {y1, …, yn}:

  f_h(x, y) = 1/(2π n h^2) * sum_{i=1..n} exp( -[(x - x_i)^2 + (y - y_i)^2] / (2 h^2) )

- Mutual information of X and Y using kernel density estimation:

  I(X, Y) = (1/n) sum_{j=1..n} ln[ f(x_j, y_j) / (f(x_j) f(y_j)) ]

(n = sample size; h = kernel width)
![Page 54: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/54.jpg)
Protein-Cytokine Network Reconstruction: Kernel Bandwidth Selection

• There is no universal way of choosing h; however, the ranking of the MIs depends only weakly on it.
• The most common criterion for selecting the optimal kernel width is to minimize the expected risk function, also known as the mean integrated squared error (MISE):

  MISE(h) = E ∫ [f_h(x) - f(x)]^2 dx

• Loss function (integrated squared error):

  L(h) = ∫ [f_h(x) - f(x)]^2 dx = ∫ f_h(x)^2 dx - 2 ∫ f_h(x) f(x) dx + ∫ f(x)^2 dx,  where ∫ f(x)^2 dx is a constant

• The unbiased cross-validation approach selects the kernel width that minimizes the loss function by minimizing:

  UCV(h) = ∫ f_h(x)^2 dx - (2/n) sum_{i=1..n} f_(-i),h(x_i)

where f_(-i),h(x_i) is the kernel density estimate with bandwidth h at x_i obtained after removing the i-th observation.
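For Gaussian kernels, the squared-density integral in UCV has a closed form (the convolution of two Gaussians of width h is a Gaussian of width sqrt(2)·h), so the whole criterion is computable from pairwise differences. A sketch on an invented sample, scanning a grid of bandwidths:

```python
import math

# Sketch: unbiased cross-validation for the kernel width h,
#   UCV(h) = int f_h(x)^2 dx - (2/n) * sum_i f_{(-i),h}(x_i),
# using int f_h^2 = (1/n^2) * sum_ij phi(x_i - x_j; sqrt(2)*h)
# for Gaussian kernels.

def phi(u, s):
    """Gaussian density with standard deviation s, evaluated at u."""
    return math.exp(-u * u / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def ucv(x, h):
    n = len(x)
    int_f2 = sum(phi(x[i] - x[j], math.sqrt(2) * h)
                 for i in range(n) for j in range(n)) / n ** 2
    # Leave-one-out density at each observation.
    loo = sum(sum(phi(x[i] - x[j], h) for j in range(n) if j != i) / (n - 1)
              for i in range(n)) / n
    return int_f2 - 2.0 * loo

sample = [0.1, 0.4, 0.35, 0.8, 0.9, 1.3]          # toy sample
grid = [0.05 * k for k in range(1, 40)]           # candidate bandwidths
best_h = min(grid, key=lambda h: ucv(sample, h))
print(best_h)
```

The grid and sample here are illustrative only; the selected h minimizes the estimated integrated squared error up to the unknown constant ∫f² dropped from the loss.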
![Page 55: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/55.jpg)
Protein-Cytokine Network Reconstruction
Threshold selection
• Based on large-deviation theory (extended to biological networks by ARACNE), the probability that an empirical value of the mutual information I is greater than $I_0$, provided that its true value is $\bar{I} = 0$, is:

$$P(I > I_0 \mid \bar{I} = 0) \approx c\,e^{-bI_0}$$

where the bar denotes the true MI, N is the sample size and c is a constant. After taking the logarithm of both sides of the above equation:

$$\ln P = a - bI_0, \qquad a = \ln c$$

• Therefore, $\ln P$ can be fitted as a linear function of $I_0$ with slope $-b$, where b is proportional to the sample size N.
• Using these results, for any given dataset with sample size N and a desired p-value, the corresponding MI threshold can be obtained.
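One way to realize this in practice is to build a null MI distribution by permuting one variable (so the true MI is 0), fit ln P against $I_0$, and invert the fit at the desired p-value. The sketch below uses a simple histogram MI estimator as a stand-in for the KDE estimator; the bin count, permutation count, and fitting details are illustrative assumptions:

```python
import numpy as np

def mi_hist(x, y, bins=8):
    """Simple plug-in MI estimate from a 2-D histogram (a stand-in for
    the KDE-based estimator described on the slides)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mi_threshold(x, y, p_value, n_perm=500, seed=0):
    """MI threshold for a desired p-value via the ln P = a - b*I0 fit.

    Null MI values come from permuting y (destroying any dependence);
    the empirical tail probability P(I > I0) is then fitted as a linear
    function of I0 on a log scale and inverted at the given p-value.
    """
    rng = np.random.default_rng(seed)
    null = np.sort([mi_hist(x, rng.permutation(y)) for _ in range(n_perm)])
    # Empirical tail probability P(I > I0) at each sorted null value
    tail = 1.0 - np.arange(1, n_perm + 1) / (n_perm + 1)
    slope, intercept = np.polyfit(null, np.log(tail), 1)  # ln P = a - b*I0
    return (np.log(p_value) - intercept) / slope
```

A stricter p-value yields a larger MI threshold, as expected from the negative slope of the fit.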
![Page 56: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/56.jpg)
Protein-Cytokine Network Reconstruction
Kernel density estimation of cytokines
Figure 3: The probability distributions of seven released cytokines in RAW 264.7 macrophages, estimated using kernel density estimation (KDE)
Mutual information for all 22x7 phosphoprotein-cytokine pairs from Toll data (upper bar) and non-Toll data (lower bar).
![Page 57: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/57.jpg)
Protein-Cytokine Network Reconstruction
Protein-cytokine signaling networks
The topology of signaling protein-released cytokines obtained from the non-Toll (A) and Toll (B) data.
A
B
![Page 58: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/58.jpg)
Protein-cytokine Network Reconstruction Summary
• This model successfully captures all known signaling components involved in cytokine release
• It predicts two potentially new signaling components involved in cytokine release: Ribosomal S6 Kinase acting on Tumor Necrosis Factor, and Ribosomal Protein S6 acting on Interleukin-10.
• For MIP-1α and IL-10, whose low coefficients of determination lead to less precise linear fits, the information-theoretic model shows an advantage over linear methods such as the PCR minimal model [Pradervand et al.] in capturing all known regulatory components involved in cytokine release.
![Page 59: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/59.jpg)
Network reconstruction from time-course data
Background: Time-delayed gene networks
• The idea comes from the observation that the expression of a gene at a certain time can depend on the expression level of another gene at the previous time point, or at a few time points before.
• Time-delayed gene regulation is a common phenomenon in organisms, since:
• If the effect of gene g1 on gene g2 depends on an inducer, g3, that has to be bound first in order to bind the inhibition site on g2, there can be a significant delay between the expression of gene g1 and its observed effect, i.e., the inhibition of gene g2.
• Not all the genes that influence the expression level of a given gene are necessarily observable in one microarray experiment: they may not be among the genes being monitored, or their function may be currently unknown.
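A minimal way to probe such delayed dependence is to scan lagged associations between two expression time series. The sketch below scores each lag by absolute Pearson correlation, an illustrative stand-in for the information-theoretic measures used later:

```python
import numpy as np

def best_lag(g1, g2, max_lag=3):
    """Find the delay (in time steps) at which g1 best tracks g2.

    Scores each lag d by |corr(g1[t], g2[t + d])| -- a minimal proxy
    for the time-delayed dependence described above.
    """
    scores = {}
    for d in range(max_lag + 1):
        a = g1[: len(g1) - d] if d else g1  # g1 shifted back by d steps
        b = g2[d:]
        scores[d] = abs(np.corrcoef(a, b)[0, 1])
    # Return the lag with the highest score, plus all scores for inspection
    return max(scores, key=scores.get), scores
```

If g2 is a noisy copy of g1 delayed by two time points, the scan recovers a best lag of 2.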
![Page 60: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/60.jpg)
Network reconstruction from time-course data
The Algorithm
(Slide equation: the ICNA score, defined as an argmin over normalized sub-network expression ratios $e_{s_i t}/e_{s_i 0}$.)
![Page 61: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/61.jpg)
Network reconstruction from time-course data
Algorithm
![Page 62: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/62.jpg)
Network reconstruction from time-course data
The flow diagram
The flow diagram of the information-theoretic approach for biological network reconstruction from time-course microarray data, which identifies the topology of functional sub-networks:

1. Cluster gene lists into n sub-networks
2. Measure sub-network activities
3. Flag potentially dependent sub-networks by measuring ICNA
4. Measure the influence between flagged sub-networks
5. Build the influence matrix
6. Find the threshold
7. Remove connections below the threshold
8. Apply DPI to connections above the threshold
9. Build the network based on the non-zero elements of the mutual information matrix
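The DPI step of the flow above can be sketched as the ARACNE-style pruning rule: in every connected triple, the weakest edge is dropped as likely indirect. The tolerance parameter `eps` is an assumption for illustration:

```python
import numpy as np

def apply_dpi(mi, eps=0.0):
    """Prune likely-indirect edges from a mutual-information matrix using
    the data processing inequality (DPI), ARACNE-style.

    For every triple (i, j, k), the edge (i, j) is removed when
    mi[i, j] < min(mi[i, k], mi[k, j]) - eps, i.e. when it is the weakest
    leg of the triangle and therefore likely indirect.
    """
    mi = np.array(mi, float)
    n = mi.shape[0]
    keep = mi > 0
    for i in range(n):
        for j in range(i + 1, n):
            if not keep[i, j]:
                continue
            for k in range(n):
                # Only complete triangles in the original matrix count
                if k in (i, j) or mi[i, k] == 0 or mi[k, j] == 0:
                    continue
                if mi[i, j] < min(mi[i, k], mi[k, j]) - eps:
                    keep[i, j] = keep[j, i] = False
                    break
    return np.where(keep, mi, 0.0)
```

For a chain X -> Y -> Z, the weak direct X-Z association is pruned while both strong edges survive.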
![Page 63: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/63.jpg)
Network reconstruction from time-course data
Case study: the yeast cell cycle
The cell cycle consists of four distinct phases (plus a resting state, G0):
G0 (Gap 0): A resting phase where the cell has left the cycle and has stopped dividing.
G1 (Gap 1): Cells increase in size in Gap 1. The G1 checkpoint control mechanism ensures that everything is ready for DNA synthesis.
S (Synthesis): DNA replication occurs during this phase.
G2 (Gap 2): During the gap between DNA synthesis and mitosis, the cell continues to grow. The G2 checkpoint control mechanism ensures that everything is ready to enter the M (mitosis) phase and divide.
M (Mitosis): Cell growth stops at this stage and cellular energy is focused on the orderly division into two daughter cells. A checkpoint in the middle of mitosis (the metaphase checkpoint) ensures that the cell is ready to complete cell division.
![Page 64: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/64.jpg)
Network reconstruction from time-course data
Case study: the yeast cell cycle
• Data from Gene Expression Omnibus (GEO)
• Culture synchronized by alpha-factor arrest; samples taken every 7 minutes as cells went through the cell cycle
• Value type: Log ratio
• 5,981 genes, 7,728 probes and 14 time points
• 94 Pathways from KEGG Pathways
![Page 65: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/65.jpg)
Network reconstruction from time-course data
Case study: the yeast cell cycle
The reconstructed functional network of yeast cell cycle obtained from time-course microarray data
![Page 66: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/66.jpg)
Mutual information networks
Advantages and limits
![Page 67: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/67.jpg)
Time-Varying Networks
Causality
Maryam Masnardi-Shirazi
![Page 68: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/68.jpg)
Causal Inference of Time-Varying Biological Networks
![Page 69: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/69.jpg)
Definition of Causality
![Page 70: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/70.jpg)
Beyond Correlation: Causation
Idea: map a set of K time series to a directed graph with K nodes, where an edge is placed from a to b if the past of a has an impact on the future of b.

How do we quantitatively do this in a general-purpose manner?
![Page 71: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/71.jpg)
Granger’s Notion of Causality
It is said that process X Granger-causes process Y if future values of Y can be better predicted using the past values of both X and Y than using only past values of Y.
![Page 72: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/72.jpg)
Granger Causality Formulation
• There are many ways to formulate the notion of Granger causality, some of which are:
- Information theory and the concept of directed information
- Learning theory
- Dynamic Bayesian networks
- Vector autoregressive models (VAR)
- Hypothesis tests, e.g. t-tests and F-tests
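The VAR/F-test route can be sketched as follows: fit y on its own past (restricted model) and on the past of both series (full model), then compare residual sums of squares. The lag order p and the plain least-squares formulation are illustrative assumptions:

```python
import numpy as np

def granger_f(x, y, p=2):
    """F-statistic for 'x Granger-causes y' with a lag-p autoregression.

    Compares the residual sum of squares of the restricted model
    (y regressed on its own past) against the full model
    (y regressed on the past of both y and x).
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = len(y)
    rows = n - p
    Y = y[p:]
    # Lagged design matrices: column d holds the series at lag d+1
    past_y = np.column_stack([y[p - d - 1 : n - d - 1] for d in range(p)])
    past_x = np.column_stack([x[p - d - 1 : n - d - 1] for d in range(p)])
    ones = np.ones((rows, 1))

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return float(r @ r)

    rss_r = rss(np.hstack([ones, past_y]))           # restricted model
    rss_f = rss(np.hstack([ones, past_y, past_x]))   # full model
    df_full = rows - (1 + 2 * p)
    return ((rss_r - rss_f) / p) / (rss_f / df_full)
```

When y is driven by lagged x, the statistic for x -> y is far larger than for the reverse direction.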
![Page 73: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/73.jpg)
Vector Autoregressive Model (VAR)
![Page 74: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/74.jpg)
Least Squares Estimation
![Page 75: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/75.jpg)
Least Squares Estimation (Cont.)
![Page 76: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/76.jpg)
Processing the data
• Phosphoprotein two-ligand screen assay: RAW 264.7
• There are 327 experiments from western blots processed with mixtures of phosphospecific antibodies. In all experiments, the effects of single-ligand and simultaneous two-ligand addition are measured
• Each experiment includes the fold change of each phosphoprotein at time points t = 0, 1, 3, 10, 30 minutes
• Data at t = 30 minutes are omitted, and data from t = 0 to t = 10 minutes are interpolated in 1-minute steps
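The interpolation step might look as follows. Linear interpolation and the example fold-change values are assumptions (the slide does not specify the interpolation scheme):

```python
import numpy as np

# Measured fold changes at t = 0, 1, 3, 10 min (t = 30 dropped, as above);
# the values here are made up for illustration.
t_obs = np.array([0.0, 1.0, 3.0, 10.0])
fold = np.array([1.00, 1.35, 1.80, 1.20])

# Linear interpolation onto a 1-minute grid, t = 0..10
t_grid = np.arange(0.0, 11.0)
fold_grid = np.interp(t_grid, t_obs, fold)
```

This turns each experiment's 4 retained time points into an 11-point series on a uniform 1-minute grid, as required by the VAR fit.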
![Page 77: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/77.jpg)
Least Squares Estimation and Rank Deficiency of Transformation Matrix
(Slide diagram: the response matrix of all Y data and the regressor matrix of all X data are assembled by stacking Exp. 1 through Exp. 327.)
![Page 78: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/78.jpg)
Normalizing the data
![Page 79: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/79.jpg)
Statistical Significance Test (Confidence Interval)
![Page 80: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/80.jpg)
The Reconstructed Phosphoprotein Signaling Network
• The network is reconstructed by estimating causal relationships between all nodes
• All 21 phosphoproteins are present and interact with one another
• There are 122 edges in this network
![Page 81: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/81.jpg)
Correlation and Causation
• The conventional dictum that "correlation does not imply causation" means that correlation cannot be used to infer a causal relationship between the variables
• This does not mean that correlations cannot indicate the potential existence of causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown
• Consequently, establishing a correlation between two variables is not a sufficient condition to establish a causal relationship (in either direction).
![Page 82: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/82.jpg)
Correlation and Causality comparison
Heat-map of the correlation matrix between the input (X) and output (Y)
The reconstructed network considering significant coefficients and their intersection with connections having correlations higher than 0.5
![Page 83: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/83.jpg)
Correlation and Causality comparison (cont.)
Heat-map of the correlation matrix between the input (X) and output (Y)
The reconstructed network considering significant coefficients and their intersection with connections having correlations higher than 0.4
![Page 84: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/84.jpg)
Validating our network
Identification of Crosstalk between Phosphoprotein Signaling Pathways in RAW 264.7 Macrophage Cells (Gupta et al., 2010)
![Page 85: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/85.jpg)
The Reconstructed Phosphoprotein Signaling Network for t=0 to t=4 minutes
Heat-map of the correlation matrix between the input (X) and output (Y) for t=0 to t=4 minutes
Intersection of causal coefficients with connections having correlations higher than 0.4 for t=0 to t=4 minutes
9 nodes, 15 edges
![Page 86: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/86.jpg)
The Reconstructed Phosphoprotein Signaling Network for t=3 to t=7 minutes
Heat-map of the correlation matrix between the input (X) and output (Y) for t=3 to t=7 minutes
Intersection of causal coefficients with connections having correlations higher than 0.4 for t=3 to t=7 minutes
19 nodes, 51 edges
![Page 87: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/87.jpg)
The Reconstructed Phosphoprotein Signaling Network for t=6 to t=10 minutes
Heat-map of the correlation matrix between the input (X) and output (Y) for t=6 to t=10 minutes
Intersection of causal coefficients with connections having correlations higher than 0.4 for t=6 to t=10 minutes
19 nodes, 56 edges
![Page 88: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/88.jpg)
Time-Varying reconstructed Network
t=0 to 4 min t=3 to 7 min t=6 to 10 min
![Page 89: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/89.jpg)
The Reconstructed Network for t=0 to t=4 minutes without LPS as a Ligand
With LPS: 15 edges
Without LPS: 16 edges
![Page 90: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/90.jpg)
The Reconstructed Network for t=3 to t=7 minutes without LPS vs. with all ligands
With all ligands including LPS: 51 edges
Without LPS: 55 edges
![Page 91: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/91.jpg)
The Reconstructed Network for t=6 to t=10 minutes without LPS vs. with all ligands
With all ligands including LPS: 56 edges
Without LPS: 66 edges
![Page 92: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/92.jpg)
Time-Varying Network with LPS not present as a ligand
t=0 to 4 min t=3 to 7 min t=6 to 10 min
![Page 93: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/93.jpg)
Summary
![Page 94: Big Challenges with Big Data in Life Sciences](https://reader035.vdocuments.us/reader035/viewer/2022062309/568135d4550346895d9d3e6d/html5/thumbnails/94.jpg)