complex systems biology informed data analysis and machine learning
Complex Systems Biology Informed Data Analysis
Dmitry Grapov, PhD
CDS - Creative Data Solutions
www.createdatasol.com
About Me
Born: Minsk, Belarus in 1981
University of Utah (2000-2007), Salt Lake City, UT
• B.S. Biology
• B.S. Chemistry
University of California, Davis (2007-2012), Davis, CA
• Ph.D. Analytical Chemistry with Emphasis in Biotechnology
• Post doc, Oliver Fiehn Lab
data visualization, network analysis, machine learning, predictive modeling, biochemistry, software
WCMC - West Coast Metabolomics Center
• Principal Statistician at the NIH West Coast Metabolomics Center (WCMC)
Bioinformatics and Data Science
• CDS - Creative Data Solutions, St. Louis, MO
Complex Metabolic Systems Biology
Biochemistry
Structural similarity
http://www.archaeology.org/issues/207-1603/features/4157-arles-roman-wall-paintings
Biological Data
Sample: experimental design - organism, sex, age, etc.
Variable: analyte description and metadata - biochemical class, mass spectra, etc.
http://www.archaeology.org/issues/207-1603/features/4157-arles-roman-wall-paintings
Materials of Connected Biological
Data Analysis and Visualization
Quality Assessment
• use replicated measurements and/or internal standards to estimate analytical variance
Statistical and Multivariate
• use the experimental design to test hypotheses and/or identify trends in analytes
Functional
• use statistical and multivariate results to identify impacted biochemical domains
Network and Predictive
• integrate statistical and multivariate results with the experimental design and analyte metadata
Predictive Modeling Within a Biochemical Context
Grapov et al., Circ. Cardiovasc. Genet. 2014
Personalized Medicine
Complex Data Integration
Grapov et al., PLoS ONE (2014) doi:10.1371/journal.pone.0084260
J. Proteome Res., 2015, 14 (1), pp 557–566 DOI: 10.1021/pr500782g
Biomarker Discovery
Abundance
Time
Drift in >400 replicated measurements across >100 analytical batches for a single analyte
Quality Controls (QCs) embedded among >5,500 samples (1:10) collected over 1.5 years
Analytical Batch
Principal Component Analysis (PCA) of all analytes, showing QC sample scores
biological effect vs. analytical variance
Time
Biochemical Signal Over Time
Data Quality and Normalization
Analyte specific data quality overview
normalizations can be used to remove analytical variance
Panels: Raw Data, Normalized Data
Axes: %RSD, log mean
Labels: low precision, high precision
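The %RSD (percent relative standard deviation) on this slide can be computed per analyte from replicated QC measurements. The following is a minimal sketch using only the Python standard library; the analyte names and abundance values are hypothetical and chosen only to contrast a high-precision analyte with a low-precision one.

```python
from statistics import mean, stdev

def percent_rsd(values):
    """Percent relative standard deviation: 100 * sample SD / |mean|."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; %RSD is undefined")
    return 100.0 * stdev(values) / abs(m)

# Hypothetical QC replicate abundances (illustrative values, not from the slides).
qc_replicates = {
    "analyte_A": [100.0, 102.0, 98.0, 101.0, 99.0],  # tight replicates
    "analyte_B": [50.0, 80.0, 30.0, 65.0, 25.0],     # scattered replicates
}

rsd_by_analyte = {name: percent_rsd(vals) for name, vals in qc_replicates.items()}

# List analytes from highest precision (lowest %RSD) to lowest precision.
for name, rsd in sorted(rsd_by_analyte.items(), key=lambda kv: kv[1]):
    print(f"{name}: {rsd:.1f} %RSD")
```

A lower %RSD means higher precision, so comparing these values before and after normalization is one way to judge whether a normalization removed analytical variance.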
Example of data normalization using a LOESS model fit to QCs
Panels: Raw Data, Normalized Data
Legend: Samples, QCs, LOESS
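One way to sketch QC-based LOESS normalization is to fit a smooth trend to the QC abundances over injection order, then divide every measurement by the fitted trend and rescale to the QC median. The code below is an illustration under those assumptions, not the method used for the slides: the local linear smoother is a basic LOESS with tricube weights and no robustness iterations, and the function names, the `frac` parameter, and the simulated linear drift are all hypothetical.

```python
from statistics import median

def loess_predict(x_fit, y_fit, x_new, frac=0.75):
    """Basic LOESS: local linear fit with tricube weights at each point of x_new."""
    n = len(x_fit)
    k = min(n, max(3, round(frac * n)))  # number of neighbours in each local window
    preds = []
    for x0 in x_new:
        d = [abs(x - x0) for x in x_fit]
        h = sorted(d)[k - 1] or 1.0  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1.0 - (di / h) ** 3) ** 3 for di in d]
        if sum(w) == 0.0:  # all neighbours tied at the bandwidth: use a flat window
            w = [1.0 if di <= h else 0.0 for di in d]
        dx = [x - x0 for x in x_fit]  # centre x so the intercept is the prediction
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, dx))
        sy = sum(wi * yi for wi, yi in zip(w, y_fit))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, dx))
        sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, dx, y_fit))
        denom = sw * sxx - sx * sx
        # Weighted least-squares intercept; fall back to the weighted mean if degenerate.
        preds.append((sxx * sy - sx * sxy) / denom if denom > 1e-12 else sy / sw)
    return preds

def qc_loess_normalize(order, abundance, is_qc, frac=0.75):
    """Divide each value by the QC trend at its injection order, rescale to the QC median."""
    qc_x = [t for t, q in zip(order, is_qc) if q]
    qc_y = [a for a, q in zip(abundance, is_qc) if q]
    trend = loess_predict(qc_x, qc_y, order, frac)
    target = median(qc_y)
    return [a / f * target for a, f in zip(abundance, trend)]

# Simulated data (hypothetical): a QC every second injection, 1% drift per time unit.
order = list(range(0, 51, 5))
is_qc = [t % 10 == 0 for t in order]
drift = [1.0 + 0.01 * t for t in order]
abundance = [(100.0 if q else 200.0) * f for q, f in zip(is_qc, drift)]

normalized = qc_loess_normalize(order, abundance, is_qc)
for t, q, raw, norm in zip(order, is_qc, abundance, normalized):
    print(f"t={t:2d} {'QC    ' if q else 'sample'} raw={raw:6.1f} normalized={norm:6.1f}")
```

In this simulated case the drift is exactly linear, so the local linear fit recovers it and the normalized QCs and samples each collapse to a constant value. Real drift is rarely this well behaved, and the choice of `frac` controls how closely the trend follows the QCs.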
Data Normalization Strategies
Maximizing Metabolomic Coverage
American Journal of Physiology - Endocrinology and Metabolism (2015), DOI: 10.1152/ajpendo.00019.2015
trans-Omic Biochemical Signal
http://dx.doi.org/10.1016/j.tibtech.2015.12.013
Signal State
trans-Omic Signal
~10% variance explained
Many diseases, including aging, have dominant metabolic components (e.g. metabolic syndrome)
PMID:24204828
Genotype + metabolome >40% variance explained
Type 2 Diabetes
Is More Data the Answer?
Biochemically Orthogonal Evidence
http://dx.doi.org/10.1016/j.tibtech.2015.12.013
Systems Biology Informed Personalized Medicine
http://dx.doi.org/10.1016/j.tibtech.2015.12.013
Grapov et al., Circ. Cardiovasc. Genet. 2014
Can You Spot the Signal?
metabolites
proteins
increase
'Omic data integration strategies
Biomarker Insights 2015:Suppl. 4 1-6 DOI: 10.4137/BMI.S29511
Empirical correlation
Network based
Biochemical pathway
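Of the three strategies listed, the empirical correlation approach is the simplest to sketch: compute pairwise correlations between analytes across samples and keep the pairs above a threshold as network edges. The example below assumes Pearson correlation and an absolute threshold of 0.8; the analyte names, values, and threshold are hypothetical, and this is not a description of how MetaMapR computes its networks.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_network(data, threshold=0.8):
    """Return (analyte_a, analyte_b, r) edges for every pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges

# Hypothetical analyte abundances across five samples (illustrative values only).
data = {
    "glucose": [1.0, 2.0, 3.0, 4.0, 5.0],
    "lactate": [2.1, 3.9, 6.2, 8.1, 9.8],
    "alanine": [5.0, 4.1, 2.9, 2.2, 1.0],
    "urea": [3.0, 1.0, 4.0, 1.0, 5.0],
}

edges = correlation_network(data, threshold=0.8)
for a, b, r in edges:
    print(f"{a} -- {b}: r = {r:+.3f}")
```

The sign of `r` is kept on each edge so that positive and negative relationships can be drawn differently. In practice a significance test with multiple-testing correction is usually preferable to a fixed cutoff.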
MetaMapR: Metabolomic network calculation
www.createdatasol.com
Grapov D., American Society of Mass Spectrometry Conference (2013, 2014)
Network Mapping
Network + Mappings = Mapped Network
Biochemically Defined Wellness
DeviumWeb: Data analysis and visualization
www.createdatasol.com
DeviumWeb: Cluster Analysis
DeviumWeb: Predictive Modeling
DeviumWeb: Pathway analysis
http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
Machine Learning Biology
https://commons.wikimedia.org/w/index.php?curid=2776582
Biochemical PageRank
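One possible reading of "Biochemical PageRank" is to rank the nodes of a biochemical network with the PageRank algorithm, so that well-connected metabolites score higher. The sketch below is a plain power-iteration PageRank over a small adjacency dictionary; the toy network, the damping factor of 0.85, and the function name are assumptions made for illustration, not details taken from the slides.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph given as {node: [targets]}."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = graph.get(v, [])
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical toy network; undirected links are written as edges in both directions.
network = {
    "pyruvate": ["lactate", "alanine", "acetyl-CoA"],
    "lactate": ["pyruvate"],
    "alanine": ["pyruvate"],
    "acetyl-CoA": ["pyruvate", "citrate"],
    "citrate": ["acetyl-CoA"],
}

ranks = pagerank(network)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

In this toy network the most connected node, pyruvate, receives the highest score. Edge weights such as correlation strength or structural similarity could be added by splitting each node's rank in proportion to the weights instead of evenly.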
https://arxiv.org/ftp/arxiv/papers/1603/1603.06430.pdf
Biochemical Deep Learning
Fin
data visualization, network analysis, machine learning, predictive modeling, biochemistry, software
www.createdatasol.com