Complex Systems Biology Informed Data Analysis and Machine Learning

Complex Systems Biology informed Data Analysis Dmitry Grapov, PhD CDS- Creative Data Solutions www.createdatasol.com

Upload: dmitry-grapov

Post on 14-Apr-2017



TRANSCRIPT

Page 1: Complex Systems Biology Informed Data Analysis and Machine Learning

Complex Systems Biology Informed Data Analysis

Dmitry Grapov, PhD

CDS - Creative Data Solutions

www.createdatasol.com

Page 2: Complex Systems Biology Informed Data Analysis and Machine Learning

About Me

Born: Minsk, Belarus, in 1981

Minsk, Belarus

University of Utah (2000-2007)
• B.S. Biology
• B.S. Chemistry

Salt Lake City, UT

University of California, Davis (2007-2012)
• Ph.D. Analytical Chemistry with Emphasis in Biotechnology
• Post doc, Oliver Fiehn Lab

Davis, CA

data visualization, network analysis, machine learning, predictive modeling, biochemistry, software

WCMC - West Coast Metabolomics Center
• Principal Statistician at the NIH West Coast Metabolomics Center (WCMC)

Bioinformatics and Data Science
• CDS - Creative Data Solutions

St. Louis, MO

Page 3: Complex Systems Biology Informed Data Analysis and Machine Learning

Structural similarity

Complex Metabolic Systems Biology

Biochemistry

Page 4: Complex Systems Biology Informed Data Analysis and Machine Learning

http://www.archaeology.org/issues/207-1603/features/4157-arles-roman-wall-paintings

Sample

Variable

experimental design - organism, sex, age, etc.

analyte description and metadata - biochemical class, mass spectra, etc.

Variable

Sample

Biological Data
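One way to picture the layout on this slide is a samples-by-variables matrix with two metadata tables, one keyed to its rows (the experimental design) and one keyed to its columns (the analyte descriptions). The sketch below is illustrative only: every ID, name, and value is invented, and the helper functions `check_alignment` and `values_where` are hypothetical, not part of any tool named in the presentation.

```python
# Hypothetical example data: every ID, name, and value below is invented.

# Abundance matrix: one row per sample, one column per variable (analyte).
sample_ids = ["s1", "s2", "s3"]
variable_ids = ["glucose", "alanine"]
abundance = [
    [5.1, 0.32],  # s1
    [4.8, 0.41],  # s2
    [6.0, 0.29],  # s3
]

# Sample metadata: the experimental design (organism, sex, age, etc.).
sample_meta = {
    "s1": {"organism": "mouse", "sex": "F", "age_weeks": 12},
    "s2": {"organism": "mouse", "sex": "M", "age_weeks": 12},
    "s3": {"organism": "mouse", "sex": "F", "age_weeks": 30},
}

# Variable metadata: the analyte description (biochemical class, etc.).
variable_meta = {
    "glucose": {"biochemical_class": "carbohydrate"},
    "alanine": {"biochemical_class": "amino acid"},
}


def check_alignment(matrix, rows, cols, row_meta, col_meta):
    """Return True if the matrix shape matches its IDs and every ID has metadata."""
    ok_shape = len(matrix) == len(rows) and all(len(r) == len(cols) for r in matrix)
    ok_meta = all(r in row_meta for r in rows) and all(c in col_meta for c in cols)
    return ok_shape and ok_meta


def values_where(matrix, rows, cols, row_meta, variable, key, value):
    """Abundances of one variable for the samples whose metadata[key] == value."""
    j = cols.index(variable)
    return [matrix[i][j] for i, s in enumerate(rows) if row_meta[s].get(key) == value]


aligned = check_alignment(abundance, sample_ids, variable_ids, sample_meta, variable_meta)
female_glucose = values_where(
    abundance, sample_ids, variable_ids, sample_meta, "glucose", "sex", "F"
)
```

Keeping the metadata in separate tables, rather than mixing it into the matrix, lets the same abundance values be sliced by design factors or by analyte class.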

Page 5: Complex Systems Biology Informed Data Analysis and Machine Learning

http://www.archaeology.org/issues/207-1603/features/4157-arles-roman-wall-paintings

Materials of Connected Biological Data Analysis and Visualization

Quality Assessment
• use replicated measurements and/or internal standards to estimate analytical variance

Statistical and Multivariate
• use the experimental design to test hypotheses and/or identify trends in analytes

Functional
• use statistical and multivariate results to identify impacted biochemical domains

Network and Predictive
• integrate statistical and multivariate results with the experimental design and analyte metadata

Page 6: Complex Systems Biology Informed Data Analysis and Machine Learning

Predictive Modeling Within a Biochemical Context

Grapov et al., Circ. Cardiovasc. Genet. 2014

Personalized Medicine

Complex Data Integration

Grapov et al., PLoS ONE (2014) doi:10.1371/journal.pone.0084260

J. Proteome Res., 2015, 14 (1), pp 557–566 DOI: 10.1021/pr500782g

Biomarker Discovery

Page 7: Complex Systems Biology Informed Data Analysis and Machine Learning

Abundance

Time

Drift in >400 replicated measurements across >100 analytical batches for a single analyte

Quality Controls (QCs) embedded among >5,500 samples (1:10) collected over 1.5 years

Analytical Batch

Principal Component Analysis (PCA) of all analytes, showing QC sample scores

biological effect vs. analytical variance

Time

Biochemical Signal Over Time

Page 8: Complex Systems Biology Informed Data Analysis and Machine Learning

Data Quality and Normalization

Analyte specific data quality overview

normalizations can be used to remove analytical variance

Panels: Raw Data, Normalized Data

Axes: %RSD, log mean

Annotations: low precision, high precision
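The precision overview on this slide plots %RSD against log mean for each analyte. A minimal sketch of those two quantities, computed from replicated QC measurements, is given below. The log base (10 here), the use of the sample standard deviation, the function names, and all numbers are assumptions made for illustration; the slide does not specify them.

```python
import math
import statistics


def percent_rsd(values):
    """Percent relative standard deviation: 100 * sample SD / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; %RSD is undefined")
    return 100.0 * statistics.stdev(values) / mean


def quality_overview(qc_by_analyte):
    """Map each analyte to (log10 mean, %RSD) computed from its QC replicates."""
    return {
        name: (math.log10(statistics.mean(vals)), percent_rsd(vals))
        for name, vals in qc_by_analyte.items()
    }


# Hypothetical QC replicate measurements (invented numbers).
qc = {
    "analyte_a": [100.0, 102.0, 98.0, 101.0, 99.0],  # tight replicates
    "analyte_b": [10.0, 16.0, 7.0, 13.0, 4.0],       # noisy replicates
}
overview = quality_overview(qc)
```

A lower %RSD means higher precision, so running the same calculation on raw and normalized data gives a per-analyte view of how much analytical variance a normalization removed.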

Page 9: Complex Systems Biology Informed Data Analysis and Machine Learning

Example of data normalization using a LOESS model fit to QCs

Panels: Raw Data, Normalized Data

Legend: Samples, QCs, LOESS

Data Normalization Strategies
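A QC-based LOESS normalization of the kind named on this slide can be sketched as follows: fit a smooth trend to the QC measurements as a function of injection order, then divide every measurement by the trend at its position. This is a simplified stand-in, not the presenter's implementation. The local linear fit with tricube weights, the `frac` parameter, the rescaling to the QC median, and the simulated drift are all assumptions, and abundances are assumed to be positive.

```python
import statistics


def loess_predict(x_fit, y_fit, x_new, frac=0.75):
    """Local linear regression with tricube weights, evaluated at x_new."""
    n = len(x_fit)
    k = min(n, max(3, int(round(frac * n))))   # points used in each local fit
    dists = sorted(abs(x - x_new) for x in x_fit)
    h = dists[k - 1] * 1.0001 + 1e-12          # bandwidth just past the k-th nearest point
    w = [max(0.0, 1.0 - (abs(x - x_new) / h) ** 3) ** 3 for x in x_fit]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, x_fit)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, y_fit)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, x_fit))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, x_fit, y_fit))
    slope = sxy / sxx if sxx > 1e-12 else 0.0  # fall back to a weighted mean
    return y_bar + slope * (x_new - x_bar)


def normalize_by_qc(order, values, is_qc, frac=0.75):
    """Divide each value by the QC trend at its injection order, rescaled to the QC median."""
    qc_x = [o for o, q in zip(order, is_qc) if q]
    qc_y = [v for v, q in zip(values, is_qc) if q]
    target = statistics.median(qc_y)
    return [v / loess_predict(qc_x, qc_y, o, frac) * target for o, v in zip(order, values)]


# Simulated single analyte (invented): a QC every 5th injection, 2% upward drift per injection.
order = list(range(21))
is_qc = [o % 5 == 0 for o in order]
drift = [1.0 + 0.02 * o for o in order]
raw = [(100.0 if q else 50.0) * d for q, d in zip(is_qc, drift)]
normalized = normalize_by_qc(order, raw, is_qc)
```

In this simulation the drift is exactly linear, so the fitted trend removes it completely; real data would leave residual scatter around the corrected QC level.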

Page 10: Complex Systems Biology Informed Data Analysis and Machine Learning

Maximizing Metabolomic Coverage

American Journal of Physiology - Endocrinology and Metabolism 2015, DOI: 10.1152/ajpendo.00019.2015

Page 11: Complex Systems Biology Informed Data Analysis and Machine Learning

trans-Omic Biochemical Signal

http://dx.doi.org/10.1016/j.tibtech.2015.12.013

Page 12: Complex Systems Biology Informed Data Analysis and Machine Learning

Signal State

trans-Omic Signal

Page 13: Complex Systems Biology Informed Data Analysis and Machine Learning

~10% variance explained

Many diseases, including aging, have dominant metabolic components (e.g. metabolic syndrome)

PMID:24204828

Genotype + metabolome >40% variance explained

Type 2 Diabetes

Is More Data the Answer?
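"Variance explained" figures such as those on this slide are commonly reported as a coefficient of determination, R² = 1 - SS_res / SS_tot. The slide does not say how its percentages were computed, so the sketch below only illustrates that general definition, with invented numbers and a hypothetical function name.

```python
def r_squared(observed, predicted):
    """Fraction of variance in `observed` explained by `predicted` (1 - SS_res / SS_tot)."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Invented outcome values and predictions from a hypothetical model.
outcome = [1.0, 2.0, 3.0, 4.0]
model_prediction = [1.5, 1.5, 3.5, 3.5]
explained = r_squared(outcome, model_prediction)
```

Comparing this value for a model built on one data type against a model that adds a second data type is one way to express the gain from combining them.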

Page 14: Complex Systems Biology Informed Data Analysis and Machine Learning

Biochemically Orthogonal Evidence

http://dx.doi.org/10.1016/j.tibtech.2015.12.013

Page 15: Complex Systems Biology Informed Data Analysis and Machine Learning

Systems Biology Informed

Personalized Medicine

http://dx.doi.org/10.1016/j.tibtech.2015.12.013

Grapov et al., Circ. Cardiovasc. Genet. 2014

Page 16: Complex Systems Biology Informed Data Analysis and Machine Learning

Can You Spot the Signal?

metabolites

proteins

increase

Page 17: Complex Systems Biology Informed Data Analysis and Machine Learning

'Omic data integration strategies

Biomarker Insights 2015: Suppl. 4, 1-6, DOI: 10.4137/BMI.S29511

Empirical correlation

Network based

Biochemical pathway
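Of the three strategies listed, empirical correlation is the simplest to show in code: connect any two measured variables, metabolite or protein, whose profiles across samples are strongly correlated. The sketch below assumes Pearson correlation and an arbitrary cutoff of 0.7; the variable names, profiles, and threshold are invented, and the slide does not specify a correlation measure.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def correlation_edges(profiles, threshold=0.7):
    """Return (a, b, r) for every pair of variables with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Invented profiles: one value per sample, same sample order for every variable.
profiles = {
    "metabolite_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "metabolite_2": [3.0, 1.0, 4.0, 1.0, 5.0],
    "protein_1": [2.0, 4.0, 6.0, 8.0, 10.5],
    "protein_2": [5.0, 3.0, 4.0, 1.0, 2.0],
}
edges = correlation_edges(profiles)
```

The sign of `r` distinguishes positive from negative associations, which a network drawing can encode as edge color or style.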

Page 18: Complex Systems Biology Informed Data Analysis and Machine Learning

MetaMapR: Metabolomic network calculation

www.createdatasol.com

Page 19: Complex Systems Biology Informed Data Analysis and Machine Learning

Network Mapping

Mappings + Network = Mapped Network

Grapov D., American Society of Mass Spectrometry Conference (2013, 2014)
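The "mappings plus network" idea can be sketched as attaching per-analyte statistics to the nodes of a network as visual attributes. The encoding below (color for direction of a significant change, size for magnitude of fold change), the significance cutoff, the function name, and the statistics themselves are assumptions chosen for illustration, not the scheme used in the presentation.

```python
import math


def map_network(edges, stats, alpha=0.05):
    """Combine a network (edge list) with per-node statistics into a mapped network."""
    nodes = sorted({n for edge in edges for n in edge})
    mapped = {}
    for n in nodes:
        s = stats.get(n)
        if s is None:
            # Node is in the network but was not measured: neutral appearance.
            mapped[n] = {"color": "gray", "size": 1.0}
            continue
        if s["p_value"] >= alpha or s["fold_change"] == 1:
            color = "gray"   # no significant change
        elif s["fold_change"] > 1:
            color = "red"    # significant increase
        else:
            color = "blue"   # significant decrease
        # Size grows with the magnitude of the log2 fold change.
        mapped[n] = {"color": color, "size": 1.0 + abs(math.log2(s["fold_change"]))}
    return {"nodes": mapped, "edges": list(edges)}


# Network: adjacent steps in glycolysis. Statistics: invented for illustration.
edges = [
    ("glucose", "glucose-6-phosphate"),
    ("glucose-6-phosphate", "fructose-6-phosphate"),
]
stats = {
    "glucose": {"fold_change": 2.0, "p_value": 0.01},
    "glucose-6-phosphate": {"fold_change": 0.5, "p_value": 0.03},
}
mapped = map_network(edges, stats)
```

Because the network and the mappings stay separate inputs, the same network can be redrawn with results from a different comparison without being rebuilt.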

Page 20: Complex Systems Biology Informed Data Analysis and Machine Learning

Biochemically Defined Wellness

Page 21: Complex Systems Biology Informed Data Analysis and Machine Learning

DeviumWeb: Data analysis and visualization

www.createdatasol.com

Page 22: Complex Systems Biology Informed Data Analysis and Machine Learning

DeviumWeb: Cluster Analysis

www.createdatasol.com

Page 23: Complex Systems Biology Informed Data Analysis and Machine Learning

DeviumWeb: Predictive Modeling

www.createdatasol.com

Page 24: Complex Systems Biology Informed Data Analysis and Machine Learning

DeviumWeb: Pathway analysis

www.createdatasol.com

Page 25: Complex Systems Biology Informed Data Analysis and Machine Learning

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

Machine Learning Biology

Page 26: Complex Systems Biology Informed Data Analysis and Machine Learning

https://commons.wikimedia.org/w/index.php?curid=2776582

Biochemical PageRank
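The slide gives only a title, so how PageRank is applied to biochemical data in the talk is not known. As a generic illustration, the sketch below runs standard PageRank by power iteration on a small undirected metabolite network. The damping factor of 0.85, the convergence settings, and the choice to treat reactions as undirected edges are assumptions; the edges are a simplified fragment of central carbon metabolism.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration on an undirected graph given as an edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Every node receives the same baseline share from random jumps.
        new = {v: (1.0 - damping) / n for v in nodes}
        # Each node passes the rest of its rank equally to its neighbors.
        for v in nodes:
            share = damping * rank[v] / len(neighbors[v])
            for u in neighbors[v]:
                new[u] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Simplified, undirected fragment of central carbon metabolism.
edges = [
    ("pyruvate", "lactate"),
    ("pyruvate", "alanine"),
    ("pyruvate", "acetyl-CoA"),
    ("pyruvate", "oxaloacetate"),
    ("acetyl-CoA", "citrate"),
    ("oxaloacetate", "citrate"),
]
ranks = pagerank(edges)
top = max(ranks, key=ranks.get)
```

In an undirected network the scores track node connectivity closely, so the most connected metabolite ranks first here.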

Page 27: Complex Systems Biology Informed Data Analysis and Machine Learning

https://arxiv.org/ftp/arxiv/papers/1603/1603.06430.pdf

Biochemical Deep Learning

Page 28: Complex Systems Biology Informed Data Analysis and Machine Learning

Fin

data visualization, network analysis, machine learning, predictive modeling, biochemistry, software

www.createdatasol.com