![Page 1: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/1.jpg)
Advanced Strategies for Metabolomic Data Analysis
Dmitry Grapov, PhD
Met
abol
omic
Dat
a An
alys
is
![Page 2: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/2.jpg)
Analysis at the Metabolomic Scale
![Page 3: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/3.jpg)
Multivariate Analysis
Samples
variables
![Page 4: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/4.jpg)
Multivariate Analysis
• Visualization• Clustering• Projection• Modeling • Networks
Simultaneous analysis of many variables
![Page 5: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/5.jpg)
ClusteringIdentify
•patterns
•group structure
•relationships
•Evaluate/refine hypothesis
•Reduce complexity
Artist: Chuck Close
![Page 6: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/6.jpg)
Cluster AnalysisUse the concept similarity/dissimilarity to group a collection of samples or variables
Approaches•hierarchical (HCA)•non-hierarchical (k-NN, k-means)•distribution (mixtures models)•density (DBSCAN)•self organizing maps (SOM)
Linkage k-means
Distribution Density
![Page 7: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/7.jpg)
Hierarchical Cluster Analysis• similarity/dissimilarity defines “nearness” or distance
X
Y
euclidean
X
Y
manhattan Mahalanobis
X
Y*
non-euclidean
![Page 8: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/8.jpg)
Hierarchical Cluster Analysis
single complete centroid average
Agglomerative/linkage algorithm defines how points are grouped
![Page 9: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/9.jpg)
Hierarchical Cluster Analysis (cont.)
Sim
ilarit
y
x
xx
x
![Page 10: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/10.jpg)
Overview Confirmation
How does my metadata match my data structure?
Hierarchical Cluster Analysis (cont.)
![Page 11: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/11.jpg)
Multidimensional Scaling
PLoS ONE 7(11): e48852. doi:10.1371/journal.pone.0048852
![Page 12: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/12.jpg)
Projection of Data
The algorithm defines the position of the light sourcePrincipal Components Analysis (PCA)
• unsupervised• maximize variance (X)
Partial Least Squares Projection to Latent Structures (PLS)
• supervised• maximize covariance (Y ~ X)
![Page 13: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/13.jpg)
PCA: GoalsPrincipal Components (PCs)
•non-supervised
•projection of the data which maximize variance explained
Results
1.eigenvalues = variance explained
2.scores = new coordinates for samples (rows)
3.loadings = linear combination of original variables
James X. Li, 2009, VisuMap Tech.
![Page 14: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/14.jpg)
Interpreting PCA Results
Variance explained (eigenvalues)
Row (sample) scores and column (variable) loadings
![Page 15: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/15.jpg)
PCA Example
*no scaling or centering
glucose
219021
![Page 16: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/16.jpg)
How are scores and loadings related?
![Page 17: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/17.jpg)
Centering and Scaling
van den Berg RA, Hoefsloot HC, Westerhuis JA, Smilde AK, van der Werf MJ (2006) Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics 7: 142.
![Page 18: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/18.jpg)
Data scaling is very important!
*autoscaling (unit variance and centered)
glucose (GC/TOF)
glucose (clinical)
219021
![Page 19: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/19.jpg)
Use PLS to test a hypothesis
Loadings on the first latent variable (x-axis) can be used to interpret the multivariate changes in metabolites which are correlated with time
time = 0 120 min.
![Page 20: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/20.jpg)
Modeling multifactorial relationships
dynamic changes among groups~two-way ANOVA
![Page 21: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/21.jpg)
“goodness” of the model is all about the perspective
Determine in-sample (Q2) and out-of-sample error (RMSEP) and compare to a random model
•permutation tests
•training/testing
![Page 22: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/22.jpg)
Biological Interpretation
• Visualization• Enrichment• Networks
– biochemical– structural– spectral– empirical
Projection or mapping of analysis results into a biological context.
![Page 23: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/23.jpg)
Ingredients for Network Analysis
1. Determine connections• biochemical (substrate/product) • chemical similarity• spectral similarity• empirical dependency (correlation)
2. Determine vertex properties• magnitude• importance• direction• relationships
![Page 24: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/24.jpg)
Organism specific biochemical relationships and information
Multiple organism DBs
•KEGG
•BioCyc
•Reactome
•Human
•HMDB
•SMPDB
Making Connections Based on Biochemistry
![Page 25: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/25.jpg)
Biochemical Networks
![Page 26: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/26.jpg)
•Use structure to generate molecular fingerprint
•Calculate similarities between metabolites based on fingerprint
•PubChem service for similarity calculations•http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi
•online tools•http://uranus.fiehnlab.ucdavis.edu:8080/MetaMapp/homePage
BMC Bioinformatics 2012, 13:99 doi:10.1186/1471-2105-13-99
Making Connections Based on structural similarity
![Page 27: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/27.jpg)
Structural Similarity Network
![Page 28: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/28.jpg)
Making Connections Based on spectral similarity
Watrous J et al. PNAS 2012;109:E1743-E1752
•Connect molecules based on EI or MS/MS spectral similarity
•Useful for linking annotated analytes (known) to unknown
![Page 29: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/29.jpg)
Spectral Similarity Network
Watrous J et al. PNAS 2012;109:E1743-E1752
![Page 30: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/30.jpg)
Making connections based on empirical relationships
•Connect molecules based on strength of correlation or partial-correlation
![Page 31: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/31.jpg)
Treatment Effects Network
=
MetabolitesShape = increase/decreaseSize = importance (loading)Color = correlation
Connectionsred = Biochemical relationships violet = Structural similarity
![Page 32: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/32.jpg)
Summary
Multivariate analysis is useful for: • Visualization• Exploration and overview• Complexity reduction• Identification of multidimensional
relationships and trends• Mapping to networks• Generating holistic summaries of
findings
![Page 33: Advanced strategies for Metabolomics Data Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062312/554e84edb4c90526358b45be/html5/thumbnails/33.jpg)
Resource
•Mapping tools (review)• Brief Bioinform (2012) doi: 10.1093/bib/bbs055
•Tutorials and Examples• http://imdevsoftware.wordpress.com/category/uncategorized/ • https://github.com/dgrapov/TeachingDemos
•Chemical Translations Services• CTS: http://cts.fiehnlab.ucdavis.edu/
•R-interface: https://github.com/dgrapov/CTSgetR • CIR: http://cactus.nci.nih.gov/chemical/structure
•R-interface: https://github.com/dgrapov/CIRgetR