Automation of Biological Data Analysis and Report Generation
Dmitry Grapov, PhD
Bots write the darndest things
http://www.latimes.com/local/lanow/earthquake-27-quake-strikes-near-westwood-california-rdivor,0,3229825.story#axzz2wQwc82EK
• fill in the template (easy)
• human-guided automation (e.g. Metaboanalyst, intermediate)
• intelligent/reactive writing (e.g. ~AI, advanced)
http://narrativescience.com/
Humans + Bots
Interaction:
• Bots and humans combine in guided analyses
• Humans: make choices (based on bot guides)
• Bots: automate!
Facilitate:
• workflow logging and template creation
• reproducible results
Bot: Initial data and meta data parsing and quality validation
(need: template input)
Human: data cleaning and experimental design identification
(use: multiple choice, dynamic GUI)
Bot: instantiation of complex workflows
Human: overview of bot assumptions and results
Bot: Numerical and text output generation
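The first bot step (parsing and validating meta data against a template) could look roughly like the sketch below. The field names `sample_id` and `group` are hypothetical stand-ins for whatever the input template requires; the slides do not specify them.

```python
# Hypothetical required template fields; the real template is not given in the slides.
REQUIRED_COLUMNS = ("sample_id", "group")


def validate_metadata(rows, required=REQUIRED_COLUMNS):
    """Return a list of human-readable problems found in sample meta data.

    `rows` is a list of dicts, one per sample.
    """
    problems = []
    seen_ids = set()
    for n, row in enumerate(rows, start=1):
        # Flag required fields that are absent or blank.
        for col in required:
            value = row.get(col)
            if value is None or not str(value).strip():
                problems.append(f"row {n}: missing '{col}'")
        # Flag sample identifiers that were already used.
        sid = row.get("sample_id")
        if sid in seen_ids:
            problems.append(f"row {n}: duplicate sample_id '{sid}'")
        elif sid:
            seen_ids.add(sid)
    return problems


samples = [
    {"sample_id": "s1", "group": "control"},
    {"sample_id": "s1", "group": ""},
    {"sample_id": "s3"},
]
for problem in validate_metadata(samples):
    print(problem)
```

The list of problems is what the bot would hand to the human for the data cleaning step.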
Humans + Bots write darndender things?
Choose Your Own Life Adventure!
https://github.com/dgrapov/AdventureR
Data Analysis Tasks
Visualization (how does it look?)
• histograms, density plots, box plots, line plots, scatter plots, networks, etc.
Statistical Analysis (what is statistically significant?)
• summary tables, ANOVA, FDR adjustment, power analysis, etc.
Exploration (what are the major patterns/trends?)
• clustering, PCA, ICA, etc.
Predictive Modeling (what explains my hypothesis?)
• mixed effects, partial least squares (O-/PLS/-DA), etc.
Network Analysis and Mapping (how are things related?)
• Functional analysis: pathway enrichment or overrepresentation
• Networks: biochemical, structural, mass spectral and empirical networks
• Mapping: projection of analysis results onto network
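As one concrete instance of the statistical tasks listed above, FDR adjustment can be sketched in a few lines. This is a minimal Benjamini-Hochberg implementation for illustration only; the slides do not say which FDR procedure is used, and the example p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Visit p-values from largest to smallest so a running minimum
    # can enforce monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # illustrative values only
print(benjamini_hochberg(raw))
```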
WCMC Data Analysis Reports ™
Input template: BinBase
• inference of experimental goals from sample meta data
• mapping variables to external databases
Tasks:
• Statistical analysis
• Clustering
• PCA
• O-PLS-DA
• Biochemical enrichment
• Network mapping
Automation Challenges
Data cleaning and quality validation
• use: quality control samples; identify: precision/accuracy, normalization, batch corrections; mitigate: outliers, missing values, batch effects, etc.
Identification of experimental goals
• use: meta data; identify: main and accessory effects; choose: statistics, multivariate tests and visualizations
Integration of multiple tasks to evolve robust analyses
• tasks: statistics, multivariate, functional, networks, database mapping, etc.
Data analysis report generation
• use: R, LaTeX, markdown
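The simplest form of report generation is filling a text template with computed values. The sketch below uses Python's standard `string.Template` purely for illustration; the slides name R, LaTeX and markdown as the actual tools. The template wording, the group names and the 0.05 cutoff are assumptions.

```python
from string import Template

# Hypothetical markdown fragment for a Results section.
REPORT_TEMPLATE = Template(
    "## Statistical analysis\n"
    "A total of $n_tested variables were compared between $group_a and $group_b. "
    "$n_significant ($percent%) remained significant after FDR adjustment "
    "(q < $q_cutoff).\n"
)


def render_results(adjusted_p, group_a, group_b, q_cutoff=0.05):
    """Fill the report template from a list of FDR-adjusted p-values."""
    n_tested = len(adjusted_p)
    n_significant = sum(1 for q in adjusted_p if q < q_cutoff)
    percent = round(100 * n_significant / n_tested, 1) if n_tested else 0.0
    return REPORT_TEMPLATE.substitute(
        n_tested=n_tested,
        group_a=group_a,
        group_b=group_b,
        n_significant=n_significant,
        percent=percent,
        q_cutoff=q_cutoff,
    )


print(render_results([0.04, 0.053, 0.053, 0.2], "control", "treated"))
```

Because the text is derived from the numbers, rerunning the analysis regenerates a consistent report.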
Challenges to automated metabolite ID mapping
Stereochemistry?
Search: catechin
Best Match: Catechin
Biologically relevant:
D-catechin
Synonyms?
Search: UDP GlcNAc
FAIL: UDP GlcNac
PASS: UDP-GlcNac
Strategies for automated metabolite ID mapping (from synonym)
#1: CTS+
• Use CTS to translate from synonyms to KEGG (KID) and PubChem (CID)
• Use KEGGREST and PUG to filter and choose the most appropriate IDs
• Use fuzzy matching and word similarity metrics (e.g. Damerau–Levenshtein distance)
#2: Web query
• Use KEGGREST + PubChem PUG to translate synonyms to IDs
• For KEGG ID: synonym → SID → KID
#3: Curated DB
• Generate a curated DB for KEGG and CID translations, plus:
• Include InChI Keys
• Map to other DBs
• Allow fuzzy matching on synonyms
• e.g. IDEOM http://bioinformatics.oxfordjournals.org/content/early/2012/02/04/bioinformatics.bts069
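Fuzzy matching on synonyms can be sketched as follows. The distance function is the restricted Damerau-Levenshtein variant (optimal string alignment), which counts an adjacent transposition as one edit. The normalization rule, the `max_distance` threshold and the candidate names are assumptions made for this example, not part of CTS, KEGGREST or PUG.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Adjacent transposition, e.g. "ca" <-> "ac".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def normalize(name):
    """Lower-case, treat hyphens and underscores as spaces, collapse whitespace."""
    return " ".join(name.lower().replace("-", " ").replace("_", " ").split())


def best_match(query, synonyms, max_distance=2):
    """Return the closest synonym, or None if nothing is within max_distance."""
    if not synonyms:
        return None
    q = normalize(query)
    distance, name = min((osa_distance(q, normalize(s)), s) for s in synonyms)
    return name if distance <= max_distance else None


candidates = ["UDP-GlcNAc", "UDP-glucose", "Catechin"]  # hypothetical synonym list
print(best_match("UDP GlcNAc", candidates))
```

String similarity only helps with spelling and punctuation variants; it does not resolve which stereoisomer is biologically relevant.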
Interactive Analysis and Report Generation
knitr (http://yihui.name/knitr/)
Analysis Report Generation
• Analysis on rails or open sandbox
• Humans facilitate robust results generation + Bots ensure reproduction
• Generation of Methods and Results should be automatable
Devium 2.0
Human-guided automated data analysis and report generator
Human-guided automation could help ensure robust results by leaving to humans the choices that are otherwise difficult to automate.
https://github.com/dgrapov/DeviumWeb
MetaMapR
Linking data analysis and biology
https://github.com/dgrapov/MetaMapR
Integration of complex workflows is key to automation.
Future Goals
+ Workflows for complex experiments (e.g. time-course)
+ Biochemical functional analysis (pathway enrichment)
+ GUI for report generation (Devium 2.0)
+ Integrate multi-’Omic’ data sets (MetaMapR 2.0)
+ Scientific literature mining (RapportR)
+ Interactive plots and networks (JavaScript)
metabolomics.ucdavis.edu
This research was supported in part by NIH 1 U24 DK097154