Workflow4Metabolomics:
An infrastructure for metabolomics data analysis
BiLille, October 2019
Jean-François Martin and the Core Team
Outline
• METABOLOMICS
– Principle
– Analytical tools
• W4M CONTEXT
– History
– Galaxy
• W4M ECOSYSTEM
– Tools
– Services
METABOLOMICS
Omics…
[Figure: from genotype to phenotype]
Metabolomic workflow
[Figure: the metabolomics workflow]
• Steps: biological hypothesis, analytical analyses (LC-MS, GC-MS, NMR), pre-processing, data matrix, statistics, metabolite annotation (databases), pathway interpretation
• Disciplines involved: analytical chemistry, statistics, bioinformatics, biology, medicine, biochemistry
Target
Untargeted metabolomics:
- Used to detect unexpected changes in metabolite concentrations; the aim is to detect as many metabolites as possible.
- Hundreds to thousands of metabolites can be measured.
- No absolute quantification
- Needs multiple analytical devices
Semi-targeted metabolomics:
- An intermediate approach that looks for a set of known metabolites within an untargeted analysis.
- Hundreds of metabolites.
- No absolute quantification
- Lipidome
- Exposome
- Microbiome
- Epigenome
Targeted metabolomics:
- Small number of metabolites
- Biochemically annotated, with known biological function
- Metabolites are quantified using chemical standards.
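Quantification against chemical standards is commonly done with a calibration curve. The sketch below is a minimal illustration, assuming a linear detector response. The concentrations and peak areas are made-up values, and `fit_calibration` and `quantify` are hypothetical helper names, not W4M tools.

```python
import numpy as np

def fit_calibration(concentrations, responses):
    """Fit response = slope * concentration + intercept by least squares."""
    slope, intercept = np.polyfit(concentrations, responses, deg=1)
    return slope, intercept

def quantify(response, slope, intercept):
    """Invert the calibration line to estimate a concentration."""
    return (response - intercept) / slope

# Made-up standards (illustrative only): concentration in µM, peak area in arbitrary units.
std_conc = np.array([1.0, 5.0, 10.0, 50.0, 100.0])
std_area = 200.0 * std_conc + 50.0  # perfectly linear synthetic response

slope, intercept = fit_calibration(std_conc, std_area)

# Peak area measured for the same metabolite in an unknown sample (made-up value).
sample_conc = quantify(4050.0, slope, intercept)
```

Real calibration data is noisy and often needs weighting or a restricted linear range; this sketch ignores both.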
Resolution 1000 vs. resolution >10000
Mass spectrometry coupled with liquid (LC-MS) or gas (GC-MS) chromatography
• Great sensitivity
• Relative quantification
• Low repeatability, noisy
• Destructive
• Several ions for 1 molecule
NMR
• Low sensitivity
• Near-absolute quantification
• Robust, with good repeatability
• Non-destructive
• Several chemical shifts for 1 molecule
Analytical techniques
Biological matrix
• Urine
• Plasma
• Liver
• Cells
• Fecal water
• Skin extract
Some numbers
• Sample prep
• 40 min per injection
• 1 day for file conversion
• 1 day for pre-processing
• x days for statistics
• n days for annotation
• k days for interpretation
Analytical techniques
Wishart D. PLOS ONE 2017
Drawbacks
• Black-box extraction software
• MS signal drift
• Semi-quantitative
• Annotation is hard to automate
• Needs in-house databases
CONTEXT
Brief History
• 2005 Galaxy project
• 2006 Few bioinformatics tools
– R packages xcms and CAMERA
– Incomplete MS information in databases (KEGG, METLIN, HMDB, …)
– No annotation tools
• 2010 Scattered French in-house tools
• 2013 MetaboHUB & IFB, French national infrastructures
• 2015 Giacomoni et al., doi:10.1093/bioinformatics/btu813
• 2017 Guitton et al., IJBC, doi:10.1016/j.biocel.2017.07.002
W4M Metabolomic workflow
[Figure: workflow steps: biological hypothesis, analytical analyses (LC-MS, GC-MS, NMR), pre-processing, data matrix, statistics, metabolite annotation, pathway interpretation]
Result: an online infrastructure for metabolomics
Based on the Galaxy framework
• Modular: ~40 modules
• Reproducible approach
• Sharing: data, workflows, etc.
• Sustainable:
– permanent staff
– several funding sources: >100 person-months (PM) of non-permanent staff
– strong support from our 2 national communities (IFB & MetaboHUB), with permanent staff
Online analysis
User interface and results
Galaxy workflow
• 15 bioinformaticians
• 6 metabolomics platforms
• 2 French infrastructures: IFB and MetaboHUB
W4M Core Team and Help desk
The W4M core team
Galaxy As A Gateway
Playing an important role in community building process
Synergy with the French Galaxy Working Group
The W4M ecosystem
W4M Training
Workflow4Experimenters
5 sessions since 2014 (5 days each)
10 trainers for 20 to 25 trainees
Half theory, half tutoring
“Bring your own data”
Help Desk
Tools
Building, running, saving and sharing functionalities
On-line analysis
https://galaxy.workflow4metabolomics.org
Tools & Workflow
40 tools
Pre-processing
• LC-MS: based on the xcms and CAMERA R packages
• GC-MS: based on the metaMS package
• FIA-MS: fully developed for W4M
• NMR: fully developed for W4M
• Data extraction: from acquisition data files to the dataMatrix
Normalization, filtration, correction
• Normalizations
– Internal standard
– Sum of intensities
– PQN (probabilistic quotient normalization)
• Filtration
– based on correlation between diluted quality control pooled samples
– based on the CV of ions among samples and among quality control pooled samples
• Correction of signal drift (MS) based on loess regression on quality control pooled samples
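The normalization and filtration steps listed above can be sketched on a samples x features intensity matrix. This is a simplified illustration, not the W4M implementation: the function names are hypothetical, the 30% CV threshold is an assumed value, the filter only looks at the CV among pooled QC samples, and PQN is shown without the total-intensity pre-normalization that often precedes it.

```python
import numpy as np

def cv(values, axis=0):
    """Coefficient of variation (standard deviation / mean) along an axis."""
    return np.std(values, axis=axis, ddof=1) / np.mean(values, axis=axis)

def filter_by_qc_cv(matrix, is_qc, max_cv=0.3):
    """Keep features (columns) whose CV among pooled QC samples is <= max_cv.

    max_cv=0.3 is an assumed threshold, chosen for illustration only.
    """
    keep = cv(matrix[is_qc], axis=0) <= max_cv
    return matrix[:, keep], keep

def pqn_normalize(matrix, reference=None):
    """Probabilistic quotient normalization of a samples x features matrix."""
    if reference is None:
        reference = np.median(matrix, axis=0)   # median spectrum used as reference
    quotients = matrix / reference              # feature-wise quotients
    dilution = np.median(quotients, axis=1)     # most probable quotient per sample
    return matrix / dilution[:, None], dilution

# Synthetic example: three samples that differ only by a dilution factor.
base = np.array([100.0, 200.0, 50.0, 400.0])
samples = np.array([1.0, 2.0, 0.5])[:, None] * base
normalized, dilution = pqn_normalize(samples)

# Synthetic QC injections: feature 0 is stable, feature 1 is not.
qc_demo = np.array([[100.0, 10.0],
                    [102.0, 30.0],
                    [98.0, 50.0]])
filtered, keep = filter_by_qc_cv(qc_demo, np.array([True, True, True]))
```

In the synthetic example the estimated dilution factors equal the factors used to build the data, and only the stable feature survives the QC filter.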
[Figure: signal before (raw) and after (corrected) drift correction]
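The idea of correcting signal drift from pooled QC injections can be sketched as follows for a single ion. This is a simplified stand-in, not the W4M tool: it uses a basic local linear regression with tricube weights (loess without robustness iterations), the helper names and the `span` value are assumptions, and it expects at least three QC injections with distinct injection orders.

```python
import numpy as np

def loess_fit(x_train, y_train, x_eval, span=0.75):
    """Basic loess: local linear regression with tricube weights."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    if len(x_train) < 3:
        raise ValueError("need at least 3 QC points")
    # Number of neighbours used for each local fit.
    k = min(len(x_train), max(3, int(np.ceil(span * len(x_train)))))
    x_eval = np.asarray(x_eval, dtype=float)
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        dist = np.abs(x_train - x0)
        # Slightly widened bandwidth so the k-th neighbour keeps a non-zero weight.
        bandwidth = np.sort(dist)[k - 1] * 1.001
        w = np.clip(1.0 - (dist / bandwidth) ** 3, 0.0, None) ** 3
        x_mean = np.sum(w * x_train) / np.sum(w)
        y_mean = np.sum(w * y_train) / np.sum(w)
        sxx = np.sum(w * (x_train - x_mean) ** 2)
        sxy = np.sum(w * (x_train - x_mean) * (y_train - y_mean))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted[i] = y_mean + slope * (x0 - x_mean)
    return fitted

def correct_drift(intensity, order, is_qc, span=0.75):
    """Divide each intensity by the QC trend, then rescale to the mean QC level."""
    intensity = np.asarray(intensity, dtype=float)
    order = np.asarray(order, dtype=float)
    trend = loess_fit(order[is_qc], intensity[is_qc], order, span)
    return intensity / trend * np.mean(intensity[is_qc])

# Synthetic run: 20 injections, pooled QC at positions 0, 5, 10, 15, 19,
# and a purely linear loss of signal over the run.
order = np.arange(20)
is_qc = np.isin(order, [0, 5, 10, 15, 19])
raw = 1000.0 - 20.0 * order
corrected = correct_drift(raw, order, is_qc)
```

Because the synthetic drift is exactly linear, the corrected intensities all come out at the mean QC level; real data would keep biological variation around that trend.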
Common statistical tools
Annotation
• NMR
• MS
– Matching against public databases via web services
• HMDB, KEGG, LIPID MAPS, ChemSpider…
• PeakForest
– In-house database
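Matching an MS signal against a database usually comes down to comparing an observed m/z with theoretical values within a ppm tolerance. The sketch below is a local stand-in for such a lookup, assuming [M+H]+ ions only and a 10 ppm tolerance. The two-entry reference table and the function names are illustrative assumptions; no web service is called.

```python
PROTON_MASS = 1.007276  # Da, mass shift assumed for an [M+H]+ ion

# Tiny illustrative reference table: name -> monoisotopic mass (Da).
REFERENCE_MASSES = {
    "glucose (C6H12O6)": 180.06339,
    "caffeine (C8H10N4O2)": 194.08038,
}

def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def annotate_mz(observed_mz, reference=REFERENCE_MASSES, tol_ppm=10.0):
    """Return (name, ppm error) for every [M+H]+ candidate within tol_ppm."""
    hits = []
    for name, mono_mass in reference.items():
        theoretical_mz = mono_mass + PROTON_MASS
        error = ppm_error(observed_mz, theoretical_mz)
        if abs(error) <= tol_ppm:
            hits.append((name, round(error, 2)))
    return hits

# Hypothetical observed ion, close to the [M+H]+ of caffeine.
hits = annotate_mz(195.0877)
no_hits = annotate_mz(500.0)
```

A mass match alone is only a candidate annotation: as noted earlier, one molecule gives several ions, and several compounds can share the same mass.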
Building comprehensive workflows
Development tools
Developments
– Stick to IUC standards
– GitHub: github.com/workflow4metabolomics
– Conda / Planemo / Travis CI
Digital object identifier (DOI)
– W4M provides DOIs to reference histories, including data (from raw files to statistical/annotation results), tools and their parameters, and workflows
– Usable in papers
– How: http://workflow4metabolomics.org/referenced_W4M_histories
Online analysis
Oct. 2018 – Oct. 2019
600 000 jobs/year
1500 registered users
Contribution: Get, Push
You can get our tools
- Download our W4M VM
- All tools are publicly available on GitHub
You can push your tools
- They can be integrated and hosted within the main W4M instance
- Tools must stick to IUC standards
- Support must be provided by the developers themselves
https://github.com/workflow4metabolomics/workflow4metabolomics#how-to-contribute
PERSPECTIVES
Perspectives
• Interoperability
• MS/MS: Julien Saint-Vanne, Yann Guitton, Gildas Le Corguillé
• 2D NMR annotation: Cécile Canlet, Franck Giacomoni and Marie Tremblay Franco
• Visualization …
@workflow4metabo
github.com/workflow4metabolomics
Thank you, and see you soon on W4M: https://galaxy.workflow4metabolomics.org