bioshare: opal and mica: a software suite for data harmonization and federation - vincent ferretti -...
TRANSCRIPT
![Page 1: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/1.jpg)
A SOFTWARE SUITE FOR DATA HARMONIZATION AND FEDERATION
Vincent Ferretti
Ontario Institute for Cancer Research
![Page 2: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/2.jpg)
The Maelstrom Research Software Suite
Software development started in 2007
$3,800,000 CAD of investment so far
• Onyx: Collection
• Opal: Storage, Management, Harmonization
• Mica: Publication
• DataSHIELD: Analysis
![Page 3: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/3.jpg)
Some User Stories

| Name | Type | Activities | Tools |
| --- | --- | --- | --- |
| The Canadian Longitudinal Study on Aging (CLSA) | Single study, 50,000 participants | Collection, management, portal | Onyx, Opal, Mica |
| The Canadian Partnership for Tomorrow Project (CPTP) | Study consortium, 5 studies, 300,000 participants | Collection, harmonization, portal | Onyx, Opal, Mica |
| BBMRI-LPC | Network, >30 studies | Cataloguing | Mica |
| Maelstrom Research | Research project | Cataloguing, harmonization | Opal, Mica |
| Interconnect | Network | Cataloguing, (harmonization, federated data analysis) | Mica |
| BioSHaRE | Network | Cataloguing, harmonization, federated data analysis | Opal, Mica, DataSHIELD |
![Page 4: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/4.jpg)
1 - Data Harmonization with Opal
The Canadian Partnership for Tomorrow Project (CPTP)
5 cohorts with baseline data on ~300,000 participants
• 5 different legislations, questionnaires, data access policies, languages, etc.
Project’s objectives
• To create harmonized datasets across the 5 cohorts
• To create a data portal to browse harmonized datasets and request access to them
Phase 1
The baseline Health and Risk Factor questionnaire (CoreQx)
• 716 harmonized variables
![Page 5: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/5.jpg)
Opal Software
A database application for integrating and storing data from multiple, heterogeneous sources
• Used by studies to create central data repositories
![Page 6: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/6.jpg)
Metadata in Opal
Projects -> tables -> variables
Tables are defined by customizable dictionaries in Excel format
Variables are annotated with an arbitrary number of attributes
Controlled vocabularies - Taxonomies - (e.g. ICD-10)
Maelstrom Research variable classification
More than 130 terms in 17 classes (e.g. Reproduction, Physical Measures)

| Variable Name | Attribute Name | Attribute Value |
| --- | --- | --- |
| Cancer_type | Diseases | Neoplasm |
| Asthma_ever | Diseases | Respiratory system (J00-J99) |
| Ever_smoke | Question label | [EN] Have you ever smoked? [FR] Avez-vous déjà fumé? |
| Ever_smoke | Health behaviors | Tobacco |
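To make the annotation model above concrete, here is a minimal sketch in plain Python. It is not Opal's storage format or API: the nested-dictionary layout and the `index_by_attribute` helper are assumptions made for illustration, and only the variable names and attribute values come from the slide. It shows how attribute annotations let variables be looked up by classification term.

```python
from collections import defaultdict

# Hypothetical in-memory dictionary: variable name -> {attribute name: value}.
# The entries mirror the example rows on the slide; the layout is an assumption.
dictionary = {
    "Cancer_type": {"Diseases": "Neoplasm"},
    "Asthma_ever": {"Diseases": "Respiratory system (J00-J99)"},
    "Ever_smoke": {
        "Question label": {"EN": "Have you ever smoked?", "FR": "Avez-vous déjà fumé?"},
        "Health behaviors": "Tobacco",
    },
}


def index_by_attribute(variables):
    """Group variable names by (attribute name, attribute value) pairs."""
    index = defaultdict(list)
    for name, attributes in variables.items():
        for attribute, value in attributes.items():
            # Localized values (dicts) are labels, not classification terms; skip them.
            if isinstance(value, str):
                index[(attribute, value)].append(name)
    return dict(index)


index = index_by_attribute(dictionary)
print(index[("Diseases", "Neoplasm")])  # variables annotated with this term
```

Keeping localized labels and classification terms in the same attribute map reflects the "arbitrary number of attributes" idea: a new kind of annotation needs no schema change.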
![Page 7: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/7.jpg)
![Page 8: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/8.jpg)
Data Derivation
Opal derives new variables by executing custom JavaScript code
Useful for data validation, curation and harmonisation
User-friendly interfaces for recoding variables
JavaScript API for more advanced derivation
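The recoding logic behind such a derivation can be sketched as follows. In Opal the script would be written in JavaScript; this is plain Python used only to illustrate the idea, and the study names, raw codes, and harmonized categories are all invented.

```python
# Hypothetical study-specific codings for a smoking question; the codes and
# the harmonized target categories are invented for illustration.
RECODING_RULES = {
    "study_a": {"1": "never", "2": "former", "3": "current"},
    "study_b": {"N": "never", "EX": "former", "Y": "current"},
}


def harmonize_smoking(study, raw_value):
    """Map a study-specific code to the harmonized category.

    Returns None for missing or unmapped values so that they stay
    visible as missing instead of being silently coerced.
    """
    if raw_value is None:
        return None
    return RECODING_RULES[study].get(str(raw_value).strip())


print(harmonize_smoking("study_a", 2))
print(harmonize_smoking("study_b", "Y"))
```

Expressing each study's rules as data rather than branching code keeps the harmonization decisions easy to review, which matters when the same target variable is derived from several cohorts.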
![Page 9: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/9.jpg)
JavaScript code executed by Opal when needed
Derived data is not persisted – Views or Virtual tables
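The "not persisted" behaviour can be pictured with a small Python sketch. The `View` class, the source table, and the BMI derivation are hypothetical and do not reproduce Opal's implementation; the point is only that derived values are recomputed from the source on each read.

```python
class View:
    """A virtual table: derived variables are computed when rows are read."""

    def __init__(self, source_rows, derivations):
        self._source_rows = source_rows      # reference to the stored data
        self._derivations = derivations      # name -> function(row) -> value

    def rows(self):
        # Nothing is cached or written back; each read re-runs the derivations.
        for row in self._source_rows:
            yield {name: derive(row) for name, derive in self._derivations.items()}


# Hypothetical source table with height in cm and weight in kg.
source = [{"id": 1, "height_cm": 180.0, "weight_kg": 81.0}]

bmi_view = View(source, {
    "id": lambda r: r["id"],
    "bmi": lambda r: round(r["weight_kg"] / (r["height_cm"] / 100) ** 2, 1),
})

print(list(bmi_view.rows()))
source[0]["weight_kg"] = 90.0   # a change in the source...
print(list(bmi_view.rows()))    # ...is reflected on the next read
```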
![Page 10: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/10.jpg)
Deriving the CoreQx datasets with Opal
![Page 11: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/11.jpg)
Deriving the CoreQx datasets with Opal
![Page 12: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/12.jpg)
Deriving the CoreQx datasets with Opal
How to query and access these harmonized datasets?
![Page 13: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/13.jpg)
The Mica Software
Software to create web data portals for individual studies or for study consortia
Study catalogue
• MR standard description of longitudinal studies
• Publication workflow
Datasets
• Data dictionaries, data harmonization
• Database federation
Data Access
• Online forms, requests management workflow with roles
Mica2: new client-server architecture
• Mica Server
• Opal Server
• Data Persistence: MongoDB
![Page 14: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/14.jpg)
The CPTP Data Portal
![Page 15: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/15.jpg)
Study Catalogue
![Page 16: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/16.jpg)
![Page 17: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/17.jpg)
Querying Opal Servers for Metadata and Aggregated Data
![Page 18: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/18.jpg)
Dictionary Faceted Search
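As a rough illustration of what a faceted search over a data dictionary involves, the sketch below filters variables by attribute values and counts the remaining values per facet. It is plain Python with invented variable entries and facet names, not Mica's search implementation.

```python
from collections import Counter

# Hypothetical dictionary entries; the facet names and values are illustrative.
variables = [
    {"name": "Ever_smoke", "study": "study_a", "area": "Tobacco"},
    {"name": "Smoke_now", "study": "study_b", "area": "Tobacco"},
    {"name": "Asthma_ever", "study": "study_a", "area": "Respiratory system"},
]


def faceted_search(items, filters, facets):
    """Return the items matching all filters and the facet counts among them."""
    hits = [v for v in items if all(v.get(k) == val for k, val in filters.items())]
    counts = {facet: Counter(v[facet] for v in hits) for facet in facets}
    return hits, counts


hits, counts = faceted_search(variables, {"area": "Tobacco"}, ["study", "area"])
print([v["name"] for v in hits])
print(counts["study"])
```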
![Page 19: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/19.jpg)
Variable Page
Real time summary statistics
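Summary statistics across studies can be obtained from per-study aggregates alone. The sketch below is an assumption about how such a combination could work, not a description of what the portal does internally: each study reports only its count, mean, and sample variance, and the client pools them. All figures are invented.

```python
import math


def combine_summaries(summaries):
    """Pool (n, mean, variance) summaries from several studies.

    Each summary uses the sample variance (n - 1 denominator). Only these
    aggregates are needed; no individual-level value is exchanged.
    """
    total_n = sum(s["n"] for s in summaries)
    pooled_mean = sum(s["n"] * s["mean"] for s in summaries) / total_n
    # Total sum of squares = within-study part + between-study part.
    within = sum((s["n"] - 1) * s["variance"] for s in summaries)
    between = sum(s["n"] * (s["mean"] - pooled_mean) ** 2 for s in summaries)
    pooled_variance = (within + between) / (total_n - 1)
    return {"n": total_n, "mean": pooled_mean, "sd": math.sqrt(pooled_variance)}


# Hypothetical per-study aggregates for one harmonized variable.
per_study = [
    {"n": 3, "mean": 2.0, "variance": 1.0},   # e.g. values 1, 2, 3
    {"n": 2, "mean": 4.5, "variance": 0.5},   # e.g. values 4, 5
]
print(combine_summaries(per_study))
```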
![Page 20: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/20.jpg)
Harmonization Result
![Page 21: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/21.jpg)
Data Access Requests
Researcher account registration
Customized application form
Application review workflow
Email notifications
Multi-languages
![Page 22: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/22.jpg)
2 - Advanced Cataloguing with Mica
Maelstrom-research.org
Maelstrom Research web site is powered by Mica
Includes a catalogue of international networks and studies with annotated dictionaries
Current version
• 6 Networks
• 129 Studies
• 222 Datasets
• 182,622 Variables
![Page 23: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/23.jpg)
Search Harmonisation Potential
![Page 24: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/24.jpg)
Multi-dimensional Search Tool
![Page 25: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/25.jpg)
3 - Data Analysis
The BioSHaRE Healthy Obese Project
10 studies from 7 European countries
200,000 subjects
The HOP dataset - 103 harmonized variables
How to analyze these datasets
» without pooling data
» without accessing individual-level data?
![Page 26: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/26.jpg)
A Federated Approach
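One way to picture a federated analysis is a regression fitted from aggregate sums that each study computes locally. The sketch below is a simplified illustration under that assumption and does not describe how DataSHIELD is implemented. The function names and data are hypothetical.

```python
def local_sums(xs, ys):
    """Run inside each study: return only aggregate sums, never the rows."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def federated_ols(all_sums):
    """Run at the analysis client: fit y = a + b*x from the summed aggregates."""
    total = {k: sum(s[k] for s in all_sums) for k in ("n", "sx", "sy", "sxx", "sxy")}
    n = total["n"]
    slope = (n * total["sxy"] - total["sx"] * total["sy"]) / (
        n * total["sxx"] - total["sx"] ** 2
    )
    intercept = (total["sy"] - slope * total["sx"]) / n
    return intercept, slope


# Hypothetical data held by two studies (y = 1 + 2x exactly).
study_1 = local_sums([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
study_2 = local_sums([3.0, 4.0], [7.0, 9.0])
print(federated_ols([study_1, study_2]))
```

Because the sums are additive, the client obtains the same coefficients it would get from the pooled rows, while each study only ever releases five numbers.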
![Page 27: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/27.jpg)
Real Time Cross Tabulation on Harmonized Data
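A cross tabulation over several studies can be assembled from per-study cell counts. The following sketch assumes each server returns a contingency table for two harmonized variables and the client adds the cells. The records, variable names, and helper functions are invented for illustration.

```python
from collections import Counter


def local_crosstab(rows, row_var, col_var):
    """Run inside each study: count participants per (row, column) cell."""
    return Counter((r[row_var], r[col_var]) for r in rows)


def merge_crosstabs(tables):
    """Add the cell counts returned by every study."""
    merged = Counter()
    for table in tables:
        merged.update(table)
    return merged


# Hypothetical harmonized records in two studies.
study_a = [{"sex": "F", "smoker": "yes"}, {"sex": "F", "smoker": "no"}]
study_b = [{"sex": "F", "smoker": "yes"}, {"sex": "M", "smoker": "no"}]

tables = [local_crosstab(s, "sex", "smoker") for s in (study_a, study_b)]
print(merge_crosstabs(tables))
```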
![Page 28: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/28.jpg)
New Improved Version
![Page 29: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/29.jpg)
Real Time Advanced Queries on Harmonized Data
![Page 30: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/30.jpg)
More Advanced Analyses with R
![Page 31: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/31.jpg)
RStudio Web Console
rstudio.bioshare.eu
![Page 32: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/32.jpg)
More Information
www.maelstrom-research.org
www.obiba.org
Code available at github.com/obiba
Let us know, and acknowledge Maelstrom Research, if you are using our software. It is important for our funding and our ability to provide support.
![Page 33: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research](https://reader036.vdocuments.us/reader036/viewer/2022062523/58f2b4191a28ab6d1f8b4585/html5/thumbnails/33.jpg)
Acknowledgement
Yannick Marcon and his software developer team
The Maelstrom Research scientific team
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n°261433 (Biobank Standardisation and Harmonisation for Research Excellence in the European Union - BioSHaRE-EU)