European Life Sciences Infrastructure for Biological Information
www.elixir-europe.org
The ELIXIR of Linked Data
Professor Carole Goble (UK node), Barend Mons (NL node), Helen Parkinson (EMBL-EBI node)
The Interoperability Services Backbone Team
What is ELIXIR?
An international distributed infrastructure for life-science information
• orchestrate the collection, quality control and archiving of biological data produced by life science experiments
• integrate research data
• ensure a seamless service provision that is easily accessible to all
http://www.elixir-europe.org/about
ELIXIR: An international distributed infrastructure for biological data
Hub
major bioinformatics service providers (~130)
16 ELIXIR members
4 observers
Drivers: Infrastructure Providers
COordinated Research Infrastructures Building Enduring Life-science Services
Marine metagenomics
Human data
Crop and forest plants
Rare diseases
Rare diseases
Genomic data (WES, WGS)
Other omics data (transcriptomics, metabolomics, proteomics …)
Sample data (biobank databases)
Clinical data (registries, and phenotypic databases)
1000 exomes + > 2500 from other projects
Drug prioritization for Huntington’s Disease
Katerina Nosikova, Elizaveta Besedina, Eelke van der Horst, Peter-Bram ‘t Hoen, Marco Roos, Eleni Mina, Human Genetics department, LUMC, NL
Select genes by phenotype matching in Monarch
Select drug compounds in Open PHACTS
Filter on feasibility for treating HD
Prioritized drug compounds
Technical platforms
Data: Secure and deliver core data resources
Tools: Discoverable tools, services and connectors for data access and exploitation
Compute: Robust technical platforms and clouds for secure data access, data exchange and compute
Training: Training programme for professionals, bridging the computational biology skills gap
Standards: Data management, reuse and integration
Findable, Accessible, Interoperable, Reusable
Training: BYODs, data wrangling, governance and quality assurance
Linked Data experts, data experts from MycoBase and Human Protein Atlas
http://www.macs.hw.ac.uk/~ajg33/first-byod-workshop/
Tomato genome, phenotypic observations, variants
Data: Basket of indicators, reflecting the multiple facets of bioinformatics resources
[Diagram: Indicators (scientific focus, community, quality, legal & funding infrastructure, impact)]
1) Scientific focus and quality of science, e.g. curational effort, benchmarking
2) Community served by the resource, e.g. web statistics
3) Quality of service, e.g. uptime, user support and training
4) Legal and funding infrastructure, e.g. institutional support, use policy
5) Impact and translational stories
Mandatory and optional
Compute Platform: Authentication, Archiving and Movement
Tools Interoperability and APIs
Describing Tools: EDAM Ontology
Describing Workflows: Common format for bioinformatics tool execution (http://commonwl.org/)
Rich: Linked Data allows for infinite metadata annotations and reasoning
Describing APIs: SWAGGER.json
API changes: Semantic versioning
Getting resources to have APIs
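A minimal sketch of how semantic versioning can signal API changes to clients. It assumes plain MAJOR.MINOR.PATCH version strings; the function names `parse_version`, `classify_change` and `is_compatible` are illustrative and do not come from the slides or from any ELIXIR service.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a MAJOR.MINOR.PATCH string into a tuple of integers."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def classify_change(old: str, new: str) -> str:
    """Report which part of the version changed between two releases."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v == old_v:
        return "none"
    if new_v < old_v:
        return "downgrade"
    if new_v[0] != old_v[0]:
        return "major"  # breaking API change: clients may need updating
    if new_v[1] != old_v[1]:
        return "minor"  # backwards-compatible additions
    return "patch"      # backwards-compatible fixes


def is_compatible(client_built_for: str, server: str) -> bool:
    """Treat a server as compatible if it is not older and shares the major version."""
    return classify_change(client_built_for, server) in {"none", "minor", "patch"}
```

Under these assumptions a client written against 1.4.2 would keep working against 1.5.0 but would need review before talking to 2.0.0.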
[Luiz Olavo Bonino, DTL] RD-CONNECT, ODEXA4ALL
A FAIRifying Architecture
FAIR Interoperability Backbone Services: Prepare for interop
Preparing Sources: On boarding Datasets, Content, API
Interoperability Services: Identifiers, Ontologies, Schemas
Access from Integrating Frameworks
Warehouses
API
Crop and forest plants
• Various species: maize, pine, potato…
• Various data types: from genomes (sequences and annotations) to phenomes (traits)
• Various ontologies: Crop Ontology, Plant Ontology…
• Emerging standards: MIAPPE (Minimum Information About a Plant Phenotyping Experiment)
Need for infrastructure
o Manage identifiers
o Register/access services and data sets
o Metadata driven search
© Paul Kersey
Ontology Services
Ontology mapping
Data-Ontology Tools
OLS3
Identifiers – the pivot of everything!
Identifier Mapping Service (IMS)
Identifier Resolution Service (IRS2)
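The two services named above can be pictured with a toy sketch: mapping groups identifiers that denote the same entity, and resolution expands an identifier into a URL. This is not the IMS or IRS2 implementation; the class `IdentifierMapper`, the function `resolve`, the identifiers and the prefix table are all made up for illustration.

```python
from typing import Dict, Set


class IdentifierMapper:
    """Toy identifier mapping: groups identifiers declared equivalent."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, ident: str) -> str:
        """Return the representative identifier of ident's equivalence class."""
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving keeps later lookups short.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def add_mapping(self, a: str, b: str) -> None:
        """Record that identifiers a and b refer to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, ident: str) -> Set[str]:
        """Return every known identifier in the same class as ident."""
        root = self._find(ident)
        return {other for other in list(self._parent) if self._find(other) == root}


def resolve(curie: str, prefixes: Dict[str, str]) -> str:
    """Expand a prefix:accession identifier to a URL using a prefix table."""
    prefix, accession = curie.split(":", 1)
    return prefixes[prefix] + accession


if __name__ == "__main__":
    mapper = IdentifierMapper()
    mapper.add_mapping("dbA:001", "dbB:X9")   # hypothetical identifiers
    mapper.add_mapping("dbB:X9", "dbC:42")
    print(sorted(mapper.equivalents("dbC:42")))
    print(resolve("dbA:001", {"dbA": "https://example.org/dbA/"}))
```

Mappings here are treated as transitive, which is a simplifying assumption; a production service would also need to record who asserted each mapping and why.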
FAIR Metadata at many levels
Tool that provisioned the dataset
Dataset Collection
Dataset Profile
Data record content
mappings between entities
mappings between datasets
Interface API and Access
Tool using the dataset
Metadata Profiles and Dataset Registration
Governance, Compliance, Release Protocols
Dataset Profile
Search, Index and Linked Data
BioSolr (and Elasticsearch)
Data
Diabetic nephropathy (EFO_0000401)
Two tiers of data repository
Biological knowledge bases
Curated and annotated biological entities and their relationships
UniProt, Ensembl, ChEMBL, Orphanet
data records are dynamic and incomplete
records update, diverge, merge
over time, interpretation changes
identifier resolution varies over time – relationships between records are unstable
“reproducibility” potentially compromised
a novel gene-rare disease relationship is reported
consequences of a single nucleotide change in a regulatory genomic region are better understood
Legacy of Open PHACTS. Mappings are first class.
Data record content
mappings between entities
linksets
provenance, versioning, mapping linksets
VoID – Vocabulary of Interlinked Datasets
• Create description of a Linkset that connects two datasets.
• Select datasets from existing descriptions.
• Capture link predicate and justification
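The three steps above can be sketched as a small script that writes a linkset description in Turtle. The `void:` terms used (`void:Linkset`, `void:subjectsTarget`, `void:objectsTarget`, `void:linkPredicate`) come from the VoID vocabulary; the `ex:justification` property, the `Linkset` class and all example.org URIs are placeholders chosen for illustration, not the terms a real deployment would necessarily use.

```python
from dataclasses import dataclass


@dataclass
class Linkset:
    uri: str              # URI of the linkset description itself
    subjects_target: str  # dataset providing the link subjects
    objects_target: str   # dataset providing the link objects
    link_predicate: str   # predicate used by every link in the set
    justification: str    # why the linked records are considered related

    def to_turtle(self) -> str:
        """Serialise the description as Turtle."""
        return "\n".join([
            "@prefix void: <http://rdfs.org/ns/void#> .",
            # 'ex' is a placeholder namespace for the justification property.
            "@prefix ex: <http://example.org/terms#> .",
            "",
            f"<{self.uri}> a void:Linkset ;",
            f"    void:subjectsTarget <{self.subjects_target}> ;",
            f"    void:objectsTarget <{self.objects_target}> ;",
            f"    void:linkPredicate <{self.link_predicate}> ;",
            f"    ex:justification <{self.justification}> .",
        ])


if __name__ == "__main__":
    linkset = Linkset(
        uri="http://example.org/linksets/a-to-b",
        subjects_target="http://example.org/datasets/a",
        objects_target="http://example.org/datasets/b",
        link_predicate="http://www.w3.org/2004/02/skos/core#exactMatch",
        justification="http://example.org/justifications/same-entity",
    )
    print(linkset.to_turtle())
```

Keeping the predicate and the justification on the linkset, rather than on each link, lets a consumer decide which whole sets of mappings to trust for a given task.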
Legacy of Open PHACTS.
Releasing Data Sets: Software-Like Research Objects
Linked Data Manifests
“Publishing data the software way”
Controlled data Distribution
Containers
Builds
Dependencies
Versioning
Verification
data-maven-plugin
Docker
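One way to picture dependencies, versioning and verification for a data release is a manifest with checksums, much as software packages carry. The sketch below is an assumption-laden illustration using only the Python standard library; it does not describe how data-maven-plugin or any Open PHACTS tooling works, and the manifest layout and function names are invented.

```python
import hashlib
from typing import Dict, List


def sha256_of(data: bytes) -> str:
    """Hex digest used as the checksum of one file's content."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(name: str, version: str,
                   files: Dict[str, bytes], depends_on: List[str]) -> dict:
    """Describe a data release: identity, dependencies and per-file checksums."""
    return {
        "name": name,
        "version": version,
        "depends_on": sorted(depends_on),
        "checksums": {path: sha256_of(content) for path, content in files.items()},
    }


def verify(manifest: dict, files: Dict[str, bytes]) -> List[str]:
    """Return the paths that are missing or whose content no longer matches."""
    problems = []
    for path, expected in manifest["checksums"].items():
        if path not in files or sha256_of(files[path]) != expected:
            problems.append(path)
    return problems


if __name__ == "__main__":
    # Hypothetical release content and dependency name.
    release = {"data/records.ttl": b"<a> <b> <c> .\n"}
    manifest = build_manifest("example-dataset", "1.0.0", release, ["example-ontology:2.1.0"])
    print(verify(manifest, release))                      # untouched copy
    print(verify(manifest, {"data/records.ttl": b"x"}))   # altered copy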
Genotype-Phenotype
Deans AR, Lewis SE, Huala E, Anzaldo SS, Ashburner M, et al. (2015) Finding Our Way through Phenotypes. PLoS Biol 13(1): e1002033. doi:10.1371/journal.pbio.1002033
http://journals.plos.org/plosbiology/article?id=info:doi/10.1371/journal.pbio.1002033
Mapping terms
Cross linking datasets
Tracking provenance
Linked Data Services
Publishing FAIR Data
Interoperating Applications
Interoperability Backbone
Interoperability Services Backbone
Linked Data – Big Picture
• lower the barriers to linking data
• connect related data that wasn't previously linked
• self-describe and annotate data in a common, machine readable form
• expose linking as a first class information element
“a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” (Wikipedia)
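To make "using URIs and RDF" concrete, here is a minimal sketch that writes a few statements as N-Triples, where the link between two records is itself a statement. The helper names and every example.org URI are illustrative assumptions; literal escaping covers only backslashes and double quotes, which is enough for this example but not for arbitrary text.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def uri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise already-formatted terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    record_a = uri("http://example.org/datasetA/record/1")  # hypothetical records
    record_b = uri("http://example.org/datasetB/entry/77")
    statements = [
        (record_a, uri("http://www.w3.org/2000/01/rdf-schema#label"), literal("example record")),
        # The link is an ordinary statement, so it can be published and queried.
        (record_a, uri("http://www.w3.org/2004/02/skos/core#exactMatch"), record_b),
    ]
    print(to_ntriples(statements))
```

Because both records are named by URIs, a third party could add its own statements about either of them without coordinating with the original publishers.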
Impact of Open PHACTS on ELIXIR Linked Data
Components & Know-how
• Identifiers & Links
• Annotation & Ontologies
• Dataset Containers
• Integrate into off the shelf apps
Publishing and Consuming
• Metadata & Mappings
• On boarding & Release pipelines
• APIs, Search
Data … when it supports interoperability … retain native forms … preparation and maintenance … data governance …
Challenges of Linked Data
Getting data providers to generate LOD
Getting agreement on URIs
Choosing ontologies and relations
Modelling challenges (data vs biological reality)
Appropriate Extract/Load/Transform pipelines
Appropriate representation for datatypes
Getting machine readable dataset descriptions
Expertise in the community to effectively produce/consume LD
Services for finding and reusing URIs & ontologies
Data annotation services (mapping data to ontologies)
Provide an API
Link resources to ontology terms
SPARQL fetish
[Mons]
Human data: The European Genome-phenome Archive (EGA)