2016 acs semantic approaches for biochemical knowledge discovery
TRANSCRIPT
Semantic Approaches for Biochemical Knowledge Discovery
Michel Dumontier, Ph.D.
Associate Professor of Medicine (Biomedical Informatics), Stanford University
@micheldumontier::ACS:15-03-2016
Science!
Most published research findings are false. - John Ioannidis, Stanford University
Science is hard.
Scientific knowledge is growing at an unprecedented rate
Reusing raw and curated data in thousands of databases is challenging: identifiers, formats, access methods, links
Various software tools are needed to analyze data (problems: OS, versioning, input/output formats)
Ultimately, scientists develop fairly sophisticated programs/workflows to test hypotheses
The absence of intelligent systems
requires vast amounts of experience and technical expertise
How can we automatically find the evidence that supports or disputes a scientific hypothesis using the latest data, tools and scientific knowledge?
So what do we need to achieve this?
1. Data Science Tools and Methods
– To identify, represent, interlink, integrate, and query data and services
– To identify and uncover support for known or novel associations
2. Community Standards to share and interrogate a massive, decentralized network of interconnected data and software
First, we need FAIR data
Findable
– Globally unique identifiers for datasets and the data they contain
– Rich set of descriptors to search and filter with
– Indexed and searchable
Accessible
– Metadata is eternally available.
– Identifiers are used to retrieve representations using standard protocols (e.g. HTTP)
Interoperable
– Data represented with formal knowledge representations
– Include links to other datasets/vocabularies
Reusable
– Licensing, Provenance, Community standards
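The four principles can be read as a checklist over a dataset's metadata record. The sketch below illustrates that reading in Python. The field names (`identifier`, `access_url`, `license`, and so on) and the example record are hypothetical and are not taken from any FAIR specification.

```python
from urllib.parse import urlparse

# Hypothetical metadata fields grouped by FAIR principle (illustrative only).
FAIR_FIELDS = {
    "findable": ["identifier", "title", "keywords"],
    "accessible": ["access_url"],
    "interoperable": ["format", "links"],
    "reusable": ["license", "provenance"],
}

def fair_report(record):
    """Return, for each principle, the fields that are missing or empty."""
    report = {}
    for principle, fields in FAIR_FIELDS.items():
        report[principle] = [f for f in fields if not record.get(f)]
    # Accessible: the identifier should be retrievable over a standard protocol.
    url = record.get("access_url", "")
    if url and urlparse(url).scheme not in ("http", "https"):
        report["accessible"].append("access_url (not HTTP/HTTPS)")
    return report

# Made-up record: no outgoing links and no provenance statement.
example = {
    "identifier": "http://example.org/dataset/1",
    "title": "Example dataset",
    "keywords": ["chemistry"],
    "access_url": "http://example.org/dataset/1.rdf",
    "format": "text/turtle",
    "links": [],
    "license": "CC-BY-4.0",
    "provenance": "",
}
print(fair_report(example))
```

A report with empty lists everywhere would mean the record passes this simplified checklist; it says nothing about the quality of the metadata itself.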
“Numbers have no way of speaking for themselves. We need to imbue them with meaning.” - Nate Silver, The Signal and the Noise
FAIR: Findable, Accessible, Interoperable, Re-usable
See paper for motivation and examples
We are now starting to think about quality measures.
The Semantic Web is the new global web of knowledge
standards for publishing, sharing and querying facts, expert knowledge and services
scalable approach for the discovery of independently formulated and distributed knowledge
Linked Data is FAIR data
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Linked Data for the Life Sciences
Bio2RDF is an open source project to unify the representation and interlinking of biological data using RDF.
chemicals/drugs/formulations, genomes/genes/proteins, domains
Interactions, complexes & pathways
animal models and phenotypes
Disease, genetic markers, treatments
Terminologies & publications
• 11B+ interlinked statements from 35 biomedical datasets
• dataset description, provenance & statistics
• A growing interoperable ecosystem with the EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers
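To make "interlinked statements" concrete, here is a minimal sketch of how one cross-dataset statement could be written as an RDF triple. It assumes the `http://bio2rdf.org/namespace:identifier` URI pattern that appears in the query later in this talk. The predicate and the identifiers are hypothetical placeholders and were not checked against the actual datasets.

```python
BASE = "http://bio2rdf.org/"

def bio2rdf_uri(namespace, identifier):
    """Build a URI following the namespace:identifier pattern."""
    return f"{BASE}{namespace}:{identifier}"

def to_ntriples(triples):
    """Serialize (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

# Hypothetical predicate and identifiers, for illustration only.
TARGETS = bio2rdf_uri("example_vocabulary", "targets")
triples = [
    (bio2rdf_uri("drugbank", "EXAMPLE1"), TARGETS, bio2rdf_uri("uniprot", "EXAMPLE2")),
]
print(to_ntriples(triples))
```

Because the subject and object live in different namespaces, a single statement like this is what links one dataset to another.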
Bio2RDF shows how datasets are connected together
graph methods for data quality to find mismatches and discover new links
W Hu, H Qiu, M Dumontier. Link Analysis of Life Science Linked Data. International Semantic Web Conference (2) 2015: 446-462.
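As a rough illustration of what a graph-based quality check might look like, the sketch below counts links between dataset namespaces and flags links whose target is not a known entity. This is a simplified stand-in, not the method of the paper cited above; the URIs and the `known` set are hypothetical.

```python
from collections import Counter

def namespace_of(uri):
    """Extract the dataset namespace from a .../namespace:identifier URI."""
    return uri.rsplit("/", 1)[-1].split(":", 1)[0]

def dataset_links(triples):
    """Count links that cross from one dataset namespace to another."""
    counts = Counter()
    for s, _p, o in triples:
        src, dst = namespace_of(s), namespace_of(o)
        if src != dst:
            counts[(src, dst)] += 1
    return counts

def dangling_links(triples, known):
    """Return triples whose object is not among the known entities."""
    return [t for t in triples if t[2] not in known]

# Hypothetical triples: the second one points at an entity nobody describes.
P = "http://bio2rdf.org/example_vocabulary:x-ref"
links = [
    ("http://bio2rdf.org/drugbank:D1", P, "http://bio2rdf.org/uniprot:P1"),
    ("http://bio2rdf.org/drugbank:D1", P, "http://bio2rdf.org/pubchem.compound:C1"),
]
known = {"http://bio2rdf.org/uniprot:P1"}

print(dataset_links(links))
print(dangling_links(links, known))
```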
Federated Queries over public SPARQL Endpoints
Get all protein catabolic processes (and more specific GO terms) in biomodels
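PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?count)
WHERE {
  # GO terms below any term whose label starts with "protein catabolic process"
  SERVICE <http://bioportal.bio2rdf.org/sparql> {
    ?go rdfs:label ?label .
    ?go rdfs:subClassOf+ ?tgo .
    ?tgo rdfs:label ?tlabel .
    FILTER regex(?tlabel, "^protein catabolic process")
  }
  # BioModels reactions annotated with those GO terms
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label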
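A federated query like the one above can also be assembled programmatically, with one SERVICE clause per endpoint. The Python sketch below only builds the query text; the helper names `service_block` and `federated_query` are made up for this example. Sending the query would need an HTTP client or a SPARQL library, and whether the endpoints named on the slide are still reachable has not been checked.

```python
def service_block(endpoint, patterns):
    """Wrap triple patterns in a SPARQL 1.1 SERVICE clause for one endpoint."""
    body = "\n".join(f"    {p}" for p in patterns)
    return f"  SERVICE <{endpoint}> {{\n{body}\n  }}"

def federated_query(projection, services, group_by):
    """Assemble a SELECT query whose WHERE clause spans several endpoints."""
    blocks = "\n".join(service_block(e, p) for e, p in services)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"SELECT {projection}\nWHERE {{\n{blocks}\n}}\n"
        f"GROUP BY {group_by}"
    )

query = federated_query(
    "?go ?label (COUNT(DISTINCT ?x) AS ?count)",
    [
        ("http://bioportal.bio2rdf.org/sparql", [
            "?go rdfs:label ?label .",
            "?go rdfs:subClassOf+ ?tgo .",
            "?tgo rdfs:label ?tlabel .",
            'FILTER regex(?tlabel, "^protein catabolic process")',
        ]),
        ("http://biomodels.bio2rdf.org/sparql", [
            "?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .",
            "?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .",
        ]),
    ],
    "?go ?label",
)
print(query)
```

The shared variable `?go` is what joins the two SERVICE blocks: it is bound by the ontology endpoint and matched against annotations in the models endpoint.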
EbolaKB: Using Linked Data and Software
Kamdar, Dumontier. An Ebola virus-centered knowledge base. Database. 2015 Jun 8;2015. doi: 10.1093/database/bav049.
Network analysis and discovery
Jim McCusker & Deb McGuinness
David Wild, Ying Ding
HyQue
tactical formalization
Take what you need and represent it in a way that directly serves your objective
STANDARDS for broader reuse
APPLICATIONS for optimized experience
High Quality Metadata are Essential
for Large-Scale Reuse and Biomedical Discovery
Making it Easier, Possibly Even Pleasant, to Author Interoperable Experimental Metadata
smartAPI
The goal is to reduce the barrier for the discovery and reuse of web APIs through richer semantic metadata.
i) a coordinated facility for the intelligent annotation of smart APIs
ii) a web application to discover smart APIs and how they connect to each other.
iii) the augmentation of existing APIs to provide FAIR data
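One way to picture how annotated APIs "connect to each other" is to match the data types one API returns against the types another accepts. The sketch below does this in Python over made-up annotations; the API names, the `inputs`/`outputs` fields and the type labels are all hypothetical and do not reflect the actual smartAPI metadata format.

```python
# Hypothetical annotations: each API declares the data types it accepts and returns.
apis = [
    {"name": "api_a", "inputs": {"gene_symbol"}, "outputs": {"gene_id"}},
    {"name": "api_b", "inputs": {"gene_id"}, "outputs": {"variant_id"}},
    {"name": "api_c", "inputs": {"compound_id"}, "outputs": {"gene_symbol"}},
]

def connections(apis):
    """List (producer, consumer, shared types) where an output feeds an input."""
    found = []
    for producer in apis:
        for consumer in apis:
            if producer is consumer:
                continue
            shared = producer["outputs"] & consumer["inputs"]
            if shared:
                found.append((producer["name"], consumer["name"], sorted(shared)))
    return found

for producer, consumer, types in connections(apis):
    print(f"{producer} -> {consumer} via {', '.join(types)}")
```

With richer annotations, the same matching idea could chain several services into a path from one kind of identifier to another.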
smartAPI
Gene
myGene.info, myVariant.info
Linking API Data
Web Services
Linked Data Cloud
Evan’s Questions
• What should we be doing now?
– Encouraging researchers to publish FAIR data and services
• How should we be doing it?
– As Linked Data
– In institutional repositories, and available in Wikidata and other aggregators
• Where are things going in the future?
– Reproducible analyses over indexed, archived, and massively connected knowledge graphs
Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier