Barcelona
Annual Conference
Monday, 10th October 2016
Semantics 101 for Pharma
Tim Williams,
UCB Biosciences Inc., USA
Marc Andersen
StatGroup ApS, Denmark
37
Everything has a unique, linkable reference.
38
Resource Description Framework (RDF)
• Semantic Web
• Clinical Trials Context
• Querying
• Creating
• Data Cubes Use Case
39
Explore a Study
https://www.clinicaltrials.gov/
“Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir”
Without knowing anything about Triples!
40
Find the NCTID
41
Explore NCTID Linked Data
http://lod.openlinksw.com/describe/?uri=http://bio2rdf.org/clinicaltrials:NCT00799760
42
Subject, Predicate, Object
Subject: NCT00799760, “Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir”
http://bio2rdf.org/clinicaltrials:NCT00799760
Predicate: type
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Object: Clinical Study
http://bio2rdf.org/clinicaltrials_vocabulary:Clinical-Study
Predicate: phase
http://bio2rdf.org/clinicaltrials_vocabulary:phase
Object: Phase 3 (Phase 3 Code)
Predicate: condition
http://bio2rdf.org/clinicaltrials_vocabulary:condition
Object: Gastric Influenza (Gastric Influenza Code)
Code URIs shown on the slide:
http://bio2rdf.org/clinicaltrials_resource:f773736eaf3a1da739bc23f48dae6954
http://bio2rdf.org/clinicaltrials_resource:8357418e2694434468870b487644532d
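The statements on this slide can be written down as plain (subject, predicate, object) tuples. The following is a minimal Python sketch, not Bio2RDF tooling. The study URI uses the colon form from the describe link, the helper name `objects_of` is hypothetical, and the two `urn:example:` values are placeholders standing in for the hashed code URIs, because the slide does not make clear which hash belongs to which code.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
STUDY = "http://bio2rdf.org/clinicaltrials:NCT00799760"
VOCAB = "http://bio2rdf.org/clinicaltrials_vocabulary:"

# Placeholders (assumption): stand-ins for the hashed Bio2RDF code URIs.
PHASE_3_CODE = "urn:example:phase-3-code"
GASTRIC_INFLUENZA_CODE = "urn:example:gastric-influenza-code"

# Each statement is one (subject, predicate, object) tuple.
triples = [
    (STUDY, RDF_TYPE, VOCAB + "Clinical-Study"),
    (STUDY, VOCAB + "phase", PHASE_3_CODE),
    (STUDY, VOCAB + "condition", GASTRIC_INFLUENZA_CODE),
]


def objects_of(triples, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects_of(triples, STUDY, RDF_TYPE))
```

Every fact about the study is one more tuple with the same subject, so the study URI is what links the statements together.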
43
Terse RDF Triple Language (Turtle)
@prefix ns1: <http://bio2rdf.org/clinicaltrials:> .
@prefix ns2: <http://bio2rdf.org/clinicaltrials_vocabulary:> .
@prefix ns3: <http://bio2rdf.org/clinicaltrials_resource:> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
Subject, Predicate, Object
Subject: ns1:NCT00799760, “Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir”
Predicate: rdf:type, Object: ns2:Clinical-Study (Clinical Study)
Predicate: ns2:phase, Object: Phase 3 (Phase 3 Code)
Predicate: ns2:condition, Object: Gastric Influenza (Gastric Influenza Code)
Code URIs shown on the slide: ns3:f773736eaf3a1da739bc23f48dae6954, ns3:8357418e2694434468870b487644532d
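A prefixed name is only shorthand: the prefix is replaced by its namespace and the local part is appended. A minimal Python sketch of that expansion, using the four prefixes declared on this slide, follows. The function name `expand` is hypothetical, and real Turtle parsers handle more cases (escapes, the empty prefix, relative IRIs).

```python
# Prefix declarations from the slide, as a plain dictionary.
PREFIXES = {
    "ns1": "http://bio2rdf.org/clinicaltrials:",
    "ns2": "http://bio2rdf.org/clinicaltrials_vocabulary:",
    "ns3": "http://bio2rdf.org/clinicaltrials_resource:",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full IRI by concatenation."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


for name in ("ns1:NCT00799760", "rdf:type", "ns2:Clinical-Study"):
    print(name, "=", expand(name))
```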
44
Storing RDF: Triple Stores
Native
• 4Store http://www.4store.org/
• AllegroGraph http://franz.com/agraph/allegrograph/
• Apache Jena TDB http://jena.apache.org/
• GraphDB http://ontotext.com/products/graphdb/
• MarkLogic http://www.marklogic.com
DBMS-backed
• Apache Jena SDB http://jena.apache.org/
• Oracle Spatial and Graph http://www.oracle.com/technetwork/database/options/spatialandgraph/overview/rdfsemantic-graph-1902016.html
Hybrid
• Sesame http://rdf4j.org/
• Virtuoso http://virtuoso.openlinksw.com/
List at the W3C: https://www.w3.org/2001/sw/wiki/Category:Triple_Store
Adapted from Dr. Harald Sack, Knowledge Engineering with Semantic Web Technologies, 2015
46
Query RDF with SPARQL
• SPARQL – SPARQL Protocol and RDF Query Language
• Not limited to RDF
– Utilities for relational databases, spreadsheets, XML, JSON
• Protocol
– Rules for exchanging queries and results
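The protocol part can be illustrated with a plain HTTP request: the SPARQL 1.1 Protocol allows a query to be sent as the `query` parameter of a GET request, with the result format chosen through the `Accept` header. The Python sketch below only builds such a request and does not send it. The function name `build_sparql_request` is hypothetical, and the endpoint is the public one named elsewhere in this deck.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    # The query text is URL-encoded into the 'query' parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON result format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(
    "http://lod.openlinksw.com/sparql",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it, provided the endpoint is reachable.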
47
Graph Path for Primary Outcome
Data: ns1:NCT00799760 → ns2:primary-outcome → (outcome URI) → ns3:title → "RT-PCR for influenza A virus…"@en
Query: ns1:NCT00799760 → ns2:primary-outcome → ?outURI → ns3:title → ?outcome
SELECT ?outcome
48
Graph Path for Primary Outcome
ns1:NCT00799760 → ns2:primary-outcome → ns3:d821848f0fb8dc44f390a40e066e9224 (Primary Outcome URI)
ns3:d821848f0fb8dc44f390a40e066e9224 → ns4:title → “RT-PCR for influenza A virus in nasal secretion 2 days”@en (Primary Outcome title as human-readable code list value)
49
SPARQL Query for Primary Outcome
PREFIX ns1: <http://bio2rdf.org/clinicaltrials:>
PREFIX ns2: <http://bio2rdf.org/clinicaltrials_vocabulary:>
PREFIX ns3: <http://purl.org/dc/terms/>
SELECT ?outcome
WHERE
{
  ns1:NCT00799760 ns2:primary-outcome ?outURI .
  ?outURI ns3:title ?outcome .
}
Retrieve data that matches the Graph Pattern:
NCTID → primary-outcome → ?outURI → title → ?outcome
Try it at: http://lod.openlinksw.com/sparql
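How a graph pattern retrieves data can be sketched without a triple store. The Python below is a toy matcher, not how a SPARQL engine is implemented. It binds `?outURI` from the first triple pattern and reuses that binding in the second. The function names `match` and `solve` are hypothetical, and `resource:d821…` is a shortened stand-in for the hashed outcome URI.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):          # variable: bind it or check consistency
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:               # constant: must match exactly
            return None
    return extended


def solve(patterns, triples):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


OUT_URI = "resource:d821848f0fb8dc44f390a40e066e9224"  # stand-in for the hashed URI
triples = [
    ("ns1:NCT00799760", "rdf:type", "ns2:Clinical-Study"),
    ("ns1:NCT00799760", "ns2:primary-outcome", OUT_URI),
    (OUT_URI, "ns3:title", "RT-PCR for influenza A virus in nasal secretion 2 days"),
]
patterns = [
    ("ns1:NCT00799760", "ns2:primary-outcome", "?outURI"),
    ("?outURI", "ns3:title", "?outcome"),
]
outcomes = [b["?outcome"] for b in solve(patterns, triples)]
print(outcomes)
```

The shared variable `?outURI` is what joins the two patterns; `SELECT ?outcome` then corresponds to keeping only that variable from each solution.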
50
Query using:
• SPARQL Endpoint
– Example: lod.openlinksw.com/sparql
• R with package rrdf - see exercises
• SAS macro, PROC GROOVY - see exercises
See Exercises
51
Query with R
R Packages:
• rrdf
• rrdflibs
http://github.com/egonw/rrdf
Requires Java 7 or higher
Willighagen E. (2014) Accessing biological data in R with semantic web technologies. PeerJ PrePrints 2:e185v3
See https://dx.doi.org/10.7287/peerj.preprints.185v3
52
Query an Endpoint with R
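library(rrdf)
# Query URL of a SPARQL service running on the local machine
endpoint <- "http://localhost:3030/test/query"
# Return any ten triples from the store
query <- "SELECT * WHERE { ?s ?p ?o . } LIMIT 10"
# Send the query to the endpoint and collect the result
queryResult <- sparql.remote(endpoint, query)
queryResult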
See Exercises
53
Query with SAS
SAS Macros:
• %sparqlquery - SPARQL query
• %sparqlupdate - SPARQL update
https://github.com/MarcJAndersen/SAS-SPARQLwrapper
Implementation:
• SAS PROC HTTP to access the service
• Send query/update as text file
• Input result using SAS LIBNAME for XML
Other approaches:
• PROC GROOVY to execute Java code from Apache Jena (see directory show-res-sas in https://github.com/MarcJAndersen/poc-analysis-results-metadata)
• SAS Java objects to interface to Apache Jena
Requires a running SPARQL service, for example Apache Jena
See Exercises
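The last implementation step, reading the result as XML, relies on the standard SPARQL Query Results XML Format. As a language-neutral illustration (Python rather than SAS, and not the code of the macros), the sketch below parses such a result into rows. The function name `parse_sparql_xml` and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written sample response (assumption), shaped like a real result document.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="outcome"/></head>
  <results>
    <result>
      <binding name="outcome">
        <literal xml:lang="en">RT-PCR for influenza A virus in nasal secretion 2 days</literal>
      </binding>
    </result>
  </results>
</sparql>"""


def parse_sparql_xml(text):
    """Turn a SPARQL XML result document into a list of {variable: value} rows."""
    root = ET.fromstring(text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]            # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


rows = parse_sparql_xml(SAMPLE)
print(rows)
```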
54
Query a Remote Source
At: http://lod.openlinksw.com/sparql
55
Which variables are used?
Given: CSR appendix 14.1 as RDF data cubes
SPARQL query using property paths:
select distinct ?column
where {
  ?ds a qb:DataSet ;
      (<>|!<>)*/rrdfqbcrnd0:D2RQ-PropertyBridge/^d2rq:property/d2rq:column ?column .
}
order by ?ds
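In this property path, `<>|!<>` matches the predicate `<>` or any predicate other than `<>`, so it matches every predicate, and `*` repeats it zero or more times. The effect is to follow links of any kind, over any number of hops, starting from the data set. A minimal Python sketch of that traversal on a toy graph follows. The function name `reachable` and the `ex:` node names are hypothetical.

```python
from collections import deque


def reachable(triples, start):
    """All nodes reachable from `start` over zero or more triples, any predicate."""
    seen = {start}                 # zero hops: the start node itself
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, _predicate, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


# Toy graph (assumption): a data set, its structure, a component and a dimension.
triples = [
    ("ex:ds", "qb:structure", "ex:dsd"),
    ("ex:dsd", "qb:component", "ex:comp"),
    ("ex:comp", "qb:dimension", "ex:dim"),
    ("ex:other", "qb:structure", "ex:unrelated"),
]
print(sorted(reachable(triples, "ex:ds")))
```

The remaining steps of the path then apply only to the nodes found this way, which is why the short query can stand in for the explicit chain of triple patterns.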
56
Details (1/2)
select distinct ?column
where {
  {
    ?ds a qb:DataSet ;
        qb:structure ?structure .
    ?structure qb:component ?component .
    ?component qb:dimension ?dimension .
    ?dimension qb:codeList ?codeList .
    ?codeList rrdfqbcrnd0:DataSetRefD2RQ ?DataSetRefD2RQ .
    ?DataSetRefD2RQ rrdfqbcrnd0:D2RQ-PropertyBridge ?D2RQPropertyBridge .
    ?Correctd2rqPropertyBridge d2rq:property ?D2RQPropertyBridge ;
        d2rq:column ?column .
  }
(continued on next slide)
57
(continued from previous slide)
  union {
    ?ds a qb:DataSet ;
        qb:structure ?structure .
    ?structure qb:component ?component .
    ?component qb:dimension ?dimension .
    ?dimension qb:codeList ?codeList .
    ?codeList skos:hasTopConcept ?codeValue .
    ?codeValue rrdfqbcrnd0:DataSetRefD2RQ ?DataSetRefD2RQ .
    ?DataSetRefD2RQ rrdfqbcrnd0:D2RQ-PropertyBridge ?D2RQPropertyBridge .
    ?Correctd2rqPropertyBridge d2rq:property ?D2RQPropertyBridge ;
        d2rq:column ?column .
  }
}
58
Which data are used for a result?
select ?s ?obs
where {
  ?s ?variable ?value .
  {
    select
      (iri(concat('http://www.example.org/datasets/vocab/',
        replace(str(?vnop), 'http://www.example.org/rrdfqbcrnd0/([A-Z0-9_]+)$', '$1', 'i'))) as ?variable)
      ?matchvalue
      ?obs
    where {
      ?obs ?dim ?codevalue .
      ?dim a qb:DimensionProperty .
      ?codelist skos:hasTopConcept ?codevalue .
      ?codelist rrdfqbcrnd0:DataSetRefD2RQ ?vnop .
      ?codelist rrdfqbcrnd0:R-columnname ?vn .
      ?codelist rrdfqbcrnd0:codeType ?vct .
      ?codevalue skos:prefLabel ?clprefLabel .
      ?codevalue rrdfqbcrnd0:R-selectionoperator '==' .
      ?codevalue rrdfqbcrnd0:R-selectionvalue ?matchvalue .
      values (?obs) { (ds:obs223) }
    }
  }
  BIND(IF(?value != ?matchvalue, 1, 0) AS ?notequal)
}
group by ?s ?obs
having (SUM(?notequal) = 0)
order by ?s
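The selection logic of this query is to count, per data record, how many selection criteria of the observation do not match, and to keep the records with a count of zero. A minimal Python sketch of that counting step is given below. It is an illustration only: the record ids, the column names `SEX` and `TRT` and their values are hypothetical, and the criteria are supplied directly instead of being derived from the cube.

```python
def records_behind_result(records, criteria):
    """Return ids of records whose values match every selection criterion."""
    selected = []
    for record_id, values in records.items():
        # As in the query, only variables present on the record are compared.
        compared = [var for var in criteria if var in values]
        notequal = sum(1 for var in compared if values[var] != criteria[var])
        if compared and notequal == 0:    # mirrors HAVING (SUM(?notequal) = 0)
            selected.append(record_id)
    return sorted(selected)               # mirrors ORDER BY ?s


# Hypothetical data records and the criteria describing one observation.
records = {
    "rec1": {"SEX": "F", "TRT": "A"},
    "rec2": {"SEX": "M", "TRT": "A"},
    "rec3": {"SEX": "F", "TRT": "B"},
}
criteria = {"SEX": "F", "TRT": "A"}
selected = records_behind_result(records, criteria)
print(selected)
```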
59
Federated Query: Join data across sources
60
61
More SPARQL
SPARQL Query Language for RDF https://www.w3.org/TR/rdf-sparql-query/
SPARQL 1.1 Query Language https://www.w3.org/TR/sparql11-query/
“Learning SPARQL” - Bob DuCharme
http://www.learningsparql.com/index.html - examples for download