semantics 101 for pharma - phuse wiki · 2016-10-17 · monday, 10th october 2016 semantics 101 for...
Barcelona
Annual Conference
Monday, 10th October 2016
Semantics 101 for Pharma
Tim Williams, UCB Biosciences Inc., USA
Marc Andersen, StatGroup ApS, Denmark
101
Related PhUSE 2016 Presentations
• Interactive Visualization of Linked Data
Monday, 14:30, Data Visualization
• Generating Analysis Results and Metadata
Monday, 16:00, Trends and Technology
• Constructing Interoperable Study Documents From A Semantic Technology-based Repository
Poster
• CS Discussion Club
Tuesday, 11:00 – 12:30
102
103
Thank you
and
Enjoy the Conference!
104
Learning Resources
• PhUSE Wiki “Semantic Technology Working Groups”
http://www.phusewiki.org/wiki/index.php?title=Semantic_Technology
• PhUSE Wiki “Semantic Technology Curriculum”
http://www.phusewiki.org/wiki/index.php?title=Semantic_Technology_Curriculum
• White papers, publications, presentations.
• “Learning SPARQL” by Bob DuCharme
http://www.learningsparql.com/index.html - examples for download
• Semantic University by Cambridge Semantics
http://www.cambridgesemantics.com/semantic-university
• RDF Primer
http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/
• CDISC Standards in RDF User Guide v1 Final
http://www.cdisc.org/system/files/members/standard/RDF/CDISC%20Standards%20RDF%20User%20Guide%201.0%20Final%202015-07-21.pdf
• Knowledge Engineering with Semantic Web Technologies 2015
https://open.hpi.de/courses/semanticweb2015
105
Exercises
Due to time constraints and the large number of attendees, we were unable to provide hands-on experience during the session. This section provides exercises and a link to materials so you may try creating and querying Linked Data on your own.
To obtain files for the exercises, go to:
http://www.phusewiki.org/wiki/index.php?title=Semantic_Technology_Curriculum
Download the file: PhUSECSS-Semantics101-AttendeeFiles.zip
106
Introduction to Jena Fuseki
• Apache-Jena – contains the APIs, SPARQL engine, the TDB native RDF database and command line tools ARQ, RIOT …
• Apache-Jena-Fuseki – the Jena SPARQL server
107
Load a File into Fuseki
• File: ex001.ttl
@prefix css: <http://www.example.org/CSS/> .
@prefix ct: <http://bio2rdf.org/clinicaltrials/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ct:NCT00799760 css:title "Evaluation of Efficacity…"@en ;
css:phase "Phase 3"@en ;
css:enrollment "541"^^xsd:int .
Instructions sent to attendees/available on wiki
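The load step can also be scripted instead of using the Fuseki web page. The sketch below builds an HTTP request that adds Turtle triples to the default graph through the SPARQL Graph Store Protocol. It assumes a local Fuseki server with a dataset named `test` (the same name as in the R endpoint example in this deck) and a graph store endpoint at `/test/data`. The endpoint path and the helper names `build_upload_request` and `upload` are assumptions for illustration, not part of the workshop files. The title triple is left out because the slide truncates it.

```python
import urllib.request

# Assumption: Fuseki runs locally with a dataset named "test" and exposes
# the SPARQL Graph Store Protocol at /test/data (adjust to your setup).
GRAPH_STORE_URL = "http://localhost:3030/test/data?default"

# Subset of ex001.ttl (the truncated title triple is omitted).
TURTLE = """\
@prefix css: <http://www.example.org/CSS/> .
@prefix ct: <http://bio2rdf.org/clinicaltrials/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ct:NCT00799760 css:phase "Phase 3"@en ;
    css:enrollment "541"^^xsd:int .
"""


def build_upload_request(turtle_text, url=GRAPH_STORE_URL):
    """Build a POST request that adds Turtle triples to the default graph."""
    return urllib.request.Request(
        url,
        data=turtle_text.encode("utf-8"),
        headers={"Content-Type": "text/turtle; charset=utf-8"},
        method="POST",
    )


def upload(turtle_text, url=GRAPH_STORE_URL):
    """Send the request; needs a running server, so it is not called here."""
    with urllib.request.urlopen(build_upload_request(turtle_text, url)) as resp:
        return resp.status


request = build_upload_request(TURTLE)
print(request.get_method(), request.full_url)
```

Calling `upload(TURTLE)` against a running server should return the HTTP status code. Building the request separately from sending it makes it easy to inspect before anything is written to the store.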
108
Query #1: Getting Started
See Exercises
File: ex002.rq
PREFIX css: <http://www.example.org/CSS/>
SELECT *
WHERE{
?s ?p ?o .
} LIMIT 10
109
Query #2: Graph Pattern for Title
Query
PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
SELECT ?nctid ?title
WHERE{
?nctid css:title ?title .
}
Data
ct:NCT00799760 css:title "Evaluation of Efficacity and Safety…"@en ;
Graph pattern
S: ?nctid
P: css:title
O: ?title
110
Query for Study Title
File: ex003.rq
PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
SELECT ?nctid ?title
WHERE{
?nctid css:title ?title .
}
See Exercises
111
Upload another file
File: ex004.TTL
@prefix css: <http://www.example.org/CSS/> .
@prefix ct: <http://bio2rdf.org/clinicaltrials/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ct:NCT00799760 css:title "Evaluation of Efficacity …"@en ;
css:phase "Phase 3"@en ;
css:enrollment "541"^^xsd:integer ;
css:primOutcome css:outcome1 .
css:outcome1 rdf:type ct:primary-outcome;
ct:measure "RT-PCR for influenza A virus…"@en ;
ct:time-frame "2 days".
See Exercises
112
Query for Primary Outcome
Data
ct:NCT00799760
css:title "Evaluation of Efficacity …"@en ;
css:phase "Phase 3"@en ;
css:enrollment "541"^^xsd:integer ;
css:primOutcome css:outcome1 .
css:outcome1 rdf:type ct:primary-outcome;
ct:measure "RT-PCR for influenza A virus…"@en ;
ct:time-frame "2 days".
Graph Query
ct:NCT00799760 css:primOutcome ?outURI
?outURI ct:measure ?outcome
113
SPARQL Query
PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
SELECT ?outcome
WHERE
{
ct:NCT00799760 css:primOutcome ?outURI .
?outURI ct:measure ?outcome .
}
Retrieve data that matches the Graph Pattern
NCTID primOutcome ?outURI measure ?outcome
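To make "matches the graph pattern" concrete, here is a minimal sketch of pattern matching over a handful of triples from ex004.ttl. It is a toy illustration in plain Python, not how Jena or any SPARQL engine is implemented. The function names `match_pattern` and `query` are hypothetical, and prefixed names are treated as plain strings.

```python
# Triples from ex004.ttl, written as (subject, predicate, object) tuples.
DATA = [
    ("ct:NCT00799760", "css:phase", "Phase 3"),
    ("ct:NCT00799760", "css:primOutcome", "css:outcome1"),
    ("css:outcome1", "rdf:type", "ct:primary-outcome"),
    ("css:outcome1", "ct:measure", "RT-PCR for influenza A virus…"),
    ("css:outcome1", "ct:time-frame", "2 days"),
]


def match_pattern(pattern, triple, bindings):
    """Match one triple pattern; return extended bindings or None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # constant term does not match this triple
    return result


def query(patterns, data):
    """Return every set of bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in data:
                candidate = match_pattern(pattern, triple, bindings)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


# The two patterns of the primary-outcome query share ?outURI,
# which is what joins the study to its outcome node.
patterns = [
    ("ct:NCT00799760", "css:primOutcome", "?outURI"),
    ("?outURI", "ct:measure", "?outcome"),
]
outcomes = [solution["?outcome"] for solution in query(patterns, DATA)]
print(outcomes)
```

The shared variable `?outURI` does the work of a join: the first pattern binds it to `css:outcome1`, and the second pattern only matches triples whose subject is that same value.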
114
Query for Study Outcome
PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
SELECT ?outcome
WHERE{
ct:NCT00799760 css:primOutcome ?outURI .
?outURI ct:measure ?outcome . }
File: ex005.rq
See Exercises
115
Trial Triples with SPARQL
http://lod.openlinksw.com/sparql
DESCRIBE <http://bio2rdf.org/clinicaltrials:NCT00799760>
ns1:NCT00799760 rdf:type ns2:Resource ,
ns2:Clinical-Study .
ns1:NCT00799760 ns3:title "Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir"@en .
ns2:actual-enrollment 541 ;
…AND MUCH MORE….
116
Query with R
R Packages:
• rrdf
• rrdflibs
http://github.com/egonw/rrdf
Requires Java 7 or higher
Willighagen E. (2014) Accessing biological data in R with semantic web technologies. PeerJ PrePrints 2:e185v3
See https://dx.doi.org/10.7287/peerj.preprints.185v3
118
File: queryLocalTTL.R
library(rrdf)
dataSource = load.rdf("<path to the TTL file>/ex004.ttl",
format="N3")
query = 'PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
SELECT ?primaryOutcome
WHERE
{
ct:NCT00799760 css:primOutcome ?outURI .
?outURI ct:measure ?primaryOutcome .
}'
queryResult = as.data.frame(sparql.rdf(dataSource, query))
queryResult
See Exercises
119
Query an Endpoint with R
library(rrdf)
endpoint = "http://localhost:3030/test/query"
query = "SELECT * WHERE {?s ?p ?o . } LIMIT 10 "
queryResult = sparql.remote(endpoint, query)
queryResult
File: queryLocalFuseki.R
See Exercises
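For readers without R, the same endpoint can be queried over plain HTTP, since a SPARQL endpoint follows the SPARQL 1.1 Protocol. The sketch below builds a GET request with the query as a URL parameter and flattens a SPARQL JSON result into rows. The helper names are hypothetical, and the sample response is made up to match the W3C JSON results format so the parsing can be tried without a running server.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:3030/test/query"  # endpoint from the R example
QUERY = "SELECT * WHERE {?s ?p ?o . } LIMIT 10"


def build_query_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query; needs a running endpoint, so it is not called here."""
    with urllib.request.urlopen(build_query_request(endpoint, query)) as resp:
        return rows_from_results(resp.read().decode("utf-8"))


# Hypothetical response, shaped like the SPARQL JSON results format.
SAMPLE = """{"head": {"vars": ["s", "p", "o"]},
 "results": {"bindings": [
  {"s": {"type": "uri", "value": "http://bio2rdf.org/clinicaltrials/NCT00799760"},
   "p": {"type": "uri", "value": "http://www.example.org/CSS/phase"},
   "o": {"type": "literal", "xml:lang": "en", "value": "Phase 3"}}]}}"""
rows = rows_from_results(SAMPLE)
print(rows[0]["o"])
```

With Fuseki running and data loaded, `run_query(ENDPOINT, QUERY)` should return one dictionary per result row, similar in spirit to the matrix returned by `sparql.remote()`.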
120
Query with SAS
SAS Macros:
%sparqlquery - SPARQL query
%sparqlupdate - SPARQL update
https://github.com/MarcJAndersen/SAS-SPARQLwrapper
Implementation:
• SAS PROC HTTP to access the service
• Send query/update as text file
• Input result using SAS LIBNAME for XML
Other approaches:
• PROC groovy to execute Java Code from Apache Jena
• SAS Java objects to interface to Apache Jena
Requires running SPARQL service, for example Apache Jena
121
File: queryLocalFuseki.sas
Assumptions:
• Service active at endpoint
• TTL file uploaded to store
122
Query a Remote Source
At: http://lod.openlinksw.com/sparql
123
Create RDF using R
• R with rrdf, rrdflibs
https://github.com/egonw/rrdf
• R data frame to RDF
– Excel -> data frame -> RDF
– SAS dataset -> data frame -> RDF
124
Create RDF using R
Packages: rrdf, rrdflibs
• add.triple()
– Add a triple: object is a URI
• add.data.triple()
– Add a triple: object is a literal
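The distinction between the two functions is the kind of object written: a URI goes in angle brackets, a literal goes in quotes with an optional language tag or datatype. The sketch below shows that difference by writing N-Triples lines, which are also valid Turtle. The Python functions `add_triple` and `add_data_triple` are stand-ins named after the rrdf functions for illustration only; they do not reproduce the rrdf API or its arguments.

```python
CSS = "http://www.example.org/CSS/"
CT = "http://bio2rdf.org/clinicaltrials/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def add_triple(store, subject, predicate, obj):
    """Append a triple whose object is a URI."""
    store.append(f"<{subject}> <{predicate}> <{obj}> .")


def add_data_triple(store, subject, predicate, value, datatype=None, lang=None):
    """Append a triple whose object is a literal."""
    literal = f'"{escape_literal(str(value))}"'
    if lang:
        literal += f"@{lang}"  # language-tagged string
    elif datatype:
        literal += f"^^<{datatype}>"  # typed literal
    store.append(f"<{subject}> <{predicate}> {literal} .")


store = []
study = CT + "NCT00799760"
add_triple(store, study, CSS + "primOutcome", CSS + "outcome1")
add_data_triple(store, study, CSS + "phase", "Phase 3", lang="en")
add_data_triple(store, study, CSS + "enrollment", 541, datatype=XSD_INTEGER)
print("\n".join(store))
```

Writing full URIs avoids prefix handling, at the cost of longer lines than the Turtle files used in the exercises.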
125
Create RDF using R
Try or follow along
File: createTTLFromR.R
Output File: createTTLFromR.TTL
126
Create RDF using SAS
• SAS accessing SPARQL service using PROC HTTP
– All functions provided by the service, see SPARQL 1.1 Protocol (https://www.w3.org/TR/sparql11-protocol/)
– Implemented as SAS macros
https://github.com/MarcJAndersen/SAS-SPARQLwrapper
• SAS generating text files with
– RDF in Turtle
– SPARQL INSERT statements
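The "generate text" approach is language-independent: each record becomes a few lines of triples, wrapped in a SPARQL update. A minimal sketch in Python follows, assuming a hypothetical list of rows that stands in for the observations of a dataset. The row layout and function names are illustrative and are not taken from createTTLFromSAS.SAS. Values are assumed to contain no quote characters, so no escaping is done.

```python
PREFIXES = (
    "PREFIX css: <http://www.example.org/CSS/>\n"
    "PREFIX ct: <http://bio2rdf.org/clinicaltrials/>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)

# Hypothetical rows, standing in for the observations of a dataset.
ROWS = [
    {"nctid": "NCT00799760", "phase": "Phase 3", "enrollment": 541},
]


def row_to_triples(row):
    """Render one row as Turtle-style triples about a single study."""
    subject = f"ct:{row['nctid']}"
    return (
        f'{subject} css:phase "{row["phase"]}"@en ;\n'
        f'    css:enrollment "{row["enrollment"]}"^^xsd:integer .'
    )


def insert_statement(rows):
    """Wrap the triples for all rows in one SPARQL INSERT DATA update."""
    body = "\n".join(row_to_triples(row) for row in rows)
    return f"{PREFIXES}INSERT DATA {{\n{body}\n}}"


statement = insert_statement(ROWS)
print(statement)
```

The resulting text could be saved to a file and sent to the update endpoint of a SPARQL service. Swapping the `PREFIX` lines for `@prefix` declarations and dropping the `INSERT DATA` wrapper would give a Turtle file instead.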
127
Create RDF using SAS
Try or follow along
File: createTTLFromSAS.SAS
Output File: createTTLFromSAS.TTL
128
Validate
• Apache Jena RIOT (RDF I/O Technology)
riot --validate CreateTTLFromEditor.TTL
Example errors
1. Forgot PAV prefix
08:45:44 ERROR riot :: [line: 9, col: 16] Undefined prefix: pav
2. Incorrect triples termination
08:45:44 ERROR riot :: [line: 9, col: 32] Unexpected IRI for predicate…
* note: requires Apache Jena in the system path
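When validating many files, it can help to pull the line, column and message out of the RIOT log. The sketch below parses log lines in the format of the two example errors. The regular expression is an assumption based only on those examples, so other RIOT message shapes may not match. The function name `parse_riot_log` is hypothetical.

```python
import re

# Matches messages shaped like the example errors, e.g.
# 08:45:44 ERROR riot :: [line: 9, col: 32] Unexpected IRI ...
LOG_LINE = re.compile(
    r"(?P<level>ERROR|WARN)\s+riot\s+::\s+\[?line:\s*(?P<line>\d+),\s*"
    r"col:\s*(?P<col>\d+)\s*\]\s*(?P<message>.*)"
)


def parse_riot_log(text):
    """Return (level, line, column, message) for each recognised log line."""
    problems = []
    for raw in text.splitlines():
        found = LOG_LINE.search(raw)
        if found:
            problems.append(
                (
                    found["level"],
                    int(found["line"]),
                    int(found["col"]),
                    found["message"].strip(),
                )
            )
    return problems


# The two example errors from the slide, as captured log text.
LOG = """\
08:45:44 ERROR riot :: [line: 9, col: 16] Undefined prefix: pav
08:45:44 ERROR riot :: [line: 9, col: 32] Unexpected IRI for predicate…
"""
problems = parse_riot_log(LOG)
for level, line, col, message in problems:
    print(f"{level} at {line}:{col}: {message}")
```

The log text would typically come from capturing the output of the `riot --validate` command. Lines that do not match the pattern are skipped rather than treated as errors.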