
Page 1: Semantics 101 for Pharma - PhUSE...Barcelona Annual Conference Monday, 10th October 2016 Semantics 101 for Pharma Tim Williams, UCB Biosciences Inc., USA tim.williams@ucb.com Marc

Barcelona

Annual Conference

Monday, 10th October 2016

Semantics 101 for Pharma

Tim Williams

UCB Biosciences Inc., USA

tim.williams@ucb.com

Marc Andersen

StatGroup ApS, Denmark

[email protected]


Everything has a unique, linkable reference.


Resource Description Framework (RDF)

• Semantic Web

• Clinical Trials Context

• Querying

• Creating

• Data Cubes Use Case


Explore a Study

https://www.clinicaltrials.gov/

“Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir”

Without knowing anything about triples!


Find the NCTID


Explore NCTID Linked Data

http://lod.openlinksw.com/describe/?uri=http://bio2rdf.org/clinicaltrials:NCT00799760


Subject Predicate Object

Study: NCT00799760, “Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir”

Subject (same for all three triples): http://bio2rdf.org/clinicaltrials:NCT00799760

• type: Clinical Study
Predicate: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Object: http://bio2rdf.org/clinicaltrials_vocabulary:Clinical-Study

• phase: Phase 3
Predicate: http://bio2rdf.org/clinicaltrials_vocabulary:phase
Object: Phase 3 Code

• condition: Gastric Influenza
Predicate: http://bio2rdf.org/clinicaltrials_vocabulary:condition
Object: Gastric Influenza Code

Code URIs shown on the slide:
http://bio2rdf.org/clinicaltrials_resource:f773736eaf3a1da739bc23f48dae6954
http://bio2rdf.org/clinicaltrials_resource:8357418e2694434468870b487644532d
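As a rough illustration of the subject, predicate, object idea, the three triples on this slide can be held as plain Python tuples and looked up with a wildcard match. This is only a sketch: the `match` helper is hypothetical, the human-readable labels stand in for the coded resource URIs, and the study URI uses the colon form from the "Explore NCTID Linked Data" link.

```python
# Sketch only: triples as (subject, predicate, object) tuples.
BASE = "http://bio2rdf.org/"
STUDY = BASE + "clinicaltrials:NCT00799760"
VOCAB = BASE + "clinicaltrials_vocabulary:"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumption: plain labels replace the hashed code URIs shown on the slide.
triples = [
    (STUDY, RDF_TYPE, VOCAB + "Clinical-Study"),
    (STUDY, VOCAB + "phase", "Phase 3"),
    (STUDY, VOCAB + "condition", "Gastric Influenza"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "What is the phase of this study?"
phase_triples = match(triples, s=STUDY, p=VOCAB + "phase")
print(phase_triples[0][2])
```

Every statement about the study has the same shape, so one lookup function covers type, phase and condition alike.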


Terse RDF Triple Language (Turtle)

@prefix ns1: <http://bio2rdf.org/clinicaltrials:> .
@prefix ns2: <http://bio2rdf.org/clinicaltrials_vocabulary:> .
@prefix ns3: <http://bio2rdf.org/clinicaltrials_resource:> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

Study: NCT00799760, “Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir”

Subject Predicate Object

ns1:NCT00799760 rdf:type ns2:Clinical-Study (Clinical Study)
ns1:NCT00799760 ns2:phase Phase 3 Code (Phase 3)
ns1:NCT00799760 ns2:condition Gastric Influenza Code (Gastric Influenza)

Code URIs shown on the slide:
ns3:f773736eaf3a1da739bc23f48dae6954
ns3:8357418e2694434468870b487644532d
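The prefixes are only shorthand: a prefixed name such as ns1:NCT00799760 stands for the full URI obtained by gluing the local part onto the namespace. A minimal sketch of that expansion, using the four namespaces declared on this slide (the `expand` helper is hypothetical and ignores Turtle escaping rules):

```python
# Sketch only: expand Turtle-style prefixed names into full URIs.
PREFIXES = {
    "ns1": "http://bio2rdf.org/clinicaltrials:",
    "ns2": "http://bio2rdf.org/clinicaltrials_vocabulary:",
    "ns3": "http://bio2rdf.org/clinicaltrials_resource:",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name, prefixes=PREFIXES):
    """Turn 'prefix:local' into namespace + local; fail on unknown prefixes."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("unknown or missing prefix in %r" % name)
    return prefixes[prefix] + local


for short in ("ns1:NCT00799760", "rdf:type", "ns2:Clinical-Study"):
    print(short, "->", expand(short))
```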


Storing RDF: Triple Stores

Native

• 4Store http://www.4store.org/

• AllegroGraph http://franz.com/agraph/allegrograph/

• Apache Jena TDB http://jena.apache.org/

• GraphDB http://ontotext.com/products/graphdb/

• MarkLogic http://www.marklogic.com

DBMS-backed

• Apache Jena SDB http://jena.apache.org/

• Oracle Spatial and Graph http://www.oracle.com/technetwork/database/options/spatialandgraph/overview/rdfsemantic-graph-1902016.html

Hybrid

• Sesame http://rdf4j.org/

• Virtuoso http://virtuoso.openlinksw.com/

List at the W3C: https://www.w3.org/2001/sw/wiki/Category:Triple_Store

Adapted from Dr. Harald Sack, Knowledge Engineering with Semantic Web Technologies, 2015


Resource Description Framework (RDF)

• Semantic Web

• Clinical Trials Context

• Querying

• Creating

• Data Cubes Use Case


Query RDF with SPARQL

• SPARQL: SPARQL Protocol and RDF Query Language

• Not limited to RDF

– Utilities for relational databases, spreadsheets, XML, JSON

• Protocol

– Rules for exchanging queries and results
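To make the "protocol" half concrete: under the SPARQL Protocol a query can be sent as an HTTP GET with the query text in a `query` parameter, and the client asks for a result format through the Accept header. The sketch below only builds such a request with the Python standard library and does not send it. The helper name is hypothetical, and the endpoint is the public one named elsewhere in this deck.

```python
# Sketch only: build (but do not send) a SPARQL Protocol GET request.
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Return a GET request carrying the query in the 'query' parameter."""
    # Assumption: the endpoint URL has no query string of its own.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


query = "SELECT * WHERE { ?s ?p ?o . } LIMIT 10"
request = build_sparql_request("http://lod.openlinksw.com/sparql", query)
print(request.get_method(), request.full_url)
# Passing the request to urllib.request.urlopen would execute it (needs network access).
```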


Graph Path for Primary Outcome

Query: ns1:NCT00799760 → ns2:primary-outcome → ?outURI → ns3:title → ?outcome

SELECT ?outcome

Data: "RT-PCR for influenza A virus…"@en


Graph Path for Primary Outcome

ns1:NCT00799760 → ns2:primary-outcome → ns3:d821848f0fb8dc44f390a40e066e9224 → ns4:title → “RT-PCR for influenza A virus in nasal secretion 2 days”@en

• ns3:d821848f0fb8dc44f390a40e066e9224 is the Primary Outcome URI

• The literal is the Primary Outcome title as a human-readable code list value


SPARQL Query for Primary Outcome

PREFIX ns1: <http://bio2rdf.org/clinicaltrials:>
PREFIX ns2: <http://bio2rdf.org/clinicaltrials_vocabulary:>
PREFIX ns3: <http://purl.org/dc/terms/>

SELECT ?outcome
WHERE
{
  ns1:NCT00799760 ns2:primary-outcome ?outURI .
  ?outURI ns3:title ?outcome .
}

Retrieve data that matches the graph pattern:

NCTID → primary-outcome → ?outURI → title → ?outcome

Try it at: http://lod.openlinksw.com/sparql
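What "matching the graph pattern" means can be sketched in a few lines of Python: each triple pattern is compared against every triple, variables (names starting with ?) pick up values, and a variable shared between two patterns, here ?outURI, forces the join. This is a toy evaluator and not how a triple store is implemented. The `res:` and `dct:` prefixes and the helper names are assumptions made for the example.

```python
# Sketch only: a toy evaluator for a basic graph pattern.
def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # A variable keeps its first value; a conflicting value fails.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(patterns, triples):
    """Return every variable binding that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


# Hypothetical data shaped like the primary-outcome path.
OUTCOME = "res:d821848f0fb8dc44f390a40e066e9224"
TITLE = "RT-PCR for influenza A virus in nasal secretion 2 days"
triples = [
    ("ns1:NCT00799760", "ns2:primary-outcome", OUTCOME),
    (OUTCOME, "dct:title", TITLE),
    ("ns1:NCT00799760", "ns2:phase", "Phase 3"),
]
patterns = [
    ("ns1:NCT00799760", "ns2:primary-outcome", "?outURI"),
    ("?outURI", "dct:title", "?outcome"),
]
outcomes = [b["?outcome"] for b in evaluate(patterns, triples)]
print(outcomes)
```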


Query using:

• SPARQL Endpoint

– Example: lod.openlinksw.com/sparql

• R with package rrdf (see exercises)

• SAS macro, PROC GROOVY (see exercises)

See Exercises


Query with R

R packages:

• rrdf

• rrdflibs

http://github.com/egonw/rrdf

Requires Java 7 or higher

Willighagen E. (2014) Accessing biological data in R with semantic web technologies. PeerJ PrePrints 2:e185v3. See https://dx.doi.org/10.7287/peerj.preprints.185v3


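Query an Endpoint with R

# Load the rrdf package (requires Java)
library(rrdf)
# Query endpoint of a locally running SPARQL service
endpoint <- "http://localhost:3030/test/query"
# Return the first 10 triples in the store
query <- "SELECT * WHERE { ?s ?p ?o . } LIMIT 10"
# Send the query to the endpoint and show the result
queryResult <- sparql.remote(endpoint, query)
queryResult

See Exercises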


Query with SAS

SAS macros:

• %sparqlquery - SPARQL query

• %sparqlupdate - SPARQL update

https://github.com/MarcJAndersen/SAS-SPARQLwrapper

Implementation:

• SAS PROC HTTP to access the service

• Send query/update as text file

• Input result using SAS LIBNAME for XML

Other approaches:

• PROC GROOVY to execute Java code from Apache Jena (see directory show-res-sas in https://github.com/MarcJAndersen/poc-analysis-results-metadata)

• SAS Java objects to interface to Apache Jena

Requires a running SPARQL service, for example Apache Jena

See Exercises
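The XML that comes back from the service follows the W3C SPARQL Query Results XML Format: a `head` listing the variables and one `result` element per row, each holding named `binding` elements. As an illustration of what has to be read on the receiving side, here is a small Python sketch that turns such a document into rows. It is not the SAS macro's implementation, and the sample document is hand-written for the example.

```python
# Sketch only: read SPARQL Query Results XML into a list of dicts.
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written sample response (assumption, not captured from a real service).
SAMPLE = """
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="outcome"/></head>
  <results>
    <result>
      <binding name="outcome">
        <literal xml:lang="en">RT-PCR for influenza A virus in nasal secretion 2 days</literal>
      </binding>
    </result>
  </results>
</sparql>
"""


def parse_results(xml_text):
    """Return one dict per result row, mapping variable name to value text."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


rows = parse_results(SAMPLE)
print(rows)
```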


Query a Remote Source

At: http://lod.openlinksw.com/sparql


Which variables are used?

Given: CSR appendix 14.1 as RDF data cubes

SPARQL query using property paths:

select distinct ?column
where {
  ?ds a qb:DataSet ;
      (<>|!<>)*/rrdfqbcrnd0:D2RQ-PropertyBridge/^d2rq:property/d2rq:column ?column .
}
order by ?ds
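The path reads as: from the data set, follow any predicate zero or more times (`(<>|!<>)*` is a common idiom for "any predicate": the empty IRI or anything that is not it), then one D2RQ-PropertyBridge step, one d2rq:property step backwards (`^`), and one d2rq:column step. A rough Python sketch of the same walk over a toy graph follows. All node names are hypothetical, and the breadth-first search is just one way to realise the `*` part.

```python
# Sketch only: emulate the property path on an in-memory toy graph.
from collections import deque

# Hypothetical triples shaped like the cube-to-D2RQ links in the query.
triples = [
    ("ds:dataset", "qb:structure", "ds:dsd"),
    ("ds:dsd", "qb:component", "ds:comp-sex"),
    ("ds:comp-sex", "qb:dimension", "crnd:sex"),
    ("crnd:sex", "qb:codeList", "code:sex"),
    ("code:sex", "rrdfqbcrnd0:DataSetRefD2RQ", "vocab:ADSL"),
    ("vocab:ADSL", "rrdfqbcrnd0:D2RQ-PropertyBridge", "vocab:ADSL_SEX"),
    ("map:ADSL_SEX", "d2rq:property", "vocab:ADSL_SEX"),
    ("map:ADSL_SEX", "d2rq:column", "ADSL.SEX"),
]


def reachable(triples, start):
    """Nodes reached from start over zero or more edges of any predicate."""
    out = {}
    for s, _p, o in triples:
        out.setdefault(s, []).append(o)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in out.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def columns_used(triples, dataset):
    """Follow any* / D2RQ-PropertyBridge / ^d2rq:property / d2rq:column."""
    columns = set()
    nodes = reachable(triples, dataset)
    for s, p, prop in triples:
        if s in nodes and p == "rrdfqbcrnd0:D2RQ-PropertyBridge":
            # Inverse step: bridges whose d2rq:property points at prop.
            bridges = {b for b, p2, o in triples
                       if p2 == "d2rq:property" and o == prop}
            columns |= {c for b, p3, c in triples
                        if p3 == "d2rq:column" and b in bridges}
    return columns


print(sorted(columns_used(triples, "ds:dataset")))
```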


Details (1/2)

select distinct ?column
where {
  {
    ?ds a qb:DataSet ;
        qb:structure ?structure .
    ?structure qb:component ?component .
    ?component qb:dimension ?dimension .
    ?dimension qb:codeList ?codeList .
    ?codeList rrdfqbcrnd0:DataSetRefD2RQ ?DataSetRefD2RQ .
    ?DataSetRefD2RQ rrdfqbcrnd0:D2RQ-PropertyBridge ?D2RQPropertyBridge .
    ?Correctd2rqPropertyBridge d2rq:property ?D2RQPropertyBridge ;
        d2rq:column ?column .
  }

(continued on next slide)


(continued from previous slide)

  union {
    ?ds a qb:DataSet ;
        qb:structure ?structure .
    ?structure qb:component ?component .
    ?component qb:dimension ?dimension .
    ?dimension qb:codeList ?codeList .
    ?codeList skos:hasTopConcept ?codeValue .
    ?codeValue rrdfqbcrnd0:DataSetRefD2RQ ?DataSetRefD2RQ .
    ?DataSetRefD2RQ rrdfqbcrnd0:D2RQ-PropertyBridge ?D2RQPropertyBridge .
    ?Correctd2rqPropertyBridge d2rq:property ?D2RQPropertyBridge ;
        d2rq:column ?column .
  }
}


Which data are used for a result?

select ?s ?obs
where {
  ?s ?variable ?value .
  { select
      (iri(concat('http://www.example.org/datasets/vocab/',
           replace(str(?vnop), 'http://www.example.org/rrdfqbcrnd0/([A-Z0-9_]+)$', '$1', 'i'))) as ?variable)
      ?matchvalue
      ?obs
    where {
      ?obs ?dim ?codevalue .
      ?dim a qb:DimensionProperty .
      ?codelist skos:hasTopConcept ?codevalue .
      ?codelist rrdfqbcrnd0:DataSetRefD2RQ ?vnop .
      ?codelist rrdfqbcrnd0:R-columnname ?vn .
      ?codelist rrdfqbcrnd0:codeType ?vct .
      ?codevalue skos:prefLabel ?clprefLabel .
      ?codevalue rrdfqbcrnd0:R-selectionoperator '==' .
      ?codevalue rrdfqbcrnd0:R-selectionvalue ?matchvalue .
      values (?obs) { (ds:obs223) }
    }
  }
  BIND(IF(?value != ?matchvalue, 1, 0) AS ?notequal)
}
group by ?s ?obs
having (SUM(?notequal) = 0)
order by ?s
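The query does two things: the inner select rewrites each code list reference into the IRI of the matching data variable and picks up the value that the observation requires, and the outer part keeps only those data rows for which no required value differs (`SUM(?notequal) = 0`). A small Python sketch of that logic is given below. The records, variable names and selection values are invented for the example, and each variable is assumed to hold a single value per subject.

```python
# Sketch only: mimic the IRI rewrite and the "no mismatches" filter.
import re

VOCAB = "http://www.example.org/datasets/vocab/"
# Same pattern as in the query's replace(), case-insensitive like the 'i' flag.
PATTERN = re.compile(r"http://www.example.org/rrdfqbcrnd0/([A-Z0-9_]+)$", re.IGNORECASE)


def variable_iri(vnop):
    """Counterpart of iri(concat(vocab, replace(str(?vnop), pattern, '$1', 'i')))."""
    return VOCAB + PATTERN.sub(r"\1", vnop)


def subjects_for_observation(records, criteria):
    """Subjects with at least one criterion variable and no differing value."""
    selected = []
    for subject in sorted(records):  # order by ?s
        rows = [(records[subject][var], want)
                for var, want in criteria.items() if var in records[subject]]
        not_equal = sum(1 for value, want in rows if value != want)
        if rows and not_equal == 0:  # having(SUM(?notequal)=0)
            selected.append(subject)
    return selected


# Hypothetical variables and selection values for one observation.
SEX = variable_iri("http://www.example.org/rrdfqbcrnd0/ADSL_SEX")
TRT = variable_iri("http://www.example.org/rrdfqbcrnd0/ADSL_TRT01A")
criteria = {SEX: "F", TRT: "Placebo"}
records = {
    "subj:001": {SEX: "F", TRT: "Placebo"},
    "subj:002": {SEX: "M", TRT: "Placebo"},
    "subj:003": {SEX: "F", TRT: "Active"},
}
print(subjects_for_observation(records, criteria))
```

As in the SPARQL, a subject that lacks one of the variables is judged only on the variables it has.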


More SPARQL

SPARQL Query Language for RDF https://www.w3.org/TR/rdf-sparql-query/

SPARQL 1.1 Query Language https://www.w3.org/TR/sparql11-query/

“Learning SPARQL” - Bob DuCharme

http://www.learningsparql.com/index.html - examples for download