Linking Clinical Data Standards
DESCRIPTION
Presentation material for the CDISC Interchange Europe 2011 conference in Brussels, for the track "eHR and the World Beyond", 13 April
TRANSCRIPT
Kerstin Forsberg CDISC Interchange Europe 2011 eHR and the World Beyond
Presented by Kerstin Forsberg
AstraZeneca R&D, Clinical Information Strategy
kerfors on Twitter, LinkedIn, SlideShare, Blogspot, CiteULike
1
Linking Clinical Data Standards
Things I want to show you
• Beyond a Web of Documents
Web of Data
• Forerunners
The UK and US Government
• Two things to remember
Triples and Global Identifiers
• Three live examples
From the Linking Open Data cloud
2
Things for CDISC, and for us in
the industry, to consider
• How this relates to the Innovative Medicines
Initiative (IMI) projects
• Pragmatic first steps for CDISC, together with NCI
Linking Clinical Data Standards
• Opportunities across the industry
Linking Clinical Study Metadata
Linking Clinical Data
3
Acknowledgements
Bosse Andersson, for sharing insights from AstraZeneca’s engagement in
Wayne Kubick, CDISC and Oracle, for good email discussions last summer
Martin Agfjord, Gothenburg University, for the Bachelor Thesis work
Chimezie Ogbuji, Case Western Reserve University's Center for Clinical
Investigation and previously Cleveland Clinic, for the explorative work using
the Computer-Based Patient Record (CPR) Ontology for Clinical Data
Sam Hume, Simon Lundberg, Dan Ringenbach, Lee Evans,
Gunnar Magnusson for being “healthy volunteers”
4
Semantic Web Health Care and
Life Sciences (HCLS) Interest Group
Linking Open Drug Data
EU project The Large Knowledge Collider
Linked Life Data
Web of Data
Web 3.0
Web of Documents
Image Source: Frederic Martin
5
Forerunners:
The UK and US Government
6
Timeline of open government data (illustration by Prof. Jim Hendler, The Semantic Web 2010 Status Update):
January 1, 2009: “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” (President Obama)
May 21, 2009: data.gov online
June 30, 2009: Putting Government Data online
December 8, 2009: “Open Government Directive” released
January 19, 2010: data.gov.uk online
May 21, 2010: data.gov relaunch, 6.4 billion RDF triples
Linking Open Data cloud
7
What’s a Triple?
subject | predicate | object
An example from a text book
The sky | has the color | blue
Three examples of facts from the Linking Open Data cloud
Payment number 8605670 | netAmount | 120.00
Clinical Trial number NCT00755378 | enrollment | 58
The Brussels Capital Region | populationTotal | 1080790
Three examples representing some of the standards for these three facts
The property netAmount | comment | “The net amount of the payment. This is the effective cost to the payer after any reclaimable tax has been deducted.”
Type of entity Active Ingredient | subClassOf | Type of entity Chemical Substance
The data property populationTotal | domain | Type of object Populated Place
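As a rough illustration of the triple idea, the facts listed above can be held as plain (subject, predicate, object) tuples and filtered by pattern. This is only a sketch: the `Triple` type and the `match` helper are hypothetical names introduced here, and real Linked Data would use URIs rather than free-text strings.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Facts from the slide, written as plain strings for readability.
FACTS = [
    Triple("The sky", "has the color", "blue"),
    Triple("Payment number 8605670", "netAmount", "120.00"),
    Triple("Clinical Trial number NCT00755378", "enrollment", "58"),
    Triple("The Brussels Capital Region", "populationTotal", "1080790"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


hits = match(FACTS, predicate="enrollment")
print(hits[0].subject, "->", hits[0].object)
```

Asking "which subject has an enrollment, and what is it?" becomes a pattern with two wildcards, which is roughly what an RDF query language does over much larger sets of triples.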
8
And, what’s RDF? And, how is XML
related to this?
subject predicate object
Common Model for Data
Resource Description Framework
Alternative serialization formats
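To make "one model, several serialization formats" concrete, the sketch below writes the same triple in two text formats, N-Triples and Turtle. RDF/XML is a third format and is left out for brevity. The URIs are the DBpedia ones used elsewhere in this deck. The helper names, the `dbr`/`dbo` prefixes and the choice of `xsd:integer` as the datatype are assumptions made for illustration.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

SUBJECT = "http://dbpedia.org/resource/Brussels"
PREDICATE = "http://dbpedia.org/ontology/populationTotal"
VALUE = 1080790

# Prefix names are illustrative; any prefix can be bound to a namespace.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}


def to_ntriples(s, p, value, datatype):
    """One line per triple, full URIs, typed literal."""
    return f'<{s}> <{p}> "{value}"^^<{datatype}> .'


def shorten(uri, prefixes):
    """Abbreviate a URI with a known prefix, else keep it in angle brackets."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"


def to_turtle(s, p, value, prefixes):
    """Prefix declarations first, then the abbreviated triple.
    A bare integer in Turtle is read as an xsd:integer literal."""
    header = [f"@prefix {k}: <{v}> ." for k, v in prefixes.items()]
    body = f"{shorten(s, prefixes)} {shorten(p, prefixes)} {value} ."
    return "\n".join(header + [body])


print(to_ntriples(SUBJECT, PREDICATE, VALUE, XSD_INTEGER))
print(to_turtle(SUBJECT, PREDICATE, VALUE, PREFIXES))
```

Both outputs describe the same subject, predicate and object; only the notation differs.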
9
What’s a Global Identifier?
Three examples of identifiers from the Linking Open Data cloud
http://spending.lichfielddc.gov.uk/spend/8605670
http://data.linkedct.org/resource/trial/NCT00755378
http://dbpedia.org/resource/Brussels
Three examples of identifiers used in the standards behind the three examples
http://reference.data.gov.uk/def/payment#netAmount
http://www.w3.org/2001/sw/hcls/ns/transmed/TMO_0000
http://dbpedia.org/ontology/populationTotal
Linked Data Principles
1. Use URIs (Uniform Resource Identifiers) to identify things.
2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and “machines”.
10
And, what’s the difference between
URI and URL?
Brussels Capital Region
is a “real world” entity, identified by this URI
http://dbpedia.org/resource/Brussels
Locator (URL) of the “people friendly” view of the
data about the Brussels Capital Region
http://dbpedia.org/page/Brussels
Locator (URL) of the “machine-processable”
data about the Brussels Capital Region
http://dbpedia.org/data/Brussels.rdf
Linked Data Principles
3. Provide useful, structured information about the thing
when its URI is looked up (de-referenced).
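The relationship between the identifier and its two locators can be sketched as a small mapping. The function below only mirrors the three DBpedia URLs shown on this slide; the name `dbpedia_views` is hypothetical. The second part builds, without sending it, an HTTP request that asks for RDF/XML via the `Accept` header, which is the usual way a "machine" signals what kind of view it wants. How a given server answers is up to that server.

```python
import urllib.request

RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def dbpedia_views(resource_uri):
    """Map a DBpedia resource URI to the people-friendly and
    machine-processable locators shown on the slide."""
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError(f"not a DBpedia resource URI: {resource_uri}")
    name = resource_uri[len(RESOURCE_PREFIX):]
    return {
        "identifier": resource_uri,
        "page": f"http://dbpedia.org/page/{name}",
        "data": f"http://dbpedia.org/data/{name}.rdf",
    }


views = dbpedia_views("http://dbpedia.org/resource/Brussels")
print(views["page"])
print(views["data"])

# Build (but do not send) a request for the machine-processable view.
request = urllib.request.Request(
    views["identifier"],
    headers={"Accept": "application/rdf+xml"},
)
print(request.get_header("Accept"))
```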
11
So, what’s the fourth Linked Data
principle?
Linked Data Principles
4. Include links to other, related
URIs in the exposed data to
improve discovery of other
related information.
http://dbpedia.org/resource/Brussels
owl:sameAs http://data.nytimes.com/N78748399240553400231
owl:sameAs http://sws.geonames.org/2800866/
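A minimal sketch of what these links buy you: starting from any one of the three identifiers, an application can collect every URI that is declared to denote the same thing. The direction of the two owl:sameAs links is assumed here, and since owl:sameAs is symmetric the code treats them as undirected. The function name is hypothetical.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

# The two links from the slide; the direction shown is an assumption.
LINKS = [
    ("http://dbpedia.org/resource/Brussels", SAME_AS,
     "http://data.nytimes.com/N78748399240553400231"),
    ("http://dbpedia.org/resource/Brussels", SAME_AS,
     "http://sws.geonames.org/2800866/"),
]


def same_as_closure(start, triples):
    """Collect every URI reachable from `start` over owl:sameAs links,
    following the links in both directions."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == SAME_AS:
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen = {start}
    todo = [start]
    while todo:
        current = todo.pop()
        for other in neighbours[current] - seen:
            seen.add(other)
            todo.append(other)
    return seen


for uri in sorted(same_as_closure("http://sws.geonames.org/2800866/", LINKS)):
    print(uri)
```

Starting from the GeoNames identifier, the sketch reaches the DBpedia and New York Times identifiers as well, so data published under any of the three can be brought together.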
12
Semantic Web standards are a key topic
across the IMI projects
13
Innovative Medicines Initiative
OpenPHACTS
The Open Pharmacological
Concepts Triple Store
DDMoRE
Drug Disease Model
Resources
EHR4CR
Electronic Health Records
for Clinical Research
RICORDO
Researching Interoperability using
Core Reference Datasets and
Ontologies for the Virtual
Physiological Human
Pragmatic first steps for CDISC,
together with NCI
• Learn from others, such as the UK and US
Government
• Apply the Linked Data principles
Start with the SDTM CTs, e.g. the Trial Summary
Parameters
• Strive for a 5-star rating of Linked Open Data
14
Opportunities across the industry
• Applying the four Linked Data principles for
Linking Clinical Study Metadata
Linking Clinical Subject Data
15
Internal
categorization to
support Design &
Interpretation
decisions
A scenario:
Linking Clinical Study Metadata
16
http://clinial.data.astrazeneca.com/id/study/D8180C00011
http://data.linkedct.org/resource/trials/NCT00755378
What would we like to see as the
linked data description of it?
What would we like to see on an
internal web page
presenting linked data
describing a clinical study?
owl:sameAs
So, why am I so enthusiastic about
all of this?
• Well applied Linked Data principles and cautious
steps building on existing insights …
• … would improve the research utility of clinical
datasets
Organized for associations
Prepared for not yet defined use
Ready for automation where computers can
function alongside us to
• Mitigate the complexity in clinical research
• Improve the productivity in clinical data
management
• …
17
Extras
18
Three Live Examples
1. Net amount for an expenditure from a local authority in the UK (2 of 10 triples)
2. Enrollment number for a study from ClinicalTrials.gov (2 of 48 triples)
3. Population of Brussels Capital Region from Wikipedia (4 of 992 triples)
Three different approaches to standardization
19
Expenditure amount for a local
authority in the UK
20
1
Expenditure amount for a local
authority in the UK
21
Live view using the Web Data Inspector
2 of 10 Triples
1
An example of a top-down approach to
standardization for Linked Data
22
Live view using the Web Data Inspector
The Linked Data Cube Vocabulary
UK government: Top-down approach to
standardization for Spending Data
23
Statistical Data perspective
Linked Data Cube Vocabulary
Payment Ontology
Enrollment number for a study
from ClinicalTrials.gov
24
Semantic Web Health Care
and Life Sciences (HCLS)
Interest Group
Linking Open Drug Data
(Life Science part of the
Linking Open Data cloud)
2
Enrollment number for a study
from ClinicalTrials.gov
25
2 of 48 Triples
Live view using the Web Data Inspector
2
Semantic Web Health Care
and Life Sciences (HCLS)
Interest Group
Linking Open Drug Data
(Life Science part of the
Linking Open Data cloud)
An example of first using the source data
structure as the “standard” …
26
2 of 48 Triples
Live view using the Web Data Inspector
2
Database key, Variable name, Table name
… and then as a next step look for a
common standard
27
Live view using the Web Data Inspector
Semantic Web Health Care
and Life Sciences (HCLS)
Interest Group
Translational Medicine Ontology
(a.k.a. Pharma Ontology)
Population of Brussels Capital
Region from Wikipedia
28
Live view using the Web Data Inspector
3
Linking between different sources about
the same entity using different identifiers
Live view using the Web Data Inspector
3
29
Bottom-up standardization: Community
curated, and a central shallow Ontology
30
http://dbpedia.org/ontology/PopulatedPlace
http://dbpedia.org/ontology/populationTotal
http://dbpedia.org/ontology/populationAsOf
Pragmatic first steps for CDISC,
together with NCI
• Learn from others, such as the UK and US
Government
• Apply the Linked Data principles
Start with the SDTM CTs, e.g. the Trial Summary
Parameters
• Strive for a 5-star rating of Linked Open Data
31
Linked Open Data star scheme
32
Pragmatic first steps
Learn from others, two examples
• AGFA
Clinical drug administration forms mapped to SNOMED
CT and FDA codes using the SKOS (Simple Knowledge
Organization System) vocabulary
33
http://www.agfa.com/w3c/2009/drugAdministrationForms#
http://linkedlifedata.com/resource/umls/id/C1879952
http://linkedlifedata.com/resource/umls/id/C0013153
• EU Project – LarKC
Published CDISC CTs as part of Linked Life Data
Pragmatic first steps, start with
Trial Summary Parameters
34
0. Existing text strings published in a long file in Excel txt format (soon also as ODM/XML)
Pragmatic first steps, start with
Trial Summary Parameters
35
1. Establish a CDISC URI scheme
Proposal: Build on http://data.gov.uk/resources/uris
Examples: http://reference.data.cdisc.org/ct/sdtm/ROUTE#ORAL
or alt. using the C-code from NCI Thesaurus http://reference.data.cdisc.org/ct/sdtm/C66729#C38288
2. Model CDISC SDTM CTs as RDF triples
Pragmatic first steps, start with
Trial Summary Parameters
36
1. Establish a CDISC URI scheme
Proposal: Build on http://data.gov.uk/resources/uris
Examples: http://reference.data.cdisc.org/ct/sdtm/ROUTE#ORAL
or alt. using the C-code from NCI Thesaurus http://reference.data.cdisc.org/ct/sdtm/C66729#C38288
2. Model CDISC SDTM CTs as RDF triples
3. Publish in RDF/XML
4. De-reference/look-up service so people and applications can get descriptions of individual code lists and codes.
5. Create a so-called SPARQL endpoint so that CDISC standards published as triples can be queried directly using the RDF query language.
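The first two steps could look roughly like the sketch below. It mints term URIs in the pattern of the two example URIs on this slide and models a code list as triples using the SKOS vocabulary. Everything beyond the two example URIs is an assumption: the function names, the choice of SKOS classes and properties, and the query text are illustrative and not a CDISC decision. The SPARQL query is only shown as a string; it is not executed against any endpoint.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Base taken from the example URIs on the slide (a proposal, not a live service).
BASE = "http://reference.data.cdisc.org/ct/sdtm/"


def mint_term_uri(codelist, term):
    """Step 1: one URI per term, code list as path segment, term as fragment."""
    return f"{BASE}{codelist}#{term}"


def codelist_triples(codelist, terms):
    """Step 2: model a code list and its terms as (s, p, o) triples."""
    scheme = f"{BASE}{codelist}"
    triples = [(scheme, RDF_TYPE, SKOS + "ConceptScheme")]
    for term in terms:
        uri = mint_term_uri(codelist, term)
        triples.append((uri, RDF_TYPE, SKOS + "Concept"))
        triples.append((uri, SKOS + "inScheme", scheme))
        triples.append((uri, SKOS + "notation", term))
    return triples


# Step 5 (illustrative): what a query against a SPARQL endpoint might look like.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?term ?code WHERE {
  ?term skos:inScheme <http://reference.data.cdisc.org/ct/sdtm/ROUTE> ;
        skos:notation ?code .
}
"""

print(mint_term_uri("ROUTE", "ORAL"))
print(mint_term_uri("C66729", "C38288"))
for triple in codelist_triples("ROUTE", ["ORAL"]):
    print(triple)
```

Whether the readable submission value or the NCI Thesaurus C-code goes into the URI is exactly the design choice the two example URIs on the slide leave open; the sketch supports both.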
Some thoughts and ideas explored
• Applying the four Linked Data principles
Linking Clinical Study Metadata
• A scenario
• Explore the use of
– BRIDG Domain Model (to be published in OWL/RDF)
– Translational Medicine Ontology - TMO
(a.k.a. Pharma Ontology)
Linking Clinical Data
• Best practice for URI scheme and minting URIs
• Explorative work: using the Computer-Based
Patient Record (CPR) Ontology for Clinical Data
37
Internal
categorization to
support Design &
Interpretation
decisions
A scenario:
Linked Clinical Study Metadata
38
http://clinial.data.astrazeneca.com/id/study/D8180C00011
http://data.linkedct.org/resource/trials/NCT00755378
What would we like to see as the
linked data description of it?
What would we like to see on an
internal web page
presenting linked data
describing a clinical study?
owl:sameAs
Explore the use of BRIDG and TMO
for common classes and properties
39
Semantic Web Health Care
and Life Sciences (HCLS)
Interest Group
Translational Medicine Ontology
(a.k.a. Pharma Ontology)
Linking Clinical Data: Explore URI
scheme and minting URIs
40
Proposal: Build on http://data.gov.uk/resources/uris
Explorative work: using the Computer-Based
Patient Record (CPR) Ontology for Clinical Data
Acknowledgements:
Chimezie Ogbuji, Case Western Reserve University's Center for
Clinical Investigation, previously Cleveland Clinic.
Martin Agfjord, IT University, Göteborg
41
If you want to learn more
42
An Intro To The Semantic Web: Why You Need To
Know About It Sooner Than Later, by Samantha Wong
Image Source: Frederic Martin
The Semantic Web 2010 Status Update
by Prof. Jim Hendler
Open data: accountability, citizen utility
and economic opportunity.
http://data.gov.uk/linked-data
http://www.data.gov/semantic
Linked Spending Data –
How and Why Bother
Linking Open Drug Data
Interest Group for Semantic Web in
Health Care and Life Science
http://www.w3.org/blog/hcls
Linked Open Data star scheme by example
The Linking Open Data
cloud diagram
Guide to the
Payments Ontology
The RDF Data Cube
vocabulary
Presentation:
Statistical Data in RDF
How DBpedia Treats
Wikipedia as a Database
Excellent article “More than Words: Biomedical Ontologies”
And now – An Evening with TinTin
at the Brussels Comic Strip Center!
43