CDISC2RDF poster for the Conference on Data Integration in the Life Sciences 2013


Upload: kerstin-forsberg

Post on 18-Nov-2014


DESCRIPTION

Poster on CDISC2RDF for the 9th International Conference on Data Integration in the Life Sciences, DILS2013

TRANSCRIPT


CDISC2RDF

Making clinical data standards linkable, computable and queryable

The CDISC2RDF initiative exploits Semantic Web standards and Linked Data principles for clinical data standards from CDISC (Clinical Data Interchange Standards Consortium).

Introduction

TransCelerate BioPharma, the non-profit organization formed by ten leading pharmaceutical companies to accelerate the development of new medicines, has identified clinical data standards as one of its five initial areas. The European Medicines Agency (EMA) is developing a policy on the proactive publication of clinical-trial data in the interests of public health, including clear and understandable clinical data formats. The FDA has a long-held goal of making better use of submitted clinical trial data. Pharmaceutical companies have attempted to use submission standards to create study repositories. Exploiting Semantic Web technologies stands to simplify the interpretation of individual studies and to improve cross-study integration.

Kerstin Forsberg, Informatics Scientist

Analysis, Informatics & Knowledge Engineering Practice, AstraZeneca, Sweden

CDISC2RDF Schemas

The first version of the core CDISC2RDF schemas was intentionally developed to represent a minimal part of the ISO 11179 model for metadata registries. The Meta Model Schema (mms) represents the core data description part of ISO 11179 Part 3: Registry metamodel and basic attributes.
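To make the idea of a "minimal part of ISO 11179" more concrete, the Python sketch below models the kind of structure a data description core deals with: a data element defined within a context (here an SDTM dataset) that may draw its values from an enumerated value domain. The class and field names are illustrative assumptions, not the actual mms vocabulary, and the two RACE values are only a partial, illustrative subset of the real code list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Context:
    """Context in which data elements are defined, e.g. an SDTM dataset (hypothetical model)."""
    name: str


@dataclass
class EnumeratedValueDomain:
    """Value domain given as a finite set of permissible values."""
    name: str
    permissible_values: set = field(default_factory=set)


@dataclass
class DataElement:
    """Named data element, defined in a context, optionally bound to a value domain."""
    name: str
    label: str
    context: Context
    value_domain: Optional[EnumeratedValueDomain] = None

    def accepts(self, value: str) -> bool:
        # Without an enumerated value domain, any text string is accepted.
        if self.value_domain is None:
            return True
        return value in self.value_domain.permissible_values


dm = Context("DM")  # the Demographics dataset
race_values = EnumeratedValueDomain("RACE", {"ASIAN", "WHITE"})  # partial, illustrative subset
race = DataElement("RACE", "Race", dm, race_values)

print(race.accepts("ASIAN"))  # True
print(race.accepts("asian"))  # False: matching is exact and case-sensitive
```

Keeping the value domain as a separate object mirrors the ISO 11179 separation between a data element and the values it may take, so one code list can be shared by several data elements.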

From human readable documentation and “text strings”

In the domain of clinical research, CDISC, a non-profit organization, has developed standards for study design (SDM), study data collection (CDASH), study data analysis (ADaM), and submission to the regulatory bodies (SDTM). These standards include a limited set of data elements with names such as “RACE” that also have a value set derived from the NCI Thesaurus. However, most of the data elements are containers, either for contextual variables with names such as “VSDATE” and “AEACN” (date of measurement of vital signs, and action taken for adverse events) or for the results of the measurements. What was actually measured is only indicated indirectly, in variables called “TESTCD” that hold a term, or rather a text string, such as “DIABP”, “BMI” or “HGB”. These strings represent the measurement procedures and are listed in the so-called controlled terminologies (CT) for SDTM (Study Data Tabulation Model). Today all data standards and controlled terminologies are published by CDISC and NCI EVS as PDFs, Excel files, and traditional XML.
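The indirection described above can be illustrated with a few vital-signs records. In this hedged Python sketch the result column is a generic container, and a result only gains meaning once the text string in VSTESTCD is looked up. The record layout is simplified, the values are made up, and the lookup table is a hypothetical stand-in for the published controlled terminology, not the real CT file.

```python
# Simplified, made-up vital-signs records: VSORRES is a generic result container;
# what was measured is only given by the text string in VSTESTCD.
records = [
    {"USUBJID": "S1", "VSTESTCD": "DIABP", "VSORRES": "80"},
    {"USUBJID": "S1", "VSTESTCD": "BMI", "VSORRES": "24.1"},
    {"USUBJID": "S2", "VSTESTCD": "DIABP", "VSORRES": "85"},
]

# Hypothetical excerpt of a test-code lookup; in practice this information
# lives in the SDTM controlled terminology published by NCI EVS.
test_names = {
    "DIABP": "Diastolic Blood Pressure",
    "BMI": "Body Mass Index",
}


def results_by_test(rows):
    """Group result strings by the human-readable test name."""
    grouped = {}
    for row in rows:
        # An unknown code stays an opaque string: nothing in the data explains it.
        name = test_names.get(row["VSTESTCD"], row["VSTESTCD"])
        grouped.setdefault(name, []).append(row["VSORRES"])
    return grouped


print(results_by_test(records))
# {'Diastolic Blood Pressure': ['80', '85'], 'Body Mass Index': ['24.1']}
```

The join happens purely on string equality, which is exactly the fragility that giving each term a URI is meant to remove.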

Human readable documentation in PDFs and Excel files (and some in XML)

CDISC2RDF Schemas (based on the core of ISO 11179)

Machine processable linked data structured as RDF triples

Meta model schema (mms)

(Data definition, the core part of ISO 11179)

Controlled Terminology schema (cts)

(a few additional properties from the NCI Thesaurus export)

SDTM 1.2 schema (sdtms)

(classifiers: Data Element roles and types)

SDTM IG 3.1.2 schema (sdtmigs)

(a few additional properties)

To machine processable RDF triples and “URIs”

The first deliverable from the CDISC2RDF project was published in early 2013. It contained OWL/RDF files (triples) for the CDISC submission standards SDTM 1.2, the SDTM Implementation Guide (IG) 3.1.2 and Controlled Terminology (CT), plus the CTs for the data capture standard (CDASH) and the analysis standard (ADaM). Each data element/column, dataset, code list, classifier, etc. has been assigned a URI (Uniform Resource Identifier); examples are listed further below.


The SDTM 1.2 schema (sdtms) defines additional classifiers from the underlying model, such as the data element role Record Qualifier and the Expected variable classification. The Controlled Terminology schema (cts) adds a few classifications and properties to the Meta Model Schema (mms) to represent the existing NCI Thesaurus EVS export. These classes and properties were used to annotate the Excel column headers, and the standard import functionality in the TopBraid Composer tool was used to create the RDF triples in RDF/XML, Turtle, and JSON formats.
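The idea of annotating spreadsheet column headers with schema properties, then turning each row into triples, can be approximated in a few lines. The Python sketch below is a stand-in for that import step, not a description of how TopBraid Composer works. It reads a CSV whose header cells are property names and writes simple Turtle. The property names `mms:dataElementName`, `mms:context` and `sdtms:dataElementRole`, and the `mms:` namespace, are hypothetical; the subject and object URIs follow the examples given on this poster.

```python
import csv
import io

# Hypothetical spreadsheet export: the "subject" column holds each row's URI;
# every other header cell is annotated with an (assumed) property name.
SHEET = (
    "subject,mms:dataElementName,mms:context,sdtms:dataElementRole\n"
    "http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEACN,AEACN,"
    "http://rdf.cdisc.org/sdtmig-3-1-2/std#Table.AE,"
    "http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier\n"
)

# The mms namespace is an assumption; the sdtms one is taken from the example URIs.
PREFIXES = {
    "mms": "http://rdf.cdisc.org/mms#",
    "sdtms": "http://rdf.cdisc.org/sdtm-1-2/schema#",
}


def term(value: str) -> str:
    """Render a cell as a Turtle IRI if it looks like one, otherwise as a string literal."""
    if value.startswith("http://") or value.startswith("https://"):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def sheet_to_turtle(text: str) -> str:
    """Turn each CSV row into one triple per non-empty annotated column."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{row.pop('subject')}>"
        for prop, value in row.items():
            if value:  # skip empty cells rather than emitting empty literals
                lines.append(f"{subject} {prop} {term(value)} .")
    return "\n".join(lines)


print(sheet_to_turtle(SHEET))
```

Because the header cells carry the property names, adding a column to the spreadsheet adds a property to every row without changing the conversion code.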

CDISC2RDF started as a cross-pharma pre-competitive project with AstraZeneca, Roche, TopQuadrant, Free University of Amsterdam and the W3C Health Care and Life Sciences Interest Group (HCLS) to showcase the use of Semantic Web standards and Linked Data principles. It is now part of the Semantic Technology project within the FDA/PhUSE Emerging Technologies working group, which has representatives from the FDA, CDISC, pharmaceutical companies, CROs and software vendors.

We want to feed this work back to CDISC, NCI, and other public and internal standards groups, and show in practice how to “use (Semantic Web) standards for standards”.

http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEACN

http://rdf.cdisc.org/sdtmig-3-1-2/std#Table.AE

http://rdf.cdisc.org/sdtm-1-2/schema#Classifier.RecordQualifier
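The three example URIs suggest a regular naming pattern, which is what makes the standards linkable and queryable. The Python sketch below rebuilds those URIs, assuming that the pattern seen in these examples holds for other tables, columns and classifiers. It then runs a one-pattern query over a tiny in-memory triple set. The predicate URIs for "context" and "data element role" are hypothetical.

```python
# Base namespaces copied from the example URIs.
SDTMIG_STD = "http://rdf.cdisc.org/sdtmig-3-1-2/std#"
SDTM_SCHEMA = "http://rdf.cdisc.org/sdtm-1-2/schema#"


def table_uri(domain: str) -> str:
    """URI of an SDTM IG dataset, e.g. Table.AE."""
    return f"{SDTMIG_STD}Table.{domain}"


def column_uri(domain: str, variable: str) -> str:
    """URI of a column (data element) within a dataset, e.g. Column.AE.AEACN."""
    return f"{SDTMIG_STD}Column.{domain}.{variable}"


def classifier_uri(name: str) -> str:
    """URI of an SDTM 1.2 classifier, e.g. Classifier.RecordQualifier."""
    return f"{SDTM_SCHEMA}Classifier.{name}"


# Hypothetical predicate URIs (assumed namespace and property names).
CONTEXT = "http://rdf.cdisc.org/mms#context"
ROLE = SDTM_SCHEMA + "dataElementRole"

triples = {
    (column_uri("AE", "AEACN"), CONTEXT, table_uri("AE")),
    (column_uri("AE", "AEACN"), ROLE, classifier_uri("RecordQualifier")),
}


def subjects(p: str, o: str) -> set:
    """All subjects s such that (s, p, o) is in the triple set: a one-pattern query."""
    return {s for (s, pp, oo) in triples if pp == p and oo == o}


print(column_uri("AE", "AEACN"))
# http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEACN
print(subjects(ROLE, classifier_uri("RecordQualifier")))
# {'http://rdf.cdisc.org/sdtmig-3-1-2/std#Column.AE.AEACN'}
```

In a real deployment the same question would be posed as a SPARQL query against the published OWL/RDF files; the set comprehension here only shows the triple-pattern idea.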

All OWL/RDF files, schemas and standards are available at https://code.google.com/p/cdisc2rdf/