LexRDF: A Semantic-Web Compatible Extension of LexGrid Cui Tao Jyotishman Pathak Harold R. Solbrig Wei-Qi Wei Christopher G. Chute Division of Biomedical Statistics and Informatics Mayo Clinic, College of Medicine


Posted on 13-Jan-2016


Page 1

LexRDF: A Semantic-Web Compatible Extension of LexGrid

Cui Tao, Jyotishman Pathak, Harold R. Solbrig, Wei-Qi Wei, Christopher G. Chute

Division of Biomedical Statistics and Informatics, Mayo Clinic, College of Medicine

Page 2

Introduction

• LexGrid provides:

• A common information model to represent multiple vocabulary/ontology sources

• A scalable and robust API for accessing such information

• The Semantic Web community provides:

• OWL: formal and logic-based, supporting sound and complete reasoning

• Tools:

  • Editor: Protégé

  • Reasoners: RACER, FaCT, FaCT++, Pellet

  • Storage: triple stores

Page 3

The Classic Web

• Single information space

• Built on URIs

  • globally unique IDs

  • retrieval mechanism

• Built on hyperlinks

  • hyperlinks are the glue that holds everything together

[Figure: HTML documents A, B, and C connected by hyperlinks, accessed through web browsers and search engines]

Source: Chris Bizer

Page 4

Search by Search Engines

Find the protein and amino-acid information for gene “cdk-4”

• One problem with the web of documents is that we can only search for documents. With a search engine such as Google, we have to guess the keywords that best represent our question and hope they lead us to documents containing the answer. Google, however, usually returns hundreds of thousands of documents containing those keywords, and users have to go through them manually to obtain the information of interest.

Page 5

The Answer is Here

• The Hidden Web:

  • hidden behind forms

  • hard to query

Find the protein and amino-acid information for gene “cdk-4”

• Results from a classic search-engine search.

• Even if we find the page that contains the answer, we still have to read through it to locate the information of interest. There is currently no way for Google to return the specific piece of data we are looking for directly.

Page 6

A Web of Data

Source: Chris Bizer

• Use Semantic Web technologies to:

  • publish structured data on the Web

  • set links between data within the same source or across sources

• The data and the links can be annotated with ontologies

Page 7

RDF Graph

• This RDF graph shows that each piece of data has a URI and that all of them can be linked together.

Page 8

Resource Description Framework (RDF)

• A language that allows machines to understand data

• XML-based syntax (RDF/XML)

• Identifies things on the Web with URIs

• Its triple structure can be implemented and stored efficiently

• A directed graph

• Fully parallelized processing: everyone can contribute simultaneously

• Easy to merge data from different sources
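To make the last point concrete, here is a minimal Python sketch, assuming RDF terms are plain strings and the graphs contain no blank nodes (blank nodes would have to be renamed apart before merging). All `example.org` URIs and the `encodes` predicate are hypothetical. It shows that merging two sources is a set union of triples, and that a shared URI is what connects data coming from different sources.

```python
from typing import Iterable, Set, Tuple

# A triple is (subject, predicate, object); every term is a plain string here.
Triple = Tuple[str, str, str]

EX = "http://example.org/"                       # hypothetical namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

GENE = EX + "gene/cdk-4"
PROTEIN = EX + "protein/CDK4"
ENCODES = EX + "vocab/encodes"                   # hypothetical predicate

# Source A knows about the gene; source B knows about the protein.
source_a: Set[Triple] = {
    (GENE, RDFS_LABEL, "cdk-4"),
    (GENE, ENCODES, PROTEIN),
}
source_b: Set[Triple] = {
    (GENE, ENCODES, PROTEIN),                    # same statement, stated again
    (PROTEIN, RDFS_LABEL, "cyclin-dependent kinase 4"),
}


def merge(graphs: Iterable[Set[Triple]]) -> Set[Triple]:
    """Merging RDF graphs without blank nodes is a set union of their triples."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Follow one kind of directed edge out of a node."""
    return {o for s, p, o in graph if s == subject and p == predicate}


merged = merge([source_a, source_b])
# Start at the gene (known to source A) and reach the protein label (known to source B).
protein_labels = {
    label
    for protein in objects(merged, GENE, ENCODES)
    for label in objects(merged, protein, RDFS_LABEL)
}
print(len(merged), protein_labels)
```

The duplicated statement collapses, leaving three distinct triples, and the traversal succeeds only because both sources use the same URI for the gene.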

Page 9

An RDF Statement

• RDF Triple

• Subject: thing the statement is about

• Predicate: property or characteristic of subject

• Object: value of the property
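A minimal Python sketch of the subject/predicate/object structure, serialized one statement per line in the style of N-Triples. The `IRI` and `Literal` classes and the `example.org` URI are illustrative assumptions; language tags and datatypes on literals are left out.

```python
from typing import NamedTuple, Union


class IRI(NamedTuple):
    """A resource identified by a URI."""
    value: str


class Literal(NamedTuple):
    """A plain string value."""
    value: str


Term = Union[IRI, Literal]


class Triple(NamedTuple):
    subject: IRI      # the thing the statement is about
    predicate: IRI    # a property or characteristic of the subject
    object: Term      # the value of that property: a resource or a literal


def term_to_ntriples(term: Term) -> str:
    """IRIs go in <...>; literals go in double quotes with escapes."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    escaped = term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triple: Triple) -> str:
    """One statement per line, terminated by ' .'."""
    return " ".join(term_to_ntriples(t) for t in triple) + " ."


statement = Triple(
    subject=IRI("http://example.org/gene/cdk-4"),               # hypothetical URI
    predicate=IRI("http://www.w3.org/2000/01/rdf-schema#label"),
    object=Literal("cdk-4"),
)
print(to_ntriples(statement))
```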

Page 10

An Example

Source: Hoot72

• This example from Hoot72 shows an effort to convert HL7 information into RDF.

Page 11

RDF Data

• This example shows how easily an RDF graph links data together.

Page 12

Linked Health Data

Source: Hoot72

• We can link the whole health care domain together into linked health data, including patient data, scientific findings, doctor information, and insurance information, all centered on standard coding systems.

Page 13

An Example

Source: Tim Berners-Lee

• Tim Berners-Lee gave this example during one of his talks earlier this year. The question is: “What proteins are involved in signal transduction and are related to pyramidal neurons?”

Page 14

Search Engine: 223,000 Hits, 0 Results

• A Google search returned more than 200,000 hits but no results, because no one had asked that exact question before.

Source: Tim Berners-Lee

Page 15

Linked Health Data: 32 Hits, 32 Results

Source: Tim Berners-Lee

• With linked health data, he got 32 hits and 32 results, because all of the linked data are semantically annotated. The Semantic Web also supports semantic queries, so the question can be expressed more precisely and therefore answered better.
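One way to picture a semantic query is as a set of triple patterns that must all match at once, which is roughly what a SPARQL basic graph pattern expresses. The sketch below evaluates such patterns with naive nested loops over hypothetical data; the `ex:` names are invented for illustration and are not taken from the linked health data in the talk.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_variable(term: str) -> bool:
    return term.startswith("?")


def extend(binding: Binding, pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Return binding extended so that pattern matches triple, or None if it cannot."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            if result.get(p_term, t_term) != t_term:
                return None              # variable already bound to a different value
            result[p_term] = t_term
        elif p_term != t_term:
            return None                  # constant term does not match
    return result


def query(graph: Set[Triple], patterns: List[Triple]) -> List[Binding]:
    """Keep only the bindings that satisfy every pattern (a conjunction)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in graph:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical annotated data.
graph: Set[Triple] = {
    ("ex:P1", "ex:involvedIn", "ex:SignalTransduction"),
    ("ex:P1", "ex:relatedTo", "ex:PyramidalNeuron"),
    ("ex:P2", "ex:involvedIn", "ex:SignalTransduction"),
    ("ex:P3", "ex:relatedTo", "ex:PyramidalNeuron"),
}

question = [
    ("?protein", "ex:involvedIn", "ex:SignalTransduction"),
    ("?protein", "ex:relatedTo", "ex:PyramidalNeuron"),
]
answers = sorted(b["?protein"] for b in query(graph, question))
print(answers)  # ['ex:P1']
```

A keyword search would match all three proteins on some word; the shared variable `?protein` is what forces both conditions to hold for the same resource.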

Page 16

LexRDF

• Provide a unified RDF-based model for biomedical ontologies and terminologies

• Directly apply tools and technologies developed for the Semantic Web

• Provide a publicly accessible repository for biomedical ontologies/terminologies

Page 17

LexRDF System Overview

• This is the LexRDF/LexGrid system overview. The left side shows our current LexGrid model: information from different source terminologies/ontologies is loaded into the LexGrid format, stored in relational databases, and queried through the LexEVS API. LexRDF instead uses triple stores as the backend storage, and loaded information is stored as triples based on the LexRDF model. From the user's point of view, the LexEVS API stays the same: all current functionality is still provided, along with additional features that Semantic Web technologies can potentially offer.

[Figure: LexRDF/LexGrid system overview. Components shown: Grid Service, Web Service, Graphical Browser, Ajax API, and REST Interface; Service Layer; LexEVS API; LexGrid Model on a Relational Database and LexRDF Model on a Triple Store, each with load and query paths; Query Engine; Reasoner; LexRDF Mapping Specification.]
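The point that the LexEVS API stays the same while the storage changes can be illustrated with a small interface sketch. Everything here is hypothetical: `TerminologyService`, `TerminologyBackend`, and both in-memory backends are invented names, not the LexEVS API. They only show callers depending on an interface rather than on a relational or triple-based store.

```python
from abc import ABC, abstractmethod
from typing import Dict, Set, Tuple


class TerminologyBackend(ABC):
    """Hypothetical storage interface hidden behind a LexEVS-style API."""

    @abstractmethod
    def load_concept(self, code: str, preferred_name: str) -> None: ...

    @abstractmethod
    def preferred_name(self, code: str) -> str: ...


class RelationalBackend(TerminologyBackend):
    """Stand-in for the relational database: one row per concept."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, str]] = {}

    def load_concept(self, code: str, preferred_name: str) -> None:
        self.rows[code] = {"code": code, "preferred_name": preferred_name}

    def preferred_name(self, code: str) -> str:
        return self.rows[code]["preferred_name"]


class TripleStoreBackend(TerminologyBackend):
    """Stand-in for the triple store: every fact is a (subject, predicate, object)."""

    PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

    def __init__(self) -> None:
        self.triples: Set[Tuple[str, str, str]] = set()

    def load_concept(self, code: str, preferred_name: str) -> None:
        self.triples.add((code, self.PREF_LABEL, preferred_name))

    def preferred_name(self, code: str) -> str:
        for s, p, o in self.triples:
            if s == code and p == self.PREF_LABEL:
                return o
        raise KeyError(code)


class TerminologyService:
    """Caller-facing API: the same methods whichever backend is plugged in."""

    def __init__(self, backend: TerminologyBackend) -> None:
        self._backend = backend

    def load(self, code: str, name: str) -> None:
        self._backend.load_concept(code, name)

    def get_preferred_name(self, code: str) -> str:
        return self._backend.preferred_name(code)


for backend in (RelationalBackend(), TripleStoreBackend()):
    service = TerminologyService(backend)
    service.load("C001", "Example concept")
    print(type(backend).__name__, service.get_preferred_name("C001"))
```

Only the backend classes know how data is laid out; code written against `TerminologyService` runs unchanged on either one.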

Page 18

W3C Recommendations

• The Resource Description Framework (RDF)

• RDF Schema (RDFS)

• Web Ontology Language (OWL)

• Simple Knowledge Organization System (SKOS)

• SKOS eXtension for Labels (SKOS-XL)

• Dublin Core Metadata Element Set (dc) (not a W3C Recommendation)

Page 19

Mapping Specification

Ontology Information

• This slide shows the mapping specification for the LexGrid components related to ontology information. We successfully mapped all of these LexGrid components to W3C components.

Page 20

Mapping Specification

• ENTITY

This graph shows the mapping for the LexGrid entity. A LexGrid entity has three sub-components: concept, instance, and association. We were able to find equivalent components for each of them in the W3C namespaces.

Page 21

Mapping Specification

• This table details the entity mapping.

Page 22

Mapping Specification

PROPERTY

This graph shows the property mapping. A LexGrid property can be further specified as a presentation, a definition, or a comment. lg:presentation is mapped to skos:prefLabel and skos:altLabel, lg:definition is mapped to skos:definition, and lg:comment is mapped to a subset of skos:note. When no type is specified, we currently use owl:AnnotationProperty to describe the general lg:property.
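The property mapping described above can be sketched as a single function. This is an illustration under assumptions, not the LexRDF implementation: terms are plain strings, the `ex:` names in the usage are hypothetical, and lg:comment is approximated by plain skos:note because the slide only says it maps to a subset of skos:note. The preferred/not-preferred split for presentations follows the OBO example later in the deck.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_ANNOTATION_PROPERTY = "http://www.w3.org/2002/07/owl#AnnotationProperty"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def map_property(
    subject: str,
    property_type: Optional[str],
    value: str,
    is_preferred: bool = False,
    property_name: Optional[str] = None,
) -> List[Triple]:
    """Translate one LexGrid property into triples using the slide's mapping."""
    if property_type == "presentation":
        # Preferred presentation -> prefLabel; any other presentation -> altLabel.
        predicate = SKOS + ("prefLabel" if is_preferred else "altLabel")
        return [(subject, predicate, value)]
    if property_type == "definition":
        return [(subject, SKOS + "definition", value)]
    if property_type == "comment":
        # Approximation: the slide maps lg:comment to a subset of skos:note.
        return [(subject, SKOS + "note", value)]
    # No type: declare the property's own name as an owl:AnnotationProperty and use it.
    if property_name is None:
        raise ValueError("an untyped property needs a name")
    return [
        (property_name, RDF_TYPE, OWL_ANNOTATION_PROPERTY),
        (subject, property_name, value),
    ]


print(map_property("ex:C1", "presentation", "heart", is_preferred=True))
print(map_property("ex:C1", None, "A", property_name="ex:short_name"))
```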

Page 23

Mapping Specification

• PROPERTY

This table gives detailed information on the property mapping.

Page 24

Mapping Specification

ASSOCIATION

This table gives detailed information on the association mapping.

Page 25

Example

Concept, Property, and Reification

[Figure: the triple FAO:000025 skos:definition “middle stages of reproductive phase”, with a reification of that statement]

• This example shows the mapping for concepts, properties, and reification, using an OBO term that has a name, a definition, and a synonym. LexGrid represents this information with two presentations and one definition: the presentation for the name is marked as preferred, and the presentation for the synonym as not preferred. LexRDF creates an OWL class for the term, using skos:prefLabel for the name and skos:altLabel for the synonym. For the definition, LexRDF uses skos:definition. Because the definition also carries source information, we use RDF reification to describe that source.
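A minimal sketch of the triples this example describes, assuming plain-string terms and a hypothetical `HAS_SOURCE` predicate for the definition's source (the slide does not name it). Only the term ID and the definition text come from the slide; the name, synonym, and source values are placeholders.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
HAS_SOURCE = "http://example.org/lexrdf#source"  # hypothetical predicate


def obo_term_triples(
    term: str, name: str, synonym: str,
    definition: str, definition_source: str, statement_node: str,
) -> List[Triple]:
    """Build the triples for one OBO term; all terms are plain strings for brevity."""
    triples: List[Triple] = [
        (term, RDF + "type", OWL + "Class"),        # the term becomes an OWL class
        (term, SKOS + "prefLabel", name),           # preferred presentation
        (term, SKOS + "altLabel", synonym),         # non-preferred presentation
        (term, SKOS + "definition", definition),
    ]
    # RDF reification: a node that describes the definition statement itself,
    # so the source is attached to the statement rather than to the term.
    triples += [
        (statement_node, RDF + "type", RDF + "Statement"),
        (statement_node, RDF + "subject", term),
        (statement_node, RDF + "predicate", SKOS + "definition"),
        (statement_node, RDF + "object", definition),
        (statement_node, HAS_SOURCE, definition_source),
    ]
    return triples


triples = obo_term_triples(
    term="FAO:000025",
    name="<term name>",              # placeholder: not given on the slide
    synonym="<term synonym>",        # placeholder
    definition="middle stages of reproductive phase",
    definition_source="<source>",    # placeholder
    statement_node="_:def1",         # blank node for the reified statement
)
for t in triples:
    print(t)
```

The `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` triples are the standard RDF reification vocabulary; the trade-off is five extra triples to annotate a single statement.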

Page 26

Example: Property Link

[Figure: a “translation” link between two scope notes]

This is an example of a property link. In LexGrid, we can describe relations between any two properties. Here is an example from Agrovoc, a multilingual terminology. In this example, the term has 17 scope notes in 17 different languages; only 2 are shown here. To describe the “translation” relation between the English note and the German note, we add a LexRDF property link between the two statements.
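A sketch of a property link, assuming each scope note is reified so that the link can point at the statement rather than at the term. The `TRANSLATION` predicate, the term ID, and the note texts are hypothetical placeholders, and language tags are omitted for brevity.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
TRANSLATION = "http://example.org/lexrdf#translation"  # hypothetical link predicate


def reify(node: str, s: str, p: str, o: str) -> List[Triple]:
    """Assert s-p-o and describe it with a statement node that links can point to."""
    return [
        (s, p, o),
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    ]


def object_of(graph: Set[Triple], node: str) -> str:
    """Read back the value of a reified statement."""
    return next(o for s, p, o in graph if s == node and p == RDF + "object")


term = "ex:term"                                # hypothetical Agrovoc term
graph: Set[Triple] = set()
graph.update(reify("_:note_en", term, SKOS + "scopeNote", "<English scope note>"))
graph.update(reify("_:note_de", term, SKOS + "scopeNote", "<German scope note>"))
# The property link: its subject and object are statements, not concepts.
graph.add(("_:note_en", TRANSLATION, "_:note_de"))

translations = [
    (object_of(graph, s), object_of(graph, o))
    for s, p, o in graph
    if p == TRANSLATION
]
print(translations)
```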

Page 27

Example: Association

• This is an example of association mapping. In LexGrid, we can not only describe relations between two concepts but also add qualifiers to those relations. Here we want to show that a given disease has certain clinical signs, and also how frequently each clinical sign appears in the disease. LexRDF creates an association for the relation HAS_CLINICAL_SIGN between the disease and the clinical sign, and adds a qualifier for the frequency to the association.
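A sketch of a qualified association, assuming the qualifier is attached to a reified copy of the HAS_CLINICAL_SIGN statement; the deck does not say which encoding LexRDF uses. The disease, the sign, the namespace, and the frequency value are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/lexrdf#"               # hypothetical namespace
HAS_CLINICAL_SIGN = EX + "HAS_CLINICAL_SIGN"


def qualified_association(
    node: str, source: str, relation: str, target: str, qualifiers: Dict[str, str]
) -> List[Triple]:
    """Assert source-relation-target, then attach qualifiers to a reified copy of it."""
    triples = [
        (source, relation, target),
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", source),
        (node, RDF + "predicate", relation),
        (node, RDF + "object", target),
    ]
    triples += [(node, EX + name, value) for name, value in qualifiers.items()]
    return triples


def qualifier(graph: Set[Triple], source: str, relation: str,
              target: str, name: str) -> Optional[str]:
    """Find the statement node for source-relation-target and read one qualifier."""
    nodes = {s for s, p, o in graph if p == RDF + "subject" and o == source}
    nodes &= {s for s, p, o in graph if p == RDF + "predicate" and o == relation}
    nodes &= {s for s, p, o in graph if p == RDF + "object" and o == target}
    for s, p, o in graph:
        if s in nodes and p == EX + name:
            return o
    return None


graph = set(qualified_association(
    "_:a1", "ex:SomeDisease", HAS_CLINICAL_SIGN, "ex:SomeSign",
    {"frequency": "frequent"},                  # hypothetical qualifier value
))
print(qualifier(graph, "ex:SomeDisease", HAS_CLINICAL_SIGN, "ex:SomeSign", "frequency"))
```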

Page 28

Discussion

• Generic holder for properties and comments

During the mapping process, we encountered some challenges and problems, and we suggest some possible solutions. The following slides address these challenges and suggestions.

In LexGrid, we have generic holders for properties and comments, but we could not find appropriate equivalent components for them in the W3C namespaces.

Page 29

Discussion

• Newly defined properties

LexGrid: can define a name and a value for a particular property

Concept C001 has Property

Name = “short_name”

Value = “A”

LexRDF:

New annotation properties have to be defined: interoperability problems?

Subject     Predicate    Object
short_name  rdf:type     owl:AnnotationProperty
C001        short_name   A
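A sketch of the mapping in the table above, assuming a hypothetical local namespace for newly defined properties. The second function illustrates the interoperability concern: a property such as `short_name` lives outside any shared vocabulary, so other consumers cannot interpret it without extra information.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"
STANDARD = (RDF, RDFS, OWL, SKOS, DC)

LOCAL = "http://example.org/lexrdf/property#"  # hypothetical namespace for new properties


def properties_to_triples(properties: List[Tuple[str, str, str]]) -> List[Triple]:
    """Turn LexGrid (concept, name, value) properties into triples."""
    declared: Set[str] = set()
    triples: List[Triple] = []
    for concept, name, value in properties:
        prop = LOCAL + name
        if prop not in declared:                # declare each new property only once
            triples.append((prop, RDF + "type", OWL + "AnnotationProperty"))
            declared.add(prop)
        triples.append((concept, prop, value))
    return triples


def non_standard_properties(triples: List[Triple]) -> Set[str]:
    """Declared annotation properties that are not in any of the shared vocabularies."""
    return {
        s for s, p, o in triples
        if p == RDF + "type" and o == OWL + "AnnotationProperty"
        and not s.startswith(STANDARD)
    }


triples = properties_to_triples([("C001", "short_name", "A"), ("C002", "short_name", "B")])
print(len(triples), non_standard_properties(triples))
```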

Page 30

Discussion

Preferred properties

In LexGrid, we can mark both presentations and definitions as preferred, but SKOS only has prefLabel and altLabel. It would be useful if SKOS included prefDefinition and altDefinition in the future.

LexRDF:isPreferred
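The slide names LexRDF:isPreferred but does not show how it is attached. One possible reading, sketched here purely as an assumption, is a flag on a reified skos:definition statement; the URI for `isPreferred` and the example data are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
IS_PREFERRED = "http://example.org/lexrdf#isPreferred"  # hypothetical URI for LexRDF:isPreferred


def definition_triples(node: str, concept: str, text: str, preferred: bool) -> List[Triple]:
    """A skos:definition plus a reified copy of it that carries the preference flag."""
    return [
        (concept, SKOS + "definition", text),
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", concept),
        (node, RDF + "predicate", SKOS + "definition"),
        (node, RDF + "object", text),
        (node, IS_PREFERRED, "true" if preferred else "false"),
    ]


def preferred_definition(graph: Set[Triple], concept: str) -> Optional[str]:
    """Return the definition whose statement node is flagged as preferred."""
    flagged = {s for s, p, o in graph if p == IS_PREFERRED and o == "true"}
    about = {s for s, p, o in graph if p == RDF + "subject" and o == concept}
    definitions = {s for s, p, o in graph
                   if p == RDF + "predicate" and o == SKOS + "definition"}
    for s, p, o in graph:
        if s in flagged & about & definitions and p == RDF + "object":
            return o
    return None


graph = set(definition_triples("_:d1", "ex:C1", "<first definition>", preferred=True))
graph |= set(definition_triples("_:d2", "ex:C1", "<second definition>", preferred=False))
print(preferred_definition(graph, "ex:C1"))
```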

Page 31

Discussion

• Association qualification

Page 32

Discussion

• Relation among properties

skosxl:labelRelation:

  • only between skosxl:Label instances

  • symmetric property

PropertyLink is more general!
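A sketch of why skosxl:labelRelation is narrower than a property link: as the slide notes, it is symmetric and holds only between skosxl:Label instances, so a reasoner applying those axioms reads every link in both directions and types both ends as labels. The example nodes are hypothetical; the SKOS-XL namespace is the standard one.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
LABEL_RELATION = SKOSXL + "labelRelation"


def infer_label_relation(graph: Set[Triple]) -> Set[Triple]:
    """Add what a reasoner would derive from skosxl:labelRelation's axioms."""
    inferred = set(graph)
    for s, p, o in graph:
        if p == LABEL_RELATION:
            inferred.add((o, LABEL_RELATION, s))           # symmetric property
            inferred.add((s, RDF_TYPE, SKOSXL + "Label"))  # domain: skosxl:Label
            inferred.add((o, RDF_TYPE, SKOSXL + "Label"))  # range: skosxl:Label
    return inferred


graph = {("ex:note_en", LABEL_RELATION, "ex:note_de")}     # hypothetical nodes
closed = infer_label_relation(graph)
for t in sorted(closed):
    print(t)
```

A directed relation such as “translated from” cannot be expressed this way, and linking two notes that are not labels would make a reasoner classify them as labels anyway.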

Page 33

Discussion

• Property groups

CUI1  AUI1  CUI2  AUI2  Rel  Rela      Sab
C001  A001  C002  A002  PAR  sub_type  LNC
C001  A003  C002  A004  PAR  is_a      SNOMED

[Figure: a single PAR association between C001 and C002]

Qualifier Group 1: Rela = sub_type, Source = LNC, Source_AUI = A001, Target_AUI = A002

Qualifier Group 2: Rela = is_a, Source = SNOMED, Source_AUI = A003, Target_AUI = A004

How do we handle a group of properties?
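One possible answer, sketched under assumptions: collapse the rows into a single association per (CUI1, Rel, CUI2) and keep each row's remaining columns together as a separate qualifier-group node, so that the LNC and SNOMED qualifiers never mix. The `ex:hasQualifierGroup` predicate, the `ex:` qualifier names, and the group-node naming scheme are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]

# The two rows from the slide, with the slide's column names.
ROWS: List[Row] = [
    {"CUI1": "C001", "AUI1": "A001", "CUI2": "C002", "AUI2": "A002",
     "Rel": "PAR", "Rela": "sub_type", "Sab": "LNC"},
    {"CUI1": "C001", "AUI1": "A003", "CUI2": "C002", "AUI2": "A004",
     "Rel": "PAR", "Rela": "is_a", "Sab": "SNOMED"},
]


def qualifier_groups(rows: List[Row]) -> Dict[Tuple[str, str, str], List[Row]]:
    """One association per (CUI1, Rel, CUI2); each row's other columns become one group."""
    groups: Dict[Tuple[str, str, str], List[Row]] = defaultdict(list)
    for row in rows:
        key = (row["CUI1"], row["Rel"], row["CUI2"])
        groups[key].append({
            "Rela": row["Rela"],
            "Source": row["Sab"],
            "Source_AUI": row["AUI1"],
            "Target_AUI": row["AUI2"],
        })
    return dict(groups)


def group_triples(assoc_node: str, groups: List[Row]) -> List[Tuple[str, str, str]]:
    """Give each group its own node so qualifiers from different sources stay apart."""
    triples: List[Tuple[str, str, str]] = []
    for i, group in enumerate(groups, start=1):
        group_node = f"{assoc_node}_group{i}"
        triples.append((assoc_node, "ex:hasQualifierGroup", group_node))
        triples += [(group_node, "ex:" + name, value) for name, value in group.items()]
    return triples


grouped = qualifier_groups(ROWS)
for key, groups in grouped.items():
    print(key)
    for t in group_triples("_:assoc", groups):
        print("  ", t)
```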

Page 34

Initial Implementation Status

[Figure: initial implementation. Components shown: Web Browser; LexEVS API; LexGrid and LexRDF; LBOWLManager and LexRDFManager; LexRDF Mapping; Sesame API; Sesame Server with Query Engine; JDBC, Memory, and Native storage; RDBMS databases; source ontologies/terminologies.]

Page 35

Conclusion and Future Work

• Next steps:

  • Implementation

  • Evaluation result analysis

  • Query functionality analysis

  • Formalize the mapping specification using standards such as the OMG Ontology Definition Metamodel

• LexRDF mapping specification:

  • successfully mapped 32 out of 42 LexGrid elements

  • high degree of reusability

• LexRDF documentation:

https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/LexGrid_to_RDF_Triple_Store