wider data integration - biomedbridges · the role of ontologies within the semantic web wider data...

Post on 30-May-2020

The role of ontologies within the semantic web

Wider data integration

Simon Jupp, James Malone

jupp@ebi.ac.uk , malone@ebi.ac.uk

Outline

• Overview of the Semantic Web

• Resource Description Framework

• The role of ontologies in the semantic web

• Linked data

• Demo

Evolution of the web

• 1st generation web – linked documents

• Hand-coded HTML

• 2nd generation web – the web as platform

• Web services, XML and REST APIs

• 3rd generation web – semantic web

• The web as a platform for publishing data

• Semantic markup that adds meaning to data for machine processing

A resource of information http://en.wikipedia.org/wiki/Barack_Obama

Lots of data hidden in this page

How could a machine pull data about US presidents from Illinois?

Data as a graph

• Data naturally fits into graph-based data structures

Barack Obama

Person

Illinois

United States Presidents

USA State

USA

How do we publish this data on the Web so a machine can process and “understand” it?

RDF – Resource Description Framework

• RDF is a graph-based language for representing information about resources on the web.

• Resources are described in terms of properties and property values using RDF statements.

• Every RDF statement is a triple, consisting of a subject, a predicate and an object.

Triple statement

Barack Obama (subject)

birth place (predicate)

Honolulu (object)
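A minimal Python sketch of the triple idea, using only the standard library. The `Triple` class and the `match` helper are illustrative names, not part of any RDF toolkit, and the second statement ("type", "Person") is taken from the graph slide as an assumed example.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# A tiny graph: a list of statements (illustrative data from the slides).
triples = [
    Triple("Barack Obama", "birth place", "Honolulu"),
    Triple("Barack Obama", "type", "Person"),
]

# "Who was born in Honolulu?" expressed as a pattern with a wildcard subject.
born_in_honolulu = [
    t.subject for t in match(triples, predicate="birth place", obj="Honolulu")
]
print(born_in_honolulu)
```

Querying by pattern with wildcards is, in spirit, what a SPARQL triple pattern does; a real store would index the statements rather than scan a list.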

Identify things on the web

• Using existing web technology – the URI

Subject: http://dbpedia.org/page/Barack_Obama

Predicate: http://dbpedia.org/property/birthPlace

Object: http://dbpedia.org/page/Honolulu
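Once all three parts are URIs, the statement can be written down in a line-based RDF syntax such as N-Triples. The sketch below uses the URIs exactly as shown on the slide; `to_ntriples` is a hypothetical helper that only handles the case where subject, predicate and object are all URIs (no literals or blank nodes).

```python
def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialise one all-URI triple as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples(
    "http://dbpedia.org/page/Barack_Obama",
    "http://dbpedia.org/property/birthPlace",
    "http://dbpedia.org/page/Honolulu",
)
print(line)
```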

Publishing and RDF as XML

• URIs give us a mechanism to identify “things” on the web

• RDF provides a base vocabulary describing how those things are related

• We need other vocabularies that give us different kinds of relationships

• Add shared meaning to things using additional vocabularies and ontologies

Ontologies provide the meaning

Nodes: Barack Obama, Person, Honolulu, United States Presidents, USA State, USA

Edge labels: rdf:type, rdfs:subClassOf, dbpedia:birthPlace, dbpedia:partOf
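To illustrate how an ontology adds meaning, the sketch below derives types that were never stated directly by following rdfs:subClassOf edges. The wiring is an assumption, since the slide diagram does not survive in the transcript: it supposes that Barack Obama has rdf:type United States Presidents and that United States Presidents is an rdfs:subClassOf Person. The function names are illustrative.

```python
from typing import Dict, List, Set


def superclasses(cls: str, sub_class_of: Dict[str, List[str]]) -> Set[str]:
    """All classes reachable from cls by following subClassOf edges."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(individual: str,
                   asserted_types: Dict[str, List[str]],
                   sub_class_of: Dict[str, List[str]]) -> Set[str]:
    """Asserted types plus every superclass of those types."""
    types = set(asserted_types.get(individual, ()))
    for cls in list(types):
        types |= superclasses(cls, sub_class_of)
    return types


# Assumed wiring of the slide's graph (see the note above).
sub_class_of = {"United States Presidents": ["Person"]}
asserted_types = {"Barack Obama": ["United States Presidents"]}

types = inferred_types("Barack Obama", asserted_types, sub_class_of)
print(sorted(types))
```

The data only says that Barack Obama is a United States President; that he is also a Person follows from the class hierarchy.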

DBpedia project

• Converts Wikipedia infoboxes into structured RDF

• RDF even has a query language called SPARQL

http://dbpedia.org/page/Barack_Obama

http://dbpedia.org/data/Barack_Obama.rdf

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX umbel: <http://umbel.org/umbel/rc/>
PREFIX dbpediaowl: <http://dbpedia.org/ontology/>
PREFIX resource: <http://dbpedia.org/resource/>

SELECT ?subject WHERE {
  ?subject rdf:type umbel:PresidentOfOrganization .
  ?subject dbpediaowl:birthPlace resource:Honolulu
}

http://tinyurl.com/cpbmjjf
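A query like this is sent to a SPARQL endpoint over HTTP, typically as a GET request with the query text in a `query` parameter. The sketch below only builds that request URL and does not contact the network. The endpoint address `http://dbpedia.org/sparql` is an assumption (it is not given on the slide), and `build_request_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: DBpedia's public SPARQL endpoint; not stated on the slide.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX umbel: <http://umbel.org/umbel/rc/>
PREFIX dbpediaowl: <http://dbpedia.org/ontology/>
PREFIX resource: <http://dbpedia.org/resource/>

SELECT ?subject WHERE {
  ?subject rdf:type umbel:PresidentOfOrganization .
  ?subject dbpediaowl:birthPlace resource:Honolulu
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """URL-encode the query text into the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(url)
```

Fetching `url` with any HTTP client, with an `Accept` header naming the desired result format, would return the matching resources, provided the endpoint is reachable and the vocabulary terms are still in use.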

Building a web of linked data

Life sciences

UniProt resource: http://www.uniprot.org/uniprot/Q0S7M9

Now serving up RDF: http://purl.uniprot.org/core/Q5UA70

Atlas - Which genes are expressed where and when?

• Questions (we collected) that biologists would like to ask of EBI data:

“Differentially expressed genes in adult mice, bred in oxygen rich vs oxygen poor environments? Of this set, which biological processes (GO) are enriched?”

“Where are genes with antigen binding function differentially expressed, which disease and which associated pathways?”

“Get metformin associated pathways with differentially expressed genes, find any proteins that are targets for known diabetes drugs”

How can we help to answer these?

HTML request (Human view)

GXA schema as an RDF graph

Mapping ontologies

Integration adds knowledge

Diagram labels: liver cancer, Pathway x, Protein a, Gene X, Gene Y, Gene Z, species, Regulates, GXA

Data integration

• Primary use-case is to link to pathways

• Reactome is already publishing RDF

Diagram labels: Differentially expressed gene, Sample, Assay, Experiment, Reactome Pathway, UniProt, GOA, ChEMBL

Example URIs: http://purl.uniprot.org/uniprot/Q5UAB1, http://identifiers.org/ensembl/ENSG00000175793

Edge labels: sio:encodes, sio:’is attribute of’
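The integration step works because two datasets use the same URI for the same thing, so their graphs can simply be merged and traversed. The sketch below is a rough illustration of that idea: every `http://example.org/` URI and the `ex:participatesIn` predicate are hypothetical placeholders, and only `sio:encodes` comes from the slide.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical identifiers standing in for real gene, protein and pathway URIs.
GENE_X = "http://example.org/gene_X"
PROTEIN_A = "http://example.org/protein_a"
PATHWAY_X = "http://example.org/pathway_x"

# Dataset 1 (expression atlas side): a gene encodes a protein.
atlas: List[Triple] = [(GENE_X, "sio:encodes", PROTEIN_A)]

# Dataset 2 (pathway side): the same protein URI appears in a pathway.
# "ex:participatesIn" is an invented predicate for illustration only.
pathways: List[Triple] = [(PROTEIN_A, "ex:participatesIn", PATHWAY_X)]

# Because both sides share the protein URI, merging is just concatenation.
merged = atlas + pathways


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Objects of all triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def pathways_for_gene(gene: str) -> List[str]:
    """Follow gene -> protein -> pathway across the merged graph."""
    return [
        pathway
        for protein in objects(merged, gene, "sio:encodes")
        for pathway in objects(merged, protein, "ex:participatesIn")
    ]


print(pathways_for_gene(GENE_X))
```

Neither dataset alone can answer "which pathways is this gene involved in?"; the answer only appears once the two graphs are joined on the shared identifier.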

Demo queries collected from users (live demo)

• http://wwwdev.ebi.ac.uk/fgpt/gxa-sparql

• “Get top 10 overexpressed genes where the experimental factor is asthma”

• “Is the NID1 gene differentially expressed in other respiratory system diseases?”

• “For the genes overexpressed in asthma, get the associated Reactome pathways.”

• “What is the function of genes from the previous query?” (This query retrieves GO annotations from UniProt.)

• “Which ChEMBL molecules target these proteins?” (This query retrieves data from ChEMBL.)

• Detecting inconsistencies in data annotations

Summary

• The Semantic Web has promised much and delivered little to date

• Recently things have begun to improve

• Google’s Knowledge Graph and GoodRelations are two such examples

• EBI now has a trial group for RDF

• Not mature technology but something to keep in mind
