The role of ontologies within the semantic web
Wider data integration
Simon Jupp, James Malone
Outline
• Overview of the Semantic Web
• Resource Description Framework
• The role of ontologies in the semantic web
• Linked data
• Demo
Evolution of the web
• 1st generation web – linked documents
• Hand-coded HTML
• 2nd generation web – the web as platform
• Web services, XML and REST APIs
• 3rd generation web – semantic web
• The web as a platform for publishing data
• Semantic markup that adds meaning to data for machine processing
A resource of information: http://en.wikipedia.org/wiki/Barack_Obama
Lots of data is hidden in this page
How could a machine pull data about US presidents from Illinois?
Data as a graph
• Data naturally fits into graph-based data structures
[Figure: Barack Obama, Person, Illinois, United States Presidents, USA State and USA drawn as linked nodes]
How do we publish this data on the Web so a machine can process and “understand” it?
RDF – Resource Description Framework
• RDF is a graph-based language for representing information about resources on the web.
• Resources are described in terms of properties and property values using RDF statements.
• Every RDF statement is a triple, consisting of a subject, a predicate and an object.
Triple statement
Subject: Barack Obama
Predicate: birth place
Object: Honolulu
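A triple can be modelled directly as a three-field record, and a graph as a set of such records. The minimal Python sketch below uses plain labels in place of URIs; the second triple, its "located in" predicate and the `objects_of` helper are hypothetical, and the code illustrates the data model only, not a real RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement."""
    subject: str
    predicate: str
    obj: str  # the RDF "object"; named obj to avoid shadowing the builtin


# The statement from the slide, with plain labels standing in for URIs.
statement = Triple("Barack Obama", "birth place", "Honolulu")

# A graph is a set of triples: subjects and objects are the nodes,
# predicates are the labelled edges between them.
# The second triple (and its "located in" predicate) is hypothetical.
graph = {statement, Triple("Honolulu", "located in", "USA")}


def objects_of(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


print(objects_of(graph, "Barack Obama", "birth place"))  # {'Honolulu'}
```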
Identify things on the web
• Using existing web technology – the URI
Subject: http://dbpedia.org/page/Barack_Obama
Predicate: http://dbpedia.org/property/birthPlace
Object: http://dbpedia.org/page/Honolulu
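Written out in N-Triples, one of the W3C serialisations of RDF, the statement above becomes a single line of three bracketed URIs followed by a full stop. The helper `to_ntriples` is a hypothetical sketch that assumes all three terms are URIs and performs no escaping. (DBpedia itself names things under `/resource/`, as the `resource:` prefix in the DBpedia query does, and uses `/page/` for the HTML view; the slide's URIs are reused unchanged here.)

```python
def to_ntriples(subject_uri: str, predicate_uri: str, object_uri: str) -> str:
    """Serialise one triple whose three terms are all URIs as an N-Triples line.

    Minimal sketch: no literals, blank nodes or character escaping.
    """
    return f"<{subject_uri}> <{predicate_uri}> <{object_uri}> ."


line = to_ntriples(
    "http://dbpedia.org/page/Barack_Obama",
    "http://dbpedia.org/property/birthPlace",
    "http://dbpedia.org/page/Honolulu",
)
print(line)
# <http://dbpedia.org/page/Barack_Obama> <http://dbpedia.org/property/birthPlace> <http://dbpedia.org/page/Honolulu> .
```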
Publishing RDF as XML
• URIs give us a mechanism to identify “things” on the web
• RDF provides a base vocabulary describing how those things are related
• We need other vocabularies that give us different kinds of relationships
• Add shared meaning to things using additional vocabularies and ontologies
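As a rough illustration of the XML publishing route, the sketch below builds the same birth-place statement as RDF/XML with Python's standard `xml.etree.ElementTree`. The `dbp:` prefix and the function name `triple_to_rdfxml` are assumptions for the example; a real publisher would normally use a dedicated RDF library rather than hand-building XML.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DBP_PROP_NS = "http://dbpedia.org/property/"  # namespace of the birthPlace property

# Register prefixes so the output uses rdf: and dbp: rather than ns0:, ns1:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dbp", DBP_PROP_NS)


def triple_to_rdfxml(subject_uri: str, prop_local_name: str, object_uri: str) -> str:
    """Return an rdf:RDF document holding one URI-valued triple (hypothetical helper)."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:Description names the subject with rdf:about ...
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": subject_uri})
    # ... and each property element points at its object with rdf:resource.
    ET.SubElement(desc, f"{{{DBP_PROP_NS}}}{prop_local_name}",
                  {f"{{{RDF_NS}}}resource": object_uri})
    return ET.tostring(root, encoding="unicode")


xml_text = triple_to_rdfxml(
    "http://dbpedia.org/page/Barack_Obama",
    "birthPlace",
    "http://dbpedia.org/page/Honolulu",
)
print(xml_text)
```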
Ontologies provide the meaning
[Figure: Barack Obama, Person, Honolulu, United States Presidents, USA State and USA linked by rdf:type, rdfs:subClassOf, dbpedia:birthPlace and dbpedia:partOf edges]
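Part of the meaning an ontology adds can be made explicit by inference. The sketch below applies two RDFS entailment rules (rdfs9: members of a subclass are members of its superclass; rdfs11: `rdfs:subClassOf` is transitive) until no new triples appear. The edges in the toy graph (Barack Obama `rdf:type` United States Presidents, which is `rdfs:subClassOf` Person) are an assumed reading of the figure, and `infer_types` is a hypothetical helper.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Apply RDFS rules rdfs9 and rdfs11 to a set of (s, p, o) tuples until a fixpoint."""
    inferred = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in inferred if p == RDFS_SUBCLASS_OF}
        new = set()
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, RDFS_SUBCLASS_OF, d))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for s, p, o in inferred:
            if p == RDF_TYPE:
                for a, b in subclass:
                    if a == o:
                        new.add((s, RDF_TYPE, b))
        if new <= inferred:  # nothing new derived: done
            return inferred
        inferred |= new


# Toy graph: an assumed reading of the figure, not DBpedia data.
toy = {
    ("Barack Obama", RDF_TYPE, "United States Presidents"),
    ("United States Presidents", RDFS_SUBCLASS_OF, "Person"),
}
closure = infer_types(toy)
print(("Barack Obama", RDF_TYPE, "Person") in closure)  # True
```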
DBpedia project
• Converts Wikipedia infoboxes into structured RDF
• RDF even has its own query language, SPARQL
http://dbpedia.org/page/Barack_Obama
http://dbpedia.org/data/Barack_Obama.rdf
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX umbel: <http://umbel.org/umbel/rc/>
PREFIX dbpediaowl: <http://dbpedia.org/ontology/>
PREFIX resource: <http://dbpedia.org/resource/>
SELECT ?subject WHERE {
  ?subject rdf:type umbel:PresidentOfOrganization .
  ?subject dbpediaowl:birthPlace resource:Honolulu
}
http://tinyurl.com/cpbmjjf
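The `WHERE` clause above is a basic graph pattern: each line is a triple pattern with variables, and a result is any assignment of the variables that makes every pattern match some triple in the data. A minimal Python sketch of that matching follows; the toy triples (including the hypothetical `resource:Example_Person`) are written with the query's prefixes but are not real DBpedia data, and a real SPARQL engine does far more (indexes, join ordering, OPTIONAL, FILTER).

```python
def is_var(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None if impossible."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable already bound elsewhere must agree with this triple.
            if p_term in result and result[p_term] != t_term:
                return None
            result[p_term] = t_term
        elif p_term != t_term:  # constants must match exactly
            return None
    return result


def evaluate_bgp(patterns, graph):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for triple in graph
            if (extended := match_pattern(pattern, triple, b)) is not None
        ]
    return bindings


# Toy data only; resource:Example_Person is hypothetical.
toy_graph = [
    ("resource:Barack_Obama", "rdf:type", "umbel:PresidentOfOrganization"),
    ("resource:Barack_Obama", "dbpediaowl:birthPlace", "resource:Honolulu"),
    ("resource:Example_Person", "dbpediaowl:birthPlace", "resource:Honolulu"),
]
query = [
    ("?subject", "rdf:type", "umbel:PresidentOfOrganization"),
    ("?subject", "dbpediaowl:birthPlace", "resource:Honolulu"),
]
print(evaluate_bgp(query, toy_graph))  # [{'?subject': 'resource:Barack_Obama'}]
```

Each pattern narrows the bindings produced by the previous one, which is why the second person born in Honolulu drops out: it never matched the `rdf:type` pattern.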
Building a web of linked data
Life sciences
UniProt resource: http://www.uniprot.org/uniprot/Q0S7M9
Now serving up RDF: http://purl.uniprot.org/core/Q5UA70
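One common way to ask a server for RDF instead of HTML from the same URI is HTTP content negotiation through the `Accept` header. The sketch below only builds the request with Python's `urllib.request` and does not send it; whether a given UniProt URI honours `application/rdf+xml` this way is an assumption, and `rdf_request` is a hypothetical helper.

```python
import urllib.request


def rdf_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for RDF/XML rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request("http://purl.uniprot.org/uniprot/Q5UAB1")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# Sending it would be: urllib.request.urlopen(req).read()
```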
Gene Expression Atlas (GXA): which genes are expressed where and when?
• Questions (we collected) that biologists would like to ask of EBI data:
“Which genes are differentially expressed in adult mice bred in oxygen-rich vs oxygen-poor environments? Of this set, which biological processes (GO) are enriched?”
“Where are genes with antigen binding function differentially expressed, which disease and which associated pathways?”
“Get metformin-associated pathways with differentially expressed genes, and find any proteins that are targets of known diabetes drugs”
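The first question ends with a GO enrichment step. One common way to score it (an assumption here; the slides do not name a method) is a one-sided hypergeometric test: how likely is it to see at least `k` genes carrying a GO term in a set of `n` differentially expressed genes, drawn from a background of `N` genes of which `K` carry the term? The function name is hypothetical.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all genes measured)
    K: background genes annotated with the GO term
    n: genes in the differentially expressed set
    k: differentially expressed genes annotated with the GO term
    """
    upper = min(n, K)
    # Count the draws with at least k annotated genes, using exact integers.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# All 5 selected genes annotated, when only 5 of 10 background genes are.
print(enrichment_p_value(10, 5, 5, 5))  # 1/252, about 0.00397
```

In practice the p-values for many GO terms would also need a multiple-testing correction.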
How can we help to answer these?
HTML request (Human view)
GXA schema as an RDF graph
Mapping ontologies
Integration adds knowledge
[Figure: GXA data (liver cancer, species, genes X, Y and Z) linked to pathway x and protein a, including a “regulates” edge]
Data integration
• Primary use case is to link to pathways
• Reactome is already publishing RDF
[Figure: GXA Experiment, Assay, Sample and Differentially expressed gene linked via http://identifiers.org/ensembl/ENSG00000175793 and http://purl.uniprot.org/uniprot/Q5UAB1 to Reactome Pathway, UniProt, GOA and ChEMBL, using sio:encodes and sio:’is attribute of’]
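Integration works because both datasets use the same URI for the same thing: merging the graphs is just the union of their triples, and a query can then follow links across sources. The sketch below uses hypothetical `example.org` URIs and a hypothetical `ex:participatesIn` predicate in place of the real identifiers.org, UniProt and Reactome terms; only `sio:encodes` comes from the figure.

```python
# Hypothetical URIs standing in for identifiers.org / UniProt / Reactome terms.
GENE = "http://example.org/gene/G1"
PROTEIN = "http://example.org/protein/P1"
PATHWAY = "http://example.org/pathway/x"

# Source 1 (GXA-like): gene -> encoded protein.
gxa = {(GENE, "sio:encodes", PROTEIN)}
# Source 2 (Reactome-like): protein -> pathway; predicate is hypothetical.
reactome = {(PROTEIN, "ex:participatesIn", PATHWAY)}

# Merging RDF graphs that share URIs is a set union of their triples.
merged = gxa | reactome


def pathways_for_gene(graph, gene):
    """Follow gene -> protein -> pathway links within one graph."""
    proteins = {o for s, p, o in graph if s == gene and p == "sio:encodes"}
    return {o for s, p, o in graph if s in proteins and p == "ex:participatesIn"}


print(pathways_for_gene(gxa, GENE))     # set()
print(pathways_for_gene(merged, GENE))  # {'http://example.org/pathway/x'}
```

Neither source alone answers "which pathways involve this gene?"; the shared protein URI is what lets the merged graph answer it.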
Demo queries collected from users (live demo)
• http://wwwdev.ebi.ac.uk/fgpt/gxa-sparql
• “Get top 10 overexpressed genes where the experimental factor is asthma”
• “Is the NID1 gene differentially expressed in other respiratory system diseases?”
• “For the genes overexpressed in asthma, get the associated Reactome pathways.”
• “What is the function of genes from the previous query? (This query retrieves GO annotations from UniProt)”
• “Which ChEMBL molecules target these proteins? (This query retrieves data from ChEMBL)”
• Detecting inconsistencies in data annotations
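The first demo question boils down to filter, sort and limit (`FILTER`, `ORDER BY` and `LIMIT` in SPARQL). Below is a plain-Python sketch over toy records; the `Result` fields, the gene names and the choice to rank by p-value are assumptions, not the Atlas schema or real results.

```python
from typing import NamedTuple


class Result(NamedTuple):
    gene: str
    factor_value: str
    direction: str   # "up" or "down" (assumed encoding)
    p_value: float


def top_overexpressed(results, factor_value, limit=10):
    """Keep up-regulated genes for one factor value, order by p-value, take the first `limit`."""
    hits = [r for r in results if r.factor_value == factor_value and r.direction == "up"]
    hits.sort(key=lambda r: r.p_value)  # ORDER BY ?pvalue
    return hits[:limit]                 # LIMIT 10


# Toy records only, not real Atlas data.
toy = [
    Result("G1", "asthma", "up", 0.01),
    Result("G2", "asthma", "down", 0.001),
    Result("G3", "asthma", "up", 0.001),
    Result("G4", "healthy", "up", 0.0001),
]
print([r.gene for r in top_overexpressed(toy, "asthma")])  # ['G3', 'G1']
```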
Summary
• The Semantic Web has promised much but delivered little to date
• Recently things have begun to improve
• Examples include Google’s Knowledge Graph and GoodRelations
• EBI now has a trial group for RDF
• Not yet a mature technology, but something to keep in mind