GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Post on 08-Jan-2017

TRANSCRIPT

Page 1: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Building a repository of biomedical ontologies with Neo4j

Simon Jupp
[email protected], @simonjupp
Samples, Phenotypes and Ontologies Team
European Bioinformatics Institute
Cambridge, UK.

Page 2: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Biological data heavily interlinked

[Figure: interlinked biological data types. Labels: Genome, Proteome, Metabolome, tissue, mRNA array, miRNA array, antibody array, CE-MS, LC-MS/MS (mass spectrum, intensity vs. m/z), Pathways, Protein Interaction, Drug targets.]

Page 3: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

We need terminology standards

Dyschromatopsia

Page 4: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Search PubMed for “color blindness”

Page 5: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Search PubMed for “Dyschromatopsia”

Page 6: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Search PubMed for "abnormality of the eye"

Page 7: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

The ontology of color blindness

HP:0011518 (Dichromacy)
UBERON:0000970 (Eye)
HP:0000551 (Abnormality of color vision)
HP:0007641 (Dyschromatopsia)
Is-a
Is-a
Disease-location

Page 8: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

The ontology of color blindness

HP:0011518 (Dichromacy)
UBERON:0000970 (Eye)
HP:0000551 (Abnormality of color vision)
HP:0007641 (Dyschromatopsia)
Is-a
Is-a
Disease-location

“Colorblindness”

“A form of colorblindness in which only two of the three fundamental colors can be distinguished due to a lack of one of the retinal cone pigments.”

synonym

definition

Page 9: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Ontologies for life sciences

Genotype Phenotype

Sequence
Proteins
Gene products
Transcript
Pathways
Cell type
BRENDA tissue / enzyme source
Development
Anatomy
Phenotype
Plasmodium life cycle

- Sequence types and features
- Genetic Context

- Molecule role
- Molecular Function
- Biological process
- Cellular component

- Protein covalent bond
- Protein domain
- UniProt taxonomy

- Pathway ontology
- Event (INOH pathway ontology)
- Systems Biology
- Protein-protein interaction

- Arabidopsis development
- Cereal plant development
- Plant growth and developmental stage
- C. elegans development
- Drosophila development (FBdv)
- Human developmental anatomy, abstract version
- Human developmental anatomy, timed version

- Mosquito gross anatomy
- Mouse adult gross anatomy
- Mouse gross anatomy and development
- C. elegans gross anatomy
- Arabidopsis gross anatomy
- Cereal plant gross anatomy
- Drosophila gross anatomy
- Dictyostelium discoideum anatomy
- Fungal gross anatomy (FAO)
- Plant structure
- Maize gross anatomy
- Medaka fish anatomy and development
- Zebrafish anatomy and development

- NCI Thesaurus
- Mouse pathology
- Human disease
- Cereal plant trait
- PATO (attribute and value)
- Mammalian phenotype
- Human phenotype
- Habronattus courtship
- Loggerhead nesting
- Animal natural history and life history

eVOC (Expressed Sequence Annotation for Humans)

Page 10: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Ontology Lookup Service

• Ontology search engine (Solr)
• Graph database of terms (Neo4j)
• Powerful RESTful API (built with Spring Data Neo4j / REST)
• Open source project
• Generic infrastructure (can load any ontology represented in OWL)

https://github.com/EBISPOT/OLS

Repository of over 140 biomedical ontologies (4.5 million terms, 11 million relations)

http://www.ebi.ac.uk/ols/beta

Page 11: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Web Ontology Language – (OWL)

• W3C standard vocabulary for describing ontologies
• Powerful knowledge representation

However
• OWL ontologies aren’t graphs, but…
  … can be represented as an RDF graph
  … people want to use them as graphs
• Plenty of RDF databases around
• But incomplete w.r.t. OWL semantics
• SPARQL is an acquired taste

Page 12: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

OWL to Neo4j schema
• Each node is labelled with one of {Class, Property, Individuals} AND the ontology name
• All OWL annotations become node properties (labels, id, descriptions etc.)
• Superclass axioms (named classes and simple existential restrictions) become edges in Neo4j
• E.g. in OWL: “heart” subClassOf (part-of some “cardiovascular system”); in Neo4j: “heart” part-of “cardiovascular system”
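The mapping on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the OLS loader: the `Node`, `Edge` and `to_graph` names, the `EX:` identifiers and the `uberon` label are made up for the example. The relationship type names `SUBCLASSOF` and `Related` (with a `label` property) are taken from the Cypher queries on the later slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    term_id: str
    labels: set                      # e.g. {"Class", "<ontology name>"}
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    source: str
    rel_type: str                    # "SUBCLASSOF" or "Related"
    target: str
    label: Optional[str] = None      # object property name for "Related" edges


def to_graph(ontology_name, classes, axioms):
    """classes: {term_id: annotation dict}.
    axioms: (sub, prop, sup) triples; prop is None for a named superclass,
    or a property name for a simple existential (sub subClassOf prop some sup)."""
    nodes = {
        term_id: Node(term_id, {"Class", ontology_name}, dict(annotations))
        for term_id, annotations in classes.items()
    }
    edges = []
    for sub, prop, sup in axioms:
        if prop is None:
            edges.append(Edge(sub, "SUBCLASSOF", sup))
        else:
            edges.append(Edge(sub, "Related", sup, prop))
    return nodes, edges


# Hypothetical identifiers, for illustration only.
nodes, edges = to_graph(
    "uberon",
    {
        "EX:heart": {"label": "heart"},
        "EX:organ": {"label": "organ"},
        "EX:cardiovascular_system": {"label": "cardiovascular system"},
    },
    [
        ("EX:heart", None, "EX:organ"),
        ("EX:heart", "part_of", "EX:cardiovascular_system"),
    ],
)
```

Anything more complex than a named superclass or a simple existential restriction has no place in this structure, which matches the "simplified OWL" trade-off mentioned in the conclusion.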

Page 13: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

What are the subtypes of “colorblindness”?
MATCH (n:Class {obo_id: 'HP:0007641'})<-[r*]-(types:Class)
RETURN n, r, types
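What the variable-length pattern `<-[r*]-` does can be illustrated outside Neo4j as a breadth-first walk down incoming edges. The sketch below is an assumption-laden toy: the two is-a edges are read off the "ontology of color blindness" slide (the direction of the arrows is an assumption), and `descendants` is a hypothetical helper, not part of OLS.

```python
from collections import deque

# (child, parent) is-a pairs; the ordering is assumed from the slide diagram.
SUBCLASS_OF = [
    ("HP:0011518", "HP:0007641"),  # Dichromacy is-a Dyschromatopsia
    ("HP:0007641", "HP:0000551"),  # Dyschromatopsia is-a Abnormality of color vision
]


def descendants(term, subclass_of):
    """Return every term reachable by following is-a edges backwards from `term`."""
    children = {}
    for child, parent in subclass_of:
        children.setdefault(parent, []).append(child)
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

With this toy data, `descendants("HP:0007641", SUBCLASS_OF)` contains only Dichromacy, while starting from `HP:0000551` returns both lower terms.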

Page 14: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

What parts of the eye are related to diseases?
MATCH (eye:Class {obo_id: 'UBERON:0000970'})<-[r:Related {label: "part_of"}]-(eye_part:Class)<-[r1:Related {label: "has_disease_location"}]-(disease:Class)
RETURN eye, r, r1, eye_part, disease
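The query is a two-hop pattern: first find terms that are `part_of` the eye, then find terms with a `has_disease_location` edge into those parts. A rough in-memory equivalent, with made-up `EX:` identifiers and a hypothetical `diseases_by_part` helper, might look like this:

```python
# (source, label, target) triples; only UBERON:0000970 comes from the slide,
# the EX: identifiers are invented for the example.
EDGES = [
    ("EX:retina", "part_of", "UBERON:0000970"),
    ("EX:lens", "part_of", "UBERON:0000970"),
    ("EX:retinal_degeneration", "has_disease_location", "EX:retina"),
    ("EX:unrelated_disease", "has_disease_location", "EX:liver"),
]


def diseases_by_part(whole, edges):
    """Map each direct part of `whole` to the diseases located in that part."""
    parts = {s for s, label, t in edges if label == "part_of" and t == whole}
    result = {}
    for source, label, target in edges:
        if label == "has_disease_location" and target in parts:
            result.setdefault(target, set()).add(source)
    return result
```

Like the Cypher pattern, this only follows a single `part_of` hop, so parts of parts are not included.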

Page 15: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Finding common ancestors via shortest path
MATCH p = shortestPath( (a:Class)-[r:SUBCLASSOF*]-(b:Class) )
RETURN nodes(p)

What is the common taxonomic superfamily of Gibbons and Chimpanzees? (or Hylobatidae and Pan troglodytes!)

https://commons.wikimedia.org/wiki/File:Hylobates_lar_pair_of_white_and_black_01.jpg
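The shortest-path trick can be sketched without a database: treat the is-a edges as undirected, run a breadth-first search between the two terms, and take the node where the path stops climbing. The taxonomy below is a simplified toy, and `shortest_path` and `apex` are hypothetical helper names.

```python
from collections import deque

# (child, parent) pairs; a simplified toy slice of the primate taxonomy.
TAXONOMY = [
    ("Pan troglodytes", "Pan"),
    ("Pan", "Homininae"),
    ("Homo", "Homininae"),
    ("Homininae", "Hominidae"),
    ("Hominidae", "Hominoidea"),
    ("Hylobatidae", "Hominoidea"),
]


def shortest_path(start, goal, subclass_of):
    """Breadth-first search over is-a edges, ignoring their direction."""
    neighbours = {}
    for child, parent in subclass_of:
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in sorted(neighbours.get(current, ())):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


def apex(path, subclass_of):
    """Return the node where the path switches from child->parent to parent->child."""
    upward = set(subclass_of)
    i = 0
    while i + 1 < len(path) and (path[i], path[i + 1]) in upward:
        i += 1
    return path[i]


path = shortest_path("Hylobatidae", "Pan troglodytes", TAXONOMY)
common = apex(path, TAXONOMY)
```

In a tree-shaped hierarchy the apex of the path is the lowest common ancestor. In an ontology with multiple inheritance the shortest undirected path may instead run through a shared descendant, so the result should be checked rather than trusted blindly.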

Page 16: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

OLS visualisations
• Partonomy for heart from the UBERON anatomy ontology

MATCH path = (n:Class)-[r:SUBCLASSOF|PartOf*]->(ancestor)

Page 17: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

REST API (Spring Data REST + Neo4j)

• Crawlable API - Hypermedia driven (HAL)

• Get ontology and term metadata
  • /ontologies
  • /ontologies/{name}
  • /ontologies/{name}/terms
  • /ontologies/{name}/terms/{termid}

• Get related terms and navigate ontology structure
  • /ontologies/{name}/terms/{termid}/parent
  • /ontologies/{name}/terms/{termid}/children
  • /ontologies/{name}/terms/{termid}/descendants
  • /ontologies/{name}/terms/{termid}/ancestors
  • /ontologies/{name}/terms/{termid}/{relation} e.g. part_of

http://www.ebi.ac.uk/ols/beta/api
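As a small illustration of how a client might assemble these paths, here is a sketch that only builds URLs and makes no network calls. The helper names are hypothetical, and the slides do not say how `{termid}` must be encoded, so the single round of percent-encoding used here is an assumption that should be checked against the API itself.

```python
from urllib.parse import quote

BASE_URL = "http://www.ebi.ac.uk/ols/beta/api"  # base URL as given on the slide


def ontology_url(name):
    """URL for /ontologies/{name}."""
    return f"{BASE_URL}/ontologies/{quote(name, safe='')}"


def term_url(name, term_id, relation=None):
    """URL for /ontologies/{name}/terms/{termid}[/{relation}].

    Assumption: the term id is percent-encoded once; the real API may
    expect a different form of identifier or encoding.
    """
    url = f"{ontology_url(name)}/terms/{quote(term_id, safe='')}"
    if relation is not None:
        url += f"/{quote(relation, safe='')}"
    return url
```

For example, `term_url("hp", "HP:0007641", "children")` yields the children endpoint for the Dyschromatopsia term under these assumptions.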

Page 18: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Building the index
• We check all 140 external ontology files nightly for changes
• We have a master build index
• When an ontology updates, we remove the old version and reload it using the Neo4j BatchInserter (potentially fragile)
• We push the master index to various production data centers
• Provides load balancing

Nightly crawl of all >140 registered ontologies
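The slides do not say how a changed ontology file is detected. One common approach, shown here purely as an assumption, is to compare a checksum of each downloaded file with the checksum stored from the previous crawl and reload only the ontologies whose checksum differs. The `changed_ontologies` function and the byte strings standing in for file contents are hypothetical.

```python
import hashlib


def changed_ontologies(previous_checksums, current_files):
    """Compare SHA-256 digests of the current files with the stored ones.

    previous_checksums: {ontology name: hex digest} from the last crawl.
    current_files: {ontology name: file content as bytes}.
    Returns (sorted names that need reloading, updated checksum dict).
    """
    updated = dict(previous_checksums)
    changed = []
    for name, content in current_files.items():
        digest = hashlib.sha256(content).hexdigest()
        if previous_checksums.get(name) != digest:
            changed.append(name)
        updated[name] = digest
    return sorted(changed), updated


# First crawl: everything is new. Second crawl: only "hp" has changed.
first_changed, checksums = changed_ontologies({}, {"hp": b"v1", "uberon": b"v1"})
second_changed, checksums = changed_ontologies(checksums, {"hp": b"v2", "uberon": b"v1"})
```

Only the names in the returned list would then go through the remove-and-reload step described above.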

Page 19: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Conclusion
• We’ve built a scalable repository of biomedical ontologies with Neo4j
• Generic OWL indexer (simplified OWL)
• Powerful REST API built with Spring
• Acts as standalone OWL ontology server
• Now being deployed externally
• Beta ~2000 users / 10 million requests per month
• Would like to discuss
  • Batch Inserter
  • Migrating to Spring Data Neo4j 4

Page 20: GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp

Acknowledgements
• Samples, Phenotypes and Ontologies Team - Tony Burdett, James Malone, Dani Welter, Catherine Leroy, Sira Sarntivijai, Ilinca Tudose, Helen Parkinson
• Matt Pearce – Flax (BioSOLR project)
• Michal Bachman and GraphAware team (Neo4j training)
• Funding
  • European Molecular Biology Laboratory (EMBL)
  • European Union projects: DIACHRON, BioMedBridges and CORBEL