RDFizing the EBI Gene Expression Atlas James Malone, Electra Tapanari malone@ebi.ac.uk


Post on 17-Jan-2016


Page 1:

RDFizing the EBI Gene Expression Atlas

James Malone, Electra Tapanari

malone@ebi.ac.uk

Page 2:

Motivation

- Initial motivation is explorative
- Can we ask new questions?
- Do we get new answers?
- Can we integrate this data with other related data?
- Is there a sufficient user community to justify an RDF Atlas resource?

Page 3:

SESL Project

- Semantic Enrichment of Scientific Literature Working Group

- Includes EBI (Dietrich Rebholz) and Pistoia Alliance

- Pilot project in 2010 on developing knowledge brokering standards for semantic integration of gene to Type II diabetes data, using Gene Expression Atlas, OMIM, UniProt literature

Page 4:

Gene Expression: Archive to Atlas

[Diagram labels: AE/GEO acquire; >250,000 assays; >10,000 experiments; curation; ArrayExpress; re-annotate & summarize; Atlas]

Page 5:

Experimental Factor Ontology

• We consume parts of reference ontologies from the domain
• Construct new classes and relations to answer our use cases
• Aim is reuse of existing resources, shared frameworks and mapping of equivalencies where they exist

[Diagram: EFO, surrounded by its sources: Disease Ontology; Anatomy Reference Ontology; Ontology for Biomedical Investigations; Chemical Entities of Biological Interest (ChEBI); various species anatomy ontologies; Relation Ontology; text mining]

Page 6:

Gene Expression Atlas @ www.ebi.ac.uk/gxa

Query for Cell adhesion genes in all ‘organism parts’

‘View on EFO’

Ontologically Modeling Sample Variables in Gene Expression Data

Page 7:

Input XML

Page 8:

Mapping XML Results to RDF (1)

Id here is an Ensembl gene ID, e.g. RUNX1 (ENSG00000159216)

• Gene to related transcripts, sequence and gene functions
• Also EFO ontology classes in RDF form (shown is label to IRI triple)
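
As a rough illustration of the label triples mentioned on this slide, the Python sketch below serializes two statements as N-Triples. The base IRIs, the choice of rdfs:label, and the helper names are assumptions made for illustration; the transcript does not show the vocabulary the Atlas RDF actually uses.

```python
# Assumed namespaces, for illustration only.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GENE_NS = "http://identifiers.org/ensembl/"
EFO_NS = "http://www.ebi.ac.uk/efo/"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Join three already-serialized terms into one N-Triples line."""
    return f"{subject} {predicate} {obj} ."


def gene_label_triple(ensembl_id, symbol):
    """Gene IRI -> human-readable gene symbol."""
    return triple(iri(GENE_NS + ensembl_id), iri(RDFS_LABEL), literal(symbol))


def efo_label_triple(efo_id, label):
    """EFO class IRI -> class label."""
    return triple(iri(EFO_NS + efo_id), iri(RDFS_LABEL), literal(label))


print(gene_label_triple("ENSG00000159216", "RUNX1"))
print(efo_label_triple("EFO_0000001", "experimental factor"))
```

Writing the triples as plain strings keeps the sketch dependency-free; a real pipeline would more likely use an RDF library to handle serialization.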

Page 9:

Mapping XML Results to RDF (2)

• Connecting gene and ontology id together with experimental metrics
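
One way to picture this connection is a blank node per result row that points at the gene, the EFO class, and each metric. The sketch below assumes that shape; the `http://example.org/atlas#` vocabulary, the property names, and the metric values are all hypothetical and are not taken from the Atlas RDF.

```python
EX = "http://example.org/atlas#"  # hypothetical vocabulary, not the Atlas one


def result_triples(node_id, gene_iri, efo_iri, metrics):
    """Link a gene, an EFO class and its metrics through one blank node."""
    node = f"_:{node_id}"
    triples = [
        f"{node} <{EX}gene> <{gene_iri}> .",
        f"{node} <{EX}factor> <{efo_iri}> .",
    ]
    # One literal-valued triple per metric, in a stable (sorted) order.
    for name in sorted(metrics):
        triples.append(f'{node} <{EX}{name}> "{metrics[name]}" .')
    return triples


rows = result_triples(
    "n1_0",
    "http://identifiers.org/ensembl/ENSG00000159216",
    "http://www.ebi.ac.uk/efo/EFO_0000001",
    {"pValue": 0.01, "expression": "UP"},  # made-up metric names and values
)
print("\n".join(rows))
```

The blank node carries no identity of its own; it only groups the statements that belong to one gene/ontology-term result.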

Page 10:

Mapping XML Results to RDF (3)

• Connecting gene with experimental metadata

Page 11:

Relationship Issues

• EFO attempts to follow OBO Foundry guidance and uses the OBO Relation Ontology

• OBI model is more complex, e.g. the relation between sample and measure is indirect*

• The relationship between some entities is still not well represented across the community, even protein product to gene (see my post to the OBO list)

• is_about relation is very generic and largely meaningless

• We will use RO where possible, subclass RO otherwise and continue to monitor OBO

*See Brinkman et al. (2010), Modeling biomedical experimental processes with OBI, J Biomed Semantics 1(Suppl 1):S7

Page 12:

Display of query results in Gene Expression Atlas DB

Already:
1) JSON format
2) XML format

Plus now:
3) RDF format

Page 13:

RDF pipeline

[Diagram: INPUT (XML result doc from Atlas; XML doc with triple patterns) → PROCESS (Java code) → OUTPUT (RDF triples)]

• Pipeline for generating the RDF given the XML input

• Note that this works with any XML document
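
The pipeline in the talk is implemented in Java and its pattern format is not reproduced in this transcript. Purely as a sketch of the idea, the Python below applies a pattern document to a result document; both XML layouts, the `{field}` placeholder syntax, and the `apply_patterns` name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical result document; the real Atlas XML schema is not shown here.
RESULT_XML = """<results>
  <result><id>ENSG00000159216</id><name>RUNX1</name></result>
</results>"""

# Hypothetical pattern document: {field} placeholders name child elements.
PATTERN_XML = """<patterns>
  <pattern subject="http://identifiers.org/ensembl/{id}"
           predicate="http://www.w3.org/2000/01/rdf-schema#label"
           object="{name}" objectType="literal"/>
</patterns>"""


def apply_patterns(result_xml, pattern_xml, row_tag="result"):
    """Emit one N-Triples line per (row, pattern) pair that can be filled."""
    patterns = ET.fromstring(pattern_xml).findall("pattern")
    triples = []
    for row in ET.fromstring(result_xml).iter(row_tag):
        # Map each child element name to its text content.
        fields = {child.tag: (child.text or "").strip() for child in row}
        for pattern in patterns:
            try:
                subject = pattern.get("subject").format(**fields)
                obj = pattern.get("object").format(**fields)
            except KeyError:
                continue  # row lacks a field this pattern needs
            if pattern.get("objectType") == "iri":
                obj_term = f"<{obj}>"
            else:
                escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
                obj_term = f'"{escaped}"'
            triples.append(
                f"<{subject}> <{pattern.get('predicate')}> {obj_term} ."
            )
    return triples


for line in apply_patterns(RESULT_XML, PATTERN_XML):
    print(line)
```

Because the patterns live in their own document, the same code can be pointed at a different XML source by changing only the patterns and the row tag, which is the property the slide highlights.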

Page 14:

Triple Pattern specification

Page 15:

Example RDF

Page 16:

Blank Node Connections

• First row (n1_0): 7 triples

Page 17:

Discussion

• Is there a community that warrants directing resources towards this?

• Can we answer new questions?

• Can we integrate with other data sources?

• Can we consolidate complex, non-interoperable ontologies?

• EFO represents a view on this but is a scoped, pragmatic choice – will this indeed always be the case?

Page 18:

Acknowledgements

• Electra Tapanari (intern who did the bulk of the implementation)

• Dietrich Rebholz-Schuhmann (funding the internship)

• Christoph Grabmuller

• Misha Kapushesky

• Helen Parkinson

• Contact me

James Malone: malone@ebi.ac.uk