open phacts: semantic interoperability for drug discovery · 16/03/2016  · surechembl system...

Page 1

ACS National Meeting, San Diego

Herman van Vlijmen 16 Mar 2016

Open PHACTS:

Semantic interoperability for drug discovery

Judith Hinton Andrew, Rock Composite 22 Artwork from The Creative Center

Page 2

What is Linked Data?

Page 3

What is Linked Data?

"LOD Cloud 2014" by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak - http://lod-cloud.net/. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:LOD_Cloud_2014.svg#/media/File:LOD_Cloud_2014.svg

Page 4

How can data be linked?

Requires linking to standards: common “concepts” – names, units, chemical structures, etc.

Data storage format – Triples, graphs

Query tools – SPARQL

Provenance – Original data source

Chen et al. BMC Bioinformatics 2010, 11:255
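The storage, query and provenance points above can be made concrete with a minimal Python sketch. It is not Open PHACTS code. It stores each statement as a subject-predicate-object triple tagged with its source, and answers a single pattern in which None acts as a wildcard, loosely mimicking one SPARQL triple pattern. The `ex:` identifiers and the source names are hypothetical. A real deployment would use an RDF store queried with SPARQL.

```python
from typing import List, NamedTuple, Optional


class Quad(NamedTuple):
    """A subject-predicate-object triple plus the data source it came from."""
    subject: str
    predicate: str
    obj: str
    source: str


def match(store: List[Quad], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Quad]:
    """Return statements matching one pattern; None acts as a wildcard,
    loosely like a variable in a SPARQL triple pattern."""
    return [
        q for q in store
        if (s is None or q.subject == s)
        and (p is None or q.predicate == p)
        and (o is None or q.obj == o)
    ]


# Hypothetical statements, each tagged with a hypothetical source (provenance).
store = [
    Quad("ex:aspirin", "ex:inhibits", "ex:PTGS1", "sourceA"),
    Quad("ex:aspirin", "ex:inhibits", "ex:PTGS2", "sourceB"),
    Quad("ex:PTGS2", "ex:involvedIn", "ex:inflammation", "sourceC"),
]

for q in match(store, s="ex:aspirin", p="ex:inhibits"):
    print(q.obj, "from", q.source)
```

Because the source travels with every statement, each answer can be traced back to the data set that asserted it.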

Page 5

Linked Data Storage example: RDF triples

Basic format

Linking data sets

Concept standards

http://www.accessola2.com/olita/insideolita/wordpress/?p=60281
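The next sketch shows how a shared concept standard lets two data sets be linked. Two hypothetical sources describe the same compound under different local IDs. An assumed mapping table, in the spirit of owl:sameAs links, rewrites both IDs to one concept URI, so their statements can be retrieved together. All identifiers are made up for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical sources describing the same compound under different local IDs.
source_a: List[Triple] = [
    ("a:cmpd1", "a:activeAgainst", "target:T1"),
    ("a:cmpd2", "a:activeAgainst", "target:T2"),
]
source_b: List[Triple] = [
    ("b:X-17", "b:hasIndication", "disease:D1"),
]

# Hypothetical mapping from local IDs to one shared concept URI
# (the "concept standard" that makes the data sets linkable).
same_as: Dict[str, str] = {
    "a:cmpd1": "concept:C1",
    "b:X-17": "concept:C1",
    "a:cmpd2": "concept:C2",
}


def normalize(triples: List[Triple], mapping: Dict[str, str]) -> List[Triple]:
    """Rewrite subjects to their shared concept URI where a mapping exists."""
    return [(mapping.get(s, s), p, o) for s, p, o in triples]


def facts_about(concept: str, triples: List[Triple]) -> List[Tuple[str, str]]:
    """All (predicate, object) pairs stated about one concept."""
    return [(p, o) for s, p, o in triples if s == concept]


linked = normalize(source_a, same_as) + normalize(source_b, same_as)
print(facts_about("concept:C1", linked))
# [('a:activeAgainst', 'target:T1'), ('b:hasIndication', 'disease:D1')]
```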

Page 6

Examples of Linked Data challenges

Data types and units for pharmacological activity in ChEMBL

Lee and Gobbi. J. Chem. Inf. Model. 2012, 52, 285–292

Stereochemistry

Tautomerism
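The units part of this problem can be illustrated with a small normalization step. It converts IC50 values reported in different concentration units to a single scale, pIC50 = -log10(IC50 in mol/L). This is a minimal sketch that assumes the values are IC50s and that the unit strings are limited to those listed. It does not address stereochemistry or tautomerism, which need a cheminformatics toolkit.

```python
import math

# Factors converting common concentration units to mol/L.
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "µM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_pic50(value: float, unit: str) -> float:
    """Convert an IC50 reported in `unit` to pIC50 = -log10(IC50 in mol/L)."""
    if unit not in TO_MOLAR:
        raise ValueError(f"unsupported unit: {unit!r}")
    if value <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(value * TO_MOLAR[unit])


# The same potency reported three different ways.
records = [(100, "nM"), (0.1, "uM"), (1e-7, "M")]
print([round(to_pic50(v, u), 6) for v, u in records])  # [7.0, 7.0, 7.0]
```

The function rejects unknown units instead of guessing. Silently mixing units is exactly the kind of error that makes merged activity data unreliable.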

Page 7

Why do we need Linked Data?

Multiple data sources can be queried at once

– For example: in-house data, ChEMBL, PubChem, Thomson Reuters, DrugBank and GOSTAR all have compound pharmacology data

– Time savings

– Confidence of getting the full picture from private, public, and commercial data

Complex questions can be asked relatively easily

– Databases from multiple domains, e.g. compounds, diseases, genes, pathways, etc.

– Scientists will ask questions they would not ask otherwise

Completely new types of analysis

– Network-based queries and semantic reasoning: not possible previously

Page 8

Answering more complex questions

Page 9

Page 10

Answering more complex questions

TODAY: What are the Janssen compounds active in this Janssen assay?

WITH LINKED DATA: Give me all internal/commercial/public data on compounds that are active on my target and other closely related targets.

TODAY: What is the difference in gene expression profile between tumor and normal tissue?

WITH LINKED DATA: Given the differences in gene expression profiles between these tissues, give me the compounds whose biochemical activity profiles most closely resemble the difference profile.

TODAY: I have a CDK2 lead compound. Is there anything known in PubMed on toxicity of CDK2 inhibitors?

WITH LINKED DATA: Given my CDK2 lead compound, what are the most likely mechanisms by which this compound class could cause toxicity?
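The expression-profile question can be framed as a ranking problem. Assume each compound has an activity value for the product of each gene in the difference profile. Take Pearson correlation as the similarity measure; this is one possible choice, not one specified on the slide. Compounds can then be ordered by how closely their profile resembles the tumor-minus-normal difference. The gene order, compound names and values below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / (sx * sy)


# Hypothetical tumor-minus-normal expression differences for genes G1..G4.
diff_profile: List[float] = [2.0, -1.0, 0.5, 0.0]

# Hypothetical activity profiles (e.g. pIC50 against the products of G1..G4).
compounds: Dict[str, List[float]] = {
    "cmpd_A": [8.0, 5.0, 6.5, 6.0],
    "cmpd_B": [5.0, 8.0, 6.0, 6.0],
}

ranked = sorted(compounds, key=lambda c: pearson(compounds[c], diff_profile), reverse=True)
print(ranked)  # ['cmpd_A', 'cmpd_B']
```

Correlation compares the shape of two profiles and ignores their absolute scale. That matters here, because expression differences and pIC50 values are measured in different units.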

Page 11

New types of analysis with Linked Data

TODAY: Search PubMed for a potential target-disease association: “bcl2 and schizophrenia”

WITH LINKED DATA: Show me all possible direct and indirect links between bcl2 and schizophrenia, ranked by level of scientific data support.

TODAY: Search a gene-disease association database like DisGeNET for possible genes/proteins that can serve as biomarkers for colorectal cancer.

WITH LINKED DATA: Based on all data that I have access to, provide a prioritized list of potential biomarkers for colorectal cancer that satisfy specific tissue constraints and are obtainable from blood, urine, or stool.
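The “direct and indirect links” question amounts to path finding in a knowledge graph. This sketch enumerates simple paths up to a maximum number of hops and ranks them by their weakest link. Each edge carries a count of supporting sources. The graph, the evidence counts and the weakest-link scoring rule are all illustrative assumptions. They are not real data, and they are not the Euretos method.

```python
from typing import Dict, List, Tuple

# Hypothetical undirected knowledge graph: edge -> number of supporting sources.
EDGES: Dict[Tuple[str, str], int] = {
    ("bcl2", "schizophrenia"): 2,
    ("bcl2", "apoptosis"): 40,
    ("apoptosis", "schizophrenia"): 5,
    ("bcl2", "BAX"): 30,
    ("BAX", "apoptosis"): 25,
}


def adjacency(edges: Dict[Tuple[str, str], int]) -> Dict[str, Dict[str, int]]:
    """Build a symmetric neighbor map from the edge list."""
    adj: Dict[str, Dict[str, int]] = {}
    for (a, b), support in edges.items():
        adj.setdefault(a, {})[b] = support
        adj.setdefault(b, {})[a] = support
    return adj


def ranked_paths(start: str, goal: str, edges: Dict[Tuple[str, str], int],
                 max_hops: int = 3) -> List[Tuple[float, List[str]]]:
    """Enumerate simple paths of at most max_hops edges from start to goal,
    scored by their weakest link; best score first, shorter paths break ties."""
    adj = adjacency(edges)
    results: List[Tuple[float, List[str]]] = []

    def dfs(node: str, path: List[str], support: float) -> None:
        if node == goal:
            results.append((support, path))
            return
        if len(path) - 1 >= max_hops:
            return
        for nxt, w in adj.get(node, {}).items():
            if nxt not in path:  # keep paths simple (no revisits)
                dfs(nxt, path + [nxt], min(support, w))

    dfs(start, [start], float("inf"))
    return sorted(results, key=lambda r: (-r[0], len(r[1])))


for score, path in ranked_paths("bcl2", "schizophrenia", EDGES):
    print(score, " -> ".join(path))
```

With these made-up counts, the indirect route through apoptosis outranks the weakly supported direct link. A plain literature search for “bcl2 and schizophrenia” would not surface that route.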

Page 12

Example: Gene variant disease association workflow

Step 1

Step 2

Slides from Euretos (www.euretos.com)

Page 13

Gene variant disease association workflow

Step 3

Step 4

Slides from Euretos (www.euretos.com)

Page 14

Gene variant disease association workflow: bcl2 – schizophrenia

Slides from Euretos (www.euretos.com)

Page 15

Current activities in the Linked Data environment: some examples

Open PHACTS: an EU- and pharma-sponsored IMI (Innovative Medicines Initiative) project to develop a Linked Data database and semantic applications in the biomedical field (2011–2016)

ELIXIR: a sustainable European infrastructure for biological information. Data interoperability is a key objective

Strong emphasis on making data sources FAIR (Findable, Accessible, Interoperable, Reusable) in ongoing ELIXIR and NIH activities

Development of advanced Linked Data analysis tools – for example: Euretos, Cambridge Semantics, Ontoforce

Pharma and biotech companies are actively integrating internal databases with public and commercial ones, working with data companies and public-private consortia

Page 16

Open PHACTS consortium partners

Associated partners

Consortium partners

Page 17

www.openphacts.org

Mission: Integrate multiple biomedical research data resources into a single open, sustainable, and free access point.

www.openphactsfoundation.org

The Open PHACTS Foundation is a registered charity dedicated to sustaining and developing the Open PHACTS Discovery Platform after completion of the IMI project.

Page 18

Open PHACTS data sources

Page 19

Applications that use the Open PHACTS API

API freely accessible via http://dev.openphacts.org

App Ecosystem

Page 20

Open PHACTS access

Free API access: http://dev.openphacts.org

Virtual machine install of Open PHACTS behind a firewall, using a Docker image

– Beta testing with a pharma partner

– Allows you to customize and load your own data

Page 21

Phenotypic Drug Discovery Workflows

Digles et al., MedChemComm, submitted

“Knowing the knowns”

Page 22

Recent Open PHACTS developments: Patent Info

Huge amount of knowledge in the patent corpus, most of which will never be published elsewhere, but which is potentially of great value to drug discovery

SureChEMBL system (EBI) already extracts compounds from these documents

Open PHACTS consortium-funded project to also extract gene/disease information (EMBL-EBI and SciBite)

~4 million patents in total, 260 million annotations (patent-compound, patent-gene or patent-disease associations)

Example use cases:

– For a given target, give me all the compounds that are linked to this target through patents

– For a given disease, give me all the targets that are linked to this disease through patents

– Tell me how reliable these links are
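The patent use cases can be approximated with co-occurrence. A compound and a target (here a gene) count as linked if both are annotated in the same patent. The number of distinct supporting patents serves as a crude reliability score. This is a minimal sketch under that assumption. The annotations are hypothetical, and the actual SureChEMBL/SciBite extraction pipeline is not described here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical patent annotations: (patent_id, entity_type, entity).
ANNOTATIONS: List[Tuple[str, str, str]] = [
    ("P1", "compound", "cmpd_X"), ("P1", "gene", "GENE_A"),
    ("P2", "compound", "cmpd_X"), ("P2", "gene", "GENE_A"), ("P2", "disease", "disease_1"),
    ("P3", "compound", "cmpd_Y"), ("P3", "gene", "GENE_A"),
    ("P4", "gene", "GENE_B"), ("P4", "disease", "disease_1"),
]


def linked_via_patents(annotations: List[Tuple[str, str, str]], query: str,
                       wanted_type: str) -> List[Tuple[str, int]]:
    """Entities of wanted_type that co-occur with `query` in at least one patent,
    paired with the number of distinct supporting patents (a crude reliability score)."""
    by_patent: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for patent, etype, entity in annotations:
        by_patent[patent].add((etype, entity))

    support: Dict[str, Set[str]] = defaultdict(set)
    for patent, entities in by_patent.items():
        if any(entity == query for _, entity in entities):
            for etype, entity in entities:
                if etype == wanted_type and entity != query:
                    support[entity].add(patent)

    # Most-supported links first; ties broken alphabetically.
    return sorted(((e, len(p)) for e, p in support.items()), key=lambda t: (-t[1], t[0]))


print(linked_via_patents(ANNOTATIONS, "GENE_A", "compound"))  # [('cmpd_X', 2), ('cmpd_Y', 1)]
print(linked_via_patents(ANNOTATIONS, "disease_1", "gene"))   # [('GENE_A', 1), ('GENE_B', 1)]
```

Counting patents is only a first proxy for reliability. A production system would also weigh where in the document each annotation occurs.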

Page 23

Acknowledgements

Janssen – Edgar Jacoby

– Jean-Marc Neefs

– Dmitrii Rassokhin

– Doug Martin

Open PHACTS and Open PHACTS Foundation

Euretos – Albert Mons

– Arie Baak