Open innovation and chemistry data management contributions from RSC resulting from the Open PHACTS project
Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov, Colin Batchelor, Jon Steele & David Sharpe
ACS San Francisco, August 2014

Upload: karen-karapetyan

Post on 09-Jul-2015


DESCRIPTION

The Royal Society of Chemistry was pleased to contribute to the Open PHACTS project, a three-year project funded by the Innovative Medicines Initiative fund from the European Union. Over those three years we developed our existing platforms, created new widgets and data platforms to handle chemistry data, extended existing chemistry ontologies and embraced open semantic web standards. As a result, RSC served as the centralized chemistry data hub for the project. With the conclusion of the Open PHACTS project we will report on our experiences of participating in it, and give an overview of the tools, capabilities and data that have been released into the community as a result and how they may influence future projects. This will include the Open PHACTS open chemistry data dump, which provides the chemistry-related data in both chemistry and semantic web consumable formats, as well as some of the resulting chemistry software released to the community. The Open PHACTS project resulted in significant contributions to the chemistry community as well as to the supporting pharmaceutical companies and the biomedical community.

TRANSCRIPT

Page 1: Open innovation contributions from RSC resulting from the Open Phacts project

Open innovation and chemistry data management contributions

from RSC resulting from the Open PHACTS project

Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov, Colin Batchelor, Jon Steele & David Sharpe

ACS San Francisco

August 2014

Page 2: Open innovation contributions from RSC resulting from the Open Phacts project

What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known Pathways?
Working On Now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?

Page 3: Open innovation contributions from RSC resulting from the Open Phacts project

Fundamental issue:

•There is a LOT of science online!

•Chaotic, varying quality and very valuable!

•Scientists want to find information quickly and easily

•Often they just “can’t get there” (or don’t even know where “there” is)

•And you have to manage it all (or not)

Page 4: Open innovation contributions from RSC resulting from the Open Phacts project

Pre-competitive Informatics: Pharma are all accessing, processing, storing & re-processing external research data

Diagram labels: Literature, PubChem, Genbank, Patents, Databases, Downloads; Data Integration, Data Analysis, Firewalled Databases; Repeat @ each company

Lowering industry firewalls: pre-competitive informatics in drug discovery Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944

Page 5: Open innovation contributions from RSC resulting from the Open Phacts project

Data sources: ChEMBL, DrugBank, Gene Ontology, Wikipathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, TrialTrove, GVKBio, GeneGo, TR Integrity

“Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”

“What is the selectivity profile of known p38 inhibitors?”

“Let me compare MW, logP and PSA for known oxidoreductase inhibitors”

Page 6: Open innovation contributions from RSC resulting from the Open Phacts project

Business Question Driven Approach

Page 7: Open innovation contributions from RSC resulting from the Open Phacts project

• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic web technologies
• Open source code, open data and open standards
• Academics, Pharmas, Publishers…
• To put medicines in the pipeline…

Page 8: Open innovation contributions from RSC resulting from the Open Phacts project

The Open PHACTS community ecosystem

Page 9: Open innovation contributions from RSC resulting from the Open Phacts project

Originally used ChemSpider..

Page 10: Open innovation contributions from RSC resulting from the Open Phacts project

Open PHACTS Deliverables

• Many details but overall…
• Deliver an Open Source chemical registry service, independent of ChemSpider
• Development of Open Source CVSP platform
• Deliver widgets and APIs to the project
• Deliver high quality, standardized Open Data
• Deliver structure data in RDF format

Page 11: Open innovation contributions from RSC resulting from the Open Phacts project

Standardize

• Use the SRS as guidance for standardization
• Adjust as necessary to our needs

Page 12: Open innovation contributions from RSC resulting from the Open Phacts project

Nitro groups

Page 13: Open innovation contributions from RSC resulting from the Open Phacts project

Salt and Ionic Bonds

Page 14: Open innovation contributions from RSC resulting from the Open Phacts project

Depositions Gateway User Interface

Page 15: Open innovation contributions from RSC resulting from the Open Phacts project

Validate and Standardize

Page 17: Open innovation contributions from RSC resulting from the Open Phacts project

CVSP Filtering of DrugBank

Page 18: Open innovation contributions from RSC resulting from the Open Phacts project

ChEMBL (1.3 million records)

• 11,020 records with 4 bonds and zero charge, e.g. CHEMBL501101 or CHEMBL501973

• 271 records with hypervalent oxygen (e.g. CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphine

• 6,177 records where direction of bond makes no sense, e.g. CHEMBL12760 and CHEMBL34704
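
The kind of check behind the counts above can be illustrated with a minimal sketch. This is not the CVSP rule set: the atom and bond representation, the `MAX_NEUTRAL_VALENCE` table and the function name are assumptions made for illustration. The check only flags neutral atoms whose summed bond orders exceed a typical valence, so a flagged atom is a candidate for review rather than a confirmed error.

```python
# Hypothetical table of typical maximum valences for neutral atoms.
MAX_NEUTRAL_VALENCE = {"H": 1, "B": 3, "C": 4, "N": 3, "O": 2, "F": 1}


def find_valence_issues(atoms, bonds):
    """Return (atom_index, element, bond_order_sum) for suspicious atoms.

    atoms: list of (element, formal_charge) tuples
    bonds: list of (atom_index_a, atom_index_b, bond_order) tuples
    Implicit hydrogens are ignored in this sketch.
    """
    order_sum = [0] * len(atoms)
    for a, b, order in bonds:
        order_sum[a] += order
        order_sum[b] += order

    issues = []
    for idx, (element, charge) in enumerate(atoms):
        limit = MAX_NEUTRAL_VALENCE.get(element)
        if limit is None or charge != 0:
            continue  # only neutral atoms of listed elements are checked
        if order_sum[idx] > limit:
            issues.append((idx, element, order_sum[idx]))
    return issues


# A nitrogen drawn with four single bonds but no formal charge.
atoms = [("N", 0), ("C", 0), ("C", 0), ("C", 0), ("C", 0)]
bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)]
print(find_valence_issues(atoms, bonds))  # [(0, 'N', 4)]
```

Giving the same nitrogen a formal charge of +1 makes the check pass, which is the usual fix for a quaternary ammonium centre.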

Page 19: Open innovation contributions from RSC resulting from the Open Phacts project

Diagram nodes: OPS1, OPS2, OPS3, OPS4, OPS5, OPS6; DrugBank ID DB07241

ops:OPS1 skos:exactMatch <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB07241> .

ops:OPS2 skos:relatedMatch ops:OPS1 .

ops:OPS3 skos:relatedMatch ops:OPS1 .

ops:OPS3 skos:closeMatch ops:OPS4 .

ops:OPS3 skos:closeMatch ops:OPS5 .

ops:OPS4 skos:closeMatch ops:OPS6 .

ops:OPS5 skos:closeMatch ops:OPS6 .

Chemical Registry Service
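
The linkset statements on this slide can be explored with a short sketch. The triples are copied from the slide; the function name and the idea of choosing which SKOS predicates to follow are assumptions for illustration, not the Chemical Registry Service API. The three SKOS mapping properties used here are symmetric, so links are followed in both directions. Chaining `skos:closeMatch` and `skos:relatedMatch` links is a deliberate relaxation, because SKOS does not define them as transitive.

```python
from collections import defaultdict

# Linkset statements copied from the slide, as (subject, predicate, object).
DRUGBANK = "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB07241"
TRIPLES = [
    ("ops:OPS1", "skos:exactMatch", DRUGBANK),
    ("ops:OPS2", "skos:relatedMatch", "ops:OPS1"),
    ("ops:OPS3", "skos:relatedMatch", "ops:OPS1"),
    ("ops:OPS3", "skos:closeMatch", "ops:OPS4"),
    ("ops:OPS3", "skos:closeMatch", "ops:OPS5"),
    ("ops:OPS4", "skos:closeMatch", "ops:OPS6"),
    ("ops:OPS5", "skos:closeMatch", "ops:OPS6"),
]


def linked_resources(start, triples, predicates):
    """Collect everything reachable from `start` via the chosen predicates,
    treating each link as undirected."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p in predicates:
            neighbours[s].add(o)
            neighbours[o].add(s)

    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    seen.discard(start)
    return sorted(seen)


strict = linked_resources(DRUGBANK, TRIPLES, {"skos:exactMatch"})
relaxed = linked_resources(
    DRUGBANK, TRIPLES, {"skos:exactMatch", "skos:relatedMatch", "skos:closeMatch"}
)
print(strict)   # ['ops:OPS1']
print(relaxed)  # ['ops:OPS1', 'ops:OPS2', 'ops:OPS3', 'ops:OPS4', 'ops:OPS5', 'ops:OPS6']
```

Following only exact matches returns the single registered structure, while the relaxed set also pulls in the related and closely matching forms.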

Page 20: Open innovation contributions from RSC resulting from the Open Phacts project

Open Sourcing Data and Code

• All Open PHACTS data is licensed as Open Data and available from Open PHACTS website – ca. 2 Million chemicals

• The Chemical Registration Service, including the Chemical Validation and Standardization Platform, is being prepared for release as Open Source now!

Page 21: Open innovation contributions from RSC resulting from the Open Phacts project

RSC data in Open PHACTS

1. Molecule synonyms and identifiers
2. Linksets between ChEBI, ChEMBL, DrugBank and OPS identifiers
3. Molecule–molecule relations (“parent–child”) of interest for drug discovery
4. Calculated physicochemical properties for compounds (both molecular and macroscopic)

Page 22: Open innovation contributions from RSC resulting from the Open Phacts project

Our RDF schema

Two dozen calculated properties, >10^6 molecules

• CHEMINF ontology for cheminformatics
• QUDT for units and numeric values
• ChemSpider IDs for molecules

Page 23: Open innovation contributions from RSC resulting from the Open Phacts project

Synonyms and identifiers

Newly added to the CHEMINF ontology:

• Validated ChemSpider synonyms
• Unvalidated ChemSpider synonyms
• Validated database identifiers
• Unvalidated database identifiers
• InChI, InChIKey, SMILES
• Preferred ChemSpider name

Page 24: Open innovation contributions from RSC resulting from the Open Phacts project

Physicochemical properties
• log P
• log D (at pH 5.5 and 7.4)
• bioconcentration factor KOC (at pH 5.5, at pH 7.4)
• index of refraction
• polar surface area
• molar refractivity
• molar volume
• polarizability
• surface tension
• density at STP
• flash point at 1 atm
• boiling point at 1 atm
• enthalpy of vaporization at STP
• vapour pressure at STP

Page 25: Open innovation contributions from RSC resulting from the Open Phacts project

RDF exports from CRS

Page 26: Open innovation contributions from RSC resulting from the Open Phacts project

RDF graph for a calculated log P value (diagram)

Nodes: OPS benzene; benzene’s connection table; CHEMINF connection table; calculation process; ACD/Labs PhysChem software library version 12.01; calculation result; CHEMINF calculated log P; “2.17”^^xsd:float; “0.234”^^xsd:float; QUDT dimensionless quantity

Edge labels: IAO is about; OBI has specified input; OBI has specified output; QUDT has value; QUDT has standard uncertainty; QUDT has unit; CHEMINF execution of; rdf:type

It is actually more complicated..
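
One plausible reading of this diagram can be written out as triples. The sketch below is an illustration only: the edge assignments are inferred from the slide labels, and the prefixed names (`iao:is_about`, `qudt:has_value` and so on) and the `_logp_result` style node names are hypothetical placeholders that do not reproduce the real ontology IRIs or the CRS export.

```python
# Placeholder names: the labels come from the slide, but the identifiers
# below are hypothetical and do not reproduce the real ontology IRIs.
def calculated_logp_triples(compound, value, uncertainty, software):
    """Build (subject, predicate, object) triples for one calculated log P."""
    result = compound + "_logp_result"
    process = compound + "_logp_calculation"
    table = compound + "_connection_table"
    return [
        (result, "rdf:type", "cheminf:calculated_log_P"),
        (result, "iao:is_about", compound),
        (result, "qudt:has_value", f'"{value}"^^xsd:float'),
        (result, "qudt:has_standard_uncertainty", f'"{uncertainty}"^^xsd:float'),
        (result, "qudt:has_unit", "qudt:dimensionless_quantity"),
        (table, "rdf:type", "cheminf:connection_table"),
        (process, "obi:has_specified_input", table),
        (process, "obi:has_specified_output", result),
        (process, "cheminf:execution_of", f'"{software}"'),
    ]


def to_statement_lines(triples):
    """Render triples as one 'subject predicate object .' statement per line."""
    return [f"{s} {p} {o} ." for s, p, o in triples]


triples = calculated_logp_triples(
    "ops:benzene",
    "2.17",
    "0.234",
    "ACD/Labs PhysChem software library version 12.01",
)
print("\n".join(to_statement_lines(triples)))
```

Keeping the calculation process as its own node is what lets the value, its uncertainty and the software version travel together with each property.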

Page 27: Open innovation contributions from RSC resulting from the Open Phacts project

What’s built on top of this?

Page 28: Open innovation contributions from RSC resulting from the Open Phacts project

Important for other projects

• Multiple outputs from the project available for reuse to underpin other projects:
• Chemical registry service
• Chemical validation and standardization
• APIs and visualization widgets

Page 29: Open innovation contributions from RSC resulting from the Open Phacts project

New Repository Architecture
doi: 10.1007/s10822-014-9784-5

Page 30: Open innovation contributions from RSC resulting from the Open Phacts project

New Repository Architecture

Data tier: Compounds, Reactions, Spectra, Materials, Documents

Data access tier: Compounds API, Reactions API, Spectra API, Materials API, Documents API

User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Materials Widgets, Documents Widgets

User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, Paid 3rd party integrations (various platforms – SharePoint, Google, etc)

Page 31: Open innovation contributions from RSC resulting from the Open Phacts project

Input data pipeline

Deposition Gateway

Inputs: Web UI for unified depositions; DropBox, Google Drive, SkyDrive, etc; LabTrove and other templated data; API, FTP, etc

Processing modules: Compounds Module, Reactions Module, Spectra Module, Materials Module, Textmining Module

Databases: Compounds, Reactions, Spectra, Materials, Documents, Articles / CSSP

Data flow labels: Raw data, Staging databases, Validated data

All databases are sliced by data sources/data collections and have a simple security model where each data slice/source is private, public or embargoed
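
The slice-level security model described on this slide can be sketched as follows. The `owner` field, the `embargo_until` date and the rule that an embargoed slice becomes readable once that date has passed are assumptions made for illustration. The slide only states that each slice is private, public or embargoed.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass
class DataSlice:
    source: str                           # data source / data collection name
    owner: str                            # assumed: the depositor always has access
    visibility: Visibility
    embargo_until: Optional[date] = None  # assumed: embargo lifts on this date


def can_read(data_slice, user, today):
    """Decide whether `user` may read `data_slice` on the given day."""
    if user == data_slice.owner:
        return True
    if data_slice.visibility is Visibility.PUBLIC:
        return True
    if data_slice.visibility is Visibility.EMBARGOED:
        return (
            data_slice.embargo_until is not None
            and today >= data_slice.embargo_until
        )
    return False  # private slices are visible to the owner only


embargoed = DataSlice("LabTrove", "alice", Visibility.EMBARGOED, date(2015, 1, 1))
print(can_read(embargoed, "bob", date(2014, 8, 1)))  # False
print(can_read(embargoed, "bob", date(2015, 1, 2)))  # True
```

Attaching visibility to the slice rather than to individual records keeps the check to a single lookup per data source.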

Page 32: Open innovation contributions from RSC resulting from the Open Phacts project

Compounds

Page 33: Open innovation contributions from RSC resulting from the Open Phacts project

Reactions

Page 34: Open innovation contributions from RSC resulting from the Open Phacts project

Analytical data

Page 35: Open innovation contributions from RSC resulting from the Open Phacts project

For Deposition of Data
• Quality of data at source
• ensuring chemicals are correct - VALIDATION
• reactions map and balance as appropriate – VALIDATION and STANDARDIZATION
• file format handling for analytical data types – binary file formats are proprietary - STANDARDIZATION
• valid interpretation of data – VALIDATION and ANNOTATION
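
As an illustration of the "reactions balance" check mentioned above, here is a minimal sketch that compares element counts on both sides of a reaction. It assumes simple molecular formulas without brackets, charges or isotopes, and the function names are hypothetical. A real validation step would work from structures and atom mappings rather than formula strings.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C2H6O' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(side):
    """Sum element counts for one side: a list of (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in atom_counts(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(reactants, products):
    """True when both sides contain the same number of each element."""
    return side_counts(reactants) == side_counts(products)


# CH4 + 2 O2 -> CO2 + 2 H2O
print(is_balanced([(1, "CH4"), (2, "O2")], [(1, "CO2"), (2, "H2O")]))  # True
# Same reaction with a wrong oxygen coefficient.
print(is_balanced([(1, "CH4"), (1, "O2")], [(1, "CO2"), (2, "H2O")]))  # False
```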

Page 36: Open innovation contributions from RSC resulting from the Open Phacts project

Input data pipeline


Page 37: Open innovation contributions from RSC resulting from the Open Phacts project

Deposition of Data

Page 38: Open innovation contributions from RSC resulting from the Open Phacts project

User Interface Approach


Page 39: Open innovation contributions from RSC resulting from the Open Phacts project
Page 40: Open innovation contributions from RSC resulting from the Open Phacts project
Page 41: Open innovation contributions from RSC resulting from the Open Phacts project

User Interface Approach


Page 42: Open innovation contributions from RSC resulting from the Open Phacts project
Page 43: Open innovation contributions from RSC resulting from the Open Phacts project

Work in Progress

Page 44: Open innovation contributions from RSC resulting from the Open Phacts project

User Interface Approach


Page 45: Open innovation contributions from RSC resulting from the Open Phacts project

A Compounds Repository Interface

Page 46: Open innovation contributions from RSC resulting from the Open Phacts project

The PharmaSea Website

Page 47: Open innovation contributions from RSC resulting from the Open Phacts project

The Open PHACTS community ecosystem

Page 48: Open innovation contributions from RSC resulting from the Open Phacts project

Open PHACTS Project Partners

Pfizer Limited – Coordinator

Universität Wien – Managing entity

Technical University of Denmark

University of Hamburg, Center for Bioinformatics

BioSolveIT GmbH

Consorci Mar Parc de Salut de Barcelona

Leiden University Medical Centre

Royal Society of Chemistry

Vrije Universiteit Amsterdam

Spanish National Cancer Research Centre

University of Manchester

Maastricht University

Aqnowledge

University of Santiago de Compostela

Rheinische Friedrich-Wilhelms-Universität Bonn

AstraZeneca

GlaxoSmithKline

Esteve

Novartis

Merck Serono

H. Lundbeck A/S

Eli Lilly

Netherlands Bioinformatics Centre

Swiss Institute of Bioinformatics

ConnectedDiscovery

EMBL-European Bioinformatics Institute

Janssen

OpenLink

Page 49: Open innovation contributions from RSC resulting from the Open Phacts project

Thank you

Email: [email protected]
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams