Open innovation contributions from RSC resulting from the Open PHACTS project
DESCRIPTION
The Royal Society of Chemistry was pleased to contribute to the Open PHACTS project, a three-year project funded by the Innovative Medicines Initiative fund from the European Union. Over those three years we developed our existing platforms, created new widgets and data platforms to handle chemistry data, extended existing chemistry ontologies and embraced the open standards of the semantic web. As a result, RSC served as the centralized chemistry data hub for the project. With the conclusion of the Open PHACTS project, we report on our experiences from participating and provide an overview of the tools, capabilities and data that have been released into the community as a result, and how this may influence future projects. This includes the Open PHACTS open chemistry data dump, which contains the chemistry-related data in chemistry and semantic web consumable formats, as well as some of the resulting chemistry software released to the community. The Open PHACTS project resulted in significant contributions to the chemistry community as well as to the supporting pharmaceutical companies and the biomedical community.
TRANSCRIPT
Open innovation and chemistry data management contributions
from RSC resulting from the Open PHACTS project
Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov, Colin Batchelor, Jon Steele & David Sharpe
ACS San Francisco
August 2014
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known Pathways?
Working On Now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?
Fundamental issue:
•There is a LOT of science online!
•Chaotic, varying quality and very valuable!
•Scientists want to find information quickly and easily
•Often they just “can’t get there” (or don’t even know where “there” is)
•And you have to manage it all (or not)
Pre-competitive Informatics: Pharma are all accessing, processing, storing & re-processing external research data
Sources: Literature, PubChem, Genbank, Patents, Databases, Downloads
Steps: Data Integration, Data Analysis, Firewalled Databases
Repeat @ each company
Lowering industry firewalls: pre-competitive informatics in drug discovery Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944
ChEMBL
DrugBank
Gene Ontology
Wikipathways
UniProt
ChemSpider
UMLS
ConceptWiki
ChEBI
TrialTrove
GVKBio
GeneGo
TR Integrity
“Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”
“What is the selectivity profile of known p38 inhibitors?”
“Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
Business Question Driven Approach
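The first question above combines three filters: pathway membership of the target, assay type, and a potency cut-off. The following is a minimal in-memory sketch of that filtering logic only, not the Open PHACTS API. The `Activity` record, the function name and all data values are hypothetical placeholders, and "only functional assays" is read here as "restrict the search to functional assays".

```python
from dataclasses import dataclass


@dataclass
class Activity:
    compound: str      # placeholder compound label
    target: str        # placeholder target label
    assay_type: str    # e.g. "functional" or "binding"
    potency_um: float  # potency in micromolar


def potent_functional_hits(activities, pathway_targets, max_potency_um=1.0):
    """Compounds with a functional-assay potency below the cut-off on a pathway target."""
    hits = set()
    for activity in activities:
        if (
            activity.target in pathway_targets
            and activity.assay_type == "functional"
            and activity.potency_um < max_potency_um
        ):
            hits.add(activity.compound)
    return sorted(hits)


# Hypothetical placeholder data, for illustration only.
pathway_targets = {"target-1", "target-2"}
activities = [
    Activity("compound-A", "target-1", "functional", 0.2),
    Activity("compound-B", "target-1", "binding", 0.1),     # wrong assay type
    Activity("compound-C", "target-2", "functional", 5.0),  # not potent enough
    Activity("compound-D", "target-9", "functional", 0.3),  # target not in pathway
]
print(potent_functional_hits(activities, pathway_targets))
```

In the project itself the three filters draw on different sources (pathway, pharmacology and chemistry data), which is why the data first have to be integrated.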
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic web technologies
• Open source code, open data and open standards
• Academics, Pharmas, Publishers…
• To put medicines in the pipeline…
The Open PHACTS community ecosystem
Originally used ChemSpider..
Open PHACTS Deliverables
• Many details but overall…
• Deliver an Open Source chemical registry service, independent of ChemSpider
• Development of Open Source CVSP platform
• Deliver widgets and APIs to the project
• Deliver high quality, standardized Open Data
• Deliver structure data in RDF format
Standardize
• Use the SRS as guidance for standardization
• Adjust as necessary to our needs
Nitro groups
Salt and Ionic Bonds
Depositions Gateway User Interface
Validate and Standardize
CVSP Filtering
CVSP Filtering of DrugBank
ChEMBL (1.3 million records)
• 11,020 records with 4 bonds and zero charge, e.g. CHEMBL501101 or CHEMBL501973
• 271 records with hypervalent oxygen (e.g. CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphine
• 6,177 records where direction of bond makes no sense, e.g. CHEMBL12760 and CHEMBL34704
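Checks like the ones listed above can be pictured as simple rules applied to every atom of a record. The following is a minimal sketch of one such rule, not the CVSP rule set. The `Atom` structure, the `MAX_NEUTRAL_VALENCE` table and the function name are hypothetical, the valence limits are ordinary textbook values for neutral atoms, and using nitrogen for the "4 bonds and zero charge" case is an assumption made for illustration.

```python
from dataclasses import dataclass

# Textbook maximum valences for neutral atoms; an illustrative subset only.
MAX_NEUTRAL_VALENCE = {"C": 4, "N": 3, "O": 2, "B": 3}


@dataclass
class Atom:
    symbol: str        # element symbol, e.g. "N"
    bond_orders: list  # bond orders to neighbours (implicit H counted as 1)
    charge: int = 0    # formal charge


def valence_issues(atoms):
    """Return one message per neutral atom whose valence exceeds the table."""
    issues = []
    for index, atom in enumerate(atoms):
        limit = MAX_NEUTRAL_VALENCE.get(atom.symbol)
        if limit is None or atom.charge != 0:
            continue  # no rule for this element, or the atom is charged: skip
        valence = sum(atom.bond_orders)
        if valence > limit:
            issues.append(f"atom {index} ({atom.symbol}): valence {valence} > {limit}")
    return issues


# A neutral nitrogen with four single bonds is flagged; the same atom
# carrying a +1 charge (an ammonium-like nitrogen) is not.
neutral_n = Atom("N", [1, 1, 1, 1], charge=0)
charged_n = Atom("N", [1, 1, 1, 1], charge=1)
print(valence_issues([neutral_n, charged_n]))
```

A real validation platform works on full connection tables and covers many more cases (stereo bonds, for example), but the pattern of rule, record and reported issue is the same.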
[Figure: structures of the records OPS1 (DrugBank ID DB07241), OPS2, OPS3, OPS4, OPS5 and OPS6]
ops:OPS1 skos:exactMatch <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB07241> .
ops:OPS2 skos:relatedMatch ops:OPS1 .
ops:OPS3 skos:relatedMatch ops:OPS1 .
ops:OPS3 skos:closeMatch ops:OPS4 .
ops:OPS3 skos:closeMatch ops:OPS5 .
ops:OPS4 skos:closeMatch ops:OPS6 .
ops:OPS5 skos:closeMatch ops:OPS6 .
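A consumer of a linkset like this can choose how strict a match it will follow. The sketch below copies the seven triples above into plain tuples and walks them breadth-first. It treats the links as undirected, which is reasonable because `skos:exactMatch`, `skos:closeMatch` and `skos:relatedMatch` are all symmetric in SKOS. The function name and the tuple representation are assumptions for illustration; a real application would load the RDF with a proper parser.

```python
from collections import defaultdict, deque

DRUGBANK = "<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB07241>"

# The linkset from the slide, as (subject, predicate, object) tuples.
TRIPLES = [
    ("ops:OPS1", "skos:exactMatch", DRUGBANK),
    ("ops:OPS2", "skos:relatedMatch", "ops:OPS1"),
    ("ops:OPS3", "skos:relatedMatch", "ops:OPS1"),
    ("ops:OPS3", "skos:closeMatch", "ops:OPS4"),
    ("ops:OPS3", "skos:closeMatch", "ops:OPS5"),
    ("ops:OPS4", "skos:closeMatch", "ops:OPS6"),
    ("ops:OPS5", "skos:closeMatch", "ops:OPS6"),
]


def linked_records(start, triples, predicates):
    """Breadth-first walk over the chosen predicates, ignoring direction."""
    neighbours = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in predicates:
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbours[node] - seen:
            seen.add(other)
            queue.append(other)
    return seen - {start}


strict = linked_records(DRUGBANK, TRIPLES, {"skos:exactMatch"})
broad = linked_records(
    DRUGBANK, TRIPLES, {"skos:exactMatch", "skos:relatedMatch", "skos:closeMatch"}
)
print(sorted(strict))  # only ops:OPS1
print(sorted(broad))   # ops:OPS1 through ops:OPS6
```

Following only `skos:exactMatch` returns the single registered record for DB07241, while also following the looser predicates pulls in the related forms OPS2 to OPS6.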
Chemical Registry Service
Open Sourcing Data and Code
• All Open PHACTS data is licensed as Open Data and is available from the Open PHACTS website – ca. 2 million chemicals
• The Chemical Registration Service, including the Chemical Validation and Standardization Platform, is being prepared for release as Open Source now!
RSC data in Open PHACTS
1. Molecule synonyms and identifiers
2. Linksets between ChEBI, ChEMBL, DrugBank
and OPS identifiers
3. Molecule–molecule relations (“parent–child”) of
interest for drug discovery
4. Calculated physicochemical properties for
compounds (both molecular and macroscopic)
Our RDF schema
Two dozen calculated properties for >10^6 molecules
•CHEMINF ontology for cheminformatics
•QUDT for units and numeric values
•ChemSpider IDs for molecules
Synonyms and identifiers
Newly added to the CHEMINF ontology:
•Validated ChemSpider synonyms
•Unvalidated ChemSpider synonyms
•Validated database identifiers
•Unvalidated database identifiers
•InChI, InChIKey, SMILES
•Preferred ChemSpider name
Physicochemical properties
• log P
• log D (at pH 5.5 and 7.4)
• bioconcentration factor KOC (at pH 5.5, at pH 7.4)
• index of refraction
• polar surface area
• molar refractivity
• molar volume
• polarizability
• surface tension
• density at STP
• flash point at 1 atm
• boiling point at 1 atm
• enthalpy of vaporization at STP
• vapour pressure at STP
RDF exports from CRS
[Figure: RDF graph for a calculated log P of benzene. Nodes: OPS benzene; benzene’s connection table (rdf:type CHEMINF connection table); calculation process (rdf:type CHEMINF execution of ACD/Labs PhysChem software library version 12.01); calculation result (rdf:type CHEMINF calculated log P). Edge labels: IAO is about; OBI has specified input; OBI has specified output; QUDT has value (“2.17”^^xsd:float); QUDT has standard uncertainty (“0.234”^^xsd:float); QUDT has unit (QUDT dimensionless quantity).]
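One way to read the graph labels above is as a small set of subject, predicate and object statements. The sketch below builds them as plain tuples using the human-readable labels from the slide rather than real ontology IRIs. The way the nodes are connected is an assumption (the process takes the connection table as input and produces the result as output, and the connection table is about the molecule), and the function names are hypothetical.

```python
def literal(value, datatype="xsd:float"):
    """Format a typed literal the way it appears on the slide."""
    return f'"{value}"^^{datatype}'


def log_p_triples(molecule, value, uncertainty, software):
    """Build (subject, predicate, object) tuples for one calculated log P."""
    table = f"{molecule} connection table"
    process = f"{molecule} calculation process"
    result = f"{molecule} calculation result"
    return [
        (table, "rdf:type", "CHEMINF connection table"),
        (table, "IAO is about", molecule),
        (process, "rdf:type", f"CHEMINF execution of {software}"),
        (process, "OBI has specified input", table),
        (process, "OBI has specified output", result),
        (result, "rdf:type", "CHEMINF calculated log P"),
        (result, "QUDT has value", literal(value)),
        (result, "QUDT has standard uncertainty", literal(uncertainty)),
        (result, "QUDT has unit", "QUDT dimensionless quantity"),
    ]


triples = log_p_triples(
    "OPS benzene",
    "2.17",
    "0.234",
    "ACD/Labs PhysChem software library version 12.01",
)
for subject, predicate, obj in triples:
    print(f"{subject} | {predicate} | {obj}")
```

Keeping the value, its uncertainty, its unit and the software that produced it in separate statements is what lets a consumer filter on any one of them.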
It is actually more complicated..
What’s built on top of this?
Important for other projects
• Multiple outputs from the project available for reuse to underpin other projects:
• Chemical registry service
• Chemical validation and standardization
• APIs and visualization widgets
New Repository Architecture
doi: 10.1007/s10822-014-9784-5
Data tier: Compounds, Reactions, Spectra, Materials, Documents
Data access tier: Compounds API, Reactions API, Spectra API, Materials API, Documents API
User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Materials Widgets, Documents Widgets
User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, Paid 3rd party integrations (various platforms – SharePoint, Google, etc)
Input data pipeline
Deposition Gateway: Web UI for unified depositions; DropBox, Google Drive, SkyDrive, etc; LabTrove and other templated data; API, FTP, etc
Modules: Compounds Module, Spectra Module, Reactions Module, Materials Module, Textmining Module
Staging databases (raw data, validated data): Compounds, Reactions, Spectra, Materials, Articles / CSSP, Documents
All databases are sliced by data sources/data collections and have a simple security model where each data slice/source is private, public or embargoed
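The three-state security model described above can be sketched as a single read check per data slice. This is a minimal illustration, not the repository's implementation. The `DataSlice` fields, the function name and the example slice names are hypothetical, and the meaning of "embargoed" is an assumption: the slice is treated as private until an embargo date and as public from that date on.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass
class DataSlice:
    name: str
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None  # only meaningful when EMBARGOED


def can_read(data_slice, user, today):
    """Decide whether `user` may read `data_slice` on the date `today`."""
    if user == data_slice.owner:
        return True  # owners always see their own slice
    if data_slice.visibility is Visibility.PUBLIC:
        return True
    if data_slice.visibility is Visibility.EMBARGOED:
        # Assumed meaning: the slice opens up once the embargo date is reached.
        return data_slice.embargo_until is not None and today >= data_slice.embargo_until
    return False  # PRIVATE


# Hypothetical slices and users, for illustration only.
thesis = DataSlice("thesis-spectra", "alice", Visibility.EMBARGOED, date(2015, 1, 1))
print(can_read(thesis, "bob", date(2014, 8, 1)))  # before the embargo date
print(can_read(thesis, "bob", date(2015, 6, 1)))  # after the embargo date
```

Attaching the rule to the slice rather than to individual records keeps the check the same for compounds, reactions, spectra and the other data types.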
Compounds
Reactions
Analytical data
For Deposition of Data
• Quality of data at source
• ensuring chemicals are correct - VALIDATION
• reactions map and balance as appropriate – VALIDATION and STANDARDIZATION
• file format handling for analytical data types – binary file formats are proprietary - STANDARDIZATION
• valid interpretation of data – VALIDATION and ANNOTATION
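As one concrete example of the validation steps listed above, checking that a reaction balances amounts to comparing element counts on both sides. The following is a minimal sketch under simplifying assumptions: each species is given as a flat molecular formula without brackets or charges, stoichiometric coefficients are supplied explicitly, and atom mapping is not attempted. The function names are hypothetical.

```python
import re
from collections import Counter

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula):
    """Count atoms in a flat molecular formula such as 'C2H6O' (no brackets)."""
    counts = Counter()
    position = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != position:
            raise ValueError(f"cannot parse formula: {formula!r}")
        counts[match.group(1)] += int(match.group(2) or 1)
        position = match.end()
    if not formula or position != len(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def side_counts(side):
    """Sum atom counts over (coefficient, formula) pairs on one side."""
    total = Counter()
    for coefficient, formula in side:
        for element, count in atom_counts(formula).items():
            total[element] += coefficient * count
    return total


def is_balanced(reactants, products):
    """True when both sides contain the same number of atoms of each element."""
    return side_counts(reactants) == side_counts(products)


# Esterification: acetic acid + ethanol -> ethyl acetate + water
reactants = [(1, "C2H4O2"), (1, "C2H6O")]
products = [(1, "C4H8O2"), (1, "H2O")]
print(is_balanced(reactants, products))      # True
print(is_balanced(reactants, products[:1]))  # False: the water is missing
```

A deposition pipeline could use a check like this to flag a reaction for review at deposition time.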
Deposition of Data
User Interface Approach
Work in Progress
A Compounds Repository Interface
The PharmaSea Website
The Open PHACTS community ecosystem
Open PHACTS Project Partners
Pfizer Limited – Coordinator
Universität Wien – Managing entity
Technical University of Denmark
University of Hamburg, Center for Bioinformatics
BioSolveIT GmbH
Consorci Mar Parc de Salut de Barcelona
Leiden University Medical Centre
Royal Society of Chemistry
Vrije Universiteit Amsterdam
Spanish National Cancer Research Centre
University of Manchester
Maastricht University
Aqnowledge
University of Santiago de Compostela
Rheinische Friedrich-Wilhelms-Universität Bonn
AstraZeneca
GlaxoSmithKline
Esteve
Novartis
Merck Serono
H. Lundbeck A/S
Eli Lilly
Netherlands Bioinformatics Centre
Swiss Institute of Bioinformatics
ConnectedDiscovery
EMBL-European Bioinformatics Institute
Janssen
OpenLink
Thank you
Email: [email protected]
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams