Exploring available compound data with the Open PHACTS Discovery Platform and KNIME
252nd ACS National Meeting
Daniela Digles, Gerhard F. Ecker
Philadelphia, PA, August 21, 2016
Pre-competitive Informatics: Pharma are all accessing, processing, storing & re-processing external research data
[Figure: diagram with the labels Literature, PubChem, Genbank, Patents, Databases, Downloads, Data Integration, Data Analysis and Firewalled Databases, annotated "Repeat @ each company".]
Lowering industry firewalls: pre-competitive informatics in drug discovery
Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944
Different concept types
@gray_alasdair Big Data Integration
[Figure: platform architecture diagram. Labels: Db, Nanopub, VoID; Public Content, Commercial, Public Ontologies, User Annotations; Data Cache (Virtuoso Triple Store); Semantic Workflow Engine; Linked Data API (RDF/XML, TTL, JSON); Domain Specific Services; Identity Resolution Service; Chemistry Registration, Normalisation & Q/C; Identifier Management Service; Indexing; Core Platform; Apps. Example identifiers shown: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”.]
Workflow tools
• Single “blocks” for each data processing step (e.g. data reader, calculations, visualization, …)
• Blocks are placed via drag-and-drop and connected to each other with arrows.
• Commercial (e.g. Pipeline Pilot) and free tools (e.g. KNIME) available.
[Figure: example workflow with the blocks Data input, Data processing, Data processing, Data export and View data.]
KNIME
• KNIME Analytics Platform
• Available from www.knime.org
• Open source data analytics, reporting and integration platform
• Workflows can be built by connecting “Nodes”
• Open PHACTS KNIME nodes available from GitHub: https://github.com/openphacts/OPS-Knime
Open PHACTS KNIME
executable API call
Swagger
Structured format for the generation of API documentation.
https://dev.openphacts.org/swagger/spec/ops_1_5.json
(….)
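A Swagger document is plain JSON, so the operations it describes can be listed with a few lines of code. The following is a minimal sketch, assuming a Swagger 2.0-style layout with a top-level "paths" object. The embedded spec is a hypothetical, heavily trimmed stand-in and does not reproduce the real ops_1_5.json; the path and parameter names in it are illustrative assumptions.

```python
import json

# Hypothetical, trimmed stand-in for a Swagger 2.0 document.
# The path and parameter names are illustrative, not copied from the real spec.
SPEC_TEXT = """
{
  "swagger": "2.0",
  "basePath": "/1.5",
  "paths": {
    "/compound": {
      "get": {
        "summary": "Compound Information",
        "parameters": [
          {"name": "uri", "in": "query", "required": true},
          {"name": "app_id", "in": "query", "required": true},
          {"name": "_format", "in": "query", "required": false}
        ]
      }
    }
  }
}
"""


def list_operations(spec):
    """Return (HTTP method, path, required parameter names) for each operation."""
    operations = []
    for path, methods in sorted(spec.get("paths", {}).items()):
        for method, details in sorted(methods.items()):
            if not isinstance(details, dict):
                continue  # skip path-level entries such as shared parameter lists
            required = [p["name"] for p in details.get("parameters", [])
                        if p.get("required")]
            operations.append((method.upper(), path, required))
    return operations


spec = json.loads(SPEC_TEXT)
operations = list_operations(spec)
for method, path, required in operations:
    print(method, path, "requires:", ", ".join(required))
```

The same function could be pointed at a locally saved copy of the published spec, which is how a tool can turn the documentation into executable API calls.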
Open PHACTS KNIME
Obtaining the results
Answering “scientific competency questions”
• 20 questions defined at the beginning of the project.
• Example: Give me all oxidoreductase inhibitors active < 100 nM in human and mouse.
• Many questions need a combination of queries to the Open PHACTS Platform.
Questions: Azzaoui K et al. (2013) Drug Discov. Today 18: 843 – 852.
Workflows: Chichester C et al. (2015) Drug Discov. Today 20: 399 – 405.
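The example question above shows why several query results have to be combined: activity records for each species are filtered by the cutoff and then intersected per compound. The following is a minimal sketch of that combination step only. The records are hypothetical stand-ins for query results that are assumed to be already restricted to oxidoreductase targets; the field names are illustrative.

```python
from collections import defaultdict

# Hypothetical activity records (not real data), assumed to come from
# separate pharmacology queries already limited to oxidoreductase targets.
ACTIVITIES = [
    {"compound": "cmpd-A", "organism": "Homo sapiens", "value_nm": 40.0},
    {"compound": "cmpd-A", "organism": "Mus musculus", "value_nm": 75.0},
    {"compound": "cmpd-B", "organism": "Homo sapiens", "value_nm": 12.0},
    {"compound": "cmpd-B", "organism": "Mus musculus", "value_nm": 350.0},
    {"compound": "cmpd-C", "organism": "Mus musculus", "value_nm": 5.0},
]


def active_in_all(activities, organisms, cutoff_nm):
    """Return compounds with an activity below cutoff_nm in every listed organism."""
    organisms_per_compound = defaultdict(set)
    for record in activities:
        if record["value_nm"] < cutoff_nm:
            organisms_per_compound[record["compound"]].add(record["organism"])
    wanted = set(organisms)
    return sorted(compound
                  for compound, seen in organisms_per_compound.items()
                  if wanted <= seen)


hits = active_in_all(ACTIVITIES, ["Homo sapiens", "Mus musculus"], 100.0)
print(hits)
```

In a KNIME workflow the same logic would typically be spread over filter, group and join nodes.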
Example workflow
Q10: For a given compound, summarize all similar compounds and their activities
CC1=C(C(C(=C(N1)C)C(=O)OC)C2=CC=CC=C2[N+](=O)[O-])C(=O)OC
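A SMILES string like the one above contains characters such as "+", "=", "[" and "(" that must be percent-encoded before it can be sent as a URL query parameter. The following is a minimal sketch of that encoding step. The base URL is a placeholder, and the path and parameter names (/structure/similarity, searchOptions.Molecule, searchOptions.Threshold, app_id, app_key, _format) are assumptions that should be checked against the Swagger spec.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SMILES = "CC1=C(C(C(=C(N1)C)C(=O)OC)C2=CC=CC=C2[N+](=O)[O-])C(=O)OC"

# Placeholder host; not the real API address.
BASE_URL = "https://api.example.org/1.5"


def build_similarity_url(smiles, threshold, app_id, app_key):
    """Build a similarity-search URL; path and parameter names are assumed."""
    params = {
        "searchOptions.Molecule": smiles,
        "searchOptions.Threshold": threshold,
        "app_id": app_id,
        "app_key": app_key,
        "_format": "json",
    }
    # urlencode percent-encodes reserved characters in every value.
    return BASE_URL + "/structure/similarity?" + urlencode(params)


url = build_similarity_url(SMILES, 0.8, "YOUR_APP_ID", "YOUR_APP_KEY")
print(url)

# Decoding the query string recovers the original SMILES unchanged.
round_trip = parse_qs(urlsplit(url).query)["searchOptions.Molecule"][0]
print(round_trip == SMILES)
```

Sending the raw string without encoding would risk the "+" of the nitro group being read as a space on the server side.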
Workflow to collect compound data
• Data for retrieved molecules:
  • Function and toxicity annotation (DrugBank)
  • Role of the molecule (ChEBI)
  • Pharmacology data, activity < 10 µM (ChEMBL)
  • Patent data (SureChEMBL)
• Data for retrieved targets:
  • Pathways (WikiPathways)
  • Diseases (DisGeNET)
• Example: propafenone derivative
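The 10 µM activity cutoff in the list above only works if all values are compared in the same unit. The following is a minimal sketch of such a filter, assuming each record carries a numeric value and a unit string. The records, field names and unit table are hypothetical and not taken from the workflow itself.

```python
# Conversion factors to nanomolar; the set of supported units is an assumption.
TO_NANOMOLAR = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}


def below_cutoff(records, cutoff_nm=10_000.0):
    """Keep records whose activity is strictly below cutoff_nm (default 10 uM)."""
    kept = []
    for record in records:
        factor = TO_NANOMOLAR.get(record["unit"])
        if factor is None:
            continue  # values in unknown units (e.g. percentages) are skipped
        if record["value"] * factor < cutoff_nm:
            kept.append(record)
    return kept


# Hypothetical records, not real measurements.
RECORDS = [
    {"target": "target-1", "value": 250.0, "unit": "nM"},
    {"target": "target-2", "value": 9.5, "unit": "uM"},
    {"target": "target-3", "value": 10.0, "unit": "uM"},  # equal to cutoff, dropped
    {"target": "target-4", "value": 0.2, "unit": "mM"},
    {"target": "target-5", "value": 55.0, "unit": "%"},
]

kept_targets = [record["target"] for record in below_cutoff(RECORDS)]
print(kept_targets)
```

Skipping records with unknown units is a design choice of this sketch; a workflow might instead route them to a separate table for manual review.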
Collected results: Compound information
• Structure search: 96 molecules, including the molecule itself.
• Compound information/classification: 1 known drug, propafenone
Collected results: Patent information
• Highest confidence score:
  • Patents found for 3 molecules
  • No patents found for the original structure
• Lower confidence score:
  • Patents found for 8 molecules
  • 2 patents found for the original structure (High throughput assay for discovering new inhibitors of the GIRK1/4 channel)
• Restriction: Markush structures are not enumerated in SureChEMBL
Collected results: Bioactivity values
191 activity values (lower than 10 µM) against 33 targets.
[Figure labels: P-glycoprotein; Cells expressing P-glycoprotein; Propafenone.]
Collected results: Targets
[Figures: Target classifications (per compound); Target classifications (unique targets).]
Collected results: Pathways for targets
• 98 pathways in total
• 4 pathways contain > 5 of the identified targets
  • GPCR downstream signalling
  • GPCR ligand binding
• Relevance of the pathways?
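The count of pathways containing more than 5 of the identified targets can be obtained by tallying, per pathway, how many distinct targets map to it. The following is a minimal sketch with hypothetical target and pathway names; it is not the workflow's own implementation and does not address the relevance question.

```python
from collections import Counter

# Hypothetical target-to-pathway mapping, standing in for pathway query results.
TARGET_PATHWAYS = {
    "T1": ["pathway-A", "pathway-B"],
    "T2": ["pathway-A", "pathway-B"],
    "T3": ["pathway-A", "pathway-B"],
    "T4": ["pathway-A", "pathway-B"],
    "T5": ["pathway-A", "pathway-B"],
    "T6": ["pathway-A"],
    "T7": ["pathway-C"],
}


def pathways_above(target_pathways, threshold):
    """Return pathways that contain more than `threshold` distinct targets."""
    counts = Counter()
    for pathways in target_pathways.values():
        counts.update(set(pathways))  # set() so each target counts once per pathway
    return {pathway: n for pathway, n in counts.items() if n > threshold}


frequent = pathways_above(TARGET_PATHWAYS, 5)
print(len({p for ps in TARGET_PATHWAYS.values() for p in ps}), "pathways in total")
print(frequent)
```

A raw target count favours large, generic pathways, which is one reason a separate relevance measure is still needed.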
Collected results: Diseases for targets
> 2000 diseases in 25 disease classes
Conclusions
• Workflow allows the easy preparation of a first overview of known data for a compound of interest.
• New ideas for targets to test the compounds against.
  • Example: Serotonin receptor for propafenone derivatives
• Literature (PubMed) is returned for the results.
• Additional external or in-house data can be added.
• Methods for prioritization needed:
  • Relevance of pathways
  • Relevance of diseases
Useful links
Open PHACTS: http://www.openphactsfoundation.org/
API: https://dev.openphacts.org/
Support portal: http://support.openphacts.org/
Example Workflows: http://www.myexperiment.org/groups/1125.html
Presentations on YouTube:
https://www.youtube.com/user/OpenPHACTS
For help or feedback: [email protected]
Acknowledgements
Pharmacoinformatics research group, University of Vienna
– Gerhard F. Ecker
– Barbara Zdrazil
– Jana Gurinova
Open PHACTS – KNIME
– Ronald Siebes, VU Amsterdam
– Christine Chichester, SIB
– Evan Tzanis, QMUL