RDF Workshop - BioMedBridges
Building an RDF representation of the ChEMBL Database
RDF Workshop
Mark Davies ChEMBL Group, Technical Lead
30/04/2014
Overview
• Brief introduction to ChEMBL database
• Approaches to mapping relational data to RDF data model based on ChEMBL experience
• New features in ChEMBL RDF (version 18)
• Future ChEMBL plans
What is ChEMBL?
• Open access database for drug discovery
• Freely available (searchable and downloadable)
• Content:
• Bioactivity data manually extracted from the primary medicinal chemistry literature from journals such as J. Med. Chem.
• Subset of data from PubChem
• Deposited data e.g. neglected disease screening, GSK kinase set
• Bioactivity data is associated with a biological target and a chemical structure
• Compounds are stored in a structure searchable format
• Protein targets are linked to protein sequences in UniProt
• Updated regularly with new data
• Secure searching (https://www.ebi.ac.uk/chembldb)
ChEMBL
https://www.ebi.ac.uk/chembl/
• ChEMBL 18 Release
• 1,359,508 compounds
• 12,419,715 activities
• 1,042,374 assays
• 9,414 targets
• 53,298 documents
• 19 bioactivity sources
• 6 compound-only sources
What does ChEMBL data look like?
[Figure: Compound, Target, Activity, Assay, Ref]
How can I access ChEMBL data?
• Website
• Web Services
• Widgets
• Downloads
• Virtual Machine (myChEMBL)
• Semantic Web
ChEMBL + Semantic Web
• The creation of the RDF version of ChEMBL is funded by the Open PHACTS project - http://www.openphacts.org/
• Migrate the ChEMBL relational data model to RDF based data model – ‘triplify’ everything
• RDF generation is part of official ChEMBL release process
• Identify and use ontologies important in the field of bioactivity data
• ChEMBL RDF to be made available through EBI RDF Platform
Goals of the ChEMBL RDF conversion
• Responding to the demands of the community
• Academic and more recently industry
• Semantic data conversion and querying
• Reasoning/inferencing - providing a starting point for the community
• Ensure the conversion is part of the ChEMBL release cycle
• ChEMBL data model is still evolving so almost impossible for external efforts to keep up to speed with changes
ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/
ChEMBL RDF
Conversion process
[Figure: ChEMBL Relational Schema converted to ChEMBL RDF Schema; entities shown: Compound, Bioactivity, Assay, Target, Ref]
Migrating a relational data model
• Two approaches were used to convert the ChEMBL relational model to an RDF based model
• Approach 1: Semi-automated using the D2RQ Platform
• Approach 2: Manual model building
Where to start?
What tools to use?
What is your goal?
Who is the audience?
Will it be made available to the public?
How often will it be updated?
Which ontologies to use?
Do you write your own ontology?
Which format to use?
Approach 1: Semi-automated using the D2RQ Platform
D2RQ platform overview
• Query a non-RDF database using SPARQL
• Access the content of the database as Linked Data over the Web
• Create custom dumps of the database in RDF formats for loading into an RDF store
• Access information in a non-RDF database using the Apache Jena API
http://d2rq.org/
ChEMBL relational schema
• 3 core domains
• Compound
• Activity
• Target
• 52 tables (52 primary keys)
• 341 columns
• 4 data types (40 if length, scale and precision included)
• Many indexes, constraints, triggers..
D2RQ: Prerequisites
• Download the software from http://d2rq.org/
• Java 1.5 or higher
• Oracle users will need to download database driver
D2RQ: Example usage
• Aims
• Create RDF representation of your relational data model
• Run and test SPARQL queries against your database
• Online data access and representation
• Process
• Generate “D2R Mapping File”
• Start up D2R Server using “D2R Mapping File”
• Refine model
• Support databases
• Oracle, SQL Server, PostgreSQL, MySQL, HSQLDB,…
D2RQ: Mapping File Creation
• Command used to generate D2R Mapping File (you just need a database connection string):
$> ./generate-mapping -o example_d2r_mapping.ttl \
     -u <USER> \
     -p <PASSWORD> \
     -d oracle.jdbc.driver.OracleDriver jdbc:oracle:thin:@<SERVER>:<PORT>:<DATABASE>
• Command above will inspect database and create RDF based definitions of tables and relationships between tables
• Possible to skip/restrict schemas/tables/columns with additional argument – useful for Oracle
D2RQ: Mapping file example
@prefix map: <#> .
@prefix db: <> .
@prefix vocab: <vocab/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix jdbc: <http://d2rq.org/terms/jdbc/> .

map:database a d2rq:Database;
    d2rq:jdbcDriver "oracle.jdbc.driver.OracleDriver";
    d2rq:jdbcDSN "jdbc:oracle:thin:@<SERVER>:<PORT>:<DATABASE>";
    d2rq:username "<USER>";
    d2rq:password "<PASSWORD>";
    .

# Table CHEMBL_18.ACTION_TYPE
map:CHEMBL_18_ACTION_TYPE a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "CHEMBL_18/ACTION_TYPE/@@CHEMBL_18.ACTION_TYPE.ACTION_TYPE|urlify@@";
    d2rq:class vocab:CHEMBL_18_ACTION_TYPE;
    d2rq:classDefinitionLabel "CHEMBL_18.ACTION_TYPE";
    .
map:CHEMBL_18_ACTION_TYPE__label a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:CHEMBL_18_ACTION_TYPE;
    d2rq:property rdfs:label;
    d2rq:pattern "ACTION_TYPE #@@CHEMBL_18.ACTION_TYPE.ACTION_TYPE@@";
    .
map:CHEMBL_18_ACTION_TYPE_ACTION_TYPE a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:CHEMBL_18_ACTION_TYPE;
    d2rq:property vocab:CHEMBL_18_ACTION_TYPE_ACTION_TYPE;
    d2rq:propertyDefinitionLabel "ACTION_TYPE ACTION_TYPE";
    d2rq:column "CHEMBL_18.ACTION_TYPE.ACTION_TYPE";
    .
map:CHEMBL_18_ACTION_TYPE_DESCRIPTION a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:CHEMBL_18_ACTION_TYPE;
    d2rq:property vocab:CHEMBL_18_ACTION_TYPE_DESCRIPTION;
    d2rq:propertyDefinitionLabel "ACTION_TYPE DESCRIPTION";
    d2rq:column "CHEMBL_18.ACTION_TYPE.DESCRIPTION";
    .
map:CHEMBL_18_ACTION_TYPE_PARENT_TYPE a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:CHEMBL_18_ACTION_TYPE;
    d2rq:property vocab:CHEMBL_18_ACTION_TYPE_PARENT_TYPE;
    d2rq:propertyDefinitionLabel "ACTION_TYPE PARENT_TYPE";
    d2rq:column "CHEMBL_18.ACTION_TYPE.PARENT_TYPE";
    .
D2R Server
• Command used to start D2R server (assuming you have generated mapping file):
$> ./d2r-server example_d2r_mapping.ttl
• SPARQL endpoint and explorer
• Browsing database contents
• Resolvable URIs
• Content negotiation
• Downloading contents of BLOBs/CLOBs
• Serving the vocabulary
• Publishing metadata
(Quick) D2RQ Demo
D2RQ: Data modeling
• It is possible to model data and also create more meaningful class and property names
• Approach 1
• Edit the mapping file using advanced features of the D2RQ query language: http://d2rq.org/d2rq-language
• Approach 2
• Create mapping file based on restricted set of database objects e.g. users, schemas, views, materialised views – modeling within the database
D2RQ: Optimisations
• Review D2R server deployment
• Increase D2R server heap space
• Review configuration settings, such as page sizes, resultset limits
• Use latest built-in D2RQ optimisations, by specifying d2rq:useAllOptimizations (or --fast flag on server startup)
• Use D2RQ’s dump-rdf command to export RDF representation of database
• Exported RDF can then be imported into a triplestore, e.g. Virtuoso
D2RQ: Limitations
• General limitations
• Integration of multiple databases not possible (has to be achieved within the database)
• Read only access – update extension is available
• Limited inferencing available
• Named graphs not supported
• Users tend to end up creating ‘weird’ mapping files
• Database models are often not perfect or clean, which complicates mapping file creation process
Just mapping to RDF is not enough
Approach 2: Manual model building
Approach 2 outline
• Build a basic ontology and instantiate it with data
• Steps:
• Define entities in data source
• Define relationships between entities
• Define properties of the entities
• Identify and use external ontologies
Model building considerations
• Review available technologies/languages
• Preferred language may not have good RDF support/libraries
• RDF data formats e.g. rdf, ttl, n3
• All are interchangeable, but some are considered more readable and offer reduced size
ChEMBL relational schema revisited
[Figure: ChEMBL relational schema, with regions labelled Molecules, Targets, References, Activities, Assays, Binding Sites, MOAs and Drugs]
ChEMBL Entities/Classes
[Figure: Substance, Activity, Assay, Document, Target, Target-Component, Source, Journal, Protein-Classification, Bio-Component]
ChEMBL 17 classes
ChEMBL class definition
• An OWL based ontology used to define ChEMBL classes
• Tools available to help build and write ontologies, e.g. Protégé and TopBraid Composer
• OWL snippet used to define ChEMBL assay:
@prefix : <http://rdf.ebi.ac.uk/terms/chembl#> .

:Assay
    rdf:type owl:Class ;
    rdfs:label "ChEMBL Assay Class"^^xsd:string ;
    rdfs:subClassOf :ChEMBL .
Entity RDF representation
• Entity names become classes, which allow you to type your data
• ‘chembl_assay:’ and ‘cco:’ are prefixes and ‘a’ is a turtle shorthand for rdf:type
chembl_assay:CHEMBL615672 a cco:Assay .
ChEMBL Entity relationships
[Figure: relationships between Substance, Activity, Assay, Document, Target, Target-Component, Source, Journal, Protein-Classification and Bio-Component]
Relationship RDF representation
• Relationships defined between instances of your entities are object properties
chembl_assay:CHEMBL615672 a cco:Assay ;
    cco:hasTarget chembl_target:CHEMBL612910 ;
    cco:hasActivity chembl_activity:CHEMBL_ACT_227195 .
ChEMBL assay attributes
• Identify attributes from database you want to include in RDF model
• Map attribute types e.g. integers, strings, Booleans
• Some attributes map to external resources/ontologies – see later
• Denormalisation of relational data, e.g. FKs
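The attribute type mapping above can be sketched in a few lines of Python. This is only an illustration, not the code used to build ChEMBL RDF: the function name `to_turtle_literal` and the choice of `xsd:double` for floating point values are assumptions.

```python
def to_turtle_literal(value):
    """Render a value fetched from a relational column as a Turtle literal.

    Hypothetical helper: returns None for NULL so the caller can skip the triple.
    """
    if value is None:
        return None
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return f'"{str(value).lower()}"^^xsd:boolean'
    if isinstance(value, int):
        return f'"{value}"^^xsd:integer'
    if isinstance(value, float):
        return f'"{value!r}"^^xsd:double'
    # Everything else is treated as a plain string; escape Turtle special characters.
    text = str(value)
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


if __name__ == "__main__":
    for sample in (4, True, 5.4, "3LL cell line", None):
        print(repr(sample), "->", to_turtle_literal(sample))
```

NULL columns are returned as None rather than an empty string, so that no triple is emitted for missing data.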
Attribute RDF representation
• Attributes you define for your classes are datatype properties
• Good practice to add a rdfs:label to all instances
chembl_assay:CHEMBL615672 a cco:Assay ;
    cco:hasTarget chembl_target:CHEMBL612910 ;
    cco:hasActivity chembl_activity:CHEMBL_ACT_227195 ;
    rdfs:label "CHEMBL615672" ;
    cco:chemblId "CHEMBL615672" ;
    cco:assayType "Functional" ;
    cco:assayCellType "3LL cell line" ;
    cco:organismName "Mus musculus" .
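As a rough illustration of how a relational row could be turned into a description like the one above, here is a minimal Python sketch using an in-memory SQLite table. The table and column names (`assays`, `chembl_id`, `cell_type`, ...) are hypothetical, and the `chembl_assay:` and `chembl_target:` namespace URIs are assumptions modelled on the protclass URIs used in the example queries; the actual ChEMBL pipeline uses Groovy and the Sesame API.

```python
import sqlite3

PREFIXES = (
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix cco: <http://rdf.ebi.ac.uk/terms/chembl#> .\n"
    # Assumed resource namespaces (not confirmed by the slides).
    "@prefix chembl_assay: <http://rdf.ebi.ac.uk/resource/chembl/assay/> .\n"
    "@prefix chembl_target: <http://rdf.ebi.ac.uk/resource/chembl/target/> .\n"
)

# Hypothetical column names mapped to the datatype properties shown above.
DATATYPE_PROPERTIES = {
    "assay_type": "cco:assayType",
    "cell_type": "cco:assayCellType",
    "organism": "cco:organismName",
}


def quote(text):
    """Return text as a double-quoted Turtle string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def assay_to_turtle(row):
    """Build one Turtle description: class, object property, datatype properties."""
    pairs = [
        ("a", "cco:Assay"),  # entity name becomes the class
        ("cco:hasTarget", f"chembl_target:{row['target_chembl_id']}"),  # foreign key
        ("rdfs:label", quote(row["chembl_id"])),
        ("cco:chemblId", quote(row["chembl_id"])),
    ]
    for column, prop in DATATYPE_PROPERTIES.items():
        if row[column] is not None:  # skip NULL attributes
            pairs.append((prop, quote(row[column])))
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
    return f"chembl_assay:{row['chembl_id']} {body} .\n"


def demo():
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE assays (chembl_id TEXT PRIMARY KEY, target_chembl_id TEXT, "
        "assay_type TEXT, cell_type TEXT, organism TEXT)"
    )
    conn.execute(
        "INSERT INTO assays VALUES (?, ?, ?, ?, ?)",
        ("CHEMBL615672", "CHEMBL612910", "Functional", "3LL cell line", "Mus musculus"),
    )
    rows = conn.execute("SELECT * FROM assays").fetchall()
    conn.close()
    return PREFIXES + "\n" + "".join(assay_to_turtle(row) for row in rows)


if __name__ == "__main__":
    print(demo())
```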
More examples: ChEMBL Entity properties
[Screenshots: Substance, Target and Activity properties, shown in TopBraid Composer (Free Edition)]
Mapping to external ontologies
• Examples of ontologies/taxonomies mapped to in ChEMBL RDF include:
• BioAssay Ontology (BAO)
• ChEBI
• Chemical Information Ontology (CHEMINF)
• Bibliographic Ontology
• Unit Ontology (UO)
• QUDT Ontology
• Semanticscience Integrated Ontology (SIO)
• Cell Line Ontology (CLO)
• Experimental Factor Ontology (EFO)
External ontologies/taxonomies
• Identification of relevant external ontologies
• Community consensus + recommendations
• BioPortal - https://bioportal.bioontology.org/
• Ontology Lookup Service - https://www.ebi.ac.uk/ontology-lookup/
ChEMBL assay data
[Figure: Substance, Activity, Assay, Document, Target, Target-Component, Source, Journal, Protein-Classification, Bio-Component]
ChEMBL assay annotation
• The assay is the central component to the ChEMBL data model
• Current model not ideal
• Single category - binding, functional, ADMET, physchem
• Unstructured/free text used to describe assay
• Many assay parameters not captured – although often not available
• Ontologies are now being used to improve ChEMBL assay annotations - ChEMBL_17 onwards
• Mappings to BAO bioassays, assay_format, endpoints
• http://bioassayontology.org
BioAssay Ontology
Bioassay parent class, 92 descendant classes
How do we map to all these BAO assay classes?
External ontology mapping process
• In many cases mapping is straightforward
• Use common bridging identifier e.g. UniProt
• Simple text based conversion e.g. units - although units are actually not so straightforward in ChEMBL
• Some mappings require complex rules e.g. assay details
• Multiple database parameters
• Complex text processing
• Manual curation
• Tools available to assist with mapping process
• BioPortal Annotator (http://bioportal.bioontology.org/annotator)
• Zooma (http://www.ebi.ac.uk/fgpt/zooma/)
• ChEMBL Assay Annotator
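To give a flavour of what rule-based mapping can look like, here is a small Python sketch that guesses a BAO assay format from a free-text assay description. The keyword rules are invented for illustration and are not the rules used by the ChEMBL Assay Annotator; only the identifiers BAO_0000205 (has_assay_format), BAO_0000219 (cell based) and BAO_0000218 (organism-based) are taken from these slides.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
ASSAY_FORMAT_RULES = [
    (re.compile(r"\bcells?\b", re.IGNORECASE), "bao:BAO_0000219"),  # cell based
    (re.compile(r"\bin vivo\b|\b(mice|mouse|rats?)\b", re.IGNORECASE), "bao:BAO_0000218"),  # organism-based
]


def guess_assay_format(description):
    """Return a BAO assay format term for the description, or None if no rule fires."""
    for pattern, term in ASSAY_FORMAT_RULES:
        if pattern.search(description):
            return term
    return None


def assay_format_triple(assay_id, description):
    """Return a Turtle triple linking the assay to its guessed format, or None."""
    term = guess_assay_format(description)
    if term is None:
        return None  # leave for more complex rules or manual curation
    return f"chembl_assay:{assay_id} bao:BAO_0000205 {term} ."


if __name__ == "__main__":
    text = "Induction of apoptosis in human Jurkat T cells overexpressing Neo"
    print(assay_format_triple("CHEMBL2213497", text))
```

Descriptions that match no rule return None, which mirrors the point above that some mappings need complex text processing or manual curation.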
BioPortal Annotator
ChEMBL Assay Description
Restricted to Ontology interest (optional)
Results
API available http://data.bioontology.org/documentation
http://bioportal.bioontology.org/annotator
BioPortal Annotator Example
• CHEMBL2213497 assay description:
“Induction of apoptosis in human Jurkat T cells overexpressing Neo assessed as loss in mitochondrial membrane potential at 30 ug/ml after 36 hrs by DiO6-based flow cytometry (Rvb = 5.4%)”
• More information here: https://www.ebi.ac.uk/chembl/assay/inspect/CHEMBL2213497
• Now use BioPortal Annotator to annotate…
ChEMBL Assay Annotator
• ChEMBL Assay Annotator developed by Samuel Croset
• Aim is to map ChEMBL assays to BAO assay classes
• ‘Tailored’ mapping rules developed
External mapping representation
• In this example defining assay_format:
chembl_assay:CHEMBL615672 a cco:Assay ;
    cco:hasTarget chembl_target:CHEMBL612910 ;
    cco:hasActivity chembl_activity:CHEMBL_ACT_227195 ;
    rdfs:label "CHEMBL615672" ;
    cco:chemblId "CHEMBL615672" ;
    cco:assayType "Functional" ;
    cco:assayCellType "3LL cell line" ;
    cco:organismName "Mus musculus" ;
    bao:BAO_0000205 bao:BAO_0000219 .
BAO_0000205 = has_assay_format
BAO_0000219 = “Cell based”
ChEMBL Core Ontology (CCO)
• The skeleton schema used to store ChEMBL classes, object properties and datatype properties
• The file is also RDF, so can be queried independently of any instances
• ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/18.0/cco.ttl.gz
• Namespace: http://rdf.ebi.ac.uk/terms/chembl#
• Initial focus on Substance (Molecule) and Target Classification
• In future an additional mapping file may be provided, which maps/aligns ChEMBL classes and properties to external resources
ChEMBL Core Ontology (CCO)
[Figure: CCO class hierarchy panels: Classes, Target Classes, Substance Classes]
ChEMBL RDF schema
https://www.ebi.ac.uk/rdf/documentation/chembl
The (raw) end result
Querying ChEMBL data
• Need to load files into triplestore (Virtuoso Open Source)
[Figure: ChEMBL Data, External Ontologies (e.g. BAO) and CCO loaded into the ChEMBL Triplestore; ChEMBL SPARQL Interface/LD Browser: http://www.ebi.ac.uk/rdf/services/chembl/sparql]
Using ‘external’ RDF data sources
[Figure: external sources, e.g. Reactome Triplestore, UniProt Triplestore, Bio2RDF]
• Questions to think about when using external RDF data sources
• Who creates resource and the RDF representation?
• When was the resource last updated?
• When was RDF last updated?
• Does the data model make sense?
• Basic queries work?
• Shared entities and ontologies?
• Any data licensing issues?
VoID can help
• VoID = Vocabulary of Interlinked Datasets
• Acts as a bridge between publishers and users
• EBI RDF resources provide a VoID (just an extra RDF file)
• Information contained in VoID
• Creation timestamps
• Publisher details
• Versioning
• Ontologies/vocabularies used
• Licensing
• Data formats available and where they live (not just RDF)
• More complex information such as Subsets and Linksets
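For readers who have not seen a VoID file, the following Python sketch prints a minimal description in Turtle. It is not the actual ChEMBL VoID: the dataset URI, title and licence URI are placeholders, and only the triple count, SPARQL endpoint, FTP location and namespaces are taken from elsewhere in these slides. The properties used (void:triples, void:dataDump, void:sparqlEndpoint, void:vocabulary, dcterms:title, dcterms:license) are standard VoID and Dublin Core terms.

```python
def void_description(dataset_uri, title, triples, data_dump,
                     sparql_endpoint, vocabularies, license_uri):
    """Return a minimal VoID dataset description as a Turtle string."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    void:triples {triples} ;",
        f"    void:dataDump <{data_dump}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
    ]
    lines += [f"    void:vocabulary <{uri}> ;" for uri in vocabularies]
    # Close the description: swap the final ';' for '.'
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(void_description(
        dataset_uri="http://example.org/void#chembl18",      # placeholder
        title="ChEMBL RDF 18",                                # placeholder
        triples=409989782,
        data_dump="ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/",
        sparql_endpoint="http://www.ebi.ac.uk/rdf/services/chembl/sparql",
        vocabularies=[
            "http://rdf.ebi.ac.uk/terms/chembl#",
            "http://www.bioassayontology.org/bao#",
        ],
        license_uri="http://example.org/license",             # placeholder
    ))
```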
Quick look at the ChEMBL VoID…
Model building recommendations
• Technology review
• URIs should resolve and be future proofed
• Ensure the correct external namespaces are being used
• Add rdfs:label to everything
• Consider using identifiers for ontology names instead of textual descriptions
• As a ‘small’ descriptive ontology grows, consistent naming conventions can break down
• Not used for CCO, but may consider future format switch e.g. CCO_000001 = ChEMBL Activity, CCO_000002 = ChEMBL Assay and so on
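The identifier scheme mentioned above could be generated along these lines. This is a hypothetical sketch: CCO does not currently use such identifiers, and the function name and six-digit padding are assumptions based on the CCO_000001 example.

```python
def assign_opaque_ids(labels, prefix="CCO", width=6, start=1):
    """Map sequential opaque identifiers (e.g. CCO_000001) to human-readable labels."""
    return {
        f"{prefix}_{number:0{width}d}": label
        for number, label in enumerate(labels, start)
    }


def to_turtle(ids):
    """Emit one class declaration per identifier, keeping the readable name in rdfs:label."""
    return "".join(
        f':{identifier} a owl:Class ;\n    rdfs:label "{label}" .\n'
        for identifier, label in ids.items()
    )


if __name__ == "__main__":
    ids = assign_opaque_ids(["ChEMBL Activity", "ChEMBL Assay"])
    print(to_turtle(ids))
```

Because the readable name lives only in rdfs:label, a class can be renamed later without changing its URI.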
Technology stack
• Triple Processing
• Groovy
• OpenRDF Sesame Java API (http://www.openrdf.org)
• Rapper – useful command line utility
• Triplestore/Storage
• Virtuoso Open Source Edition 6.1.7 (Upgrade to Version 7 planned)
• Raw .ttl files available to download from ChEMBL FTP site ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/
• Domain/class specific .ttl files created – helps processing and loading
New features ChEMBL 18 RDF
• More data, now 409,989,782 triples
• New types of data
• Binding sites, cell lines, mechanism of action
• New properties
• Molecule hierarchy mappings
• Target complex mappings
• Assay parameters
• Improved mappings to the BAO ontology assay_format, e.g. biochemical, physicochemical, cell based,…
• Some example queries now follow ->
Example query 1
• Get all human cell-lines:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT ?cellLine ?cellName
WHERE {
  ?cellLine a cco:CellLine ;
    cco:taxonomy <http://identifiers.org/taxonomy/9606> ;
    rdfs:label ?cellName .
}
http://tinyurl.com/odqulmq
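Queries like this one can also be sent programmatically. The Python sketch below builds a standard SPARQL protocol GET request and flattens the JSON results; it assumes the endpoint URL given earlier in these slides is still live and that it honours the `application/sparql-results+json` media type, neither of which is guaranteed. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint as listed in the slides; it may have moved since.
ENDPOINT = "http://www.ebi.ac.uk/rdf/services/chembl/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    document = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in document["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query over the network and return the parsed rows."""
    with urlopen(build_request(query, endpoint)) as response:
        return parse_bindings(response.read().decode("utf-8"))


if __name__ == "__main__":
    QUERY = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
    SELECT ?cellLine ?cellName WHERE {
      ?cellLine a cco:CellLine ;
        cco:taxonomy <http://identifiers.org/taxonomy/9606> ;
        rdfs:label ?cellName .
    } LIMIT 5
    """
    for row in run_query(QUERY):
        print(row["cellLine"], row["cellName"])
```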
Example query 2
• Get all compounds that have been tested in a cell-based (bao:BAO_0000219) toxicity assay in HepG2 cells:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT ?mol ?assayDesc
WHERE {
  ?mol ?p cco:Substance ;
    cco:substanceType ?moltype ;
    cco:hasActivity ?activity .
  ?activity cco:hasAssay ?assay .
  ?assay cco:assayCellType 'HepG2' ;
    cco:assayType 'Toxicity' ;
    bao:BAO_0000205 bao:BAO_0000219 ;
    dcterms:description ?assayDesc .
}
http://tinyurl.com/oyttvlr
Example query 3
• Get all concentration response assays (bao:BAO_0002162) for monoamine receptor targets (CHEMBL_PC_1266):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT distinct ?assay ?assayDesc
WHERE {
  <http://rdf.ebi.ac.uk/resource/chembl/protclass/CHEMBL_PC_1266> cco:hasTargetDescendant ?target .
  ?target cco:hasAssay ?assay .
  ?assay cco:hasActivity ?activity ;
    dcterms:description ?assayDesc .
  ?activity bao:BAO_0000208 ?endpoint .
  ?endpoint rdfs:subClassOf bao:BAO_0002162 .
}
http://tinyurl.com/o6qg8uk
Example query 4
• Get the number of ADME assays carried out in organism-based (bao:BAO_0000218) format for FDA approved drugs:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT ?molname ?mol (count(distinct ?assay) as ?assay_count)
WHERE {
  ?assay a cco:Assay ;
    cco:assayType 'ADME' ;
    bao:BAO_0000205 bao:BAO_0000218 .
  ?assay cco:hasActivity ?activity .
  ?activity cco:hasMolecule ?mol .
  ?mol cco:highestDevelopmentPhase 4 ;
    rdfs:label ?molname .
}
GROUP BY ?molname ?mol
ORDER BY DESC(count(distinct ?assay))
http://tinyurl.com/psu5442
Example query 5
• Get all cell-lines that have been used in physical property assays (bao:BAO_0002128):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT ?cellLine ?assay
WHERE {
  ?cellLine a cco:CellLine ;
    cco:isCellLineForAssay ?assay .
  ?assay cco:hasActivity ?activity .
  ?activity bao:BAO_0000208 ?endpoint .
  ?endpoint rdfs:subClassOf bao:BAO_0002128 .
}
http://tinyurl.com/ojytly3
Example query 6
• Get all Protein Kinase (CHEMBL_PC_1100) inhibitor binding sites:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT ?target ?bindingSite ?siteName ?inhibitor
WHERE {
  ?bindingSite a cco:BindingSite ;
    cco:bindingSiteName ?siteName ;
    cco:hasTarget ?target .
  <http://rdf.ebi.ac.uk/resource/chembl/protclass/CHEMBL_PC_1100> cco:hasTargetDescendant ?target .
  ?target rdfs:label ?targetName ;
    cco:isTargetForMechanism ?mechanism .
  ?mechanism cco:mechanismActionType 'INHIBITOR' ;
    cco:mechanismDescription ?mechanismDesc ;
    cco:hasMolecule ?molecule .
  ?molecule rdfs:label ?inhibitor .
}
http://tinyurl.com/onr2yto
Future Plans: SureChEMBL
• December 2013 EMBL-EBI acquired SureChem – a leading ‘chemistry patent mining’ product from Digital Science, Macmillan Group
• SureChem provides a live (updated daily) view of chemical patent space
• Rebranded SureChEMBL
https://www.surechembl.org
Open PHACTS extension
• Open PHACTS project is keen to include patent data in future extensions to the project
• ENSO approved - funding to include SureChEMBL data in Open PHACTS
• RDF conversion, target indexing and API development
• EBI-RDF project benefits from RDF conversion
• SureChEMBL is updated daily, compared to quarterly ChEMBL updates
• Interesting challenge for us creating exports and systems loading SureChEMBL
Open PHACTS Platform
[Figure: Open PHACTS platform architecture. Core Platform components: Linked Data API (RDF/XML, TTL, JSON), Domain Specific Services, Semantic Workflow Engine, Data Cache (Virtuoso Triple Store), Identity Resolution Service, Identifier Management Service, Chemistry Registration Normalisation & Q/C, Indexing. Data sources (each a Db with VoID and Nanopub): Public Content, Commercial, Public Ontologies, User Annotations. Consumers: Apps. Example identifiers shown: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”]
(slide author: Lee Harland)
Summary
• Review of the ChEMBL database
• Two approaches used to model ChEMBL data
• Approach 2 used to build RDF representation of ChEMBL
• New features included in ChEMBL_18 release
• Model enhancements
• More data
• Plans for the future
• Patents
• Open PHACTS
Acknowledgements
Groups and people involved in the RDF representation of ChEMBL include:
ChEMBL Group
• Anna Gaulton
• Samuel Croset
• John Overington
Open PHACTS
• Alasdair Gray
• Antonis Loizou
• Lee Harland
• Egon Willighagen
EBI-RDF Group
• Andy Jenkinson
• Simon Jupp
• James Malone