Intervet Chemicals Directory (ICD): A Framework Combining Accelrys Pipeline Pilot and Symyx Isentris
Frank Oellien, Accelrys European User Group Meeting, Barcelona, 10/26/2010


DESCRIPTION

Accelrys European User Group Meeting, Barcelona, Spain, October 26, 2010

TRANSCRIPT

Page 1: Intervet Chemicals Directory (ICD) - A Framework Combining Accelrys Pipeline Pilot and Symyx Isentris

Accelrys European User Group Meeting, Barcelona, 10/26/2010

Intervet Chemicals Directory (ICD): A Framework Combining Accelrys Pipeline Pilot and Symyx Isentris

Frank Oellien

Page 2

2 | 10/26/2010 | SP Intervet Chemicals Directory (ICD)

Outline

• Motivation for the ICD project (historical review)

• Technical Implementation (2003)

• ICD Today (Enhancements in the last years)

• Technical limitations of the Isentris approach

• Solution: Combining Symyx Isentris & Accelrys PP

– Structure Registration, Synchronization

– Database Cleaning

– Property Calculations

Page 3

Motivation

• Start of the ICD project: 2003
• Company was still young
• BioChemInformatics (BCI) group (more precisely, the cheminformatics branch) started its work on a regular basis
– Ligand- and structure-based virtual screening (LO and Hit2Lead projects)
– Property and descriptor calculations
– QSAR
– Substructure and similarity searches

→ Access to many in-house data sources, especially structures, required
→ Many exchange formats used (including Excel and SD files)
→ Many diverse tools and applications used

Page 4

Pre-ICD Time (before Q2 2003)

Page 5

The Idea – A Central Data Source

[Diagram: SD files from in-house databases, supplier data and other data sources are merged into the ICD, which serves BCI applications, medicinal chemists and CompLog]

Page 6

Requirements

• Standard data source for all BCI tasks
• Merged data source including in-house structures, supplier structures and other data sources
• Dynamically updated
• Structure database with a unique structure identifier
• Standardized and normalized data (including chemical normalization)
• Extendable system that can store other BCI-relevant information (e.g. virtual screening data)

Requirements from other scientists in the Drug Discovery department:
• Storing supplier catalogues and other supplier information
• Data source for compound ordering
• Accessible to other scientists (especially medicinal chemists)
• Storage of physico-chemical properties for research projects

Page 7

Implementation: Reasons for Isentris (2003)

• Not many systems available in 2003 (Auspyx, Accord, Isentris)
• Isentris used many technologies that were already available in-house (MDL Direct, Oracle)
• Chemical normalization available: Cheshire
• Advanced J2EE architecture and API that allow good customization and extension
• CoRe: an existing project already based on Isentris
– Intervet was an early adopter of Isentris
– No additional software costs
– Synergy effects (e.g. chemical business rules)

Page 8

Implementation Overview

[Diagram: CACTVS (Linux): file syntax normalisation of SD files (supplier catalogs, TORE updates (in-house)) and generation of salt information and parent hash codes, producing prepared SD files. Java applications (Windows): chemical normalisation (chemical rules in CheckAndFix_Main.cct) and registration via MDL Isentris (client-server) into the ICD. ADME data (phys-chem properties) loaded into the ICD via Oracle SQLLoader and a Java application.]
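The first stage of this overview, file syntax normalisation of incoming SD files, essentially means parsing each record and mapping supplier-specific data tags onto one standard field set. The Python sketch below illustrates that idea only; it is not the CACTVS script used for the ICD. The tag mapping `EXAMPLE_SUPPLIER_CONFIG`, the function names and the default `SupplierName` value are hypothetical; the target field names (OrderNo, CompoundName, Purity, SupplierName) are the standard supplier fields listed for the ICD.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, Dict[str, str]]  # (molblock, SD data fields)

# Hypothetical supplier configuration: maps one supplier's own SD tag names
# onto the standard ICD supplier fields.
EXAMPLE_SUPPLIER_CONFIG = {"Catalog ID": "OrderNo", "Name": "CompoundName", "Purity (%)": "Purity"}


def _parse_record(lines: List[str]) -> Record:
    """Split one SD record into its molblock and its '> <Tag>' data items."""
    end = next((i for i, l in enumerate(lines) if l.startswith("M  END")), None)
    if end is None:
        raise ValueError("SD record without 'M  END'")
    molblock = "\n".join(lines[: end + 1])
    fields: Dict[str, List[str]] = {}
    tag = None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            # Data header such as "> <OrderNo>" or ">  <OrderNo> (12)"
            lt, gt = line.find("<"), line.rfind(">")
            tag = line[lt + 1:gt] if 0 < lt < gt else None
            if tag is not None:
                fields[tag] = []
        elif tag is not None and line.strip():
            fields[tag].append(line)
        else:
            tag = None  # a blank line terminates the current data item
    return molblock, {k: "\n".join(v) for k, v in fields.items()}


def parse_sd(text: str) -> List[Record]:
    """Parse an SD file into records separated by '$$$$' lines."""
    records, lines = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(lines))
            lines = []
        else:
            lines.append(line)
    if any(l.strip() for l in lines):  # last record without a terminator
        records.append(_parse_record(lines))
    return records


def standardize_fields(fields: Dict[str, str], mapping: Dict[str, str],
                       defaults: Dict[str, str]) -> Dict[str, str]:
    """Rename supplier-specific tags to standard names and add constant fields."""
    out = dict(defaults)
    for tag, value in fields.items():
        out[mapping.get(tag, tag)] = value
    return out


def write_sd(records: List[Record]) -> str:
    """Serialize records back into SD file syntax."""
    parts = []
    for molblock, fields in records:
        body = [molblock]
        for tag, value in fields.items():
            body += [f"> <{tag}>", value, ""]
        parts.append("\n".join(body) + "\n$$$$\n")
    return "".join(parts)
```

Keeping the parser generic and putting supplier quirks into a per-supplier mapping mirrors the split into generic scripts plus supplier-specific configuration files described for the ICD.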

Page 9

Implementation: SD File Syntax Standardisation

• Based on the CACTVS application (by Xemistry)
• SD files can come in different input formats
• 2 generic scripts (supplier-specific, in-house-specific) plus supplier-specific configuration files standardize the format of the input SD files
• SDF fields for supplier-related files: SupplierName, OrderNo, CatalogName, CatalogType, CatalogRelease, Confidential, CompoundName, IsSalt, Salt, Quantity, Purity
• SDF fields for in-house data: AHNO, CompoundName, IsSalt, Salt
• Calculation of structural hash codes (parent structure hash code); the hash codes are insensitive to isotopes, salts, tautomers and stereochemistry
• Automatic, knowledge-based identification of salts → 174 different salts can be identified
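The idea behind a salt-, isotope- and stereo-insensitive parent hash can be illustrated on a toy molecular graph (element labels plus a bond list). This Python sketch is not the CACTVS hash code algorithm: it uses a simple iterative neighbourhood-refinement hash (Weisfeiler-Lehman style), which is independent of atom order but is not a true canonicalization and can in principle collide for some non-isomorphic graphs. The three-entry salt dictionary is hypothetical and far smaller than the 174 salts mentioned above, and tautomer insensitivity is not modelled at all.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

Bond = Tuple[int, int, int]  # (atom index, atom index, bond order)


def _digest(text: str) -> str:
    return hashlib.sha1(text.encode()).hexdigest()[:16]


def _element(label: str) -> str:
    # Labels may carry a mass number ("13C"); dropping it makes the hash isotope-insensitive.
    return label.lstrip("0123456789")


def components(n_atoms: int, bonds: List[Bond]) -> List[List[int]]:
    """Connected components (fragments) of the molecular graph."""
    adj: Dict[int, set] = defaultdict(set)
    for i, j, _ in bonds:
        adj[i].add(j)
        adj[j].add(i)
    seen, comps = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            atom = stack.pop()
            comp.append(atom)
            for nb in adj[atom]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(sorted(comp))
    return comps


def fragment_hash(atoms: List[str], bonds: List[Bond], comp: List[int], rounds: int = 3) -> str:
    """Atom-order-independent hash of one fragment. Only element symbols and bond
    orders are used, so isotope labels and stereo information are ignored."""
    members = set(comp)
    nbrs: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for i, j, order in bonds:
        if i in members and j in members:
            nbrs[i].append((j, order))
            nbrs[j].append((i, order))
    labels = {a: _element(atoms[a]) for a in comp}
    for _ in range(rounds):
        # Each atom's new label combines its old label with its neighbours' old labels.
        labels = {
            a: _digest(labels[a] + "|" + ",".join(sorted(f"{o}:{labels[b]}" for b, o in nbrs[a])))
            for a in comp
        }
    return _digest(";".join(sorted(labels.values())))


# Hypothetical salt dictionary (single-atom counter-ions only, for illustration).
SALTS = {"Cl": "hydrochloride", "Br": "hydrobromide", "Na": "sodium"}
SALT_HASHES = {fragment_hash([sym], [], [0]): name for sym, name in SALTS.items()}


def parent_hash(atoms: List[str], bonds: List[Bond]) -> Tuple[str, List[str]]:
    """Return (parent hash, identified salt names); tautomers are NOT handled."""
    hashes = [fragment_hash(atoms, bonds, c) for c in components(len(atoms), bonds)]
    salts = sorted(SALT_HASHES[h] for h in hashes if h in SALT_HASHES)
    parents = [h for h in hashes if h not in SALT_HASHES] or hashes  # all-salt input: keep all
    return _digest(".".join(sorted(parents))), salts
```

With this scheme, ethanol hydrochloride and a 13C-labelled ethanol entered in a different atom order get the same parent hash, while dimethyl ether (same formula, different connectivity) does not.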

Page 10

Implementation: Chemical Normalisation

• Based on Cheshire (part of the Isentris framework)
• JavaScript clone
• Valence checks, Ion2kov, nitro groups, transition metals, queries, geometries, stereochemistry, …
• 99 rules
– 45 correction functions
– 29 warning functions
– 25 error functions
• Used by CoRe and ICD applications
• Input: molfile string
• Output: molfile string and message string
→ Message: category; number of changes; list of descriptions
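The rule organisation above (correction, warning and error functions; a molfile string in, a molfile string plus a message out) suggests a simple rule-engine structure. The Python sketch below assumes that shape; the three toy rules are hypothetical stand-ins, not Cheshire rules, and the five result categories are modelled on those reported in the registration log on the next slide.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A rule takes a molfile string and returns (possibly modified molfile, message or None).
Rule = Callable[[str], Tuple[str, Optional[str]]]


@dataclass
class NormalizationResult:
    molfile: str
    category: str  # "unchanged", "fixed", "fixed_with_warnings", "warning" or "error"
    changes: int
    messages: List[str] = field(default_factory=list)


def fix_line_endings(molfile: str) -> Tuple[str, Optional[str]]:
    """Hypothetical correction rule: convert CRLF line endings to LF."""
    fixed = molfile.replace("\r\n", "\n")
    return fixed, ("converted CRLF line endings" if fixed != molfile else None)


def warn_no_atoms(molfile: str) -> Tuple[str, Optional[str]]:
    """Hypothetical warning rule: the V2000 counts line (4th line) reports zero atoms."""
    lines = molfile.split("\n")
    if len(lines) > 3 and lines[3][:3].strip() == "0":
        return molfile, "structure has no atoms"
    return molfile, None


def error_no_end(molfile: str) -> Tuple[str, Optional[str]]:
    """Hypothetical error rule: the connection table is not terminated by 'M  END'."""
    return molfile, (None if "M  END" in molfile else "missing 'M  END' line")


def normalize(molfile: str, corrections: List[Rule], warning_rules: List[Rule],
              error_rules: List[Rule]) -> NormalizationResult:
    messages: List[str] = []
    changes = 0
    for rule in corrections:  # only correction rules may rewrite the molfile
        molfile, msg = rule(molfile)
        if msg:
            changes += 1
            messages.append(msg)
    # Warning and error rules inspect the corrected molfile without changing it.
    warned = [m for _, m in (rule(molfile) for rule in warning_rules) if m]
    failed = [m for _, m in (rule(molfile) for rule in error_rules) if m]
    messages += warned + failed
    if failed:
        category = "error"
    elif changes and warned:
        category = "fixed_with_warnings"
    elif changes:
        category = "fixed"
    elif warned:
        category = "warning"
    else:
        category = "unchanged"
    return NormalizationResult(molfile, category, changes, messages)
```

Separating rules that modify the structure from rules that only report keeps the order of evaluation simple: all fixes are applied first, and warnings and errors are judged on the fixed structure.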

Page 11

Implementation: Registration

• Based on the Symyx Isentris Java client (now Accelrys Isentris)
• Using Isentris data sources (Data Source Factory)
• 3 Java applications (in-house structures, supplier, virtual screening) → 31 Java classes, ~9,500 lines of code
• Run types: command line, GUI, batch mode
• Chemical normalisation, duplicate check, registration logic

********************************************************
* ICD Supplier Registration
* Version null
* Frank Oellien, Intervet Innovation GmbH
*
*******************************************************

1:10:21 PM INFO: Chemical Normalization status:
304334 records without changes
4685 records fixed
11 records fixed but still have warnings
200 records with warnings
2 records with errors

1:10:21 PM INFO: Chemical Registration status:
1:10:21 PM INFO: New supplier has been registered.
309230 records to register
309222 records passed registration
8 records failed registration
180284 new structues registered
128938 structues already found in the DB
*******************************************************
1:10:21 PM INFO: Closing Cheshire environment...
1:10:22 PM INFO: Releasing the ICD datasource resources ...
1:10:22 PM INFO: Closing the ICD DataSourceFactory...
1:10:22 PM INFO: Logout...
1:10:22 PM INFO: All resources released.
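The registration status in this log is internally consistent: 180,284 new plus 128,938 existing structures gives the 309,222 records that passed, which is 309,230 records minus the 8 failures. The Python sketch below shows a duplicate check that produces this kind of bookkeeping, assuming an in-memory dictionary as a stand-in for the ICD structure table and a precomputed structure key (for example a parent hash code) per record. `Registry`, `register_batch` and the record layout are hypothetical and are not the Isentris Java API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for the ICD structure table."""
    ids: Dict[str, int] = field(default_factory=dict)  # structure key -> ICD structure ID
    entries: List[Tuple[int, str, str]] = field(default_factory=list)  # (ICD ID, supplier, order no)

    def register(self, key: str, supplier: str, order_no: str) -> bool:
        """Link a catalogue entry to a structure; return True if the structure is new."""
        is_new = key not in self.ids
        if is_new:
            self.ids[key] = len(self.ids) + 1  # next unique structure identifier
        self.entries.append((self.ids[key], supplier, order_no))
        return is_new


def register_batch(registry: Registry, records: List[Dict[str, Optional[str]]]) -> Dict[str, int]:
    """Register a batch and count outcomes; passed == new + existing always holds."""
    stats = {"to_register": 0, "passed": 0, "failed": 0, "new": 0, "existing": 0}
    for rec in records:
        stats["to_register"] += 1
        key = rec.get("key")
        if not key:  # e.g. a structure that could not be normalized
            stats["failed"] += 1
            continue
        stats["passed"] += 1
        if registry.register(key, rec["supplier"], rec["order_no"]):
            stats["new"] += 1
        else:
            stats["existing"] += 1
    return stats
```

Because an already known structure is linked rather than re-registered, a compound sold by several suppliers keeps a single structure identifier.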

Page 12

ICD Today - Datasheet

• ~11,500,000 structures
• 237 different catalogues (including screening libraries, focused data sets)
• 60 suppliers
• A broad range of standard physico-chemical properties
• Intervet's in-house database
• Specific Intervet data sets
• References to external sources (PubChem)

Page 13

ICD Today – Change of Relevance

• Still the main data source for the BCI group, although almost all other BCI technologies have changed in the meantime
• Moreover, it has become a key technology platform for the whole drug discovery process
– Almost all compound logistics activities are based on the ICD (applications for compound ordering)
– Stores specific essential information for CompLog
– Important database for Hit2Lead and LO projects (contains decision-critical properties)
– Has become the most important structure database for medicinal chemists
• Isentris upgrade to 3.1 → redesign of the ICD Isentris part necessary
• New demands by BCI and others had to be implemented → could not be realized with the former setup because of its limitations
• Solution: combination with Pipeline Pilot

Page 14

Limitations of the original Isentris Setup

From the beginning:
• Started with Isentris 1.1 as early adopters
• Hard to implement: large, over-designed J2EE API, no developer guides, only a few small code snippets
• Limited and complicated functions
– e.g. no support for very large structure files
• Redesign of applications was necessary because of Isentris updates
• No automation; everything is done in user context!

Regarding recent demands:
• Missing automation was still the most critical issue:
– Synchronisation
– Adding non-structural data
• Elaborate database cleaning mechanisms needed

Page 15

Registration of Supplier Catalogues

[Diagram: CACTVS (Linux): structural normalisation of SD files and generation of salt information and parent hash codes, producing prepared SD files. Java applications (Windows): chemical normalisation (chemical rules in CheckAndFix_Main.cct) and registration via MDL Isentris (client-server) into the ICD.]

Page 16

Registration of in-house Structures by PP I

[Diagram: synchronisation by Pipeline Pilot (Linux). Structures from the in-house database pass through structural normalisation of SD files and generation of salt information and parent hash codes (CACTVS called by PP), then chemical normalisation (chemical rules in CheckAndFix_Main.cct, via the Cheshire PP component) and registration into the ICD.]

Page 17

Registration of in-house Structures by PP II

[Figure: Pipeline Pilot protocol with the steps: retrieve structures from database; call CACTVS application; chemical normalisation & registration]
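How the PP protocol decides which in-house records to process is not described on the slides. One common approach, shown here only as a hypothetical Python sketch, is to compare the in-house database against a fingerprint of each record stored at the last synchronisation, keyed by the in-house compound number (AHNO); new and changed records are then passed through preparation (e.g. the CACTVS step) and registration. `fingerprint`, `plan_sync`, `run_sync` and the callback signatures are assumptions. Records missing from the source are only reported, since the slides do not say how deletions are handled.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(record: str) -> str:
    """Hypothetical change detector: digest of the raw in-house record."""
    return hashlib.sha1(record.encode()).hexdigest()


def plan_sync(source: Dict[str, str], synced: Dict[str, str]) -> Dict[str, List[str]]:
    """source: AHNO -> raw structure record; synced: AHNO -> fingerprint stored at last sync."""
    plan: Dict[str, List[str]] = {"new": [], "changed": [], "missing": []}
    for ahno, record in sorted(source.items()):
        if ahno not in synced:
            plan["new"].append(ahno)
        elif synced[ahno] != fingerprint(record):
            plan["changed"].append(ahno)
    plan["missing"] = sorted(k for k in synced if k not in source)  # reported only
    return plan


def run_sync(source: Dict[str, str], synced: Dict[str, str],
             prepare: Callable[[str], str],
             register: Callable[[str, str], None]) -> Dict[str, List[str]]:
    """Send new and changed records through preparation and registration."""
    plan = plan_sync(source, synced)
    for ahno in plan["new"] + plan["changed"]:
        register(ahno, prepare(source[ahno]))
        synced[ahno] = fingerprint(source[ahno])  # remember what was registered
    return plan
```

Because the stored fingerprints are updated after each registration, a second run over an unchanged in-house database finds nothing to do, which is what allows the job to run unattended.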

Page 18

Cheshire PP Component (Java)

• Implemented as a PP Java component
• Based on the Cheshire Java API
• Calls the Cheshire core library (shared object files called via JNI)

Page 19

Cheshire PP Component (Java)

Page 20

Cheshire PP Component (Java)

Page 21

Importing physico-chemical Properties I

[Diagram: ADME data (phys-chem properties) loaded into the ICD via Oracle SQLLoader and a Java application]

Page 22

Importing physico-chemical Properties I

[Diagram: ADME data (phys-chem properties) via Oracle SQLLoader and a Java application into the ICD. Managed by Pipeline Pilot (Linux): retrieval of structures without properties from the ICD; external application 1 (standardize); internal PP components (descriptors); external applications 2, 3 and 4 (descriptors); import of the properties into the ICD.]
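The "retrieve structures without properties, calculate descriptors, import properties" loop can be sketched in Python as follows, assuming an in-memory SQLite database as a stand-in for the Oracle-based ICD. The table layout, the `update_properties` helper and the toy `fragment_count` descriptor (standing in for the external descriptor applications and internal PP components) are hypothetical.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical schema: one row per (structure, property name).
SCHEMA = """
CREATE TABLE structures (id INTEGER PRIMARY KEY, smiles TEXT NOT NULL);
CREATE TABLE properties (
    structure_id INTEGER NOT NULL REFERENCES structures(id),
    name TEXT NOT NULL,
    value REAL,
    PRIMARY KEY (structure_id, name)
);
"""


def structures_without_properties(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Select structures that have no stored property rows yet."""
    return conn.execute(
        "SELECT s.id, s.smiles FROM structures s "
        "LEFT JOIN properties p ON p.structure_id = s.id "
        "WHERE p.structure_id IS NULL ORDER BY s.id"
    ).fetchall()


def update_properties(conn: sqlite3.Connection,
                      calculators: Dict[str, Callable[[str], float]]) -> int:
    """Calculate all descriptors for the missing structures and import them."""
    todo = structures_without_properties(conn)
    rows = [(sid, name, float(calc(smiles)))
            for sid, smiles in todo for name, calc in calculators.items()]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO properties (structure_id, name, value) VALUES (?, ?, ?)", rows)
    return len(todo)


def fragment_count(smiles: str) -> float:
    """Toy descriptor: number of '.'-separated components in a SMILES string."""
    return float(smiles.count(".") + 1)
```

Since only structures without stored properties are selected, re-running the job after a catalogue import only processes the new structures, which suits a scheduled, unattended process.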

Page 23

Importing physico-chemical Properties II

Page 24

Database Maintenance

Page 25

Isentris PP Components @ SP Intervet

• Isentris Cheshire PP
• Converters:
– Chime string to Molecule
– Chime string to CTAB
– Molecule to Chime string
– CTAB to Chime string

Page 26

Acknowledgement

Information Management
• Werner Schlüter
• Thomas Fischer

BioChemInformatics
• Richard Marhöfer
• Andreas Krasky
• (Jörg Cramer)
• Jörg Schröder
• Paul M. Selzer

Thank you