Demonstration of
ChemAxon's Markush Technology
Wei Deng (David)
ChemAxon UGM
Sep 26th, 2012
Boston
Start from a Patent (US4806538)
Extract All Structures from a Patent
• Document to Structure
– When the patent document is available
– Works on Image PDF
• Chemicalize.org
– Full text available online
– Faster than D2S sometimes
Demo: Extract Exemplified Structures from a Patent
• Find patent full text online
• Copy and paste the URL into Chemicalize
• Download all structures
• Import all structures into Instant JChem
• Apply a simple filter to remove non-exemplified structures
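The backup slides carry out the filtering step by adding a “Lipinski Rule of 5” Chemical Term and filtering on it. A minimal Python sketch of that kind of check follows. It assumes the four descriptors were already calculated elsewhere (for example in Instant JChem). The `Record` layout, field names, and the two rows are hypothetical placeholders, and the slides do not state which side of the filter is kept. The limits are the commonly stated ones, applied strictly with no violation allowed.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record layout; descriptor values are assumed precomputed.
    name: str
    mol_weight: float   # molecular weight in Da
    logp: float         # calculated logP
    h_donors: int       # hydrogen-bond donor count
    h_acceptors: int    # hydrogen-bond acceptor count


def passes_rule_of_five(r: Record) -> bool:
    """Return True when all four rule-of-five limits hold."""
    return (
        r.mol_weight <= 500
        and r.logp <= 5
        and r.h_donors <= 5
        and r.h_acceptors <= 10
    )


def split_by_rule(records):
    """Split records into (passing, failing) lists, preserving order."""
    passing = [r for r in records if passes_rule_of_five(r)]
    failing = [r for r in records if not passes_rule_of_five(r)]
    return passing, failing


# Made-up placeholder rows, not data from the patent.
records = [
    Record("placeholder-A", 320.4, 2.1, 1, 4),
    Record("placeholder-B", 712.9, 6.3, 6, 12),
]
passing, failing = split_by_rule(records)
```

Both lists are returned so that either side can be saved as the permanent list, depending on how the filter is meant to be read.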
Additional Markush Query Features I
On Atoms
• Query atom
– any, metal, hetero ...
• Atom topology (ring, chain)
• Stereochemistry (E/Z, tetrahedral)
• Aromatic, aliphatic atoms
• Substitution count
• Block substitution (s*)
• H count
• Explicit H full support
• Ring bond count
• Isolate ring on atoms (rb*)
Additional Markush Query Features II
On Bonds
• Bond topology (chain/ring)
and other ...
• Equal homology translation
• Broad translation switchable
• Simple R-group queries
Demo: Generate a Markush Structure
• R-group decomposition on exemplified structures based on the Markush core
• Generate a Markush structure
• Add additional R-group definitions
• Enumerate the library and save as a local file
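The enumeration step can be pictured as a Cartesian product over the R-group definitions attached to a core. The sketch below is a toy illustration in plain Python, not ChemAxon's enumeration API. The `{R1}`-style placeholder convention, the scaffold, the fragment lists, and the helper names are all assumptions made for the example.

```python
from itertools import product


def enumerate_markush(core, rgroups):
    """Yield one SMILES string per combination of R-group choices.

    core    -- SMILES template with {R1}-style placeholders (toy convention)
    rgroups -- dict mapping placeholder name to a list of SMILES fragments
    """
    names = sorted(rgroups)
    for combo in product(*(rgroups[name] for name in names)):
        yield core.format(**dict(zip(names, combo)))


def save_smiles(path, smiles_list):
    """Write one SMILES per line to a local file."""
    with open(path, "w") as handle:
        for smiles in smiles_list:
            handle.write(smiles + "\n")


# Hypothetical para-disubstituted benzene scaffold with two R-groups.
core = "{R1}c1ccc({R2})cc1"
rgroups = {"R1": ["C", "CC"], "R2": ["O", "N", "F"]}
library = list(enumerate_markush(core, rgroups))
```

The library size is the product of the definition counts (2 x 3 = 6 here), which is why adding one more R-group definition can grow the enumerated space quickly. Calling `save_smiles("enumerated_library.smi", library)` would write the result to a local file.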
Markush Structure Features I
• R-groups
• Multiple attachment points
• Up to thousands of R-group definitions
• Nested R-groups (any depth)
• Atom lists, bond lists
• Position variation bond
• Link nodes and repeating units
Markush Structure Features II
• Homology groups (MMS: superatoms, CAS: generic definition)
(property, growing list)
• Clean graphical representation
• All features supported in MRV format
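Nested R-groups, listed among the Markush structure features, mean that a member of one R-group definition can itself carry another R-group. A rough sketch of how such nesting expands, using recursive string substitution. The `{Rn}` placeholder syntax and the definitions are hypothetical, and this is not how MRV files or JChem represent R-groups internally.

```python
import re

PLACEHOLDER = re.compile(r"\{(R\d+)\}")


def expand(template, definitions):
    """Recursively expand {Rn} placeholders, including nested ones.

    No cycle detection: a definition that refers to itself would
    recurse until Python's recursion limit is hit.
    """
    match = PLACEHOLDER.search(template)
    if match is None:
        return [template]  # fully specified structure
    results = []
    for fragment in definitions[match.group(1)]:
        replaced = template[:match.start()] + fragment + template[match.end():]
        results.extend(expand(replaced, definitions))
    return results


# Hypothetical definitions: the second member of R1 carries a nested R2.
definitions = {
    "R1": ["C", "C({R2})"],
    "R2": ["O", "N"],
}
structures = expand("{R1}c1ccccc1", definitions)
```

Each nesting level multiplies only the branch that contains it, so the toy core above yields three structures rather than four.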
Demo: Analyze Enumerated Library
• Import enumerated structures back into IJC
• Similarity search: 1 - dissimilarity("PF", "CCOC(=O)C1=CN=C2C=CC3=C(C(C)=C(C)N3C)C2=C1O")
• Overlap Analysis
– Between the exemplified structures and the Markush enumerated chemical space
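The expression in the similarity bullet derives a similarity score as one minus a dissimilarity. A minimal sketch of that relationship, using the Tanimoto coefficient on sets of fingerprint on-bit positions. The slide does not say which metric the "PF" descriptor uses, so Tanimoto is an assumption here, and the bit sets are made-up placeholders rather than real fingerprints.

```python
def tanimoto_dissimilarity(bits_a, bits_b):
    """Return 1 - |A & B| / |A | B| for two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / len(union)


def similarity(bits_a, bits_b):
    """Similarity expressed the way the slide does: 1 - dissimilarity."""
    return 1.0 - tanimoto_dissimilarity(bits_a, bits_b)


# Placeholder on-bit sets standing in for the key compound and a candidate.
key_compound = {1, 4, 7, 9}
candidate = {1, 4, 8, 9, 12}
score = similarity(key_compound, candidate)
```

Sorting the enumerated library by this score against the key compound gives the ranked list a similarity search is after.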
ChemAxon - Thomson Reuters
Markush project history
1987 Thomson Scientific (Derwent) starts indexing Markush structures (in collaboration with Questel & INPI)
1998 INPI & Derwent Markush databases merge to form MMS (Merged Markush Service)
2000 ChemAxon launches first version of JChem Base
2005 ChemAxon starts working on Markush technology
2008 Markush search & enumeration first release in JChem 5.0
2010 Markush DARC file format support in JChem 5.3
2012 Full MMS searchable with JChem 5.5-5.11
Search the Full Patent Database
• Complete patent database from Thomson Reuters dating back to 1987
– VMN (Markush structures)
– DCR (Exemplified structures)
– DWPI text (non-structural information)
• Data stored in the Amazon cloud with a powerful virtual machine, secure connection, and confidential search
• Useful new features:
– Export exemplified structures
– Retrieve patent document
– Enumerate Markush structures and output result
– Notation
• Batch search of multiple queries
• Constantly improving search performance
Demo: Patent Database Search
• New search interface overview
• Customized buttons:
– Export exemplified structures
– Retrieve patent document
– Notation view
• Markush enumeration interface
• Structure search
– Improved R-group hit visualization
– Markush viewer
Acknowledgements
JChem Base, Markush and IJC
Helpers - Cartridge, Markush Viewer, Support, Marketing
• Steve Hajkowski
• Brian Larner
• Don Walter
• Gez Cross
• Tony Ferns
• Tim Miller
BACKUP SLIDES
Full Text Online
• Google Patent number (US4806538)
Paste Full Text URL to Chemicalize
Download All Structures
Import All Structures in Instant JChem
Add “Lipinski Rule of 5” Chemical Term
Filter Structures
Save as Permanent List
Query Builder
Generate a Markush Structure
Add new R3 Fragment
Enumerate Markush Structure
• Save all structures to a local file
Import Enumerated Structures to IJC
• Add a new data tree and structure table
• Import enumerated structures from the Markush to this data tree
– Duplicate filter on
• Similarity calculation with “Key Compound”
SMILES: CSC[C@H]1NC(=O)[C@@H](CC2=CN(C)C3=CC=CC=C23)NC1=O
Similarity Search with “Key Compound”
Similarity_Command.txt
Overlap Analysis
• Between the exemplified structures and the Markush enumerated chemical space
• Stereochemistry OFF
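Overlap analysis with stereochemistry off amounts to comparing the two structure sets after stereo information has been discarded. The sketch below approximates this with plain set operations on SMILES strings. It assumes both sets were written by the same canonicalizer, since string equality is otherwise not a reliable structure match, and it strips stereo crudely by deleting the `@`, `/` and `\` marks. A real tool such as Instant JChem compares structures rather than strings. The function names and sample inputs are hypothetical.

```python
def strip_stereo(smiles):
    # Crude stereo removal: drop '@' chirality marks and '/' and '\' bond marks.
    return smiles.replace("@", "").replace("/", "").replace("\\", "")


def overlap(exemplified, enumerated, ignore_stereo=True):
    """Return (shared, only_exemplified, only_enumerated) as sets of keys."""
    key = strip_stereo if ignore_stereo else (lambda s: s)
    a = {key(s) for s in exemplified}
    b = {key(s) for s in enumerated}
    return a & b, a - b, b - a


# Placeholder inputs: the first entries differ only in stereo configuration.
exemplified = ["C[C@H](N)C(=O)O", "CCO"]
enumerated = ["C[C@@H](N)C(=O)O", "CCN"]

shared, only_exemplified, only_enumerated = overlap(exemplified, enumerated)
shared_strict, _, _ = overlap(exemplified, enumerated, ignore_stereo=False)
```

With stereo ignored the two enantiomers collapse to one key and count as overlap; with stereo respected they do not, which is the difference the "Stereochemistry OFF" setting controls.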
Current technology in use
Thomson Reuters content: Merged Markush Service (MMS), Derwent’s Chemical Resource (DCR), Derwent World Patent Index (DWPI)
ChemAxon software UI: Instant JChem
Export Exemplified Structures
Retrieve Patent Document
Add Notes
Notes Overview
Batch Search of Multiple Queries