Chemical databasing: state of the art and current challenges. Valery Tkachenko, Royal Society of Chemistry. Kazan Summer School on Cheminformatics, Kazan, Russia, July 6th 2015.


Page 1: Chemical databasing: state of the art and current challenges · 2015-07-16 · Chemical databasing: state of the art and current challenges Valery Tkachenko Royal Society of Chemistry

Chemical databasing: state of the art and current challenges

Valery Tkachenko

Royal Society of Chemistry

Kazan Summer School on Cheminformatics

Kazan, Russia

July 6th 2015

Page 2

Why databases?

Efficient storage

Quick access (browse, search)

ACID (Atomicity, Consistency, Isolation, Durability)

Scalability

Migrations

Security

Safety (backup/restore)
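
The atomicity part of ACID can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical and are not taken from the slides; the point is only that a failed transaction leaves no partial writes behind.

```python
import sqlite3

# In-memory database; the "compound" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO compound (name) VALUES ('ethanol')")
conn.commit()

try:
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO compound (name) VALUES ('methanol')")
        # Violates the UNIQUE constraint, so the whole transaction is undone.
        conn.execute("INSERT INTO compound (name) VALUES ('ethanol')")
except sqlite3.IntegrityError:
    pass

names = [row[0] for row in conn.execute("SELECT name FROM compound ORDER BY name")]
print(names)
```

Because the second insert fails, the 'methanol' row written earlier in the same transaction is rolled back as well.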

Page 3

Database – model and data

Page 4

Database – relational example
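
The diagram from this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of a relational layout for chemical records, assuming a hypothetical two-table schema (one row per compound, many names per compound). It is not the schema shown in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each compound has one structure identifier
# and any number of synonyms linked through a foreign key.
conn.executescript("""
    CREATE TABLE compound (
        compound_id INTEGER PRIMARY KEY,
        inchi       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE synonym (
        synonym_id  INTEGER PRIMARY KEY,
        compound_id INTEGER NOT NULL REFERENCES compound(compound_id),
        name        TEXT NOT NULL
    );
""")

ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
conn.execute("INSERT INTO compound VALUES (1, ?)", (ETHANOL,))
conn.executemany(
    "INSERT INTO synonym (compound_id, name) VALUES (?, ?)",
    [(1, "ethanol"), (1, "ethyl alcohol")],
)

# Join the two tables to list every name recorded for one structure.
rows = conn.execute(
    """SELECT s.name FROM synonym AS s
       JOIN compound AS c ON c.compound_id = s.compound_id
       WHERE c.inchi = ? ORDER BY s.name""",
    (ETHANOL,),
).fetchall()
names = [r[0] for r in rows]
print(names)
```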

Page 5

Chemical database

Page 6

Chemistry-specific searches

Identity – same atoms connected in the same way

Substructure – find all chemicals having query as a substructure

Superstructure – find all chemicals which are substructures of a query

Similarity – find all “similar” chemicals
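
The first three search types above can be sketched on toy molecular graphs. This is an illustrative brute-force matcher, not how a production database does it: hydrogens and bond orders are ignored, and the helper names (mol, is_substructure, is_identical) and the tiny database are assumptions made for the example.

```python
from itertools import permutations

def mol(atoms, bonds):
    """A toy molecule: list of element symbols plus a set of undirected bonds."""
    return list(atoms), {frozenset(b) for b in bonds}

def is_substructure(query, target):
    """True if query atoms map one-to-one onto target atoms of the same
    element such that every query bond is also a bond in the target."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    for mapping in permutations(range(len(t_atoms)), len(q_atoms)):
        if any(q_atoms[i] != t_atoms[m] for i, m in enumerate(mapping)):
            continue
        if all(frozenset(mapping[i] for i in bond) in t_bonds for bond in q_bonds):
            return True
    return False

def is_identical(a, b):
    """Same atoms connected in the same way, regardless of atom order."""
    return (len(a[0]) == len(b[0]) and len(a[1]) == len(b[1])
            and is_substructure(a, b))

database = {
    "methanol": mol(["C", "O"], [(0, 1)]),
    "ethanol": mol(["C", "C", "O"], [(0, 1), (1, 2)]),
    "dimethyl ether": mol(["C", "O", "C"], [(0, 1), (1, 2)]),
    "propane": mol(["C", "C", "C"], [(0, 1), (1, 2)]),
}

c_o = mol(["C", "O"], [(0, 1)])
ethanol_reordered = mol(["O", "C", "C"], [(0, 1), (1, 2)])

# Substructure: entries that contain the query.
substructure_hits = sorted(n for n, m in database.items() if is_substructure(c_o, m))
# Superstructure: entries that are contained in the query.
superstructure_hits = sorted(
    n for n, m in database.items() if is_substructure(m, ethanol_reordered)
)
# Identity: same graph even though the atoms are numbered differently.
identity_hits = sorted(
    n for n, m in database.items() if is_identical(m, ethanol_reordered)
)
print(substructure_hits, superstructure_hits, identity_hits)
```

Trying every atom permutation is exponential in molecule size, so this only works for very small examples. Similarity search needs a different representation and is not covered by this sketch.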

Page 7

InChI (http://www.inchi-trust.org/)
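
An InChI string is a sequence of slash-separated layers after the version prefix. The sketch below splits one into its parts. It uses ethanol rather than the compound on the next slide, and the function name split_inchi is hypothetical. It assumes the string has a formula layer and that no layer prefix letter repeats, which does not hold for every InChI.

```python
def split_inchi(inchi):
    """Split an InChI string into version, formula and prefixed layers.

    Simplified: assumes a formula layer is present and that each layer
    prefix letter (c, h, t, ...) occurs at most once.
    """
    prefix, _, rest = inchi.partition("=")
    if prefix != "InChI":
        raise ValueError("not an InChI string")
    version, formula, *layers = rest.split("/")
    return {
        "version": version,
        "formula": formula,
        "layers": {layer[0]: layer[1:] for layer in layers if layer},
    }

parsed = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(parsed)
```

Here the "c" layer holds the connectivity of the non-hydrogen atoms and the "h" layer holds the hydrogen counts.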

Page 8

Pidolic acid

Page 9

Fingerprints

Human Molecule
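
A fingerprint reduces a structure to a fixed-length bit string, so similarity and substructure pre-screening become cheap bit operations. A minimal sketch, assuming the structural features have already been enumerated as strings; the feature strings, the 64-bit length and the function names are all illustrative choices, not details from the talk.

```python
import zlib

def fingerprint(features, n_bits=64):
    """Hash each feature string to one bit position of an integer bitmask."""
    bits = 0
    for feature in features:
        # crc32 is used because it is stable across runs, unlike hash().
        bits |= 1 << (zlib.crc32(feature.encode()) % n_bits)
    return bits

def tanimoto(a, b):
    """Shared bits divided by bits set in either fingerprint."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0

def may_contain(query_fp, target_fp):
    """Substructure pre-screen: every query bit must be set in the target.
    A pass is only a candidate; a full graph match is still required."""
    return (query_fp & target_fp) == query_fp

# Hypothetical path features for two small molecules.
methanol_fp = fingerprint({"C", "O", "C-O"})
ethanol_fp = fingerprint({"C", "O", "C-C", "C-O", "C-C-O"})

print(tanimoto(methanol_fp, ethanol_fp))
print(may_contain(methanol_fp, ethanol_fp))
```

Hash collisions can make two different features share a bit, so the screen may return false positives but never rejects a true substructure.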

Page 10

SciFinder

Page 11

Reaxys

Page 12

PubChem

Page 13

• 32 million chemicals and growing

• Data sourced from >500 different sources

• Crowdsourced curation and annotation

• Ongoing deposition of data from our journals and our collaborators

• A structure centric hub for web-searching

Page 14

ChemSpider

Page 15

ChemSpider

Page 16

Properties - experimental

Page 17

Properties - ACDLabs

Page 18

Properties – EPI Suite

Page 19

Properties - ChemAxon

Page 20

Literature references

Page 21

Patent references

Page 22

Books

Page 23

Classification

Page 24

Chemical vendors and datasources

Page 25

Multimedia

Page 26

Page 28

Chemical space - 10^60

Page 29

RSC Archive – since 1841

Page 30

Digitally Enabling RSC Archive

Page 31

Advanced Search

Page 32

It is so difficult to navigate…

What’s the structure?

Are they in our file?

What’s similar?

What’s the target?

Pharmacology data?

Known Pathways?

Working On Now?

Connections to disease?

Expressed in right cell type?

Competitors?

IP?

Page 33

ChemSpider Synthetic Pages

Compounds

Reaction

Analytical Data

Text and References

Page 34

Electronic Laboratory Notebook (ELN)

Page 35

RSC Data Repository

Data Repository

Properties

Names and Identifiers

Spectra

Articles

Data Collections

Patents

Etc

Page 36

Input pipeline

Page 37

Output pipeline

Page 38

RSC Databases

RSC Compounds

RSC Reactions

RSC Spectra

RSC Crystals

RSC Polymers

RSC Materials

RSC Assays

RSC Algorithms

RSC Models

…and on…

Page 39

Compounds domain

Page 40

Data quality issue and CVSP

– Robochemistry

– Proliferation of errors in public and private databases

• ChemSpider

• PubChem

• DrugBank

• KEGG

• ChEBI/ChEMBL

– Automated quality control system
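
To make the idea of automated quality control concrete, here is a minimal sketch of one rule-based check: flagging atoms whose bond orders exceed a typical valence. The rule table, record format and function name are assumptions for illustration only. The slides do not describe the actual rules used by CVSP, and a real system would also handle charges, aromaticity, stereochemistry and much more.

```python
# Hypothetical maximum valences for neutral atoms.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}

def validate(atoms, bonds):
    """Return a list of issues found in one structure record.

    atoms: list of element symbols
    bonds: list of (i, j, order) tuples over atom indices
    Implicit hydrogens are omitted, so only excess valence is reported.
    """
    issues = []
    valence = [0] * len(atoms)
    for i, j, order in bonds:
        valence[i] += order
        valence[j] += order
    for idx, symbol in enumerate(atoms):
        limit = MAX_VALENCE.get(symbol)
        if limit is None:
            issues.append(f"atom {idx}: unknown element {symbol!r}")
        elif valence[idx] > limit:
            issues.append(
                f"atom {idx}: {symbol} has valence {valence[idx]}, "
                f"expected at most {limit}"
            )
    return issues

ok = validate(["C", "O"], [(0, 1, 1)])
bad = validate(["O", "C", "C", "C"], [(0, 1, 1), (0, 2, 1), (0, 3, 1)])
print(ok, bad)
```

Running such checks at deposition time is one way to stop an error in one database from being copied into the others.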

Page 41

Chemistry Validation and Standardization Platform

Page 42

Chemistry Validation and Standardization Platform

Page 43

Reactions domain

Page 44

Reactions domain

• ChemSpider Synthetic Pages

• Methods in Organic Synthesis

• Catalysts and Catalyzed Reactions

• USPTO

Page 45

Reactions domain

Page 46

Analytical data domain

Page 47

Crystallography domain

Page 48

3D printable structures

Page 49

We are a part of a larger world

Page 50

Who is involved?

29 partners

Page 51

Research questions

Page 52

OpenPHACTS Architecture

Page 53

OpenPHACTS UI

http://explorer.openphacts.org/

Page 54

National Chemistry Database

Page 55

Page 56

Thank you

Email: [email protected]

Slides:

http://www.slideshare.net/valerytkachenko16