Antony Williams: 5th Meeting on U.S. Government Chemical Databases and Open Chemistry, August 2011
![Page 1: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/1.jpg)
ChemSpider – A Crowdsourcing Environment for Hosting and Validating Chemistry Resources (and lessons from President Bush)
Antony Williams
5th Meeting on U.S. Government Chemical Databases and Open Chemistry
August 2011
![Page 2: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/2.jpg)
I want to know about “Vincristine”
![Page 3: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/3.jpg)
Vincristine: Identifiers and Properties
![Page 4: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/4.jpg)
Vincristine: Vendors and Sources
![Page 5: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/5.jpg)
Vincristine: Patents
![Page 6: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/6.jpg)
Vincristine: Articles
![Page 7: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/7.jpg)
Vincristine: RSC Databases
![Page 8: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/8.jpg)
Searches: The INTERNET
![Page 9: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/9.jpg)
Validated Names for Searching…
![Page 10: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/10.jpg)
And InChIs…
![Page 11: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/11.jpg)
ChemSpider
The Free Chemical Database
A central hub for chemists to source information
More than 26 million unique chemical records
Aggregated from more than 400 data sources
Chemicals, spectra, CIF files, movies, images, podcasts, links to patents, publications, predictions
A central hub for chemists to deposit & curate data
![Page 12: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/12.jpg)
Essential aspects of ChemSpider
ChemSpider is a BIG database... and growing
Our focus has increasingly become QUALITY over quantity
Data curation and validation is our strength – crowdsourcing is contributing, more is required
Validated data has enabled linking of the internet
![Page 13: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/13.jpg)
There are NO errors in ChemSpider
![Page 14: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/14.jpg)
There are NO errors in ChemSpider
![Page 15: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/15.jpg)
“All That Glisters is Not Gold”
What is the structure of Discodermolide?
![Page 16: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/16.jpg)
How to distinguish…who’s wrong?
![Page 17: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/17.jpg)
Neither is wrong
![Page 18: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/18.jpg)
Data Curation…long torturous task
Data curation – JUST structure-name validation is a long, torturous, iterative task.
How about validating “data”: PhysChem data such as logP, boiling points, melting points (J.C. Bradley’s talk), and spectra?
![Page 19: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/19.jpg)
Hand on my heart….
![Page 20: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/20.jpg)
Hand on my heart
No offence meant by what follows! We ALL have quality issues!
![Page 21: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/21.jpg)
PHYSPROP Database
The freely downloadable database under the EPI Suite prediction software
Very Basic filters suggest data quality issues
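The transcript does not say which filters were applied. As a minimal sketch of what "very basic" filters could look like, the Python below flags physically implausible property values. The `Record` fields, the thresholds, and the toy records are all assumptions made for illustration, not PHYSPROP's schema or data.

```python
from dataclasses import dataclass
from typing import List, Optional

ABSOLUTE_ZERO_C = -273.15


@dataclass
class Record:
    """Hypothetical property record; field names are assumptions."""
    name: str
    melting_point_c: Optional[float]
    boiling_point_c: Optional[float]
    log_p: Optional[float]


def basic_quality_flags(rec: Record) -> List[str]:
    """Return review flags for a record; an empty list means nothing tripped."""
    flags = []
    # No temperature can lie below absolute zero.
    for label, value in (("melting point", rec.melting_point_c),
                         ("boiling point", rec.boiling_point_c)):
        if value is not None and value < ABSOLUTE_ZERO_C:
            flags.append(f"{label} below absolute zero")
    # A melting point above the boiling point is a cue for manual review
    # (it can be legitimate, e.g. a boiling point measured at reduced pressure).
    if (rec.melting_point_c is not None and rec.boiling_point_c is not None
            and rec.melting_point_c > rec.boiling_point_c):
        flags.append("melting point above boiling point")
    # The logP window is an assumed plausibility range, not a published limit.
    if rec.log_p is not None and not -10.0 <= rec.log_p <= 15.0:
        flags.append("logP outside assumed plausible range")
    return flags


if __name__ == "__main__":
    # Toy records with made-up values, for illustration only.
    for rec in (Record("toy-ok", 5.0, 80.0, 2.1),
                Record("toy-bad", 120.0, 80.0, 40.0)):
        print(rec.name, basic_quality_flags(rec))
```

Flags of this kind only nominate records for a curator to check; they do not establish that a value is wrong.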
![Page 22: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/22.jpg)
The Stereochemistry challenge
12,500 chemicals with “missed” stereo
![Page 23: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/23.jpg)
NIST Webbook
![Page 24: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/24.jpg)
NLM’s DailyMed
![Page 25: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/25.jpg)
NLM’s DailyMed
![Page 26: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/26.jpg)
NLM’s DailyMed
![Page 27: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/27.jpg)
PubChem
![Page 28: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/28.jpg)
Linking
![Page 29: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/29.jpg)
![Page 30: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/30.jpg)
![Page 31: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/31.jpg)
![Page 32: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/32.jpg)
![Page 33: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/33.jpg)
Patents
![Page 34: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/34.jpg)
Patents
![Page 35: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/35.jpg)
WYSIWYG compounds
![Page 36: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/36.jpg)
WYSIWYG compounds
![Page 37: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/37.jpg)
Data Curation…long torturous task
Data curation – JUST structure-name validation is a long, torturous, iterative task.
How about validating “data”: PhysChem data such as logP, boiling points, melting points (J.C. Bradley’s talk), and spectra?
The crowd in crowdsourcing is… generally small
Which of the large databases are doing careful curation? How can we share the workload? Hmm...
![Page 38: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/38.jpg)
Consider searching each of these chemical databases by chemical name (systematic name, trade name or synonym). Please mark each online resource according to how much you generally trust the results.
![Page 39: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/39.jpg)
![Page 40: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/40.jpg)
Columns: Drug Name, Generic Name, ChEBI, ChemSpider, CAS Com. Chem, ChemIDPlus, DailyMed, DrugBank, PubChem, Wikipedia
Spiriva (Tiotropium Bromide): No Hits; No Hits; 4/0
Depakote (Valproate semisodium): No Structure
Basen (Voglibose): No Hits; No Hits; 2/1
Symbicort 1) (Budesonide): 8/1
Symbicort 2) (Formoterol): WRONG; No Hits; 6/1
Vytorin 1) (Ezetimibe): No Hits
Vytorin 2) (Simvastatin): 2/1
Taxol (Paclitaxel): 44/1
Thalidomid (Thalidomide): No Hits
Zocor (Simvastatin): 2/1
Crestor (Rosuvastatin): No Hits; 2/1
![Page 41: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/41.jpg)
Who does the Curation?
![Page 42: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/42.jpg)
ChemSpider can “do it” for us
ChemSpider has built a curation interface used by the community and ourselves for curating.
All curation activities are available for review, online immediately, iteratively checked.
Curators have different abilities based on their profile: There are only a few “Master Curators”.
Can we “share” the curation workload?
![Page 43: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/43.jpg)
Proof of Concept Data Curation Sharing
![Page 44: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/44.jpg)
Identifier Dictionaries
Reciprocal curation processes…share curation with each other.
If a database already has a compound, use InChIKeys to match “suggested” validation against that compound.
A series of “added” and “removed” synonyms against InChIKeys for matching.
Who will participate???
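The reciprocal curation idea above can be sketched as a small exchange format: a feed of “added” and “removed” synonym events keyed by InChIKey, which a receiving database applies only to compounds it already holds. The event tuple layout, the function name, and the placeholder InChIKeys below are assumptions for illustration; the slides do not specify a format.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical feed entry: (inchikey, action, synonym),
# where action is "added" or "removed".
CurationEvent = Tuple[str, str, str]


def apply_curation_feed(local: Dict[str, Set[str]],
                        feed: Iterable[CurationEvent]) -> Dict[str, Set[str]]:
    """Return suggested synonym sets for compounds already in `local`.

    `local` maps InChIKey -> current synonyms and is not modified, so the
    output can be reviewed by a curator before anything is committed.
    """
    suggestions: Dict[str, Set[str]] = {}
    for inchikey, action, synonym in feed:
        if inchikey not in local:
            continue  # no matching compound in this database: skip the event
        current = suggestions.setdefault(inchikey, set(local[inchikey]))
        if action == "added":
            current.add(synonym)
        elif action == "removed":
            current.discard(synonym)
        else:
            raise ValueError(f"unknown curation action: {action!r}")
    return suggestions


if __name__ == "__main__":
    # Placeholder keys and names, not real compounds.
    local_db = {"AAAAAAAAAAAAAA-UHFFFAOYSA-N": {"toy name", "wrong name"}}
    shared_feed = [
        ("AAAAAAAAAAAAAA-UHFFFAOYSA-N", "added", "new name"),
        ("AAAAAAAAAAAAAA-UHFFFAOYSA-N", "removed", "wrong name"),
        ("ZZZZZZZZZZZZZZ-UHFFFAOYSA-N", "added", "ignored"),
    ]
    print(apply_curation_feed(local_db, shared_feed))
```

Returning suggestions instead of editing the local records in place keeps each database in control of what it accepts from its partners.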
![Page 45: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/45.jpg)
Proof of Concept Data Curation Sharing
![Page 46: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/46.jpg)
Lessons Learned: Big vs Good!
![Page 47: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/47.jpg)
15 compounds called Yohimbine
54 Skeletons for Yohimbine
![Page 48: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/48.jpg)
Aggregators suffer dilution…
![Page 49: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/49.jpg)
User Understanding of Data
Users searching “Yohimbine” expect to find it…not labeled versions of it, not ambiguous stereochemistries, not partial stereochemistries.
Data “aggregation” into a meaningful form is a major challenge. e.g. Assays for radiolabeled compounds linked to actual drugs.
Data curation efforts such as ChEMBL are essential!
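One way to see the dilution described above is to group records that share a connectivity skeleton. In a standard InChIKey the first 14-letter block hashes the connectivity, so stereo variants and isotopically labeled forms of one skeleton share that block. The sketch below groups keys on it; the function names and placeholder keys are assumptions for illustration, and this is not a description of how ChemSpider does it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def skeleton_block(inchikey: str) -> str:
    """Return the first (connectivity) block of a standard InChIKey."""
    first, _, _ = inchikey.partition("-")
    if len(first) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return first


def group_by_skeleton(inchikeys: Iterable[str]) -> Dict[str, List[str]]:
    """Group InChIKeys that share the same connectivity block."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in inchikeys:
        groups[skeleton_block(key)].append(key)
    return dict(groups)


if __name__ == "__main__":
    # Placeholder keys, not real compounds: the first two share a skeleton.
    keys = [
        "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
        "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
        "DDDDDDDDDDDDDD-UHFFFAOYSA-N",
    ]
    for block, members in group_by_skeleton(keys).items():
        print(block, len(members))
```

A search interface could use such groups to present the fully defined structure first and list labeled or partially defined variants beneath it.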
![Page 50: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/50.jpg)
SciMobileApps.com
![Page 51: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/51.jpg)
SciDBs.com (Coming soon)
![Page 52: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/52.jpg)
Open PHACTS: partnership between European Community and EFPIA
Freely accessible for knowledge discovery and verification.
Data on small molecules
Pharmacological profiles
Pharmacokinetics
ADMET data
Biological targets and pathways
Proprietary and public data sources.
![Page 53: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/53.jpg)
Standardization and Quality
Our initial approaches to standardization were imperfect. We are revisiting them to support Open PHACTS.
We were highly dependent on InChI, with not enough standardization prior to InChI generation.
InChI is excellent, though acknowledged to be imperfect. Way better than SMILES for linking the internet!
![Page 54: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/54.jpg)
Conclusions
ChemSpider is one of many important chemistry resources on the internet
We have assumed an important role of curating and validating data – specifically name-structure dictionaries are of high importance but data validation is also key
We are a part of the federation of internet databases serving chemistry. MORE collaboration can serve us all better…how?
![Page 55: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/55.jpg)
A Plea to Gov’t DBs…
Please improve gov’t DB communications
![Page 56: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/56.jpg)
A Plea to Gov’t DBs…
Please improve gov’t DB communications
Please buddy up and get closer together
![Page 57: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/57.jpg)
A Plea to Gov’t DBs…
Please improve gov’t DB communications
Please buddy up and get closer together
Get into deep conversations
![Page 58: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/58.jpg)
Acknowledgments
Our development team – headed by THAT man..
Many in this room: InChI, PubChem, DSSTox, FDA, ChEBI/ChEMBL, SureChem, many more
Curators – special gratitude to Barrie Walker!
Software providers – OpenEye, ChemDoodle, ACD/Labs, GGA Software, Open Source (Jmol, JSpecView, OpenBabel)
![Page 59: Antony Williams 5th Meeting on U.S. Government Chemical Databases and Open Chemistry August 2011](https://reader036.vdocuments.us/reader036/viewer/2022062520/56815dfd550346895dcc3a15/html5/thumbnails/59.jpg)
Thank you
Email: [email protected]
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams