![Page 1: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/1.jpg)
Hosting public domain chemicals data online for the community – the
challenges of handling materials
Antony Williams
Opportunities in Materials Informatics, University of Wisconsin-Madison
February 9th, 2015
ORCID: 0000-0002-2668-4821
![Page 2: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/2.jpg)
About Me…
• I am NOT a materials chemist
• I am an NMR spectroscopist by training
• Worked on a LIMS while at Kodak
• 10 years in commercial cheminformatics
• Built the ChemSpider database as a hobby
• Worked on validating compounds on Wikipedia
• Manage cheminformatics team for RSC
• Believer in the value of social networking and Open Data for science
• Dane Morgan asked me to tell jokes…
![Page 3: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/3.jpg)
I would tell a chemistry joke…
But all of the good ones…
![Page 4: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/4.jpg)
An ambitious idea….
• Let’s map together all online chemistry data and build systems to integrate it
• Heck, let’s integrate chemistry and biology data and add in disease data too if we can
• Let’s extract property data and model it and see if we can extract new relationships – quantitative and qualitative
• Let’s make it all available on the web…for free
![Page 5: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/5.jpg)
![Page 6: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/6.jpg)
What about this….
• We’re going to map the world
• We’re going to take photos of as many places as we can and link them together
• We’ll let people annotate and curate the map
• Then let’s make it available free on the web
• We’ll make it available for decision making
• Put it on Mobile Devices, give it away…
![Page 7: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/7.jpg)
Where is chemistry online?
• Encyclopedic articles (Wikipedia)
• Chemical vendor databases
• Metabolic pathway databases
• Property databases
• Patents with chemical structures
• Drug Discovery data
• Scientific publications
• Compound aggregators
• Blogs/Wikis and Open Notebook Science
![Page 8: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/8.jpg)
Chemistry on the Internet…
• Most searching for chemistry on the internet…
• Name searching Google/Bing/Yahoo
• Name searching Wikipedia
• Name searching Wolfram Alpha
• Name, name, name, name…searching
• Structure searching DOZENS of websites, each with different information or…
![Page 9: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/9.jpg)
Chemistry on the Internet…
• Most searching for chemistry on the internet…
• Name searching Google/Bing/Yahoo
• Name searching Wikipedia
• Name searching Wolfram Alpha
• Name, name, name, name…searching
• Structure searching DOZENS of websites, each with different information or…
• Search ONE website integrating the others!
![Page 10: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/10.jpg)
• ~30 million chemicals and growing
• Data sourced from >500 different sources
• Crowd sourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• Structure centric hub for web-searching
• …and a really big dictionary!!!
• Note…NOT all websites connected
![Page 11: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/11.jpg)
ChemSpider
![Page 12: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/12.jpg)
ChemSpider
![Page 13: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/13.jpg)
ChemSpider
![Page 14: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/14.jpg)
Experimental/Predicted Properties
![Page 15: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/15.jpg)
Literature references
![Page 16: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/16.jpg)
Patents references
![Page 17: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/17.jpg)
RSC Books
![Page 18: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/18.jpg)
Google Books
![Page 19: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/19.jpg)
Vendors and data sources
![Page 20: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/20.jpg)
APIs
![Page 21: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/21.jpg)
APIs
![Page 22: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/22.jpg)
Organic Chemistry is hard…
![Page 23: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/23.jpg)
…it has alkynes of trouble
![Page 24: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/24.jpg)
Flavors of Chemistry
![Page 25: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/25.jpg)
Molfiles
10 9 0 0 1 0 0 0 0 0 1 V2000
31.2937 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
26.6526 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
31.2937 -7.7066 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
30.1161 -9.6877 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
25.5096 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
28.9731 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
27.8163 -9.7016 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
26.6664 -7.7066 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
32.4367 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
30.1161 -11.0177 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
3 1 2 0 0 0 0
4 1 1 0 0 0 0
9 1 1 0 0 0 0
7 2 1 0 0 0 0
5 2 2 0 0 0 0
8 2 1 0 0 0 0
6 4 1 0 0 0 0
4 10 1 6 0 0 0
7 6 1 0 0 0 0
M  END
![Page 26: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/26.jpg)
Molfiles
• Molfiles are the primary exchange format between structure drawing packages
• Can be different between different drawing packages
• Most commonly carry X,Y coordinates for layout
• Can support polymers, organometallics, etc.
• Can carry 3D coordinates
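The connection table shown on the earlier "Molfiles" slide can be read with a few lines of code. What follows is a minimal sketch, assuming the whitespace-separated layout that survives in this transcript (a real V2000 molfile is fixed-width and starts with a three-line header, both of which are ignored here). The function name `parse_ctab` is hypothetical and not part of any toolkit.

```python
from collections import Counter

# Connection table copied from the "Molfiles" slide (header lines omitted).
CTAB = """\
10 9 0 0 1 0 0 0 0 0 1 V2000
31.2937 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
26.6526 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
31.2937 -7.7066 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
30.1161 -9.6877 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
25.5096 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
28.9731 -9.0366 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
27.8163 -9.7016 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
26.6664 -7.7066 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
32.4367 -9.6877 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
30.1161 -11.0177 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
3 1 2 0 0 0 0
4 1 1 0 0 0 0
9 1 1 0 0 0 0
7 2 1 0 0 0 0
5 2 2 0 0 0 0
8 2 1 0 0 0 0
6 4 1 0 0 0 0
4 10 1 6 0 0 0
7 6 1 0 0 0 0
M  END
"""


def parse_ctab(text):
    """Return (atoms, bonds) from a whitespace-separated V2000 connection table."""
    rows = [line.split() for line in text.strip().splitlines()]
    # The counts line starts with the number of atoms and the number of bonds.
    n_atoms, n_bonds = int(rows[0][0]), int(rows[0][1])
    # Atom rows: x, y, z, element symbol (remaining fields ignored in this sketch).
    atoms = [(float(r[0]), float(r[1]), float(r[2]), r[3])
             for r in rows[1:1 + n_atoms]]
    # Bond rows: first atom index, second atom index, bond order (1-based indices).
    bonds = [(int(r[0]), int(r[1]), int(r[2]))
             for r in rows[1 + n_atoms:1 + n_atoms + n_bonds]]
    return atoms, bonds


atoms, bonds = parse_ctab(CTAB)
elements = Counter(atom[3] for atom in atoms)
double_bonds = [b for b in bonds if b[2] == 2]
```

Only heavy atoms are listed in this table, so `elements` gives five carbons, three oxygens and two nitrogens; the connectivity is consistent with glutamine. The X,Y values are layout coordinates for drawing, and the Z column is zero throughout.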
![Page 27: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/27.jpg)
SMILES
• SMILES is a common format
• Can support polymers, organometallics, etc.
• Does NOT carry X,Y or Z coordinates for layout so requires layout algorithms – can be problematic!
• Generally different between drawing packages
![Page 28: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/28.jpg)
Stereo
![Page 29: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/29.jpg)
Tautomeric forms
![Page 30: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/30.jpg)
Vendor-dependent SMILES
ACD/Labs: CC(C)CCC[C@@H](C)CCC[C@@H](C)CCCC(\C)=C\CC2=C(C)C(=O)c1ccccc1C2=O
OpenEye: CC1=C(C(=O)c2ccccc2C1=O)C/C=C(\C)/CCC[C@H](C)CCC[C@H](C)CCCC(C)C
ChEMBL: CC(C)CCC[C@@H](C)CCC[C@@H](C)CCC\C(=C\CC1=C(C)C(=O)c2ccccc2C1=O)\C
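The three strings above look nothing alike, yet they are meant to describe one compound. A rough sanity check, sketched below under stated assumptions, is to count the heavy atoms in each string. This is a simplified tokenizer written for illustration (the names `ATOM`, `BRACKET_ELEMENT` and `heavy_atoms` are hypothetical); matching counts are necessary but not sufficient for identity, and a real comparison would canonicalize the structures with a cheminformatics toolkit or compare InChIs.

```python
import re
from collections import Counter

# SMILES strings from the slide; raw strings keep the backslashes intact.
SMILES = {
    "ACD/Labs": r"CC(C)CCC[C@@H](C)CCC[C@@H](C)CCCC(\C)=C\CC2=C(C)C(=O)c1ccccc1C2=O",
    "OpenEye": r"CC1=C(C(=O)c2ccccc2C1=O)C/C=C(\C)/CCC[C@H](C)CCC[C@H](C)CCCC(C)C",
    "ChEMBL": r"CC(C)CCC[C@@H](C)CCC[C@@H](C)CCC\C(=C\CC1=C(C)C(=O)c2ccccc2C1=O)\C",
}

# Bracket atoms first, then two-letter organic-subset symbols, then one-letter ones.
ATOM = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atoms(smiles):
    """Count atoms written explicitly in a SMILES string (implicit H is ignored)."""
    counts = Counter()
    for token in ATOM.findall(smiles):
        if token.startswith("["):
            symbol = BRACKET_ELEMENT.match(token).group(1)
        else:
            symbol = token
        # Aromatic atoms are lower case in SMILES; fold them onto the element symbol.
        counts[symbol.capitalize()] += 1
    return counts


compositions = {vendor: heavy_atoms(s) for vendor, s in SMILES.items()}
all_same = len({tuple(sorted(c.items())) for c in compositions.values()}) == 1
```

All three strings contain 31 carbons and 2 oxygens, so they at least agree on composition, even though a plain text search for one string would never find the other two.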
![Page 31: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/31.jpg)
Chemists are good…
![Page 32: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/32.jpg)
The InChI Identifier
![Page 33: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/33.jpg)
InChI
• SINGLE code base managed by IUPAC – integrated into drawing packages. No variability as with SMILES
• InChI Strings can be reversed to structures – same problem as with SMILES – no layout
• Adopted by the community (databases, blogs, Wikipedia) – good for searching the internet
![Page 34: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/34.jpg)
Multiple Layers
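An InChI string is built from slash-separated layers, each introduced by a one-letter prefix. The sketch below splits a string into those layers. It is a simplification: the function name `split_inchi` and the descriptive layer labels are chosen here for illustration, and strings that repeat a prefix (for example in fixed-hydrogen or isotopic sections) would need more careful handling.

```python
# Descriptive labels for the common layer prefixes (labels chosen for this sketch).
LAYER_NAMES = {
    "c": "connections",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion flag",
    "s": "stereo type",
    "i": "isotopes",
}


def split_inchi(inchi):
    """Split an InChI string into a dict of named layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    # The first two fields are the version and the molecular formula.
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for part in rest:
        # Unknown prefixes are kept under their own letter.
        layers[LAYER_NAMES.get(part[0], part[0])] = part[1:]
    return layers


# Ethanol as a worked example.
ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
```

For ethanol this yields the version `1S`, the formula `C2H6O`, a connections layer `1-2-3` and a hydrogens layer `3H,2H2,1H3`.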
![Page 35: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/35.jpg)
Tautomers
![Page 36: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/36.jpg)
Stereo
![Page 37: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/37.jpg)
InChI Strings Hash to InChIKeys
![Page 38: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/38.jpg)
Structure search the web
![Page 39: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/39.jpg)
Exact Search
![Page 40: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/40.jpg)
Skeleton Search
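One way to picture the difference between an exact search and a skeleton search is to compare InChIKeys block by block. The sketch below assumes that the first 14-character block of a standard InChIKey encodes the connectivity, and that a skeleton match means that block agrees while the rest of the key differs. The function name `match_level` is hypothetical, and the keys used are made-up placeholders that only follow the 14-10-1 pattern; they are not real compounds.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def match_level(key_a, key_b):
    """Classify two InChIKeys as an 'exact', 'skeleton' or 'none' match."""
    for key in (key_a, key_b):
        if not INCHIKEY.match(key):
            raise ValueError(f"not an InChIKey: {key!r}")
    if key_a == key_b:
        return "exact"
    # Same first block: same connectivity, different stereo/isotope details.
    if key_a[:14] == key_b[:14]:
        return "skeleton"
    return "none"


# Placeholder keys (hypothetical, for illustration only).
QUERY = "ABCDEFGHIJKLMN-OPQRSTUVSA-N"
SAME_SKELETON = "ABCDEFGHIJKLMN-UHFFFAOYSA-N"
UNRELATED = "ZYXWVUTSRQPONM-UHFFFAOYSA-N"

results = [match_level(QUERY, other) for other in (QUERY, SAME_SKELETON, UNRELATED)]
```

Because the key is a fixed-length string, both kinds of lookup reduce to plain text matching, which is what makes InChIKeys convenient for searching the web.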
![Page 41: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/41.jpg)
Data Quality/Standardization
• MANY structures meant to be something online are MISREPRESENTED.
• Commonly you will have better success finding information by name searches than structure – with many caveats of course…
• Validating chemical structure representations is laborious work – and it’s shocking to review data…
![Page 42: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/42.jpg)
Data Quality Issues
Williams and Ekins, DDT, 16: 747-750 (2011)
Science Translational Medicine 2011
![Page 43: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/43.jpg)
Data quality is a known issue
![Page 44: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/44.jpg)
Data quality is a known issue
![Page 45: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/45.jpg)
| Substructure | # of Hits | # of Correct Hits | No stereochemistry | Incomplete stereochemistry | Complete but incorrect stereochemistry |
| --- | --- | --- | --- | --- | --- |
| Gonane | 34 | 5 | 8 | 21 | 0 |
| Gon-4-ene | 55 | 12 | 3 | 33 | 7 |
| Gon-1,4-diene | 60 | 17 | 10 | 23 | 10 |
Only 34 out of 149 structures were correct!
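The headline figure follows directly from the table. A short check, using only the numbers on the slide (the variable names are chosen here for illustration):

```python
# substructure: (hits, correct, no stereo, incomplete stereo, complete but incorrect)
ROWS = {
    "Gonane": (34, 5, 8, 21, 0),
    "Gon-4-ene": (55, 12, 3, 33, 7),
    "Gon-1,4-diene": (60, 17, 10, 23, 10),
}

# Each row's four outcome categories should add up to its hit count.
for name, (hits, *categories) in ROWS.items():
    assert sum(categories) == hits, name

total_hits = sum(row[0] for row in ROWS.values())
total_correct = sum(row[1] for row in ROWS.values())
fraction_correct = total_correct / total_hits

print(f"{total_correct} of {total_hits} correct ({fraction_correct:.1%})")
# prints "34 of 149 correct (22.8%)"
```

So fewer than one in four of the structures retrieved had correct stereochemistry.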
![Page 46: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/46.jpg)
Patent data in public databases
![Page 47: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/47.jpg)
Patent data in public databases
![Page 48: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/48.jpg)
You just can’t trust atoms!
![Page 49: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/49.jpg)
You just can’t trust atoms!
They make up everything…
![Page 50: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/50.jpg)
ALL variants of Yohimbine!!!
![Page 51: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/51.jpg)
What’s Methane? OLD PUBCHEM
![Page 52: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/52.jpg)
What ELSE is Methane???
![Page 53: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/53.jpg)
NEW PUBCHEM
![Page 54: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/54.jpg)
Depiction vs Accurate Representation
![Page 55: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/55.jpg)
Depiction vs Accurate Representation
![Page 56: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/56.jpg)
What is the Structure of Vitamin K1?
![Page 57: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/57.jpg)
Standardize
• Use the SRS as a guidance document for standardization
• Adjust as necessary to our needs
![Page 58: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/58.jpg)
Nitro groups
![Page 59: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/59.jpg)
Salt and Ionic Bonds
![Page 60: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/60.jpg)
Ammonium salts
![Page 61: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/61.jpg)
Can we MAKE Quality Data?
• We are building systems for everyone to validate and standardize their data
![Page 62: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/62.jpg)
DICTIONARIES are powerful
• Search all forms of structure IDs
• Systematic name(s)
• Trivial Name(s)
• SMILES
• InChI Strings
• InChIKeys
• Database IDs
• Registry Number
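The idea of a dictionary that resolves every kind of identifier to one record can be sketched in a few lines. This is an illustration only, not how ChemSpider is implemented: the class `StructureDictionary` and the record ID `record-1` are hypothetical, and the single entry uses well-known identifiers for methane. Names are matched case-insensitively, while SMILES, InChI and InChIKey are kept case-sensitive because case carries meaning there.

```python
NAME_TYPES = {"systematic name", "trivial name"}


class StructureDictionary:
    """Map many identifier forms onto a single record ID."""

    def __init__(self):
        self._exact = {}  # case-sensitive identifiers -> record ID
        self._names = {}  # lower-cased names -> record ID

    def add(self, record_id, id_type, value):
        if id_type in NAME_TYPES:
            self._names[value.strip().lower()] = record_id
        else:
            self._exact[value.strip()] = record_id

    def lookup(self, query):
        query = query.strip()
        if query in self._exact:
            return self._exact[query]
        # Fall back to a case-insensitive name match; None if nothing is found.
        return self._names.get(query.lower())


dictionary = StructureDictionary()
for id_type, value in [
    ("systematic name", "methane"),
    ("trivial name", "marsh gas"),
    ("SMILES", "C"),
    ("InChI", "InChI=1S/CH4/h1H4"),
    ("InChIKey", "VNWKTOKETHGBQD-UHFFFAOYSA-N"),
    ("registry number", "74-82-8"),
]:
    dictionary.add("record-1", id_type, value)

hits = [dictionary.lookup(q) for q in ("Methane", "74-82-8", "C", "ethane")]
```

The first three queries resolve to the same record and the last returns `None`. The hard part in practice is not the lookup but the quality of the synonym list feeding it.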
![Page 63: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/63.jpg)
Many Names, One Structure
![Page 64: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/64.jpg)
But big and often noisy
![Page 65: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/65.jpg)
Text-Mining and Markup…
![Page 66: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/66.jpg)
Text-Mining and Markup…
![Page 67: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/67.jpg)
With links out to platforms
![Page 68: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/68.jpg)
Dictionaries are invaluable
![Page 69: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/69.jpg)
Text Mining on IUPAC Names
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a glass reaction vessel equipped with a mechanical stirrer , thermometer and reflux condenser .
The reaction mixture was heated at reflux with stirring , for a period of about one-half hour .
After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiaidazol-5-yl)urea as a solid residue
![Page 70: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/70.jpg)
Text Mining on IUPAC Names
![Page 71: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/71.jpg)
Name to Structure Conversion
![Page 72: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/72.jpg)
Name to Structure Conversion
![Page 73: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/73.jpg)
ChemSpider “Annotations”
• Users can add
• Descriptions, Syntheses and Commentaries
• Links to PubMed articles
• Links to articles via DOIs
• Add spectral data
• Add Crystallographic Information Files
• Add photos
• Add MP3 files
• Add Videos
![Page 74: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/74.jpg)
Spectral Data
• Spectral data to be deposited in standard formats – JCAMP or images
• All spectra available at: http://www.chemspider.com/spectra.aspx
• Data are deposited on a regular basis
• Students
• Chemical vendors
• Growing collection now
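JCAMP-DX files are plain text in which each metadata record starts with `##`, followed by a label, an equals sign and a value. The sketch below reads the header records up to the start of the data table. It is a simplification under stated assumptions: the function name `read_jcamp_header` is hypothetical, the sample file is invented for illustration, and compressed data forms and multi-block files are not handled.

```python
# Labels that mark the start of the numeric data or the end of the block.
DATA_LABELS = {"XYDATA", "XYPOINTS", "PEAK TABLE", "END"}

# Invented sample for illustration only.
SAMPLE = """\
##TITLE=Hypothetical example spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##NPOINTS=3
##XYDATA=(X++(Y..Y))
4000 0.95 0.94 0.93
##END=
"""


def read_jcamp_header(text):
    """Return the labelled header records that precede the data table."""
    header = {}
    for line in text.splitlines():
        if not line.startswith("##"):
            continue
        label, sep, value = line[2:].partition("=")
        if not sep:
            continue
        label = label.strip().upper()
        if label in DATA_LABELS:
            break
        # "$$" starts an inline comment in JCAMP-DX; drop it from the value.
        header[label] = value.split("$$")[0].strip()
    return header


header = read_jcamp_header(SAMPLE)
```

Because units and data type travel with the numbers, a spectrum deposited this way can be re-plotted, searched and compared, which an image of the same spectrum cannot.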
![Page 75: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/75.jpg)
Student Submissions
![Page 76: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/76.jpg)
Data on ChemSpider
![Page 77: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/77.jpg)
Spectral Data EXTRACTION
![Page 78: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/78.jpg)
ORIGINAL
EXTRACTED
![Page 79: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/79.jpg)
It’s exactly the WRONG WAY!
• We should NOT be mining data out of future publications
• Structures should be submitted “correctly”
• Spectra should be digital spectral formats, not images
• ESI should be RICH and interactive, preferably with OPEN DATA
![Page 80: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/80.jpg)
An Adventure into the World of Small but significant contribution..
![Page 81: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/81.jpg)
ChemSpider SyntheticPages
![Page 82: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/82.jpg)
Micropublishing with Peer Review
(a chemical synthesis blog?)
![Page 83: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/83.jpg)
Multi-Step Synthesis
![Page 84: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/84.jpg)
Interactive Data
![Page 85: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/85.jpg)
Chemistry data is of value?
• Reference databases generate hundreds of millions of dollars/euros per year
• So much data generated that could go public
• Maybe 5% of all data generated is published
• There is no “Journal of Failed Experiments”
• Funding agencies start to demand Open Data
• Scientists want funding but also recognition
![Page 86: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/86.jpg)
A shift to Openness
![Page 87: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/87.jpg)
How will I get recognized?
• Who in the room has an ORCID?
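An ORCID iD is more than a serial number: its last character is a check digit computed with the ISO 7064 MOD 11-2 scheme, so a mistyped iD can usually be caught before it is attached to the wrong person. A minimal sketch, using the iD from the title slide (the function names are chosen here for illustration):

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the length, the digits and the final check character of an ORCID iD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


speaker_ok = is_valid_orcid("0000-0002-2668-4821")
typo_ok = is_valid_orcid("0000-0002-2668-4822")
```

The iD from the title slide passes, while the same iD with the last digit changed does not.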
![Page 88: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/88.jpg)
Deposition of Research Data
• If we manage compounds, syntheses and analytical data…
• If we have security and provenance of data…
• If we deliver user interfaces to satisfy the various use cases…
• Then we have delivered electronic lab notebooks for chemistry laboratories. ELNs are research data repositories
![Page 89: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/89.jpg)
Recognition: need to have Impact
![Page 90: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/90.jpg)
Quantitating scientists?
![Page 91: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/91.jpg)
National Information Standards Organization and “Altmetrics”
http://www.niso.org/apps/group_public/download.php/13295/niso_altmetrics_white_paper_draft_v4.pdf
![Page 92: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/92.jpg)
What are we building?
• We are building the “RSC Data Repository”
• Containers for compounds, reactions, analytical data, tabular data
• Algorithms for data validation and standardization
• Flexible indexing and search technologies
• A platform for modeling data and hosting existing models and predictive algorithms
![Page 93: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/93.jpg)
Compounds
![Page 94: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/94.jpg)
Reactions
![Page 95: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/95.jpg)
Analytical data
![Page 96: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/96.jpg)
Crystallography data
![Page 97: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/97.jpg)
Deposition of Data
• Developing systems that provide feedback to users regarding data quality
• Validate/standardize chemical compounds
• Check for balanced reactions
• Check spectral data
• EXAMPLE Future work
• Properties – compare experimental to predicted
• Automated structure verification - NMR
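Of the checks listed above, reaction balancing is the easiest to illustrate. The sketch below counts atoms on each side of a reaction from simple molecular formulas. It is not the repository's implementation: the function names are hypothetical, and the parser is deliberately naive (no parentheses, charges, hydrates or fractional subscripts).

```python
import re
from collections import Counter

# Element symbol followed by an optional integer count, e.g. "H2" or "O".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula):
    """Count atoms in a simple formula such as 'CH4' or 'H2O'."""
    counts = Counter()
    for symbol, number in TOKEN.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def side_counts(side):
    """Total atom counts for a list of (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for symbol, n in atom_counts(formula).items():
            total[symbol] += coefficient * n
    return total


def is_balanced(reactants, products):
    return side_counts(reactants) == side_counts(products)


# CH4 + 2 O2 -> CO2 + 2 H2O
balanced = is_balanced([(1, "CH4"), (2, "O2")], [(1, "CO2"), (2, "H2O")])
# CH4 + O2 -> CO2 + H2O (coefficients missing)
unbalanced = is_balanced([(1, "CH4"), (1, "O2")], [(1, "CO2"), (1, "H2O")])
```

A check like this can run at deposition time and return a message to the submitter before the record is stored. The integer-only subscripts it relies on are one of the assumptions that stops holding once the compounds are materials rather than small organic molecules.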
![Page 98: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/98.jpg)
So we know about ORGANICS
• Comment – you don’t know all of the challenges until you start to work in the area!
• We, and cheminformatics companies, have solved MANY, but not all of the issues regarding organic chemistry management
• The majority of our approaches do not map to materials
• No standard ways to represent compounds
• No InChI for materials
![Page 99: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/99.jpg)
Questions to consider…
• Organics are hard enough!
• What are your best dictionaries of materials?
• We have chemical ontologies. Status for materials?
• Is open annotation of your databases possible?
• What standards do you have for materials data exchange?
![Page 100: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/100.jpg)
Polymorphism is common
![Page 101: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/101.jpg)
Known Challenges
• Many materials are non-stoichiometric
• How to represent composite materials (e.g. supported catalysts)?
• Methods to distinguish novelty in materials (equivalent to diversity in organic structures)?
• Many more I will learn at this workshop..?
![Page 102: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/102.jpg)
Collaboration is key
![Page 103: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/103.jpg)
Internet Data
The Future
Commercial Software
Pre-competitive Data
Open Science
Open Data
Publishers
Educators
Open Databases
Chemical Vendors
Small organic molecules
Undefined materials
Organometallics
Nanomaterials
Polymers
Minerals
Particle bound
Links to Biologicals
![Page 104: Hosting public domain chemicals data online for the community – the challenges of handling materials](https://reader034.vdocuments.us/reader034/viewer/2022042817/55a609d51a28ab26638b46b6/html5/thumbnails/104.jpg)
Thank you
Email: [email protected]
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams