the royal society of chemistry: research data and...
TRANSCRIPT
The Royal Society of Chemistry:
Research Data and Community
Initiatives Serin Dabb
Executive Editor, Data
@SerinDabb
Open Access
Open Data
• Recently acquired MarinLit
• The Merck Index* Online
• Literature
alerting services
Today
1. National Chemical Database Service
2. ChemSpider
3. Data Repository (in progress)
4. Data mining of Royal Society of Chemistry
publications (in progress)
• An online collection of scientific resources and databases
• Hosted by the RSC,
- funded by the EPSRC
National Chemical Database Service
Free for all UK
academia
http://cds.rsc.org
@cds_rsc
Predicts physicochemical properties, NMR spectra and chemical shifts
Also assesses prediction reliability and includes searchable content databases.
New data policies
EPSRC-funded research data is a public good produced in the public interest
and should be made freely and openly available with as few restrictions as
possible in a timely and responsible manner.
Institutional and project specific data management policies and plans should
be in accordance with relevant standards and community best practice and
should exist for all data. Data with acknowledged long term value should be
preserved and remain accessible and useable for future research
EPSRC Policy Framework on Research Data
+ + = Free to use
28 million structures
Over 400 different data sources, including:
PubChem, ChEBI
GSK Malarial compounds
Chemical Suppliers Catalogues
Patents
RSC journals
Physical properties, spectra, safety information and much more
Access it anywhere there’s internet
www.chemspider.com/
Adding and curating data
Adds quality and quantity… • Users can comment for others to action
• Registered users can: Curate names Add links, spectra, and multimedia resources
• Depositors can add new compounds
• Curators and Master Curators assess and approve additions
Repository
Chemical
Sciences Data Repository
User
searches
repository
Online
deposition
gateway
Compounds
Reactions
Spectra
Researcher collects data
(ELNs, drop box, PC,
documents)
Validation of
data
Repository
And more…
private
embargoed
public
Unanswered questions
• Level of curation from us
• Number of ‘required fields’
• Who can deposit? Who can have access (even if ‘private)?
• Standardisation of metadata
• What do people want to search on (properties? spectral features?)
• How does it link to the ESI?
• Should we assign DOIs for data?
Text Mining The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a glass reaction vessel equipped with a mechanical stirrer , thermometer and reflux condenser .
The reaction mixture was heated at reflux with stirring , for a period of about one-half hour .
After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiaidazol-5-yl)urea as a solid residue
1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t,
1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz,
C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
Figure Spectra into “Real
Spectra”?
• We are turning text into structures
• We are turning text into spectra
• And we are turning figures into spectra
Progress: Figures - Spectra
• Validation tests performed with William Brouwer. Good enough to proceed with larger test set
• Ready to run process across larger collection
• Focus on 21st century articles only for now
Example
Input : 74 supplementary data documents/ 3444 pages
1151 spectra > 80% of peaks extracted to within 1-2 decimal
places (ppm)
Acknowledgments
• Bill Brouwer (Penn State) – Plot2Txt Development
• Carlos Cobas and Santi Dominguez
• Bob Hanson and Bob Lancashire for Jmol/JSpecView Javascript version
• Leah McEwan and Will Dichtel
• ACD/Labs – Provider of spectroscopy tools
Royal Society of Chemistry eScience team:
Antony Williams, Colin Batchelor, Peter Corbett, Ken Karapetyan and Valery Tkachenko
Contact me: [email protected]
• Will you deposit your NMR spectra with ChemSpider?
• Would you like to be involved in future trials of the data
repository?
• Feedback on ChemSpider or the National Chemical Database
Service?
• Are there any repositories or data banks members of your
community currently use?
• What incentives would you need to deposit data?
• What functionality in a repository would you want (for example,
searching capabilities)?
• How is your institution currently handling research data
management? What plans are being put in place?
Questions for you