The Royal Society of Chemistry: Research Data and Community Initiatives

Serin Dabb, Executive Editor, Data

[email protected]

@SerinDabb

Post on 06-Feb-2018

Open Access

Open Data

• Recently acquired MarinLit

• The Merck Index* Online

• Literature alerting services

Today

1. National Chemical Database Service

2. ChemSpider

3. Data Repository (in progress)

4. Data mining of Royal Society of Chemistry publications (in progress)

National Chemical Database Service

• An online collection of scientific resources and databases

• Hosted by the RSC, funded by the EPSRC

• Free for all UK academia

http://cds.rsc.org

@cds_rsc

Predicts physicochemical properties, NMR spectra and chemical shifts

Also assesses prediction reliability and includes searchable content databases.

Learning materials:

Videos

Factsheets

Worked examples

National Chemical Database Service

… and a data repository for the UK academic community

New data policies

EPSRC-funded research data is a public good produced in the public interest and should be made freely and openly available with as few restrictions as possible in a timely and responsible manner.

Institutional and project specific data management policies and plans should be in accordance with relevant standards and community best practice and should exist for all data. Data with acknowledged long term value should be preserved and remain accessible and useable for future research.

EPSRC Policy Framework on Research Data

ChemSpider: free to use

28 million structures

Over 400 different data sources, including:

PubChem, ChEBI

GSK Malarial compounds

Chemical Suppliers Catalogues

Patents

RSC journals

Physical properties, spectra, safety information and much more

Access it anywhere there’s internet

www.chemspider.com/

Adding and curating data

Adds quality and quantity…

• Users can comment for others to action

• Registered users can curate names and add links, spectra, and multimedia resources

• Depositors can add new compounds

• Curators and Master Curators assess and approve additions

Chemical Sciences Data Repository

[Diagram: the researcher collects data (ELNs, Dropbox, PC, documents) and submits it through an online deposition gateway; the data is validated and stored in the repository as compounds, reactions, spectra, and more; each deposit can be private, embargoed, or public; users search the repository.]
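The three visibility states named in the repository diagram (private, embargoed, public) could be modelled roughly as below. This is a sketch only: the `Deposit` fields, the `can_view` rule, and the idea that an embargo lifts on a fixed date are all assumptions made for illustration, since the slides leave open who can access what.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    EMBARGOED = "embargoed"
    PUBLIC = "public"


@dataclass
class Deposit:
    # All field names are hypothetical; the slides do not define a schema.
    depositor: str
    kind: str  # e.g. "compound", "reaction", "spectrum"
    visibility: Visibility
    embargo_until: Optional[date] = None


def can_view(deposit: Deposit, user: str, today: date) -> bool:
    """Assumed access rule: owners always see their data, public data is
    open to all, and embargoed data opens once the embargo date is reached."""
    if deposit.visibility is Visibility.PUBLIC:
        return True
    if user == deposit.depositor:
        return True
    if deposit.visibility is Visibility.EMBARGOED and deposit.embargo_until:
        return today >= deposit.embargo_until
    return False
```

A stricter policy (for example, group-level access to private deposits) would only change `can_view`, which is one reason to keep the state and the rule separate.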

Data Repository: Compounds

Data Repository: Reactions

Data Repository: Spectra

ChemSpider SyntheticPages

Unanswered questions

• Level of curation from us

• Number of ‘required fields’

• Who can deposit? Who can have access (even if ‘private’)?

• Standardisation of metadata

• What do people want to search on (properties? spectral features?)

• How does it link to the ESI?

• Should we assign DOIs for data?

Data mining of historic content

Text Mining

The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser.

The reaction mixture was heated at reflux with stirring, for a period of about one-half hour.

After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue.
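As a toy illustration of what text mining a procedure like this involves, the sketch below pulls out "reagent (amount unit)" phrases with a regular expression. It is not the pipeline used by the Royal Society of Chemistry; the pattern, the stop-word list, and the function name `extract_quantities` are assumptions, and a real system would need proper chemical name recognition.

```python
import re

# Hypothetical pattern: one or two words, then "(amount unit)".
# The stop-word lookahead keeps "and", "the", etc. out of the reagent name.
REAGENT_AMOUNT = re.compile(
    r"\b((?:(?!(?:and|the|of|with|in)\b)[A-Za-z]+\s)?[A-Za-z]+)"
    r"\s*\(\s*(\d+(?:\.\d+)?)\s*(ml|mL|mg|g)\s*\)"
)


def extract_quantities(text):
    """Return a list of (reagent, amount, unit) tuples found in `text`."""
    return [
        (m.group(1), float(m.group(2)), m.group(3))
        for m in REAGENT_AMOUNT.finditer(text)
    ]


sentence = (
    "thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged "
    "into a glass reaction vessel"
)
quantities = extract_quantities(sentence)
```

Long systematic names such as the urea derivative in the example are deliberately out of scope here; they contain digits, commas and brackets that a simple word pattern cannot capture.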

Converting Text to Spectra

1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
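A first step in converting a reported peak list like this into a spectrum is to parse it into structured peaks. The sketch below is a minimal version, assuming the common "shift (multiplicity, nH, ...)" layout; the `Peak` class and `parse_peaks` function are illustrative names, not part of any RSC tool, and coupling constants and assignments are ignored.

```python
import re
from dataclasses import dataclass


@dataclass
class Peak:
    shift_low: float   # ppm; equals shift_high for a single value
    shift_high: float  # ppm
    multiplicity: str  # e.g. "d", "dd", "m"
    protons: int


# A shift (or a shift range with a hyphen or en dash), then "(mult, nH".
PEAK = re.compile(
    r"(\d+\.\d+)(?:\s*[–-]\s*(\d+\.\d+))?\s*"
    r"\(\s*([a-z]+)\s*,\s*(\d+)H"
)


def parse_peaks(text):
    """Parse 'shift (multiplicity, nH, ...)' entries into Peak objects."""
    peaks = []
    for m in PEAK.finditer(text):
        low = float(m.group(1))
        high = float(m.group(2)) if m.group(2) else low
        peaks.append(Peak(low, high, m.group(3), int(m.group(4))))
    return peaks


nmr = (
    "1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), "
    "4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), "
    "4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), "
    "6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)"
)
peaks = parse_peaks(nmr)
```

Coupling constants such as "4.8 Hz" are skipped because they are not followed by an opening bracket. Multiplicities written with a space (for example "br s") would need a looser pattern.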

Visualisation of spectra

• JavaScript viewer with Jmol

Figure Spectra into “Real Spectra”?

• We are turning text into structures

• We are turning text into spectra

• And we are turning figures into spectra

Turn “Figures” Into Data

[Figure: a published spectrum figure shown next to the data extracted from it]

Progress: Figures - Spectra

• Validation tests performed with William Brouwer. Good enough to proceed with larger test set

• Ready to run process across larger collection

• Focus on 21st century articles only for now

Example

Input: 74 supplementary data documents (3,444 pages)

Output: 1,151 spectra, with more than 80% of peaks extracted to within 1-2 decimal places (ppm)
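A figure like "more than 80% of peaks extracted" can be computed by matching extracted peak positions against reference positions within a tolerance. The sketch below shows one way to do that; the function name and the 0.05 ppm default tolerance are assumptions, not the criteria used in the validation tests.

```python
def fraction_recovered(reference, extracted, tolerance=0.05):
    """Fraction of reference peaks (ppm) that have at least one extracted
    peak within `tolerance` ppm. The tolerance value is an assumption."""
    if not reference:
        return 0.0
    hits = sum(
        1
        for ref in reference
        if any(abs(ref - x) <= tolerance for x in extracted)
    )
    return hits / len(reference)


# Made-up positions for illustration, not data from the test set.
reference_ppm = [2.57, 4.24, 4.35, 6.95]
extracted_ppm = [2.56, 4.25, 4.90, 6.95]
score = fraction_recovered(reference_ppm, extracted_ppm)
```

Here three of the four reference peaks have a match within 0.05 ppm, so `score` is 0.75.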

Acknowledgments

• Bill Brouwer (Penn State) – Plot2Txt Development

• Carlos Cobas and Santi Dominguez

• Bob Hanson and Bob Lancashire for Jmol/JSpecView Javascript version

• Leah McEwan and Will Dichtel

• ACD/Labs – Provider of spectroscopy tools

Royal Society of Chemistry eScience team:

Antony Williams, Colin Batchelor, Peter Corbett, Ken Karapetyan and Valery Tkachenko

Contact me: [email protected]

Questions for you

• Will you deposit your NMR spectra with ChemSpider?

• Would you like to be involved in future trials of the data repository?

• Feedback on ChemSpider or the National Chemical Database Service?

• Are there any repositories or data banks members of your community currently use?

• What incentives would you need to deposit data?

• What functionality in a repository would you want (for example, searching capabilities)?

• How is your institution currently handling research data management? What plans are being put in place?