Marrying ACD/Labs technologies to eScience Projects at the Royal Society of Chemistry

Antony Williams, ACD/Labs User Meeting, June 2013

RSC eScience

• The Royal Society of Chemistry is a member society (>47,000 members), publisher and innovator in eScience
• Host of many online databases and services
  – ChemSpider, SyntheticPages, SpectraSchool,…
• Participant in multiple grant-based projects
  – National Chemical Database Service
  – Open PHACTS
  – PharmaSea

Multiple ACD/Labs Tools in use…

• Structure “checking” routines for data
• Nomenclature generation and conversion
• Physicochemical prediction algorithms
• Web-based spectral display widget
• “Interactive Lab” web-based prediction tools

• But first an intro to ChemSpider…

I want to know about “Vincristine”

Vincristine: Identifiers and Properties

Predicted Properties

Vincristine: Vendors and Sources Linked by Structure

Vincristine: Patents

Google Patents

Vincristine: Articles Linked by Name

RSC Databases

RSC Database Linkthrough

Spectra

Where do data come from?

• ChemSpider users deposit data
• Some contributions from NIST
• Chemical vendors are starting to provide data.

Synthonix are one of our major contributors (www.synthonix.com)

Crowdsourced “Annotations”

• Users can add
  – Compounds
  – Descriptions/Syntheses/Commentaries
  – Links to articles via DOIs
  – Spectral data
  – Crystallographic Information Files
  – Photos
  – MP3 files
  – Videos

Crowdsourced Curation

• Crowdsourced curation: identify and tag errors, edit names and synonyms, identify records to deprecate

Spectral Uploading

• Various types of NMR spectra supported

Regular Updates

Multiple Spectra for One Structure

ChemSpider ID 24528095 1H NMR

ChemSpider ID 24528095 13C NMR

ChemSpider ID 24528095 1H-1H COSY

ChemSpider ID 24528095 HSQC

ChemSpider ID 24528095 HMBC

Available Spectra http://www.chemspider.com/spectra.aspx

Number of Spectra

• IR: 5389
• HNMR: 1679
• CNMR: 1207
• UV-Vis: 183
• EI: 90
• 2D1H13CD: 68
• Raman: 51
• NIR: 32
• 2D1H1HCOSY: 21
• 2D1H13CLR: 10
• CI+ve: 8
• PNMR: 7

• 9746 spectra against 6890 compounds

Some usage statistics

• ca. 200 visitors at any one time, ~30,000 visits per day
• Mar 4-Apr 3, 2013
  – Visits = 731,656
  – Unique Visitors = 527,008
• Independent servers to support other projects
• Does not include web service calls

ChemSpider as a Foundation

• ChemSpider is a foundation for projects:
  – >500 data sources aggregated and mapped
  – Continually curated and updated with new data
  – Normalized data around a structure-centric data model
  – Providing an API allows integration to support other internal projects
  – Providing API access outside RSC extends the reach
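The structure-centric data model described above can be pictured with a minimal Python sketch. It assumes every deposited record already carries a standard InChIKey as its structure key; the `CompoundRecord` class, the `aggregate` helper and the source names are hypothetical illustrations, not ChemSpider's actual schema or code.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    """Hypothetical structure-centric record: all data hangs off one structure key."""
    inchikey: str
    names: set = field(default_factory=set)
    sources: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)  # property -> [(source, value)]


def aggregate(deposits):
    """Map (source, inchikey, name, props) tuples onto one record per structure."""
    records = {}
    for source, inchikey, name, props in deposits:
        record = records.setdefault(inchikey, CompoundRecord(inchikey))
        record.names.add(name)
        record.sources.add(source)
        # Keep each value with its provenance instead of overwriting earlier ones.
        for prop, value in props.items():
            record.properties.setdefault(prop, []).append((source, value))
    return records


# Illustrative deposits from two hypothetical sources.
deposits = [
    ("source_a", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "aspirin", {"mp_C": 135}),
    ("source_b", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "acetylsalicylic acid", {}),
    ("source_b", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "ethanol", {"bp_C": 78}),
]
records = aggregate(deposits)
print(len(records))  # 2
```

Keeping (source, value) pairs rather than a single value is one way to let curators see where depositors disagree.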

Micropublishing Syntheses

ChemSpider SyntheticPages

Olympicene

Web Services Example: Spectral Data

www.SpectralGame.com
http://www.jcheminf.com/content/1/1/9

Spectral Game

Increasing Complexity

SpectralGame in the hand

SpectraSchool http://spectraschool.rsc.org/

SpectraSchool

Recently Added – THANKS ACD/Labs!

• Storage and display of ASSIGNED spectra

Access ChemSpider

• APIs – programmatic access used by mobile apps, funded consortia projects and many academic groups
• Widgets – UI components for embedding in other websites
• Data – data access, downloads, reuse, licensing

Flexible ChemSpider API

Linking Names to Structures
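Linking a name to a structure can be sketched, in its simplest form, as a lookup in a curated synonym index. This Python sketch assumes names are matched after trimming, whitespace collapsing and case-folding only; the `SynonymIndex` class is hypothetical, and it does not attempt the systematic name-to-structure parsing that nomenclature tools perform.

```python
import re


def normalize_name(name):
    """Trim, collapse internal whitespace and case-fold a chemical name."""
    return re.sub(r"\s+", " ", name.strip()).casefold()


class SynonymIndex:
    """Hypothetical index from normalized names to structure keys."""

    def __init__(self):
        self._index = {}

    def add(self, structure_key, *names):
        for name in names:
            self._index.setdefault(normalize_name(name), set()).add(structure_key)

    def lookup(self, name):
        """Return matching structure keys; more than one signals an ambiguous name."""
        return set(self._index.get(normalize_name(name), ()))


index = SynonymIndex()
index.add("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "Aspirin", "Acetylsalicylic acid")
print(index.lookup("  ACETYLSALICYLIC   acid "))  # {'BSYNRYMUTXBXSQ-UHFFFAOYSA-N'}
```

Returning a set rather than a single key keeps ambiguous names visible, which is where crowdsourced curation of synonyms comes in.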

It is so difficult to navigate…

• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known Pathways?
• Working On Now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?

Open PHACTS

• 3-year Innovative Medicines Initiative project

• Integrating chemistry and biology data using semantic web technologies

• Open source code, open data and open standards

• Academics, Pharma companies, Publishers
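The semantic web idea behind this integration can be illustrated with a toy Python example: chemistry and pharmacology facts are stored as subject-predicate-object triples and joined through a shared compound identifier. All identifiers and predicates below are hypothetical, and the small pattern matcher only imitates what an RDF store queried with SPARQL would do; it is not Open PHACTS code.

```python
# Each triple: (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = [
    ("chem:compound1", "chem:name", "aspirin"),
    ("chem:compound1", "chem:inchikey", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("pharm:activity7", "pharm:compound", "chem:compound1"),
    ("pharm:activity7", "pharm:target", "target:PTGS1"),
    ("pharm:activity8", "pharm:compound", "chem:compound1"),
    ("pharm:activity8", "pharm:target", "target:PTGS2"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def targets_for_name(triples, name):
    """Join chemistry (name -> compound) with pharmacology (compound -> target)."""
    targets = set()
    for compound, _, _ in match(triples, p="chem:name", o=name):
        for activity, _, _ in match(triples, p="pharm:compound", o=compound):
            for _, _, target in match(triples, s=activity, p="pharm:target"):
                targets.add(target)
    return targets


print(sorted(targets_for_name(TRIPLES, "aspirin")))  # ['target:PTGS1', 'target:PTGS2']
```

The point of the shared `chem:compound1` identifier is that the chemistry and pharmacology triples could come from different providers and still be joined.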

ChemSpider Contributions

• The host of the chemistry services
  – Supplier of “standardized” chemical data files
  – Chemistry searching (structure, substructure, etc.)
  – Curation and data quality checking
• Presently rolling out the Open PHACTS chemical registration system
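A chemical registration system issues one identifier per distinct structure and recognizes re-submissions. The Python sketch below assumes an upstream standardization step has already produced a standard InChIKey, which is checked only for its 14-10-1 block layout; the `Registry` class and the `REG000001` ID format are hypothetical, not the Open PHACTS system.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


class Registry:
    """Hypothetical compound registry: one ID per distinct structure key."""

    def __init__(self, prefix="REG"):
        self._prefix = prefix
        self._ids = {}

    def register(self, inchikey):
        """Return (id, is_new); reject keys that fail the basic format check."""
        key = inchikey.strip().upper()
        if not INCHIKEY_RE.match(key):
            raise ValueError(f"not an InChIKey: {inchikey!r}")
        if key in self._ids:
            return self._ids[key], False
        new_id = f"{self._prefix}{len(self._ids) + 1:06d}"
        self._ids[key] = new_id
        return new_id, True


reg = Registry()
print(reg.register("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))  # ('REG000001', True)
print(reg.register("bsynrymutxbxsq-uhfffaoysa-n"))  # ('REG000001', False)
```

In practice the hard part is the standardization that produces the key; the lookup itself is deliberately trivial here.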

• PharmaSea (FP7 initiative, 2012-2017): increasing value and flow in the marine biodiscovery pipeline

• Improve the quality, volume and value of active agents discovered in the marine environment and increase the speed at which they can be delivered

PharmaSea

• Dereplication via ChemSpider
• Hosting of natural products datasets
• Integrated storage of analytical data (ACD/Labs)
• Analytical data algorithms & integration
  – Mass spec searching – predicted fragmentation
  – NMR feature searching – NMR prediction
  – Computer-assisted structure elucidation
• Integration with ACD/Structure Elucidator
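Dereplication by mass spec searching can be sketched as computing monoisotopic masses from molecular formulas and keeping candidates within a ppm tolerance of the measured mass. This Python sketch assumes neutral molecules, simple formulas without brackets or charges, no adducts and no fragmentation; the candidate list and the 5 ppm default tolerance are illustrative assumptions, not the ChemSpider or ACD/Labs search.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100, "P": 30.97376163}

FORMULA_RE = re.compile(r"([A-Z][a-z]?)(\d*)")


def mono_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C9H8O4' (elements in MONO only)."""
    mass = 0.0
    for element, count in FORMULA_RE.findall(formula):
        mass += MONO[element] * (int(count) if count else 1)
    return mass


def search_by_mass(measured, candidates, ppm=5.0):
    """Return (name, formula, error_ppm) for candidates within the tolerance, best first."""
    hits = []
    for name, formula in candidates:
        theoretical = mono_mass(formula)
        error = (measured - theoretical) / theoretical * 1e6
        if abs(error) <= ppm:
            hits.append((name, formula, round(error, 2)))
    return sorted(hits, key=lambda hit: abs(hit[2]))


candidates = [("aspirin", "C9H8O4"), ("vincristine", "C46H56N4O10"),
              ("caffeine", "C8H10N4O2")]
print(search_by_mass(824.3990, candidates))
```

A real workflow would first convert the observed ion (for example [M+H]+) to a neutral mass, and predicted fragmentation would then help rank the candidates that survive the mass filter.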

UK Chemical Database Service

I-Lab Integration – NMR DB Searching

I-Lab Integration – NMR Prediction
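NMR database searching and NMR prediction both come down to comparing chemical shifts. A minimal Python sketch, assuming 13C peak lists given as shifts in ppm and a fixed tolerance: each database entry is scored by the fraction of query peaks it can match one-to-one, using a greedy closest-peak rule. The entries, shift values and scoring rule are hypothetical, not the ACD/Labs algorithm.

```python
def match_fraction(query, reference, tol=1.5):
    """Greedy one-to-one matching of shifts (ppm); returns matched / len(query)."""
    remaining = sorted(reference)
    matched = 0
    for shift in sorted(query):
        # Closest reference peak that has not been used yet.
        best = min(remaining, key=lambda r: abs(r - shift), default=None)
        if best is not None and abs(best - shift) <= tol:
            remaining.remove(best)
            matched += 1
    return matched / len(query) if query else 0.0


def rank_entries(query, database, tol=1.5):
    """Sort database entries by descending match fraction."""
    scores = [(name, match_fraction(query, shifts, tol))
              for name, shifts in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


database = {  # hypothetical 13C peak lists, ppm
    "entry_A": [170.1, 150.2, 134.5, 128.0, 21.0],
    "entry_B": [77.0, 62.3, 18.1],
}
query = [170.4, 150.0, 128.6, 20.7]
print(rank_entries(query, database))  # [('entry_A', 1.0), ('entry_B', 0.0)]
```

The same scoring could compare an experimental spectrum against predicted shifts for a proposed structure instead of against stored spectra.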

National Chemistry Data Repository

• Imagine all chemistry related data from all academic projects in the UK in ONE system

• Security model for the data to be embargoed, private or public (available to the entire world!)

• Provide tools for easy data upload, review, automated validation – chemicals, reactions, spectral data, alphanumeric data

• Use the data for algorithm training…
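The embargoed/private/public security model can be sketched as a visibility check applied to each dataset. In this Python sketch the three states come from the slide, but the owner/collaborator rule, the assumption that an embargo lapses automatically on its end date, and the `Dataset`/`can_view` names are hypothetical design choices, not the repository's actual implementation.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    EMBARGOED = "embargoed"
    PUBLIC = "public"


@dataclass
class Dataset:
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None
    collaborators: set = field(default_factory=set)


def can_view(dataset, user, today):
    """Owner and collaborators always see data; others depend on visibility."""
    if user == dataset.owner or user in dataset.collaborators:
        return True
    if dataset.visibility is Visibility.PUBLIC:
        return True
    if dataset.visibility is Visibility.EMBARGOED:
        # Assumed rule: the embargo lapses automatically on its end date.
        return dataset.embargo_until is not None and today >= dataset.embargo_until
    return False  # PRIVATE


spectra = Dataset("alice", Visibility.EMBARGOED, embargo_until=date(2014, 1, 1))
print(can_view(spectra, "bob", date(2013, 6, 1)))  # False
print(can_view(spectra, "bob", date(2014, 1, 1)))  # True
```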

In Discussions At Present

• Develop the world’s largest online spectroscopy database of integrated data

• Does ACD/Labs have tools to help?
  – Automated depositions: Silent Automation
  – Processing and validation: Spectrus
  – Databasing: Spectrus DB
  – Web-based integration into ChemSpider

Where else can we get RICH data?

DERA: Data Enable the RSC Archive

• How much data is in the archive, in the publications and in the supplementary info?
  – How many compounds for ChemSpider?
  – How many syntheses for ChemSpider reactions?
  – How many characterization measurements?
• Property Data
• Spectral Data
• Graphs and charts to be used for modeling?
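Estimating how much characterization data sits in the archive starts with extracting it from text. As a minimal Python sketch, assuming melting points are written in common forms such as "mp 135-136 °C" or "m.p. 98 °C", a regular expression can pull out single values and ranges; the pattern is a hypothetical heuristic, not the DERA pipeline, and real supplementary information is far more varied.

```python
import re

# Hypothetical heuristic for melting points such as "mp 135-136 °C",
# "m.p. 98 °C" or "mp: 210–212 °C" (hyphen or en dash between the bounds).
MP_RE = re.compile(
    r"\bm\.?\s?p\.?\s*:?\s*"                         # "mp", "m.p.", "m. p.", optional colon
    r"(?P<low>\d+(?:\.\d+)?)"                        # single value or lower bound
    r"(?:\s*[-\u2013]\s*(?P<high>\d+(?:\.\d+)?))?"   # optional upper bound
    r"\s*[°º]\s*C",                                  # degree sign (or look-alike) then C
    re.IGNORECASE,
)


def extract_melting_points(text):
    """Return (low, high) tuples in °C; high equals low for single values."""
    results = []
    for m in MP_RE.finditer(text):
        low = float(m.group("low"))
        high = float(m.group("high")) if m.group("high") else low
        results.append((low, high))
    return results


sample = ("The product was obtained as white needles, mp 135-136 °C. "
          "The second isomer (m.p. 98 °C) crystallized on standing.")
print(extract_melting_points(sample))  # [(135.0, 136.0), (98.0, 98.0)]
```

The leading word boundary keeps phrases such as "temp 135 °C" from being misread as a melting point.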

What if we could capture it all?

The Future of Data

• In Publications
  – Interactive plots, spectra, buy that compound, predict that property
  – Validation of data going INTO publications: NMR prediction, CASE validation, PhysProp comparisons
• From the lab
  – How much data NEVER gets published and is still useful? Failed Reactions? More Open Data…
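Validation of data going into publications could, in its simplest form, flag reported values that disagree with a prediction by more than a chosen tolerance. In this Python sketch the predicted values are supplied by the caller (in practice they might come from a physicochemical prediction tool), and the compound IDs, logP values and 1.0 tolerance are hypothetical.

```python
def flag_outliers(reported, predicted, tolerance):
    """Return (compound, reported, predicted, deviation) where |deviation| > tolerance."""
    flags = []
    for compound, value in reported.items():
        if compound not in predicted:
            continue  # nothing to compare against
        deviation = value - predicted[compound]
        if abs(deviation) > tolerance:
            flags.append((compound, value, predicted[compound], round(deviation, 2)))
    return flags


reported_logp = {"cmpd_1": 1.2, "cmpd_2": 4.9, "cmpd_3": 2.0}   # hypothetical
predicted_logp = {"cmpd_1": 1.4, "cmpd_2": 2.1}                 # hypothetical
print(flag_outliers(reported_logp, predicted_logp, tolerance=1.0))
```

A flagged value is not necessarily wrong; it marks a record for a second look before the data go into print.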

Acknowledgements

• RSC eScience Team
• ACD/Labs – Pranas Japartas and Karim Kassam
• GGA – Indigo Toolkit and Bingo Cartridge
• The community of depositors
• The Open Source Community

Thank you

Email: [email protected]
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams