Connecting Chemistry Across the Internet Using ChemSpider. Antony J Williams and Valery Tkachenko. SERMACS, November 15th 2012

Upload: orcid-0000-0002-2668-4821

Post on 10-May-2015


TRANSCRIPT

Page 1: Connecting Chemistry Across the Internet Using ChemSpider

Connecting Chemistry Across the Internet Using ChemSpider

Antony J Williams and Valery Tkachenko

SERMACS, November 15th 2012

Page 2: Connecting Chemistry Across the Internet Using ChemSpider

Chemistry Data and the Weeds

Page 3: Connecting Chemistry Across the Internet Using ChemSpider

Tell me about Roundup

Page 4: Connecting Chemistry Across the Internet Using ChemSpider

So what is Round Up?

Page 5: Connecting Chemistry Across the Internet Using ChemSpider

The World’s Encyclopedia

Page 6: Connecting Chemistry Across the Internet Using ChemSpider

Roundup

Page 7: Connecting Chemistry Across the Internet Using ChemSpider

Where do we Round Up data?

Where can I find the molfile for Roundup?
Papers/Patents about Roundup?
What are the side effects of Roundup?
Where can I order Roundup?
What are the physicochemical properties?
Metabolic pathways?
Different synonyms of Roundup?
Synthesis of Roundup?
Etc.

Page 9: Connecting Chemistry Across the Internet Using ChemSpider
Page 10: Connecting Chemistry Across the Internet Using ChemSpider

In an ever-growing Linked Data map…

Page 11: Connecting Chemistry Across the Internet Using ChemSpider

But I want to aggregate data? So…

Page 12: Connecting Chemistry Across the Internet Using ChemSpider

ChemSpider

Takes on the role of a structure-centric hub:

Connecting, validating, qualifying data
Enhancing data with connections to services
Provides access to data and services for others to use (Thermo, Agilent, Bruker, Waters, ACD/Labs, Accelrys, etc.)
Uses available services to integrate, connect and enhance the offering
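
The idea of a structure-centric hub can be sketched in a few lines. This is only an illustration, not ChemSpider's actual data model: the source names, field names and `example.org` link are made up, and an InChI string stands in for whatever structure identifier the hub keys on. Records from different sources are grouped under one structure, and each value keeps its source so it can later be validated or deprecated.

```python
from collections import defaultdict

def build_hub(records):
    """Group (source, identifier, field, value) records by structure identifier."""
    hub = defaultdict(lambda: defaultdict(list))
    for source, inchi, field, value in records:
        # Store the source next to each value so provenance is kept.
        hub[inchi][field].append((source, value))
    return hub

# Standard InChI for ethanol, used here purely as an example key.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

# Hypothetical records from hypothetical sources.
records = [
    ("vendor_catalogue", ETHANOL, "synonym", "ethanol"),
    ("open_database", ETHANOL, "synonym", "ethyl alcohol"),
    ("open_database", ETHANOL, "link", "https://example.org/ethanol"),
]

hub = build_hub(records)
for field, values in sorted(hub[ETHANOL].items()):
    print(field, values)
```

Keying on the structure rather than on a name is what lets synonyms from different sources land on the same record.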

Page 13: Connecting Chemistry Across the Internet Using ChemSpider

Roundup on ChemSpider

Page 14: Connecting Chemistry Across the Internet Using ChemSpider

What will ChemSpider give us??

Page 15: Connecting Chemistry Across the Internet Using ChemSpider

What will ChemSpider give us??

Page 16: Connecting Chemistry Across the Internet Using ChemSpider

What will ChemSpider give us??

Page 17: Connecting Chemistry Across the Internet Using ChemSpider

What will ChemSpider give us??

Page 18: Connecting Chemistry Across the Internet Using ChemSpider

What will ChemSpider give us??

Page 19: Connecting Chemistry Across the Internet Using ChemSpider

What will ChemSpider give us??

Page 20: Connecting Chemistry Across the Internet Using ChemSpider

ChemSpider is Collapsing Data???

Page 21: Connecting Chemistry Across the Internet Using ChemSpider

What will ChemSpider give us??

Page 22: Connecting Chemistry Across the Internet Using ChemSpider

For Glyphosate itself

Page 23: Connecting Chemistry Across the Internet Using ChemSpider

How did we build it?

We deal in Molfiles or SDF files – with coordinates
Deposit anything that has an InChI – we support what InChI can handle, good and bad
Standardization based on “InChI standardization”
InChIs aggregate (certain) tautomers

How much of ChemSpider is “on ChemSpider”?

Page 24: Connecting Chemistry Across the Internet Using ChemSpider

Connecting Chemistry across the web

So much of what is seen on ChemSpider is retrieved in real time using services

Page 25: Connecting Chemistry Across the Internet Using ChemSpider

Connecting Chemistry across the web

Page 26: Connecting Chemistry Across the Internet Using ChemSpider

Online Predictions

Page 27: Connecting Chemistry Across the Internet Using ChemSpider

A Comment on Quality

For >28 million chemical compounds there are some errors:

“Incorrect” structure representations
Mismatched name-structure relationships
Experimental properties (the values, the units)
Real vs. virtual compounds – text-mining and conversion

We have deprecated a LOT of data…

Page 28: Connecting Chemistry Across the Internet Using ChemSpider

Downsides of InChI

Good for small molecules – but no polymers, issues with inorganics, organometallics, imperfect stereochemistry. ChemSpider is “small molecules”

InChI used as the “deduplicator” – FIRST version of a compound into the database becomes THE structure to deduplicate against…
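
A minimal sketch of this "first in wins" behaviour, assuming a plain dictionary as the registry. The function name `deposit`, the record layout and the placeholder molfile strings are hypothetical and are not taken from ChemSpider.

```python
def deposit(registry, inchi, molfile, source):
    """Register a structure; the first depiction seen for an InChI is kept."""
    entry = registry.get(inchi)
    if entry is None:
        # First deposition: this molfile becomes THE structure for the InChI.
        registry[inchi] = {"molfile": molfile, "sources": [source]}
        return True
    # Same InChI again: record the source, discard the later depiction.
    entry["sources"].append(source)
    return False

# Standard InChI for ethanol, used here purely as an example key.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

registry = {}
deposit(registry, ETHANOL, "<molfile from source A>", "source_a")
deposit(registry, ETHANOL, "<molfile from source B>", "source_b")
print(registry[ETHANOL]["molfile"])
print(registry[ETHANOL]["sources"])
```

The consequence is that a poor 2D layout from the first depositor persists even when a later deposition has a better drawing.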

Page 29: Connecting Chemistry Across the Internet Using ChemSpider

Side Effects of InChI Usage

Page 30: Connecting Chemistry Across the Internet Using ChemSpider

SMILES by comparison…

Page 31: Connecting Chemistry Across the Internet Using ChemSpider

Side Effects of InChI Usage

Page 32: Connecting Chemistry Across the Internet Using ChemSpider

Standardization Issues

Depiction based on molfile

Page 33: Connecting Chemistry Across the Internet Using ChemSpider

Downsides of Overall Approach

Meshing data together based on InChIs worked for simple molecules

2D layout errors inherited or limited by algorithm

Complex molecules that are meant to be the same thing were NOT deduplicated. Compounds differing by one stereocenter, named the same, meant to be the same, are not the same
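
The stereocenter case can be illustrated by comparing InChI strings with and without their stereo layers. This is a sketch for standard InChIs only; the helper `strip_stereo` is hypothetical, and the two strings are the standard InChIs for L- and D-alanine as far as we know. The full strings differ, so exact-match deduplication keeps two records, while the stereo-stripped forms match.

```python
# Layer prefixes that carry stereochemistry in an InChI:
# b = double bond, t = tetrahedral, m and s = stereo type markers.
STEREO_PREFIXES = ("b", "t", "m", "s")

def strip_stereo(inchi):
    """Return the InChI with its stereo layers removed."""
    version, formula, *layers = inchi.split("/")
    kept = [layer for layer in layers if not layer.startswith(STEREO_PREFIXES)]
    return "/".join([version, formula, *kept])

L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

print(L_ALANINE == D_ALANINE)
print(strip_stereo(L_ALANINE) == strip_stereo(D_ALANINE))
```

Comparing on the stripped form is one way to flag records that are probably meant to be the same compound, without merging them automatically.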

Page 34: Connecting Chemistry Across the Internet Using ChemSpider

So much data online is “erroneous”

Page 35: Connecting Chemistry Across the Internet Using ChemSpider

The confusion of name-structures

Page 36: Connecting Chemistry Across the Internet Using ChemSpider

Collapsing Data – Standardization

Page 37: Connecting Chemistry Across the Internet Using ChemSpider

What needs to happen?

If we could validate
Catch errors in databases (and clean)
Proactively catch errors in publications/patents
Reduce junk in the ether – improve QUALITY!

If we collectively standardized
Interlinking between databases should improve

CVSP – a separate presentation… stick around

Page 38: Connecting Chemistry Across the Internet Using ChemSpider

Crowdsourcing ChemSpider

ChemSpider is crowdsourced

Community deposition, annotation and curation

Anyone can “Leave Feedback”

Registered users can add data

Page 39: Connecting Chemistry Across the Internet Using ChemSpider

Internet Data

ChemSpider and Global Chemistry Hub

Commercial Software
Pre-competitive Data

Open Science
Open Data
Publishers
Educators

Open Databases
Chemical Vendors

Small organic molecules
Undefined materials
Organometallics
Nanomaterials
Polymers
Minerals
Particle bound
Links to Biologicals

Page 40: Connecting Chemistry Across the Internet Using ChemSpider

Delivering a Prediction Platform

Experimental data will be used as the basis of model generation – a predictive platform…

Page 41: Connecting Chemistry Across the Internet Using ChemSpider

The Future of ChemSpider

Continued focus on quality over quantity – but more data is good too!
ChemSpider Reactions – work in progress and includes >300,000 reactions
Plugging in a validation and standardization platform
Delivering personal and institutional repository capabilities

Page 42: Connecting Chemistry Across the Internet Using ChemSpider

Thank you

Email: [email protected]
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams