The Role of Trust in Science at SLA 2011

International Year of Chemistry: Perils and Promises of Modern Communication in the Sciences: The Role of Trust. June 14, 2011, Special Libraries Association. Jean-Claude Bradley, Department of Chemistry, Drexel University.

Upload: jean-claude-bradley

Post on 11-May-2015


DESCRIPTION

Jean-Claude Bradley presents at the Special Libraries Association meeting on June 14, 2011 on "International Year of Chemistry: Perils and Promises of Modern Communication in the Sciences: The Role of Trust". The talk mainly covers the problems with a model for melting point data based on trusted sources, and demonstrates that an Open Data model, including Open Notebook Science when necessary, can be very helpful in curating datasets. Web services for experimental and predicted melting points are then reviewed.

TRANSCRIPT

Page 1: The Role of Trust in Science at SLA 2011

International Year of Chemistry: Perils and Promises of Modern Communication in the Sciences

The Role of Trust

June 14, 2011

Special Libraries Association

Jean-Claude Bradley

Department of Chemistry

Drexel University

Page 2: The Role of Trust in Science at SLA 2011

Unknown Perils of the Past

Before online databases (early 90s), searching for properties like melting points using ONE “trusted source” was practical:

• CRC Handbook
• Merck Index
• Chemical Vendor Catalogs (e.g. Sigma-Aldrich)
• Peer-Reviewed Journals

Page 3: The Role of Trust in Science at SLA 2011

Known Perils of the Present

Today, many librarians discourage the use of new online sources (like Wikipedia) for the searching of chemical data and recommend using only “trusted sources”

The problem is that the “trusted source” model is, and always was, fundamentally flawed.

Ironically, most of Wikipedia’s chemical information is problematic BECAUSE it is based on “trusted sources”!

Page 4: The Role of Trust in Science at SLA 2011

Promises for the Future

Using technology, we can begin to replace the “trusted source” model with one based on transparency and provenance

Page 5: The Role of Trust in Science at SLA 2011

The current state of transparency in scientific communication

Case study of melting point data

Page 6: The Role of Trust in Science at SLA 2011

The Chemical Information Validation Sheet

567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course

Page 7: The Role of Trust in Science at SLA 2011

The Chemical Information Validation Explorer

(Andrew Lang)

Page 8: The Role of Trust in Science at SLA 2011

Discovering outliers for melting points (stdev/average)
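The slide names the screening statistic (standard deviation divided by average) but not how it was computed. A minimal sketch of that idea follows, assuming each compound has several reported melting points in degrees C. The compound names, the values, and the helper names `spread_ratio` and `rank_by_spread` are all hypothetical and are not taken from the talk's dataset.

```python
from statistics import mean, stdev

# Hypothetical, made-up readings (degrees C) keyed by compound.
# These values are for illustration only and do not come from the talk.
measurements = {
    "compound A": [120.0, 121.0, 119.5, 120.5],
    "compound B": [80.0, 81.0, 140.0],
    "compound C": [55.0],
}


def spread_ratio(values):
    """Sample standard deviation divided by the absolute mean.

    Returns None when the ratio is undefined: fewer than two readings,
    or a mean of exactly zero.
    """
    if len(values) < 2:
        return None
    avg = mean(values)
    if avg == 0:
        return None
    return stdev(values) / abs(avg)


def rank_by_spread(data):
    """Return (compound, ratio) pairs sorted from most to least inconsistent."""
    scored = [(name, spread_ratio(vals)) for name, vals in data.items()]
    scored = [(name, ratio) for name, ratio in scored if ratio is not None]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_by_spread(measurements)
for name, ratio in ranking:
    print(f"{name}: {ratio:.3f}")
```

Compounds at the top of the ranking are the ones whose sources disagree most and are worth checking by hand. One caveat of this particular ratio is that it becomes unstable for compounds melting near 0 C, so a plain range (maximum minus minimum) may be a safer companion measure.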

Page 9: The Role of Trust in Science at SLA 2011

Investigating the m.p. inconsistencies of EGCG

Page 10: The Role of Trust in Science at SLA 2011

Investigating the m.p. inconsistencies of cyclohexanone

Page 11: The Role of Trust in Science at SLA 2011

Most popular data sources

Page 12: The Role of Trust in Science at SLA 2011

Alfa Aesar donates melting points to the public

Page 13: The Role of Trust in Science at SLA 2011

Open Melting Point Explorer

(Andrew Lang)

Page 14: The Role of Trust in Science at SLA 2011

Outliers

MDPI dataset

EPI (donated all data to public also)

Page 15: The Role of Trust in Science at SLA 2011

Outliers for ethanol: Alfa Aesar and Oxford MSDS

Page 16: The Role of Trust in Science at SLA 2011

Inconsistencies and SMILES problems within MDPI dataset

Page 17: The Role of Trust in Science at SLA 2011

MDPI Dataset labeled with High Trust Level

Page 18: The Role of Trust in Science at SLA 2011

Open Melting Point Datasets

Currently 20,000 compounds with Open MPs

Page 19: The Role of Trust in Science at SLA 2011

Live curation on a public Google Spreadsheet of compounds with highest mp ranges

(collaboration with Andrew Lang and Antony Williams)

Page 20: The Role of Trust in Science at SLA 2011

Some melting points can’t be resolved only with literature: 4-benzyltoluene

Page 21: The Role of Trust in Science at SLA 2011

The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 C

Page 22: The Role of Trust in Science at SLA 2011

The quest to resolve the melting point of 4-benzyltoluene: ambiguous results upon heating, but it clearly remains a liquid at -15 C for 2 days in the freezer

Page 23: The Role of Trust in Science at SLA 2011

Further investigation into the literature for the melting point of 4-benzyltoluene

Although a general description of the method is provided, the raw data are not

Page 24: The Role of Trust in Science at SLA 2011

Because of broken provenance, errors cascade through the literature

Calculations in patent based on incorrect data

Page 25: The Role of Trust in Science at SLA 2011

Open Random Forest modeling of Open Melting Point data using CDK descriptors

(Andrew Lang)

R² = 0.78, TPSA and nHdon most important
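The slide does not say how the reported coefficient of determination was evaluated (for example, on held-out data or on out-of-bag predictions). As a reminder of what the number measures, here is a small sketch of the usual definition, 1 - SS_res / SS_tot, applied to paired measured and predicted values. The function name `r_squared` is an assumption made for illustration.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    measured and predicted are equal-length sequences of numbers.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if len(measured) < 2:
        raise ValueError("need at least two points")
    avg = sum(measured) / len(measured)
    # Residual sum of squares: how far predictions are from measurements.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # Total sum of squares: how far measurements are from their own mean.
    ss_tot = sum((m - avg) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values are all identical")
    return 1 - ss_res / ss_tot
```

A value of 1 means the predictions reproduce the measurements exactly, and a value of 0 means the model does no better than always predicting the mean measured melting point.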

Page 26: The Role of Trust in Science at SLA 2011

Melting point prediction service

Page 27: The Role of Trust in Science at SLA 2011

Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)

Page 28: The Role of Trust in Science at SLA 2011

Using melting point for temperature dependent solubility prediction
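Only the slide title survives here, so the model actually used is not recoverable from the transcript. One simple way a melting point can enter such a prediction, offered purely as an assumption, is to treat ln(mole fraction) as linear in 1/T and to anchor the line at two points: a measured solubility at a reference temperature, and complete miscibility (mole fraction 1) at the solute's melting point. The function name and parameters below are hypothetical.

```python
import math


def mole_fraction_solubility(temp_k, ref_temp_k, ref_mole_fraction, melting_point_k):
    """Estimate mole-fraction solubility at temp_k (kelvin).

    Assumption (not from the talk): ln(x) is linear in 1/T and passes
    through (ref_temp_k, ref_mole_fraction) and (melting_point_k, 1.0).
    """
    if not 0 < ref_mole_fraction <= 1:
        raise ValueError("reference mole fraction must be in (0, 1]")
    if not 0 < ref_temp_k < melting_point_k:
        raise ValueError("reference temperature must be below the melting point")
    if temp_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    if temp_k >= melting_point_k:
        # At or above the melting point the solute is assumed fully miscible.
        return 1.0
    # Fraction of the way from the melting point to the reference, in 1/T.
    fraction = (1 / temp_k - 1 / melting_point_k) / (1 / ref_temp_k - 1 / melting_point_k)
    return math.exp(math.log(ref_mole_fraction) * fraction)
```

Under this assumption, an error in the melting point shifts the whole predicted curve, which is one reason the curation of melting point data matters for downstream predictions.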

Page 29: The Role of Trust in Science at SLA 2011

Motivation: Faster Science, Better Science

Page 30: The Role of Trust in Science at SLA 2011

There are NO FACTS, only measurements embedded within assumptions

Open Notebook Science maintains the integrity of data provenance by making assumptions explicit

Page 31: The Role of Trust in Science at SLA 2011

TRUST

PROOF

Page 32: The Role of Trust in Science at SLA 2011

Strategy for an Open Notebook:

First record, then abstract structure

In order to be discoverable, use Google-friendly formats (simple HTML, no login)

In order to be replicable, use free hosted tools (Wikispaces, Google Spreadsheets)

Page 33: The Role of Trust in Science at SLA 2011

Crowdsourcing Solubility Data

Page 34: The Role of Trust in Science at SLA 2011

Data provenance: From Wikipedia to…

Page 35: The Role of Trust in Science at SLA 2011

…the lab notebook and raw data

Page 36: The Role of Trust in Science at SLA 2011

Calculations Made Public on Google Spreadsheets

Page 37: The Role of Trust in Science at SLA 2011

Interactive NMR spectra using JSpecView and JCAMP-DX

Page 38: The Role of Trust in Science at SLA 2011

Raw Data As Images

Splatter?

Some liquid

Page 39: The Role of Trust in Science at SLA 2011

YouTube for demonstrating experimental set-up

Page 40: The Role of Trust in Science at SLA 2011

Solubilities collected in a Google Spreadsheet

Page 41: The Role of Trust in Science at SLA 2011

Rajarshi Guha’s Live Web Query using Google Viz API

Page 42: The Role of Trust in Science at SLA 2011

Web services for summary data

(Andrew Lang)

Page 43: The Role of Trust in Science at SLA 2011

Web service calls from within a Google Spreadsheet for solubility measurement and

prediction

(Andrew Lang)

Page 44: The Role of Trust in Science at SLA 2011

Integration of Multiple Web Services to Recommend Solvents for Reactions

(Andrew Lang)

Page 45: The Role of Trust in Science at SLA 2011
Page 46: The Role of Trust in Science at SLA 2011
Page 47: The Role of Trust in Science at SLA 2011
Page 48: The Role of Trust in Science at SLA 2011

Reaction Attempts Book

Page 49: The Role of Trust in Science at SLA 2011

Reaction Attempts Book: Reactants listed Alphabetically

Page 50: The Role of Trust in Science at SLA 2011

ONS Challenge Solubility Book cited for nanotechnology application

Page 51: The Role of Trust in Science at SLA 2011

Lulu.com Data Disks

Page 52: The Role of Trust in Science at SLA 2011

All ONS web services

Page 53: The Role of Trust in Science at SLA 2011

For all Formats of ONS Projects

Page 54: The Role of Trust in Science at SLA 2011

Conclusions

• For science to progress quickly, there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance

• Open Notebook Science offers an efficient way to make research transparent and discoverable