Delivering on the promise of a chemistry data repository for the world
DESCRIPTION
This presentation was given as part of the Microsoft eScience panel discussion in Sao Paulo, Brazil. The theme of the panel was Going Native, a reference to a quote from Jim Gray along the lines of “in order to really understand the computing needs of a scientist you have to go native”. Jim himself did this, immersing himself in astronomy to build what would become the WorldWide Telescope. Bridging the gap between experimental scientists and the computing that underpins their discoveries is an ongoing challenge for eScience. The panelists explored what it means to go native, gave examples of where they have seen this work well, and shared lessons learned from working in this way.
TRANSCRIPT
Delivering on the promise of a chemistry data repository for the world
Antony Williams
Going Native Panel Discussion at the Microsoft eScience Workshop
0000-0002-2668-4821
A Question to Start…
• Who in the room has an ORCID?
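For readers who have not met ORCID before: an ORCID iD is a 16-character identifier whose final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The sketch below is an illustration added here, not something shown in the talk, and the function names are our own. It checks the iD given on the title slide.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the hyphenated or compact iD has a correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-2668-4821"))  # True
```

A check like this only confirms that the identifier is well formed. It does not confirm that the iD is registered or that it belongs to a particular person.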
New Horizons….
• Let’s map together all historical chemistry data and build systems to integrate it
• Heck, let’s integrate chemistry and biology data and add in disease data too
• Let’s model the data and see if we can extract new relationships – quantitative and qualitative
• Let’s take what we learn from historical data and build better solutions for modern data
• Let’s make it all available on the web…
What about this….
• We’re going to map the world
• We’re going to take photos of as many places as we can and link them together
• We’ll let people annotate and curate the map
• Then let’s make it available free on the web
• We’ll make it available for decision making
• Put it on Mobile Devices, give it away…
Chemistry data is of value?
• Reference databases generate hundreds of millions of dollars/euros per year
• So much data generated that could go public
• Maybe 5% of all data generated is published
• There is no “Journal of Failed Experiments”
• Funding agencies start to demand Open Data
• Scientists want funding but also recognition
A shift to Openness
Open Data is here…
Chemistry data is of value?
• Reference databases generate hundreds of millions of dollars/euros per year
• So much data generated that could go public
• Maybe 5% of all data generated is published
• There is no “Journal of Failed Experiments”
• Funding agencies start to demand Open Data
• Scientists want funding but also recognition
• …so who will fund and build the platforms?
Going Native… speaka da lingo
Chemists clearly benefit from accessing data
What we found…
• Data quality on the internet can be very poor
• Everyone wants access to high quality data but very few are willing to contribute
• The primary concerns for contributors:
  • It needs to be easy
  • Data licensing
  • Recognition for contributions
Recognition: need to have Impact
Quantitating scientists?
National Information Standards Organization and “Altmetrics”
http://www.niso.org/apps/group_public/download.php/13295/niso_altmetrics_white_paper_draft_v4.pdf
Research Outputs
• Blogs
• Research datasets
• Scientific software
• Posters and presentations at conferences
• Electronic theses and dissertations
• Performances in film and audio
• Lectures, online classes and teaching activities
Recognizing Contribution
• In order to encourage participation maybe we need to provide recognition of impact
• How do we measure impact for:
  • Performing peer review?
  • Contributions to more “public platforms”?...
Christmas Curating Wikipedia
Wikipedia Chemboxes
• http://en.wikipedia.org/wiki/Glucose
Three days of discussion
• If you want to understand Wikipedia definitely Go Native and get involved!
Does ONE bond matter???
A short intro to chirality
Educating chemists in data
• Chemists are more likely to know basic HTML than the data formats used in chemistry
• Even international standards for data interchange and standardization are unknown
• Standards are ideal for computers to handle
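To make the points about chirality and data formats concrete, here is a minimal sketch using SMILES, a widely used line notation for chemical structures. The choice of SMILES, the choice of alanine as the molecule, and the helper name `strip_stereo` are our assumptions for illustration; the slides do not name a format or a compound. The two enantiomers of alanine differ only in their chirality marker, so a careless cleanup step that drops the marker makes two different compounds look identical.

```python
# SMILES strings for the two mirror-image forms of alanine.
L_ALANINE = "N[C@@H](C)C(=O)O"
D_ALANINE = "N[C@H](C)C(=O)O"


def strip_stereo(smiles: str) -> str:
    """Naively discard tetrahedral chirality markers (hypothetical cleanup step)."""
    return smiles.replace("@", "")


# With stereochemistry recorded, the two records are distinguishable.
print(L_ALANINE == D_ALANINE)  # False

# Once the markers are dropped, the distinction is lost.
print(strip_stereo(L_ALANINE) == strip_stereo(D_ALANINE))  # True
```

Plain string comparison is used here only to keep the example dependency-free. Comparing structures properly needs canonicalization by a cheminformatics toolkit.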
Can we MAKE Quality Data?
• We are building systems for everyone to validate and standardize their data
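As one small example of the kind of automated check such validation can include, a CAS Registry Number carries a check digit that catches many transcription errors. The sketch below is our own illustration and not a description of the RSC system; the function name is hypothetical. The check digit is the sum of the other digits, each weighted by its position counted from the right, taken modulo 10.

```python
import re


def is_valid_cas(cas: str) -> bool:
    """Check the format and check digit of a CAS Registry Number like '50-99-7'."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight each digit by its position from the right, starting at 1.
    weighted = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return weighted % 10 == int(match.group(3))


print(is_valid_cas("50-99-7"))    # D-glucose: True
print(is_valid_cas("7732-18-5"))  # water: True
print(is_valid_cas("50-99-8"))    # wrong check digit: False
```

A passing check digit shows only that the number is internally consistent. It says nothing about whether the number is attached to the correct structure, which is where curation comes in.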
Where to host research data?
• Containers for chemical compounds, chemical reactions, analytical data, tabular data, etc.
• Algorithms for data validation and standardization
• Domain specific search technologies
• A platform for modeling data
• Progressing the RSC Data Repository…
Compounds
Reactions
Analytical data
Generating models from data
New Horizons….are here
• Let’s map together all historical chemistry data and build systems to integrate it
• Heck, let’s integrate chemistry and biology data and add in disease data too
• Let’s model the data and see if we can extract new relationships – quantitative and qualitative
• Let’s take what we learn from historical data and build better solutions for modern data
• Let’s make it all available on the web…
So we DON’T have to do this…
ORIGINAL FIGURE
EXTRACTED FIGURE
The path forward
• Mesh and aggregate published data
• Encourage deposition of RESEARCH data – that will never be published
• Provide open APIs for data access
• Educate chemists in digital literacy
• Funding agencies should mandate data access
• Collaboration is key – don’t do it alone
Thank you
Email: [email protected]
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams