ACS 248th Paper 146: VIVO/ScientistsDB Integration into Eureka
Post on 24-May-2015
Faculty Profiling and Searching in the Eureka Research Workbench
using VIVO and ScientistsDB
Matthew Morse, Israel Hurst, and Stuart J. Chalk
Department of Chemistry
University of North Florida
schalk@unf.edu
2014 Fall ACS Meeting
Outline
Motivation
What is Eureka?
What is VIVO?
VIVO API
What is ScientistsDB?
MediaWiki API
Search Approaches
Elasticsearch Usage
Future Plans
Conclusion
Motivation
Eureka Research Workbench is an Electronic Laboratory Notebook (ELN)…
…plus representation of resources…
…and needs to be social
Find colleagues that you can collaborate with
There are many places to get this information
Electronic Notebooks
Scientists need to move to digital notebooks…
…and record not just the data but the flow and context
How science is done is important for searching, aggregation, meta-analysis
We need more than an electronic version of a notebook
We need a science version of “Second Life” (SciLife?)
Eureka Research Workbench (ERW)
Started in 2006 after getting involved in the Analytical Information Markup Language (AnIML) project
Store all research notes/data in a digital format
Capture the workflow of scientists
Writing in a lab notebook is equivalent to “multi-type” blogging in the digital world
How to capture information? Many data types! (ExptML)
How to store files “online”? (Fedora-Commons)
How to access files in the browser? (CakePHP)
How to represent laboratory resources? (ExptML)
How to link data together? RDF (in Fedora-Commons)
Experiment Markup Language (ExptML)
A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net)
Annotation Api Calculation Chemical Citation Customer Data Dataset Definition Element
Equipment Event Experiment Group Message Project Protocol Quote Report Result
Sample Solution Space Specimen Substance Task Template Timeline User Vendor
What is VIVO?
An interdisciplinary network: Enabling collaboration and discovery among scientists across all disciplines.
Open source software out of Cornell University
Now part of DuraSpace (DSpace, Fedora-Commons, and VIVO)
Often integrated with other academic services
Semantic representation -> VIVO Ontology (https://wiki.duraspace.org/display/VIVO/VIVO-ISF+Ontology)
http://vivoweb.org/
VIVO API
Interface to search for different types of ‘individuals’: faculty members, subjects, departments, …
Available in multiple download formats: N-Triples, RDF, N3, Turtle, JSON-LD
https://wiki.duraspace.org/display/VIVO/The+ListRDF+API
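A minimal Python sketch of a ListRDF request, not the plugin code from the talk. It shows one detail that is easy to miss: the class URI contains a `#`, so it should be percent-encoded before it goes into the `vclass` parameter, or it is read as a URL fragment. The host `vivo.example.edu` and both function names are hypothetical. The parser assumes the response was requested as N-Triples; how the format is selected is covered on the wiki page above and is not shown here.

```python
import re
from urllib.parse import urlencode

# Class URI taken from the slides; the '#' must be percent-encoded in a query string.
FACULTY_CLASS = "http://vivoweb.org/ontology/core#FacultyMember"

def build_listrdf_url(instance_base, vclass=FACULTY_CLASS):
    """Build a listrdf URL for one VIVO installation (hypothetical helper)."""
    return instance_base.rstrip("/") + "/listrdf?" + urlencode({"vclass": vclass})

# An N-Triples line starts with the subject URI in angle brackets.
_SUBJECT = re.compile(r"^\s*<([^>]+)>\s")

def subjects_from_ntriples(text):
    """Return the distinct subject URIs of an N-Triples document, in order."""
    seen = []
    for line in text.splitlines():
        match = _SUBJECT.match(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

if __name__ == "__main__":
    print(build_listrdf_url("http://vivo.example.edu"))
```

Each URI returned by `subjects_from_ntriples` could then be fetched individually for the full faculty record.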
What is ScientistsDB?
MediaWiki site containing nearly 50,000 scientists
Wikipedia entries…
…plus manual additions
Tony Williams, RSC
Sean Atkins, CDD Vault
http://www.scientistsdb.com/
MediaWiki API
MediaWiki is the software that runs Wikipedia
Available for download (http://www.mediawiki.org)
Access to all data in a MediaWiki MySQL database
Components: Authentication, Search, CRUD
http://www.mediawiki.org/wiki/API:Main_page
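As an illustration of the Search component, here is a small Python sketch that builds a full-text search request with the standard `action=query` and `list=search` parameters and pulls page titles out of the JSON reply. The function names are hypothetical, and whether a given wiki exposes search through its API is an assumption; no request is actually sent here.

```python
from urllib.parse import urlencode

def build_search_url(api_base, term, limit=10):
    """Build a MediaWiki full-text search URL (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": term,   # the search text
        "srlimit": limit,   # maximum number of hits to return
    }
    return api_base + "?" + urlencode(params)

def titles_from_search(response):
    """Extract page titles from a decoded list=search JSON response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]

if __name__ == "__main__":
    print(build_search_url("http://www.scientistsdb.com/api.php", "analytical chemistry"))
```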
Search Approaches
VIVO
listRDF API for faculty (http://<instance>/listrdf?vclass=http://vivoweb.org/ontology/core#FacultyMember)
Faculty member information (as JSON) (http://<instance>/individual/a52486491431389?format=json)
ScientistsDB
Retrieve infobox
(http://www.scientistsdb.com/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Scientist
Extract records with ‘fields’ field
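The ScientistsDB steps could be sketched in Python roughly as follows. This is an assumption about how the extraction might work, not the authors' plugin: `member_titles` reads page titles from a `list=categorymembers` reply, and `extract_infobox_field` pulls a `| fields = …` line out of page wikitext and strips wiki-link markup. Both function names and the sample infobox are hypothetical, and fetching the wikitext of each page is a separate request that is not shown.

```python
import re

# Matches [[Target]] and [[Target|label]]; group 1 is the text a reader would see.
_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")

def member_titles(response):
    """Extract page titles from a decoded list=categorymembers JSON response."""
    return [m["title"] for m in response.get("query", {}).get("categorymembers", [])]

def extract_infobox_field(wikitext, name="fields"):
    """Return the cleaned values of one infobox parameter, or [] if it is absent."""
    pattern = re.compile(
        r"^\s*\|\s*" + re.escape(name) + r"\s*=\s*(.*)$",
        re.MULTILINE | re.IGNORECASE,
    )
    match = pattern.search(wikitext)
    if match is None:
        return []
    value = _LINK.sub(r"\1", match.group(1))                     # drop link markup
    value = re.sub(r"<br\s*/?>", ",", value, flags=re.IGNORECASE)  # <br> separates values
    return [part.strip() for part in value.split(",") if part.strip()]

if __name__ == "__main__":
    sample = "{{Infobox scientist\n| name = Jane Doe\n| fields = [[Analytical chemistry]]\n}}"
    print(extract_infobox_field(sample))
```

Records whose infobox has no `fields` line come back as an empty list, so they can be skipped before indexing.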
Elasticsearch
Data is stored on a cluster of computers running Elasticsearch NoSQL software
All data is ingested as JSON
Uses Apache Lucene to index data
http://www.elasticsearch.org/overview/elasticsearch
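Because everything is ingested as JSON, a profile from either source has to be flattened into a JSON document first. The Python sketch below shows one possible shape, together with a newline-delimited body for the `_bulk` endpoint and a simple `match` query. The index name `researchers`, the document layout, and the function names are all assumptions, and older Elasticsearch releases additionally expect a document type in the URL or action line.

```python
import json

def to_document(name, source, fields):
    """Flatten one researcher profile into a JSON-serializable dict (assumed layout)."""
    return {"name": name, "source": source, "fields": list(fields)}

def build_bulk_body(docs, index="researchers"):
    """Build a newline-delimited _bulk body: one action line, then one document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

def match_query(field, text):
    """Build a Query DSL body that matches `text` against one field."""
    return {"query": {"match": {field: text}}}

if __name__ == "__main__":
    docs = [to_document("Jane Doe", "ScientistsDB", ["Analytical chemistry"])]
    print(build_bulk_body(docs), end="")
    print(json.dumps(match_query("fields", "chemistry")))
```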
Implementation
Development of CakePHP plugins for: VIVO (multiple locations), ScientistsDB, Elasticsearch
CakePHP can access each of these anywhere in its Model-View-Controller (MVC) code
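The idea of one plugin per source, each reachable from anywhere in the application, can be illustrated with a short Python sketch. This is not CakePHP and not the Eureka code; `SourceRegistry`, its methods, and the stub sources are hypothetical names used only to show how results from several sources might be gathered behind one call.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
SearchFn = Callable[[str], List[Record]]

class SourceRegistry:
    """Keeps one search function per data source and queries them all together."""

    def __init__(self) -> None:
        self._sources: Dict[str, SearchFn] = {}

    def register(self, name: str, search: SearchFn) -> None:
        """Add or replace the search function for a named source."""
        self._sources[name] = search

    def search_all(self, term: str) -> List[Record]:
        """Run the term against every source and tag each hit with its origin."""
        results: List[Record] = []
        for name, search in self._sources.items():
            for record in search(term):
                tagged = dict(record)   # copy so the source's own data is not mutated
                tagged["source"] = name
                results.append(tagged)
        return results

if __name__ == "__main__":
    registry = SourceRegistry()
    # Stub sources standing in for real VIVO and ScientistsDB lookups.
    registry.register("VIVO", lambda term: [{"name": "Jane Doe"}])
    registry.register("ScientistsDB", lambda term: [{"name": "John Roe"}])
    print(registry.search_all("chemistry"))
```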
Future Plans
Ingest more installations of VIVO
Work with technical staff at VIVO to make multi-site search available to all VIVO users
Improve code to clean up infobox data
Work with Tony and Sean to evaluate if there are better ways to retrieve subject fields
Conclusion
ScientistsDB plugin works
VIVO plugin very close…
Eureka needs to be collaborative software, so being able to find other researchers in your field is an important part of the system
Development of many more plugins to access online data sources within Eureka
schalk@unf.edu
Phone: 904-620-5311
Skype: stuartchalk
LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
ORCID: http://orcid.org/0000-0002-0703-7776
ResearcherID: http://www.researcherid.com/rid/D-8577-2013
Questions?