dans.knaw.nl
DANS is an institute of KNAW and NWO
FAIR DataverseNL
Vyacheslav Tykhonov
Senior Data Scientist (DANS-KNAW),
Royal Netherlands Academy of Arts and Sciences
Den Haag, the Netherlands, 06.12.2016
Why Dataverse?
• Open source project developed by IQSS at Harvard University and published on GitHub
• Great product with a very long history (since 2006)
• Very dynamic and experienced development team working in an Agile environment (community call every two weeks)
• Clear vision and understanding of research communities' requirements; public roadmap
• A strong community behind Dataverse helps to improve the basic functionality and develop it further
• Well-developed architecture with rich APIs makes it possible to build application layers around Dataverse
Q&A: the value of a data repository for researchers
- I've deposited my dataset and it's archived now. Are you like Dropbox, but with rich metadata?
- No, we'll make your data visible and will invite an interested audience.
- Are you going to help me get more citations?
- We can help promote your datasets in Google, Yahoo and other search engines.
- Are you persistent? What if I lose my hard drive with data someday?
- Every dataset gets its own persistent identifier after deposit. You'll be able to access it at the same URL at any time.
- Is your service good enough to store my data?
- The Nature Research journal Scientific Data has recommended Dataverse to researchers submitting supporting datasets from any research subject.
- It's probably very expensive, isn't it?
- At the moment we're working with a lot of partners and charging each customer only 4000 EUR per year for basic services.
The Dataverse widget disseminates scientific data like YouTube or SlideShare embeds do
<iframe width="560" height="315" src="https://www.youtube.com/embed/fgn6dmfsZ_M" frameborder="0" allowfullscreen></iframe>
DataverseNL integration via widget
DataverseNL and Google
Dataverse, SEO and FAIR (F and A)
Search engine optimization (SEO) is the process of affecting the visibility of a website or a web page in a web search engine's unpaid results—often referred to as "natural", "organic", or "earned" results. In general, the earlier (or higher ranked on the search results page), and more frequently a site appears in the search results list, the more visitors it will receive from the search engine's users, and these visitors can be converted into customers.[1] SEO may target different kinds of search, including image search, local search, video search, academic search,[2] news search and industry-specific vertical search engines.
from Wikipedia
FAIR - a set of guiding principles to make data Findable, Accessible, Interoperable, and Re-usable.
So SEO is a major approach to making research data FAIR, in particular Findable and Accessible.
DataverseNL and research community
Common goals:
- a higher position for DataverseNL in search engines will give researchers a higher rank in their community. A great way to be cited more!
- pointers to deposited metadata and data are persistent, with handles (ongoing research) and DOIs (archive)
- the depositor gets not just a citation but their own dynamic research media channel that can go up (or down)
- adding more dataverses and datasets will automatically increase the importance of DataverseNL in search engines and boost the visibility of the datasets
- metadata enrichment will attract more interested visitors to researchers' landing pages and at the same time increase the popularity of the DataverseNL website
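Persistence works because a handle or DOI is resolved by a public resolver rather than by the repository's own URL. The sketch below assumes identifiers written in the `doi:`/`hdl:` prefix form that Dataverse uses for global IDs and maps them to the doi.org and hdl.handle.net resolvers; the function name and the example identifiers are hypothetical.

```python
from urllib.parse import quote

# Public resolvers for the two identifier schemes named on this slide.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def resolver_url(persistent_id: str) -> str:
    """Turn 'doi:<prefix>/<suffix>' or 'hdl:<prefix>/<suffix>' into a resolvable URL."""
    scheme, sep, identifier = persistent_id.partition(":")
    if not sep or not identifier or scheme.lower() not in RESOLVERS:
        raise ValueError(f"unsupported persistent identifier: {persistent_id!r}")
    # Keep '/' unescaped: it separates the prefix from the suffix.
    return RESOLVERS[scheme.lower()] + quote(identifier, safe="/")


# Hypothetical identifiers, for illustration only.
print(resolver_url("doi:10.1234/EXAMPLE"))
print(resolver_url("hdl:12345/ABC123"))
```

Because the resolver URL never changes, a citation built on it keeps working even if the repository's own web address moves.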
The role of Archivist in the digital age
- providing guidance for depositors to describe their metadata with relevant and rich keywords
- collecting information from search engines about similar research projects
- link exchange to raise inbound link rank (higher search position)
- suggesting new keywords that should increase the visibility of datasets and attract more visitors
- a digital archivist should have good analytics skills to understand trends
- research and collections are coming together
Keyword suggestion tool
https://adwords.google.com/KeywordPlanner
- find other relevant keywords
- analyse similar websites
- get search volume
- pick up new terms for metadata
- track your position in search results
- enjoy more visitors on your study!
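The "pick up new terms for metadata" step can be sketched as a comparison between a dataset's current keywords and candidate terms with their search volumes. This assumes the volumes have already been copied out of a keyword planner by hand; the function name, the terms, and the numbers are all hypothetical, and no Google API is called.

```python
from typing import Dict, List, Set


def suggest_keywords(current: Set[str], candidates: Dict[str, int], limit: int = 3) -> List[str]:
    """Return candidate terms missing from the metadata, highest search volume first."""
    existing = {k.lower() for k in current}  # compare case-insensitively
    missing = [(term, volume) for term, volume in candidates.items()
               if term.lower() not in existing]
    # Sort by volume descending, then alphabetically for a stable tie-break.
    missing.sort(key=lambda tv: (-tv[1], tv[0]))
    return [term for term, _ in missing[:limit]]


# Hypothetical metadata keywords and search volumes.
current = {"migration", "census"}
candidates = {"migration": 5000, "Census": 2000, "population register": 1200,
              "demography": 3100, "household survey": 800}
print(suggest_keywords(current, candidates, limit=2))
```

Ranking by volume is only one heuristic; an archivist would still check that each suggested term actually describes the dataset.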
Community efforts
Every member of the Dataverse community improves their own metadata and the visibility of their data, and other members can automatically get higher positions through higher rank and new citations.
The value of the community grows and citation rank increases.
More partners will join to benefit from this collaboration.
Research data become Findable and Accessible.
Dashboard for every dataset
Back to FAIR: Interoperable (I)
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
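One way to read I1 to I3 together is a dataset description in JSON-LD (a W3C serialization of RDF) using the shared schema.org vocabulary, with typed links to related datasets. The talk does not name a format; JSON-LD with schema.org is an assumed choice here, and the function name, title, and identifiers are hypothetical.

```python
import json
from typing import List


def dataset_jsonld(title: str, pid_url: str, keywords: List[str],
                   license_url: str, based_on: List[str]) -> str:
    """Describe a dataset with schema.org terms, serialized as JSON-LD."""
    doc = {
        "@context": "https://schema.org",  # I2: a shared, published vocabulary
        "@type": "Dataset",
        "@id": pid_url,                    # the dataset's own persistent identifier
        "name": title,
        "keywords": keywords,
        "license": license_url,
        # I3: a qualified reference; the property name says *how* the datasets
        # are related, and each target is itself a resolvable identifier.
        "isBasedOn": [{"@id": url} for url in based_on],
    }
    return json.dumps(doc, indent=2)


print(dataset_jsonld("Survey 2016 (hypothetical)", "https://doi.org/10.1234/EXAMPLE",
                     ["migration"], "https://creativecommons.org/publicdomain/zero/1.0/",
                     ["https://doi.org/10.1234/SOURCE"]))
```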
The CLARIAH.nl project delivers standards that allow researchers to find and use data and tools.
DataverseNL will use its services to ingest, map, convert, curate, harvest, query, explore, visualize and export structured humanities research data.
Back to FAIR: Re-usable (R)
Dataverse features:
• clear licences for every dataset
• a plurality of accurate and relevant attributes
• provenance of data (version 4.6, Q4 2016)
• domain-relevant community standards
Data Provenance is the key to being Re-usable
Provenance of datasets will allow researchers to see the context in which they were captured, processed, analyzed, and validated, along with other information that enables interpretation and reuse:
Source: Balsamiq mockup by Gary King, director of Harvard's IQSS
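To make "captured, processed, analyzed, and validated" concrete, the sketch below records each processing step with the inputs it used and the output it generated, then walks back from a result to list how it was produced. It is loosely modeled on the used/generated relations of W3C PROV; it is not Dataverse's provenance format, and the class, function, and file names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One processing step: which activity, by whom, from which inputs."""
    activity: str    # e.g. "cleaning", "analysis", "validation"
    agent: str       # person or software that performed the step
    used: List[str]  # identifiers of the input files
    generated: str   # identifier of the output file


def lineage(steps: List[Step], target: str) -> List[str]:
    """List the activities that led to `target`, earlier steps first."""
    by_output = {s.generated: s for s in steps}
    order: List[str] = []
    visited = set()

    def visit(entity: str) -> None:
        if entity in visited or entity not in by_output:
            return  # already handled, or a raw input with no recorded step
        visited.add(entity)
        step = by_output[entity]
        for source in step.used:
            visit(source)  # inputs first, so earlier steps come earlier
        order.append(step.activity)

    visit(target)
    return order


steps = [
    Step("capture", "field team", [], "raw.csv"),
    Step("cleaning", "clean.py", ["raw.csv"], "clean.csv"),
    Step("analysis", "model.R", ["clean.csv"], "results.csv"),
    Step("validation", "reviewer", ["results.csv"], "validated.csv"),
]
print(lineage(steps, "validated.csv"))
```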
API economy
Dataverse is a data repository platform with four APIs:
- Native API
- SWORD API
- Search API
- Data Access API
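As an example of the Search API, a query is a plain GET request to `/api/search` with parameters such as `q`, `type`, `per_page` and `start`, and the JSON response lists matches under `data.items`. The sketch builds the URL and reads titles from a response without touching the network; the base URL is an assumption, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://dataverse.nl"  # assumption: the DataverseNL installation


def search_url(query: str, kind: str = "dataset", per_page: int = 10, start: int = 0) -> str:
    """Build a Dataverse Search API request URL (GET /api/search)."""
    if kind not in ("dataverse", "dataset", "file"):
        raise ValueError(f"unknown type: {kind}")
    params = {"q": query, "type": kind, "per_page": per_page, "start": start}
    return f"{BASE_URL}/api/search?{urlencode(params)}"


def dataset_titles(response_json: dict) -> list:
    """Pull the 'name' of every hit out of a Search API JSON response."""
    return [item["name"] for item in response_json["data"]["items"]]


print(search_url("social mobility"))
```

To run the query for real, the URL can be passed to `urllib.request.urlopen` and the body decoded with `json.load` before calling `dataset_titles`.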
An API token is the key to connecting Dataverse with an unlimited number of tools developed by different research communities and to integrating it with other repositories.
We can benefit from other FAIR tools and datasets today!
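The token is sent with each write request in the `X-Dataverse-key` header. The sketch prepares, but does not send, a Native API call that creates a dataset in a given dataverse. The base URL, the alias, and the token are placeholders, and the metadata body is left to the caller because the full dataset JSON schema is beyond this slide.

```python
import json
from urllib.request import Request

BASE_URL = "https://dataverse.nl"  # assumption: the DataverseNL installation


def create_dataset_request(dataverse_alias: str, api_token: str, metadata: dict) -> Request:
    """Prepare (but do not send) POST /api/dataverses/{alias}/datasets."""
    return Request(
        url=f"{BASE_URL}/api/dataverses/{dataverse_alias}/datasets",
        data=json.dumps(metadata).encode("utf-8"),
        # The API token identifies the depositor to Dataverse.
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical alias and placeholder token; pass the result to urlopen() to send it.
req = create_dataset_request("my-dataverse", "xxxx-token", {"datasetVersion": {}})
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy for an external tool to inspect or log what it is about to deposit.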
Data Preservation
A Trusted Digital Repository (TDR) is a permanent archive for data, metadata, and provenance information.
Data citation alone does not solve the transparency issue. Full documentation of data set provenance and context is necessary.
The vision is to have Dataverse as the deposit service for ongoing research and DANS EASY (TDR) as the permanent archive.