fair dataverse

dans.knaw.nl

DANS is an institute of KNAW and NWO

FAIR DataverseNL

Vyacheslav Tykhonov

Senior Data Scientist (DANS-KNAW),

Royal Netherlands Academy of Arts and Sciences

[email protected]

Den Haag, the Netherlands, 06.12.2016

Why Dataverse?

• Open source project developed by IQSS at Harvard University and published on GitHub
• Mature product with a long history (since 2006)
• Very dynamic and experienced development team working in an Agile environment (community call every two weeks)
• Clear vision and understanding of research communities’ requirements, public roadmap
• The strong community behind Dataverse helps to improve the basic functionality and develop it further
• Well-developed architecture with rich APIs makes it possible to build application layers around Dataverse

Q&A: the value of a data repository for a researcher

- I’ve deposited my dataset and it’s archived now. Are you like Dropbox, but with rich metadata?
- No, we’ll make your data visible and bring in an interested audience.
- Are you going to help me get more citations?
- We can help to promote your datasets in Google, Yahoo and other search engines.
- Are you persistent? What if I lose my hard drive with the data someday?
- Every dataset gets its own persistent identifier after deposit. You’ll be able to access it at the same URL at any time.
- Is your service good enough to store my data?
- The Nature Research journal Scientific Data recommends Dataverse for researchers submitting supporting datasets from any research subject.
- It’s probably very expensive, isn’t it?
- At the moment we’re working with a lot of partners and charging every customer only 4000 EUR per year for basic services.

Dataverse widget disseminates scientific data like YouTube or Slideshare

<iframe width="560" height="315" src="https://www.youtube.com/embed/fgn6dmfsZ_M" frameborder="0" allowfullscreen></iframe>

DataverseNL integration by widget

DataverseNL and Google

Dataverse, SEO and FAIR (F and A)

Search engine optimization (SEO) is the process of affecting the visibility of a website or a web page in a web search engine's unpaid results, often referred to as "natural", "organic", or "earned" results. In general, the earlier (or higher ranked on the search results page), and more frequently a site appears in the search results list, the more visitors it will receive from the search engine's users, and these visitors can be converted into customers. SEO may target different kinds of search, including image search, local search, video search, academic search, news search and industry-specific vertical search engines.

from Wikipedia

FAIR - a set of guiding principles to make data Findable, Accessible, Interoperable, and Re-usable.

So… SEO is a major approach to making research data FAIR.


DataverseNL and research community

Common goals:
- a higher position for DataverseNL in search engines raises the researcher’s rank in his or her community. A great way to be cited more!
- pointers to deposited metadata and data are persistent, with handles (ongoing research) and DOIs (archive)
- the depositor gets not just a citation but their own dynamic research media channel that can go up (or down)
- adding more dataverses and datasets automatically increases the importance of DataverseNL in search engines and boosts the visibility of the datasets
- metadata enrichment attracts more interested visitors to researchers’ landing pages and at the same time increases the popularity of the DataverseNL website
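The persistence of handles and DOIs can be illustrated with a minimal Python sketch. It assumes identifiers are written with a `doi:` or `hdl:` prefix and resolved through the public doi.org and hdl.handle.net resolvers; the function name `resolver_url` and the sample identifiers are hypothetical placeholders, not real DataverseNL records.

```python
# Minimal sketch: map a persistent identifier to a stable resolver URL.
# Assumption: identifiers are written as "doi:<value>" or "hdl:<value>".
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def resolver_url(pid: str) -> str:
    """Return the resolver URL for a 'doi:' or 'hdl:' identifier."""
    scheme, _, value = pid.partition(":")
    scheme = scheme.lower()
    if scheme not in RESOLVERS or not value:
        raise ValueError(f"unsupported persistent identifier: {pid!r}")
    return RESOLVERS[scheme] + value


# Hypothetical identifiers, for illustration only.
doi_link = resolver_url("doi:10.5072/FK2/EXAMPLE")
handle_link = resolver_url("hdl:12345/EXAMPLE")
```

The point of the indirection is that the resolver URL stays the same even if the landing page of the dataset moves to another server.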

The role of Archivist in the digital age

- guiding depositors to describe their datasets with relevant, rich keywords in the metadata
- collecting information from search engines about similar research projects
- exchanging links to raise inbound rank (higher position)
- suggesting new keywords that should increase the visibility of datasets and attract more visitors
- a digital archivist should have good analytics skills to understand trends
- research and collections are coming together

Keyword suggestion tool

https://adwords.google.com/KeywordPlanner
- find other relevant keywords
- analyse similar websites
- get search volume
- pick up new terms for metadata
- track your position in search results
- enjoy more visitors on your study!

Community efforts

Every member of the Dataverse community improves their own metadata and the visibility of their data, and other members automatically benefit through higher rank and new citations.
The value of the community grows and citation rank increases.
More partners will join to benefit from this collaboration.
Research data become Findable and Accessible.

Dashboard for every dataset

Back to FAIR: Interoperable (I)

To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
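As a rough illustration of I1 to I3, the Python sketch below builds a small metadata record as JSON-LD using Dublin Core terms. This is an assumption made for the example, not a description of how Dataverse stores or exports metadata; the identifiers, the title and the example.org vocabulary URI are hypothetical placeholders.

```python
import json

# Hypothetical metadata record, for illustration only.
record = {
    # I1: a formal, shared language for knowledge representation (JSON-LD).
    "@context": {"dcterms": "http://purl.org/dc/terms/"},
    "@id": "https://doi.org/10.5072/FK2/EXAMPLE",
    "dcterms:title": "Example survey dataset",
    # I2: the subject is a term from a shared vocabulary, given as a URI.
    "dcterms:subject": {"@id": "http://example.org/vocab/labour-history"},
    # I3: a qualified reference that names the relation to other (meta)data.
    "dcterms:isPartOf": {"@id": "https://doi.org/10.5072/FK2/COLLECTION"},
}

serialized = json.dumps(record, indent=2)
```

The reference is "qualified" because the key states what kind of relation holds (`isPartOf`) instead of only linking two records.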

The CLARIAH.nl project delivers standards that enable researchers to find and use data and tools.
DataverseNL will use its services to ingest, map, convert, curate, harvest, query, explore, visualize and export structured humanities research data.

http://clariah.nl/

Back to FAIR: Re-usable (R)

Dataverse features:
• clear licences for every dataset
• plurality of accurate and relevant attributes
• provenance of data (version 4.6, Q4 2016)
• domain-relevant community standards

Data provenance is the key to being Re-usable

Provenance of datasets will allow researchers to see the context of how the data are captured, processed, analyzed, and validated, along with other information that enables interpretation and reuse:

Source: Balsamiq mockup by Gary King, director of Harvard’s IQSS

API economy

Dataverse is a data repository platform with 4 API endpoints:
- Native API
- SWORD API
- Search API
- Data Access API

An API token is the key to connecting Dataverse with an unlimited number of tools developed by different research communities and to integrating it with other repositories.
We can benefit from other FAIR tools and datasets today!
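How a tool might use the API token with the Search API can be sketched in Python as follows. This is a minimal sketch, assuming the `/api/search` path with `q`, `type` and `per_page` parameters and the `X-Dataverse-key` header as described in the Dataverse API guide; the helper name `build_search_request`, the query and the token value are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, query: str,
                         api_token: Optional[str] = None,
                         per_page: int = 10) -> Request:
    """Prepare (but do not send) a Search API request for datasets."""
    params = urlencode({"q": query, "type": "dataset", "per_page": per_page})
    url = f"{base_url.rstrip('/')}/api/search?{params}"
    # The token is optional: published datasets are searchable without it.
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    return Request(url, headers=headers)


# Hypothetical query and token, for illustration only.
request = build_search_request("http://demo.dataverse.nl/", "census",
                               api_token="secret-token", per_page=5)

# To actually send it (requires network access):
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       results = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect or log the exact URL before any call is made.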

http://www.slideshare.net/vty/api-economy-66222818

Data Preservation

A Trusted Digital Repository (TDR) is a permanent archive for data, metadata, and provenance information.

Data citation alone does not solve the transparency issue. Full documentation of data set provenance and context is necessary.

The vision is to have Dataverse as the deposit service for ongoing research and DANS EASY (a TDR) as the permanent archive.

Try it now

Dataverse is the way to make your data FAIR.

Contact us today!

http://[email protected]

http://demo.dataverse.nl/
