The Expanding Dataverse
Mercè Crosas, Director of Data Science, IQSS, @mercecrosas
January 21, 2015, Lamont Library, Harvard University
Data Publishing: A Form of Scholarly Communication
350 years of scientific publishing, with words and data
1665: Data, if any, were part of the printed publication.
Now: Vast quantities of digital data (and code) cannot be part of the printed publication.
Pillars of Data Publishing
To make data discoverable, accessible, and reusable, we need:
1. Data Citation, to reference and find data
2. Data Repositories, to host and access data
3. Information about the data, to understand and reuse them
Dataverse Software: A Data Publishing Framework
… for a wide range of repositories:
● Public, Generic Repositories
● Institutional Repositories
● Curated Data Archives
http://dataverse.org
Dataverse 4.0: Enables and Enhances Data Publishing
● A data citation compliant with the Data Citation Principles
● Rich metadata to describe and find datasets from multiple domains
● Support for public and restricted data, open data licenses, and terms of use
● Rigorous workflows to publish data, with support for new versions of the data
Data Citation
A Brief History of Citing Data
1906: Chicago Manual of Style: author/creator, title, dates, publisher or distributor
1959: First scientific digital repositories (e.g., World Data Center, ICPSR)
1979: ASBR ("Data File" type); MARC (machine-readable catalog)
Domain repositories (e.g., GenBank)
1999 to now: Growth of data repositories (e.g., NESSTAR, Dataverse, Dryad, Figshare, Zenodo); DOI services for data (e.g., DataCite in 2009)
2014: Data Citation Principles; NISO-JATS revised to support data
Altman & Crosas, 2013. "The Evolution of Data Citation: From Principles to Implementation." IASSIST Quarterly.
Joint Declaration of Data Citation Principles
1. Importance
2. Credit and Attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and Verifiability
8. Interoperability and Flexibility
https://www.force11.org/datacitation
Data Citation Generated by Dataverse
Citation elements: Authors, Year, Dataset Title, DOI, Data Repository, UNF, version
● Principle 2, Credit and Attribution: the authors are named in the citation
● Principles 4, 5, and 6, Unique Identification, Access, and Persistence: the DOI resolves to a landing page with access to the metadata, documentation, and data
● Principle 7, Specificity and Verifiability: the UNF and the version identify exactly which data are cited
● Principle 8, Interoperability and Flexibility: the repository exports citation metadata in XML and JSON formats
Altman & King, 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data."
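To make the citation elements concrete, here is a minimal Python sketch that assembles them into a text citation and a JSON export. The class name `DatasetCitation`, the element order and separators, the `https://doi.org/` prefix, and all sample values are assumptions for illustration; this is not Dataverse's own citation code or export schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical container for the citation elements listed on the slide."""
    authors: List[str]
    year: int
    title: str
    doi: str         # persistent identifier; sample values below are placeholders
    repository: str
    unf: str         # Universal Numerical Fingerprint, assumed computed elsewhere
    version: str

    def to_text(self) -> str:
        # Join the elements in the slide's order: Authors, Year, Title, DOI,
        # Repository, UNF, version (separators are an assumption)
        authors = "; ".join(self.authors)
        return (f'{authors}, {self.year}, "{self.title}", '
                f"https://doi.org/{self.doi}, {self.repository}, "
                f"{self.unf}, {self.version}")

    def to_json(self) -> str:
        # Machine-readable export of the same fields (cf. Principle 8)
        return json.dumps(asdict(self), sort_keys=True)
```

For example, `DatasetCitation(["Doe, Jane"], 2014, "Example Survey", "10.1234/EXAMPLE", "Example Dataverse", "UNF:6:abc==", "V1").to_text()` yields a single-line citation; keeping the UNF and version in the string is what lets a reader verify exactly which data were cited.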
Metadata
Three Metadata Levels
● Generic Metadata: includes the data citation metadata fields (examples: title, authors, persistent ID, description)
● Domain-Specific Metadata (examples):
o Social Science Metadata (DDI)
o Life Sciences (ISA-Tab)
o Astronomy (VO)
● File Metadata (examples, extracted automatically):
o For tabular files: column information
o For FITS files: header information
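As an illustration of the File Metadata level, here is a minimal Python sketch of automatic column extraction from a CSV file. It assumes a header row and a simple numeric/character type rule; the function names and the output dictionary layout are hypothetical, and this is not Dataverse's ingest code.

```python
import csv
import io


def infer_type(values):
    """Classify a column as 'numeric' if every non-empty value parses as a float."""
    for v in values:
        if v == "":
            continue  # treat empty cells as missing values
        try:
            float(v)
        except ValueError:
            return "character"
    return "numeric"


def tabular_file_metadata(text):
    """Extract simple column-level metadata from CSV text (header row assumed)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        # Short rows are padded with missing values
        values = [row[i] if i < len(row) else "" for row in body]
        columns.append({"name": name, "type": infer_type(values)})
    return {"observations": len(body), "variables": len(header), "columns": columns}
```

Because this metadata is derived from the file itself, it can be generated without any input from the depositor, unlike the generic and domain-specific levels.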
Example: Life Sciences Metadata
Example: Astronomy Metadata
Public vs. Restricted
Terms, Licenses, and Restrictions
● Public Dataset:
o CC0 License
o Metadata is public
o Files are public
● Dataset with Restricted Files:
o CC0 License
o Metadata is public
o Files are restricted
o Access terms are defined in the dataset
● Dataset with Terms of Use:
o Metadata is public
o Terms of use are defined in the dataset (CC0 can't apply)
o Files might be public or restricted
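The three cases above can be read as a small access-control rule set. The following Python sketch encodes that reading; the names `Policy`, `Dataset`, and `can_download` and the `has_access_grant` flag are hypothetical, and the step of a user accepting the terms of use is deliberately not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    PUBLIC = "public dataset"
    RESTRICTED_FILES = "dataset with restricted files"
    TERMS_OF_USE = "dataset with terms of use"


@dataclass
class Dataset:
    policy: Policy
    files_restricted: bool = False  # only consulted for TERMS_OF_USE datasets

    def license(self) -> str:
        # CC0 covers the first two cases; custom terms replace it in the third
        return "custom terms of use" if self.policy is Policy.TERMS_OF_USE else "CC0"

    def metadata_is_public(self) -> bool:
        return True  # metadata is public in all three cases

    def can_download(self, has_access_grant: bool) -> bool:
        if self.policy is Policy.PUBLIC:
            return True
        if self.policy is Policy.RESTRICTED_FILES:
            return has_access_grant  # access is granted per the dataset's terms
        # Terms of use: files may be public or restricted
        return has_access_grant or not self.files_restricted
```

Keeping metadata public in every case is what makes even restricted datasets discoverable and citable.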
Workflows
Draft, Published, and Versions
Upload Data → Draft Dataset → Published Dataset, v1 → Draft → Published Dataset, v1.1 → Draft → Published Dataset, v2
● Draft Dataset: in review; can be shared with collaborators
● Published Dataset, v1: once published, a dataset cannot be unpublished (only deaccessioned); the data citation becomes public
● Published Dataset, v1.1: a minor version for small changes to the dataset description; the data citation doesn't change
● Published Dataset, v2: a major version for new versions of the data files; the data citation changes
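The versioning rule above can be sketched as a tiny state machine. This is a minimal Python illustration under stated assumptions: the class `VersionedDataset`, its methods, and the `files_changed` flag are hypothetical, and deaccessioning is not modeled.

```python
from dataclasses import dataclass


@dataclass
class VersionedDataset:
    """Hypothetical model of the draft/publish/version cycle."""
    major: int = 0          # 0 means the dataset has never been published
    minor: int = 0
    has_draft: bool = True  # a new dataset starts as a draft

    def label(self) -> str:
        # Render v1, v1.1, v2 as on the slide (a minor number of 0 is omitted)
        return f"v{self.major}" if self.minor == 0 else f"v{self.major}.{self.minor}"

    def edit(self) -> None:
        # Editing after publication opens a new draft; published versions are kept
        self.has_draft = True

    def publish(self, files_changed: bool) -> str:
        """Publish the current draft and return the new version label."""
        if not self.has_draft:
            raise ValueError("no draft to publish")
        if self.major == 0 or files_changed:
            # First publication (citation becomes public) or new data files
            # (citation changes): bump the major version
            self.major, self.minor = self.major + 1, 0
        else:
            # Description-only change: bump the minor version, citation unchanged
            self.minor += 1
        self.has_draft = False
        return self.label()
```

Tying the major version to file changes means a citation that includes the version always points to one fixed set of data files.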
Multiple Roles for Multiple Workflows
● Editor: Upload Data + Edit Metadata
● Manager: Upload Data + Edit Metadata, plus Set File Restrictions + License and Terms
● Curator: Upload Data + Edit Metadata, plus Set File Restrictions + License and Terms, plus Grant Access + Publish Dataset
● Custom Roles
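The cumulative roles above map naturally onto permission sets. This Python sketch uses `enum.Flag` to express that reading; the permission names, the `ROLES` table, and the helpers `define_custom_role` and `allowed` are hypothetical and do not reflect Dataverse's actual permission API.

```python
from enum import Flag, auto


class Permission(Flag):
    UPLOAD_AND_EDIT = auto()       # Upload Data + Edit Metadata
    RESTRICT_AND_LICENSE = auto()  # Set File Restrictions + License and Terms
    GRANT_AND_PUBLISH = auto()     # Grant Access + Publish Dataset


# Each role adds one group of permissions to the previous one
ROLES = {
    "Editor": Permission.UPLOAD_AND_EDIT,
    "Manager": Permission.UPLOAD_AND_EDIT | Permission.RESTRICT_AND_LICENSE,
    "Curator": (Permission.UPLOAD_AND_EDIT
                | Permission.RESTRICT_AND_LICENSE
                | Permission.GRANT_AND_PUBLISH),
}


def define_custom_role(name: str, permissions: Permission) -> None:
    # Custom roles are arbitrary combinations of the same permission groups
    ROLES[name] = permissions


def allowed(role: str, permission: Permission) -> bool:
    return permission in ROLES[role]
```

Representing roles as combinations of permission groups is what makes custom roles cheap to add: no new permission logic is needed, only a new combination.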
Data Processing, Analysis, and Visualizations
Tabular Data: Converted to Preservation Format
Download in the original format or in the preservation format (which does not depend on any software package)
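The idea of a software-independent preservation copy can be sketched in a few lines of Python. This minimal sketch assumes the tabular file has already been read into rows and uses tab-delimited plain text as the target; the function names `to_preservation_format` and `download` and the `fmt` values are hypothetical, and reading proprietary formats (e.g., SPSS or Stata files) is out of scope here.

```python
import csv
import io


def to_preservation_format(rows):
    """Serialize rows (header first) as tab-delimited plain text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


def download(original: bytes, rows, fmt: str = "original") -> bytes:
    # Serve either the untouched deposited file or the plain-text copy
    if fmt == "original":
        return original
    if fmt == "preservation":
        return to_preservation_format(rows).encode("utf-8")
    raise ValueError(f"unknown format: {fmt}")
```

Keeping the original bytes alongside the plain-text copy means no information from the deposited file is lost, while the copy stays readable without the original software package.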
Tabular Data: Explore and Analyze with TwoRavens
Geospatial Data: Visualize in WorldMap
Demo acknowledgement: Dwayne Liburd, Sonia Barbosa
Not Only Expanding in Features, but Also in Size
874 Dataverses
55,539 Datasets
1,173,733 Downloads
What’s coming
Beyond 4.0
● Integration with other systems:
o DASH
o ORCID
o Journal Systems (in addition to OJS)
o Archivematica
o iRODS
● Support for Sensitive Data:
o Secure Storage
o DataTags
o Analysis with Privacy Preserving Algorithms
● Data Citation with Dataset Provenance
● Expanding APIs!
Thank You
@mercecrosas
http://datascience.iq.harvard.edu/team