Big Data’s Long Tail
Carly Strasser, John Kunze, Trisha Cruse
University of California Curation Center, California Digital Library
10 December 2012
From Flickr by rahen z
The Long Tail
[Figure: long-tail curve. Axis labels: size of dataset; # datasets, # researchers, # grants; grant ($).]
With data managers and fancy tools
Do-it-yourself tools
From Flickr By puck90
UGLY TRUTH
Many researchers…
• are not taught data management
• don’t know what metadata are
• can’t name data centers or repositories
• don’t share data publicly or store it in an archive
• aren’t convinced they should share data
Intercept researchers where they already work
Facilitate
Archiving
Sharing
Publishing
Data management & organization
Data re-use & reproducibility
DataUp: the vision
Open Source Tool
Add-in & Web Application
Earth, Environmental, & Ecological Researchers
?
Add-in
• Download and install
• Appears as “ribbon” in Excel
• Windows Excel 2007+
Web-based application
• Website that does something with user’s files
• New user interface
• Any platform/spreadsheet software?
Features
• Best practices check
• Generate metadata
• Generate citation
• Post data to repository
Requirements
SENT TO MSR
Released Sept 4, 2012
Best Practices Check
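The slides do not list the individual checks, so the following is only a minimal sketch of what a best practices check on a spreadsheet exported as CSV might look like. The function name `check_table` and the two checks shown (column names, blank cells) are assumptions chosen for illustration, not the tool's actual rule set.

```python
import csv
import io


def check_table(csv_text):
    """Return a list of (row, column, message) problems found in a CSV table.

    Row 0 is the header; column is None when the whole row is affected.
    The checks here are hypothetical examples, not an official list.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    problems = []

    # Hypothetical check 1: every column has a non-empty, unique name.
    seen = set()
    for col, name in enumerate(header):
        if not name.strip():
            problems.append((0, col, "empty column name"))
        elif name in seen:
            problems.append((0, col, "duplicate column name"))
        seen.add(name)

    # Hypothetical check 2: no ragged rows and no blank cells.
    for r, row in enumerate(body, start=1):
        if len(row) != len(header):
            problems.append((r, None, "row length differs from header"))
            continue
        for col, cell in enumerate(row):
            if not cell.strip():
                problems.append((r, col, "blank cell"))
    return problems
```

Returning positions together with messages would let a front end point the researcher at the offending cell rather than only reporting that something is wrong.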
Generate Metadata
Attribute Metadata
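Attribute metadata describes each column of the table. A rough sketch of how a starting record per column could be generated is shown below; the field names (`name`, `type`, `description`, `units`) and the two-way type guess are assumptions for illustration and do not correspond to any particular metadata schema.

```python
def infer_type(values):
    """Guess a coarse type label for a column from its string values."""
    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    non_empty = [v for v in values if v.strip()]
    if non_empty and all(is_number(v) for v in non_empty):
        return "numeric"
    return "text"


def attribute_metadata(header, rows):
    """Build one metadata record per column (hypothetical field names)."""
    records = []
    for col, name in enumerate(header):
        values = [row[col] for row in rows]
        records.append({
            "name": name,
            "type": infer_type(values),
            "description": "",  # left blank for the researcher to fill in
            "units": "",        # left blank for the researcher to fill in
        })
    return records
```

Only the name and a type guess can be derived from the data itself; the description and units have to come from the researcher.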
Create Data Citation
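As a small illustration, a data citation can be assembled from a handful of fields. The sketch below assumes a creator, year, title, repository, identifier ordering; the function name `format_citation` and that ordering are assumptions, and the values in the usage example are fictional placeholders.

```python
def format_citation(creators, year, title, repository, identifier):
    """Join citation fields into one string (assumed field order)."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {repository}. {identifier}"


# Usage with fictional placeholder values.
citation = format_citation(
    ["Doe, J.", "Roe, R."],
    2012,
    "Example dataset",
    "Example Repository",
    "example:/00000/0001",
)
```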
Create Data Citation: Get Identifier
ask CDL’s Merritt repository for id
... which asks EZID for an id
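The two-step request above (the tool asks the repository, which in turn asks the identifier service) can be sketched with local stand-ins. Both classes, their method names, and the `example:/00000/` prefix are hypothetical; nothing here calls the real Merritt or EZID services or reflects their APIs.

```python
class IdServiceStub:
    """Local stand-in for an identifier service; mints fake ids in sequence."""

    def __init__(self, prefix="example:/00000/"):
        self.prefix = prefix
        self._counter = 0

    def mint(self):
        self._counter += 1
        return f"{self.prefix}{self._counter:04d}"


class RepositoryStub:
    """Local stand-in for a repository that delegates id minting."""

    def __init__(self, id_service):
        self.id_service = id_service
        self.reserved = {}

    def request_identifier(self, title):
        # The repository does not invent ids itself; it asks the id service.
        identifier = self.id_service.mint()
        self.reserved[identifier] = {"title": title}
        return identifier


repo = RepositoryStub(IdServiceStub())
first_id = repo.request_identifier("Example dataset")
```

The point of the delegation is that the client only talks to the repository, while identifiers still come from a single minting service.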
Upload to a Repository
Tip: you can also choose a practice repository
The long tail of the long tail
A data repository for anyone, anywhere
Build community
Add repositories
Add metadata schema
From animationresources.org
ONEShare, Merritt, UCSB,
Dryad, etc.
NSF DataNet, INTEROP, etc.