rescue of long-tail data from the ocean bottom to the moon
TRANSCRIPT
IEDA iedadata.org
IN12A. Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-source Science
Fall AGU 2013
Rescue of Long-Tail Data from the Ocean Bottom to the Moon
Leslie Hsu, Kerstin Lehnert, Suzanne Carbotte, Vicki Ferrini, John Delano¹, James B. Gill², Maurice Tivey³
Lamont-Doherty Earth Observatory, Columbia University; ¹University of Albany; ²University of California, Santa Cruz; ³Woods Hole Oceanographic Institution
Data at Risk
¤ "Data at Risk" is scientific data that are not in formats that permit full electronic access to the information they contain.
¤ Data at Risk may be
  ¤ non-digital (e.g., handwritten or photographic),
  ¤ on near-obsolete digital media (such as floppy disks),
  ¤ or insufficiently described (lacking metadata).
¤ Some born-digital data are considered "at risk" if they cannot be ingested into managed databases because they lack adequate formatting or metadata.
Definition from the ICSU CODATA Data at Risk Task Group (DARTG)!
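The three criteria in the definition above can be sketched as a simple check. This is only an illustration, not a DARTG or IEDA tool: the `Dataset` fields, the `risk_reasons` function, and the media list beyond floppy disks are assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed list of near-obsolete media. Only floppy disks are named in the
# definition; the other entries are illustrative assumptions.
NEAR_OBSOLETE_MEDIA = {"floppy disk", "magnetic tape", "zip disk"}


@dataclass
class Dataset:
    """Hypothetical minimal description of a legacy data set."""
    name: str
    is_digital: bool
    medium: str        # e.g. "paper", "floppy disk", "hard drive"
    has_metadata: bool


def risk_reasons(ds: Dataset) -> list[str]:
    """Return the reasons a data set counts as 'at risk' (empty if none)."""
    reasons = []
    if not ds.is_digital:
        reasons.append("non-digital")
    elif ds.medium.lower() in NEAR_OBSOLETE_MEDIA:
        reasons.append("near-obsolete digital media")
    if not ds.has_metadata:
        reasons.append("insufficiently described")
    return reasons


if __name__ == "__main__":
    notebook = Dataset("field notebook", False, "paper", True)
    print(notebook.name, "->", risk_reasons(notebook))
```

A data set with an empty list of reasons would not be flagged under this sketch. Real triage would also need to consider whether the formatting is adequate for ingestion into a managed database.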
Data Rescue
¤ A “Data Rescue Mission” is any effort to preserve data at risk. Rescue missions can come in the form of digitization, format migration, treating damaged materials (e.g., water or mold), adding metadata or any action taken to make data accessible in the long term.
Definition from ICSU CODATA Data at Risk Task Group (DARTG)
M. Tivey
Long Tail Data are often Data at Risk
Long Tail Characteristics
¤ More specialised
¤ Low volume
¤ On C drives
¤ Hard to find
¤ Heterogeneous
¤ Collected by many people
¤ Citizen science
¤ Etc.
Long Tail: Environmental and Earth sciences
The Head: Astronomy, Climate, High Energy Physics, Genomics
L. Wyborn http://juliegood.wordpress.com/tag/long-tail/
IEDA Data Rescue Mini-Awards
¤ Established to preserve valuable legacy data sets that are endangered by impending retirement or degradation
¤ Evaluated for highest impact on future research, based on quality, size, rarity, unique location, or data type
¤ Made accessible to the community for re-use by inclusion in the IEDA data collections (EarthChem, MGDS, SESAR)
¤ $7000 award to support proper compilation, documentation, and transfer
¤ 3 awardees chosen from 11 entries spanning a wide range of geochemical and geophysical data
1: Geologic samples and geochemistry
¤ WHAT: Compilation of sample metadata and geochemical analyses from three areas: Fiji, Izu Arc, and Endeavour segment (James B. Gill)
¤ WHY: study of intra-oceanic arcs and spreading centers
¤ HOW: Check and complete incomplete data, digitize data, add persistent identifiers, and link related resources
¤ Major challenge: physical sample management
Maps made with GeoMapApp
The importance of sample identification
¤ Individual samples can play a large role in scientific conclusions, so accurate documentation of sample metadata is critical.
¤ The key measurement was the one backarc basalt called "PPTUW"... Subsequent efforts to confirm the observation ran into problems. The apparently-same sample was variously called PPTU, PPTUW/5, PPTUW-1, and TVZ19 in four other papers. None of those papers gave its latitude and longitude… (J. Gill and E. Todd)
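One way to picture how a persistent identifier addresses the naming problem above is an alias table that maps every published name to a single identifier. The sketch below is illustrative only: it assumes the five names really do refer to one sample (the source says only "apparently-same"), and `PLACEHOLDER_ID`, `resolve`, and `group_by_sample` are hypothetical names, not a registered IGSN or an IEDA API.

```python
# Hypothetical placeholder, NOT a real registered IGSN.
PLACEHOLDER_ID = "IGSN:EXAMPLE0001"

# Published names from the slide, assumed here to denote the same sample.
ALIASES = {
    "PPTUW": PLACEHOLDER_ID,
    "PPTU": PLACEHOLDER_ID,
    "PPTUW/5": PLACEHOLDER_ID,
    "PPTUW-1": PLACEHOLDER_ID,
    "TVZ19": PLACEHOLDER_ID,
}


def resolve(name: str):
    """Return the persistent identifier for a published sample name, or None."""
    return ALIASES.get(name.strip().upper())


def group_by_sample(names):
    """Group published names by the identifier they resolve to.

    Names that cannot be resolved are collected under the key None.
    """
    groups = {}
    for name in names:
        groups.setdefault(resolve(name), []).append(name)
    return groups


if __name__ == "__main__":
    print(group_by_sample(["PPTU", "TVZ19", "PPTUW/5", "UNKNOWN-7"]))
```

With such a table, analyses published under different names can be linked to one sample record, and unresolved names are surfaced for manual checking rather than silently merged.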
2: Near-bottom magnetics
¤ WHAT: Compilation of near-bottom magnetometer data, including raw, merged, processed, and navigation metadata (Maurice Tivey)
¤ WHY: study of magnetic reversals and the effect of tectonics on the magnetic field
¤ HOW: gather data from different formats, add complete metadata and workflow
¤ Challenge: over three decades of technology and file formats
Evolution of equipment: 1985, 1992, 2004, 2011
Evolution of storage media
M. Tivey
Addition of “sufficient” metadata
3: Lunar sample geochemistry
¤ WHAT: Compilation of lunar sample geochemistry (John W. Delano et al.)
¤ WHY: study of the composition of the Moon
¤ HOW: Digitize photos, label specific grains, compile geochemistry in data templates
¤ Challenge: nothing was digital
LPI
Use of IEDA EarthChem templates
Common needs addressed
¤ Accessibility – web access, links between systems
¤ Documentation – README files, additional descriptions
¤ Standardization – IEDA EarthChem geochemical templates
¤ Persistent links – DOIs and IGSNs
¤ Citability – DOIs, example citations
¤ Guidance/Training – calls and emails with disciplinary repository staff
Lessons learned: investigator
¤ Take ownership of your own legacy
  ¤ Data curation by others may not be complete or correct
¤ Data rescue of an entire career does not need to be overwhelming
  ¤ Start with small steps
  ¤ Disciplinary repositories will help and guide you to what is needed
¤ Despite the time investment, data rescue is worth it
  ¤ Others will now be able to re-use the data
  ¤ Notes taken years ago actually explain anomalies!
Lessons learned: repository
¤ For Long Tail Data, every project is different
  ¤ There is no established workflow, just past experience
  ¤ Time commitment from staff is nontrivial
¤ Disciplinary training helps a great deal
  ¤ Investigators need help determining the best products
¤ A small incentive will motivate investigators
¤ Data Rescue missions help the repository determine next steps for development of tools and services
Summary of Long-tail Data Rescue
¤ Three Data Rescue efforts by IEDA this past year have made data that were at risk
  ¤ digitized from analog data and near-obsolete media
  ¤ sufficiently described for reuse
  ¤ available in formats that permit full electronic access
  ¤ citable, with persistent identifiers, and ready for reuse
¤ The projects also helped IEDA identify improvements in the data rescue workflow and in future tools and services
More Data Rescue Activities
¤ Elsevier-IEDA Data Rescue Process Study
  ¤ A data entry tool for lunar geochemistry: MoonDB
¤ Elsevier-IEDA International Data Rescue Award
  ¤ Winner announced at reception tonight, Monday Dec 9th, 2013
  ¤ Intercontinental Hotel, Twin Peaks Room, 7:00-8:30pm