Metadata for Data Rescue and Data at Risk

William L. Anderson, John L. Faundeen, Jane Greenberg, Fraser Taylor. PV2011, Toulouse, 17 November 2011. Presented by Nico Carver in collaboration with the DARi SILS Student Learning Circle.

Upload: 2ghouls

Post on 05-Jul-2015


DESCRIPTION

A presentation I gave at PV 2011 in Toulouse, France on behalf of CODATA's Data-at-Risk Task Group.

TRANSCRIPT

Page 1: Metadata for Data Rescue and Data at Risk

Metadata for Data Rescue and Data at Risk

William L. Anderson, John L. Faundeen, Jane Greenberg, Fraser Taylor

PV2011, Toulouse, 17 November 2011

Presented by Nico Carver

In collaboration with the DARi SILS Student Learning Circle

Page 2: Metadata for Data Rescue and Data at Risk

Outline

• Major Questions

• Metadata Scheme Design

• Case Study

• Next Steps

• Acknowledgements

• Questions/Comments

Page 3: Metadata for Data Rescue and Data at Risk

Major Questions informing Research

Where is at-risk data?

How are scientists using historic data?

How do we define at-risk?

“8 inch floppy” Retrieved from: http://johnkingworld.com/aplus/images/storage-8inch-floppy.jpg

How do others define at-risk?

What must be done to rescue data-at-risk?

Page 4: Metadata for Data Rescue and Data at Risk

Major Question informing Scheme Design

What is essential metadata for describing data-at-risk and aiding in data rescue?

Page 5: Metadata for Data Rescue and Data at Risk

Metadata requirements

• Be applicable across a range of disciplines and scientific research areas.

• Sufficiently support the data rescue mission.

Page 6: Metadata for Data Rescue and Data at Risk

Functions of the Inventory

Function: Describe data of scientific value that is at-risk of being lost, unused, or destroyed.

Initial metadata properties: 1. Science area; 2. Nature of data; 3. Date or date-span; 4. Location of original; 5. Present location

Function: Act as a starting point for the data rescue mission.

Initial metadata properties: 6. Expected future; 7. Risk level

Page 7: Metadata for Data Rescue and Data at Risk

Metadata Frameworks Useful for Data-at-Risk

Metadata Property

1. Science area

2. Nature of data

3. Date or date-span

4. Location of original

5. Present location

6. Expected future

7. Risk level

DARTG Chair Elizabeth Griffin’s initial proposed DARTG metadata properties

Page 8: Metadata for Data Rescue and Data at Risk

Metadata Frameworks Useful for Data-at-Risk

U.S. Geological Survey: “Create a Rescue Request”, URL: http://eros.usgs.gov/government/archive_rescue/archive_request.php

Page 9: Metadata for Data Rescue and Data at Risk

Metadata Frameworks Useful for Data-at-Risk

“Growing the Vocabulary” http://dublincore.org/resources/training/frd_20091217/Tutorial_FRD_baker-1.pdf

Page 10: Metadata for Data Rescue and Data at Risk

Metadata Frameworks Useful for Data-at-Risk

“The PREMIS Data Dictionary” http://www.loc.gov/standards/premis/v2/premis-dd-2-1.pdf

Page 11: Metadata for Data Rescue and Data at Risk

Data-at-Risk Inventory (DARI) Metadata Scheme: guiding principles

• Simple

• Broadly applicable

• Extensible

Page 12: Metadata for Data Rescue and Data at Risk

DARI Metadata Scheme (current)

Metadata Element Name: Element Description

Research Area(s): The domains represented by DARTG experts and the more general category of “Other”.

Title: The name associated with the collection.

Physical form of the data: Paper, photograph, specimen, record book, magnetic tape, etc.

Content and context of the data: History, topic, etc., if known.

Name of current holder: Institution, organization or individual.

Dates associated with data: Time period when data were collected.

Size: Extent, volume, size.

Data condition: Stable, deteriorating, etc.

Risk level: Poor storage conditions, limited storage time, etc.

Known access and restrictions: Public domain, private collection, etc.

Notes: Any additional information.

Contact information: Address or other contact information for the institution, organization or individual.

DARTG DARI Metadata, Version 1.0
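As an illustration only, the twelve elements listed for Version 1.0 could be held in a simple record structure. The sketch below is not the DARTG implementation: the class name `DariRecord`, the field names, the choice of which fields have defaults, and the `missing_elements` helper are all assumptions made for this example, and the sample record is hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DariRecord:
    """Hypothetical container for one dataset description (12 elements)."""
    research_areas: List[str]          # Research Area(s)
    title: str                         # Title
    physical_form: str                 # Physical form of the data
    content_and_context: str = ""      # Content and context of the data
    current_holder: str = ""           # Name of current holder
    dates: str = ""                    # Dates associated with data
    size: str = ""                     # Size
    data_condition: str = ""           # Data condition
    risk_level: str = ""               # Risk level
    access_and_restrictions: str = ""  # Known access and restrictions
    notes: str = ""                    # Notes
    contact_information: str = ""      # Contact information

    def missing_elements(self) -> List[str]:
        """Return the names of elements that were left empty."""
        return [name for name, value in asdict(self).items() if not value]


# Hypothetical example record; the values are invented for illustration.
record = DariRecord(
    research_areas=["Other"],
    title="Example record books",
    physical_form="Record book",
    risk_level="Poor storage conditions",
)
print(record.missing_elements())
```

Keeping every element as free text mirrors the "simple" and "broadly applicable" principles; a real inventory might instead use controlled vocabularies for elements such as Research Area(s).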

Page 13: Metadata for Data Rescue and Data at Risk

Case Study: introduction

Page 14: Metadata for Data Rescue and Data at Risk

Case Study: implementation

Page 15: Metadata for Data Rescue and Data at Risk

Case Study: Results

• 7 dataset descriptions total. 5 out of 7 were completed unassisted using the metadata template

• 13.5 out of 16 metadata elements considered useful on average (85%)

• 4 out of 5 scientists said they would use the inventory again
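The proportions reported on this slide can be checked with a few lines of arithmetic. The sketch below uses only the counts given above; the dictionary labels and the `percent` helper are names introduced for this example. Computed this way, 13.5 out of 16 comes to about 84.4%, so the slide's 85% appears to be a rounded figure.

```python
# Counts taken from the results slide: (part, whole).
results = {
    "descriptions completed unassisted": (5, 7),
    "metadata elements considered useful (average)": (13.5, 16),
    "scientists who would use the inventory again": (4, 5),
}


def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


for label, (part, whole) in results.items():
    print(f"{label}: {part}/{whole} = {percent(part, whole):.1f}%")
```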

Page 16: Metadata for Data Rescue and Data at Risk

Case Study: conclusions

• The purpose of the inventory had to be more clearly stated on the website

• Instructions for filling out the web form had to be simple, but clear

• 3 metadata properties were determined unnecessary, 4 properties were altered for clarity

• The remaining metadata properties were successful in their ability to cut across scientific disciplines while fully describing data-at-risk

Page 17: Metadata for Data Rescue and Data at Risk

Next Steps

• Complete focus groups and surveys at UNC-Chapel Hill and elsewhere to determine possible use cases

• Disseminate information and generate interest for the inventory and the Data-at-Risk project

• Finalize the inventory design and start populating it

Page 18: Metadata for Data Rescue and Data at Risk

Submit a description: http://ibiblio.org/data-at-risk/contribution

Page 19: Metadata for Data Rescue and Data at Risk

Questions/Comments?

Acknowledgements:

• The University of North Carolina Center for Global Initiatives’ support of the Data At Risk Inventory SILS Student Learning Circle

• The Council for Scientific and Technical Data

• The following people for their leadership, guidance, and assistance: Bill Anderson, School of Information, University of Texas at Austin; Jane Greenberg, School of Information and Library Science; Elizabeth Griffin, Herzberg Institute of Astrophysics; Dav Robertson, National Institute of Environmental Health Sciences, NIH; and Paul Jones & John Reuning, ibiblio, University of North Carolina at Chapel Hill.