eprints preservation: why we need preservation planning

EPrints Preservation Why we need Preservation Planning by Steve Hitchcock EPrints User Group, OR10, Madrid, 9 July 2010

Upload: jisc-keepit-project

Post on 18-Nov-2014


DESCRIPTION

This short presentation is intended as a light interlude linking two practical sessions in a workshop connecting preservation planning with tools provided for use with EPrints repository software.

TRANSCRIPT

Page 1: EPrints Preservation: Why we need Preservation Planning

EPrints Preservation: Why we need Preservation Planning, by Steve Hitchcock. EPrints User Group, OR10, Madrid, 9 July 2010

Page 2: EPrints Preservation: Why we need Preservation Planning

A first take on digital preservation

DIGITAL PRESERVATION is NOT so DIFFICULT if you WANT to DO IT

You will want to do digital preservation if you have:
• a lot of digital content collected over years
• a specified responsibility and resources for that content
• an understanding of how that content is used now, how it will be needed in future, and how the type of content you collect may change going forward

Page 3: EPrints Preservation: Why we need Preservation Planning

Another take on digital preservation

Digital preservation is identifying what’s required of your repository content tomorrow, and being ready to serve that requirement - or the day after, or the month after. In other words, you can extend this to whatever timescale matters. You can work out everything else from these parameters.

Conversely: what will your content profile look like over the timescale that matters to you? Will it change substantially?

Page 4: EPrints Preservation: Why we need Preservation Planning

Digital repositories diversifying: institution-wide outputs

KeepIt exemplar preservation repositories: Science, Teaching, Research, Arts

Page 5: EPrints Preservation: Why we need Preservation Planning

Module 1, Organisational issues, audit, selection and appraisal. School of ECS, University of Southampton, 19 January 2010

Module 2, Institutional and lifecycle preservation costs. School of ECS, University of Southampton, 5 February 2010

Module 3, Primer on preservation workflow, formats and characterisation. Westminster-Kingsway College, London, 2 March 2010

Module 4, Putting storage, format management and preservation planning in the repository. University of Southampton, 18-19 March 2010

Module 5, Trust, of the repository, of the tools and services it chooses. University of Northampton, 30 March 2010

Slideshare: http://www.slideshare.net/SteveHitchcock/presentations

Source materials (ECS EPrints): http://bit.ly/afof8g

Page 6: EPrints Preservation: Why we need Preservation Planning

Work with, not against, your authors and contributors

“Preservation begins with the author”

U. Rochester (USA) has written its own repository software, IR+, to give its authors a Web-based authoring workspace. Also watch out for the new JISC project DepositMO, which connects the user's computer desktop, especially popular apps such as MS Office, with digital repositories.

Which applications are widely used and popular among your authors? Digital content authoring tools are typically chosen on the basis of purpose, utility and familiarity (what is provided and supported by Information Systems?). Rarely are they chosen for format or preservation.

Authors will craft their output in the chosen application, but will often throw away that craft if asked to convert to another format.

Page 7: EPrints Preservation: Why we need Preservation Planning

Preservation workflow

Check
• Format identification, versioning
• File validation
• Virus check
• Bit checking and checksum calculation
Tools, e.g. DROID, JHOVE, FITS

Analyse
• Preservation planning
• Characterisation: significant properties and technical characteristics, provenance, format, risk factors
• Risk analysis
Tools: Plato (Planets), PRONOM (TNA), P2 risk registry (KeepIt), INFORM (U Illinois), KB

Action
• Migration
• Emulation
• Storage selection
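One of the checks listed in this workflow, bit checking and checksum calculation, can be illustrated with a short Python sketch. This is not how EPrints or any of the tools named above implement it; the function names `file_checksum` and `is_unchanged`, and the choice of SHA-256, are assumptions made for illustration.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum, algorithm="sha256"):
    """Compare the file's current checksum with one recorded earlier."""
    return file_checksum(path, algorithm) == recorded_checksum
```

A repository could record the checksum at deposit and re-run `is_unchanged` on a schedule. A mismatch shows that the stored bits have changed, but not why.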

Page 8: EPrints Preservation: Why we need Preservation Planning

Accepted repository formats: recent survey

What file formats do you accept? Do you convert any to a different format?

• ALL: Accept any format.
• Two: Convert everything to PDF, but store the source files in the background for preservation reasons.
• Four: Mention specifically converting Word to PDF; one seeks permission from the author to do this, and uploads as Word if permission is not granted.
• One: Mentions converting ZIP files to PDF.

Source: Sue Ashby, University of Portsmouth Library, Summary of responses to IR questionnaire, JISC-REPOSITORIES, 18 February 2010

Page 9: EPrints Preservation: Why we need Preservation Planning

Some thoughts about formats

Free vs open source vs open standard:
• MS Office – XML – open standard
• Open Office – free – XML – open standard
• PDF page representation
• XML generic Web format, computational

Page 10: EPrints Preservation: Why we need Preservation Planning

Format risks

1000 Ubiquity: degree of adoption of the format
1001 Support: number of tools available which can access the format
1002 Disclosure: extent to which the format documentation is publicly disclosed
1003 Document Quality: completeness of the available documentation
1004 Stability: speed and backwards-compatibility of version change
1005 Ease of Identification: ease with which the format can be identified
1006 Ease of Validation: ease with which the format can be validated
1007 Lossiness: does the format use lossy compression
1008 Intellectual Property Rights: whether or not the format is encumbered by IPR
1009 Complexity: degree of content or behavioural complexity supported

From PRONOM documentation (The National Archives), July 2008

Page 11: EPrints Preservation: Why we need Preservation Planning

A group task on format risks

1. Choose two formats to compare (e.g. Word vs PDF, Word vs ODF, PDF vs XML, TIFF vs JPEG)

2. By working through the (surviving) list of format risks select a winner (or a draw) between your chosen formats for each risk category (1 point for win)

3. Total the scores to find an overall winning format

4. Suggest one reason why the winning format using this method may not be the one you would choose for your repository
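The scoring in steps 2 and 3 can be sketched in Python. The function names `tally` and `overall_winner` and the judgements in `example` are hypothetical, invented for illustration. They are not results from the workshop or from PRONOM.

```python
from collections import Counter

# Risk category names as listed in the PRONOM-derived slide.
RISK_CATEGORIES = [
    "Ubiquity", "Support", "Disclosure", "Document Quality", "Stability",
    "Ease of Identification", "Ease of Validation", "Lossiness",
    "Intellectual Property Rights", "Complexity",
]


def tally(judgements):
    """Count one point per category won; a draw (None) scores nothing."""
    points = Counter()
    for category, winner in judgements.items():
        if category not in RISK_CATEGORIES:
            raise ValueError(f"unknown risk category: {category}")
        if winner is not None:
            points[winner] += 1
    return points


def overall_winner(points):
    """Return the top-scoring format, or None on a tie or if nobody scored."""
    ranked = points.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical judgements for two placeholder formats (illustration only).
example = {
    "Ubiquity": "Format A",
    "Support": None,          # a draw
    "Disclosure": "Format B",
    "Stability": "Format A",
    "Lossiness": None,        # a draw
}
points = tally(example)
print(dict(points), overall_winner(points))
```

Every category carries equal weight here, which is one reason, as step 4 suggests, that the winner by this count may not be the format a repository would actually choose.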

Page 12: EPrints Preservation: Why we need Preservation Planning

Format risk results (from group thinking)

PDF 4, Word 1
TIFF 3, JPEG 1
XML 6, PDF 1

Page 13: EPrints Preservation: Why we need Preservation Planning

Alternative thoughts on ‘winning’ formats

We were then asked to consider why we might choose NOT to use the format that performed better for these criteria:
• PDF/Word – Why not PDF? PDF is essentially a conversion format, not a source authoring format.
• TIFF/JPEG – Why not TIFF? JPEG is compressed, so it would take up less space in storage. This factor may be crucial. Archival quality copy or a derivative?
• XML/PDF – Why not XML? Many repository resources are deposited in PDF. Do people understand what they need to do with XML?

Page 14: EPrints Preservation: Why we need Preservation Planning

TIFF vs JPEG 2000? Who’s for JPEG? The major players line up

1. The National Library of the Netherlands evaluated JPEG 2000 against uncompressed TIFF (currently used) for storage capacity, image quality, long-term sustainability and functionality. JPEG 2000 is recommended as the future archive format.

2. The British Library recently moved forward to migrate their 80-terabyte newspaper collection from TIFF to JPEG 2000.

3. The Wellcome Library announced they will use JPEG 2000 for their upcoming digitization projects.

Source: Preservation Planning at the Bavarian State Library Using a Collection of Digitized 16th Century Printings, D-Lib Magazine, Vol. 15, No. 11/12, Nov/Dec 2009, http://www.dlib.org/dlib/november09/kulovits/11kulovits.html

Page 15: EPrints Preservation: Why we need Preservation Planning

TIFF vs JPEG 2000?

What does Plato say?

Source: Preservation Planning at the Bavarian State Library Using a Collection of Digitized 16th Century Printings, D-Lib Magazine, Vol. 15, No. 11/12, Nov/Dec 2009, http://www.dlib.org/dlib/november09/kulovits/11kulovits.html

“At this point in time not migrating the TIFF v6 images is the best alternative.”

“However, in one year we'll look at this plan again to see if there are more tools available and whether or not the ones we considered in this year's evaluation have been improved.”