DESCRIPTION

Slides from a presentation given at: Appraisal in the Digital World, Accademia Nazionale dei Lincei, Rome, Italy, 15-16 November 2007

TRANSCRIPT

Page 1: Collaboration on appraisal and collection development for the long-term preservation of digital content

a centre of expertise in data curation and preservation

Appraisal in the Digital World | Rome, 15-16 October 2007

Collaboration on appraisal and collection development for the long-term preservation of digital content

Michael Day, DCC Research Team

UKOLN, University of Bath, Bath BA2 7AY, United Kingdom

[email protected]

Presentation outline

• Different approaches to selection and appraisal
• Collection development
• The importance of collaboration for:
  – Digital preservation
  – Institutional repositories
• General principles for selection and appraisal

Approaches to selection (1)

• Fully comprehensive
  – “Storage is cheap. Why select?” (topic of an ASIS&T student chapter panel discussion, UNC, 2007)
  – May seem to provide a way of avoiding the cultural bias evident in most selection regimes
  – But ad hoc decisions on retention may still be made, perhaps on pragmatic grounds (e.g. available technology, security, privacy), with little in the way of accountability
  – It also does not resolve the practical question of who should be responsible for preservation

Approaches to selection (2)

• Different professional approaches to selection
  – Archivists focus on “appraisal”
    • Based on well-established theoretical principles
    • An important part of archival practice
  – Other cultural heritage organisations focus on the development and management of collections
    • Based on a different set of assumptions

Example: Web archives (1)

• Highlights differences between the archival and collection development approaches
  – Archivists and records managers approach Web operations as a potential source or generator of records
    • Identify best practice for managing Web records, e.g. The National Archives (TNA), UK
    • Mitigating organisational risk
    • Enhancing accountability

Example: Web archives (2)

  – International Internet Preservation Consortium
    • Internet Archive and national libraries
    • View the Web as a source of “published” content that can be harvested to enhance existing collections
    • Whether highly selective (e.g. UK Web Archiving Consortium, National Library of Australia’s PANDORA archive) or broader in scope (domain capture), initiatives led by national libraries tend to focus on traditional collection development criteria

Collection development (1)

• Typically focuses both on institutional objectives (e.g. “supporting the research and teaching needs of the university”) and on subject needs
• Traditionally includes a range of activities:
  – Selection, acquisition, deselection (weeding), disposal, preservation
  – Part of collection management (which also includes policies, budget allocation and collection evaluation)
• Most collections will change over time, e.g. in response to changes in institutional objectives and in the resources available (money and space)

Collection development (2)

  – Specific selection factors might include:
    • The overall purpose of the collection (e.g. supporting education and research)
    • Existing subject strengths
    • The information needs of users
    • Quality, accuracy, authoritativeness, currency, …
    • Value for money
    • Statutory requirements (e.g. for national libraries)

Collection development (3)

  – Collection development policies
    • These help guide ongoing collecting activities and form the basis for evaluation
    • In the library sector, these can be “highly charged political documents and … the province of the most senior library management” (Derek Law)
    • Help to define organisational goals
    • “Deaccessioning” can lead to controversy (e.g. Nicholson Baker’s Double Fold)

Collection development (4)

• Digital resources raise new kinds of selection issues:
  – Defining content, e.g. understanding the “significant properties” of resources (vitally important for making preservation decisions)
  – The need for various types of metadata
  – Access
    • The longer-term implications of licences
    • User support and training needs

Collection development (5)

  – The principle that it is important to select resources early in their lifecycle
    • Obsolescence leads to loss
    • Implicit knowledge gets lost
    • Metadata and documentation are hard to (re)create retrospectively

Collaboration on preservation (1)

• Collaborative infrastructures have long been identified as necessary for digital preservation and curation, e.g.:

• Preservation is "an ongoing, long-term commitment, often shared, and cooperatively met, by many stakeholders" (Lavoie & Dempsey, 2004)

Collaboration on preservation (2)

• Examples:
  – Shared services (e.g. registries of representation information, third-party services for bit-level preservation)
  – Networks of "trust" (audit and certification)
  – Collaboration at the policy level, e.g. on collection development and access

Institutional repositories (1)

• Institutional repositories require collaborative infrastructures:
  – Distributed services linked (for access) by metadata harvesting
    • Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
    • Data Providers (repositories) and Service Providers (aggregators)
  – Potential for the development of shared services to support repositories (Swan & Awre, Linking UK Repositories, JISC, 2006)
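
As a rough illustration of the harvesting model above, the following Python sketch shows how a Service Provider might build OAI-PMH ListRecords requests for a Data Provider and read record identifiers and the resumption token from one response page. The base URL and the sample response are hypothetical, and the network call itself is left out so that the request and parsing logic can be read on their own.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a Data Provider.

    A resumptionToken is an exclusive argument in OAI-PMH, so follow-up
    requests send it on its own, without metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall("oai:ListRecords/oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical one-page response, trimmed to the elements parsed above.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2007-10-15T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://repository.example.org/oai</request>
  <ListRecords>
    <record><header><identifier>oai:repository.example.org:1</identifier><datestamp>2007-01-01</datestamp></header></record>
    <record><header><identifier>oai:repository.example.org:2</identifier><datestamp>2007-01-02</datestamp></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai"))
    ids, token = parse_list_records(SAMPLE_RESPONSE)
    print(ids, token)
```

An aggregator would repeat the request with each returned token until none is given, which is what allows many independent repositories to be combined into a single discovery service.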

Institutional repositories (2)

• Potential shared services identified by Swan & Awre (2006):
  – Resource discovery
  – Building or hosting repositories
  – Advisory services (e.g. on IPR, preservation)
  – Content creation, digitisation
  – Metadata capture and enhancement
  – Name authorities
  – Citation analysis and research assessment
  – Preservation services

IRs and preservation (1)

• Shared services for preservation:
  – Assumption that not all institutions with repositories will be able to manage long-term preservation challenges, e.g.:
    • Lack of local expertise and resources
    • Existing availability of third-party services, e.g. provided by subject-based data centres and national libraries
• Preservation is a logical area for collaboration

IRs and preservation (2)

• Examples:
  – DARE (Digital Academic Repositories) initiative (Netherlands)
    • National Library of the Netherlands (KB) has responsibility for content deposited in participating repositories
  – Repository Bridge project (UK)
    • Demonstration of harvesting e-theses (using OAI-PMH and METS) by the National Library of Wales

IRs and preservation (3)

• Examples (continued):
  – SHERPA DP project (UK, JISC-funded)
    • Developed a disaggregated framework for outsourcing preservation, based on the OAIS model
    • Explored the packaging and transfer of content (using METS)
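
To give a feel for the packaging step, here is a minimal Python sketch that wraps a list of files in a skeletal METS document: a file section plus the structural map that METS requires. It is an illustration under stated assumptions, not the SHERPA DP or Repository Bridge profile; the object identifier, file URLs and the "original" file group label are hypothetical, and a real transfer package would also carry descriptive and administrative metadata and checksums.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets_tag(name):
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets(object_id, files):
    """Wrap (file_id, href, mimetype) tuples in a skeletal METS document.

    Only a fileSec and the mandatory structMap are produced; descriptive and
    administrative metadata sections are deliberately omitted in this sketch.
    """
    mets = ET.Element(mets_tag("mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(mets, mets_tag("fileSec"))
    # Hypothetical USE value; real profiles define their own file group labels.
    file_grp = ET.SubElement(file_sec, mets_tag("fileGrp"), {"USE": "original"})
    struct_map = ET.SubElement(mets, mets_tag("structMap"))
    item_div = ET.SubElement(struct_map, mets_tag("div"), {"TYPE": "item"})
    for file_id, href, mimetype in files:
        file_el = ET.SubElement(file_grp, mets_tag("file"), {"ID": file_id, "MIMETYPE": mimetype})
        ET.SubElement(file_el, mets_tag("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # Point the structural map at each file so the package is navigable.
        ET.SubElement(item_div, mets_tag("fptr"), {"FILEID": file_id})
    return ET.tostring(mets, encoding="unicode")


if __name__ == "__main__":
    print(build_mets("thesis-001", [("F1", "http://repository.example.org/1/thesis.pdf", "application/pdf")]))
```

Keeping the file inventory and the structure in one XML wrapper is what makes the package self-describing when it is handed from a repository to a third-party preservation service.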

IRs and preservation (4)

• Examples (continued):
  – Preserv project (UK, JISC-funded)
    • Simple model of modular services, e.g. for:
      – Bit-level preservation
      – Object characterisation and validation (e.g. using the PRONOM registry and the DROID tool)
      – Preservation planning (risk assessments, technology watch, etc.)
      – Preservation strategies (e.g. migration)
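
Bit-level preservation usually rests on fixity checking: a checksum is recorded when content is ingested and recomputed later to detect silent corruption or loss. A minimal Python sketch follows; SHA-256 and the in-memory manifest are assumptions made for illustration, not details taken from the Preserv model.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(paths):
    """Record a checksum for each file at ingest time."""
    return {path: sha256_of(path) for path in paths}


def verify_manifest(manifest):
    """Return the paths that are missing or whose checksum no longer matches."""
    failures = []
    for path, expected in manifest.items():
        if not os.path.exists(path) or sha256_of(path) != expected:
            failures.append(path)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "record.txt")
        with open(path, "wb") as fh:
            fh.write(b"original content")
        manifest = make_manifest([path])
        print(verify_manifest(manifest))  # nothing has changed yet
        with open(path, "wb") as fh:
            fh.write(b"altered content")
        print(verify_manifest(manifest))  # the altered file is reported
```

Because the manifest is just data, it can be held by a third-party service as well as by the depositing repository, which is one reason bit-level preservation lends itself to being offered as a shared service.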

IRs and preservation (5)

Preserv service provider model (Hitchcock et al., 2007)

IRs and collection development (1)

• Collection development issues include:
  – Content types
    • Peer-reviewed research outputs, scientific datasets, administrative records, ...
    • Will have different preservation priorities
  – Object types (file formats)
    • Policies will have a direct influence on the risks (and costs) of long-term preservation, e.g.:
      – Accepting anything vs. defining the specific standards to be used

IRs and collection development (2)

  – Ongoing review (and weeding) of collections
    • Withdrawal of content (a contentious issue)
    • Superseded or duplicate material
  – Defining preservation service levels
    • Different policies needed for different types of material

IRs and collection development (3)

• Potential areas for collaboration:
  – Ingest workflows
    • Checking conformance with submission rules
    • Automated tools for format characterisation and validation, and perhaps conversion (normalisation)
    • Metadata enhancement, e.g. consistent forms of name
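
The first two ingest checks could be automated along the following lines. This Python sketch identifies a file's format from its leading bytes ("magic numbers"), the basic idea behind signature-based identification tools such as DROID, and then tests the result against a repository's list of accepted formats. The three-entry `SIGNATURES` table and the `ACCEPTED_FORMATS` policy are hypothetical stand-ins; a real workflow would draw on a full signature registry such as PRONOM and on proper characterisation tools.

```python
# Hypothetical, minimal signature table: (format label, leading bytes).
SIGNATURES = [
    ("application/pdf", b"%PDF-"),
    ("image/png", b"\x89PNG\r\n\x1a\n"),
    # ZIP is a container: DOCX, ODF and others share this signature and
    # would need deeper inspection than a leading-bytes check.
    ("application/zip", b"PK\x03\x04"),
]

# Hypothetical submission rule: the formats this repository accepts.
ACCEPTED_FORMATS = {"application/pdf", "image/png"}


def identify_format(data: bytes):
    """Return the first format whose signature matches, or None."""
    for label, magic in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def check_submission(files):
    """Check a dict of name -> bytes; return name -> (format, accepted, reason)."""
    report = {}
    for name, data in files.items():
        fmt = identify_format(data)
        if fmt is None:
            report[name] = (None, False, "unidentified format")
        elif fmt not in ACCEPTED_FORMATS:
            report[name] = (fmt, False, "format not accepted by policy")
        else:
            report[name] = (fmt, True, "ok")
    return report


if __name__ == "__main__":
    for name, result in check_submission({
        "paper.pdf": b"%PDF-1.4 example",
        "bundle.zip": b"PK\x03\x04example",
        "unknown.dat": b"\x00\x01\x02",
    }).items():
        print(name, result)
```

Checks of this kind are an obvious candidate for shared services: the signature knowledge is the same for every repository, while each repository keeps its own acceptance policy.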

Shared collection development (1)

• Collection development has been a traditional focus of library co-operation, e.g.:
  – Farmington Plan (1940s)
  – University of London Depository Library
• The concept of "virtual collections"
  – IFLA Universal Availability of Publications (UAP) core programme
• Also applies to digital collections
  – OhioLINK
  – California Digital Library

Shared collection development (2)

• Collaborative collection development and digital preservation
  – Potentially reducing unnecessary duplication of effort
  – Enabling co-ordinated decisions to be made about the redundancy and geographical distribution of content
  – Also supporting the application of different preservation strategies to the same class of content
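
Co-ordinated decisions about redundancy and geographical distribution presuppose that partners can see who holds what, and where. The Python sketch below assumes a hypothetical shared holdings register (a list of item, institution and country entries) and flags items held by too few institutions or in too few countries. The thresholds and sample data are illustrative only.

```python
from collections import defaultdict


def redundancy_report(holdings, min_copies=3, min_countries=2):
    """Flag items held by too few institutions or in too few countries.

    holdings: iterable of (item_id, institution, country) tuples.
    Returns a dict of item_id -> list of reasons, for at-risk items only.
    """
    institutions = defaultdict(set)
    countries = defaultdict(set)
    for item_id, institution, country in holdings:
        institutions[item_id].add(institution)
        countries[item_id].add(country)

    at_risk = {}
    for item_id in institutions:
        reasons = []
        if len(institutions[item_id]) < min_copies:
            reasons.append(f"only {len(institutions[item_id])} holding institution(s)")
        if len(countries[item_id]) < min_countries:
            reasons.append(f"held in only {len(countries[item_id])} country/countries")
        if reasons:
            at_risk[item_id] = reasons
    return at_risk


if __name__ == "__main__":
    # Hypothetical register entries.
    sample = [
        ("journal-A", "inst-1", "NL"), ("journal-A", "inst-2", "UK"), ("journal-A", "inst-3", "US"),
        ("thesis-B", "inst-4", "UK"),
    ]
    print(redundancy_report(sample))
```

A report like this could also support the rescue of collections at risk, since items held in a single place show up immediately.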

Shared collection development (3)

  – Identifying collections at risk and supporting their rescue
• In order to do these things, it may be useful to have some common understanding of what collection development and appraisal should mean in the digital era
  – The main appraisal activities identified by the InterPARES Appraisal Task Force may be useful here

InterPARES appraisal framework (1)

• 1. Compiling information
  – Identifying the form and contexts of records
  – Identifying the particular components that need preservation
  – Based on solid research (not just collecting it together in a haphazard fashion)
  – This information could become part of the record’s metadata

InterPARES appraisal framework (2)

• 2. Assessing value
  – Judgement based on the creator’s needs and societal needs
  – May be context-dependent (institution-specific)
    • Assessing continuing value
    • Authenticity
    • Determining value

InterPARES appraisal framework (3)

• 3. Determining the feasibility of preservation
  – Determining value is not enough in itself
  – Need also to consider whether the records can be preserved as authentic records
  – Takes into account the organisational ability to undertake preservation
  – Gathers technical information
• 4. Making the appraisal decision
  – Based on value and feasibility
  – All decisions made must be documented
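
Steps 2 to 4 can be read, in much simplified form, as a decision procedure. The Python sketch below is a hypothetical rendering, not an InterPARES specification: a record is retained only when it has been judged to have continuing value and its preservation as an authentic record is judged feasible, and every decision is logged with its rationale. The field names, the yes/no simplification and the "review" outcome for valuable but currently infeasible material are all assumptions; InterPARES describes a judgement-based process, not a formula.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AppraisalInput:
    record_id: str
    has_continuing_value: bool       # outcome of step 2 (assessing value)
    authenticity_preservable: bool   # step 3: can it be kept as an authentic record?
    organisation_can_preserve: bool  # step 3: organisational ability to preserve
    notes: str = ""


@dataclass
class AppraisalLog:
    entries: list = field(default_factory=list)

    def decide(self, item: AppraisalInput, on: date) -> str:
        feasible = item.authenticity_preservable and item.organisation_can_preserve
        if not item.has_continuing_value:
            decision, reason = "dispose", "no continuing value"
        elif not feasible:
            # Hypothetical outcome: valuable, but preservation not currently feasible.
            decision, reason = "review", "valuable, but preservation not currently feasible"
        else:
            decision, reason = "retain", "valuable and feasible to preserve"
        # Step 4: every decision is documented, whatever the outcome.
        self.entries.append({
            "record": item.record_id,
            "date": on.isoformat(),
            "decision": decision,
            "reason": reason,
            "notes": item.notes,
        })
        return decision


if __name__ == "__main__":
    log = AppraisalLog()
    log.decide(AppraisalInput("record-1", True, True, True), date(2007, 10, 15))
    log.decide(AppraisalInput("record-2", True, False, True), date(2007, 10, 15))
    for entry in log.entries:
        print(entry)
```

Recording the reason alongside each outcome is the part most relevant to collaboration: documented decisions can be compared and reused across institutions.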

InterPARES appraisal framework (4)

• A generic framework: as developed, it focuses on records, but its general principles, broadly interpreted, could be applied to other forms of content, e.g. scientific datasets, Web content

• Does not presuppose a particular preservation approach

• Encourages a focus on organisational objectives, object contexts, object value, the technical feasibility of preservation, and the determination of “significant properties”

• Helps to document the selection process

Conclusions

• The use of a consistent set of principles might help to encourage:
  – More consistency in documenting selection and appraisal decisions across domains, with benefits for collaboration
  – Insight into assessing value and preservation feasibility in specific contexts (like Web archives)