a centre of expertise in data curation and preservation
CILIPs Branch/Group Day :: 27 September 2006 :: Dundee
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
From Digital Creation to Digital Curation
Managing Digital Cultural Heritage Resources
Maureen Pennock
Digital Curation Centre, UKOLN, University of Bath
Today’s Talk
• Introductions
• The UK Digital Curation Centre
• Curation and the digital life-cycle
• Issues in developing and managing digital collections
• Helpful projects and initiatives
• Discussion
The UK Digital Curation Centre
Digital Curation
• Digital Curation, broadly interpreted, is about maintaining and adding value to a trusted body of digital information for current and future use
• The active management and appraisal of data over the entire life-cycle
The DCC
• Launched in 2004
• Established to help solve the extensive challenges of digital preservation and curation, and to provide research, advice and support services to UK institutions
• Consortium project with 4 main partners
• 4 main teams distributed across the 4 UK locations
• Funded by JISC & the e-Science Core Programme
Organisation to Engage & Collaborate
[Diagram; labels: industry; research collaborators; standards bodies; testbeds & tools; communities of practice (users); community support & outreach; research; development co-ordination; service definition & delivery; management & admin support; Collaborative Associates Network of Data Organisations; curation organisations, e.g. DPC]
DCC Outreach
• Raising Awareness and Dissemination
  • Website (http://www.dcc.ac.uk)
  • International Journal of Digital Curation
  • Annual International Conference
• Understanding Users and their Needs
  • Requirements gathering
  • Associates Network
  • DCC Forum
DCC Services
• Information Services
  • Community-developed Digital Curation Manual
  • Briefing Papers & FAQs
  • Technology Watch, Standards Watch, Legal Watch
  • Case Studies
  • Best Practice Checklists
• Advisory Services
  • Events: information days, workshops, training
  • Helpdesk
• Audit and Certification Services
DCC Research
• Annotation in Databases
• Data archiving
• Socio-economic and legal issues
• Metadata extraction and curation
• Ontologies and data dictionaries
• Provenance and databases
• Data transformation, integration and publishing
• Supporting technologies
• Networks of trusted digital repositories
• Organisational and cultural challenges to digital curation
DCC Development
• DCC Approach to Digital Curation (white paper) – sets out the path for development activities:
  • Monitoring international standards
  • Creating testbeds for digital curation tools
  • Development of recommendations for tools and methods for generating Representation Information
  • Development of a Representation Information Registry (DCC RIR)
Digital Curation and the Life-Cycle
Why a life-cycle approach?
• Curation is a life-cycle approach to the management and preservation of digital objects, necessary because:
  • Digital materials are fragile & susceptible to change from technological advances throughout their life-cycle
  • Each stage can impact on subsequent stages
  • Traditional management processes may need adapting for digital materials with different requirements
• The life-cycle approach enables continuity and provenance despite technological and organisational contextual change
• Maximises investments and potential
Life-Cycle model
[Diagram: a Digital Object at the centre, surrounded by the stages Creation, Active Use, Selection, Acquisition, Storage & Preservation, and Access & Re-use]
• Life-cycle model differs slightly depending on the context (e.g. libraries/archives/museums)
• This generic model addresses libraries
From Creation to Curation
• Life-cycle approach facilitates continuity and control over the different stages
• Each stage can impact on the following one:
  • Creation impacts on many stages, as the way a resource is created affects the way it can be curated and its sustainability
  • Creation is problematic in a digital heritage context, as you may not have control over the way resources are created
Issues in Developing and Managing Digital Collections
The Digital Library: Discuss
• What exactly is a digital library?
  • A library accessible over the internet? (but to what extent?)
  • A library with (only?) digital holdings?
  • A cutting-edge institution that maximises IT potential? (can be achieved in many ways)
  • An added-value service?
• Professional disagreement over the definition (especially the difference between this and a digital archive)
• More than just a search engine and an access mechanism – more than just the Internet!
Potential digital library resources
Digitised
• Maps and Posters
• Photographs
• Original texts – books, manuscripts, newspapers, journals
• Audio-visual material
• Microfilm
Born Digital
• Maps and Posters
• Photographs
• E-Publications
• Audio-visual material
• Websites (which will invariably contain multi-media objects)
• Cataloguing data?
Issues
• Range across the life-cycle
• Involve different stakeholders at each stage
• Communication essential
Issue areas: Technical, Preservation, Organisational, Legal, Financial, Cultural
Technical issues (1)
• Harvesting & Accession
• Storage – which model to implement?
• Metadata – what metadata are needed?
• Security – protection from unauthorised or malicious access
• User access – what tools are needed?
Technical issues (2)
• Preservation
  • Objects highly environmentally dependent
  • Software/hardware changes many times during the lifetime of the records – every five years?
  • Content may be altered if action is undertaken
  • Content will become inaccessible if action is not taken
• Preservation strategies & tools
• Fragility of storage media
• Media obsolescence
• File deterioration
• Hardware & software obsolescence
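File deterioration of the kind listed above is commonly detected by fixity checking: record a checksum for each stored file, then recompute and compare at intervals. The following is a minimal sketch in Python, not a tool named in this talk; the function names (`sha256_of`, `build_manifest`, `changed_files`) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict) -> list:
    """List files that are missing or whose checksum no longer matches."""
    current = build_manifest(folder)
    return sorted(
        name for name, recorded in manifest.items()
        if current.get(name) != recorded
    )
```

A manifest built at accession time can be stored alongside the collection and passed to `changed_files` on each later audit; an empty list means every recorded file is still present and unchanged.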
Organisational and Cultural issues
• Organisational and cultural infrastructure not usually geared towards digital longevity
• Digital cultural heritage resources are often primarily recognised as resources for the ‘here and now’
• ‘Here and now’ access practices ≠ longevity!
• Preservation issues not recognised/regarded
• Staffing – expansion of duties or new staff?
• Need for senior managerial support, e.g. policy, finances…
Financial issues
• Financial:
  • Not just a one-off ‘digitising’ or ‘collecting’ cost
  • Preservation activity can require ongoing financial commitment
  • Who will pay – now and in the future?
  • What are the cost benefits?
  • Where’s the business model?
  • Will access be payment-restricted?
Legal issues
• Legal:
  • Meeting legal obligations: data protection, copyright, database right…
  • Who is responsible?
• Copyright particularly relevant, as copying can be a vital act in preservation and access
  • Impact of DRM on copying abilities
  • A new definition of copying needed?
Addressing the issues
• Follow progress in national initiatives
• Collaborate & communicate
• Engage the consumer
• Success requires commitment:
  • At a policy level (integrated)
  • At a managerial level (support/backing)
  • At a staffing level (actions/activities)
Strategy (1)
• A written policy and strategy to support activities and help secure resources
• Take a life-cycle approach to support curation and preservation planning
• If creating resources, provide good practice guidance for sustainability (e.g. when digitising or accepting digitised resources)
• Assess collection/selection criteria – are they still valid? Do they need expanding? Identify possible resources
• Digital resources can complement & enhance physical ones
• Be aware of externally produced digital resources (e.g. websites); check other heritage collections before gathering!
Strategy (2)
• Identify legal constraints on collection/management/access
• Can value be added to resources during acquisition?
• Store objects in a secure environment
• Plan for preservation activities to maintain access to authentic resources over time and avoid incurring extra costs
• Determine access and user requirements
• Implement an integrated approach to collection accessibility
• Adapt and learn from national and other leading activities
Helpful projects and initiatives for preservation and accessibility
National Library of Scotland
• Developed several digital and web-accessible themed collections:
  • Propaganda: A weapon of war (posters/images)
  • Maps
  • First Scottish books
  • Robert Louis Stevenson (letters, sketches, photos)
  • Muriel Spark – the story
  • Churchill: The evidence (contains school resources)
• Trusted Digital Repository
• Part of the UK Web Archiving Consortium (UKWAC)
  • Selection and collection criteria for Scottish web sites
  • ‘Archiving the UK General Election 2005’
UK WAC
• UK Web Archiving Consortium (6 members)
  • British Library, National Library of Scotland, National Library of Wales, The National Archives, Wellcome Library, JISC
• Collects Web content selectively
• Uses modified PANDAS collection/harvesting software developed by the National Library of Australia
• Underlying harvesting program is currently HTTrack
• Permission is sought from site owners in advance
• Persistent Identifier URLs
• Single partner assumes responsibility for each site
• Central repository of metadata
• The collections are publicly accessible
• Website: http://www.webarchive.org.uk/
Internet Archive
• Non-profit organisation, based in the U.S.
• Aims to offer permanent access to digital online materials of all types
• Founded in 1996 and collecting since then; much content donated by Alexa Internet
• Collects sites by crawling and harvesting web sites
  • Sites can 'opt out' by way of a robots.txt file on the web server
• Most content is freely available to the public, e.g. through the Wayback Machine
• Interface issues: only the URL indicates that the page is archived
• Website: http://www.archive.org/
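The robots.txt opt-out mentioned above can be illustrated with Python's standard `urllib.robotparser` module. This is a sketch of how any well-behaved harvester might honour such a file, not the Internet Archive's own code; the helper name `may_harvest`, the example rules, the `example.org` URLs and the crawler name `ia_archiver` used in them are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is excluded from the whole
# site, and every other crawler is excluded from /private/ only.
EXAMPLE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_harvest(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "OtherBot"):
        allowed = may_harvest(EXAMPLE_ROBOTS, agent, "http://example.org/index.html")
        print(agent, "may harvest" if allowed else "must skip")
```

Because the decision is made by the crawler, this opt-out only works for harvesters that choose to read and respect the file.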
IIPC (1)
• International Internet Preservation Consortium
  • Builds co-operation between the Internet Archive and national and research libraries
• Co-ordinated by the Bibliothèque nationale de France
• The British Library is the only current UK member; other national library partners include the Library of Congress, Library and Archives Canada and the national libraries of Australia, Denmark, Finland, Iceland, Italy, Norway and Sweden
• Reflects those with current experience of Web archiving
• Both working-groups and tool development
• Phase II will enable new partners to join the consortium
• Website: http://netpreserve.org/
IIPC (2)*
• Phase I - developing the IIPC toolkit
• Standards and tools for supporting:
  • Acquisition - archival quality crawler (Heritrix); portable database extraction and migration tool for database-driven deep web sites (DeepARC)
  • Managing collections - analytical and prioritization tools for automatically focusing harvesting; curation tools to provide a non-technical interface for selecting, monitoring and verifying archived web sites
  • Collection storage and maintenance - tools for manipulating formats; a standardised storage format (WARC), standards for metadata
  • Access and finding aids - browse interfaces (WERA) and search facilities (NutchWAX)
* Michael Day, IWMW 2006
LOCKSS (1)
• Lots of Copies Keeps Stuff Safe (LOCKSS)
• An ‘easy and inexpensive way to collect, store, preserve, and provide access to their own, local copy of authorised content they purchase’ (LOCKSS website)
• E-Journal collection and preservation system
• Open Source Software
• Runs on standard desktop hardware
• Requires very little technical administration
LOCKSS (2)
• Trial and pilot projects underway
  • DCC support available through helpdesk and dedicated Advisory post
  • Current trial suitable only for certain titles (due to licensing arrangements with publishers)
• Private networks can be developed:
  • Requires technical development
  • Minimum of six machines necessary to achieve desired redundancy
  • Suitable for, e.g., online course material
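The value of keeping several copies can be shown with a deliberately simplified comparison: hash every machine's copy of an item and flag any copy that disagrees with the majority. This is only an illustration of the redundancy idea, not the LOCKSS polling protocol; the function name `find_damaged_copies`, the machine names in the usage note and the use of SHA-256 are assumptions.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def find_damaged_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the machines whose copy differs from the majority version.

    Raises ValueError when no version is held by more than half of the
    machines, because the comparison then cannot say which copy is good.
    """
    if not copies:
        raise ValueError("no copies supplied")
    digests = {
        machine: hashlib.sha256(content).hexdigest()
        for machine, content in copies.items()
    }
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no majority version among the copies")
    return sorted(m for m, d in digests.items() if d != majority_digest)
```

With six hypothetical machines `box1` to `box6`, one of which holds a corrupted copy, the function returns just that machine's name, and its copy could then be repaired from any of the other five. With fewer copies, a small number of damaged machines is enough to leave no clear majority.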
Further resources
• National Library of Scotland http://www.nls.uk
• National Library of Wales http://www.llgc.org.uk/
• British Library http://www.bl.uk
• DCC website http://www.dcc.ac.uk
• UKOLN website http://www.ukoln.ac.uk
• SLAINTE website http://www.slainte.org.uk/
• Digital Archives Regional Pilot (DARP) project http://www.data-archive.ac.uk/randd/darp.asp
• ‘Building and Sustaining Digital Collections’, Abbey Smith http://www.clir.org/
Thank You & Discussion
Maureen Pennock
Join the DCC Associates Network (it’s free!)
http://www.dcc.ac.uk/associates/