LEAP II INSTITUTE
Digital Curation and Lifecycle Management Workshop
May 17, 2016
Daniel Gelaw Alemneh
Digital Curation Coordinator,
University of North Texas
Brigham Young University
Hawaii Smith Library
Hawaii
Aloha!
• Welcome and Introduction
– Participants, Facilitators, and Guests
• Pre-Test Survey
– Post-Test follow-up
• Review of Agenda and Plan for the day
– The Day’s Schedule
– Expected Outcomes
Housekeeping
PRE-WORKSHOP SURVEY
• Do you currently work directly with digital content?
▢ Yes ▢ No
• If Yes, how long have you been working with digital projects? ___________
• Please rate your knowledge of digital curation and lifecycle management concepts.
▢ No knowledge ▢ Some knowledge ▢ Average knowledge ▢ Above average knowledge ▢ Expert knowledge
• What do you hope to learn at this workshop?
____________________________________________
____________________________________________
The Day’s Schedule
MORNING
8:00 – 9:00
Welcome and Introduction
- Collect Pre-test
Introduction to Digital Library/Curation
- Definition and a brief history
9:00 – 10:15
Types and Features of digital libraries
- Portal to Texas History (PTH)
- Pacific Digital Library (PDL)
Karleen Samuels
10:15 – 10:30 Break
10:30 – 11:15
Digital Library of the Caribbean
Laurie N. Taylor, dLOC, UF
11:15 – Noon
Digital Lifecycle Management
- Pre-Ingest Workflow
- Tools and Quality Assurance
AFTERNOON
1:00 – 1:30
Mountain West Digital Library
Anna Neatrour, MWDL
1:30 – 2:00
Metadata for Curation
- Hands-on and Activity Time
2:00 – 3:00
Digital Library of free items
Jefferson Bailey, Director of Web Archiving, Internet Archive
3:00 – 3:15 Break
3:15 – 4:15
Digital Preservation: Theory & Practice
4:15 – 5:00
Collaboration, Rights, Advocacy, Policy
Michael Aldrich, University Librarian, BYU
5:00 – 6:00
Questions & Answers, and Reflections
- Post-Test
Expected Outcomes
• At the end of the workshop, you will be able to develop, evaluate, or apply digital library technologies in your work environment.
• You will gain hands-on experience with basic pre-ingest activities and understand the role of such processes within the digital curation lifecycle.
Digital Library/Digital Curation -
Definition and brief history of DL development
• Digital technologies provide scholars with access to diverse and previously unavailable content that spans various formats and myriad technologies across institutions and nations.
Technology and Trends
Figure: Internet Penetration per Population (Source: http://www.internetworldstats.com/stats.htm)
Technology and Trends
• Information creation, organization, retrieval, use, and preservation are becoming more complex.
• User as creator, annotator, indexer, searcher, and eventual user of his/her content
• Visualization of the information space instead of a ranked list of search results
Technology and Trends
Digital libraries and supporting technologies
have now matured to the point where their contents are incorporating complex and
dynamic resources and services.
Digital Library
• A digital library is an online collection of digital objects, of assured quality, that are created or collected and managed according to internationally accepted principles for collection development and made accessible in a coherent and sustainable manner, supported by services necessary to allow users to retrieve and exploit the resources. (IFLA)
• Methods of building digital collections:
– digitization, converting analog to digital form
– acquisition of born-digital items
– access to external materials by providing pointers
Source: IFLA/UNESCO Manifesto for Digital Libraries: http://www.ifla.org/publications/iflaunesco-manifesto-for-digital-libraries
Digital Library
• Various repositories may have different architectures.
– Digital Repositories offer a convenient infrastructure through which to store, manage, re-use and curate digital materials
• There are many questions and factors to consider when choosing and implementing a particular repository system:
– Locally managed repositories.
– Commercially managed repositories.
– Consortial repositories and other hybrid options.
Repository Platforms
• ArchivesSpace: an archives information management application for managing both physical and digital archival holdings.
• CONTENTdm: a digital collection management system and hosting service.
• DSpace: an institutional repository system which enables easy deposit, preservation, and access for all types of digital content.
• EPrints: digital repository software that is intended to create a highly configurable web-based repository.
• Fedora: the back-end foundation for digital repository systems responsible for managing and preserving all types of digital content.
• Greenstone: a suite of software for building and distributing digital library collections. It provides a way of organizing information and publishing it on the web or on removable media such as DVD and USB flash drives.
Source: DCC: http://www.dcc.ac.uk/resources/external/category/repository-platforms#sthash.mLzDsJV8.dpuf
• The open access (OA) movement is part of the
broader "open knowledge" or "open content"
movement that transforms scholarly
communication
– OA is provision of unrestricted online access to
results/outputs of research and development
– A massive open online course (MOOC) is a large
scale, open-access re-imagining of the more
traditional forms of e-learning
The Open Access Movement
Visitors to Popular MOOCs - Udemy (www.udemy.com) by Country.
(Source: www.alexa.com/siteinfo/udemy.com, as of April 10, 2015)
The Emerging and Future Roles of Academic Libraries
(ARL Scenarios) (Creative Commons BY NC ND)
Figure 5. Gross Domestic Expenditure on R&D (GERD)
(Source: http://www.battelle.org/docs/tpp/2014_global_rd_funding_forecast.pdf )
Digital Curation and Preservation
• Digital curation is the management, preservation, and enrichment of digital resources.
• It involves maintaining, preserving, and adding value to digital resources, and facilitating their use and re-use throughout their lifecycle and over time.
Digital Curation and Preservation
• Digital Preservation is defined as the management process of ensuring digital objects and information are accessible over the long term.
• Development of standards, format compatibility, format migration, and systems interoperability are important aspects of the digital preservation process.
Source: Guidance Document for Lifecycle Management of ETDs: http://digital.library.unt.edu/ark:/67531/metadc282598
Digital Curation Workflows Considerations
Source: A Digital POWRR Workshop: http://digitalpowrr.niu.edu/
Five Basic Aspects of Digital Object
Source: ALA's 9 Principles of Digital Content
NISO’s Nine Principles that apply to good digital collections:
1. A good digital collection is created according to an explicit collection development policy.
2. Collections should be described so that a user can discover characteristics of the collection, including scope, format, restrictions on access, ownership, and any information significant for determining the collection’s authenticity, integrity & interpretation.
3. A good collection is curated, which is to say, its resources are actively managed during their entire lifecycle.
4. A good collection is broadly available and avoids unnecessary impediments to use. Collections should be accessible to persons with disabilities, and usable effectively in conjunction with adaptive technologies.
5. A good collection respects intellectual property rights.
6. A good collection has mechanisms to supply usage data and other data that allows standardized measures of usefulness to be recorded.
7. A good collection is interoperable.
8. A good collection integrates into the users’ own workflow.
9. A good collection is sustainable over time.
Features, Design, Architecture, and Characteristics of Digital Libraries –
– The Portal to Texas History (PTH):
http://texashistory.unt.edu/
– Pacific Digital Library:
http://www.pacificdigitallibrary.org/ (Karleen Samuels)
The Portal to Texas History: http://texashistory.unt.edu/
UNT’s Digital Resources Accessed from 200+ Countries http://digital.library.unt.edu/explore/collections/UNTETD/browse/
Robust Architecture for a Centralized Portal System
The Pacific Digital Library (PDL): http://www.pacificdigitallibrary.org
By: Karleen Samuels
The “Leaders for Pacific Libraries” team who created the framework for the Pacific Digital Library (Source: http://www.pacificdigitallibrary.org/cgi-bin/pdl?e=p-000off-pdl--00-2--0--010---4-------0-1l--10en-50---20-about---00-3-1-00bySR-0-0-000utfZz-8-00&a=p&p=info)
Digital Library of the Caribbean (dLOC):
http://www.dloc.com/
By: Laurie N. Taylor, dLOC & UF
Digital Curation and
Lifecycle Management –
Workflow, Organization, and Preservation
• Various tools and services:
– Some tools/services do specific tasks – Microservices
• Tools that facilitate front-end ingest processes or back-end storage and preservation activities (like virus checking,
– Some tools/services combine multiple microservices – Macroservices
• There are some services that will perform all activities (like DSpaceDirect, Archivematica + DuraCloud)
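A single-task microservice of the kind described above can be quite small. The sketch below assumes a fixity check (recording a checksum at ingest and re-verifying it later) as the example task; fixity checking is not named on the slide, and the function names `sha256_of`, `verify_fixity`, and `demo` are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_digest):
    """Return True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == expected_digest


def demo():
    """Hypothetical usage: record a digest, then re-check before and after a change."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "scan_0001.tif")  # illustrative file name
        with open(path, "wb") as handle:
            handle.write(b"example image bytes")
        recorded = sha256_of(path)
        before_change = verify_fixity(path, recorded)
        with open(path, "ab") as handle:
            handle.write(b"!")  # simulate silent corruption
        after_change = verify_fixity(path, recorded)
    return before_change, after_change
```

A macroservice would chain several such steps (for example, virus scan, fixity, format identification) behind one interface.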
Tools and Best Practices
• Digital Projects – Hardware:
– Equipment needed for digitizing existing physical materials and for creating new digital content.
• https://www.library.unt.edu/digital-projects-unit/equipment
• Digital Projects – Software:
– A variety of digitization software needed in daily digital imaging and processing operations
• https://www.library.unt.edu/digital-projects-unit/software
Tools and Best Practices
Digital Curation Workflows and Micro/Macro-Services
Source: A Digital POWRR Workshop: http://digitalpowrr.niu.edu/
• The future usability of any preserved digital content will depend in part on how well organized and integrated the body of content was when it was first ingested.
The Classic Information Retrieval (IR) Model
Source: (Modified from) Bates, M. J. (1989). The design of browsing and berrypicking techniques for the online search interface. Online Review, 13(5), 407-424.
Metadata Roles in Digital Curation
and/or Preservation
What is Metadata?
• Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.
Source: NISO. Understanding Metadata. http://www.niso.org/publications/press/UnderstandingMetadata.pdf
Types of Metadata
• Descriptive metadata describes a resource
for purposes such as discovery and
identification.
• Structural metadata indicates how compound
objects are put together, for example, how
pages are ordered to form chapters.
• Administrative metadata provides
information to help manage a resource, such
as when and how it was created, file type and
other technical information, and who can access it.
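The three metadata types can be pictured as sections of one record. The following is a minimal sketch, not a UNT schema: every field name and value in `record` is invented for illustration, and the helper functions are hypothetical.

```python
# Hypothetical record: field names and values are illustrative only.
record = {
    "descriptive": {
        # Supports discovery and identification.
        "title": "Example patent for a cotton planter",
        "creator": "Example, Jane",
        "subjects": ["Patents", "Agriculture"],
    },
    "structural": {
        # Records how the parts of a compound object are ordered.
        "pages": ["page_001.tif", "page_002.tif", "page_003.tif"],
    },
    "administrative": {
        # Supports management: creation date, file type, access.
        "created": "2016-05-17",
        "file_type": "image/tiff",
        "access": "public",
    },
}


def metadata_types(rec):
    """Return the metadata categories present in a record, sorted by name."""
    return sorted(rec.keys())


def page_sequence(rec):
    """Return (position, filename) pairs from the structural metadata."""
    return list(enumerate(rec["structural"]["pages"], start=1))
```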
• Appropriate metadata increases discoverability
• Owners or creators may know their materials well, but they may not have the required knowledge to appropriately describe their items or to assign appropriate subjects.
Metadata Roles in Digital Curation and/or Preservation
Capturing Metadata
Metadata Quality
• Metadata quality, consistency, and interoperability can have major implications for the discovery and long-term preservation of digital resources.
• Poor Metadata Quality:
• Ambiguities
• Poor recall
• Poor precision
• Inconsistency of search results
Metadata Quality: Factors Influencing Metadata Quality
• Local Requirements:
• Objects
• Granularity
• Functionality
• Collaborative Requirements:
• Diversity of Users
• Interoperability
• Digital Rights Issues
Digital Curation Workflows Considerations
Metadata Quality Considerations
• Determine level of quality required
• Determine nature of gap and how to close it
• Machine versus human error handling
• Compromise
• Prioritize
• Test the workflow
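Some of the quality checks listed above lend themselves to machine handling, leaving judgment calls (such as subject assignment) to human reviewers. The sketch below shows one possible automated check; the required fields and the accepted date pattern are assumptions made for illustration, not the workshop's guidelines.

```python
import re

# Assumed local requirement: these fields must be present and non-empty.
REQUIRED_FIELDS = ("title", "creator", "date")
# Assumed accepted date forms: YYYY, YYYY-MM, or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def check_record(record):
    """Return a list of problems a machine can detect in one metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing {field}")
    date = str(record.get("date", "")).strip()
    if date and not DATE_PATTERN.match(date):
        problems.append(f"date not in an accepted form: {date}")
    return problems
```

Running such a check over a batch of records before ingest is one way to test the workflow and to measure the gap between the current and the required level of quality.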
Determining the breadth and depth of your metadata
• Exhaustivity vs. Specificity
• How do we know when exhaustivity has gotten to a
degree where the index is no longer being
improved?
• Why inverse relationship?
• Metadata for multi-media items
• A picture is worth a thousand words
• “Thumbnail is the best metadata”: a single still image can describe complex events or stories.
A Phased Approach to Ensuring Long-Term Access to Digital Resources
Activity Time!
Activity: Metadata Creation
• You will practice with our real-life project: Patent Metadata
• Sign into the editing system: http://edit.texashistory.unt.edu
• You need to familiarize yourself with our metadata guidelines: http://www.library.unt.edu/digital-projects-unit/patent-metadata
• Our video tutorial provides an example record from start to finish and walks through the creation of a metadata record for such (patent) content: http://www.library.unt.edu/digital-projects-unit/patent-metadata#video-tutorial
1 2 3
A B
• A alone. All documents that have A.
– e.g., apples
– (Shade ???)
Activity: Boolean Operation
1 2 3
A B
• A NOT B.
– e.g., apples NOT oranges
– (Shade ???)
Activity: Boolean Operation …
1 2 3
A B
• A OR B.
– e.g., apples OR oranges
– (Shade ???)
Activity: Boolean Operation …
1 2 3
A B
• A AND B.
– e.g., apples AND oranges
– (Shade ???)
Activity: Boolean Operation …
Four basic Boolean operations:
1 2 3
A B
A alone. All documents that have A.
Shade 1 & 2. e.g., apples
1 2 3
A B
A AND B. Shade 2
apples AND oranges
1 2 3
A B
A OR B. Shade 1, 2, 3
apples OR oranges
1 2 3
A B
A NOT B. Shade 1
apples NOT oranges
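The four operations map directly onto set operations. In this minimal sketch the document IDs are invented, and each set stands for the documents that contain one search term.

```python
# Hypothetical index: which document IDs contain each term.
apples = {"doc1", "doc2", "doc3"}   # circle A
oranges = {"doc3", "doc4"}          # circle B

a_alone = apples             # A: regions 1 and 2
a_and_b = apples & oranges   # A AND B: region 2 (intersection)
a_or_b = apples | oranges    # A OR B: regions 1, 2, and 3 (union)
a_not_b = apples - oranges   # A NOT B: region 1 (difference)
```

AND narrows a result set, OR widens it, and NOT removes documents from it.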
Complex Statements
1 2 3
4 5 6
7
A B
C
• (A OR B) AND C
• Shade ???
• (A OR B) NOT C
• Shade ???
Complex Statements
1 2 3
4 5 6
7
A B
C
• (A OR B) AND C
• Shade 4,5,6
- (apples OR oranges) AND Texas
• (A OR B) NOT C
• Shade 1,2,3
- (apples OR oranges) NOT Texas
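The shading answers can be checked mechanically by listing which circles contain each numbered region. The region layout in `REGIONS` is an assumption inferred from the answers above (1 to 3 outside C, 4 to 6 inside C, 7 in C only); the `shade` helper is hypothetical.

```python
# Assumed layout: each region number maps to the circles that contain it.
REGIONS = {
    1: {"A"},
    2: {"A", "B"},
    3: {"B"},
    4: {"A", "C"},
    5: {"A", "B", "C"},
    6: {"B", "C"},
    7: {"C"},
}


def shade(predicate):
    """Return the region numbers whose circle membership satisfies the predicate."""
    return [number for number, circles in sorted(REGIONS.items()) if predicate(circles)]


# (A OR B) AND C
a_or_b_and_c = shade(lambda c: ("A" in c or "B" in c) and "C" in c)
# (A OR B) NOT C
a_or_b_not_c = shade(lambda c: ("A" in c or "B" in c) and "C" not in c)
```

The parentheses matter: without them, a search system may evaluate AND before OR and return a different set.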
Precision
• Precision is the ratio of the number of
relevant records retrieved to the total
number of irrelevant and relevant records
retrieved.
- Example: If 100 documents are retrieved and 50 of
those items are relevant to the user’s query, the
precision ratio is 50 to 100 (50%).
Recall
• Recall is the ratio of relevant records retrieved to the total number of relevant items potentially available.
- Example: If there are 100 documents in the library that are relevant to the user’s needs and the system retrieves 75 of them, then the recall ratio is 75 out of 100 (75%).
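Both ratios reduce to a single division, shown here with the numbers from the precision and recall examples. This is a minimal sketch; the function and parameter names are chosen for illustration.

```python
def precision(relevant_retrieved, total_retrieved):
    """Share of retrieved records that are relevant."""
    if total_retrieved == 0:
        raise ValueError("nothing was retrieved")
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved, total_relevant):
    """Share of all relevant records that were retrieved."""
    if total_relevant == 0:
        raise ValueError("no relevant records exist")
    return relevant_retrieved / total_relevant


precision_example = precision(50, 100)  # 50 relevant among 100 retrieved
recall_example = recall(75, 100)        # 75 retrieved of 100 relevant
```

Precision is measured against what the system returned, while recall is measured against what the collection holds, so recall usually has to be estimated in practice.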
Inverse Relationship of Precision and Recall
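The trade-off can be seen by reading further down a ranked result list: each extra result can only raise recall, but it tends to lower precision. The ranked list and the count of relevant documents below are invented for illustration, and the pattern is typical rather than guaranteed.

```python
# Hypothetical ranked results: True marks a relevant document.
ranked = [True, True, False, True, False, False, True, False, False, False]
TOTAL_RELEVANT = 5  # assumed: one relevant document is never retrieved


def precision_recall_at(k):
    """Return (precision, recall) when only the top k results are examined."""
    hits = sum(ranked[:k])
    return hits / k, hits / TOTAL_RELEVANT


# One (cutoff, precision, recall) row per cutoff.
curve = [(k, *precision_recall_at(k)) for k in (2, 4, 7, 10)]
```

In this sketch, moving the cutoff from 2 to 10 results raises recall from 40% to 80% while precision drops from 100% to 40%.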
Internet Archive: https://archive.org/
By: Jefferson Bailey, Director of Web Archiving, Internet Archive
Digital Preservation:
Theory and Practice
Levels of Preservation
Source: National Digital Stewardship Alliance (NDSA): http://www.digitalpreservation.gov/ndsa/activities/levels.html
• Many solutions and approaches
recommended by different communities:
– OAIS (Open Archival Information System)
• SIPs, AIPs, DIPs
– PLANETS
– PREMIS
– DCC
Digital Preservation: Best Practices
Various Best practices and Conceptual Data Models for Digital Curation or Preservation: OAIS
Source: Consultative Committee for Space Data Systems. Reference Model for an Open Archival Information System (OAIS): http://public.ccsds.org/publications/archive/650x0m2.pdf
• Reference Model for an Open Archival Information System (OAIS) Functional Entities
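In OAIS terms, content moves through three information packages: a Submission Information Package (SIP) from the producer, an Archival Information Package (AIP) held by the archive, and a Dissemination Information Package (DIP) delivered to the consumer. The sketch below is a loose illustration of that flow only; the class fields and the `ingest` and `disseminate` functions are hypothetical simplifications, not part of the reference model's specification.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what a producer hands to the archive."""
    files: list
    producer_metadata: dict


@dataclass
class AIP:
    """Archival Information Package: what the archive stores and preserves."""
    files: list
    descriptive_metadata: dict
    preservation_events: list = field(default_factory=list)


@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    files: list
    descriptive_metadata: dict


def ingest(sip):
    """Turn a SIP into an AIP and log the event (simplified)."""
    aip = AIP(files=list(sip.files), descriptive_metadata=dict(sip.producer_metadata))
    aip.preservation_events.append("ingested")
    return aip


def disseminate(aip, wanted):
    """Build a DIP containing only the requested files (simplified)."""
    selected = [name for name in aip.files if name in wanted]
    return DIP(files=selected, descriptive_metadata=dict(aip.descriptive_metadata))
```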
Various Best practices and Conceptual Data Models for Digital Curation or Preservation: Planets
Source: The Open Planets Foundation: http://www.openplanetsfoundation.org/
PLANETS (or the Preservation and Long-
term Access through Networked Services)
project addressed core digital preservation
challenges. The primary goal for
Planets was to build practical services and tools to help ensure long-term access to digital cultural and
scientific assets.
Various Best practices and Conceptual Data Models for Digital Curation or Preservation: PREMIS
Source: PREMIS: http://www.loc.gov/standards/premis/
Digital materials require constant maintenance and migration to new formats as technology changes. In order to survive into the future, digital objects need preservation metadata that can exist independently from the systems which were used to create them. Without preservation metadata, digital material will be lost.
PREMIS (or PREservation
Metadata: Implementation
Strategies) is an international working group concerned with
developing metadata for use in digital preservation.
Various Best practices and Conceptual Data Models for Digital Curation or Preservation: DCC
Source: DCC: http://www.dcc.ac.uk/
The Digital Curation Centre (DCC) was
established to help solve the extensive challenges of digital
preservation and digital curation and to
lead research, development, advice, and support services for higher education
institutions in the United Kingdom.
No single Approach or Model Addresses all Aspects of Digital Curation or Preservation Issues
Copyright in Digital Library:
By: Michael Aldrich, University Librarian at BYU-Hawaii
Challenges and Opportunities:
Concluding Remarks
Challenges of Digital Curation
• Rate of creation of new data and data sets
• Storage format evolution and obsolescence
• Maintaining accessibility to data through links and search results
• Comparability of semantic and ontological definitions of data sets
• Different communities are inclined to share items at different levels of normalization
Challenges of Digital Curation
Response to Digital Curation Challenges
Various communities are engaged in different collaborative activities to prepare future professionals:
• Dedicated academic courses and programs (DCM at UNT, DCEP at UIUC, Data Curation at UNC-CH)
• Dedicated symposia and workshops (DigCCurr, DataRes)
• Dedicated publications (International Journal of Digital Curation)
• TRAC Certification (Trustworthy Repositories Audit & Certification) and TDR ISO 16363 (Trustworthy Digital Repository ISO Standard)
• Specialized research institutions and consortia (DCC, iCAMP, DataRes, etc.)
UNT’s TRAC Conformance Document: http://www.library.unt.edu/digital-libraries/trusted-digital-repository
• A good digital collection is sustainable over time, which is to say its individual items are curated and actively managed during their entire lifecycle in a manner that is both trusted and cost-effective.
• Successful digital curation will mitigate digital obsolescence, keeping the information accessible to users indefinitely.
Summary
Conclusion
• Considering the multiple stakeholders in the digital ecosystem, a collaborative approach is the only way to address scholarly communication and digital curation challenges!
Conclusion
• Yes, with others, you can accomplish what you cannot accomplish alone.
• “When spider webs unite, they can tie up a lion.” (Ethiopian proverb)
Even though many of you may not become digital curators, we hope this workshop will make you (or has already made you) aware of the
nuances of digital curation and preservation that are
often overlooked!
Glossary
• Aggregator – A service that harvests content or metadata from multiple organizations to provide another mode of access.
• Born-Digital – “An item is born-digital if it has been generated entirely electronically by using a word processor” and/or electronic hardware such as a digital camera.
• Digital Curation – The management, preservation, and enrichment of digital resources.
• Digital Preservation – “The management process of ensuring digital objects and information are accessible over the long term. Development of standards, format compatibility, format migration, and systems interoperability are important aspect of this process.”
• Metadata – Information about an object.
• Open Access (OA) – “Information readily available on the Web at no cost to users and without access restrictions.”
Source: Guidance Document for Lifecycle Management of ETDs (their complete glossary is found at): http://digital.library.unt.edu/ark:/67531/metadc282598
Digital Curation / Preservation - Post-Workshop Survey
1. Name: ____________________________________
2. How satisfied are you with the workshop overall?
▢ Not satisfied ▢ Slightly satisfied ▢ Satisfied ▢ Very satisfied ▢ Extremely satisfied
3. Having completed the workshop, please rate your knowledge of digital curation and lifecycle management concepts for digital resources:
▢ No knowledge ▢ Some knowledge ▢ Average knowledge ▢ Above average knowledge ▢ Expert knowledge
4. What was the most important information presented at the workshop?
5. What information was not presented at the workshop that should have been?
6. Any other comments:__________________________
References & Web Sites Consulted
• American Library Association (2007). Principles for Digital Content. Retrieved May 10th, 2016 from: www.ala.org/ala/aboutala/offices/oitp/PDFs/Principlesfinalfinal.pdf
• Bates, M. J. (1989). The design of browsing and berrypicking techniques for the online search interface. Online Review, 13(5), 407-424.
• Digital Curation Centre (DCC) (2007). What is Digital Curation? Retrieved May 10th, 2016 from: http://www.dcc.ac.uk/about/what
• D-Lib Magazine. Retrieved May 10th, 2016 from: http://www.dlib.org/
• Godby, C. J. and Denenberg, R. (2015). Common Ground: Exploring Compatibilities Between the Linked Data Models of the Library of Congress and OCLC. Retrieved May 10th, 2016 from: http://www.oclc.org/content/dam/research/publications/2015/oclcresearch-loc-linked-data-2015.pdf
• IFLA/UNESCO Manifesto for Digital Libraries. Retrieved May 10th, 2016 from: http://www.ifla.org/publications/iflaunesco-manifesto-for-digital-libraries
• Netcraft (2016). April 2016 Web Server Survey. Retrieved May 10th, 2016 from: http://news.netcraft.com/archives/web_server_survey.html
• NISO (2004). Understanding Metadata. Retrieved May 10th, 2016 from: http://www.niso.org/publications/press/UnderstandingMetadata.pdf
• OCLC (2007). Sharing, Privacy and Trust in Our Networked World. Retrieved May 10th, 2016 from: http://www.oclc.org/reports/pdfs/sharing.pdf
• TechSmith, Co. (2008). “UX 2.0: Any User, Any Time, Any Channel.” Retrieved May 10th, 2016 from: http://download.techsmith.com/morae/docs/UserExperience2_0.pdf
• UNT Libraries Metadata Initiative page. Retrieved May 10th, 2016 from: http://www.library.unt.edu/digitalprojects/metadata
• UNT Libraries Patent Metadata Guide. Retrieved May 10th, 2016 from: http://www.library.unt.edu/digital-projects-unit/patent-metadata
• UNT Libraries’ Digital Projects – Hardware (Equipment). Retrieved May 10th, 2016 from: https://www.library.unt.edu/digital-projects-unit/equipment
• UNT Libraries’ Digital Projects – Software. Retrieved May 10th, 2016 from: https://www.library.unt.edu/digital-projects-unit/software