Pen to Pixel: Bringing Appropriate Technologies to Digital Manuscript Philology
DESCRIPTION
Digital representation of medieval manuscripts and their key elements – ranging from beautiful illuminations to ancient hidden diagrams and texts – poses significant challenges for the application of appropriate technologies that are efficient and useful to scholars. While users and institutions tend to focus on the technologies and their technical capabilities, one of the most significant elements in the development of digital representations of manuscripts is the ability to share and archive digital data for philology, scholarship, and preservation research and analysis. Large datasets need to be created and archived with clear storage and access procedures to ensure data integrity and full knowledge of the digital content. Only with common standards, work processes and access can advanced digitization technologies be used for the study of medieval manuscripts in libraries. These technologies are in use at institutions ranging from the ancient library of St. Catherine’s Monastery in the Sinai to the Library of Congress, the Walters Art Museum and the University of Pennsylvania Library in the United States. Wherever they are located, each is grappling with the challenges of collecting and preserving digital information from medieval manuscripts and codices for future generations. These libraries use advanced camera systems to capture high-resolution images of manuscripts. Some of these institutions are also conducting spectral imaging studies of manuscripts, with advanced collection and digital processing, to reveal erased information – such as the earliest copies of Archimedes’ diagrams and treatises – without damaging the upper layer of text and artwork. These technologies yield large collections of quality digital images for access and study, but the data that becomes the digital counterpart must be effectively stored, managed and preserved to be truly useful for study.
Integrating complex sets of digital images and hosting them on the Web for global users poses a complex set of challenges.
TRANSCRIPT
Pen to Pixel: Bringing Appropriate Technologies to Digital Manuscript Philology
On behalf of the Walters Art Museum Digitization Team, especially:
Lynley Herbert, Ariel Tabritha, Diane Bockrath, Kimber Wiegand, Doug Emery
Supported by the US National Endowment for the Humanities
Michael B. Toth
R. B. Toth Associates
rbtoth.com
http://www.thedigitalwalters.org/
Walters Art Museum
W.562, 2b Koran
9th century AH / 15th CE
Walters Art Museum, Baltimore, Maryland
Digital Imaging System
St. Catherine’s Monastery, Sinai
Spectral Imaging System
US Library of Congress
Spectral Imaging System
Advanced Digitization
Applied Science & Technology
…to Manuscript Studies
Manuscript Studies 20th Century and prior
Manuscript Studies 21st Century
Obscured Information
Illuminated Manuscripts
• Complex, Changing Technical Climate
• Range of Digital Products & Formats
• Need for Integrity of Entire Data Set
• Demand for Continual & Faster Access
• User Repurposing of Content
• Restrictions on Access and Use
Digital Manuscript Challenges
“…an ultimate challenge to creators and users of digital tools wishing to produce useful and reliable digital counter-parts to these medieval sources of knowledge and testimonies of intellectual creativity.”
Simplicity of Data
1. Access to data
• By People
• By Machines
2. Licensing
• Global Storage & Access
Walters Online Manuscripts
The Digital Walters
http://www.thedigitalwalters.org/
Islamic Manuscripts of the Walters Art Museum: A Digital Resource (2008 to 2011)
Parchment to Pixel: Creating a Digital Resource of Medieval Manuscripts (2010 to 2012)
The Digital Walters
                           Islamic    Parchment to Pixel    Total
No. of Manuscripts         172        107                   279
No. of TEI Descriptions    170        37                    207
Distinct Images            46,857     34,084                80,941
Image Files                187,266    134,698               321,964
Data Size                  5.99 TB    4.09 TB               10.08 TB
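As a quick cross-check of the figures above, the totals and the ratio of image files to distinct images can be recomputed from the two per-project columns. The sketch below uses only the numbers on this slide; the dictionary keys and function name are illustrative, not part of the Walters data set.

```python
# Per-project figures copied from the slide (key names are illustrative).
projects = {
    "Islamic": {
        "manuscripts": 172, "tei": 170,
        "images": 46_857, "files": 187_266, "tb": 5.99,
    },
    "Parchment to Pixel": {
        "manuscripts": 107, "tei": 37,
        "images": 34_084, "files": 134_698, "tb": 4.09,
    },
}


def column_totals(projects):
    """Sum every column across projects, rounding to 2 places for the TB column."""
    keys = next(iter(projects.values())).keys()
    return {k: round(sum(p[k] for p in projects.values()), 2) for k in keys}


total = column_totals(projects)
files_per_image = total["files"] / total["images"]

print(total)
print(f"{files_per_image:.2f} image files per distinct image")
```

The ratio comes out just under four files per distinct image, which would be consistent with each capture being stored as an archival TIFF plus three derivatives.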
Over 10 Terabytes of Data
. . . and growing!
Data & Metadata
• Long-term data set viability beyond the lifetime of current technologies
– Adherence to existing broadly accepted standards
– Simple, flat metadata records
• Integration of metadata with images, supporting data and scholarly products
Cataloging & Metadata
• Metadata Integrated with Digital Object
– Adherence to broadly accepted standards
– Simple, flat metadata records
• Persistent Identifiers
• Accepted Standards
– Standardized Vocabularies
– Metadata Schema
– XML to support conversion to other formats (e.g. MARC, MODS, EAD)
• Documentation & Preserve Standards
Data Integrity
• Image
• XML Metadata
• TEI Catalog
• License
• Cataloging
• Metadata
• File Format
• Imaging and Color
• Resolution or Fidelity
• Vocabulary and Geographic Names
• Foreign Language and English
• Intellectual Property
• Storage
• Quality and Quality Control
• Others
Standardize
Preservation & Access
Owner of Archimedes Palimpsest:
• Preserve data in “flat files”
– Do not tailor data for Web interfaces
• Host data on “spinning disks”
– Did not want digital product to end up on media that could become obsolete, with limited access
• Make broadly available on Internet
– Do not place restrictions on use
Data Layout
ReadMe
TechnicalReadMe
Data
Supplemental
AccessWaltersManuscripts
AccessOtherBooks
Digital Walters File Structure
• Manuscript level: all information that applies to the manuscript as a whole, including an abstract, physical dimensions and features of the manuscript, like size, extent, collation, and binding.
• Manuscript item level: all information that applies to the intellectual divisions of the book, including the titles of works, rubrics, incipits, colophons, and layout information about the written surface.
• Manuscript piece level: all information for the items imaged (i.e., binding pieces, flyleaves, and folios), including item name, folio number, and, for illuminated pieces, detailed descriptions of the art work.
Cataloging Information
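The three cataloging levels can be pictured as a small nested data model. The following is a minimal sketch only: the class and field names are hypothetical, the values are placeholders, and it is not the TEI encoding the project actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Piece:
    """Piece level: one imaged item (binding piece, flyleaf, or folio)."""
    name: str
    folio_number: Optional[str] = None
    art_description: Optional[str] = None  # filled in only for illuminated pieces


@dataclass
class Item:
    """Item level: one intellectual division of the book."""
    title: str
    incipit: Optional[str] = None
    colophon: Optional[str] = None


@dataclass
class Manuscript:
    """Manuscript level: information that applies to the book as a whole."""
    shelf_mark: str
    abstract: str
    extent: str
    items: List[Item] = field(default_factory=list)
    pieces: List[Piece] = field(default_factory=list)


def illuminated(ms: Manuscript) -> List[Piece]:
    """Pieces that carry a description of art work."""
    return [p for p in ms.pieces if p.art_description]


# Placeholder values, not real catalog data.
ms = Manuscript(
    shelf_mark="W.000",
    abstract="(placeholder abstract)",
    extent="(placeholder extent)",
    items=[Item(title="(placeholder work title)")],
    pieces=[
        Piece(name="Upper board outside"),
        Piece(name="fol. 1a", folio_number="1a",
              art_description="(placeholder description)"),
    ],
)
print([p.name for p in illuminated(ms)])
```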
Dublin Core Metadata Initiative Element Set
Manuscript DCMI Elements
• Identifier: the shelf mark for manuscripts (e.g., W.582), and the image serial number for images (e.g., W582_000001)
• Creator: always the Walters Art Museum
• Contributor: one entry for each project participant responsible for the creation of the manuscript’s data set
• Date: the date of web page or image creation
• Title: the title of the manuscript (e.g., “Walters Ms. W.579, Prayer”)
• Description: a description of the manuscript or image
• Source: source of the object used to create the image or image collection
• Type: Image for individual images; Collection for all images of a manuscript
• Format: image/tiff for images, text/html for a manuscript web page
• Subject: keywords describing the manuscript or imaged folio
• Rights: license and usage terms
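To make the element list concrete, here is a hedged sketch of a Dublin Core record for a single image, built with Python's standard library. The namespace URI is the standard Dublin Core 1.1 element namespace; the `record` wrapper element, the helper names, and the contributor values are assumptions for illustration, and the serial-number pattern is inferred from the one example on the slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core 1.1 element namespace
ET.register_namespace("dc", DC_NS)


def image_serial(shelf_mark: str, sequence: int) -> str:
    """Build an image serial number, e.g. ("W.582", 1) -> "W582_000001".

    The pattern is inferred from the slide's example, not from a published rule.
    """
    return f"{shelf_mark.replace('.', '')}_{sequence:06d}"


def dc_record(fields: dict) -> ET.Element:
    """Wrap Dublin Core elements in a hypothetical <record> container."""
    record = ET.Element("record")
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for v in values:  # repeatable elements, e.g. one contributor per entry
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = v
    return record


image_fields = {
    "identifier": image_serial("W.582", 1),
    "creator": "Walters Art Museum",
    "contributor": ["(participant A)", "(participant B)"],  # placeholders
    "type": "Image",
    "format": "image/tiff",
    "rights": "Creative Commons Attribution-Share Alike 3.0 Unported License",
}

record = dc_record(image_fields)
print(ET.tostring(record, encoding="unicode"))
```

A manuscript-level record would follow the same shape, with the shelf mark as identifier, `Collection` as type, and `text/html` as format.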
License and use: UPDATED! 6 February 2013
All Walters manuscript images and descriptions provided here are licensed for use under the Creative Commons Attribution-Share Alike 3.0 Unported License and the GNU Free Documentation License.
You are free to download and use the images and descriptions on this website under the licenses named above. You do not need to apply to the Walters prior to using the images. We ask only that you cite the source of the images as the Walters Art Museum.
Additionally, we request that a copy of any work created using these materials be sent to the Curator of Manuscripts and Rare Books at the Walters Art Museum, 600 N. Charles Street, Baltimore, MD 21201, [email protected]
These terms mark a change from our previous license, which placed a noncommercial restriction on the use of these materials. The noncommercial restriction no longer applies, and this license supersedes the previously advertised license, and replaces that found in many of the archival TIFF image headers.
This change follows the Walters Art Museum’s licensing policy. More information on the Walters’ intellectual property policy can be found on the Walters website: http://art.thewalters.org/license/.
• /manuscript: top-level container of metadata for a manuscript’s images
• /manuscript/image_object: description of the manuscript, primarily Dublin Core metadata, with the number of images captured in the imageCount element
• /manuscript/images: container for the manuscript’s image data
• /manuscript/images/image: information about a single capture and its derivatives, including:
– /manuscript/images/image/index: the order of the image in the set, beginning with 0
– /manuscript/images/image/image_subject: the folio number or name of the piece imaged
• /manuscript/images/image/capture: detailed information about the image’s capture extracted from the imaging software database
• /manuscript/images/image/masterDerivation: description of how the archival TIFF image was generated from the camera raw file, including cropping and color correction information
• /manuscript/images/image/jhoveData: XML output of the JHOVE utility run on the archival TIFF file
• /manuscript/images/image/derivative: three elements containing cropping and scaling information needed to generate the 300 PPI, SAP, and thumbnail files from the archival TIFF
Metadata xml Information
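The element paths listed for the metadata XML suggest how a program might walk such a file. The sketch below parses a tiny hand-made sample with Python's standard library and checks it against two of the stated conventions (the image count and the zero-based index). The sample document is hypothetical: the placement of imageCount directly inside image_object, the empty child elements, and the subject strings are assumptions, not an excerpt from a real Walters file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped after the element paths on the slide.
SAMPLE = """
<manuscript>
  <image_object>
    <imageCount>2</imageCount>
  </image_object>
  <images>
    <image>
      <index>0</index>
      <image_subject>Upper board outside</image_subject>
      <capture/><masterDerivation/><jhoveData/>
      <derivative/><derivative/><derivative/>
    </image>
    <image>
      <index>1</index>
      <image_subject>fol. 1a</image_subject>
      <capture/><masterDerivation/><jhoveData/>
      <derivative/><derivative/><derivative/>
    </image>
  </images>
</manuscript>
"""


def summarize(xml_text):
    """Return (declared image count, [(index, subject), ...] sorted by index)."""
    root = ET.fromstring(xml_text.strip())  # root element is <manuscript>
    declared = int(root.findtext("image_object/imageCount"))
    images = sorted(
        (int(img.findtext("index")), img.findtext("image_subject"))
        for img in root.findall("images/image")
    )
    return declared, images


def is_consistent(declared, images):
    """Check the count matches and indices run 0, 1, 2, ... without gaps."""
    return declared == len(images) and [i for i, _ in images] == list(range(declared))


declared, images = summarize(SAMPLE)
print(declared, images, is_consistent(declared, images))
```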
xml Model
[Diagram: /manuscript contains /image_object and /images; /images contains multiple /image elements, each with its own /capture element.]
Preserve Standards
• Transfer & archive digital data for research and analysis by the curatorial, scholarly, preservation and imaging communities
• Clear access procedures
– Ensuring data integrity for digital storage repositories
– Preventing introduction of mislabeled and incorrect metadata
Standard Workflows for Data Management
Quality Control
• Data Quality
– Automate data handling to avoid error
– Audit trail for manual data manipulation
• Quality Management
– Implement processes for quality review
– Verification and Validation
• Documentation
– Define metrics & quality goals
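One common way to automate this kind of verification is a checksum manifest: record a digest for every file, re-check the digests later, and log any manual change. The sketch below illustrates that general technique; it is not a description of the Walters workflow, and the function names, audit-record fields, and stand-in file content are all assumptions.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large image files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the files that are missing or whose content has changed."""
    root = Path(root)
    return sorted(
        rel for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha256_of(root / rel) != expected
    )


def audit_entry(who, action, files):
    """One audit-trail record for a manual change (field names are illustrative)."""
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "who": who,
        "action": action,
        "files": list(files),
    }


# Demonstration on a throwaway directory with a stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "W582_000001.tif"
    image.write_bytes(b"stand-in for image data")
    manifest = build_manifest(tmp)
    clean = verify(tmp, manifest)          # nothing has changed yet
    image.write_bytes(b"edited by hand")   # simulate a manual change
    changed = verify(tmp, manifest)
    log = [audit_entry("cataloger", "manual edit", changed)]

print(clean, changed, log[0]["action"])
```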
• Internal Digital Asset Management System
– Internal Server
• Image Files
• Catalog Data
• Access Infrastructure
• Security
• Backup
– Internet Systems Consortium
Data Management System
IDR Access Model
[Diagram labels: Dublin Core Metadata Initiative (DCMI); Digital Representation, e.g. TIFF Image; TEI; Johns Hopkins Metadata Application; Metadata (METS); Preservation Metadata: Implementation Strategies (PREMIS); Event; Request Event; Agent]
Preservation Heresy:
The Digital information is closer to the original than the Artifact itself
Preservation of the Data
“I don’t use the parchment. The parchment is gone! As far as the scholars are concerned, there is no parchment. You only work from digital images on the laptop – that’s the only thing that matters for the reading.” – Dr. Reviel Netz, 14 Jan WYPR
What Will Happen to the Data?
“There’s a big technical issue that has me worried. The information on the Net is not all simple text. It’s structured, whether it’s Microsoft Word documents or PDFs. That means the information is only really accessible if you understand how to interpret the bits. What happens when files are there and we don’t know how to interpret them anymore?
“If you have a CD but the form isn’t known anymore. I have 5 1/4-in. diskettes, but nothing to read them. Even 3 1/2-in. diskette readers are becoming hard to come by. The physical source media change. We may lose the ability to read them.”
Vint Cerf, Google Internet Evangelist, recipient of the US Presidential Medal of Freedom, and co-designer of the basic architecture of the Internet. July 30, 2007 (Computerworld)
Digital Preservation
Impermanence of Digitized Data
• Dynamic technology, media and formats
• Rapid obsolescence
• Regular reformatting required
• Ensure utility of data
• Broad distribution to service providers
• Standardized formats & encoding
License
All artworks in the photographs are in public domain due to age. The photographs of two-dimensional objects are also in the public domain. Photographs of three-dimensional objects and all descriptions have been released under the Creative Commons Attribution-Share Alike 3.0 Unported License and the GNU Free Documentation License.
You are free to download and use the images and descriptions on this website under the licenses named above, but if you desire digital images at a higher resolution, for scholarly or commercial publication, please contact our photo services department.
Trusted Digital Repository
• Compliance with the Reference Model for an Open Archival Information System (OAIS)
• Administrative responsibility
• Organizational viability
• Financial sustainability
• Technological and procedural suitability
• System security
• Procedural accountability
Michael B. Toth
R. B. Toth Associates
rbtoth.com
Future Opportunities