Pen to Pixel: Bringing Appropriate Technologies to Digital Manuscript Philology


Uploaded by michael-b-toth on 09-May-2015


DESCRIPTION

Digital representations of medieval manuscripts and their key elements, ranging from beautiful illuminations to ancient hidden diagrams and texts, pose significant challenges for the application of appropriate technologies that are efficient and useful to scholars. While users and institutions tend to focus on the technologies and their technical capabilities, one of the most significant elements in the development of digital representations of manuscripts is the ability to share and archive digital data for philology, scholarship, and preservation research and analysis. Large datasets need to be created and archived with clear storage and access procedures to ensure data integrity and full knowledge of the digital content. Only with common standards, work processes and access can advanced digitization technologies be used for the study of medieval manuscripts in libraries. Such technologies are being used in institutions ranging from the ancient library of St. Catherine's Monastery in the Sinai to the Library of Congress, the Walters Art Museum and the University of Pennsylvania Library in the United States. Wherever they are located, each of these institutions is grappling with the challenges of collecting and preserving digital information from medieval manuscripts and codices for future generations. These libraries use advanced camera systems to capture high-resolution images of manuscripts. Some are also conducting spectral imaging studies, with advanced collection and digital processing, to reveal erased information, such as the earliest copies of Archimedes' diagrams and treatises, without damaging the upper layer of text and artwork. These technologies yield large collections of quality digital images for access and study, but the data that become the digital counterpart must be effectively stored, managed and preserved to be truly useful for study.
Integrating complex sets of digital images and hosting them on the Web for global users poses a further set of challenges.

TRANSCRIPT

Page 1: Pen to Pixel: Bringing Appropriate Technologies to Digital Manuscript Philology

Pen to Pixel: Bringing Appropriate Technologies to Digital Manuscript Philology

On behalf of the Walters Art Museum Digitization Team, especially:

Lynley Herbert, Ariel Tabritha, Diane Bockrath, Kimber Wiegand, Doug Emery

Supported by the US National Endowment for the Humanities

Michael B. Toth, R. B. Toth Associates

rbtoth.com

http://www.thedigitalwalters.org/

Page 2

Walters Art Museum

W.562, 2b Koran

9th century AH / 15th CE

Walters Art Museum, Baltimore, Maryland

Digital Imaging System

Page 3

St. Catherine’s Monastery, Sinai

Spectral Imaging System

Page 4

US Library of Congress

Spectral Imaging System

Page 5

Advanced Digitization

Page 6

Applied Science & Technology

Page 7

…to Manuscript Studies

Page 8

Manuscript Studies 20th Century and prior

Page 9

Manuscript Studies 21st Century

Page 10

Obscured Information

Page 11

Illuminated Manuscripts

Page 12

Digital Manuscript Challenges

• Complex, Changing Technical Climate

• Range of Digital Products & Formats

• Need for Integrity of Entire Data Set

• Demand for Continual & Faster Access

• User Repurposing of Content

• Restrictions on Access and Use

“…an ultimate challenge to creators and users of digital tools wishing to produce useful and reliable digital counter-parts to these medieval sources of knowledge and testimonies of intellectual creativity.”

Page 13

Simplicity of Data

1. Access to data
  • By People
  • By Machines

2. Licensing

• Global Storage & Access

Page 14

Walters Online Manuscripts

Page 15

The Digital Walters: http://www.thedigitalwalters.org/

Page 16

Islamic Manuscripts of the Walters Art Museum: A Digital Resource

(2008 to 2011)

Page 18

Parchment to Pixel: Creating a Digital Resource of Medieval Manuscripts

(2010 to 2012)

Page 20

The Digital Walters         Islamic  Parchment to Pixel     Total
No. of Manuscripts              172                 107       279
No. of TEI Descriptions         170                  37       207
Distinct Images              46,857              34,084    80,941
Image Files                 187,266             134,698   321,964
Data Size                   5.99 TB             4.09 TB  10.08 TB

Over 10 Terabytes of Data

. . . and growing!
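The totals in this table can be cross-checked mechanically. The minimal Python sketch below re-adds the two project columns and compares them with the published Total column; the figures are copied from the table, while the dictionary layout and field names are our own assumptions.

```python
# Figures transcribed from the Digital Walters table; the dictionary layout
# and field names are our own, not part of the Digital Walters data.
PROJECTS = {
    "Islamic": {"manuscripts": 172, "tei": 170, "images": 46_857,
                "files": 187_266, "tb": 5.99},
    "Parchment to Pixel": {"manuscripts": 107, "tei": 37, "images": 34_084,
                           "files": 134_698, "tb": 4.09},
}

# Published "Total" column, used to cross-check the per-project figures.
PUBLISHED_TOTALS = {"manuscripts": 279, "tei": 207, "images": 80_941,
                    "files": 321_964, "tb": 10.08}


def column_total(field: str) -> float:
    """Sum one field across both projects, rounding away float noise in TB."""
    return round(sum(p[field] for p in PROJECTS.values()), 2)


def mismatched_totals() -> list:
    """Return the fields whose computed sum differs from the published total."""
    return [f for f, total in PUBLISHED_TOTALS.items() if column_total(f) != total]


def files_per_image(project: str) -> float:
    """Average number of stored files per distinct image in one project."""
    p = PROJECTS[project]
    return p["files"] / p["images"]


if __name__ == "__main__":
    print("Mismatched totals:", mismatched_totals() or "none")
    for name in PROJECTS:
        print(f"{name}: {files_per_image(name):.2f} files per distinct image")
```

All five published totals match the sums of the two project columns. The ratio of image files to distinct images comes out just under four for both projects; that is consistent with, though not stated on these slides as, one archival TIFF per capture plus the three derivatives (300 PPI, SAP and thumbnail) described on Page 37.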

Page 21

Data & Metadata

• Long-term data set viability beyond the lifetime of current technologies
  – Adherence to existing, broadly accepted standards
  – Simple, flat metadata records

• Integration of metadata with images, supporting data and scholarly products

Page 22

Cataloging & Metadata

• Metadata Integrated with Digital Object
  – Adherence to broadly accepted standards
  – Simple, flat metadata records

• Persistent Identifiers

• Accepted Standards
  – Standardized Vocabularies
  – Metadata Schema
  – XML to support conversion to other formats (e.g., MARC, MODS, EAD)

• Document & Preserve Standards

Page 23

Data Integrity

• Image
• XML Metadata
• TEI Catalog
• License
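One common way to make the integrity of these four components checkable is a fixity manifest: a checksum recorded for every file and recomputed later to detect silent corruption or accidental edits. The slides do not say how the Digital Walters verifies integrity, so the following is only a sketch, assuming SHA-256 checksums over every file under a data-set root; the function names and the sample file name are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archival TIFFs never load whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map every file under root (as a POSIX relative path) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict) -> list:
    """List manifest entries that are now missing or whose checksum changed."""
    current = build_manifest(root)
    return sorted(name for name, checksum in manifest.items()
                  if current.get(name) != checksum)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "W582_tei.xml").write_text("<TEI/>")  # hypothetical file name
        manifest = build_manifest(root)
        (root / "W582_tei.xml").write_text("<TEI>edited</TEI>")
        print(verify(root, manifest))  # ['W582_tei.xml']
```

Streaming in chunks keeps memory use flat for multi-gigabyte images, and keying the manifest on relative paths means it stays valid when the whole data set is copied to another disk.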

Page 24

Standardize

• Cataloging
• Metadata
• File Format
• Imaging and Color
• Resolution or Fidelity
• Vocabulary and Geographic Names
• Foreign Language and English
• Intellectual Property
• Storage
• Quality and Quality Control
• Others

Page 25

Preservation & Access

Owner of Archimedes Palimpsest:

• Preserve data in “flat files”
  – Do not tailor data for Web interfaces

• Host data on “spinning disks”
  – Did not want the digital product to end up on media that could become obsolete, with limited access

• Make broadly available on the Internet
  – Do not place restrictions on use

Page 26

Data Layout

• ReadMe
• TechnicalReadMe
• Data
• Supplemental
• Access Walters Manuscripts
• Access Other Books
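Because the data set is meant to be read as flat files rather than through a Web interface, a simple presence check on the top-level layout can catch an incomplete copy before it is archived. A minimal sketch, assuming hypothetical names modelled on the slide labels (`ReadMe.txt`, `TechnicalReadMe.txt`, `Data/`, `Supplemental/`); the real Digital Walters file and folder names may differ, and the access entries are left out.

```python
from pathlib import Path

# Hypothetical names modelled on the Data Layout slide labels; the actual
# Digital Walters file and folder names may differ.
REQUIRED_FILES = ("ReadMe.txt", "TechnicalReadMe.txt")
REQUIRED_DIRS = ("Data", "Supplemental")


def layout_problems(root: Path) -> list:
    """Report required top-level entries that are absent from a data set."""
    problems = [f"missing file: {name}" for name in REQUIRED_FILES
                if not (root / name).is_file()]
    problems += [f"missing folder: {name}/" for name in REQUIRED_DIRS
                 if not (root / name).is_dir()]
    return problems
```

Keeping the check to plain file-system tests means it needs nothing beyond the files themselves, in keeping with the principle of not tailoring the data to any particular interface.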

Page 27

Digital Walters File Structure

Page 33

Cataloging Information

• Manuscript level: all information that applies to the manuscript as a whole, including an abstract and the physical dimensions and features of the manuscript, such as size, extent, collation, and binding.

• Manuscript item level: all information that applies to the intellectual divisions of the book, including the titles of works, rubrics, incipits, colophons, and layout information about the written surface.

• Manuscript piece level: all information for the items imaged (i.e., binding pieces, flyleaves, and folios), including item name, folio number, and, for illuminated pieces, detailed descriptions of the artwork.
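The three cataloging levels nest naturally: a manuscript holds its intellectual items and its imaged pieces. The Digital Walters keeps these descriptions as TEI XML; the Python dataclasses below are only an illustration of that nesting, with a small, hypothetical subset of fields and hypothetical class names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Piece:
    """Piece level: one imaged binding piece, flyleaf or folio."""
    name: str                              # folio number or piece name
    art_description: Optional[str] = None  # only for illuminated pieces


@dataclass
class Item:
    """Item level: one intellectual division of the book, such as a work."""
    title: str
    incipit: Optional[str] = None
    colophon: Optional[str] = None


@dataclass
class Manuscript:
    """Manuscript level: information about the book as a whole."""
    shelf_mark: str
    abstract: str
    extent: str
    items: List[Item] = field(default_factory=list)
    pieces: List[Piece] = field(default_factory=list)

    def illuminated_pieces(self) -> List[Piece]:
        """Pieces that carry a description of their artwork."""
        return [p for p in self.pieces if p.art_description]
```

Separating items from pieces mirrors the slide: a work can span many folios, and a folio can carry parts of several works, so neither list is nested inside the other.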

Page 34

Dublin Core Metadata Initiative Element Set

Page 35

Manuscript DCMI Elements

• Identifier: the shelf mark for manuscripts (e.g., W.582), and the image serial number for images (e.g., W582_000001)

• Creator: always the Walters Art Museum

• Contributor: one entry for each project participant responsible for the creation of the manuscript’s data set

• Date: the creation date of the web page or image

• Title: the title of the manuscript (e.g., “Walters Ms. W.579, Prayer”)

• Description: a description of the manuscript or image

• Source: source of the object used to create the image or image collection

• Type: Image for individual images; Collection for all images of a manuscript

• Format: image/tiff for images, text/html for a manuscript web page

• Subject: keywords describing the manuscript or imaged folio

• Rights: license and usage terms
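Because the DCMI elements form a simple, flat record, an image's metadata can be serialized with nothing more than a standard XML library. A minimal sketch using Python's `xml.etree.ElementTree` and the standard Dublin Core element namespace (`http://purl.org/dc/elements/1.1/`); the wrapper element name `record` and the shortened rights string are assumptions, and the Walters' own XML may wrap these elements differently.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(fields: Dict[str, List[str]], root_tag: str = "record") -> ET.Element:
    """Build a flat record with one dc:* child per value, in the order given.

    The wrapper tag name "record" is a hypothetical placeholder.
    """
    root = ET.Element(root_tag)
    for element, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return root


image_record = dc_record({
    "identifier": ["W582_000001"],
    "creator": ["Walters Art Museum"],
    "type": ["Image"],
    "format": ["image/tiff"],
    "rights": ["CC BY-SA 3.0 Unported; GNU Free Documentation License"],
})

if __name__ == "__main__":
    print(ET.tostring(image_record, encoding="unicode"))
```

Each element maps to a list of values because some elements repeat; Contributor, for example, has one entry per project participant.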

Page 36

License and use: UPDATED! 6 February 2013

All Walters manuscript images and descriptions provided here are licensed for use under the Creative Commons Attribution-Share Alike 3.0 Unported License and the GNU Free Documentation License.

You are free to download and use the images and descriptions on this website under the licenses named above. You do not need to apply to the Walters prior to using the images. We ask only that you cite the source of the images as the Walters Art Museum.

Additionally, we request that a copy of any work created using these materials be sent to the Curator of Manuscripts and Rare Books at the Walters Art Museum, 600 N. Charles Street, Baltimore, MD 21201, [email protected].

These terms mark a change from our previous license, which placed a noncommercial restriction on the use of these materials. The noncommercial restriction no longer applies, and this license supersedes the previously advertised license, and replaces that found in many of the archival TIFF image headers.

This change follows the Walters Art Museum’s licensing policy. More information on the Walters’ intellectual property policy can be found on the Walters website: http://art.thewalters.org/license/.

Page 37

Metadata XML Information

• /manuscript: top-level container of metadata for a manuscript’s images

• /manuscript/image_object: description of the manuscript, primarily Dublin Core metadata, with the number of images captured in the imageCount element

• /manuscript/images: container for the manuscript’s image data

• /manuscript/images/image: information about a single capture and its derivatives, including:
  – /manuscript/images/image/index: the order of the image in the set, beginning with 0
  – /manuscript/images/image/image_subject: the folio number or name of the piece imaged

• /manuscript/images/image/capture: detailed information about the image’s capture extracted from the imaging software database

• /manuscript/images/image/masterDerivation: description of how the archival TIFF image was generated from the camera raw file, including cropping and color correction information

• /manuscript/images/image/jhoveData: XML output of the JHOVE utility run on the archival TIFF file

• /manuscript/images/image/derivative: three elements containing cropping and scaling information needed to generate the 300 PPI, SAP, and thumbnail files from the archival TIFF
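These element paths also lend themselves to simple automated consistency checks. A minimal sketch with `xml.etree.ElementTree`, run against an invented two-image sample: it assumes `imageCount` sits somewhere inside `image_object` and that `index` values run from 0 without gaps, as the description of `index` suggests; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Invented two-image sample following the element paths listed above.
SAMPLE = """
<manuscript>
  <image_object><imageCount>2</imageCount></image_object>
  <images>
    <image><index>1</index><image_subject>fol. 1v</image_subject></image>
    <image><index>0</index><image_subject>fol. 1r</image_subject></image>
  </images>
</manuscript>
"""


def ordered_subjects(xml_text: str) -> List[str]:
    """Return image_subject values in index order after basic consistency checks."""
    root = ET.fromstring(xml_text.strip())
    images = root.findall("./images/image")
    # imageCount must agree with the number of <image> elements actually present.
    declared = int(root.find("image_object").findtext(".//imageCount"))
    if declared != len(images):
        raise ValueError(f"imageCount is {declared} but {len(images)} image elements found")
    # index values must form the sequence 0, 1, 2, ... once sorted.
    images.sort(key=lambda im: int(im.findtext("index")))
    indexes = [int(im.findtext("index")) for im in images]
    if indexes != list(range(len(images))):
        raise ValueError(f"index values are not 0..{len(images) - 1}: {indexes}")
    return [im.findtext("image_subject") for im in images]
```

Calling `ordered_subjects(SAMPLE)` returns the folio names in capture order even though the sample lists them out of order; a mismatched `imageCount` or a gap in `index` raises `ValueError` instead.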

Page 38

XML Model

[Diagram: /manuscript contains /image_object and /images; /images contains repeated /image elements, each with its /capture element]

Page 39

Preserve Standards

Page 40

Standard Workflows for Data Management

• Transfer & archive digital data for research and analysis by the curatorial, scholarly, preservation and imaging communities

• Clear access procedures
  – Ensuring data integrity for digital storage repositories
  – Preventing introduction of mislabeled and incorrect metadata

Page 41

Quality Control

• Data Quality
  – Automate data handling to avoid error
  – Audit trail for manual data manipulation

• Quality Management
  – Implement processes for quality review
  – Verification and Validation

• Documentation
  – Define metrics & quality goals
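One way to automate data handling against error is to verify image identifiers against the shelf mark before data are ingested, which guards against mislabeled files. A minimal sketch, assuming (from the Identifier examples W.582 and W582_000001) that an image identifier is the shelf mark without its period, an underscore and a six-digit serial, and further assuming that serials run contiguously from 000001; `check_image_ids` is a hypothetical name.

```python
import re
from typing import List


def check_image_ids(shelf_mark: str, image_ids: List[str]) -> List[str]:
    """Flag image identifiers that do not fit the manuscript's shelf mark.

    Assumed convention (inferred from the W.582 / W582_000001 example):
    shelf mark without its period, an underscore, then a six-digit serial,
    numbered contiguously from 000001.
    """
    prefix = shelf_mark.replace(".", "")
    pattern = re.compile(rf"{re.escape(prefix)}_(\d{{6}})")
    problems = []
    for expected, image_id in enumerate(image_ids, start=1):
        match = pattern.fullmatch(image_id)
        if match is None:
            problems.append(f"{image_id}: not an image of {shelf_mark}")
        elif int(match.group(1)) != expected:
            problems.append(f"{image_id}: expected serial {expected:06d}")
    return problems
```

Returning a list of problems rather than stopping at the first one gives a reviewer the whole picture in one pass, which suits a quality-review step.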

Page 42

Data Management System

• Internal Digital Asset Management System
  – Internal Server

• Image Files

• Catalog Data

• Access Infrastructure

• Security

• Backup
  – Internet Systems Consortium

Page 43

IDR Access Model

[Diagram elements: Digital Representation (e.g., TIFF Image); Dublin Core Metadata Initiative (DCMI); TEI; Johns Hopkins Metadata Application; Metadata (METS); Preservation Metadata: Implementation Strategies (PREMIS); Event; Request Event; Agent]

Page 44

Preservation of the Data

Preservation Heresy: The digital information is closer to the original than the artifact itself

“I don’t use the parchment. The parchment is gone! As far as the scholars are concerned, there is no parchment. You only work from digital images on the laptop – that’s the only thing that matters for the reading.” – Dr. Reviel Netz, 14 Jan WYPR

Page 45

What Will Happen to the Data?

“There’s a big technical issue that has me worried. The information on the Net is not all simple text. It’s structured, whether it’s Microsoft Word documents or PDFs. That means the information is only really accessible if you understand how to interpret the bits. What happens when files are there and we don’t know how to interpret them anymore? If you have a CD but the form isn’t known anymore. I have 5 1/4-in. diskettes, but nothing to read them. Even 3 1/2-in. diskette readers are becoming hard to come by. The physical source media change. We may lose the ability to read them.”

Vint Cerf, Google Internet Evangelist, recipient of the US Presidential Medal of Freedom, and co-designer of the basic architecture of the Internet, July 30, 2007 (Computerworld)

Page 46

Digital Preservation

Impermanence of Digitized Data

• Dynamic technology, media and formats

• Rapid obsolescence

• Regular reformatting required

• Ensure utility of data

• Broad distribution to service providers

• Standardized formats & encoding

Page 47

License

All artworks in the photographs are in the public domain due to age. The photographs of two-dimensional objects are also in the public domain. Photographs of three-dimensional objects and all descriptions have been released under the Creative Commons Attribution-Share Alike 3.0 Unported License and the GNU Free Documentation License.

You are free to download and use the images and descriptions on this website under the licenses named above, but if you desire digital images at a higher resolution, for scholarly or commercial publication, please contact our photo services department.

Page 48

Trusted Digital Repository

• Compliance with the Reference Model for an Open Archival Information System (OAIS)

• Administrative responsibility

• Organizational viability

• Financial sustainability

• Technological and procedural suitability

• System security

• Procedural accountability

Page 49

Michael B. Toth, R. B. Toth Associates

rbtoth.com

Future Opportunities