Developing a Successful Digitization Program
Suggested Best Practices and Guidelines

Robert Suriano
September 2010
(314) 961-7434 [email protected]



Developing a Successful Digitization Program

Whether you are currently considering a digital project or have already completed one, this paper should provide insight into how to develop digitization projects so that they succeed in the long run.

The paper outlines the steps in developing a digital project: determining project goals, asking key questions, selecting materials to digitize, gathering resources, creating workflows and procedures, maintaining quality control, making the materials accessible and useful, and measuring progress and success. It also discusses best practices that can turn a digitization effort from a temporary project into a sustainable digital program.

Digitization projects are more complex than you may initially think. Many questions should be answered before a project even starts if it is to produce a successful outcome. First and foremost among them: What is the goal of the project? Creating digital images of materials in your collection can address preservation needs or make those materials more accessible to your users. You will also find that not everything can, or should, be digitized. The materials you have, your patrons, and your organizational structure and funding will determine how to develop your project and make it successful.


Digitization is defined as the conversion of analog materials (such as letters, photographs, books, sound recordings, and moving images) into formats that are readily accessible in an online environment. Digitization also refers to all of the steps involved in that process, including material selection. Specifically, these steps are:

1. Determination of project (and later, program) goals.
2. Selection of the collection, or materials.
3. Determination of imaging specifications.
4. Assembly of resources (including finances, staff, technical knowledge, equipment, and organizational backing).
5. Creation of project workflows.
6. The actual scanning, or imaging, of the materials.
7. Quality control procedures (both during and following the process).
8. Metadata and bibliographic control.
9. Making the images, and the accompanying required metadata, available.
10. Sustainability: that is, turning the project into a program.

As noted above, even though you may have an idea of what materials you want to make available digitally, it is best to determine the actual goals of the project, and this means more than just “putting some interesting photographs, or old letters, online.” Setting goals, which will be driven by a number of factors, is central to a successful digitization project. In fact, well-crafted goals will enable your organization to transition the project into a long-running and successful program.

I found a very good list of key questions, from the University of Georgia, that can be used to flesh out a project's goals. The Digital Library of Georgia (http://dlg.galileo.usg.edu/) has been up and running for almost ten years and makes available a wide range of materials centered on the people and history of that state. It is a good example of a sustained digitization program. Some of the key questions the developers at the DLG have used in shaping their program include the following:

- Why do you want to digitize materials?
- Who is your audience?
- Do you possess the materials?
- Who is the copyright holder for the intellectual property contained in the materials?
- What is your timeframe for the project?
- How is the project being funded?
- Who will be responsible for different stages of the project?
- How will you digitize the materials?
- How will you describe the materials, and what metadata scheme will be used?
- How will you provide access to the collection?
- How will you preserve and maintain the collection and digitized materials?

From a project management perspective, I would add two more questions:


- Do you have a sponsor (i.e., someone who can provide institutional support and influence within the organization)? (Institutional support is key to long-term sustainability.)
- Are there potential obstacles to successful scanning or to placing images online? (This question leads to addressing scheduling as well as quality control issues.)

Project Goals

Libraries, museums, historical societies, and other organizations that digitize materials do so for two main reasons: to preserve collections and to increase access to those materials. According to the Council on Library and Information Resources and the Digital Library Federation, digital preservation is “universally acclaimed as an effective tool of preventive preservation.” Digitization is a particularly useful and cost-effective method for preserving (as well as making accessible) sound recordings and moving images. Closely tied to preservation is access, whether for current users or for future patrons. Digitizing fragile (as well as heavily requested) items both preserves them and keeps them available. Digital imaging can also increase and improve access by meeting high demand and by providing enhanced forms of access. Finally, digitization can open up access to items that are not currently on display, or not readily available because of space limitations or perceived low demand.

There are other goals for digitization as well, including a desire to develop collaborative partnerships with other institutions (which may hold items or collections that complement, or even supplement, your own). Partnerships can increase interest in your institution's collection, which can lead to increased patronage and possibly revenue. Making collections available online also opens up opportunities for additional services, such as educational outreach, and the possibility of extending your expertise to other organizations.

Users and your target audience

Even before selecting materials to digitize, it is important to consider who your intended audience will be. Simply digitizing materials for the sake of digitizing can be a waste of time and resources if no one ends up using them.1 As with any decision to make a collection available to the public, there are two sets of users to look at: your current users, and future patrons who have either never taken advantage of the materials you hold or are not aware of them. (And there is a difference between these latter two subcategories.)

Your current users are those who know you and your collections best. They are the people you can most easily consult to gauge the potential for making materials available digitally. Are these users regular patrons? Do they focus on any particular subject matter? Do you receive frequent inquiries or requests for information? Are there certain types of materials (photographs, diaries, quotations, etc.) that are used most frequently? Are there materials that seem to get attention at particular times of the year (such as when student term papers or theses are being written)? High demand for some materials is a good indicator that their digitization may be warranted, particularly when there are frequent requests for photocopies!

1 An argument can be made that digitizing for preservation is reason enough, and that access to these digital copies could be put off until some future time. But there are other methods of preservation that may be less expensive in the long run than retaining digital copies. For the purposes of this paper, we will consider digitization as the total process, from selection to access.

It is probably obvious that if you don’t have a steady flow of researchers or visitors, or don't even receive frequent emails or phone calls requesting information, digitizing materials might increase awareness. But even organizations with nationally or internationally known collections have potential users who have never visited (because they live too far away), who may not currently be interested in the materials you hold, or who may not know you exist. Digitization offers the opportunity to increase access to your collections for these user groups. It allows access from a distance and may entice some people to visit at a later date. Awareness can be stimulated, particularly by rare items that might otherwise never be made available to the public because of preservation issues. Trends in research and interest may also mean that items in your collection that have received little attention are suddenly in demand. In this latter case, keep aware of what neighboring institutions are doing or making available. An increase in interest in regional history or in a particular industry might be an opportunity for you to make materials available by taking advantage of the attention being received elsewhere. (Internet searches that highlight someone else’s collection may surface yours as well!)

Identifying potential users may also provide insight into how materials should be presented and accessed. Will users want to be able to copy or print the items? Will they require the highest quality images, or will lower quality suffice? The answers to these types of questions will help you determine the best format as well as the optimal user interface to implement.

Material selection

I am sure that your institution already has a collection development policy, and this is actually a good place to start. What materials do you currently collect and make available? Are there materials within your collection that have restrictions on access? Are there materials too fragile for use? You can use this policy as the basis for a digitization policy. Harvard University developed a “decision-making matrix”2 for selecting materials for digitization. Key questions it poses include:

- Does the material have sufficient intrinsic value to ensure interest in a digital product?
- Will digitization significantly enhance access?
- What goals might be met by digitization (including preservation, functionality, and cost savings)?
- Does current technology yield image quality adequate to meet stated goals?
- Are the costs of scanning and post-scan processing supportable?

2 The entire matrix is reproduced at the end of this paper, and can be found at: http://www.clir.org/pubs/reports/hazen/matrix.html

Most digitizing organizations use a similar set of criteria to select materials for digitization. A good example is the set put together by the North Carolina ECHO program (http://www.ncecho.org/index.shtml), which provides guidance for state programs. Its criteria can be summarized as follows:

- The Audience (Who are they? Will the material be of interest? Will access be adequate?)
- Impact on the Institution (Are there sufficient funding and resources?)
- Intellectual Control (Will access be better? Will the knowledge base of the staff be raised?)
- Intellectual Property Rights (Is permission available to digitize the material?)
- Preservation (Will digitization aid in preserving the materials?)
- Technical Considerations (Will available equipment and software be sufficient to allow quality reproduction and make access visually appealing?)
- Value (Will the digital collection be unique? Will it have enduring value?)

IFLA (the International Federation of Library Associations and Institutions) reduces the criteria for material selection to the following:

- Content – Does the intellectual value justify the required costs and resources?
- Demand – Is there an audience for a digital presentation of the material (and will that audience benefit from its accessibility)?
- Condition – Can the material undergo the digitization process without risk of damage? Does the collection have the adequate descriptive data or cataloging essential for future access?
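The IFLA criteria can be read as a simple gate: a candidate goes forward only if it passes Content, Demand, and Condition. The following is a minimal Python sketch under that reading, assuming staff have already judged each candidate yes or no on each point; the `Candidate` class, its fields, and the sample collections are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A collection or item under review (hypothetical structure for illustration)."""
    name: str
    content_justifies_cost: bool  # Content: intellectual value justifies costs and resources
    has_audience: bool            # Demand: an audience will benefit from digital access
    safe_to_digitize: bool        # Condition: can be handled without risk of damage
    adequately_described: bool    # Condition: has descriptive data or cataloging


def failed_criteria(c: Candidate) -> list:
    """Return the IFLA criteria (Content, Demand, Condition) the candidate does not meet."""
    failures = []
    if not c.content_justifies_cost:
        failures.append("Content")
    if not c.has_audience:
        failures.append("Demand")
    if not (c.safe_to_digitize and c.adequately_described):
        failures.append("Condition")
    return failures


candidates = [
    Candidate("Family letters, 1860-1880", True, True, True, True),
    Candidate("Uncataloged photograph boxes", True, True, True, False),
    Candidate("Brittle newspaper run", True, False, False, True),
]

for c in candidates:
    failures = failed_criteria(c)
    verdict = "select" if not failures else "defer (fails " + ", ".join(failures) + ")"
    print(f"{c.name}: {verdict}")
```

Reporting which criteria failed, rather than a bare yes/no, points to the fix. For example, uncataloged photographs may only need description work before they qualify.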

When you read through these various lists of criteria, it is important to look beyond the instinctive focus on textual material. Photographs, recorded sound, and moving images require different models of conversion, particularly in terms of equipment and handling. The questions may be the same, but the answers will require more complex analysis.

Specifications for Text and Graphical Materials. The basic scanning process is first to create a master image and then, from it, to create the access image (often termed a derivative).3 The master (or archival) image is saved to long-term storage, while the derivative becomes the public access version. The most common imaging formats are Tagged Image File Format (TIFF), Joint Photographic Experts Group (JPEG), Graphics Interchange Format (GIF), Bitmap (BMP), and JPEG 2000. Best practice at most institutions is to save master digital images in an uncompressed file format, such as TIFF.

When determining the imaging specifications, there are five components to consider:

- Resolution – determined by the number of pixels used to represent the image. Resolution is typically expressed in dots per inch (dpi) or pixels per inch (ppi). Increasing resolution improves the ability to delineate fine details, but also results in larger file sizes.
- Bit Depth – the number of bits used to define each pixel. The greater the bit depth, the greater the number of gray and color tones that can be represented.4
- Image Enhancement – the processes used to modify or improve the captured image by changing its size, color, contrast, or brightness, or to compare and analyze images for characteristics that the human eye cannot perceive.
- Compression – used to reduce file size for processing, storage, and transmission.
- Image Quality – the cumulative result of resolution, bit depth, enhancement, and compression, as well as the effects of the equipment, techniques, and skills of the personnel involved.

3 Some experts advocate creating a second, or service, master file that is used to create subsequent derivative files. The service master is saved in a readily accessible location for further use, while the original master file is saved to long-term storage.

4 In relation to bit depth, there are three types of scanning: bitonal, where one bit is used to represent black or white; grayscale, which uses multiple bits per pixel to represent shades of gray; and color, where multiple bits per pixel are used to represent different colors. A setting of 8 bits per pixel yields 256 different shades of gray. A setting of 24 bits per pixel is called true color and can represent about 16.7 million colors.

The goal is to optimize image quality: use enough resolution to discern the necessary details, but not so much that the resulting file becomes too large and access is hindered. Bit depth matters as well. Depending on the bit depth selected, the content will either be displayed adequately or be limited by the number of colors available.
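The trade-off between resolution, bit depth, and file size is simple arithmetic: pixel count times bits per pixel. The following Python sketch estimates uncompressed size only. The letter-size page and the settings looped over are illustrative choices, and real TIFF files add a small header and may differ slightly.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel width and height of a scan at the given resolution (pixels per inch)."""
    return round(width_in * ppi), round(height_in * ppi)


def tones(bit_depth):
    """Number of distinct gray or color values one pixel can hold at this bit depth."""
    return 2 ** bit_depth


def uncompressed_bytes(width_in, height_in, ppi, bit_depth):
    """Approximate uncompressed image size in bytes (file-format headers ignored)."""
    w, h = pixel_dimensions(width_in, height_in, ppi)
    total_bits = w * h * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


# A letter-size page (8.5 x 11 inches) at several specifications
for ppi, bits in [(300, 1), (300, 8), (300, 24), (600, 24)]:
    size_mb = uncompressed_bytes(8.5, 11, ppi, bits) / 1_000_000
    print(f"{ppi} ppi, {bits}-bit ({tones(bits):,} tones): {size_mb:.1f} MB")
```

At 300 ppi the page is 2550 x 3300 pixels. At 24-bit color that is 25,245,000 bytes (about 25 MB) before compression. Doubling the resolution to 600 ppi quadruples the size, while `tones(24)` gives 16,777,216, the "about 16.7 million colors" of true color.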

As noted, both the type of equipment and the skill of the persons doing the scanning will affect overall image quality. Even though different manufacturers may state the same imaging specifications, there are differences between scanning equipment. Similarly, human interaction will play a role in image quality. Personnel will possess different skill levels, perceptions, and attention to detail, no matter how well they are trained or how simple the equipment is to operate. Many times I’ve had staff inadvertently miss a page of a book being scanned, or fail to set the resolution or bit depth properly, even though they’d been doing the job for years.

Best Practices. Most organizations in the U.S. follow the recommendations of the Library of Congress and the National Archives and Records Administration (NARA) when setting imaging specifications. Text images (those that do not include graphic content) are scanned at 300 dpi with a 1-bit black and white (bitonal) setting. If illustrations, maps, charts, or photographs are included, the bit depth is increased to 8-bit grayscale. If color content is included, the bit depth is increased to 24-bit RGB color. And, following NARA guidelines, the master (or archival) image is saved in TIFF format.

Material | Image type | Resolution | Bit depth
--- | --- | --- | ---
Text-based (books, pamphlets, etc.) with little or no graphical content or color | Master | 300 to 600 ppi (400 ppi and up for OCR purposes) | 1-bit bitonal B&W
Text-based, little or no graphical content or color | Access | 150 to 300 ppi | 1-bit bitonal B&W
Text-based (books, pamphlets, etc.) with graphical and color content | Master | 300 to 600 ppi (400 ppi and up for OCR purposes) | 8-bit grayscale or 24-bit color*
Text-based, with graphical and color content | Access | 150 to 300 ppi | 8-bit grayscale or 24-bit color*
Photographs | Master | 300 to 800 ppi | 8-bit grayscale or 24-bit color*
Photographs | Access | 150 to 300 ppi | 8-bit grayscale or 24-bit color*
Maps | Master | 400 ppi | 24-bit color
Maps | Access | 150 to 300 ppi | 24-bit color
Rare books (objects of high artifactual value) | Master | 400 ppi minimum | 24-bit color

*24-bit color is to be used where color is an important attribute of the document.
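The table can also serve as a pre-scan check against the settings mistakes described earlier, such as a wrong resolution or bit depth. The following is a minimal Python sketch that encodes the rows as data. The material keys, the `SPECS` dictionary, and `check_settings` are hypothetical names, and dpi and ppi are treated as the same number here.

```python
# Hypothetical encoding of the specification table above.
# Key: (material, image type); value: (minimum ppi, maximum ppi or None, allowed bit depths).
SPECS = {
    ("text", "master"): (300, 600, {1}),
    ("text", "access"): (150, 300, {1}),
    ("text_with_graphics", "master"): (300, 600, {8, 24}),
    ("text_with_graphics", "access"): (150, 300, {8, 24}),
    ("photograph", "master"): (300, 800, {8, 24}),
    ("photograph", "access"): (150, 300, {8, 24}),
    ("map", "master"): (400, 400, {24}),        # the table gives a single value
    ("map", "access"): (150, 300, {24}),
    ("rare_book", "master"): (400, None, {24}),  # "400 ppi minimum", no upper limit
}


def check_settings(material, image_type, ppi, bit_depth, needs_ocr=False):
    """Compare planned scanner settings with SPECS; an empty list means no problems found."""
    if (material, image_type) not in SPECS:
        return [f"no specification for {material}/{image_type}"]
    min_ppi, max_ppi, depths = SPECS[(material, image_type)]
    if needs_ocr and image_type == "master" and material.startswith("text"):
        min_ppi = max(min_ppi, 400)  # table: 400 ppi and up for OCR purposes
    problems = []
    if ppi < min_ppi:
        problems.append(f"resolution {ppi} ppi is below {min_ppi} ppi")
    if max_ppi is not None and ppi > max_ppi:
        problems.append(f"resolution {ppi} ppi is above {max_ppi} ppi")
    if bit_depth not in depths:
        problems.append(f"bit depth {bit_depth} is not one of {sorted(depths)}")
    return problems


print(check_settings("text", "master", 300, 1))
print(check_settings("text", "master", 300, 8, needs_ocr=True))
```

The first call returns an empty list. The second flags both the resolution, which is too low for OCR, and the bit depth, which should be bitonal for plain text. Running such a check before each batch is cheaper than rescanning.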


Specifications for Moving Image and Sound Materials

Best practices for film and recorded sound are less exact, and no widespread standard has yet been put into use. Much of the effort in digitizing these types of materials is aimed at preservation. Media for film and moving images have proven to become obsolete much more quickly than text and other “analog” materials. The best example is motion pictures made in the first half of the 20th century. Cellulose nitrate film base decomposes over time, slowly turning to dust. Cellulose acetate film, which was developed to replace nitrate film, has been shown to suffer from vinegar syndrome, in which the acetate base breaks down and releases acetic acid (vinegar), and the film becomes brittle and shrinks over time. Color film from much of the 20th century has also been shown to fade. Because of these issues, converting film to digital media has become one way of saving these materials.

The processes used to digitize film and sound also rely on what is known as sampling. The easiest way to describe this is to picture a piece of music as a series of sound waves. Digital sampling converts the smooth wave into a series of points along the wave that best describe its shape. Because sampling cannot completely match the smoothness of the wave, some information is lost.
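A short Python sketch can make the sampling idea concrete. It uses a pure sine tone as a stand-in for "a piece of music" (a simplifying assumption). The tone is measured at `sample_rate_hz` points per second, and each measurement is rounded to one of the levels allowed by the bit depth. Both steps discard information: whatever happens between sample points, and the exact height at each point.

```python
import math


def sample_wave(frequency_hz, sample_rate_hz, duration_s, bit_depth):
    """Sample a pure sine tone and round each sample to a signed integer level."""
    max_level = 2 ** (bit_depth - 1) - 1           # 127 for 8-bit, 32767 for 16-bit
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                      # time of this sample, in seconds
        exact = math.sin(2 * math.pi * frequency_hz * t)  # smooth wave, between -1 and 1
        samples.append(round(exact * max_level))    # nearest available level
    return samples


def worst_rounding_error(frequency_hz, sample_rate_hz, duration_s, bit_depth):
    """Largest gap between the smooth wave and the stored samples, as a fraction of full scale."""
    max_level = 2 ** (bit_depth - 1) - 1
    samples = sample_wave(frequency_hz, sample_rate_hz, duration_s, bit_depth)
    return max(
        abs(math.sin(2 * math.pi * frequency_hz * (n / sample_rate_hz)) - s / max_level)
        for n, s in enumerate(samples)
    )


# A 1 kHz tone sampled at 8 kHz is reduced to only eight points per cycle.
print(sample_wave(1000, 8000, 0.001, 8))
# Higher bit depth shrinks the rounding error (10 ms of a 440 Hz tone at 44.1 kHz).
print(worst_rounding_error(440, 44_100, 0.01, 8), worst_rounding_error(440, 44_100, 0.01, 16))
```

Raising the sample rate captures more points along the wave, and raising the bit depth makes each point more precise. This is why the recommendations below specify both.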

Unfortunately, there are no current standards that allow for a perfect digital copy of moving image or sound materials, one that replicates the picture and sound quality of the original. The National Archives has gone as far as stating that the combination of changing digital formats and the lack of consensus on standards has resulted in no National Archives determination for the conversion of audio-visual materials to digital. Several years ago, the Bibliographic Center for Research (BCR) put together minimum recommendations for digitizing audio recordings:

Content | Sample rate | Bit depth | Archival file format
--- | --- | --- | ---
Spoken language | 44.1 kHz | 16-bit | WAV, AIF
Music | 44.1 kHz (minimum), 96 kHz (optimal) | 16-bit | WAV, AIF

This mirrors a 2001 technical study by the Library of Congress that recommended that audio digitization produce three copies: a master file (96 or 48 kHz at 24-bit), a higher-fidelity service file at 44.1 kHz and 16-bit, and an additional lower-fidelity service file (saved in MP3 format). The LOC recommended that both the master and the high-fidelity service file be saved in WAV format.
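These recommendations translate directly into storage requirements, because uncompressed audio (as stored in WAV or AIFF) grows at a fixed rate: sample rate x bit depth x channels. The following Python sketch assumes stereo recordings, which is not stated in the recommendations (mono needs half the space), and it ignores file headers.

```python
def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels):
    """Data rate of uncompressed PCM audio, excluding file headers."""
    return sample_rate_hz * bit_depth * channels // 8


def storage_per_hour_gb(sample_rate_hz, bit_depth, channels):
    """Gigabytes (10**9 bytes) needed for one hour of audio at these settings."""
    return pcm_bytes_per_second(sample_rate_hz, bit_depth, channels) * 3600 / 1_000_000_000


# Channel count is an assumption (stereo); mono recordings need half the space.
for label, rate, depth in [
    ("44.1 kHz / 16-bit service file", 44_100, 16),
    ("48 kHz / 24-bit master", 48_000, 24),
    ("96 kHz / 24-bit master", 96_000, 24),
]:
    print(f"{label}: {storage_per_hour_gb(rate, depth, 2):.2f} GB per hour (stereo)")
```

CD-quality audio (44.1 kHz, 16-bit, stereo) works out to 176,400 bytes per second, or roughly 0.64 GB per hour. A 96 kHz/24-bit master needs about 2.07 GB per hour, more than three times as much, which is worth budgeting for before choosing the "optimal" setting.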

Resources

The equipment, staffing, and other requirements for your digitization project will depend on both the types of materials to be digitized and the digitization specifications.

Equipment requirements will include computers and software, scanners, storage media, and perhaps digital cameras. Image file storage and processing can require significant memory, so you will want to consider computers at the upper range of RAM, hard drive space, and processing speed. In addition, larger monitors (over 21 inches) can be very helpful for viewing and processing images. When selecting a scanner, you need to consider the following:

- The physical dimensions of the source documents
- The type(s) of media that will be scanned (transparent or reflective)
- The range of details, tones, and colors present in the documents
- The physical condition of the documents

Collections that include oversized documents (such as maps) will require a flatbed scanner (which normally has a 13 x 17 inch scanning surface). If you know you will be scanning pages from bound volumes (books, diaries) that cannot be taken apart, an open-book scanner will be required. If the majority of your materials are letter-size (8.5 x 11 inch) documents in fairly good shape, a production scanner (similar to a photocopier) will suffice. If your collection includes three-dimensional objects, a digital camera will be required. Often, however, you will face collections with materials of different sizes, conditions, and color requirements. You will then need to decide whether to exclude segments of the collection to reduce resource requirements, or to look at other options (such as partnering with institutions that possess the necessary equipment, or outsourcing those particular items to a third party).

There are a number of software applications available for processing image files. Adobe Photoshop is considered the de facto standard and is recommended, particularly for photographs and for documents containing varied colors and graphics. It is a relatively expensive piece of software, but it has been shown to produce high-quality images.5

Because you will need to retain the master and access image files, it is recommended that they be saved to offline storage, such as CD-R or magnetic tape. Compact disc storage is now relatively inexpensive, and it is advisable to select the highest-quality CD-R option. Make two copies, the second for backup purposes. Store compact discs in jewel cases rather than envelopes, and label them appropriately. As the lifespan of CD-R media is between five and ten years, plan on recopying all CD-Rs after five years.
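To plan media purchases and the recopying schedule, a rough Python sketch may help. It assumes a 700 MB disc capacity, a common CD-R size that you should check against your media, and it places files on discs with a simple first-fit rule. The `masters` list of 500 files of about 25 MB each is hypothetical.

```python
from datetime import date

CD_R_CAPACITY_MB = 700  # assumption: common 80-minute CD-R; check your media


def discs_needed(file_sizes_mb, copies=2, capacity_mb=CD_R_CAPACITY_MB):
    """Discs required for all files, multiplied by the number of copies (primary + backup).

    Each file goes on the first disc with enough free space (first-fit); a new
    disc is started when none has room. Files larger than one disc are rejected.
    """
    free_space = []  # remaining capacity on each disc in one set
    for size in file_sizes_mb:
        if size > capacity_mb:
            raise ValueError(f"a {size} MB file does not fit on a {capacity_mb} MB disc")
        for i, free in enumerate(free_space):
            if size <= free:
                free_space[i] = free - size
                break
        else:
            free_space.append(capacity_mb - size)
    return len(free_space) * copies


def recopy_due(written_on, years=5):
    """Date by which discs written on `written_on` should be recopied."""
    try:
        return written_on.replace(year=written_on.year + years)
    except ValueError:  # 29 February and the target year is not a leap year
        return written_on.replace(year=written_on.year + years, day=28)


masters = [25.2] * 500  # hypothetical: 500 color page scans of about 25 MB each
print(discs_needed(masters))          # discs for the primary set plus one backup set
print(recopy_due(date(2010, 9, 1)))
```

In this example, 27 files fit on each disc, so 500 files need 19 discs per set, or 38 for the primary and backup copies together. Keeping the write date alongside the disc label makes the five-year recopy easy to schedule.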

Equipment is only part of the resource requirements. You will still need skilled staff to handle the materials, run the scanners, conduct post-imaging enhancements, and enter metadata and cataloging information. A new digitization project may require new skills and possibly additional staff. When planning your project, allow time to train staff in the necessary technologies and procedures. It is also recommended that staff not directly involved in the scanning process be familiarized with the basic theories and practices of digitization.

Underlying all of the above resources are the budgetary requirements. As you develop your digitization plan, you have to take into account the costs associated with equipment, supplies, and staff. The costs you encounter will fall into the following categories:

- Equipment and supplies
- Salaries, wages, and benefits
- Staff training
- Services, contracts, and legal fees
- Overhead and indirect costs (including offices and workspace)
- Maintenance, licenses, and communication charges
- Contingency

5 Use of Photoshop would obviously be overkill, however, for projects digitizing solely textual documents with little or no graphical content. The production of 1-bit black and white (bitonal) images would be sufficient, and several software packages (such as TechSoft PixEdit) would suffice.

Note that staff compensation is likely to account for about 45 to 50 percent of total project costs. The National Archives and Records Administration estimates that approximately 30 percent of project costs will go to digital conversion, with the remainder (20 to 25 percent) allocated to metadata creation (including cataloging, description, and indexing). In my experience, the metadata share can actually be larger (and the conversion share smaller), depending on the types of materials being scanned.
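The following Python sketch turns these percentages into a first-pass budget. It reads the three figures as shares of one total, since they add up to 100 percent at either end of the ranges. That reading, and the $100,000 total, are assumptions for illustration.

```python
def budget_breakdown(total, staff_pct=50, conversion_pct=30, metadata_pct=20):
    """Split a total project budget by percentage estimates (percentages must sum to 100)."""
    if staff_pct + conversion_pct + metadata_pct != 100:
        raise ValueError("percentages must add up to 100")
    return {
        "staff compensation": total * staff_pct / 100,
        "digital conversion": total * conversion_pct / 100,
        "metadata creation": total * metadata_pct / 100,
    }


# The two ends of the ranges given above, for a hypothetical $100,000 project
for split in [(50, 30, 20), (45, 30, 25)]:
    for category, amount in budget_breakdown(100_000, *split).items():
        print(f"{category}: ${amount:,.0f}")
    print()
```

If your materials are description-heavy, shift percentage points from conversion to metadata and rerun the split. The function only checks that the shares still add up to 100.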

When planning a digitization project, you will need to decide where to locate staff and equipment. Most production scanners are no larger than a medium-size copy machine, but open-book scanners can be quite large and require minimal overhead lighting to reduce glare. In addition to space for equipment, I’ve found it beneficial to have adequate work space for the items being scanned. A large table is great for preparing materials, and a nearby shelving unit or cabinet can save time transporting items to and from where they usually reside. You will also want to make sure that staff have space to accomplish their tasks. In addition to space for a computer, it is advisable to have enough work space to accommodate the items they are working on (for review purposes), as well as other materials, such as finding aids or reference works, that might be useful when doing metadata work or cataloging the digitized materials.

Some key questions to consider for space requirements:

- How many people will be working on the project?
- Will the associated tasks (scanning, metadata and cataloging, website posting, etc.) be done by multiple people, or will one person handle more than one task?
- Will you require different types of scanning equipment?
- Will a photocopier be readily accessible?
- Are there materials that will need special preparation before digitizing? (Photocopying, removal of bindings, sorting and organizing, etc.)
- Will materials need to be returned to archival storage or library shelving afterwards?

In some cases, particularly for large or long-term projects, you may want to centrally locate staff and equipment if possible.6

6 An interesting operation I once came across was the one employed by Iron Mountain. They designated a space (a large open room set inside a larger warehouse) for their digitization efforts, featuring a product-flow layout in which each step of the imaging process led to the next. When materials were received for scanning, they were set in an open space next to a large table where someone would open the boxes and sort and prepare the materials. Next to the prep table was a table beside the scanner where the materials awaited scanning. Next to this was a computer for reviewing the images. Finally, there was an area where the materials were re-boxed to be returned or sent to long-term storage.


Project Workflow and Scheduling

Once you have determined what materials are to be digitized, you will want to establish both a schedule and a workflow for the project. The basic steps in the actual imaging process are preparation, scanning, enhancement, compression, and access.

You will want to document each step, whether in the form of a checklist or a log, to ensure that all steps were completed. Frequent self-checks also encourage a sense of responsibility and ownership of the process among staff. I have found that using a software application (such as a spreadsheet or a database) keeps paper clutter to a minimum. Checklists and other documentation work best as reminders, while a software application is best for actually tracking what has been completed, as well as for recording productivity.
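As one way to build such a tracking log, here is a minimal Python sketch using the standard-library `sqlite3` module. The table layout, item identifiers, and staff names are hypothetical, and the step names follow the imaging steps listed above. Each completed step is one row, so missing steps and daily productivity fall out of simple queries.

```python
import sqlite3

STEPS = ["preparation", "scanning", "enhancement", "compression", "access"]


def create_log(conn):
    """Create the tracking table: one row per completed step per item."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS step_log (
            item_id TEXT NOT NULL,
            step    TEXT NOT NULL,
            staff   TEXT NOT NULL,
            done_on TEXT NOT NULL,          -- ISO date, e.g. 2010-09-01
            PRIMARY KEY (item_id, step)     -- a step is logged once per item
        )""")


def record(conn, item_id, step, staff, done_on):
    """Log a completed step, rejecting step names that are not in the workflow."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    conn.execute("INSERT INTO step_log VALUES (?, ?, ?, ?)", (item_id, step, staff, done_on))


def missing_steps(conn, item_id):
    """Steps not yet logged for an item, in workflow order."""
    done = {row[0] for row in conn.execute(
        "SELECT step FROM step_log WHERE item_id = ?", (item_id,))}
    return [s for s in STEPS if s not in done]


def productivity(conn, step):
    """Number of items each staff member completed per day for one step."""
    return conn.execute("""
        SELECT staff, done_on, COUNT(*) FROM step_log
        WHERE step = ? GROUP BY staff, done_on ORDER BY staff, done_on""", (step,)).fetchall()


conn = sqlite3.connect(":memory:")  # use a file path to keep the log between sessions
create_log(conn)
for item in ["box1-f01-001", "box1-f01-002"]:
    record(conn, item, "preparation", "jsmith", "2010-09-01")
    record(conn, item, "scanning", "mlee", "2010-09-02")
record(conn, "box1-f01-001", "enhancement", "mlee", "2010-09-02")

print(missing_steps(conn, "box1-f01-002"))
print(productivity(conn, "scanning"))
```

The primary key also catches double entry. Logging the same step twice for one item raises an `sqlite3.IntegrityError`, which is often a sign that someone rescanned an item by mistake.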

Prior to starting any digitization project, you will want to run a few tests to determine the approximate time it takes to complete the various steps, as well as the proper handling procedures. This will allow you to put together fairly accurate schedules and assignments.
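Once test runs give you minutes per item for each step, a first schedule estimate is straightforward. The following Python sketch uses hypothetical timings and assumes six productive hours per person per day, with any staff member able to work on any step. Adjust both assumptions to your own operation.

```python
import math

# Hypothetical minutes per item for each step, as measured during a test run
test_minutes_per_item = {
    "preparation": 1.5,
    "scanning": 2.0,
    "enhancement": 1.0,
    "metadata": 4.0,
}


def working_days(item_count, minutes_per_item, staff_hours_per_day=6.0, staff=1):
    """Working days needed for `item_count` items, rounded up to whole days."""
    total_minutes = item_count * sum(minutes_per_item.values())
    minutes_per_day = staff_hours_per_day * 60 * staff
    return math.ceil(total_minutes / minutes_per_day)


print(working_days(5_000, test_minutes_per_item))           # one person
print(working_days(5_000, test_minutes_per_item, staff=3))  # three people
```

With these numbers, 5,000 items at 8.5 minutes each take about 119 working days for one person, or 40 days for three. In this example, metadata accounts for nearly half the time, in line with the earlier observation that metadata can be the larger share of the work.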

Preparation involves the activities needed to make both the materials and the equipment ready for scanning. Archival materials in boxes and folders will need to be removed, ordered, and readied for scanning. If items are bound or stapled, the bindings or staples should be removed, if possible. Any items you consider susceptible to damage from excess handling should either be excluded or photocopied.7 The equipment will also need to be readied, including cleaning any glass surfaces or parts that may come into contact with the materials being scanned.

The scanning phase is the digital imaging of the materials. Prior to scanning, you will want to make sure that

all settings correspond to your specifications, in terms of resolution, bit depth, color, etc.
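Comparing the scanner's settings to the written specification before each session can be as simple as a side-by-side check. In this sketch the specification values (400 ppi, 24-bit, RGB) and the setting names are placeholders. The current settings are typed in by hand rather than read from any particular scanner's software.

```python
# Hypothetical project specification; substitute your own values.
SPEC = {"resolution_ppi": 400, "bit_depth": 24, "color_mode": "RGB"}


def check_settings(settings, spec=SPEC):
    """Return (setting, expected, actual) for every setting that does not match."""
    mismatches = []
    for key, expected in spec.items():
        actual = settings.get(key)  # None if the setting was not recorded
        if actual != expected:
            mismatches.append((key, expected, actual))
    return mismatches


current = {"resolution_ppi": 300, "bit_depth": 24, "color_mode": "RGB"}
for key, expected, actual in check_settings(current):
    print(f"{key}: expected {expected}, scanner set to {actual}")
```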

After the material has been scanned, the master image is saved to long-term storage, and a derivative copy is created in the enhancement phase. In this step, the image file is corrected for color, brightness, and contrast, and possibly “cleaned” of any imperfections that affect its legibility. In addition, text images that are designated to be searchable can be run through an optical character recognition (OCR) application to create a machine-readable text file from the image. (The alternative is to transcribe the document manually.)
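Image processing software performs brightness and contrast adjustments for you, but the idea behind a basic contrast correction is easy to show. The sketch below is a linear contrast stretch on 8-bit grayscale values. This is one common technique among several, and not necessarily what your software uses. Values at a chosen low percentile map to black, values at a high percentile map to white, and everything in between is rescaled. Pixels are passed as a plain list for illustration; a real image would be loaded with an imaging library.

```python
def stretch_contrast(pixels, low_pct=1.0, high_pct=99.0):
    """Linear contrast stretch for 8-bit grayscale values (0-255).

    Values at or below the low percentile map to 0, values at or above the
    high percentile map to 255, and values in between are scaled linearly.
    """
    ordered = sorted(pixels)
    n = len(ordered)
    lo = ordered[int(low_pct / 100 * (n - 1))]   # simple index-based percentile
    hi = ordered[int(high_pct / 100 * (n - 1))]
    if hi == lo:
        return list(pixels)  # flat image: nothing to stretch
    scale = 255 / (hi - lo)
    return [min(255, max(0, round((p - lo) * scale))) for p in pixels]


faded = [100, 110, 120, 130, 140, 150]  # a washed-out, low-contrast scan
print(stretch_contrast(faded, low_pct=0, high_pct=100))  # [0, 51, 102, 153, 204, 255]
```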

Before the image can be placed on the web server, you will probably have to reduce the size of the file to make it easier to transmit and more accessible. Depending on the technique used, compression can reduce the quality of an image. Compression techniques can be either “lossless” (a decompressed image is identical to its earlier state, because no information was lost when the file was reduced) or “lossy” (significant information is lost due to the sampling methods used by the compression application).
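The difference between lossless and lossy compression can be demonstrated with Python's standard `zlib` module, which implements DEFLATE, a lossless method. The lossy half of the sketch is a deliberately simple stand-in: it rounds pixel values down to 16 levels before compressing. This is not how JPEG actually works, but it shows the defining property of lossy methods. Once information is discarded, decompression cannot bring it back.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress with zlib (DEFLATE, a lossless method), then decompress."""
    return zlib.decompress(zlib.compress(data, 9))


def lossy_roundtrip(data: bytes, levels: int = 16) -> bytes:
    """Toy lossy scheme: round each 8-bit value down to one of `levels` steps,
    then compress. The rounding discards detail that decompression cannot restore."""
    step = 256 // levels
    quantized = bytes((b // step) * step for b in data)
    return zlib.decompress(zlib.compress(quantized, 9))


# Stand-in for a run of 8-bit grayscale pixel values.
row = bytes((i * 7) % 256 for i in range(4096))
print(lossless_roundtrip(row) == row)  # True: identical to the original
print(lossy_roundtrip(row) == row)     # False: some values were changed
worst = max(abs(a - b) for a, b in zip(lossy_roundtrip(row), row))
print(worst)  # 15: largest change to any one value
```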

At this point, it is advisable to save a copy of the derivative image to long-term storage. This will allow you

to restore an access copy to your web server in the event something catastrophic occurs.
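When copies are kept in more than one place, a checksum is one way to confirm that a stored copy is still identical to the original before you rely on it for a restore. This sketch uses SHA-256 from Python's `hashlib`, and the file names are hypothetical. In practice you would record each file's checksum when it is first saved and compare later copies against that stored value.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 checksum of a file, read in 1 MiB chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original, stored_copy):
    """True if the stored copy is bit-for-bit identical to the original."""
    return sha256_of(original) == sha256_of(stored_copy)


with tempfile.TemporaryDirectory() as tmp:
    access = Path(tmp) / "item0001_access.jpg"        # hypothetical file names
    backup = Path(tmp) / "item0001_access_backup.jpg"
    access.write_bytes(b"placeholder image data")
    shutil.copyfile(access, backup)
    intact = verify_copy(access, backup)              # True
    backup.write_bytes(b"placeholder image dat4")     # simulate a corrupted copy
    corrupted = verify_copy(access, backup)           # False
print(intact, corrupted)  # True False
```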

7 You will want to pay particular attention to any papers that are bent or torn. If you are using a production (or feed)

scanner, these can catch and jam the scanner and cause additional damage to the material.


[Figure: Typical Imaging Process Workflow. Selection (criteria: intrinsic value? audience? enhanced access? benefit > cost?) → preparation (scanner cleaned; material tested for fragility and ability to undergo the process) → scanner, set to specifications (resolution, bit depth) → image master, saved to long-term storage (CD-R, magnetic tape) → enhancement on a PC with image processing software (e.g., Adobe Photoshop): modifications to size, color, contrast, brightness, and corrections for imperfections → access image (derivative) → compression → web server.]


Your materials can then be transferred to the web server for access purposes. But that is not the final step. Before images can be made public, descriptive information is required to allow users to find them. Metadata and cataloging information are recorded and integrated into the user interface (online catalog, image content management system, etc.). It has been argued that it is sometimes best just to put materials up on a web site with minimal metadata. In this manner, the argument goes, you generate a degree of critical mass for your digital presence, and you can go back later and add the necessary metadata when you have time. On the other hand, as your project grows, you may find it more difficult (in terms of time and resources) to go back later and fill in the blanks. It may be better to take your time and do it now, rather than wait for another day.

Quality Control

One of the key aspects of the digitization workflow is ensuring that each step of the process is done correctly. There are numerous opportunities for errors to be introduced, whether by equipment, software, or staff. Some of the most common sources of error I have encountered are:

Damage to materials from scanning equipment

Image resolution incorrectly set

Pages out of order

Metadata incorrectly entered, or missing

The key to reducing errors is to document each step of the process (which creates an audit trail) and to have written policies and procedures. Checklists and logs provide a way to trace where an error occurred and to uncover possible future sources of error, and they place a level of responsibility and ownership on the person(s) doing the work. Written policies and procedures provide a step-by-step outline of what is to be done and how to do it.

In addition to checklists and instruction materials, it is important to build systematic quality control checks into the overall process. The procedure should include confirmation of resolution settings, filenames, and image quality (i.e., legibility and clarity). Even the most skilled and conscientious person is prone to errors, so an additional QC method is to have a second person look things over to catch problems caused by human error. The old axiom about not seeing the forest for the trees holds in digitization, especially when you are repeating the same steps.
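Filename confirmation lends itself to automation, and the same pass can flag gaps or repeats in the page sequence (the “pages out of order” error listed earlier). The sketch assumes a hypothetical naming convention of collection, box, folder, and four-digit page number (for example `smith_b01_f03_0001.tif`), and it checks one folder at a time. Because it only looks between the lowest and highest page numbers present, it cannot detect a missing first or last page.

```python
import re

# Hypothetical convention: collection_bNN_fNN_NNNN.tif (box, folder, page).
NAME_PATTERN = re.compile(r"^[a-z]+_b\d{2}_f\d{2}_(\d{4})\.tif$")


def check_folder(filenames):
    """Check one folder's scans: bad names, missing pages, duplicate pages."""
    bad, pages = [], []
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match:
            pages.append(int(match.group(1)))
        else:
            bad.append(name)
    missing, duplicates = [], []
    if pages:
        missing = sorted(set(range(min(pages), max(pages) + 1)) - set(pages))
        duplicates = sorted({p for p in pages if pages.count(p) > 1})
    return bad, missing, duplicates


names = ["smith_b01_f03_0001.tif", "smith_b01_f03_0002.tif",
         "smith_b01_f03_0002.tif", "smith_b01_f03_0004.tif",
         "Smith_b01_f03_0005.TIF"]
bad, missing, duplicates = check_folder(names)
print(bad)         # ['Smith_b01_f03_0005.TIF']
print(missing)     # [3]
print(duplicates)  # [2]
```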

Finally, depending on the number of images being produced, looking at the images themselves (a sample if the number is large, or the entire set if it is a small collection) is a very good and trusted method of making sure the scans are clear and legible. The same goes for metadata and cataloging entries.
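For larger batches, drawing the review sample at random keeps the check from always landing on the same boxes or the same operator's work. In this sketch the sampling rate (10 percent) and the floor of 20 items are hypothetical policy values, not an established standard. Batches at or below the floor are returned whole for full review. Passing a seed makes the selection reproducible, so it can be recorded in the QC log.

```python
import math
import random


def qc_sample(item_ids, percent=10, minimum=20, seed=None):
    """Choose items for visual review.

    Small batches (`minimum` items or fewer) are reviewed in full; larger batches
    get a random sample of `percent` percent, but never fewer than `minimum` items.
    """
    items = list(item_ids)
    if len(items) <= minimum:
        return items
    k = max(minimum, math.ceil(len(items) * percent / 100))
    return sorted(random.Random(seed).sample(items, k))


batch = [f"item{i:04d}" for i in range(1, 501)]
to_review = qc_sample(batch, seed=2010)
print(len(to_review))  # 50
```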

Metadata

As I mentioned above, the process of describing digital images with metadata can be more costly than the actual scanning. This is particularly true for materials that are not text-based, such as photographs, works of art, recorded sound, and three-dimensional objects. In these cases, the MARC record has limitations that tend to become more apparent in the online environment. You will want to take advantage of the bibliographic information and finding aids you already have to reduce the time and cost needed to make your materials accessible on your web site. Even before digitizing, you should decide which metadata standard to use. Fields should include how the image was digitized, its format, ownership, and any copyright information.

The most important question to ask yourself as you put together the list of metadata fields is “will this help

someone find this object?” Depending on how the information will finally be presented, metadata can be

captured in a number of different ways, from a simple spreadsheet to advanced content management systems

and XML implementations. In the end, you will want to collect as much information as you can for each

item. For some items you will not know very much, but try at least to have a caption or description of what

the item is and how it relates to your collection.
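As an illustration of the XML end of that range, the sketch below builds a simple record using element names from Dublin Core, one widely used metadata standard. Dublin Core is chosen here only as an example; your project may settle on another standard. The `record` wrapper element, the field values, and the rule that every item must have at least a title and a description are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)          # serialize elements with a dc: prefix

# Assumed project rule: every item needs at least a title and a description.
REQUIRED = ("title", "description")


def missing_fields(fields):
    """Return required field names that are absent or empty."""
    return [name for name in REQUIRED if not fields.get(name)]


def build_record(fields):
    """Serialize a dict of Dublin Core element names and values as XML text."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if value:  # skip empty fields rather than writing blank elements
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


item = {
    "title": "Letter to the board of trustees",
    "description": "Typed letter, 2 pages. Scanned at 400 ppi, 24-bit RGB.",
    "format": "image/tiff",
    "rights": "Copyright status undetermined",
    "source": "Box 1, Folder 3",
}
print(missing_fields(item))  # []
print(build_record(item))
```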

Access

Ultimately, your digital collection is only as good as the ability of users to access the images. I have visited many library web sites only to find it extremely difficult to locate the digital collection. It often takes some digging around to find the link to the digital library page, which is a shame, because the collection often turns out to be quite good. If your organization is going to spend resources digitizing materials, it should also make sure that the collection gets a prominent, obvious access point on the web site.

But linking to the digital collection web page is only half of the access issue. Providing a user-friendly interface is actually the more important half. This is where you should be prepared to gather information from current and potential future users of your collection, so that you can create an interface that addresses their most important criteria. They should be able to tell you not only what is important, but also how they prefer to access your items. Some of the criteria you should be aware of include:

Searching – What search terms are important? How do users prefer to see results displayed? Do

users value searching individual collections, or do they want to search across different collections?

Printing – Should you allow printing of non-copyright materials? Watermarks might be a useful

solution to this question.

Display – How do users want to see items arranged? Should individual items be displayed initially as

thumbnails, or as larger images? Do users want the ability to zoom in on an image to see greater

detail?

Metadata – What information about an item do users value the most? What do they want to be able

to extract and use?

Announcements and Notices – Do users want to be updated as to new additions to the collection?
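The choice between searching within one collection and searching across collections can be as small as an optional filter. The sketch below is a naive keyword search over a handful of made-up records. A production site would rely on its content management system's search engine, so treat this only as a way to think through the behavior users ask for.

```python
def search(records, query, collection=None):
    """Case-insensitive keyword search; every query word must appear in the item's text.

    Pass collection=None to search across all collections, or a collection name
    to search only that one.
    """
    words = query.lower().split()
    hits = []
    for rec in records:
        if collection is not None and rec["collection"] != collection:
            continue
        text = " ".join([rec["title"], rec["description"]]).lower()
        if all(w in text for w in words):
            hits.append(rec["id"])
    return hits


records = [  # made-up sample records
    {"id": "pc-001", "collection": "Postcards", "title": "Union Station",
     "description": "Postcard of the railway station, 1905"},
    {"id": "ph-014", "collection": "Photographs", "title": "Union Station platform",
     "description": "Glass plate negative"},
    {"id": "ms-203", "collection": "Manuscripts", "title": "Letter",
     "description": "Mentions the new railway station"},
]
print(search(records, "union station"))                      # ['pc-001', 'ph-014']
print(search(records, "station", collection="Manuscripts"))  # ['ms-203']
```

Matching here is by substring, so a search for "station" also finds "stations". Whether that is desirable is exactly the kind of question to put to users.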

Another key area to investigate and address is how the digital collection will be integrated into the organization’s web site. Some sites use the same “look and feel” throughout all of their pages, while others use different “themes” for different areas. You may also want to provide access to specific tools (such as multimedia players or a download link for Adobe Acrobat Reader) if your images require specific software add-ins to display properly.

Sustainability

At some point, you will have to consider how to transition your digitization efforts from a project to a

program. Some projects will have a definitive ending (occurring after grant-funding has ended, for example).


The “ending” for a project may come after you have converted the most desirable or most requested items, or when the funding you used to initiate the project cannot support a long-running program.

Long-term support. There are a number of methods, or sources of funding, that can be considered to keep a

project running – or change its designation to a program. Obviously, the simplest is to have the project added

to the organization’s internal budget. Internal funding, however, requires firm backing by senior members

of the organization. If you already have a sponsor within the organization, then you at least have support for

going forward. If there isn’t someone high up in your organization who values what you are doing, then it

may be time to start cultivating that support. Even if creating a self-sufficient operation is not feasible,

having someone within upper management who can argue for the project can be important to its survival.

External sources of funding can come in different forms. The most common source is grants. Most grants, however, are limited to a defined time period or a fixed amount. Even with these limitations, grants can help build a foundation for a digitization program: a set amount can fund additional items to be converted, or pay for enhancements to the web site that attract more visitors.

Another avenue for funding is taking advantage of the resources you have put together for your project.

Those resources include equipment and staff, and most importantly: the expertise that was developed for

your process. There are many organizations that would like to develop digital collections, but do not have

those resources available, because of budgetary limitations, lack of space or staffing. You may be able to

leverage your expertise to provide services for another institution that can lead to additional revenues or

additional materials for your collection. These partnerships are advantageous to both parties and can lead to

increased interest for both institutions. Similarly, a collaborative arrangement with another organization can

provide additional materials or expertise. An arrangement with another library or museum can result in

sharing or swapping of resources. One organization may have imaging equipment and staff (that the other

does not) while the other has an excellent web development unit (which the first currently lacks). Each

organization could benefit from an arrangement to trade one area of expertise for another.

The most common collaborative venture, though, is for one organization to allow its digital collection to be presented on (or linked from) another’s web site. In this case, both institutions gain additional exposure that can increase the number of patrons. The quantity and quality of both collections are increased, provided, of course, that the collections are related in some way. An example would be two museums operating within the same geographical area, or two collections that hold works by the same artist.

Finally, probably the least desirable method of sustaining a digitization project is to charge a user fee. This may be considered for some collections in high demand, or for segments of your collection that will only be displayed temporarily. Some materials may have access restrictions (due to copyright or ownership provisions) that would allow for charging a fee or subscription. But fees typically have the effect of turning users away. You want to be increasing the number of users, not discouraging them.

Securing a stable monetary funding source will enable you to continue scanning and making items

accessible. This includes retaining staff, purchasing new equipment, and ensuring that the images you’ve

created remain viable.


Technology and staff. In the long run, sustaining a digital program also depends on keeping up with technology. You will want to work with your IT staff to establish both regular maintenance and planned updates for software and equipment. Scanning equipment should be placed on a regular maintenance schedule to make sure it operates efficiently for as long as possible. Options range from signing a service agreement with a vendor to doing regular cleaning yourself.8 Equipment should also be replaced at some point, for example every 3 to 4 years for computers and up to 5 to 7 years for some scanners. Similarly, you will want to make sure that your IT staff updates the software and hardware associated with the web site. You should also make sure that back-ups of your web site are created regularly so that any downtime can be minimized. Finally, along these lines, the images you create (particularly the master files) should be archived, either to CD/DVD or to magnetic tape.

Just as important as upgrading equipment and software is making sure the people you

employ are kept up to date. As new software is rolled out or new techniques implemented, make sure that

your staff receives appropriate training. In addition, regular status meetings are important to keep staff

apprised of both the progress of the program as well as upcoming milestones and events (such as new

equipment or new materials that will be digitized).

Measuring Success

Once you have your project (or program) underway, you will eventually need to answer the question: Is it

successful? You will want to develop a set of goals or metrics to help you (and your management and other

stakeholders) answer this question. There are a number of different ways to approach this, including:

Items made available

Materials scanned per unit of time

Number of users (web hits)

Number of downloads

Number of new members or subscribers

Amount of new donations

Number of requests or inquiries

You will want to measure both the progress of the digitization process and how the digital collection is being accessed and used. The former includes counting the number of items scanned and/or posted to the web site. When you record the number of items, make sure you take into account the characteristics that affect the time and effort it takes to digitize each one, such as its condition, the type of material, and the descriptive information required to catalog it. You will also want to determine the minimum time period over which to evaluate these counts. If you are digitizing rather complex items that can take weeks or even months to convert, you might want to look at the number of items per month. If you are digitizing a collection of relatively simple items, such as typed manuscripts or photographs, then you might want to look at items per day or hour.

8 In a program I worked on, we determined that we could clean the scanners and replace cartridges ourselves at lower cost than renewing the initial maintenance agreement with the vendor. The downside was that if anything more serious than cleaning was required, we would have had to arrange a service call with the vendor, which would likely have been expensive. But we judged that the odds of something breaking were remote, and that the scanners would be replaced before that happened.
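One way to account for item characteristics when counting, as suggested above, is to weight each item by its relative effort before computing a rate. The weights below are purely hypothetical and would need to come from your own test runs. Reporting the raw and weighted rates side by side keeps either number from being misleading on its own.

```python
from collections import Counter

# Hypothetical effort weights, in "simple-item equivalents" per item.
WEIGHTS = {"typed_page": 1.0, "photograph": 1.5, "fragile_bound_volume": 20.0}


def weighted_throughput(completed, period_hours):
    """Return (raw items per hour, weighted items per hour) for one reporting period."""
    counts = Counter(completed)
    raw = sum(counts.values())
    weighted = sum(WEIGHTS[kind] * n for kind, n in counts.items())
    return raw / period_hours, weighted / period_hours


week = ["typed_page"] * 300 + ["photograph"] * 40 + ["fragile_bound_volume"] * 2
raw_rate, weighted_rate = weighted_throughput(week, period_hours=30)
print(f"{raw_rate:.1f} items/hour, {weighted_rate:.1f} weighted items/hour")  # 11.4 items/hour, 13.3 weighted items/hour
```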

Measuring the success of your efforts can be done through a variety of metrics. The simplest method is to count the number of web hits or downloads from the digital collection web site. You should expect to see big increases at the outset as the collection is introduced, and you can hope to see smaller increases as additional items are added. The overall contribution of the project can be measured by looking at the total number of hits your organization’s web site receives after your digital collection goes live.
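A before-and-after comparison of average monthly hits is a simple way to express that overall contribution as a single number. The monthly figures below are invented for illustration. Because seasonal swings in traffic may distort a short comparison, longer periods will generally give a more reliable picture.

```python
from statistics import mean


def percent_change(before, after):
    """Percentage change in average monthly hits between two periods."""
    base = mean(before)
    return (mean(after) - base) / base * 100


# Hypothetical monthly hit counts for the organization's whole web site.
before_launch = [12000, 11500, 12500]
after_launch = [15000, 14500, 16000]
print(f"{percent_change(before_launch, after_launch):+.1f}%")  # +26.4%
```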

Similarly, you can look at the number of new patrons, members or subscribers to your organization. If there

is a significant increase after you have started digitizing, then this could be an indicator that your collection is

a success to the organization.

In any event, you want to try to find out what your users like about your digital collection and what they would like to see. You also want to find out from new patrons or subscribers whether they have accessed your collection. Gathering feedback and conducting user surveys are important in keeping your digitization program fresh and successful.

Summary

Digitizing materials is a complex process. There are many questions and issues to be addressed before even

turning on the scanner (or purchasing it, for that matter). Determining the goal and involving your users are

keys to success, as is properly assessing the requirements and resources available.

In determining the goals of your project, you do not want to forget to evaluate the needs of your current and

potential users. Insight into what users want, what they will do with the materials, and how they prefer to

access the materials are keys to creating a viable and successful online presence.

This paper has summarized a great deal of information about developing a successful digital project. If you are new to digitization, there should be enough here to help you answer the important questions, as well as prompt many more questions that will lead you to success. If your organization has already initiated a project, or has been digitizing for some time, I think there are points here that can help make your projects and programs even more successful.


APPENDIX A: Some common terms used in digitization

AIF (Audio Interchange File Format): an audio file format standard used for storing sound data for personal computers and other electronic audio devices. Co-developed by Apple Computer in 1988. The file extension for the format is .aif (or .aiff). There is also a compressed variant of AIF known as AIFF-C or AIFC.

Bit Depth: a computer graphics term describing the number of bits used to represent the color of a single pixel in a bitmapped image or video frame buffer. It can be expressed per channel (8 bits, for example) or, more commonly, as a total for all channels in bits per pixel (bpp). Also known as color depth.

Bitonal: a mode of digital capture where one bit per pixel represents black and white. Bitonal imaging is

best suited for textual documents and books, with minimal to no colors or shading.

CMYK (Cyan-Magenta-Yellow-Black): a color model in which all colors are described as a mixture of

these four colors. It is the standard model used in offset printing for full-color documents.

Compression: the process used to compress digital signals so that they can be stored in less space or transmitted within a much smaller bandwidth. There are two methods of compression: lossy and lossless. A lossy method compresses the file by selectively removing portions of the data; when uncompressed, the resulting file differs from the original, but it is close enough that the difference is difficult or impossible for the human eye or ear to detect. Lossless data compression allows the exact original data to be reconstructed from the compressed data. While lossless compression will result in an exact duplicate of the original, the compressed file will typically be significantly larger than one created using a lossy algorithm.

Dots per inch (DPI): a measure of resolution used for printed text or images and monitor display.

Grayscale: a range of shades of gray in an image or the values represented between black and white.

JPEG (Joint Photographic Experts Group): a compression algorithm for condensing the size of image files.

This format allows for online access to full screen image files because they require less storage and are

therefore quicker to download into a web page.

JPEG 2000: a compression standard developed by the ISO JPEG committee to improve on the performance

of JPEG while adding significant new features and capabilities to enable new imaging applications.

Compared to uncompressed TIFF files, lossy JPEG 2000 compression can reduce file size by an order of magnitude or more.

MP3: an audio file format, based on MPEG technology. It creates very small files suitable for streaming or

downloading over the Internet.

MPEG (Moving Picture Experts Group): a set of standards for the compression of digital video and audio

data or a file of data compressed according to those standards.

OCR (Optical Character Recognition): the mechanical or electronic translation of text within a scanned

image into machine-encoded text. OCR makes it possible to edit the text, and search for words or phrases.


Typically, the text produced through the OCR process resides alongside (or “behind”) the scanned image

and can be retrieved and searched by the user.

PDF (Portable Document Format): one of the most widely used formats for preserving electronic documents and ensuring their long-term survival. This is primarily because its creator, Adobe, made the format openly available, allowing it to be applied in almost any common operating environment or application.

Pixels per inch (PPI): the number of pixels captured in a given inch and used when discussing scanning

resolution and on-screen display. Increased PPI will result in higher quality images, but also increase the

file size.

Resolution: determines the quality of an image. It is described either by pixel dimensions (height and

width) for on-screen use or physical size and PPI. There is no perfect resolution standard. Resolution

should be adjusted based on size, quality, condition, and uses of the digital object.

RGB: refers to Red, Green, Blue, the colors output from a typical computer monitor. In terms of

digitization, RGB typically refers to a mode for capturing a digital image where multiple bits per pixel

represent color.

TIFF (Tagged Image File Format): the most frequently used file format for master images. It is a flexible

and highly portable open standard format. TIFF files may or may not use lossless compression, but due to

the typically large file sizes they are not suitable for web delivery.

WAV (Waveform Audio File Format): a PC/Windows audio file format developed by IBM and Microsoft for storing sound data. The file extension for the format is .wav.
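Several of the terms above combine into a rough estimate of uncompressed file size. The pixel count is (width in inches × PPI) × (height in inches × PPI), and the size in bytes is the pixel count × bits per pixel ÷ 8. The sketch applies this to a hypothetical 8.5 × 11 inch page at 400 PPI in bitonal (1 bit), grayscale (8 bits), and RGB (24 bits) modes. Actual files will differ somewhat because of headers and any compression applied.

```python
def uncompressed_size_mb(width_in, height_in, ppi, bits_per_pixel):
    """Approximate uncompressed image size in megabytes (1 MB = 1,000,000 bytes).

    pixels = (width_in * ppi) * (height_in * ppi); bytes = pixels * bits_per_pixel / 8.
    """
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * bits_per_pixel / 8 / 1_000_000


# Capture modes from the glossary: bitonal, grayscale, RGB.
for mode, bpp in [("bitonal", 1), ("8-bit grayscale", 8), ("24-bit RGB", 24)]:
    size = uncompressed_size_mb(8.5, 11, 400, bpp)
    print(f"8.5 x 11 in at 400 ppi, {mode}: {size:.1f} MB")
# 8.5 x 11 in at 400 ppi, bitonal: 1.9 MB
# 8.5 x 11 in at 400 ppi, 8-bit grayscale: 15.0 MB
# 8.5 x 11 in at 400 ppi, 24-bit RGB: 44.9 MB
```

Doubling the PPI quadruples the pixel count, and therefore the uncompressed size. This is one reason resolution should be matched to the material and its intended uses.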


APPENDIX B: Selection for Digitizing, A Decision-Making Matrix © Harvard University, May 1997

(http://www.clir.org/pubs/reports/hazen/matrix.html as of 7/28/2010)


APPENDIX C: Some Digital Collection Sites of note

Canadian War Museum: “Canada and the First World War” (http://www.warmuseum.ca/firstworldwar)

Digital Library of Georgia (http://dlg.galileo.usg.edu/?Welcome)

East Carolina University Joyner Library Digital Collections (http://digital.lib.ecu.edu/)

Historic Pittsburgh / University of Pittsburgh (http://digital.library.pitt.edu/pittsburgh/)

Kentuckiana Digital Library (http://kdl.kyvl.org/)

Mississippi Digital Library (http://www.msdiglib.org/)

State Library of North Carolina Digital Repository (http://digital.ncdcr.gov/cdm4/index.php)

University of Washington Digital Collections (http://content.lib.washington.edu/)

Utah Museum of Fine Arts Digital Collections (http://umfa.utah.edu/DigitalCollections)

World Digital Library (http://www.wdl.org/en/)

Windows on the Past, Cornell University (http://cdl.library.cornell.edu/)


APPENDIX D: Some additional sources of information

_____. “BCR’s CDP Digital Imaging Best Practices, Version 2.0”, BCR’s Colorado Digitization Project Digital Imaging Best Practices Working Group, June 2008. (http://www.bcr.org/dps/cdp/best/digital-imaging-bp.pdf as of 7/28/10).

_____. “Digital Library of Georgia Digitization Guide, Version 2.0”, University of Georgia Libraries,

September 2004. (http://dlg.galileo.usg.edu/AboutDLG/DigitizationGuide.html?Welcome&Welcome as

of 7/30/2010)

_____. “Good Practice Guide for Developers of Cultural Heritage Web Services”, UKOLN, April 2006.

(http://www.ukoln.ac.uk/interop-focus/gpg/print-all/ as of 7/28/10).

_____. “Guidelines for Digitization Projects for collections and holdings in the public domain, particularly those held by libraries and archives”, International Federation of Library Associations and Institutions, March 2002. (https://www.cu.edu/digitallibrary/cudldigitizationbp.pdf as of 7/28/10).

_____. “Proposed Digital Imaging Standards and Best Practices”, Indiana State Library, February 8, 2007.

(http://www.in.gov/library/files/dig_imgst.pdf as of 7/28/10)

_____. “Technical Standards for Digital Conversion of Text and Graphic Materials” United States Library

of Congress, December 2006 (http://memory.loc.gov/ammem/about/techStandards.pdf as of 7/28/10).

_____. “Typical Elements for Use in a Statement of Work for the Digital Conversion of Sound Recordings

and Related Documents” United States Library of Congress, March 2001

(http://www.loc.gov/rr/mopic/avprot/audioSOW.html as of 7/28/2010).

_____. “University of Colorado Digital Library Digitization Best Practices, version 1.0”, University of

Colorado Digital Library, University of Colorado, Boulder, CO, August 2009.

(https://www.cu.edu/digitallibrary/cudldigitizationbp.pdf as of 7/28/10).

Casey, Mike and Gordon, Bruce. Sound Directions: Best Practices for Audio Preservation, (Harvard

University and Indiana University, 2007)

(http://www.dlib.indiana.edu/projects/sounddirections/papersPresent/sd_bp_07.pdf as of 7/28/10).

Jones, Trevor. “An Introduction to Digital Projects for Libraries, Museums and Archives”, Illinois

Digitization Institute, May 2001. (http://images.library.uiuc.edu/resources/introduction.htm as of 7/30/10).

Smith, Abby. “Strategies for Building Digitized Collections”, Digital Library Federation and Council on Library and Information Resources, September 2001. (http://www.clir.org/pubs/reports/pub101/contents.html as of 7/28/2010).


Questions?

Robert Suriano

MLS, Archives and Records Administration

Tel.: 314-961-7434

Email: [email protected]