Developing a Successful Digitization Program Suggested Best Practices and Guidelines
Robert Suriano
September 2010
(314) 961-7434 [email protected]
Developing a Successful Digitization Program
Whether you are currently considering a digital project, or have completed one, this paper
should provide insight into how digitization projects can be developed in order to make them
successful in the long run.
The paper outlines the different steps in the development of a digital project. These steps
include determining project goals, key questions to ask, selection of materials to digitize,
resources to gather, the creation of workflow and procedures, how to maintain quality
control, making the materials accessible and useful, and measuring progress and success. In
addition, there is discussion of best practices that can bring a digitization effort from a
temporary project to a sustainable digital program.
Digitization projects are more complex than you may initially think. Many questions need to be
asked, and answered, before a project even starts in order to produce a successful outcome.
First and foremost among these considerations:
What is the goal of the project? Creating digital images of materials in your collection can
be done to address preservation needs or make them more accessible to your users. You will
also find that not everything can, or should, be digitized. The materials you have, your
patrons, and your organizational structure and funding will determine the answers to how to
develop your project and make it successful.
Digitization is defined as the conversion of analog materials (such as letters, photographs, books, sound
recordings, and moving images) into formats that are readily accessible in an online environment. Digitization
also refers to all of the steps involved in that process, including material selection. Specifically,
these steps are:
1. Determination of project (and later, program) goals.
2. The selection of the collection, or materials.
3. Determination of imaging specifications.
4. The assembly of resources (including finances, staff, technical knowledge, equipment, and
organizational backing).
5. Creation of project workflows.
6. The actual scanning, or imaging, of the materials.
7. Quality control procedures (both during and following the process).
8. Metadata, and bibliographic control.
9. Making the images, and accompanying required metadata available.
10. Sustainability. That is, making the project into a program.
As noted above, even though you may have an idea of what materials you want to make available digitally, it
is best to determine what the actual goals of the project are, and this is more than just “putting some
interesting photographs, or old letters, online”. The setting of goals (which will be driven by a number of
factors) is central to a successful digitization project. In fact, well-crafted goals will enable your
organization to transition the project into a long-running and successful program.
I found a very good list of key questions that can be used to flesh out the goals of the project from the
University of Georgia. The Digital Library of Georgia (http://dlg.galileo.usg.edu/) has been up and running
for almost ten years and makes available a wide range of materials centered on the people and history of that
state. It is a good example of a sustained digitization program. Some of the key questions the developers at
the DLG have used in shaping their program include the following:
Why do you want to digitize materials?
Who is your audience?
Do you possess the materials?
Who is the copyright holder for the intellectual property contained in the materials?
What is your timeframe for the project?
How is the project being funded?
Who will be responsible for different stages of the project?
How will you digitize the materials?
How will you describe the materials and what metadata scheme will be used?
How will you provide access to the collection?
How will you preserve and maintain the collection and digitized materials?
From a project management perspective, I would add a couple of additional questions, namely:
Do you have a sponsor (i.e., someone to provide institutional support and influence over the
organization)? (Institutional support is a key to long-term sustainability.)
Are there potential obstacles to successful scanning, or to placing images online? (This question leads to
addressing scheduling, as well as quality control issues.)
Project Goals
Libraries, museums, historical societies, and other organizations that digitize materials do so for two main
reasons: to preserve collections, and to increase access to those materials. According to the Council on
Library and Information Resources and the Digital Library Federation, digital preservation is “universally
acclaimed as an effective tool of preventive preservation.” Digitization is a particularly useful, and cost-
effective, method for preserving (as well as making accessible) sound and moving images. Overlapping the
preservation aspect is making materials accessible (either to current users, or to future patrons). Obviously,
the digitization of fragile (as well as heavily-requested) items can both preserve and continue to make these
items available. Digital imaging can also increase and improve access by addressing high demand as well as
providing enhanced access. Digitization can also open up access to those items not currently on display, or
readily available because of space limitations or because of perceived low demand.
There are other goals to digitization, including a desire to develop collaborative partnerships with other
institutions (which may hold items or collections that would complement, or even supplement your own).
Partnerships allow the possibility of increased interest in your institution’s collection, which can lead to increased
patronage and possible revenues. By making collections available online you also open up opportunities
for additional services, such as educational awareness, and the possibility of lending your
expertise to other organizations.
Users and your target audience
Even before selecting materials to digitize, taking a look at who your intended audience will be is an
important consideration. Simply digitizing materials for the sake of digitizing can be a waste of time and
resources if no one ends up using them1. As with any decision to make a collection available to the public,
there are two sets of user groups to take a look at: your current users, and those future patrons who have
either never taken advantage of the materials you hold, or those who are not aware of your materials. (And
there is a difference between these latter two sub-categories).
Your current users are those who know you and your collections best. These are the folks who you can easily
tap into to weigh the potential for making materials available digitally. Are these users regular patrons? Do
they focus on any particular subject matter? Do you receive frequent inquiries or requests for information?
Are there certain types of materials (photographs, diaries, quotations, etc.) that are most frequently used? Are
there materials that seem to get attention at particular times of the year (such as when student term papers or
theses are being worked on)? High demand for some materials is a good indicator that their digitization may
be warranted, particularly when there are frequent requests for photocopies!
1 An argument can be made that digitizing for preservation is reason enough, and that access to these digital copies could be put off until some future time. But there are other methods of preservation which may be less expensive in the long run than retaining digital copies. For purposes of this paper, we will consider digitization as the total process from selection to access.
It is probably obvious that if you don’t have a steady flow of researchers or visitors, or don’t receive frequent
email or phone calls for information, digitizing materials might increase awareness. But even for those
organizations that have nationally, or internationally, known collections, there are still people out there who
have never visited (because they live too far away), may not have current interest in what materials you hold,
or may still not know you exist. Digitization offers the opportunity to extend access to your collections
to these user groups. Digitized collections can be used from a distance, and may entice some
users to actually visit at some later date. Awareness can be stimulated, particularly by rare items that might otherwise never
be made available to the public because of preservation issues. And trends in research and interest may mean
that some items in your collection that have had little attention are suddenly of interest. In this latter case,
you might keep aware of what neighboring institutions are doing, or making available. An increase in interest
in regional history or a particular industry might be an opportunity for you to make materials available by
taking advantage of the attention being received elsewhere. (Internet searches that highlight someone else’s
collection may spot yours as well!)
The identification of potential users may also provide insight into how materials should be presented and
accessed. Will the users want to be able to copy or print copies of the items? Will the users require the
highest quality images, or will lower quality suffice? The answers to these types of questions will help you
determine the best format as well as the optimum user interface to implement.
Material selection
I am sure that your institution currently has a collection development policy. This is actually a good place to
start. What materials do you currently collect and make available? Are there materials within your collection
that have restrictions on access? Are there materials too fragile for use? You can use this document to
develop a digitization policy. Harvard University developed a “decision-making matrix”2 for selecting
materials for digitization. Key questions posed include:
Does the material have sufficient intrinsic value to ensure interest in a digital product?
Will digitization significantly enhance access?
What goals might be met by digitization (including preservation, functionality, and cost saving)?
Does current technology yield image quality adequate to meet stated goals?
Are the costs of scanning and post-scan processing supportable?
Most digitizing organizations, though, use a similar set of criteria to select the materials for digitization. A
good example is that put together by the North Carolina ECHO program (http://www.ncecho.org/index.shtml)
which provides guidance for state programs. Their determining criteria can be summarized as the following
points:
2 The entire matrix is reproduced at the end of this paper, and can be found at:
http://www.clir.org/pubs/reports/hazen/matrix.html
The Audience (Who are they? Will the material be of interest? Will access be adequate?)
Impact on the Institution (Is there sufficient funding and resources?)
Intellectual Control (Will access be better? Will the knowledge base of the staff be raised?)
Intellectual Property Rights (Is permission available to digitize the material?)
Preservation (Will digitization aid in preserving the materials?)
Technical Considerations (Will available equipment and software be sufficient to allow quality
reproduction and make access visually appealing?)
Value (Will the digital collection be unique? Will the collection have enduring value?)
The IFLA simplifies the criteria for material selection to the following:
Content – Does the intellectual value justify the required costs and resources?
Demand – Is there an audience for a digital presentation of the material (and, will they benefit by its
accessibility?)
Condition – Can the material undergo the process of digitization without risk of damage? Does the
collection have adequate descriptive data or cataloging essential for future access?
When you read through these various lists of criteria, it is important to look beyond the first-instinct thought
of textual material. The digitization of photographs, recorded sound, and moving image materials requires
different models of conversion, particularly in terms of equipment and handling. The questions may be the
same, but the answers will need more complex analysis.
Specifications for Text and Graphical Materials. The basic scanning process is first to create a master image
and then, using this, create the access image (often termed a derivative)3. The master (or archival) image is
saved to long-term storage, while the derivative becomes the public access version. The most common
imaging formats are Tagged Image File Format (TIFF), Joint Photographic Experts Group (JPEG), Graphics
Interchange Format (GIF), Bit-Mapped (BMP), and JPEG-2000. Best practices by most institutions suggest
saving (master) digital images in an uncompressed file format, such as TIFF.
When determining the imaging specifications, there are five components to consider:
Resolution – determined by the number of pixels used to present the image. Resolution is typically
expressed in dots per inch (dpi) or pixels per inch (ppi). Increasing resolution will result in a
greater ability to delineate fine details, but will also result in larger file sizes.
Bit Depth – the measurement of the number of bits used to define each pixel. The greater the bit
depth used, the greater the number of gray and color tones that can be represented.4
Image Enhancement – the processes used to modify or improve image capture by changing size,
color, contrast, brightness, or to compare and analyze images for characteristics that the human eye
cannot perceive.
Compression – used to reduce file size for processing, storage, and transmission.
Image quality – the cumulative result of resolution, bit depth, enhancement and compression, as well
as the effects of the types of equipment, techniques, and skills of personnel involved.
3 Some experts advocate the creation of a second, or service master, file that is used to create subsequent derivative files. The service master is saved in a readily accessible location for further use, while the original master file is saved to long-term storage.
4 In relation to bit depth, there are three types of scanning: bitonal (where one bit is used to represent black or white), grayscale (which uses multiple bits to represent shades of gray), and color (where multiple bits per pixel are used to represent different colors). A setting of 8 bits per pixel results in 256 different shades of gray. A setting of 24 bits per pixel is called true color, and makes possible the selection of roughly 16.7 million colors.
The goal is to optimize image quality: capture enough resolution to discern fine details without
creating a file so large that access is hindered. In addition, depending on the bit depth
selected, the resulting content will either be displayed faithfully or be limited by the number of tones and colors
available.
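As a rough illustration of this trade-off, the uncompressed size of a scan follows directly from the page dimensions, the resolution, and the bit depth. The sketch below assumes a letter-size page (8.5 x 11 inches, a value chosen here only for illustration) and ignores file headers and compression; the function names are hypothetical.

```python
def tonal_values(bit_depth):
    """Number of distinct tones or colors one pixel can take at this bit depth."""
    return 2 ** bit_depth


def uncompressed_bytes(width_in, height_in, ppi, bit_depth):
    """Approximate raw image size in bytes, ignoring headers and compression."""
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * bit_depth // 8


if __name__ == "__main__":
    # Letter-size page (an assumed example) at 300 ppi for three bit depths.
    for bits in (1, 8, 24):
        size_mb = uncompressed_bytes(8.5, 11, 300, bits) / 1_000_000
        print(f"{bits:>2}-bit: {tonal_values(bits):>10,} tones, about {size_mb:.1f} MB")
```

Because the pixel count grows with the square of the resolution, doubling from 300 to 600 ppi quadruples the file size, which is one reason master and access copies are usually kept at different settings.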
As noted, both the type of equipment and the skill of the persons doing the scanning will affect the overall
image quality. Even though different manufacturers will state the same imaging specifications, there are
differences between scanning equipment. Similarly, human interaction will play a role in the image quality.
Personnel will possess different skill levels, perceptions, and attentions to detail, no matter how well they are
trained or how simply the equipment can be operated. Many times I’ve had staff inadvertently miss a page of
a book being scanned, or not properly set the resolution or bit depth, even though they’d been doing the job
for years.
Best Practices. Most organizations (in the U.S.) follow the recommendations of the Library of Congress and
National Archives and Records Administration (NARA) when setting imaging specifications. Resolution for text
images (those that do not include graphic content) is set at 300 dpi with a bit depth of 1-bit bitonal (black and white).
If there are illustrations, maps, charts, or photographs included, the bit depth is increased to 8-bit
grayscale. If some level of color is included, the bit depth will be increased to 24-bit RGB color. And, using
NARA guidelines, the master (or archival) image is saved in TIFF format.
| Material | Image type | Resolution | Bit depth |
|---|---|---|---|
| Text-based (books, pamphlets, etc.) with little or no graphical content or color | Master | 300 to 600 ppi (400 ppi and up for OCR purposes) | 1-bit bitonal B&W |
| Text-based (books, pamphlets, etc.) with little or no graphical content or color | Access | 150 to 300 ppi | 1-bit bitonal B&W |
| Text-based (books, pamphlets, etc.) with graphical and color content | Master | 300 to 600 ppi (400 ppi and up for OCR purposes) | 8-bit grayscale or 24-bit color* |
| Text-based (books, pamphlets, etc.) with graphical and color content | Access | 150 to 300 ppi | 8-bit grayscale or 24-bit color* |
| Photographs | Master | 300 to 800 ppi | 8-bit grayscale or 24-bit color* |
| Photographs | Access | 150 to 300 ppi | 8-bit grayscale or 24-bit color* |
| Maps | Master | 400 ppi | 24-bit color |
| Maps | Access | 150 to 300 ppi | 24-bit color |
| Rare Books (objects of high artifactual value) | Master | 400 ppi minimum | 24-bit color |

*24-bit color to be used in cases where color is an important attribute of the document.
Specifications for Moving Image and Sound Materials
Best practices for film and recorded sound are less exact, and no widespread standard has yet been
put into use. Much of the effort in digitizing these types of materials is aimed at preservation.
Media for film and moving images have proven to become obsolete much more quickly than text and other “analog”
materials. The best example is motion pictures made in the first half of the 20th century. The cellulose
nitrate film base used at the time decomposes, slowly turning to dust. Cellulose acetate film, which was developed
to replace nitrate film, has been shown to suffer from vinegar syndrome, where the acetate base
breaks down and releases acetic acid (giving off a vinegar odor) and the film becomes brittle and shrinks over time. Much of the color film of
the 20th century has also been shown to fade. Because of these issues, the conversion of film to digital
media has become an effort at saving these types of materials.
The processes used to digitize film and sound also use what is known as sampling. The easiest way to
describe this is to picture a piece of music as a series of sound waves. Digital sampling changes the smooth
wave into a series of points along the wave that best describes the shape. Because sampling cannot totally
match the smoothness of the wave, some information is lost.
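To make the idea concrete, the sketch below samples a pure tone and rounds each sample to the nearest level the chosen bit depth can store. The 440 Hz tone, the function names, and the ten-millisecond duration are assumptions made for illustration; real recordings are far more complex than a single sine wave.

```python
import math


def sample_and_quantize(frequency_hz, sample_rate_hz, bit_depth, duration_s):
    """Sample a sine wave and round each sample to the nearest storable level."""
    max_level = 2 ** (bit_depth - 1) - 1  # largest signed integer value
    count = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(count):
        t = n / sample_rate_hz
        exact = math.sin(2 * math.pi * frequency_hz * t)  # the smooth wave
        stored = round(exact * max_level)                 # the point that is kept
        samples.append((exact, stored / max_level))
    return samples


def worst_error(samples):
    """Largest gap between the smooth wave and its stored approximation."""
    return max(abs(exact - approx) for exact, approx in samples)


if __name__ == "__main__":
    for bits in (8, 16):
        points = sample_and_quantize(440, 44_100, bits, 0.01)
        print(f"{bits}-bit: {len(points)} points, worst error {worst_error(points):.6f}")
```

The stored points never match the wave exactly, but the gap shrinks as the bit depth grows, which is why higher settings are preferred for master files.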
Unfortunately, there are no current standards that allow for a perfect digital copy of moving image or sound
materials that replicates the same picture and sound quality as the original. The National Archives has gone as
far as stating that the combination of changing digital formats and the lack of consensus for standards has
resulted in no National Archives determination for the conversion of audio-visual materials to digital. Several
years ago, the Bibliographic Center for Research (BCR) put together minimum recommendations for
digitizing audio recordings.
| Material | Sample Rate | Bit Depth | Archival File Format |
|---|---|---|---|
| Spoken language | 44.1 kHz | 16-bit | WAV, AIF |
| Music | 44.1 kHz (min.), 96 kHz (optimal) | 16-bit | WAV, AIF |
This mirrors a 2001 technical study put together by the Library of Congress that recommended that the
digitization of audio files result in three copies: a master file (96 or 48 kHz at 24-bit), a service file of higher
fidelity at 44 kHz and 16-bit, and an additional service file of lower fidelity (saved to MP3 format). The LOC
recommended that both the master and high-fidelity service file be saved to WAV format.
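These settings translate directly into storage needs. The following sketch estimates the size of uncompressed audio for a 96 kHz, 24-bit master and a 44.1 kHz, 16-bit service file; the stereo (two-channel) layout and the one-hour duration are assumptions, and file headers are ignored.

```python
def audio_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Uncompressed (PCM) audio size in bytes, ignoring file headers."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds


ONE_HOUR = 60 * 60

# Stereo (2 channels) and a one-hour recording are assumptions.
master = audio_bytes(96_000, 24, 2, ONE_HOUR)
service = audio_bytes(44_100, 16, 2, ONE_HOUR)

print(f"master:  {master / 1_000_000_000:.2f} GB per hour")
print(f"service: {service / 1_000_000_000:.2f} GB per hour")
```

Under these assumptions the master file is a little more than three times the size of the service file, which is worth factoring into storage planning.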
Resources
The equipment, staffing, and other requirements for your digitization project will depend on both the types of
materials that will be digitized, and the digitization specifications.
Equipment requirements will include computers and software, scanners, storage media, and perhaps digital
cameras. Image file storage and processing can require significant memory, so you will want to consider
computers with an upper range of RAM, hard drive space, and processing speed. In addition, larger monitors
(over 21”) can be very helpful in viewing and processing the images. When selecting a scanner, you need to
consider the following:
The physical dimensions of the source documents
The type(s) of media that will be scanned (transparent or reflective)
The range of details, tones and colors present in the documents
The physical condition of the documents.
Collections that include oversized documents (such as maps) will necessitate a flatbed scanner (which
normally has a 13x17 inch scanning surface). If you know you will be scanning pages from bound volumes
(books, diaries) that cannot be taken apart, then an open-book scanner will be required. If the
majority of your materials are 8x11 inch documents and are in fairly good shape, a production scanner
(similar to a photocopier) will suffice. If your collection includes three-dimensional objects, then a digital
camera will be required. Often, however, you will be faced with collections with materials of different sizes,
conditions, and requirements for color. The decision you will need to make is either to exclude segments of
the collection to reduce the resource requirements, or look at other options (such as partnerships with
institutions that may possess the necessary equipment, or outsourcing to a third-party to handle those
particular items).
There are a number of software applications available for processing image files. Adobe Photoshop is
considered the de facto standard and is recommended, particularly when dealing with photographs and
documents containing various types of colors and graphics. It is a relatively expensive piece of software, but
it has been demonstrated to produce high-quality images.5
Because you will need to retain the master and access image files, it is recommended that these be saved to
off-line storage, such as CD-R or magnetic tape. Compact disc storage is now relatively inexpensive and it is
advisable to select the highest quality CD-R option. It is recommended that two copies be made, the second
for backup purposes. Compact discs should be stored in jewel cases rather than envelopes and labeled
appropriately. As the lifespan of CD-R technology is between five and ten years, plan on recopying all CD-Rs
after five years.
Equipment is only part of the resource requirements. You will still need skilled staff to handle the materials,
run the scanners, conduct post-imaging enhancements, and enter metadata and cataloging information. For a
new digitization project, new skills may be required and possibly additional staff. When planning your
project, you should allow time to train the staff in the necessary technologies and procedures. It is also
recommended that staff not directly associated with the scanning process be familiarized with the basic
theories and practices of digitization.
Supporting all of the above resources are the budgetary requirements. As you develop your
digitization plan, you have to take into account the costs associated with the equipment, supplies, and staff.
The costs you will encounter will fit into one of the following categories:
- Equipment and supplies
- Salaries, wages, and benefits
- Staff training
- Services, contracts, and legal fees
- Overhead and indirect costs (including offices and workspace)
- Maintenance, licenses, and communication charges
- Contingency
5 Use of Photoshop would obviously be overkill, however, for those projects digitizing solely textual documents with little or no graphical content. The production of 1-bit black and white images would be sufficient, and there are several software packages (such as TechSoft PixEdit) that would suffice.
Note that it is likely that staff compensation will account for about 45 to 50 percent of the total project costs.
The National Archives and Records Administration estimates that approximately 30 percent of the project
costs will be digital conversion with the remainder (20 to 25 percent) of the costs being allocated to metadata
creation (including cataloging, description, and indexing). My experience is that it’s possible that the
metadata process can actually be larger (with the conversion costs being less) depending on the types of
materials being scanned.
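The percentages above can be turned into a quick planning estimate. The sketch below assumes that the three shares (staff, conversion, and metadata) together account for the whole budget, with conversion held at 30 percent and metadata taking the remainder; the function name and the $100,000 total are hypothetical.

```python
def budget_split(total, staff_pct):
    """Split a project total using the percentages quoted above.

    Assumes the three shares partition the budget: conversion is fixed at
    30 percent and metadata takes whatever staff and conversion leave.
    """
    conversion_pct = 30
    metadata_pct = 100 - staff_pct - conversion_pct
    return {
        "staff": total * staff_pct / 100,
        "conversion": total * conversion_pct / 100,
        "metadata": total * metadata_pct / 100,
    }


if __name__ == "__main__":
    # A hypothetical $100,000 project at both ends of the staff range.
    for staff_pct in (45, 50):
        print(staff_pct, budget_split(100_000, staff_pct))
```

If your materials need heavy cataloging, as noted above, you may want to shift some of the conversion share toward metadata rather than rely on these defaults.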
When considering a digitization project, you will need to determine where to locate staff and equipment.
Most production scanners are no larger than a medium-size copy machine, but open book scanners can be
quite large, and require minimal overhead lighting to reduce glare. In addition to space for equipment, I’ve
found it beneficial to also have adequate work space for the items being scanned. A large table is great for
preparing materials. And having a nearby shelf-unit or cabinet can save time transporting the items to and
from where they usually reside. You will also want to make sure that the staff has space to accomplish their
tasks. In addition to space for a computer it is advisable to have enough work space to accommodate the
items they may be working on (for review purposes), as well as other materials such as finding aids or
reference materials that might be useful when doing metadata work or cataloging of the digitized materials.
Some key questions to consider for space requirements:
How many people will be working on the project?
Will the tasks associated (scanning, metadata and cataloging, web site posting, etc.) be done by
multiple people, or will one person handle more than one task?
Will you require different types of scanning equipment?
Will a photocopier be readily accessible?
Are there materials that will need special preparation before digitizing? (Photocopying, removal of
bindings, sorting and organizing, etc.)
Will materials need to be returned to archival storage or library shelving afterwards?
In some cases, particularly large or long-term projects, you may want to centrally locate staff and equipment
if possible.6
6 An interesting operation I once came across was that employed by Iron Mountain. They designated a space (actually
a large open room set inside a larger warehouse) for their digitization efforts that featured a product flow layout where each step of the imaging process led to the next. When materials were received for scanning, they were set in an open space next to a large table where someone would open the boxes, sort the materials and prepare the materials. Next to the prep table was a table next to the scanner where the materials were placed to await scanning. Next to this was a computer for reviewing the images. Finally, there was an area where the materials were re-boxed to be returned, or sent to long-term storage.
Project Workflow and Scheduling
Once you have determined what materials are to be digitized, you will want to establish both a schedule and
workflow for the project. The basic steps in the actual imaging process are preparation, scanning,
enhancement, compression, and access.
You will want to document each step, whether in the form of a checklist or log, in order to ensure all steps
were completed. Frequent self-checks also encourage a sense of responsibility and ownership in the process
by the staff. I have found that tracking this in a software application (such as a spreadsheet or a database)
keeps the paper clutter to a minimum. Checklists and other documentation work best as reminders, while a
software application is best for actual use in tracking what has been completed, as well as providing a useful
method for recording productivity.
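A tracking application of this kind does not need to be elaborate. The following is a minimal sketch, assuming one record per item and the five imaging steps named above; the class, function, and item names are hypothetical and a spreadsheet with the same columns would serve equally well.

```python
from dataclasses import dataclass, field

# The five imaging steps named earlier in this section.
STEPS = ("preparation", "scanning", "enhancement", "compression", "access")


@dataclass
class ItemRecord:
    """Tracks which workflow steps are finished for one item."""
    item_id: str
    completed: dict = field(default_factory=dict)  # step -> staff initials

    def mark_done(self, step, staff):
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        self.completed[step] = staff

    def next_step(self):
        """Return the first unfinished step, or None when the item is done."""
        for step in STEPS:
            if step not in self.completed:
                return step
        return None


def steps_completed_by(records, staff):
    """Count steps finished by one person, a simple productivity measure."""
    return sum(
        1 for record in records for who in record.completed.values() if who == staff
    )


if __name__ == "__main__":
    letter = ItemRecord("letter-001")
    letter.mark_done("preparation", "AB")
    letter.mark_done("scanning", "CD")
    print(letter.next_step(), steps_completed_by([letter], "AB"))
```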
Prior to the start of any digitization project you will want to run a few tests to determine the approximate time it
takes to complete the various steps, as well as the proper handling procedures. This will allow you to put
together fairly accurate scheduling assignments.
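Once test timings are in hand, a first-pass schedule is simple arithmetic. The sketch below assumes average minutes per item for each step (measured in your own test run) and a fixed number of productive hours per person per day; every figure shown is a placeholder, not a benchmark.

```python
import math


def estimate_working_days(item_count, minutes_per_item, staff, hours_per_day=6):
    """Estimate working days from timings measured in a test run.

    minutes_per_item maps each step to its average measured minutes.
    """
    total_minutes = item_count * sum(minutes_per_item.values())
    minutes_per_day = staff * hours_per_day * 60
    return math.ceil(total_minutes / minutes_per_day)


if __name__ == "__main__":
    # Placeholder timings; replace with the results of your own tests.
    timings = {"preparation": 1.0, "scanning": 2.0, "enhancement": 1.5, "metadata": 3.0}
    print(estimate_working_days(5000, timings, staff=2), "working days")
```

The estimate treats all staff as interchangeable and ignores rework, so it is best read as a lower bound to which a contingency should be added.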
Preparation involves the activities needed to make both the materials and the equipment ready for scanning.
Archival materials in boxes and folders will need to be removed, put in order, and readied for scanning. If
items are bound or stapled, the bindings or staples should be removed, if possible. Any items that you consider
susceptible to damage from excess handling should either be excluded or photocopied.7 The equipment you
are using will also need to be readied, including cleaning any glass surfaces or parts that may interact with
the materials being scanned.
The scanning phase is the digital imaging of the materials. Prior to scanning, you will want to make sure that
all settings correspond to your specifications, in terms of resolution, bit depth, color, etc.
After the material has been scanned, the master image will be saved to long-term storage and a derivative
copy is created in the enhancement phase. In this step, the image file will be corrected for color, brightness,
contrast, and possibly “cleaned” of any imperfections that affect the image’s legibility. In addition, text
images that are designated to be searchable can be run through an optical character recognition (OCR)
application to create a readable text file of the image. (The alternative option is to manually transcribe the
document).
Before the image can be placed on the web server, you will probably have to reduce the size of the file to
make it easier to transmit and more accessible. Compression techniques can reduce the quality of an
image. Such techniques can be either “lossless” (a decompressed image will be identical to its earlier state
because no information was lost when the file was reduced) or “lossy” (some information
is lost due to the sampling methods used by the compression application).
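The difference between the two kinds of compression can be illustrated with a short Python sketch. It uses the standard-library zlib module for the lossless case; the "lossy" step here is a simple stand-in (discarding the low-order bits of each pixel), not the method used by JPEG or any particular imaging product, and the pixel data is made up.

```python
import zlib

# A hypothetical run of 8-bit grayscale pixel values standing in for
# real image data.
pixels = bytes((i * 7 + (i % 3)) % 256 for i in range(4096))

# Lossless: zlib (DEFLATE) decompresses to exactly the original bytes.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
print("lossless round trip identical:", restored == pixels)

# Lossy (illustrative only): drop the 4 low-order bits of each pixel
# before compressing. The discarded detail cannot be recovered.
quantized = bytes(p & 0xF0 for p in pixels)
lossy_packed = zlib.compress(quantized)
lossy_restored = zlib.decompress(lossy_packed)
max_error = max(abs(a - b) for a, b in zip(pixels, lossy_restored))
print("lossy round trip identical:", lossy_restored == pixels)
print("largest per-pixel difference:", max_error)
print("bytes (original, lossless, lossy):",
      len(pixels), len(packed), len(lossy_packed))
```

This is one reason the master image is kept separately: once a lossy step has been applied to a derivative, the original values can only be recovered from the master.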
At this point, it is advisable to save a copy of the derivative image to long-term storage. This will allow you
to restore an access copy to your web server in the event something catastrophic occurs.
7 You will want to pay particular attention to any papers that are bent or torn. If you are using a production (or feed)
scanner, these can catch and jam the scanner and cause additional damage to the material.
[Figure: Typical Imaging Process Workflow. Labels recovered from the diagram: selection (criteria: intrinsic value? audience? enhanced access? benefit > cost?); preparation (scanner cleaned; material tested for fragility and ability to undergo process); scanner (specifications: resolution, bit depth); image master; long-term storage (CD-R, magnetic tape); enhancement on a PC with image processing software, e.g., Adobe Photoshop (modifications to size, color, contrast, brightness and corrections for imperfections); compression; access image (derivative); web server.]
Finally, your materials can be transferred to the web server for access purposes. But that is not the final step.
Before images can be made public, descriptive information is required to allow users to find them. Metadata
and cataloging information are recorded and integrated into the user interface (online catalog, image content
management system, etc.). It has been argued that sometimes it is best just to put materials up on a web site
with minimal metadata. In this manner, the argument goes, you generate a degree of critical mass for your
digital presence. You can then go back later and add the necessary metadata when you have time. On the
other hand, as your project grows, you may find it more difficult (in terms of time and resources) to go back
later and fill in the blanks. It may be better to take your time and do it now, rather than wait for another day.
Quality Control
One of the key aspects to digitization workflow is ensuring that each step of the process is done correctly.
There are numerous opportunities for the introduction of errors, either due to equipment, software, or staff.
Some of the most common sources of errors I have encountered have been:
Damage to materials from scanning equipment
Image resolution incorrectly set
Pages out of order
Metadata incorrectly entered, or missing
The key to reducing errors is to document each step of the process (which creates an audit trail) and to
have written policies and procedures. Checklists and logs provide a way to trace where an error might have
occurred and to uncover possible future sources of error, and they
place a level of responsibility and ownership on the person(s) doing the work. Written policies and
procedures provide a step-by-step outline of what is to be done and how it is to be completed.
In addition to checklists and instruction materials, it is important to also include systematic quality control
checks within the overall process. The procedure should include confirmations as to resolution settings,
filenames, and image quality (i.e., legibility and clarity). Even the most skilled and conscientious person is
prone to errors. An additional method for QC is having a second person look things over to help reduce
possible problems due to human error – the old axiom of not seeing the forest for the trees holds in
digitization, especially when you are repeating steps.
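A systematic check of this kind might be sketched as below. The specification values and the file-naming convention are hypothetical; the point is only that each scan record is compared against the written specifications and any mismatches are listed for follow-up.

```python
import re

# Hypothetical project specifications; substitute your own.
SPEC = {"ppi": 400, "bit_depth": 24, "color_mode": "RGB"}

# Hypothetical naming convention: collection_bNN_fNN_NNNN.tif
FILENAME_PATTERN = re.compile(r"^[a-z0-9]+_b\d{2}_f\d{2}_\d{4}\.tif$")

def check_scan(record):
    """Return a list of problems found in one scan record (empty if none)."""
    problems = []
    if not FILENAME_PATTERN.match(record["filename"]):
        problems.append("filename does not follow convention")
    for key, expected in SPEC.items():
        if record.get(key) != expected:
            problems.append(f"{key} is {record.get(key)!r}, expected {expected!r}")
    return problems

good = {"filename": "smith_b01_f03_0001.tif",
        "ppi": 400, "bit_depth": 24, "color_mode": "RGB"}
bad = {"filename": "Scan 12.tif",
       "ppi": 300, "bit_depth": 24, "color_mode": "RGB"}

print(check_scan(good))
print(check_scan(bad))
```

A check like this does not replace the second person's review; it only catches the mechanical mismatches so that the reviewer can concentrate on legibility and clarity.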
Finally, depending on the number of images being produced, looking at the images (either via sampling if the
number is large, or at the entire set if it is a small collection) is a reliable and trusted method of
making sure the scans are clear and legible. The same goes for metadata and cataloging entries.
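One way to draw such a sample is sketched below. The 10 percent sampling rate and the minimum of ten files are assumptions chosen for illustration, not recommended values; small collections are simply returned whole.

```python
import math
import random

def qc_sample(filenames, percent=10, minimum=10, seed=None):
    """Pick files for visual inspection.

    Small collections (at or below `minimum`) are returned in full;
    otherwise a random sample of `percent` of the files is drawn.
    """
    files = list(filenames)
    if len(files) <= minimum:
        return files
    size = max(minimum, math.ceil(len(files) * percent / 100))
    rng = random.Random(seed)
    return sorted(rng.sample(files, size))

# Hypothetical file names for a large and a small batch.
large_batch = [f"img_{n:04d}.tif" for n in range(500)]
small_batch = [f"img_{n:04d}.tif" for n in range(8)]

print(len(qc_sample(large_batch, seed=1)), "of", len(large_batch), "to inspect")
print(len(qc_sample(small_batch)), "of", len(small_batch), "to inspect")
```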
Metadata
As I mentioned above, the process of defining digital images using metadata can be more costly than the
actual scanning process. This is particularly true for materials not text-based, such as photographs, pieces of
art, recorded sound, and three-dimensional objects. In these cases, the MARC record has limitations which
tend to be emphasized in the online environment. You will want to take advantage of the bibliographic
information and finding aids you have to try and lessen the amount of time and costs needed to make your
materials accessible on your web site. Even before digitizing, you should decide what metadata standard to
use. Fields should include how the image was digitized, its format, ownership and any copyright information.
The most important question to ask yourself as you put together the list of metadata fields is “will this help
someone find this object?” Depending on how the information will finally be presented, metadata can be
captured in a number of different ways, from a simple spreadsheet to advanced content management systems
and XML implementations. In the end, you will want to collect as much information as you can for each
item. For some items you will not know very much, but try at least to have a caption or description of what
the item is and how it relates to your collection.
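At the simple-spreadsheet end of that range, a record structure might look like the following sketch. The field names are hypothetical and do not follow any particular metadata standard; they only cover the kinds of information mentioned above (a description, the relation to the collection, how the image was digitized, its format, ownership, and copyright).

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class ItemRecord:
    identifier: str
    description: str        # caption: what the item is
    collection: str         # how the item relates to the collection
    capture_method: str     # how the image was digitized
    file_format: str
    owner: str
    copyright_status: str = "unknown"

def to_csv(records):
    """Serialize records to CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(ItemRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()

# A made-up example item.
example = ItemRecord(
    identifier="smith_b01_f03_0001",
    description="Letter, two pages, handwritten",
    collection="Smith family papers",
    capture_method="flatbed scanner, 400 ppi, 24-bit color",
    file_format="TIFF",
    owner="Example Historical Society",
)
csv_text = to_csv([example])
print(csv_text)
```

Starting with a flat structure like this keeps the information portable if you later move to a content management system or an XML-based implementation.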
Access
Ultimately, your digital collection is only as good as the ability of users to access the images. I have visited
many library web sites only to find it extremely difficult to actually locate the digital collection. It often takes
some digging around on these sites to find the link to the digital library page, which often is a shame because
the collection turns out to be quite good. If your organization is going to spend resources digitizing materials,
it should also make sure that the collection gets a prominent or obvious access point on the web site.
But linking to the digital collection web page is only half of the access issue. Providing a user friendly
interface is, actually, the more important facet. This is where you should probably be prepared to gather
information from current and possible future users of your collection. You want to create an interface that
addresses their most important criteria. They should be able to tell you not only what is important, but also
how they prefer to access your items. Some of the criteria you should be aware of include:
Searching – What search terms are important? How do users prefer to see results displayed? Do
users value searching individual collections, or do they want to search across different collections?
Printing – Should you allow printing of non-copyright materials? Watermarks might be a useful
solution to this question.
Display – How do users want to see items arranged? Should individual items be displayed initially as
thumbnails, or as larger images? Do users want the ability to zoom in on an image to see greater
detail?
Metadata – What information about an item do users value the most? What do they want to be able
to extract and use?
Announcements and Notices – Do users want to be updated as to new additions to the collection?
Another key area to investigate and address is how the digital collection will be integrated into the
organization’s web site. Some sites use the same “look and feel” throughout all of their pages, while others
use different “themes” for different areas. You may also want to provide access to specific tools (such as
multi-media players or a download link to Adobe Acrobat Reader) if your images require specific software
add-ins to display adequately.
Sustainability
At some point, you will have to consider how to transition your digitization efforts from a project to a
program. Some projects will have a definitive ending (occurring after grant-funding has ended, for example).
The “ending” for a project may occur after you have converted the most desirable, or requested items, or the
funding that you used to initiate the project will not support a long-running program.
Long-term support. There are a number of methods, or sources of funding, that can be considered to keep a
project running – or change its designation to a program. Obviously, the simplest is to have the project added
to the organization’s internal budget. Internal funding, however, requires firm backing by senior members
of the organization. If you already have a sponsor within the organization, then you at least have support for
going forward. If there isn’t someone high up in your organization who values what you are doing, then it
may be time to start cultivating that support. Even if creating a self-sufficient operation is not feasible,
having someone within upper management who can argue for the project can be important to its survival.
External sources of funding can come in different forms. The most common source is through grants. Most
grants, however, have a definitive time period associated with them, or have a fixed amount that can be
applied. Even these limitations can help build a foundation for a digitization program. Set amounts can fund
additional items to be converted, or provide for enhancements in the web site that can drive up visitors.
Another avenue for funding is taking advantage of the resources you have put together for your project.
Those resources include equipment and staff, and most importantly: the expertise that was developed for
your process. There are many organizations that would like to develop digital collections, but do not have
those resources available, because of budgetary limitations, lack of space or staffing. You may be able to
leverage your expertise to provide services for another institution that can lead to additional revenues or
additional materials for your collection. These partnerships are advantageous to both parties and can lead to
increased interest for both institutions. Similarly, a collaborative arrangement with another organization can
provide additional materials or expertise. An arrangement with another library or museum can result in
sharing or swapping of resources. One organization may have imaging equipment and staff (that the other
does not) while the other has an excellent web development unit (which the first currently lacks). Each
organization could benefit from an arrangement to trade one area of expertise for another.
The most common collaborative venture, though, is for one organization to allow its digital collection to be
presented on (or linked from) another’s web site. In this case, both institutions gain additional exposure that
can increase the number of patrons. The quantity and quality of both collections is increased, providing, of
course, that each collection is in some way related to the other. An example would be two museums
operating within the same geographical area, or two collections that possess works by the same artist.
Finally, and probably the most undesirable method of sustaining a digitization project, is to charge a user
fee. This may be considered for some collections in high demand, or segments of your collection that will
only be displayed temporarily. Some materials may have access restrictions (due to copyright or ownership
provisions) that would allow for the charging of a fee or subscription. But typically, fees can have the effect
of turning users away. You want to be increasing the number of users, not discouraging them.
Securing a stable monetary funding source will enable you to continue scanning and making items
accessible. This includes retaining staff, purchasing new equipment, and ensuring that the images you’ve
created remain viable.
Technology and staff. In the long run, sustaining a digital program also depends on keeping up with
technology. You will want to work with your IT staff in establishing both regular maintenance and planned
updates for software and equipment. Scanning equipment should be placed on a regular maintenance
schedule to make sure it operates efficiently for as long as possible. Options range from signing
a service agreement with a vendor to doing regular cleaning yourself.8 Equipment should also be replaced at
some point, such as every 3-4 years for computers and up to 5-7 years for some scanners. Similarly, you will
want to make sure that your IT staff updates software and hardware associated with the web site. You should
also make sure that back-ups of your web site are regularly created so that any downtime can be minimized.
Finally, along these lines, the images you create – particularly the master files - should be archived, either to
CD/DVDs or magnetic tape.
Just as important as upgrading equipment and software is making sure the people you
employ are kept up to date. As new software is rolled out or new techniques implemented, make sure that
your staff receives appropriate training. In addition, regular status meetings are important to keep staff
apprised of both the progress of the program as well as upcoming milestones and events (such as new
equipment or new materials that will be digitized).
Measuring Success
Once you have your project (or program) underway, you will eventually need to answer the question: Is it
successful? You will want to develop a set of goals or metrics to help you (and your management and other
stakeholders) answer this question. There are a number of different ways to approach this, including:
Items made available
Materials scanned per time
Number of users (web hits)
Number of downloads
Number of new members or subscribers
Amount of new donations
Number of requests or inquiries
You will want to measure both the progress of the digitization process and how the digital collection is being
accessed and used. Included in the former are counting the number of items scanned and/or posted to the web
site. You will want to make sure that when you record the number of items you are taking into account
characteristics of the items that affect the time and effort it takes to digitize them, such as their condition,
the type of material, and the descriptive information required to catalog them. You will also want to
determine the minimum time period to evaluate these counts. If you are digitizing rather complex items that
can take weeks or even months to convert, you might want to consider looking at the number of items per
month. If you are digitizing a collection of relatively simple items, such as typed manuscripts or photographs,
then you might want to look at items per day or hour.
8 In a program I worked on, we determined we could clean the scanners and replace cartridges ourselves at a cheaper
cost than renewing the initial maintenance agreement with the vendor. The downside to this was that if anything more serious than cleaning was required, a service call by the vendor would have needed to be arranged, and likely would have been expensive. But our determination was that the odds of something breaking were remote and that the scanners would be replaced before that occurred.
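The throughput measures discussed above (items per month, day, or hour) come down to a simple rate calculation, sketched here. The item count and dates are made up, the function name is hypothetical, and the calculation uses calendar days rather than working days, which is a simplifying assumption.

```python
from datetime import date

def items_per_period(items_completed, start, end, period_days):
    """Average items completed per period of `period_days` calendar days,
    counting both the start and end dates."""
    elapsed_days = (end - start).days + 1
    return items_completed / (elapsed_days / period_days)

# Hypothetical figures: 1,200 items completed during September 2010.
start, end = date(2010, 9, 1), date(2010, 9, 30)
per_day = items_per_period(1200, start, end, period_days=1)
per_30_days = items_per_period(1200, start, end, period_days=30)
print(per_day, "items per day;", per_30_days, "items per 30 days")
```

When items differ greatly in condition or complexity, it may be more informative to compute these rates separately for each type of material rather than as one overall figure.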
Measuring the success of your efforts can be done through a variety of metrics. The simplest method is to
count the number of web hits or downloads from the digital collection web site. You should expect to see big
increases at the outset as the collection is introduced. And you will hope to see smaller increases as
additional items are added. The overall contribution of the project can be measured by looking at the total
number of hits your organization’s web site receives after your digital collection goes live.
Similarly, you can look at the number of new patrons, members or subscribers to your organization. If there
is a significant increase after you have started digitizing, then this could be an indicator that your collection is
a success to the organization.
In any event, you want to try and find out what your users like about your digital collection and what they
would like to see. You also want to find out from new patrons or subscribers if they have accessed your
collection. Gathering feedback and doing user surveys is important in keeping your digitization fresh and
successful.
Summary
Digitizing materials is a complex process. There are many questions and issues to be addressed before even
turning on the scanner (or purchasing it, for that matter). Determining the goal and involving your users are
keys to success, as is properly assessing the requirements and resources available.
In determining the goals of your project, you do not want to forget to evaluate the needs of your current and
potential users. Insight into what users want, what they will do with the materials, and how they prefer to
access the materials are keys to creating a viable and successful online presence.
This paper included summaries of a lot of information about developing a successful digital project. If you
are new to digitization, there should be enough here to help you answer the important questions, as well as
prompt many more questions that will lead you to success. If your organization has already initiated a
project, or has been digitizing for some time, I think there are points here that can help you make your
projects and programs even more successful.
APPENDIX A: Some common terms used in digitization
AIF (Audio Interchange File Format): an audio file format standard used for storing sound data for
personal computers and other electronic audio devices. Co-developed by Apple Computer in 1988. The file
extension for the format is .aif (or .aiff). There is also a compressed variant of AIF known as AIFF-C or
AIFC.
Bit Depth: a computer graphics term describing the number of bits used to represent the color of a single
pixel in a bitmapped image or video frame buffer. Can be expressed per channel (8 bits, for example) or as
a total for all channels or more commonly in bits per pixel (bpp). Also known as color depth.
Bitonal: a mode of digital capture where one bit per pixel represents black and white. Bitonal imaging is
best suited for textual documents and books, with minimal to no colors or shading.
CMYK (Cyan-Magenta-Yellow-Black): a color model in which all colors are described as a mixture of
these four colors. It is the standard model used in offset printing for full-color documents.
Compression: the process used to compress digital signals to allow transmission within a much
smaller bandwidth. There are two methods of compression: lossy and lossless. A lossy method is one
that is designed to compress the file by selectively removing portions of the data, so that when
uncompressed the resulting file is different from, but ideally close enough to, the original that any difference cannot be
detected by the human eye or ear. Lossless data compression allows the exact original data to be
reconstructed from the compressed data. While lossless compression will result in an exact duplicate of the
original, the compressed file size will typically be larger than that created using a lossy algorithm.
Dots per inch (DPI): a measure of resolution used for printed text or images and monitor display.
Grayscale: a range of shades of gray in an image or the values represented between black and white.
JPEG (Joint Photographic Experts Group): a compression algorithm for condensing the size of image files.
This format allows for online access to full screen image files because they require less storage and are
therefore quicker to download into a web page.
JPEG 2000: a compression standard developed by the ISO JPEG committee to improve on the performance
of JPEG while adding significant new features and capabilities to enable new imaging applications.
Compared to TIFF files, JPEG 2000 compression can reduce a file size by an order of magnitude, or more.
MP3: an audio file format, based on MPEG technology. It creates very small files suitable for streaming or
downloading over the Internet.
MPEG (Moving Picture Experts Group): a set of standards for the compression of digital video and audio
data or a file of data compressed according to those standards.
OCR (Optical Character Recognition): the mechanical or electronic translation of text within a scanned
image into machine-encoded text. OCR makes it possible to edit the text, and search for words or phrases.
Typically, the text produced through the OCR process resides alongside (or “behind”) the scanned image
and can be retrieved and searched by the user.
PDF (Portable Document Format): one of the most widely used formats to preserve electronic documents
and ensure long-term survival. This is primarily due to the open nature of its creator, Adobe, in allowing
for the format to be applied to almost any common operating environment or application.
Pixels per inch (PPI): the number of pixels captured in a given inch and used when discussing scanning
resolution and on-screen display. Increased PPI will result in higher quality images, but also increase the
file size.
Resolution: determines the quality of an image. It is described either by pixel dimensions (height and
width) for on-screen use or physical size and PPI. There is no perfect resolution standard. Resolution
should be adjusted based on size, quality, condition, and uses of the digital object.
RGB: refers to Red, Green, Blue, the colors output from a typical computer monitor. In terms of
digitization, RGB typically refers to a mode for capturing a digital image where multiple bits per pixel
represent color.
TIFF (Tagged Image File Format): the most frequently used file format for master images. It is a flexible
and highly portable open standard format. TIFF files may or may not use lossless compression, but due to
the typically large file sizes they are not suitable for web delivery.
WAV (Waveform Audio Format): a PC/Windows audio file format developed by IBM and Microsoft for
storing sound data. The file extension for the format is .wav.
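The terms bit depth, pixels per inch, and resolution defined above combine to determine the uncompressed size of an image, which is why master files are so much larger than access copies. The following sketch works through the arithmetic; the page size and settings are examples only, and the calculation ignores file headers and any row padding a real file format may add.

```python
def uncompressed_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Approximate uncompressed image size in bytes.

    pixels = (width in inches * PPI) * (height in inches * PPI)
    bytes  = pixels * bits per pixel / 8
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8

# An 8.5 x 11 inch page under three example settings.
color_300 = uncompressed_bytes(8.5, 11, 300, 24)   # 24-bit RGB
color_600 = uncompressed_bytes(8.5, 11, 600, 24)   # 24-bit RGB
bitonal_300 = uncompressed_bytes(8.5, 11, 300, 1)  # 1 bit per pixel

for label, size in [("300 ppi, 24-bit", color_300),
                    ("600 ppi, 24-bit", color_600),
                    ("300 ppi, bitonal", bitonal_300)]:
    print(f"{label}: {size:,} bytes")
```

Doubling the PPI quadruples the pixel count, so storage needs grow much faster than the resolution figure alone suggests.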
APPENDIX B: Selection for Digitizing, A Decision-Making Matrix © Harvard University, May 1997
(http://www.clir.org/pubs/reports/hazen/matrix.html as of 7/28/2010)
APPENDIX C: Some Digital Collection Sites of note
Canadian War Museum: “Canada and the First World War” (http://www.warmuseum.ca/firstworldwar)
Digital Library of Georgia (http://dlg.galileo.usg.edu/?Welcome)
East Carolina University Joyner Library Digital Collections (http://digital.lib.ecu.edu/)
Historic Pittsburgh / University of Pittsburgh (http://digital.library.pitt.edu/pittsburgh/)
Kentuckiana Digital Library (http://kdl.kyvl.org/)
Mississippi Digital Library (http://www.msdiglib.org/)
State Library of North Carolina Digital Repository (http://digital.ncdcr.gov/cdm4/index.php)
University of Washington Digital Collections (http://content.lib.washington.edu/)
Utah Museum of Fine Arts Digital Collections (http://umfa.utah.edu/DigitalCollections)
World Digital Library (http://www.wdl.org/en/)
Windows on the Past, Cornell University (http://cdl.library.cornell.edu/)
APPENDIX D: Some additional sources of information
_____. “BCR’s CDP Digital Imaging Best Practices, Version 2.0”, BCR’s Colorado Digitization Project
Digital Imaging Best Practices Working Group, June 2008. (http://www.bcr.org/dps/cdp/best/digital-imaging-bp.pdf as of 7/28/10).
_____. “Digital Library of Georgia Digitization Guide, Version 2.0”, University of Georgia Libraries,
September 2004. (http://dlg.galileo.usg.edu/AboutDLG/DigitizationGuide.html?Welcome&Welcome as
of 7/30/2010)
_____. “Good Practice Guide for Developers of Cultural Heritage Web Services”, UKOLN, April 2006.
(http://www.ukoln.ac.uk/interop-focus/gpg/print-all/ as of 7/28/10).
_____. “Guidelines for Digitization Projects for collections and holdings in the public domain, particularly
those held by libraries and archives” International Federation of Libraries and Archives, March 2002.
(https://www.cu.edu/digitallibrary/cudldigitizationbp.pdf as of 7/28/10).
_____. “Proposed Digital Imaging Standards and Best Practices”, Indiana State Library, February 8, 2007.
(http://www.in.gov/library/files/dig_imgst.pdf as of 7/28/10)
_____. “Technical Standards for Digital Conversion of Text and Graphic Materials” United States Library
of Congress, December 2006 (http://memory.loc.gov/ammem/about/techStandards.pdf as of 7/28/10).
_____. “Typical Elements for Use in a Statement of Work for the Digital Conversion of Sound Recordings
and Related Documents” United States Library of Congress, March 2001
(http://www.loc.gov/rr/mopic/avprot/audioSOW.html as of 7/28/2010).
_____. “University of Colorado Digital Library Digitization Best Practices, version 1.0”, University of
Colorado Digital Library, University of Colorado, Boulder, CO, August 2009.
(https://www.cu.edu/digitallibrary/cudldigitizationbp.pdf as of 7/28/10).
Casey, Mike and Gordon, Bruce. Sound Directions: Best Practices for Audio Preservation, (Harvard
University and Indiana University, 2007)
(http://www.dlib.indiana.edu/projects/sounddirections/papersPresent/sd_bp_07.pdf as of 7/28/10).
Jones, Trevor. “An Introduction to Digital Projects for Libraries, Museums and Archives”, Illinois
Digitization Institute, May 2001. (http://images.library.uiuc.edu/resources/introduction.htm as of 7/30/10).
Smith, Abby. “Strategies for Building Digitized Collections”, Digital Library Federations and Council of
Library and Information Resources, September 2001.
(http://www.clir.org/pubs/reports/pub101/contents.html as of 7/28/2010).
Questions?
Robert Suriano
MLS, Archives and Records Administration
Tel.: 314-961-7434
Email: [email protected]