digitization

Post on 29-Jun-2015

Digitization

What is it?

• Digitization is the process of converting analog materials into a digital format that computer systems can understand and read

• In other words, materials become machine-readable

Are digitized materials the “real thing?”

• Digitized materials are representations of the physical originals, not the originals themselves.

What and how?

• What: written records, photographs, oral history tapes, films, material culture; in short, most analog documents and artifacts

• How: digital data is only a sample of the original data, encoded as the 1s and 0s that a computer understands. Information is translated into numerical values.
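
To make "sampling" and "encoding into 1s and 0s" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the "analog" source is a made-up sine wave, and the sample rate, number of levels, and function names are hypothetical choices, not part of any real digitization workflow.

```python
import math


def sample_and_quantize(signal, duration, sample_rate, levels):
    """Measure a continuous signal (values assumed in [-1, 1]) at regular
    intervals and round each measurement to one of `levels` integer codes."""
    codes = []
    for i in range(int(duration * sample_rate)):
        t = i / sample_rate                      # sampling: only some instants are kept
        value = signal(t)
        codes.append(round((value + 1) / 2 * (levels - 1)))  # quantizing: rounding
    return codes


def to_bits(codes, bits_per_sample):
    """Encode each integer code as a fixed-width string of 1s and 0s."""
    return "".join(format(code, f"0{bits_per_sample}b") for code in codes)


def decode(codes, levels):
    """Map integer codes back to approximate signal values in [-1, 1]."""
    return [code / (levels - 1) * 2 - 1 for code in codes]


def wave(t):
    """Hypothetical analog source: a 1 Hz sine wave."""
    return math.sin(2 * math.pi * t)


codes = sample_and_quantize(wave, duration=1.0, sample_rate=8, levels=16)
bits = to_bits(codes, bits_per_sample=4)
restored = decode(codes, levels=16)

print(bits)      # the digital version: nothing but 1s and 0s
print(restored)  # close to the original values, but not identical
```

Everything between the sampled instants is discarded, and each kept value is rounded, so the decoded values only approximate the source. Raising the sample rate or the number of levels narrows the gap but does not close it.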

Cons?

• With digitization, data (information) is lost.

– CDs versus vinyl

– Text formatting, e.g. page layout, spacing, handwritten information

Pros?

• Greater accessibility

– More materials

– More efficient search mechanisms

• Larger sets of information

• Higher-end technology gives us a clearer view of some content

What difference does it make?

• Digitization transforms the way we research, present, and even preserve the past

• It transforms access to materials

Page Image

• Scanned or photographed printed page or microfilm

• Disadvantages:

– Not machine-readable, therefore not searchable. You have to go through the pages one by one

– They can be huge: slow to load, cumbersome to navigate

• Advantages: may more closely represent the original material

https://www.flickr.com/photos/leeanncafferata/sets/72157626269585342/

Markup

• The digital version of the traditional copy editor

• In historical documents, TEI (Text Encoding Initiative) and XML (eXtensible Markup Language) markup is often used.

• Very simply, that’s a set of tags that describe the parts of a document. It is machine-readable.
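
As a rough illustration of what "machine-readable" means for markup, the Python sketch below reads a small TEI-style header with the standard library's `xml.etree.ElementTree`. The fragment is a shortened sample whose closing tags were supplied here so that it parses, and `read_title_statement` is a hypothetical helper name, not part of TEI or of any archive's tooling.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Shortened sample header; closing tags added for this sketch.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>As You Like It</title>
        <author>William Shakespeare</author>
        <editor xml:id="BAM">Barbara A. Mowat</editor>
        <editor xml:id="PW">Paul Werstine</editor>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""


def read_title_statement(xml_text):
    """Return the title, author, and editors named in the title statement."""
    root = ET.fromstring(xml_text)
    stmt = root.find(".//tei:titleStmt", TEI_NS)
    return {
        "title": stmt.findtext("tei:title", namespaces=TEI_NS),
        "author": stmt.findtext("tei:author", namespaces=TEI_NS),
        "editors": [e.text for e in stmt.findall("tei:editor", TEI_NS)],
    }


info = read_title_statement(SAMPLE)
print(info["title"])
print(info["editors"])
```

Because the tags say which piece of text is the title and which are the editors, a program can pick them out without guessing from layout or position on the page.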

This TEI/XML from the Folger

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="fdt.xsl"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
<titleStmt>
<title>As You Like It</title>
<author>William Shakespeare</author>
<editor xml:id="BAM">Barbara A. Mowat</editor>
<editor xml:id="PW">Paul Werstine</editor>

http://www.folgerdigitaltexts.org/?chapter=0

Creates this…

OCR (Optical Character Recognition)

• A system including an optical scanner that reads text and software that analyzes the scanned image

• The result: machine-readable, searchable, editable materials
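
One way to picture what "searchable" buys us: once OCR has produced text for each page, a program can build a word index and jump straight to the pages that mention a term. The Python sketch below assumes the OCR step has already happened. The page texts are invented for illustration, no real OCR engine is called, and `build_index` and `search` are hypothetical helper names.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of page numbers it appears on."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_number)
    return index


def search(index, word):
    """Return the sorted page numbers that contain `word` (case-insensitive)."""
    return sorted(index.get(word.lower(), set()))


# Invented stand-in for OCR output, one string per scanned page.
pages = {
    1: "Digitization transforms access to materials.",
    2: "Scanned pages are images until OCR extracts the text.",
    3: "Searchable text lets readers find materials quickly.",
}

index = build_index(pages)
print(search(index, "materials"))  # prints [1, 3]
```

With page images alone, answering the same question means reading every page one by one.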

Challenges

• Not very good at handwritten materials

• Ability to read different languages and fonts depends on sophistication of technology

• Less likely to represent the original, which is particularly important in cases of historic documents, annotated texts, revised manuscripts

Some Examples

• Making of America: http://ebooks.library.cornell.edu/m/moa/

• Library of Congress: Chronicling America

• JSTOR: shows us the scanned image, searches OCR

• Google Books
