Digitization

Upload: lee-cafferata

Posted on 29-Jun-2015


Page 1: Digitization

Digitization

Page 2: Digitization

What is it?

• Digitization is the process of converting analog materials into a digital format that computer systems can understand and read

• In other words, materials become machine-readable

Page 3: Digitization

Are digitized materials the “real thing”?

• Not exactly: digitized materials are representations of physical items, not the items themselves.

Page 4: Digitization

What and how?

• Written records, photographs, oral history tapes, films, material culture: almost any analog document or artifact

• Digital data is only a sampling of the original, which is then encoded as the 1s and 0s (binary numbers) that a computer understands. In other words, information is translated into numerical values.
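To make "sampling" concrete, here is a minimal Python sketch. It assumes a made-up analog signal (a sine wave standing in for, say, a sound recording), measures it at a handful of evenly spaced moments, and rounds each measurement to one of 256 levels so that it fits in 8 bits. The function name, sample count, and bit depth are arbitrary choices for illustration.

```python
import math

def sample_and_encode(signal, duration, n_samples, bits):
    """Sample `signal` (a function of time) n_samples times over `duration`
    seconds, then quantize each reading to an unsigned integer of `bits` bits.
    Assumes the signal's values stay within [-1, 1]."""
    levels = 2 ** bits - 1                      # highest integer code
    codes = []
    for i in range(n_samples):
        t = i * duration / n_samples            # evenly spaced sample times
        value = signal(t)                       # the "analog" reading
        # Map [-1, 1] onto 0..levels and round to the nearest whole number.
        codes.append(round((value + 1) / 2 * levels))
    return codes

def wave(t):
    """A hypothetical 1 Hz sine wave standing in for an analog source."""
    return math.sin(2 * math.pi * t)

codes = sample_and_encode(wave, duration=1.0, n_samples=8, bits=8)
for code in codes:
    print(code, format(code, "08b"))            # the number and its 1s and 0s
```

Everything between the eight sample times, and every difference finer than one of the 256 levels, is simply not recorded; that is the sense in which digital data is only a sampling of the original.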

Page 5: Digitization

Cons?

• With digitization, some data (information) is lost:
  – CDs versus vinyl
  – Text formatting, e.g. page layout, spacing, handwritten information
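The loss is easy to measure in the audio case. The Python sketch below uses a hypothetical sine wave in place of a real recording, rounds the same samples at several bit depths (16 bits per sample is the audio CD standard), and reports the largest rounding error. More bits shrink the error, but it never reaches zero. The function name and sample count are illustrative assumptions.

```python
import math

def max_quantization_error(samples, bits):
    """Round each sample in [-1, 1] to the nearest of 2**bits evenly spaced
    levels and return the largest difference from the original value."""
    steps = 2 ** bits - 1
    worst = 0.0
    for x in samples:
        code = round((x + 1) / 2 * steps)       # integer that would be stored
        restored = code / steps * 2 - 1         # value played back from it
        worst = max(worst, abs(x - restored))
    return worst

# 1,000 readings of a hypothetical sine wave stand in for an analog recording.
samples = [math.sin(2 * math.pi * i / 1000) for i in range(1000)]

for bits in (4, 8, 16):                         # 16 bits is the CD standard
    print(bits, "bits: max error", max_quantization_error(samples, bits))
```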

Page 6: Digitization

Pros

• Greater accessibility
  – More materials
  – More efficient search mechanisms

• Larger sets of information

• Higher-end technology gives us a clearer view of some content
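Once text is machine-readable, searching it need not mean reading every page. One common technique, sketched here in Python under simplifying assumptions (a tiny made-up collection, words split on letters only, no stemming or ranking), is an inverted index: a table that maps each word to the pages containing it, so a lookup goes straight to the matching pages. The function names and sample pages are hypothetical.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page numbers it appears on.
    `pages` is a dict of page number -> machine-readable text."""
    index = defaultdict(set)
    for page_no, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(page_no)
    return index

def search(index, query):
    """Return the sorted page numbers that contain every word in `query`."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# A hypothetical three-page collection.
pages = {
    1: "Letter from the county clerk about the railroad survey.",
    2: "Photograph of the railroad depot, 1890.",
    3: "Minutes of the county board meeting.",
}
index = build_index(pages)
print(search(index, "railroad"))          # [1, 2]
print(search(index, "county railroad"))   # [1]
```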

Page 7: Digitization

What difference does it make?

• Digitization transforms the way we research, present, and even preserve the past

• It transforms access to materials

Page 8: Digitization

Page Image

• Scanned or photographed printed page or microfilm

• Disadvantages:
  – Not machine-readable, therefore not searchable: you have to go through the pages one by one
  – They can be huge: slow to load, cumbersome to navigate

• Advantages: may more closely represent the original material

https://www.flickr.com/photos/leeanncafferata/sets/72157626269585342/
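A quick calculation shows why page images get large. The Python sketch below computes the uncompressed size of one scanned page. The letter-size page, 300 dpi resolution, and 24-bit color are assumed values chosen only for illustration, and real image files are usually compressed, so actual sizes will vary.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Bytes needed to store one page image before any compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

# Assumed settings: 8.5 x 11 inch page, 300 dots per inch, 24-bit color
# (3 bytes per pixel).
page = uncompressed_page_bytes(8.5, 11, 300, 3)
print(f"one page: {page / 1e6:.1f} MB")             # one page: 25.2 MB
print(f"300-page book: {300 * page / 1e9:.1f} GB")  # 300-page book: 7.6 GB
```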

Page 9: Digitization

Markup

• The digital version of the traditional copy editor's marks

• For historical documents, TEI (Text Encoding Initiative) markup, written in XML (eXtensible Markup Language), is often used.

• Very simply, markup is a set of tags that describe the parts of a document (title, author, and so on). The result is machine-readable.

Page 10: Digitization

This TEI/XML from the Folger

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="fdt.xsl"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>As You Like It</title>
        <author>William Shakespeare</author>
        <editor xml:id="BAM">Barbara A. Mowat</editor>
        <editor xml:id="PW">Paul Werstine</editor>

http://www.folgerdigitaltexts.org/?chapter=0
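Because the tags are explicit, a program can pull specific parts out of the document. The Python sketch below uses the standard library's xml.etree.ElementTree to read the title, author, and editors from the Folger header excerpt above. The slide shows only the opening of the file, so the closing tags in this sample were added purely to make it well-formed, and the function name is hypothetical; nothing here reflects how the Folger itself processes its files.

```python
import xml.etree.ElementTree as ET

# The Folger header excerpt from the slide. The closing tags at the end are
# NOT in the slide; they are added here only so the fragment is well-formed.
TEI_SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="fdt.xsl"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>As You Like It</title>
        <author>William Shakespeare</author>
        <editor xml:id="BAM">Barbara A. Mowat</editor>
        <editor xml:id="PW">Paul Werstine</editor>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""

# TEI elements live in the TEI namespace; the xml:id attribute lives in the
# reserved XML namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def read_title_statement(xml_text: str) -> dict:
    """Return the title, author, and editors from a TEI titleStmt."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    stmt = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt", NS)
    return {
        "title": stmt.findtext("tei:title", namespaces=NS),
        "author": stmt.findtext("tei:author", namespaces=NS),
        "editors": {ed.get(XML_ID): ed.text for ed in stmt.findall("tei:editor", NS)},
    }

print(read_title_statement(TEI_SAMPLE))
```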

Page 11: Digitization

Creates this…

Page 12: Digitization

OCR (Optical Character Recognition)

• A system that combines an optical scanner, which captures an image of the text, with software that analyzes the scanned image to identify the characters

• The result: machine-readable, searchable, editable materials
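Modern OCR engines are far more sophisticated, but one of the oldest ideas behind them, template matching, fits in a few lines. This Python toy assumes the scanned image has already been cleaned up and cut into 5-by-3 grids of ink ("#") and blank (".") cells, one per character, and uses a made-up set of four letter templates. Each grid is labeled with the template it differs from in the fewest cells.

```python
# Hypothetical 5-by-3 templates for four letters; '#' is ink, '.' is blank.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def cells(glyph):
    """Flatten a glyph (list of row strings) into one string of cells."""
    return "".join(glyph)

def recognize(glyph):
    """Return the letter whose template differs from `glyph` in the fewest cells."""
    scanned = cells(glyph)

    def mismatches(letter):
        return sum(a != b for a, b in zip(scanned, cells(TEMPLATES[letter])))

    return min(TEMPLATES, key=mismatches)

def read_word(glyphs):
    """Recognize a sequence of already-segmented glyphs as text."""
    return "".join(recognize(g) for g in glyphs)

# A smudged "L": one stray ink cell in the middle row.
smudged_l = ["#..", "#..", "#.#", "#..", "###"]
print(recognize(smudged_l))                                          # L
print(read_word([TEMPLATES["L"], TEMPLATES["O"], TEMPLATES["T"]]))  # LOT
```

The recognizer always picks the closest template, even for a glyph that resembles none of them, which is one reason handwriting and unfamiliar fonts trip up simple OCR.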

Page 13: Digitization

Challenges

• Not very good at handwritten materials

• Ability to read different languages and fonts depends on the sophistication of the technology

• Less likely to represent the original, which is particularly important in the case of historic documents, annotated texts, and revised manuscripts

Page 14: Digitization

Some Examples

• Making of America: http://ebooks.library.cornell.edu/m/moa/

• Library of Congress: Chronicling America

• JSTOR: shows us the scanned page image, but searches the OCR text

• Google Books