so you want to start a digital library? a presentation by tom, hugh, and noel

Upload: elmer-mclaughlin

Post on 18-Dec-2015

Digital Libraries in focus

• UC Berkeley Digital Library Project
• The Perseus Project
• The Digital Scriptorium
• The William Blake Archive

Berkeley D-Lib Overview

• Very much a test bed: emphasis on developing technologies for the digital library, not so much focus on building a coherent, fully-functional library (so far…)

• technology-focused
• Contents

Perseus Project

• “The Perseus Project is an evolving digital library of resources for the study of the ancient world and beyond. Collaborators initially formed the project to construct a large, heterogeneous collection of materials, textual and visual, on the Archaic and Classical Greek world…Recent expansion into Latin texts and tools and Renaissance materials has served to add more coverage within Perseus and has prompted the project to explore new ways of presenting complex resources for electronic publication.”

• (inter)connection-focused
• Starting Points

The Digital Scriptorium

• The Digital Scriptorium is basically the extension into cyberspace of Duke’s Rare Book, Manuscript, and Special Collections Library.

• collection-focused
• Projects

The William Blake Archive

• “…the Blake Archive was conceived as an international public resource that would provide unified access to major works of visual and literary art that are highly disparate, widely dispersed, and more and more often severely restricted as a result of their value, rarity, and extreme fragility.”

• (single) book-focused
• Does one thing really well

• The texts

Components of a Digital Library

STORAGE · MANAGEMENT · DELIVERY

• formatting: must be standardized; need for multiple formats
• archiving: on-site, system-wide, both?
• metadata/history: useful/meaningful metadata can increase usability; digital library should maintain detailed records of object history
• collections: arbitrary groupings of data
• search capabilities: multiple ways of accessing the data (entry points)
• browsing: predefined structure to the data (cf. collections)
• user interaction: ability to re-view the data
• accessibility: users of differing physical and mental capabilities must have access to the library

Components of a Digital Library

SERVER: STORAGE · MANAGEMENT · DELIVERY
CLIENT: SEARCHING · BROWSING · USING
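The storage/management/delivery split on these two slides can be made concrete with a small in-memory model. The sketch below is a hypothetical Python illustration, not any project's design; `DigitalObject`, `Storage`, `Management` and `Delivery` are invented names. In this model, storage only holds objects. Management edits metadata and keeps the object-history records the components slide calls for. Delivery exposes searching and browsing to a client.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """One stored item: identifier, format of the master file, metadata, history log."""
    object_id: str
    master_format: str                       # e.g. "TIFF" for an archival master
    metadata: dict
    history: list = field(default_factory=list)


class Storage:
    """Server-side storage layer: holds objects keyed by identifier."""

    def __init__(self):
        self._objects = {}

    def put(self, obj):
        self._objects[obj.object_id] = obj

    def get(self, object_id):
        return self._objects[object_id]

    def all(self):
        return list(self._objects.values())


class Management:
    """Server-side management layer: logged metadata edits and collections."""

    def __init__(self, storage):
        self.storage = storage
        self.collections = {}                # collection name -> set of object ids

    def set_metadata(self, object_id, key, value):
        obj = self.storage.get(object_id)
        old = obj.metadata.get(key)
        obj.metadata[key] = value
        obj.history.append(f"{key}: {old!r} -> {value!r}")  # record every change

    def add_to_collection(self, name, object_id):
        self.collections.setdefault(name, set()).add(object_id)


class Delivery:
    """Delivery layer: the searching and browsing a client would call."""

    def __init__(self, management):
        self.management = management

    def search(self, key, value):
        return sorted(o.object_id for o in self.management.storage.all()
                      if o.metadata.get(key) == value)

    def browse(self, collection):
        return sorted(self.management.collections.get(collection, set()))


if __name__ == "__main__":
    store = Storage()
    store.put(DigitalObject("obj-1", "TIFF", {"title": "Plate 1"}))  # hypothetical object
    mgmt = Management(store)
    mgmt.set_metadata("obj-1", "creator", "William Blake")
    mgmt.add_to_collection("illuminated books", "obj-1")
    web = Delivery(mgmt)
    print(web.search("creator", "William Blake"))   # ['obj-1']
    print(web.browse("illuminated books"))           # ['obj-1']
    print(store.get("obj-1").history)               # ["creator: None -> 'William Blake'"]
```

The history append sits inside the management layer, so every edit is recorded regardless of which client made it. A real system would also persist both the objects and the log.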

UC Berkeley’s Digital Library Project

STORAGE · MANAGEMENT · DELIVERY
formatting · archiving · metadata/history · collections · search capabilities · browsing · user interaction · accessibility

• represents a test bed of info and archiving best practices
• metadata standards defined, some implemented
• addresses and implements multiple search techniques; results vary
• addresses and implements multiple searching techniques
• information overkill
• discrete, disconnected collections
• experimental tools in text, image, GIS, etc. (buggy)
• reliance on Java = not universally accessible

Informix Universal Server. Database backend.
DBI. Perl module for web CGI access to databases.
AMASS. Storage software from Emass/ADIC. "Transforms offline storage into direct access mass storage."
Cheshire II. Search engine; in-house search engine project.

The Perseus Project

STORAGE · MANAGEMENT · DELIVERY
formatting · archiving · metadata/history · collections · search capabilities · browsing · user interaction · accessibility

• standardized to the Web
• only basic formats available
• further file formats retained
• metadata embedded with texts, images
• multiple access points (via both texts and objects)
• offers numerous predefined collections
• easily navigable site
• not approved by Bobby (CAST’s web accessibility checker)

UNAVAILABLE TO THE PUBLIC.

Duke’s Digital Scriptorium

STORAGE · MANAGEMENT · DELIVERY
formatting · archiving · metadata/history · collections · search capabilities · browsing · user interaction · accessibility

• multiple Web-centric formats available
• masters not retained, only JPEG format used
• metadata via SGML/HTML
• basic metadata search capabilities (limited by SGML)
• offers useful predefined collections (“canned searches”)
• easily navigable site
• Bobby-approved
• includes history behind data
• discrete, disconnected collections

DynaWeb. From Inso. A tool that allows searches through structured SGML documents and translates from SGML to HTML on the fly.
SGML. Using the Encoded Archival Description DTD.
Webinator. From Thunderstone. Used to index the various static HTML pages in the Scriptorium. Also used to index the Duke Papyrus Collection.

The William Blake Archive

STORAGE · MANAGEMENT · DELIVERY
formatting · archiving · metadata/history · collections · search capabilities · browsing · user interaction · accessibility

• multiple, standard formats, most available from the site
• TIFF originals retained
• metadata retained on every region of every image
• text- and image-based searches (both based on metadata)
• the limited scope limits passive collection-browsing
• easily navigable site
• not approved by Bobby
• Works-in-Progress area allows for collaborative CM
• INote software allows for individual image markup

DynaWeb. SGML. Java applets (ImageSizer, INote).
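Bobby was the accessibility checker from CAST whose approval these comparisons cite. It tested pages against published accessibility guidelines, and one of the most basic checks is that every image carries a text alternative. The sketch below uses only Python's standard `html.parser` to flag `<img>` tags that have no `alt` attribute. It is a hypothetical illustration of that single check (`images_missing_alt` is an invented name), not a reimplementation of Bobby.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects the src of every <img> that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; a self-closing
        # <img .../> also arrives here via the default handle_startendtag.
        if tag == "img":
            attr_map = dict(attrs)
            if "alt" not in attr_map:
                self.missing.append(attr_map.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the src values of images lacking an alt attribute, in page order."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


if __name__ == "__main__":
    page = '<p><img src="plate1.jpg" alt="Plate 1"><img src="plate2.jpg"/></p>'
    print(images_missing_alt(page))  # ['plate2.jpg']
```

An empty `alt=""` passes this check on purpose, because it is the conventional marking for decorative images. Whether the text alternative is actually meaningful still needs a human reviewer.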

Finding Things in the Digital Library

Analog Library

Catalog / keyword search

Browsing

Special collections (varied / unique finding aids)

~ ~ ~ ~

Digital Library

Metadata-based searches

Virtual collections (varied finding aids)

Content-based (exploitive) searches
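The difference between metadata-based and content-based finding shows up even with two records. In this hypothetical Python sketch, the ids and wording come from the EAD example later in this presentation, but the metadata fields and function names are invented. A metadata search can only match what a cataloguer recorded. A content search also finds Pockerley, which no metadata field mentions.

```python
# Hypothetical records: ids and text adapted from the EAD example; fields invented.
RECORDS = [
    {"id": "SHE-156",
     "metadata": {"subject": "collieries and coalmines", "date": "1665"},
     "text": "Lease of his half part of the messuage in Pockerley with its collieries"},
    {"id": "SHE-156a",
     "metadata": {"date": "1665"},
     "text": "Refers to a book of surveys called The Book of Pockerley"},
]


def metadata_search(records, field, value):
    """Catalog-style finding: match only what a cataloguer recorded."""
    return [r["id"] for r in records if r["metadata"].get(field) == value]


def content_search(records, term):
    """Content-based finding: match the object's own words, case-insensitively."""
    term = term.lower()
    return [r["id"] for r in records if term in r["text"].lower()]


if __name__ == "__main__":
    print(metadata_search(RECORDS, "subject", "collieries and coalmines"))  # ['SHE-156']
    print(metadata_search(RECORDS, "place", "Pockerley"))  # [] (no place field recorded)
    print(content_search(RECORDS, "pockerley"))            # ['SHE-156', 'SHE-156a']
```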

Finding Examples

• Using metadata (Blake Images)
• Browsing (Perseus Texts)
• By collection (Digital Scriptorium)
• Using content
  – Berkeley Cheshire II (Documents)
  – Berkeley Cheshire II TileBars (Documents)
  – Other media types (images, video, audio)

• Helping the user distinguish (or not)
  – Berkeley (what am I really searching against?)
  – Perseus search tools (metadata-based with pointers to content-based options)
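A TileBars display shows, for each query term, which stretches of a long document mention it. This lets a user tell a passing mention from a sustained discussion before opening the text. The sketch below is a simplified Python version built on stated assumptions. It cuts the document into fixed-size word segments (a real implementation might segment by topic instead) and marks presence with `#`. The names `tilebar` and `render` are hypothetical and are not Cheshire II's interface.

```python
import re


def tilebar(text, terms, segment_words=50):
    """Count each query term in consecutive fixed-size word segments of a document."""
    words = re.findall(r"[a-z']+", text.lower())
    segments = [words[i:i + segment_words] for i in range(0, len(words), segment_words)]
    return {t: [seg.count(t.lower()) for seg in segments] for t in terms}


def render(bars):
    """One text row per term: '#' where the term occurs in a segment, '.' where not."""
    width = max(len(t) for t in bars)
    return "\n".join(
        f"{term:<{width}} |" + "".join("#" if c else "." for c in counts) + "|"
        for term, counts in bars.items()
    )


if __name__ == "__main__":
    doc = "coal mine coal pit lease term rent lease"
    print(render(tilebar(doc, ["coal", "lease"], segment_words=4)))
    # coal  |#.|
    # lease |.#|
```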

Texts in Berkeley D-Lib

• Multivalent documents
  – “Multivalent documents (MVD) represent an open, extensible, network-centric document model.”

• Enable high functionality for scanned page images. E.g., in a scanned page image “enlivened” by MVD, you can select and paste text, highlight matching search terms, and perform a variety of other manipulations, such as sorting a table in a scanned image.

• Support distributed annotations. With MVD, annotations of many sorts can be made by any user on any supported document type.

• Generate alternative views of components of documents. For example, MVD lenses allow a different view of a region of a screen. A magnification lens will magnify a region; an “OCR lens” will show what an OCR process produces for that region.

• Alternative selection. Instead of just selecting text, you can choose to have the selection modified in particular ways.
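Distributed annotation of the kind MVD supports is commonly built on standoff markup. The notes live outside the document, anchored by position, and are merged only when a view is generated. This way many users can annotate a document that none of them can edit. The Python sketch below illustrates that idea only. `Annotation` and `highlighted_view` are hypothetical names, not the MVD API, and overlapping spans are simply skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A note kept outside the document, anchored by character offsets."""
    author: str
    start: int   # offset of the first annotated character
    end: int     # offset just past the last annotated character
    note: str


def highlighted_view(text, annotations, open_mark="[", close_mark="]"):
    """Build a view with annotated spans bracketed; the source text is never modified."""
    out, pos = [], 0
    for a in sorted(annotations, key=lambda a: a.start):
        if a.start < pos:
            continue  # this simple sketch skips spans that overlap an earlier one
        out.append(text[pos:a.start])
        out.append(open_mark + text[a.start:a.end] + close_mark)
        pos = a.end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    doc = "Lease by (1) to (2) of his half part of the messuage"
    i = doc.index("messuage")
    notes = [Annotation("noel", i, i + len("messuage"), "term to gloss"),
             Annotation("hugh", 0, 5, "document type")]
    print(highlighted_view(doc, notes))
    # [Lease] by (1) to (2) of his half part of the [messuage]
```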

The Digital Scriptorium

• Metadata:
  – EAD, which has 145 tags.
  – EAD is designed to describe hierarchical collections. An EAD file contains components (<c></c>), which can contain other components nested within them (<c01><c02></c02></c01>).
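Because components nest, software that reads an EAD finding aid usually walks the tree recursively. This Python sketch uses the standard `xml.etree.ElementTree` on a simplified XML fragment modelled on the example that follows, with the notes omitted. `walk_components` is a hypothetical helper that reports each component's depth, `level` attribute and `unitid`. It treats both unnumbered `<c>` and numbered `<c01>` through `<c12>` as components.

```python
import re
import xml.etree.ElementTree as ET

# EAD components are either unnumbered <c> or numbered <c01> ... <c12>.
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])?")


def walk_components(elem, depth=0):
    """Yield (depth, level, unitid) for each component, parents before their children."""
    if COMPONENT.fullmatch(elem.tag):
        unitid = elem.findtext("did/unitid", default="").strip()
        yield depth, elem.get("level", ""), unitid
        depth += 1
    for child in elem:
        yield from walk_components(child, depth)


# Simplified fragment modelled on the example in this presentation (notes omitted).
SAMPLE = """<c03 level="item"><did><unitid id="SHE-156">156.</unitid></did>
  <c04 level="item"><did><unitid id="SHE-156a">156. (a)</unitid></did></c04>
</c03>"""

if __name__ == "__main__":
    for depth, level, unitid in walk_components(ET.fromstring(SAMPLE)):
        print("  " * depth + f"{unitid} ({level})")
    # 156. (item)
    #   156. (a) (item)
```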

An Example of EAD

<c03 level="item">
  <did>
    <unitid id="SHE-156">156.</unitid>
    <unitdate normal="16650404">4 April 17 Chas. II [1665]</unitdate>
    <note>
      <p>
        <list>
          <item>(1) <persname authfilenumber="957702">George Shepperd</persname> of the <geogname authfilenumber="NT0526">town and county of Newcastle upon Tine</geogname>, gent.</item>
          <item>(2) <persname authfilenumber="23549">Anne Carr (n&eacute;e Franks)</persname> of <geogname authfilenumber="SS0032">South Sheiles</geogname> in the county of Durham, widow.</item>
        </list>
        Lease by (1) to (2) of his half part of the messuage in <geogname authfilenumber="PO0016">Pockerley</geogname> in the county of Durham with its <subject authfilenumber="c56">collieries and coalmines</subject>, and a fulling mill.
        <lb>Term: 1 month from <date normal="16650331">31 March 1665</date>.
        <lb>Consideration: &pound;10.
        <lb>Signed: (1 ). Seal: red wax, papered, on parchment tag.
      </p>
    </note>
    <physdesc><extent>Parchment. 1m.</extent></physdesc>
    <unitloc loctype="container">114/5-1</unitloc>
  </did>
  <c04 level="item">
    <did>
      <unitid id="SHE-156a">156. (a)</unitid>
      <unitdate normal="16650414">14 April 17 Chas. II [1665]</unitdate>
      <note>
        <p>
          Attached to 156:
          <lb>Minutes of consultation with <persname authfilenumber="68239">cousin Nan</persname> about above agreement.
          <lb>Refers to a book of surveys called <title render="italic">The Book of Pockerley</title> created in <date normal="162203xx">March 1622</date>.
          <lb>See <ref target="SHE-2056">no. 2056 below</ref> for letter containing description of this meeting.
        </p>
      </note>
      <physdesc><extent>Paper. 1f.</extent></physdesc>
      <unitloc loctype="container">114/5-2</unitloc>
    </did>
  </c04>
</c03>

Rendered into plain text:

156. 4 April 17 Chas. II [1665]
(1) George Shepperd of the town and county of Newcastle upon Tine, gent.
(2) Anne Carr (née Franks) of South Sheiles in the county of Durham, widow.
Lease by (1) to (2) of his half part of the messuage in Pockerley in the county of Durham with its collieries and coalmines, and a fulling mill.
Term: 1 month from 31 March 1665.
Consideration: £10.
Signed: (1 ). Seal: red wax, papered, on parchment tag.
Parchment. 1m. [114/5-1]

156. (a) 14 April 17 Chas. II [1665]
Attached to 156:
Minutes of consultation with "cousin Nan" about above agreement.
Refers to a book of surveys called The Book of Pockerley created in March 1622.
See no. 2056 below for letter containing description of this meeting.
Paper. 1f. [114/5-2]
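Going from the markup to the plain-text rendering above amounts to keeping the element text and dropping the tags. The Python sketch below flattens a `<did>` in that way using `xml.etree.ElementTree`. `render_did` is a hypothetical name. The input is a simplified, XML-ized excerpt of the example, with the SGML entities and the `<note>` omitted.

```python
import xml.etree.ElementTree as ET


def render_did(did):
    """Flatten a <did> to one line of plain text; tags and attributes are dropped."""
    parts = []
    for child in did:
        text = " ".join("".join(child.itertext()).split())  # collapse whitespace
        if not text:
            continue
        if child.tag == "unitloc":
            text = f"[{text}]"  # container location shown in brackets, as in the rendering
        parts.append(text)
    return " ".join(parts)


# Simplified, XML-ized excerpt of the <did> in the example (the <note> is omitted).
DID = """<did><unitid id="SHE-156">156.</unitid>
<unitdate normal="16650404">4 April 17 Chas. II [1665]</unitdate>
<physdesc><extent>Parchment. 1m.</extent></physdesc>
<unitloc loctype="container">114/5-1</unitloc></did>"""

if __name__ == "__main__":
    print(render_did(ET.fromstring(DID)))
    # 156. 4 April 17 Chas. II [1665] Parchment. 1m. [114/5-1]
```

Notice what the rendering throws away. The `normal` dates and every `authfilenumber` are attributes, so the plain text can no longer support the normalized-date or authority-controlled searches that the markup makes possible.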

Texts in the Blake Archive

• Also, essentially, multivalent documents.
  – Though with much stricter bounds than the Berkeley MVDs.

• They, too, use SGML markup to describe their archive.

Texts in the Blake Archive

<component type="figure" location="D">
  <characteristic>shepherd</characteristic>
  <characteristic>male</characteristic>
  <characteristic>young</characteristic>
  <characteristic>short hair</characteristic>
  <characteristic>tights</characteristic>
  <characteristic>standing</characteristic>
  <characteristic>contrapposto</characteristic>
  <characteristic>looking</characteristic>
  <illusobjdesc>
    A young, short-haired male shepherd in tights stands in contrapposto, watching his grazing flock of sheep--perhaps looking at the sheep that lifts its head toward him. He holds a crook in his left hand; his purse is visible near his right knee.
  </illusobjdesc>
</component>
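Descriptions like this are what make metadata-based image searching possible, because each `<characteristic>` can be indexed and combined with others. The Python sketch below builds such an index with `xml.etree.ElementTree`. `index_characteristics` and `find` are hypothetical names, and the `<plate>` wrapper is invented so the fragment parses. Component D is abridged from the slide. Component E is a made-up contrast case.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_characteristics(xml_text):
    """Map each <characteristic> term to the locations of the components carrying it."""
    index = defaultdict(set)
    for comp in ET.fromstring(xml_text).iter("component"):
        for ch in comp.findall("characteristic"):
            index[(ch.text or "").strip().lower()].add(comp.get("location"))
    return index


def find(index, *terms):
    """Locations whose component carries every requested term (boolean AND)."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*hits) if hits else set()


# Component D abridged from the slide; component E is hypothetical, for contrast.
PLATE = """<plate>
  <component type="figure" location="D">
    <characteristic>shepherd</characteristic><characteristic>male</characteristic>
    <characteristic>young</characteristic><characteristic>standing</characteristic>
  </component>
  <component type="figure" location="E">
    <characteristic>sheep</characteristic><characteristic>standing</characteristic>
  </component>
</plate>"""

if __name__ == "__main__":
    idx = index_characteristics(PLATE)
    print(sorted(find(idx, "standing")))           # ['D', 'E']
    print(sorted(find(idx, "young", "standing")))  # ['D']
    print(sorted(find(idx, "old")))                # []
```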

Possibilities for texts

• Full markup = very powerful finding/linking capabilities
  – Text Encoding Initiative (~400 tags!)
  – Perseus is an example of how fully marked-up texts can be used.

What Else?

• Georeferences
• Contextual finding/browsing
• Intelligent full-text searching
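The slide leaves “intelligent full-text searching” open. One long-established step beyond plain keyword matching is to rank results by tf-idf. This weights a term by how often it occurs in a document and by how rare it is across the collection. The Python sketch below illustrates that weighting only and is not something these projects are reported to use. `rank` and `tokenize` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def rank(docs, query):
    """Return ids of documents matching the query, best first, by summed tf-idf."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(counts)
    terms = set(tokenize(query))
    df = {t: sum(1 for c in counts.values() if t in c) for t in terms}  # document frequency
    scores = {}
    for doc_id, c in counts.items():
        # tf * log(N / df): terms frequent here but rare elsewhere count most
        score = sum(c[t] * math.log(n / df[t]) for t in terms if c[t])
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    docs = {"a": "coal lease coal", "b": "lease of land", "c": "book of surveys"}
    print(rank(docs, "coal lease"))  # ['a', 'b']
```

A term that occurs in every document gets a weight of zero, so it cannot lift any result. This is the intended effect of the idf factor.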

Imagery

ISSUES
• storage
• management
• delivery (searching, browsing, interaction)

Imagery: Best-Practices Example

storage of multiple resolutions and TIFF originals

TIFF v. JPEG

The Blake Archive
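Keeping a TIFF master and serving smaller JPEG derivatives means deciding on the derivative sizes. The hypothetical helper below only does the arithmetic. It scales each derivative so the longer edge fits a target, and it never upscales. The target sizes are assumptions, and an imaging library such as Pillow would do the actual resampling and JPEG encoding.

```python
def derivative_sizes(width, height, max_edges=(1600, 800, 200)):
    """Pixel sizes for derivatives whose longer edge fits each target; never upscales."""
    longest = max(width, height)
    sizes = []
    for edge in max_edges:
        scale = min(1.0, edge / longest)  # keep aspect ratio, cap at the master's size
        sizes.append((max(1, round(width * scale)), max(1, round(height * scale))))
    return sizes


if __name__ == "__main__":
    # A hypothetical 4000 x 3000 TIFF master
    print(derivative_sizes(4000, 3000))  # [(1600, 1200), (800, 600), (200, 150)]
```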

Imagery: Best-Practices Example

metadata applied to images regionally

The Blake Archive

Imagery: Best-Practices Example

The Blake Archive

As a result, searching is improved.

It further allows for interactive programs like INote, a regional metadata assignment program used (thus far) by contributors to enhance this metadata store.
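Regional metadata supports two directions of use. A user can point at part of an image to see its description, or search for a term to find which part of which image carries it. The Python sketch below models regions as rectangles with attached terms. `Region`, `regions_at` and `regions_with` are hypothetical names and the coordinates are invented. Nothing here reflects how INote or the Blake Archive actually store regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangle on an image with the descriptive terms attached to it."""
    label: str
    x0: int
    y0: int
    x1: int          # exclusive
    y1: int          # exclusive
    terms: frozenset

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def regions_at(regions, x, y):
    """Pointing: which regions, and so which metadata, lie under pixel (x, y)."""
    return [r.label for r in regions if r.contains(x, y)]


def regions_with(regions, term):
    """Searching: which regions of the image carry a given term."""
    return [r.label for r in regions if term in r.terms]


if __name__ == "__main__":
    # Hypothetical coordinates for two regions of one plate
    plate = [Region("D", 0, 0, 120, 300, frozenset({"shepherd", "standing"})),
             Region("E", 120, 150, 400, 300, frozenset({"sheep", "grazing"}))]
    print(regions_at(plate, 60, 200))    # ['D']
    print(regions_with(plate, "sheep"))  # ['E']
```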

Imagery: Further Issues?

The Perseus Project

While Perseus does archive larger image versions, the images that are accessible on the Web are useful only as peripheral learning aids, not as learning tools in themselves.

Perseus has strong searching tools for text and has applied this paradigm to its imagery. This creates very powerful and useful metadata binding to the image object. But can we do more?

Unacceptable for research.

Imagery: Interesting delivery?

Image searching via pattern recognition.

Berkeley’s Blobworld: http://elib.cs.berkeley.edu/photos/blobworld/
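Blobworld segments each image into regions (“blobs”) and matches queries against those regions. The Python sketch below uses a much simpler stand-in that still retrieves images by their content rather than by metadata. It compares whole-image color histograms using histogram intersection. This is a swapped-in technique, not Blobworld’s method. Pixels are plain `(r, g, b)` tuples and all names are hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Normalized RGB histogram with `bins` buckets per channel."""
    step = 256 // bins
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        ri, gi, bi = (min(c // step, bins - 1) for c in (r, g, b))
        hist[(ri * bins + gi) * bins + bi] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """1.0 for identical color distributions, 0.0 for completely disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_color(query_pixels, library):
    """Order library images by color similarity to the query image, best first."""
    q = color_histogram(query_pixels)
    scores = {name: histogram_intersection(q, color_histogram(px))
              for name, px in library.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))


if __name__ == "__main__":
    red = [(255, 0, 0)] * 10
    blue = [(0, 0, 255)] * 10
    mixed = [(250, 10, 10)] * 5 + [(0, 0, 250)] * 5
    print(rank_by_color(red, {"sea": blue, "sunset": mixed}))  # ['sunset', 'sea']
```

A global histogram ignores where colors sit in the picture. That is exactly the limitation region-based approaches like Blobworld were designed to address.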

Geographic data in the digital library

Tools for using geodata

• Perseus Atlas

• Berkeley GIS viewer

Tools for searching with geodata or relating it to other objects

Bueller?

Searching / relating geodata

• Interactive map: select feature(s) by browsing or query, get access to “related” objects in the collection

• Pick a non-geodata object, use GIS & full-text searches in the background to “look up” potentially related objects (geodata and/or not)

• Plot features found in non-geodata source on an interactive map
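The three patterns above can be sketched with a point gazetteer and a text catalog. In this hypothetical Python sketch, `features_in_box` answers a map query. `related_objects` looks up collection items that mention the selected features. `places_to_plot` goes the other way, from a text to coordinates that can be plotted. Matching place names by substring is a deliberate simplification; a fuller system would more likely rely on place names tagged in the markup. The catalog entries are invented and the coordinates are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    lon: float
    lat: float


def features_in_box(features, min_lon, min_lat, max_lon, max_lat):
    """Map query: features whose point falls inside a lon/lat bounding box."""
    return [f for f in features
            if min_lon <= f.lon <= max_lon and min_lat <= f.lat <= max_lat]


def related_objects(selected, catalog):
    """Map to collection: catalog objects whose text mentions a selected feature."""
    names = [f.name.lower() for f in selected]
    return sorted(obj_id for obj_id, text in catalog.items()
                  if any(n in text.lower() for n in names))


def places_to_plot(text, gazetteer):
    """Collection to map: coordinates of gazetteer places named in a text."""
    lowered = text.lower()
    return [(f.name, f.lon, f.lat) for f in gazetteer if f.name.lower() in lowered]


if __name__ == "__main__":
    gazetteer = [Feature("Athens", 23.73, 37.98), Feature("Sparta", 22.43, 37.07)]
    catalog = {"text-1": "an embassy sent from Athens", "text-2": "the kings of Sparta"}
    attica = features_in_box(gazetteer, 23.0, 37.5, 24.5, 38.5)
    print(related_objects(attica, catalog))              # ['text-1']
    print(places_to_plot(catalog["text-2"], gazetteer))  # [('Sparta', 22.43, 37.07)]
```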

Lessons learned

• What kind of digital library (libraries) do we want?
  – Repository & access for multiple more-or-less discrete collections?
  – Cutting-edge test bed for cool DL technologies?
  – “Working library” to support a set of defined needs (research, teaching, outreach)?
  – A set of tools, resources & expertise to allow units and divisions to assemble one or more of the above?
  – Hybrid?

Lessons learned

• What kind of digital library (libraries) do we want?

• A clearly defined mission, capabilities, features, and institutional home/support are keys to successful implementation (Blake)

• The storage/management/delivery model will underpin whatever choices we make
  – How does each candidate vendor “solution” map onto this model?
  – How much customization / interconnectedness / extensibility is possible?

• Mission before selection? Compromise on features is inevitable but fraught with risk