
William Y. Arms

Corporation for National Research Initiatives

Department of Computer Science, Cornell University

March 22, 1999

Object models, overlay journals, and virtual collections

Physical and Logical Views of Information

Physical view:

Data structures, files, directories, servers

Publishers, libraries, web sites

Logical view:

Works, expressions, manifestations, items

Object models (document models)

Overlay journals

Virtual collections

What is Content?

Works, expressions, manifestations, items

Work

The underlying intellectual or artistic abstraction, independent of any particular form.

Examples

• Homer's The Iliad.

• Beethoven's Fifth Symphony.

• The Unix operating system.

Expression

A work is realized through an expression.

Examples

• The Iliad was first expressed orally, then it was written down as a fixed sequence of words.

• Beethoven's Fifth Symphony can be expressed as a printed score or by any one of many performances.

• The Unix operating system has separate expressions as source code and machine code.

Works and Expressions

Many works are realized through a single expression.

Examples

• The poem, The Road Not Taken by Robert Frost.

• A picture. [Image not reproduced.]

In such examples, there is no practical distinction between expression and work.

Manifestations

An expression is given form in one or more manifestations.

Examples

• The text of The Iliad has been manifest in numerous manuscripts and printed books.

• A musical performance can be distributed on CD, or broadcast on television.

• Software is manifest as files, which may be stored or transmitted in any digital medium.

Items

When many copies are made of a manifestation, each is a separate item.

Examples

• A specific copy of a book.

• A copy of a computer file.
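The four levels form a simple containment hierarchy. The Python sketch below is one illustrative way to model it, assuming each level just holds a list of the level beneath it. The class names, fields, and the item locations in the example are hypothetical, not part of any standard schema.

```python
from dataclasses import dataclass, field

# Illustrative model of work -> expression -> manifestation -> item.
# Names and fields are assumptions made for this sketch.

@dataclass
class Item:
    location: str  # where this particular copy is held (hypothetical)

@dataclass
class Manifestation:
    form: str  # e.g. "printed book", "CD", "file"
    items: list[Item] = field(default_factory=list)

@dataclass
class Expression:
    description: str  # e.g. "written text", "source code"
    manifestations: list[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str
    expressions: list[Expression] = field(default_factory=list)

    def all_items(self) -> list[Item]:
        """Walk down the hierarchy and collect every individual copy."""
        return [
            item
            for expr in self.expressions
            for man in expr.manifestations
            for item in man.items
        ]

# The Unix example from the slides: one work, two expressions.
unix = Work("The Unix operating system", [
    Expression("source code", [
        Manifestation("file", [Item("tape archive"), Item("download mirror")]),
    ]),
    Expression("machine code", [
        Manifestation("file", [Item("installed system")]),
    ]),
])

print(len(unix.all_items()))  # 3
```

Because items hang off manifestations, and manifestations off expressions, a question such as "how many copies of this work exist?" becomes a walk down the tree.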

Object Models

Beyond Simple Documents

Many digital objects are more than static files of data.

Dynamic objects: What is presented to the user depends upon the execution of computer programs or other external activities.

Complex objects: Objects are made up from many inter-related elements.

Alternate disseminations: Digital objects may offer the user a choice of access methods.

Databases: A database comprises many alternative records, with different records selected each time the database is accessed.
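To make the contrast with a static file concrete, here is a minimal Python sketch, assuming a dynamic object is anything whose presentation is computed at access time. `StaticObject`, `DatabaseObject`, the key/value query format, and the sample records are all hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List

# Hypothetical classes contrasting a static file with a database-style object.

class StaticObject:
    """Every access presents the same stored data."""
    def __init__(self, data: bytes):
        self.data = data

    def present(self, query: Dict[str, Any]) -> bytes:
        return self.data  # the query is ignored

class DatabaseObject:
    """Each access runs a selection, so different requests see different records."""
    def __init__(self, records: List[Dict[str, Any]]):
        self.records = records

    def present(self, query: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Keep only records whose fields match every key/value in the query.
        return [r for r in self.records
                if all(r.get(k) == v for k, v in query.items())]

db = DatabaseObject([
    {"ship": "swan", "year": 1956},
    {"ship": "heron", "year": 1960},
])
print(db.present({"year": 1960}))  # [{'ship': 'heron', 'year': 1960}]
```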

Object Models and Structural Types

Web object

Digitized materials

• Digitized image

• Set of digitized page images

• Marked-up text with page images

• Digitized audio recording

Sets

• Set of digital objects

• Searchable set of digital objects

Web Object: File with URL & Data Type

Identifier: http://www.dlib.org/boats/swan56

Data: a single file

Metadata: data type (jpg)

Object Model: Digitized Image

Data

Several manifestations:

• thumbnail image

• reference image

• archival image

Metadata

Each manifestation may have its own metadata.

Object Model: Digitized Image

Identifier: hdl:loc.ndlp/amrlp.1234567

Data: archive (jpg), reference (jpg), thumbnail (gif)

Metadata: object metadata
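The object model above can be sketched as one record carrying an identifier, several manifestations keyed by role, and object-level metadata. This Python version is illustrative only: the field names, the metadata values, and the fallback order in the hypothetical `disseminate` method are assumptions, not part of any system described in the slides.

```python
from dataclasses import dataclass, field

# Illustrative digitized-image object: one identifier, several manifestations,
# each with its own metadata, plus metadata for the object as a whole.

@dataclass
class ManifestationRecord:
    data_type: str                      # e.g. "jpg", "gif"
    metadata: dict[str, str] = field(default_factory=dict)

@dataclass
class DigitalObject:
    identifier: str                     # e.g. a handle
    manifestations: dict[str, ManifestationRecord]
    metadata: dict[str, str] = field(default_factory=dict)

    def disseminate(self, preferred: str) -> tuple[str, ManifestationRecord]:
        """Return the requested manifestation, else fall back (assumed order)."""
        for role in (preferred, "reference", "thumbnail"):
            if role in self.manifestations:
                return role, self.manifestations[role]
        raise KeyError(f"{self.identifier} has no usable manifestation")

image = DigitalObject(
    identifier="hdl:loc.ndlp/amrlp.1234567",
    manifestations={
        "archive": ManifestationRecord("jpg", {"note": "full resolution"}),
        "reference": ManifestationRecord("jpg"),
        "thumbnail": ManifestationRecord("gif"),
    },
    metadata={"title": "(object metadata)"},
)

role, rec = image.disseminate("thumbnail")
print(role, rec.data_type)  # thumbnail gif
```

Keeping per-manifestation metadata separate from object metadata lets, say, the archival image record its resolution without that detail leaking into the description of the object as a whole.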

Object Model: Set of Digitized Page Images

Data

Each page: separate image

Metadata

Structure of work:

• page sequence

• page numbers

• special pages

Object Model: Set of Digitized Page Images

Identifier: hdl:loc.ndlp/amrlp.13579

Data: page 1 (gif), page 2 (gif), page 3 (gif)

Metadata: page map

Page Map

A page map relates the page images to the structure of the information, e.g.:

• List of pages

• Numbers printed on pages

• Blocking of information on pages (columns, figures)

• Sequences of information across pages

A page map is metadata for a specific manifestation.
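A minimal Python sketch of the first two page-map elements (the list of pages and the numbers printed on them), assuming one entry per page image. The file names and printed numbers are hypothetical, and unnumbered pages such as a title page carry `None`.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative page map for one scanned manifestation.

@dataclass(frozen=True)
class PageEntry:
    sequence: int              # position in the set of page images (1-based)
    image: str                 # file holding this page image (hypothetical)
    printed: Optional[str]     # number printed on the page, if any

page_map = [
    PageEntry(1, "page1.gif", None),   # e.g. title page, no printed number
    PageEntry(2, "page2.gif", "i"),
    PageEntry(3, "page3.gif", "1"),
]

def image_for_printed_page(printed: str) -> str:
    """Find the image that carries a given printed page number."""
    for entry in page_map:
        if entry.printed == printed:
            return entry.image
    raise KeyError(f"no page numbered {printed!r}")

print(image_for_printed_page("1"))  # page3.gif
```

The example shows why the map is tied to a manifestation: printed page "1" is the third image in this scan, and a different scan or edition would need its own map.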

Overlay Journals and Virtual Collections

Logical organization of physically separate works

The NSF SMETE Library

Soon, all scientific and engineering information will be available on-line:

• Journals, reports, papers, standards, patents

• Data sets, instruments, sensors

• Computer programs, simulations, designs

• Maps, images, films

• ... etc., etc., etc.

The Instructor's Wish List

To discover materials and services:

• Good science

• Comprehensible to students -- effective for teaching

• Stable -- will not change or disappear

Access to collections and services that are provided by many independent organizations:

• No uniform catalog or index to everything

• Mixture of for-profit and open access information

Conventional Journal

Contents and articles, both held by the journal

Overlay Journal

Contents

Articles in Repository A

Articles in Repository B

Overlay Journals

Contents of Journal I

Articles in Repository A

Articles in Repository B

Contents of Journal II
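The essential point of an overlay journal is that its contents are references, not copies. This Python sketch assumes articles are stored by repository and identifier; the repository names, article identifiers, and titles are hypothetical.

```python
# Repositories hold the articles; each journal holds only a table of contents
# that points into them. All names and identifiers here are hypothetical.

repositories = {
    "A": {"a-17": "Article on indexing", "a-42": "Article on metadata"},
    "B": {"b-03": "Article on page maps"},
}

# Each contents entry is a (repository, article id) reference, not a copy.
journal_i = [("A", "a-17"), ("B", "b-03")]
journal_ii = [("A", "a-42"), ("B", "b-03")]  # the same article can appear in both

def resolve(contents):
    """Follow each reference in a table of contents to the stored article."""
    return [repositories[repo][article_id] for repo, article_id in contents]

print(resolve(journal_ii))
# ['Article on metadata', 'Article on page maps']
```

Because both journals point at the single stored copy of `b-03`, the article can be selected by two journals without being duplicated.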

Overlay Journals with Preprint Servers

Contents of Journal I

Research Web site

Preprint server

Contents of Journal II

[Diagram: CoRR, Cornell CS Reports, NCSTRL, User, CSTR, ACM, D-Lib]

SMETE Library: Physical Sites

SMETE Library: Virtual Collections

[Diagram: SMETE.] Links show the members of the virtual collection.

Metadata for Virtual Collections

Reference linking

• Identifiers (URLs, URNs, ...)

• Citations and reverse citations

Information discovery

• Cataloguing and indexing

Object models

• Structural types

• Disseminators
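Once citations are recorded as links between identifiers, reverse citations can be derived mechanically by inverting the link graph. A minimal Python sketch follows; the `urn:x:` identifiers are hypothetical placeholders for real URLs or URNs.

```python
from collections import defaultdict

# Each key cites the identifiers in its list (hypothetical identifiers).
citations = {
    "urn:x:paper1": ["urn:x:paper2", "urn:x:paper3"],
    "urn:x:paper2": ["urn:x:paper3"],
    "urn:x:paper3": [],
}

def reverse_citations(cites: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the citation graph: map each work to the works that cite it."""
    cited_by: dict[str, list[str]] = defaultdict(list)
    for citing, cited_list in cites.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return dict(cited_by)

print(reverse_citations(citations)["urn:x:paper3"])
# ['urn:x:paper1', 'urn:x:paper2']
```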

Indexing and Cataloguing

Conventional cataloguing and indexing: Skilled professionals, following quality guidelines.

Web spiders and gatherers: Programs that gather information and build indexes (e.g., Infoseek, Harvest).

Metadata in publishing: Addition of metadata by the creator to aid automatic indexing (e.g., Dublin Core).

Content extraction: Indexing using structured text, speech recognition, or image content.
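Creator-supplied metadata lends itself to the kind of automatic indexing a gatherer performs. The Python sketch below builds a simple inverted index from records that use three Dublin Core element names (title, creator, subject). The records, URLs, and the lowercase-alphanumeric tokenizing rule are assumptions for illustration.

```python
import re
from collections import defaultdict

# Hypothetical records keyed by URL, using Dublin Core element names.
records = {
    "http://example.org/r1": {"title": "Overlay journals",
                              "creator": "Doe",
                              "subject": "digital libraries"},
    "http://example.org/r2": {"title": "Virtual collections",
                              "creator": "Roe",
                              "subject": "digital libraries; metadata"},
}

def build_index(recs):
    """Map each lowercase word in any metadata field to the records containing it."""
    index = defaultdict(set)
    for ident, fields in recs.items():
        for value in fields.values():
            for word in re.findall(r"[a-z0-9]+", value.lower()):
                index[word].add(ident)
    return index

index = build_index(records)
print(sorted(index["digital"]))
# ['http://example.org/r1', 'http://example.org/r2']
```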

The End
