The National Science Digital Library (NSDL) as an Example of Information Science Research


Page 1: The National Science Digital Library  (NSDL) as an Example of Information Science Research


William Y. Arms

Cornell University

October 25, 2002

The National Science Digital Library (NSDL) as an Example of Information Science Research

Page 2: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Some Light Reading

William Y. Arms, "Economic models for open-access publishing." iMP, March 2000. http://www.cisp.org/imp/march_2000/03_00arms.htm

William Y. Arms, "Automated digital libraries." D-Lib Magazine, July/August 2000. http://www.dlib.org/dlib/july20/07contents.html

William Y. Arms, "What are the alternatives to peer review? Quality control in scholarly publishing on the web." Journal of Electronic Publishing, 8(1), August 2002. http://www.press.umich.edu/jep/08-01/arms.html

William Y. Arms, et al., "A Spectrum of Interoperability: The Site for Science Prototype for the NSDL."  D-Lib Magazine, 8(1), January 2002. http://www.dlib.org/dlib/january02/arms/01arms.html

Page 3: The National Science Digital Library  (NSDL) as an Example of Information Science Research


A Scenario

A faculty member wished to find a paper for students to read in a class. He began by asking an expert. She suggested the original research paper as suitable.

Later, he typed a few terms into Google, browsed the hits, selected one that led to ResearchIndex, found the paper, and downloaded a PDF version from the author's web site.

Page 4: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Society

Cognitive Studies

HCI

Viewpoints

Computer Science

Page 5: The National Science Digital Library  (NSDL) as an Example of Information Science Research


HCI: Eye Tracking

Page 6: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Page 7: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Society

Cognitive Studies

HCI

Computer Science

Applications

Information Science

Page 8: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Open Access to Scientific, Scholarly and Professional Information

Page 9: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Before the Web

Access to Scientific, Medical, Legal Information

In the United States:

excellent if you belonged to a rich organization (e.g., a major university)

very poor otherwise (e.g., most K-12 schools)

In many countries of the world:

very poor for everybody

Page 10: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Research Libraries are Expensive

library materials

buildings & facilities

staff

Page 11: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Baumol's Cost Disease

[Figure: price plotted against year (1900, 1950, 2000, 2050) for three series: a bundle of goods and services, labor-intensive services, and manufactured goods]

Page 12: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Baumol's Cost Disease

[Figure: price plotted against year (1900, 1950, 2000, 2050) for three series: a bundle of goods and services, labor-intensive services, and manufactured goods, with Moore's Law added as a label]

Page 13: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Brute Force Computing

Few people really understand Moore's Law

Computing power doubles every 18 months

Increases 100 times in 10 years

Increases 10,000 times in 20 years
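The growth figures on this slide follow directly from the 18-month doubling period. The short sketch below checks the arithmetic; the function name `growth_factor` is illustrative and not part of the original talk.

```python
# Growth implied by "computing power doubles every 18 months".
DOUBLING_PERIOD_YEARS = 1.5  # 18 months, as stated on the slide


def growth_factor(years: float, doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Return the factor by which computing power grows over `years`."""
    doublings = years / doubling_period
    return 2.0 ** doublings


if __name__ == "__main__":
    for years in (10, 20):
        print(f"{years} years: about {growth_factor(years):,.0f} times")
```

Ten years is about 6.7 doublings, which gives a factor of roughly 100; twenty years squares that factor, giving roughly 10,000.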

Simple algorithms

plus

immense computing power

can outperform human intelligence

Page 14: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Example: Catalogs and Indexes

Cost disease: catalogs and indexes

Catalog, index and abstracting records are very expensive when created by skilled professionals

Moore's Law: automatic indexing of full text

Retrieval using automatic indexing can be at least as effective as retrieval using manual indexing with controlled vocabularies

(Cleverdon 1967, reporting on experiments by Salton)

Page 15: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Brute Force Computing: Substitutes for Human Intelligence

Automated algorithms for information discovery

Similarity of two documents

Vector space and statistical methods

(Salton, Spärck Jones, et al.)
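As a minimal sketch of the vector space idea, each document can be treated as a vector of term counts and two documents compared by the cosine of the angle between their vectors. The sample texts and function names here are invented for illustration, and raw term counts stand in for the statistical weights (such as tf-idf) that practical systems usually apply.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Represent a document as a bag of lowercase, whitespace-separated terms."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Invented sample documents, not taken from the talk.
    d1 = term_vector("digital library metadata harvesting")
    d2 = term_vector("metadata harvesting for a digital library")
    d3 = term_vector("weather forecast tomorrow")
    print(cosine_similarity(d1, d2))  # shares four terms with d1
    print(cosine_similarity(d1, d3))  # shares no terms with d1
```

No human judgment is involved at any step, which is the point of the slide: the comparison is simple enough to run over every pair of documents that computing power allows.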

Importance of digital object

Rank importance of web pages by analysis of the graph of web links

(Kleinberg, Page, et al.)
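Link analysis can be sketched as a PageRank-style power iteration: a page is important if important pages link to it. This is a simplified illustration and not the implementation used by any search engine; the four-page graph, the damping value of 0.85, and the iteration count are assumptions chosen for the example.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Approximate link-based importance scores by repeated redistribution of rank.

    `links` maps each page to the pages it links to; every link target
    must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical web graph: page "c" is linked to by "a", "b", and "d".
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda item: -item[1]):
        print(page, round(score, 3))
```

The scores always sum to 1, and in this graph the most linked-to page, "c", ranks highest.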

Page 16: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Information Discovery: 1992 and 2002

                    1992          2002
Content             print         digital
Computing           expensive     inexpensive
Choice of content   selective     comprehensive
Index creation      human         automatic
Frequency           one time      monthly
Vocabulary          controlled    not controlled
Query               Boolean       ranked retrieval
Users               trained       untrained

Page 17: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Brute Force Computing: Automated Metadata Extraction

Informedia (Carnegie Mellon)

Automatic processing of segments of video, e.g., television news.

Algorithms for:

dividing raw video into discrete items

generating short summaries

indexing the sound track using speech recognition

recognizing faces

(Wactlar, et al.)

Page 18: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Page 19: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Simple algorithms

plus

immense computing power

plus

the intelligence of the user

can replace labor-intensive services

Cognitive Studies

HCI

Brute Force Computing + Intelligence of the User

Computer Science

Page 20: The National Science Digital Library  (NSDL) as an Example of Information Science Research


The National Science Foundation's National Science Digital Library (NSDL)

http://www.nsdl.org

Page 21: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Scope

All digital information relevant to any level of education in any branch of science.

Scientific and technical information

Materials used in education

Materials tailored to education

Page 22: The National Science Digital Library  (NSDL) as an Example of Information Science Research


How Big Might the NSDL Be?

All branches of science, all levels of education, very broadly defined:

Five-year targets

1,000,000 different users

10,000,000 digital objects

10,000 to 100,000 independent sites

Page 23: The National Science Digital Library  (NSDL) as an Example of Information Science Research


The Integration Task ...

... to provide a coherent set of collections and services across great diversity

Page 24: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Resources

Integration team

Budget: $4-6 million

Staff: 25-30

Management: diffuse

How can a small team, without direct management control, create a very large-scale digital library?

Page 25: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Philosophy

It is possible to build a very large digital library with a small staff.

But ...

Every aspect of the library must be planned with scalability in mind.

Some compromises will be made.

Page 26: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Example 1:

The Mortal behind the Portal

[This space left intentionally blank.]

Page 27: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Example 2: Interoperability

The Problem

Conventional approaches require partners to support agreements (technical, content, and business)

But NSDL needs thousands of very different partners

... most of whom are not directly part of the NSDL program

The challenge is to create incentives for independent digital libraries to adopt agreements

Page 28: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Function Versus Cost of Acceptance

[Figure: function plotted against cost of acceptance, with regions labeled "Many adopters" and "Few adopters"]

Page 29: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Example: Textual Mark-up

[Figure: ASCII, HTML, XML, and SGML plotted by function against cost of acceptance]

Page 30: The National Science Digital Library  (NSDL) as an Example of Information Science Research


The Spectrum of Interoperability

Level        Agreements                                Example

Federation   Strict use of standards                   AACR, MARC, Z 39.50
             (syntax, semantic, and business)

Harvesting   Digital libraries expose metadata;        Open Archives
             simple protocol and registry              metadata harvesting

Gathering    Digital libraries do not cooperate;       Web crawlers
             services must seek out information        and search engines

Page 31: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Example 3: Searching

Basic Assumptions

The integration team will not manage any collections

The integration team will not create any metadata

Page 32: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Effective Information Retrieval

Comprehensive metadata with Boolean retrieval (e.g., monograph catalog).

Can be excellent for well-understood categories of material, but requires expensive metadata, which is rarely available.

Full text indexing with ranked retrieval (e.g., news articles).

Excellent for relatively homogeneous material, but requires available full text.

Full text indexing with contextual information and ranked retrieval (e.g., Google).

Excellent for mixed textual information with rich structure.

Contextual information with non-textual materials and ranked retrieval (e.g., Google image retrieval).

Promising, but still experimental.
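The contrast between Boolean and ranked retrieval on this slide can be sketched with a small inverted index. The three documents and the scoring rule (a sum of term counts) are invented for illustration; production systems use more refined weighting.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Build an inverted index: term -> {document id -> term count}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def boolean_and(index: dict[str, dict[str, int]], query: str) -> set[str]:
    """Boolean retrieval: return only documents containing every query term."""
    postings = [set(index.get(term, {})) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


def ranked(index: dict[str, dict[str, int]], query: str) -> list[str]:
    """Ranked retrieval: order documents by total count of matching query terms."""
    scores: dict[str, int] = defaultdict(int)
    for term in query.lower().split():
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties broken alphabetically by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical collection of three short documents.
    docs = {
        "catalog": "monograph catalog record with subject headings",
        "news": "science news article about digital library search",
        "web": "digital library search engine ranks web pages",
    }
    index = build_index(docs)
    print(boolean_and(index, "digital library catalog"))
    print(ranked(index, "digital library catalog"))
```

For the query "digital library catalog", the Boolean search returns nothing because no single document contains all three terms, while the ranked search still returns every partially matching document in order of score. This is one reason ranked retrieval suits untrained users.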

Page 33: The National Science Digital Library  (NSDL) as an Example of Information Science Research


The NSDL Search Service

Full Text or Metadata?

Full text indexing is excellent, but is not possible for all materials (non-textual, no access for indexing).

Comprehensive metadata is available for very few of the materials.

What Architecture to Use?

Few collections support an established search protocol (e.g., Z39.50).

Page 34: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Broadcast Searching does not Scale

User interface server

User

Collections
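One commonly cited reason broadcast searching scales poorly is that every query depends on every collection answering. As a rough sketch, assume each collection responds independently with the same probability; the 99% availability figure below is a hypothetical value, not one given in the talk.

```python
def all_respond_probability(availability: float, collections: int) -> float:
    """Probability that every one of `collections` independent servers responds."""
    return availability ** collections


if __name__ == "__main__":
    availability = 0.99  # assumed per-collection availability (hypothetical)
    for n in (10, 100, 1_000, 10_000):
        p = all_respond_probability(availability, n)
        print(f"{n:>6} collections: {p:.2e} chance that all respond")
```

Under these assumptions a complete answer is likely with ten collections but very unlikely with a thousand, and the NSDL targets call for 10,000 to 100,000 independent sites.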

Page 35: The National Science Digital Library  (NSDL) as an Example of Information Science Research


The Metadata Repository

[Diagram: users, services, the metadata repository, and collections]

The metadata repository is a resource for service providers.

It holds information about every collection and item known to the NSDL, including contextual information.

Page 36: The National Science Digital Library  (NSDL) as an Example of Information Science Research


The Metadata Repository as a Resource

Records are exposed through Open Archives Initiative protocol for metadata harvesting.

Core Integration team provides some services based on the metadata repository.

The architecture encourages others to build services.

Support for Service Providers
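A service provider harvesting records through the Open Archives Initiative protocol for metadata harvesting (OAI-PMH) issues plain HTTP requests and parses XML responses. The sketch below builds a `ListRecords` request URL and parses a trimmed sample response. The base URL, record identifier, and title are invented, and the sample omits elements (such as `responseDate` and `request`) that a real response would carry.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for the given repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> list[tuple[str, str]]:
    """Extract (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall(".//oai:record", NAMESPACES):
        identifier = record.findtext("oai:header/oai:identifier",
                                     default="", namespaces=NAMESPACES)
        title = record.findtext(".//dc:title", default="", namespaces=NAMESPACES)
        records.append((identifier, title))
    return records


# Trimmed, invented response used in place of a live repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:item-1</identifier>
        <datestamp>2002-10-25</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An invented sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))  # hypothetical repository
    print(parse_records(SAMPLE_RESPONSE))
```

Because the protocol needs only HTTP and simple XML, the cost of acceptance is low for both the repository and anyone who wants to build a service on it.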

Page 37: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Search Service

[Diagram: three portals, search and discovery services, collections, and the metadata repository, with connections labeled SDLIP, OAI, and http]

James Allan, Bruce Croft (University of Massachusetts, Amherst)

Page 38: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Where is the Center of the Universe?

NSDL

Alexandria

Elsevier

Informedia

Library of Congress

Joe's Pictures

Math DL

Page 39: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Where is the Center of the Universe?

NSDL

British Library

Elsevier

OCLC

Library of Congress

Internet Archive

Harvard

Page 40: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Where is the Center of the Universe?

NSDL

email

Course web sites

News and weather

Bill Arms

Office

Technical documentation

Google

Directories

Page 41: The National Science Digital Library  (NSDL) as an Example of Information Science Research


Acknowledgement

The NSDL is a program of the National Science Foundation's Directorate for Education and Human Resources, Division of Undergraduate Education.

The NSDL Core Integration is a collaboration between the University Corporation for Atmospheric Research (Dave Fulker), Columbia University (Kate Wittenberg) and Cornell University (Bill Arms). The Technical Director is Carl Lagoze (Cornell University).