Thoughts about Computer Science Research in Information-rich Applications Areas. William Y. Arms, Cornell University. March 14, 2000

1

Thoughts about

Computer Science Research in Information-rich Applications Areas

William Y. Arms, Cornell University

March 14, 2000

2

Changes in Computer Science

Over 25 years, computer science has broadened

From: a narrow range of academic topics

To include:

• systems

• human-computer interaction

• economic, legal, and social aspects

3

Computer Science Today

• Past achievements in computer science are a powerful force in the national prosperity.

• Universities have excellent students who have tremendous opportunities.

• An extensive body of theoretical and practical knowledge has accumulated.

• Exciting research can be found in every direction.

4

Approaches to Computer Science Research

Applications

Theory

Experimentation

5

Computing and Information Science (Cornell)

Interdisciplinary partnerships:

• Computational biology, genomics, protein folding, etc.

• Computational science

• Computer graphics, architecture, design, film-making

• Digital libraries, information management

• Computational finance, economics

Computer science can contribute to each of these fields.

Each field can stimulate new research in computer science.

6

The University as a Test Bed

University tradition of innovation in computing:

• Time sharing (MIT, Dartmouth)

• Networks and distributed computing (Carnegie Mellon, MIT)

• Online information (Illinois, etc.)

• Wireless and nomadic computing (???)

Advantages:

• Tight feedback loop between researcher and user

• Innovation valued for its own sake

• Access to resources (equipment, people, money)

7

Research Partners

Academic research

Industrial R&D

Entrepreneurs

8

Example: Digital Libraries

In 1990, there were many experiments in building digital libraries:

• CORE (Bellcore, Cornell, OCLC) Lesk, et al.

• Gopher (Minnesota) Gopher team

• Mercury (Carnegie Mellon) Arms, et al.

• WAIS (Thinking Machines) Kahle, et al.

• World Wide Web (CERN) Berners-Lee, et al.

• Z39.50 (Major libraries) Lynch, et al.

The leaders of all these projects were either computer scientists or had spent most of their working lives in state-of-the-art computing.

9

Foundations of the Web

Technology: Ancestors

Internet: ARPAnet/NSFnet, X.25, ISO

URL: Domain Name System

HTML: SGML, TeX, PostScript

HTTP: TCP / FTP / Gopher, Z39.50, SQL

MIME: Email, ODA

Security: None, SNA, Kerberos

Business model: None, pay-by-use, subscription

10

Example: Web Search Engines

Lycos (Mauldin, Carnegie Mellon)

Technical basis:

• Research in text-skimming (Ph.D. thesis)

• Pursuit free text retrieval engine (TREC)

• Robot exclusion research (private interest)

Organizational basis:

• Center for Machine Translation

• Grant flexibility (DARPA)

11

Example: Web Search Engines

Google (Page and Brin, Stanford)

Technical basis:

• Research in ranking hyperlinks (Ph.D. thesis)

Organizational basis:

• Grant flexibility (NSF Digital Libraries Initiative)

• Equipment grant (Hewlett Packard)
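The slide does not name the ranking method. Assuming it refers to the link-ranking idea that Page and Brin published as PageRank, the core computation can be sketched as a simple power iteration. The function name `pagerank`, the toy graph, and the damping value 0.85 are illustrative assumptions, not details taken from the talk.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rough PageRank sketch.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to a score; scores sum to 1.
    """
    # Collect every page, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)

    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in nodes}
        for page in nodes:
            targets = links.get(page, [])
            if targets:
                # A page divides its rank equally among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "d" links to "a" but nothing links to "d".
toy_web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

A production search engine would work on a sparse matrix over billions of pages; the dictionary version here only shows the shape of the iteration.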

12

The Internet Graph

Theoretical research in graph theory

• Six degrees of separation

• Pareto distributions

Algorithms

• Hubs and authorities (Kleinberg, Cornell)

Empirical data

• Commercial (Yahoo!, Google, Alexa, AltaVista, Lycos)

• Not-for-profit (Internet Archive)
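As an illustration of the hubs-and-authorities idea credited to Kleinberg above, here is a minimal sketch of the usual iteration: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. The function name `hits` and the toy graph are hypothetical, chosen only for this example.

```python
from math import sqrt


def hits(links, iterations=50):
    """Rough hubs-and-authorities sketch.

    links: dict mapping each page to the list of pages it links to.
    Returns (hubs, authorities), two dicts of normalized scores.
    """
    # Collect every page, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    hubs = {page: 1.0 for page in nodes}
    auths = {page: 1.0 for page in nodes}
    for _ in range(iterations):
        # Authority update: add up the hub scores of pages linking in.
        new_auths = {page: 0.0 for page in nodes}
        for source, targets in links.items():
            for target in targets:
                new_auths[target] += hubs[source]
        norm = sqrt(sum(v * v for v in new_auths.values())) or 1.0
        auths = {page: v / norm for page, v in new_auths.items()}

        # Hub update: add up the authority scores of pages linked to.
        new_hubs = {
            page: sum(auths[target] for target in links.get(page, []))
            for page in nodes
        }
        norm = sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        hubs = {page: v / norm for page, v in new_hubs.items()}
    return hubs, auths


# Hypothetical graph: three directory-like pages pointing at two resources.
toy_web = {"h1": ["x", "y"], "h2": ["x", "y"], "h3": ["x"]}
hub_scores, authority_scores = hits(toy_web)
print("best authority:", max(authority_scores, key=authority_scores.get))
print("best hub:", max(hub_scores, key=hub_scores.get))
```

In this toy graph the pages that link out act as hubs and the pages they point to act as authorities, which is the distinction the algorithm is designed to expose.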

13

The Limits of the Web

• The web has grown upon existing computer science knowledge.

• The strengths of that knowledge have enabled enormous growth.

• The limits of that knowledge have constrained the growth.

Al Demers

14

The Web: Limits to Growth -- Databases

Transaction processing databases, e.g., Amazon.com

The biggest online systems ever built, with many computers around the world.

Desirable features:

• No interruptions

• No transactions ever lost

• Secure from all intruders

In practice some transactions are lost; data is sometimes inconsistent. This is acceptable for selling books, but what about banking?

15

The Web: Limits to Growth -- Security

Why is security on the Internet so difficult?

1. Public key encryption was invented in the mid-1970s, yet widespread deployment remains elusive.

2. System security is riddled with loopholes

• operating system security developed when operating systems were simple monitors

• now operating systems are very complex and hence vulnerable

• language-based security seeks simpler interfaces to which security can be attached

Fred Schneider

16

The Web: Limits to Growth -- Security

The Internet is based on stateless protocols

• routing

• HTTP

Stateless protocols have allowed flexible growth, but inhibit certain controls

• junk email

• denial of service attacks

Can we quantify the trade-off?

17

Priorities

Function

Schedule

Cost

academic research

industry

18

Priorities: Andrew File System

Carnegie Mellon:

• Campus file system (1985)

• Coda research

Industry:

• IBM (1989)

• Microsoft (2000)

19

Two Fears

Two fears for digital libraries:

• Librarians will ignore the expertise of computer science.

• Computer scientists will ignore the insights of librarians.

Two fears for X:

• Specialists in X will ignore the expertise of computer science.

• Computer scientists will ignore the insights of specialists in X.

20

Thoughts for the NSF

• Applications and computer science need to be side by side.

• Big projects appear to be more productive than small ones.

• Inter-disciplinary collaboration cannot be forced.

21

Thoughts about

Computer Science Research in Information-rich Applications Areas

William Y. Arms, Cornell University

March 14, 2000