future of database systems 2: xml databases and grid-based digital libraries

62
IS 257 – Fall 2005 2005.11.22- SLIDE 1 Future of Database Systems 2: XML Databases and Grid-based Digital Libraries University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management

Upload: moral

Post on 08-Feb-2016

44 views

Category:

Documents


1 download

DESCRIPTION

Future of Database Systems 2: XML Databases and Grid-based Digital Libraries. University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management. Lecture Outline. Review Future of Database Systems XML and DBMS Grid-Based Digital Libraries - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 1

Future of Database Systems 2: XML Databases and Grid-based Digital

LibrariesUniversity of California, Berkeley

School of Information Management and Systems

SIMS 257: Database Management

Page 2: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 2

Lecture Outline• Review

– Future of Database Systems• XML and DBMS• Grid-Based Digital Libraries

– Data Grids– Grid-based IR

• DBMS and usability

Page 3: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 3

Lecture Outline• Review

– Future of Database Systems• XML and DBMS• Grid-Based Digital Libraries

– Data Grids– Grid-based IR

• DBMS and usability

Page 4: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 4

• Radio has no future, Heavier-than-air flying machines are impossible. X-rays will prove to be a hoax.– William Thompson (Lord Kelvin), 1899

Page 5: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 5

• This “Telephone” has too many shortcomings to be seriously considered as a means of communication. The device is inherently of no value to us.– Western Union, Internal Memo, 1876

Page 6: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 6

• I think there is a world market for maybe five computers– Thomas Watson, Chair of IBM, 1943

Page 7: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 7

• By the turn of this century, we will live in a paperless society.– Roger Smith, Chair of GM, 1986

Page 8: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 8

• I predict the internet… will go spectacularly supernova and in 1996 catastrophically collapse.– Bob Metcalfe (3-Com founder and inventor of

ethernet), 1995

Page 9: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 9

Accomplishments of DBMS Research

• DBMS are now used in almost every computing environment to create, organize and maintain large collections of information, and this is largely due to the results of the DBMS research community’s efforts, in particular:– Relational DBMS– Transaction management– Distributed DBMS

Page 10: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 10

Next Generation Database Systems

• Where are we going from here?– Hardware is getting faster and cheaper– DBMS technology continues to improve and

change• OODBMS• ORDBMS

– Bigger challenges for DBMS technology• Medicine, design, manufacturing, digital libraries,

sciences, environment, planning, etc...

Page 11: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 11

Examples• NASA EOSDIS

– Estimated 1016 Bytes (Exabyte)• Computer-Aided design• The Human Genome• Department Store tracking

– Mining non-transactional data (e.g. Scientific data, text data?)

• Insurance Company– Multimedia DBMS support

Page 12: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 12

New Features• New Data types• Rule Processing• New concepts and data models• Problems of Scale• Parallelism/Grid-based DB• Tertiary Storage vs Very Large-Scale Disk

Storage• Heterogeneous Databases• Memory Only DBMS

Page 13: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 13

Coming to a Database Near You…

• Browsibility• User-defined access methods• Security• Steering Long processes• Federated Databases• IR capabilities• XML• The Semantic Web(?)

Page 14: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 14

Some things to consider• Bandwidth will keep increasing and getting

cheaper (and go wireless)• Processing power will keep increasing

– Moore’s law: Number of circuits on the most advanced semiconductors doubling every 18 months

• Memory and Storage will keep getting cheaper (and probably smaller)– “Storage law”: Worldwide digital data storage capacity

has doubled every 9 months for the past decade• Put it all together and what do you have?

– “The ideal database machine would have a single infinitely fast processor with infinite memory with infinite bandwidth – and it would be infinitely cheap (free)” : David DeWitt and Jim Gray, 1992

Page 15: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 15

Lecture Outline• Review

– Future of Database Systems• XML and DBMS• Grid-Based Digital Libraries

– Data Grids– Grid-based IR

• DBMS and usability

Page 16: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 16

Standards: XML/SQL• As part of SQL3 an extension providing a

mapping from XML to DBMS is being created called XML/SQL

• The (draft) standard is very complex, but the ideas are actually pretty simple

• Suppose we have a table called EMPLOYEE that has columns EMPNO, FIRSTNAME, LASTNAME, BIRTHDATE, SALARY

Page 17: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 17

Standards: XML/SQL• That table can be mapped to:

<EMPLOYEE> <row><EMPNO>000020</EMPNO> <FIRSTNAME>John</FIRSTNAME> <LASTNAME>Smith</LASTNAME> <BIRTHDATE>1955-08-21</BIRTHDATE> <SALARY>52300.00</SALARY> </row>

<row> … etc. …

Page 18: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 18

Standards: XML/SQL• In addition the standard says that

XMLSchemas must be generated for each table, and also allows relations to be managed by nesting records from tables in the XML.

• Don’t know whether this has actually been implemented by anyone– There is actually something very similar in the

Cheshire II interface to RDBMS

Page 19: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 19

Lecture Outline• Review

– Future of Database Systems• XML and DBMS• Grid-Based Digital Libraries

– Data Grids– Grid-based IR

• DBMS and usability

Page 20: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 20

Grid-based Digital Libraries• So what’s this Grid thing anyhow?• Data Grids and Distributed Storage• Grid-Based IR• Grid-Based Digital Libraries

This lecture borrows heavily from presentations by Ian Foster (Argonne National Laboratory & University of Chicago), Reagan Moore and others from San Diego Supercomputer Center

Page 21: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 21

The Grid: On-Demand Access to Electricity

Time

Qua

lity,

eco

nom

ies

of s

cale

Source: Ian Foster

Page 22: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 22

By Analogy, A Computing Grid

• Decouples production and consumption– Enable on-demand access– Achieve economies of scale– Enhance consumer flexibility– Enable new devices

• On a variety of scales– Department– Campus– Enterprise– Internet

Source: Ian Foster

Page 23: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 23

Not Exactly a New Idea …• “The time-sharing computer system can

unite a group of investigators …. one can conceive of such a facility as an … intellectual public utility.”– Fernando Corbato and Robert Fano , 1966

• “We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.” Len Kleinrock, 1967

Source: Ian Foster

Page 24: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 24

But, Things are Different Now

• Networks are far faster (and cheaper)– Faster than computer backplanes

• “Computing” is very different than pre-Net– Our “computers” have already disintegrated– E-commerce increases size of demand peaks– Entirely new applications & social structures

• We’ve learned a few things about software

Source: Ian Foster

Page 25: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 25

Computing isn’t Really Like Electricity

• I import electricity but must export data• “Computing” is not interchangeable but highly

heterogeneous: data, sensors, services, …• This complicates things; but also means that the

sum can be greater than the parts – Real opportunity: Construct new capabilities

dynamically from distributed services• Raises three fundamental questions

– Can I really achieve economies of scale?– Can I achieve QoS across distributed services?– Can I identify apps that exploit synergies?

Source: Ian Foster

Page 26: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 26

Why the Grid?(1) Revolution in Science• Pre-Internet

– Theorize &/or experiment, aloneor in small teams; publish paper

• Post-Internet– Construct and mine large databases of

observational or simulation data– Develop simulations & analyses– Access specialized devices remotely– Exchange information within

distributed multidisciplinary teamsSource: Ian Foster

Page 27: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 27

Why the Grid?(2) Revolution in Business• Pre-Internet

– Central data processing facility• Post-Internet

– Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B)

– Business processes increasingly computing- & data-rich

– Outsourcing becomes feasible => service providers of various sorts

Source: Ian Foster

Page 28: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 28

New OpportunitiesDemand New Technology

“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”

Source: Ian Foster

Page 29: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 29

Building an Open Grid

Page 30: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 30

Building an Open Grid

OpenStandards

Page 31: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 31

Building an Open Grid

OpenStandards

OpenSource

Page 32: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 32

Building an Open Grid

OpenStandards

OpenSource Open

Infrastructure

Page 33: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 33

Building an Open Grid

OpenStandards

OpenSource Open

Infrastructure

OpenGrid

Page 34: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 34

Building an Open Grid

OpenStandards

OpenSource Open

Infrastructure

OpenGrid

Page 35: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 35

Grids and Open StandardsIn

crea

sed

func

tiona

lity,

stan

dard

izat

ion

Time

Customsolutions

Open GridServices Arch

GGF: OGSI, …(+ OASIS, W3C)

Multiple implementations,including Globus Toolkit

Web services

Globus Toolkit

Defacto standardsGGF: GridFTP, GSI

X.509,LDAP,FTP, …

App-specificServices

Page 36: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 36

Open Grid Services Architecture

• Service-oriented architecture– Key to virtualization, discovery, composition,

local-remote transparency • Leverage industry standards

– Internet, Web services• Distributed service management

– A “component model for Web services”• A framework for the definition of

composable, interoperable services“The Physiology of the Grid: An Open Grid Services Architecture for

Distributed Systems Integration”, Foster, Kesselman, Nick, Tuecke, 2002

Page 37: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 37

Realizing a Service-Oriented Architecture: How Do I

• Create, name, manage, discover services?• Render resources, data, sensors as services?• Negotiate service level agreements?• Express & negotiate policy?• Organize & manage service collections?• Establish identity, negotiate authentication?• Manage VO membership & communication?• Compose services efficiently?• Achieve interoperability?

Page 38: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 38

Web Services • XML-based distributed computing technology• Web service = a server process that exposes typed ports

to the network• Described by the Web Services Definition Language, an

XML document that contains– Type of message(s) the service understands & types of

responses & exceptions it returns– “Methods” bound together as “port types”– Port types bound to protocols as “ports”

• A WSDL document completely defines a service and how to access it

Page 39: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 39

Open Grid Services Infrastructure

Implementation

Servicedata

element

Other standard interfaces:factory,

notification,collections

Hosting environment/runtime(“C”, J2EE, .NET, …)

Servicedata

element

Servicedata

element

GridService(required)

Dataaccess

Lifetime management• Explicit destruction• Soft-state lifetime

Introspection:• What port types?• What policy?• What state?

Client

Grid ServiceHandle

Grid ServiceReference

handleresolution

Page 40: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 40

The Gridas Enabler of 21st Century Science• Entirely new approaches to enquiry based

on– Deep analysis of huge quantities of data– Interdisciplinary collaboration– Large-scale simulation– Smart instrumentation

• Enabled by an infrastructure that enables access to, and integration of, resources & services without regard for location

Page 41: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 41

Grid Infrastructure• Broadly deployed services in support of

fundamental collaborative activities– Formation & operation of virtual organizations– Authentication, authorization, discovery, …

• Services, software, and policies enabling on-demand access to critical resources– Computers, databases, networks, storage, software

services,…• Operational support for 24x7 availability• Integration with campus and commercial

infrastructures

Page 42: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 42

Tier0/1 facility

Tier2 facility

10 Gbps link

2.5 Gbps link

622 Mbps link

Other link

Tier3 facility

The Foundations are Being Laid

Cambridge

Newcastle

Edinburgh

Oxford

Glasgow

Manchester

Cardiff

Soton

London

Belfast

DL

RAL Hinxton

Page 43: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 43

Data Grid Problem• “Enable a geographically distributed

community [of thousands] to pool their resources in order to perform sophisticated, computationally intensive analyses on Petabytes of data”

• Note that this problem:– Is common to many areas of science– Overlaps strongly with other Grid problems

Page 44: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 44

Data Grids forHigh Energy Physics

Tier2 Centre ~1 TIPS

Online System

Offline Processor Farm ~20 TIPS

CERN Computer Centre

FermiLab ~4 TIPSFrance Regional Centre

Italy Regional Centre

Germany Regional Centre

InstituteInstituteInstituteInstitute ~0.25TIPS

Physicist workstations

~100 MBytes/sec

~100 MBytes/sec

~622 Mbits/sec

~1 MBytes/sec

There is a “bunch crossing” every 25 nsecs.

There are 100 “triggers” per second

Each triggered event is ~1 MByte in size

Physicists work on analysis “channels”.

Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server

Physics data cache

~PBytes/sec

~622 Mbits/sec or Air Freight (deprecated)

Tier2 Centre ~1 TIPS

Tier2 Centre ~1 TIPS

Tier2 Centre ~1 TIPS

Caltech ~1 TIPS

~622 Mbits/sec

Tier 0Tier 0

Tier 1Tier 1

Tier 2Tier 2

Tier 4Tier 4

1 TIPS is approximately 25,000

SpecInt95 equivalents

Image courtesy Harvey Newman, Caltech

Page 45: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 45

Data Intensive Issues Include …• Harness [potentially large numbers of]

data, storage, network resources located in distinct administrative domains

• Respect local and global policies governing what can be used for what

• Schedule resources efficiently, again subject to local and global constraints

• Achieve high performance, with respect to both speed and reliability

• Catalog software and virtual data

Page 46: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 46

Data Intensive Computing and Grids

• The term “Data Grid” is often used– Implies a distinct infrastructure, which it isn’t; but easy

to say • Data-intensive computing shares numerous

requirements with collaboration, instrumentation, computation, …– Security, resource mgt, info services, etc.

• Important to exploit commonalities as very unlikely that multiple infrastructures can be maintained

• Fortunately this seems easy to do!

Page 47: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 47

Examples ofDesired Data Grid Functionality• High-speed, reliable access to remote data• Automated discovery of “best” copy of data • Manage replication to improve performance• Co-schedule compute, storage, network• “Transparency” wrt delivered performance• Enforce access control on data• Allow representation of “global” resource

allocation policies

Page 48: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 48

A Model Architecture for Data Grids

Metadata Catalog

Replica Catalog

Tape Library

Disk Cache

Attribute Specification

Logical Collection and Logical File Name

Disk Array Disk Cache

ApplicationReplica Selection

Multiple Locations

NWS

SelectedReplica

GridFTP Control ChannelPerformanceInformation &Predictions

Replica Location 1 Replica Location 2 Replica Location 3

MDS

GridFTPDataChannel

Source: Arcot Rajasekar (SDSC)

Page 49: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 49

Data Grid Requirements• Seamless access to data and information stored

at local and remote sites• Virtualization of data, collection and meta information • Handle Dataset Scaling – size & number• Integrate Data Collections & Associated Metadata• Handle Multiplicity of Platforms,

Resource & Data Types• Handle Seamless Authentication• Handle Access Control • Provide Auditing Facilities• Handle Legacy Data & Methods

Source: Arcot Rajasekar (SDSC)

Page 50: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 50

SRB as a Solution

Application

SRB Server

Distributed Storage Resources(database systems, archival storage systems, file systems, ftp, http, …)

MCAT

HRM DB2, Oracle, Illustra, ObjectStore HPSS, ADSM, UniTree UNIX, NTFS, HTTP, FTP

• The Storage Resource Broker is a middleware• It virtualizes resource access• It mediates access to distributed heterogeneous resources• It uses a MetaCATalog to facilitate the brokering• It integrates data and metadata

Source: Arcot Rajasekar (SDSC)

Page 51: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 51

SRBArchives

HPSS, ADSM,UniTree, DMF

DatabasesDB2, Oracle,

Sybase

File SystemsUnix, NT,Mac OSX

Application

C, C++, Linux I/O

Unix Shell

Dublin Core

Resource,User

User Defined

ApplicationMeta-data

RemoteProxies

DataCutter

Third-partycopy

Java, NTBrowsers

WebPrologPython

MCAT

HRM

Source: Arcot Rajasekar (SDSC)

SDSC Storage Resource Broker & Meta-data Catalog

Page 52: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 52

SRBMaster

SRB agents

Application

MCAT

(port)

1

2 4

AuthenticationSecure Password,

GSI or SEA

Server spawned 3

Identification & Initialization

Session Established

(Host,port)

CA

3

Source: Arcot Rajasekar (SDSC)

SRB Single SignOn

Page 53: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 53

SRBserver

SRB agent

SRBserver

Federated SRB Operation

MCAT

Read Application

SRB agent

1

2

34

6

5

Logical NameOr

Attribute Condition

1.Logical-to-Physical mapping2. Identification of Replicas3.Access & Audit Control

Peer-to-peer

Brokering

Server(s) SpawningData

Access

Parallel Data Access

R1R2

5/6

Source: Arcot Rajasekar (SDSC)

Page 54: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 54

SRB Concepts• Abstraction of User Space

– Single sign-on– Multiple authentication schemes

• certificates, (secure) passwords, tickets, group permissions, roles

• Virtualization of Resources– Resource Location, Type & Access transparency– Logical Resource Definitions - bundling

• Abstraction of Data and Collections– Virtual Collections: Persistent Identifier and Global Name Space– Replication & Segmentation

• Data Discovery – system & application metadata– User-defined Metadata – Structural & Descriptive– Attribute-based Access (path names become irrelevant)

• Uniform Access Methods– APIs, Command Line, GUI Browsers, Web-Access (Portal,WSDL, CGI)– Parallel Access with both Client and Server-driven strategies

Source: Arcot Rajasekar (SDSC)

Page 55: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 55

OceanStore:Everyone’s data, One big Utility

“The data is just out there”

• Separate information from location– Locality is an only an optimization (an important one!)– Wide-scale coding and replication for durability

• All information is globally identified– Unique identifiers are hashes over names & keys– Single uniform lookup interface replaces: DNS, server

location, data location– No centralized namespace required (such as SDSI)

OStore

Source: John Kubiatowicz (UCB)

Page 56: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 56

Basic Structure:Irregular Mesh of “Pools” OStore

Source: John Kubiatowicz (UCB)

Page 57: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 57

Amusing back of the envelope calculation• How many files in the OceanStore?

– Assume 1010 people in world– Say 10,000 files/person (very conservative?)– So 1014 files in OceanStore!

– If 1 gig files (not likely), get 1 mole of files!

• Truly impressive number of elements…• … but small relative to physical constants

– (courtesy Bill Bolotsky, Microsoft)

OStore

Source: John Kubiatowicz (UCB)

Page 58: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 58

Utility-based Infrastructure

• Service provided by confederation of companies– Monthly fee paid to one service provider– Companies buy and sell capacity from each other

Pac Bell

Sprint

IBMAT&T

CanadianOceanStore

IBM

OStore

Source: John Kubiatowicz (UCB)

Page 59: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 59

Lecture Outline• Review

– Future of Database Systems• Grid-Based Digital Libraries

– Data Grids– Grid-based IR

• DBMS and usability

Page 60: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 60

DBMS and Usability• What features would you like to see in

DBMS?

Page 61: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 61

DBMS and Usability• What do you hate about Database

Management Systems?– From your experiences– In general

• What do you like about Database Management Systems?– From your experience– In general

Page 62: Future of Database Systems 2: XML Databases and Grid-based Digital Libraries

IS 257 – Fall 2005 2005.11.22- SLIDE 62

Next Week• Workshops to help you develop the final

reports and presentations.