Post on 19-Dec-2015


A Sketch of Regres

Mike Carey

Joey Hellerstein

Michael Stonebraker

Outline

• Why we need to rethink everything!
  – All current DBMSs architected in the late 1970s
  – why the world is different now

• A sketch of Regres
  – a new data base architecture for the millennium

The World is Different

• CPU, memory, disk up by 10 ** 6 in the last 20 years

• Design point of 1 Tbyte buffer pool in 2005, up from 1 Mbyte in the 1970s

• It will NOT be 250 million 4K pages!

Need to rethink storage architectures!
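The page count quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 Tbyte = 10^12 bytes, 4K = 4,000 bytes); the binary-unit figure is shown for comparison, and the function name is only illustrative.

```python
# Back-of-the-envelope check of the buffer pool figures on this slide.
# Assumption: decimal units; binary units are computed for comparison.

def page_count(pool_bytes: int, page_bytes: int) -> int:
    """Number of fixed-size pages needed to cover a buffer pool."""
    return pool_bytes // page_bytes

decimal_pages = page_count(10**12, 4_000)   # 1 Tbyte pool, 4K pages
binary_pages = page_count(2**40, 4_096)     # 1 TiB pool, 4 KiB pages
growth = 10**12 // 10**6                    # 1 Tbyte versus 1 Mbyte

print(decimal_pages)  # 250000000
print(binary_pages)   # 268435456
print(growth)         # 1000000
```

Either way, the pool holds hundreds of millions of 4K pages, which is the bookkeeping burden the slide argues against.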

The World is Different

• Most serious applications use a TP monitor

• I.e. a three tier application architecture
  – data at the bottom in a DBMS
  – code in middle tier in TP monitor
  – user interface on the client

Why?

• 2 tier doesn’t scale and is too hard to manage

• DBMS couldn’t execute code

Need to rethink application architecture!

Probably undesirable to decompose function this way!

Want code “near” the data it accesses

The World is Different

• 7 X 24 a serious requirement most everywhere!

• End-to-end issue
  – RAID not the (complete) answer
  – Require wide area network replication

Need to design in, not bolt on this capability!

The World is Different

• The web changes everything
  – not a client-server protocol!
  – stateless
  – requirement to deal with HTML, XML, ...

Need to have web-centric architecture!

The World is Different

• ERP and web applications require scalability unheard of previously
  – 100,000 ERP seats not uncommon
  – E-commerce on the web will entail huge transaction rates

Need to think at these levels!

The World is Different

• Warehousing is a new application area
  – typical app is data mining
  – queries run forever

Need to design in, not bolt on sampling!

The World is Different

• Multiprocessor architectures common
  – Clusters are here
  – NUMA is here
  – MPP is here

Need to design in, not bolt on load balance!

The World is Different

• The gizmo revolution is coming
  – mobile clients
  – disconnected operation

Need to design in, not bolt on disconnected operation!

The World is Different

• The gizmo revolution is coming
  – small footprint servers (coke machine as a data base)

Need to scale down as well as up in one system!

The World is Different

• SQL-3 is here
  – components (blades, extenders, OLE, CORBA) in the data base
  – multiple language support required
  – inheritance required

Need to design in, not bolt on method support in a variety of component models!

And….

• DBMSs are currently “bloated”
  – stored procedures
  – object-relational features
  – warehouse features
  – triggers
  – standard benchmark hacks

• Users have a low tolerance for errors

Debugging the next release is getting hard!

Conclusion

• Need to rethink DBMS architecture from the ground up

• This comment also applies to operating systems

• and probably to networks

The Result -- Regres

• A mix of some discarded ideas (whose time has re-come)

• And some new ideas

Assumptions -- Must Design for a Data and Machine Federation

• 7 X 24 operation requires wide area replication
  – understood by the DBMS
  – transactionally consistent
  – fastest mechanism is to move the log

Argues for Federated DBMS!

Assumptions -- Must Design for a Data and Machine Federation

• Integrating code and data on multiple machines is a better idea than TP monitors
  – data and code on each machine in a network!

Requires a Federated DBMS!

Assumptions -- Must Design for a Data and Machine Federation

• Incredible scalability requires more than the biggest single system

Federated DBMS a good model!

Advantages of a Federated DBMS

• Mimics the enterprise, which is distributed

• Naturally supports mergers

• Allows “jelly bean” hardware components

• Can be incrementally built and extended

Assumptions -- Semantic Heterogeneity a Must!

• No systems to be federated have a common schema
  – salary in US is gross dollars
  – salary in France is net francs with a lunch allowance

Must deal with this!
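One way to "deal with this" is a per-site conversion function that maps each local representation onto a common one. The sketch below is only an illustration: every constant is a made-up placeholder (not a real exchange rate or tax factor), the names are hypothetical, and it assumes the French figure already includes the lunch allowance.

```python
from dataclasses import dataclass

# Every constant below is a made-up placeholder, not a real rate.
FRANCS_PER_DOLLAR = 6.0
FRENCH_NET_TO_GROSS = 1.25
LUNCH_ALLOWANCE_FRANCS = 1_000.0

@dataclass(frozen=True)
class Salary:
    amount: float
    site: str   # "us" or "france"

def to_gross_dollars(s: Salary) -> float:
    """Convert a site-local salary into the federation's common unit."""
    if s.site == "us":
        return s.amount  # already gross dollars
    if s.site == "france":
        # Assumption: the stored figure includes the lunch allowance.
        net_francs = s.amount - LUNCH_ALLOWANCE_FRANCS
        return net_francs * FRENCH_NET_TO_GROSS / FRANCS_PER_DOLLAR
    raise ValueError(f"no conversion registered for site {s.site!r}")

us = to_gross_dollars(Salary(5_000.0, "us"))
fr = to_gross_dollars(Salary(25_000.0, "france"))
```

Only after both values are in the same unit does a comparison or aggregate across the two sites mean anything.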

Assumptions -- Local Autonomy a Must!

• Few systems to be federated are in the same “administrative moat”!

• Must allow local DBAs to control their own destiny!

Traditional Distributed DBMSs (and all commercial systems)

• Do neither of these

• Are a non-starter for a future architecture

Cannot have a traditional query optimizer!

Mariposa (and Cohera) made a good start

• Economic paradigm for federated query processing
  – each query has a budget
  – each site is an independent contractor
  – federator acts like a general contractor, trying to solve query under the budget

Agoric systems are starting to get traction!
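The general-contractor idea can be sketched as a bid selection step. This is a toy under stated assumptions, not Mariposa's or Cohera's actual bidding protocol: the `Bid` fields, the site names, and the "fastest affordable bid wins" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Bid:
    site: str        # hypothetical site name
    price: float     # what the site charges to run the subquery
    seconds: float   # completion time the site promises

def choose_bid(bids: List[Bid], budget: float) -> Optional[Bid]:
    """Pick the fastest bid whose price fits the budget; None if none fits."""
    affordable = [b for b in bids if b.price <= budget]
    if not affordable:
        return None
    # Ties on time are broken in favour of the cheaper bid.
    return min(affordable, key=lambda b: (b.seconds, b.price))

bids = [
    Bid("site_a", price=8.0, seconds=2.0),
    Bid("site_b", price=3.0, seconds=5.0),
    Bid("site_c", price=12.0, seconds=1.0),
]
winner = choose_bid(bids, budget=10.0)
print(winner.site)  # site_a: site_c is faster but over budget
```

Each site prices its own work, so the federator needs no global cost model, which fits the local autonomy requirement stated earlier.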

Mariposa (and Cohera) made a good start

• Flexible heterogeneous replication
  – master-slave or peer-to-peer
  – bounded out-of-date-ness

• Mobile (and disconnected) sites ok
  – out-of-date replica
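Bounded out-of-date-ness can be read as a filter: a query states how stale an answer it tolerates, and only replicas within that bound may serve it. A minimal sketch, assuming each replica records when it last synchronized; the class, field, and site names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    last_sync: float  # time of the last refresh, in seconds

def usable_replicas(replicas, now, max_staleness):
    """Names of replicas whose lag is within the bound the query tolerates."""
    return [r.name for r in replicas if now - r.last_sync <= max_staleness]

replicas = [
    Replica("master", last_sync=1000.0),
    Replica("branch_office", last_sync=940.0),
    Replica("laptop", last_sync=100.0),   # disconnected for a while
]
fresh = usable_replicas(replicas, now=1000.0, max_staleness=60.0)
loose = usable_replicas(replicas, now=1000.0, max_staleness=3600.0)
print(fresh)  # ['master', 'branch_office']
print(loose)  # ['master', 'branch_office', 'laptop']
```

A disconnected site is then just a replica with a large lag: still usable for queries that tolerate it.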

Mariposa (and Cohera) Data Model

• A collection of fragments of a SQL-3 table
  – range partitioning
  – type conversion of data types when federated

• Each “owned” by a local DBA
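Range partitioning with per-fragment ownership can be sketched as a sorted list of key ranges, each tagged with its owning site. The site names and key ranges below are invented for illustration; only `bisect` from the standard library is used for the lookup.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    low: int      # inclusive lower bound of the partitioning key
    high: int     # exclusive upper bound
    site: str     # hypothetical site whose local DBA owns the fragment

# Fragments are sorted by lower bound and do not overlap.
FRAGMENTS = [
    Fragment(0, 1000, "berkeley"),
    Fragment(1000, 5000, "paris"),
    Fragment(5000, 10000, "tokyo"),
]
LOWS = [f.low for f in FRAGMENTS]

def fragment_for(key):
    """Locate the fragment whose range contains key, or None."""
    i = bisect.bisect_right(LOWS, key) - 1
    if i >= 0 and FRAGMENTS[i].low <= key < FRAGMENTS[i].high:
        return FRAGMENTS[i]
    return None

print(fragment_for(999).site)    # berkeley
print(fragment_for(1000).site)   # paris
print(fragment_for(10000))       # None: outside every range
```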

But there is much room for improvement!

• Query decomposition into economic units of work
  – bottom-up
  – top down
  – heuristic decomposition

But there is much room for improvement!

• Change the economic plan midflight if circumstances change
  – how to tell things have changed
  – what to do

But there is much room for improvement!

• Partial answers are often a good idea
  – how to integrate Control ideas into an agoric system
  – can it be done without knowing how much of the answer the user will want?

But there is much room for improvement!

• Future data will be imprecise
  – imagine federating the Michelin and Fodor's restaurant guides

• Query processing must become evidence accumulation
  – built-in, not bolted on
  – model of “likely sites” required

Local DBMS -- Storage Model

• Store segments
  – i.e. the unit of federation

• Also the unit of movement between disk and cache (segmented storage)

• Need “split” and “coalesce” to keep variable length segments reasonably sized

Shades of the Burroughs B5000!
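The “split” and “coalesce” operations can be sketched on segments modeled as plain lists of rows. The thresholds and the single-pass policy are assumptions made for illustration; when to split and coalesce is listed as an open issue, and this sketch does not settle it.

```python
MAX_ROWS = 4   # hypothetical thresholds, chosen only for illustration
MIN_ROWS = 2

def split(segment):
    """Split an oversized segment into two halves."""
    mid = len(segment) // 2
    return [segment[:mid], segment[mid:]]

def rebalance(segments):
    """One pass: split segments above MAX_ROWS, merge undersized neighbours."""
    out = []
    for seg in segments:
        if len(seg) > MAX_ROWS:
            out.extend(split(seg))
        elif out and len(out[-1]) + len(seg) <= MAX_ROWS and (
            len(seg) < MIN_ROWS or len(out[-1]) < MIN_ROWS
        ):
            out[-1] = out[-1] + seg   # coalesce with the previous segment
        else:
            out.append(seg)
    return out

segments = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10, 11]]
result = rebalance(segments)
print(result)  # [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```

A single pass halves an oversized segment only once, so a very large segment would need repeated passes before every piece fits under the limit.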

Storage Model -- Open Issues

• When to coalesce and split segments

• LRU a bad model for eviction

Local DBMS -- System Services

• DBMS provides buffer pool, file system
  – Can provide file system abstraction easily

• Thread management from compiler

• Reliable message delivery from network

• DBMS is only application running on the machine
  – no need for a scheduler

Very thin OS will do…

Local DBMS -- No Knobs

• Current DBMSs are WAY too hard to use

• Not enough talented DBAs to go around

• Tuning typically done by vendor’s SE

Want to have NO tuning knobs!

Only control: go/stop

Not clear how to do this!

Protocol

• Federation components must communicate with an asynchronous (stateless) protocol

• Design challenge for a world where sessions are the norm
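One reading of "stateless" is that every request carries everything the receiving site needs, so no session or cursor survives between calls. The sketch below assumes a hypothetical paging request with `offset` and `limit` fields; the message shape is an illustration, not a Regres wire format.

```python
ROWS = list(range(10))   # stand-in for a query result held at one site

def handle(request):
    """Serve one page; everything needed is in the request itself."""
    offset = request.get("offset", 0)
    limit = request["limit"]
    page = ROWS[offset:offset + limit]
    next_offset = offset + len(page)
    more = next_offset < len(ROWS)
    return {
        "rows": page,
        # The continuation is handed back to the caller, not kept here.
        "next": {"offset": next_offset, "limit": limit} if more else None,
    }

request = {"limit": 4}
collected = []
while request is not None:
    reply = handle(request)
    collected.extend(reply["rows"])
    request = reply["next"]

print(collected)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because the server keeps nothing between calls, any replica of the site could answer the next request.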

Local DBMS -- Attacking Bloat

• Basic Problem -- two data representations
  – the log
  – the data in the data base

• Consistency of these representations on crashes drives a lot of complexity

Idea Number One

• One representation -- no log

• “No overwrite” versioning storage system (like POSTGRES) for undo

• Wide area replication for recovery
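The undo half of this idea can be sketched with an append-only version list per key: an update never overwrites, and a reader simply ignores versions from transactions that did not commit. This is a toy in-memory model with hypothetical names, not the POSTGRES storage format, and it leaves out durability and wide area replication.

```python
class NoOverwriteStore:
    """Toy versioned store: updates append versions, nothing is overwritten."""

    def __init__(self):
        self._versions = {}      # key -> list of (txn_id, value), oldest first
        self._committed = set()  # ids of transactions known to have committed

    def write(self, txn_id, key, value):
        self._versions.setdefault(key, []).append((txn_id, value))

    def commit(self, txn_id):
        self._committed.add(txn_id)

    def read(self, key):
        """Return the newest value written by a committed transaction."""
        for txn_id, value in reversed(self._versions.get(key, [])):
            if txn_id in self._committed:
                return value
        return None

store = NoOverwriteStore()
store.write(1, "salary", 100)
store.commit(1)
store.write(2, "salary", 999)   # transaction 2 never commits
print(store.read("salary"))     # 100
```

No undo log is needed here: the uncommitted version is skipped on read, and the old value was never destroyed.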

Issue

• POSTGRES storage system required 4 writes to commit a transaction
  – too slow to be interesting in OLTP

• Can we design a “no overwrite” storage system with high performance?

Idea Number Two

• Log is the only storage system

• When data is brought into main memory, it is “swizzled” into a high performance format

• and “unswizzled” on cache eviction
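The swizzle and unswizzle cycle can be sketched with an append-only log of serialized records and a cache of in-memory objects. This is a toy under stated assumptions: JSON stands in for the log format, a Python dict for the high performance format, and the class and method names are hypothetical.

```python
import json

class LogBackedCache:
    """Toy model: an append-only log is the only durable storage."""

    def __init__(self):
        self.log = []     # durable side: serialized records, oldest first
        self.cache = {}   # volatile side: key -> in-memory dict ("swizzled")

    def _swizzle(self, key):
        """Rebuild the in-memory form from the newest log record for key."""
        for raw in reversed(self.log):
            rec = json.loads(raw)
            if rec["key"] == key:
                return rec["fields"]
        raise KeyError(key)

    def get(self, key):
        if key not in self.cache:        # cold: pay the swizzling cost
            self.cache[key] = self._swizzle(key)
        return self.cache[key]

    def put(self, key, fields):
        self.cache[key] = dict(fields)

    def evict(self, key):
        """Unswizzle: serialize the cached object back onto the log."""
        fields = self.cache.pop(key)
        self.log.append(json.dumps({"key": key, "fields": fields}))

c = LogBackedCache()
c.put("emp:1", {"name": "Joe", "salary": 100})
c.evict("emp:1")
print(c.get("emp:1"))  # {'name': 'Joe', 'salary': 100}
```

The linear scan in `_swizzle` makes a cold read deliberately expensive in this toy, which mirrors the “cold data” concern raised on the next slide.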

Issue

• Can cache residency be made long enough to justify the overhead?

• Will “cold data” performance be unacceptably bad?

Semantic Heterogeneity

• Lots of approaches
  – code (Mariposa, Cohera)
  – Rules (Mergent)
  – Prolog

• Lots of past work
  – e.g. Multibase

Space well picked over!

Regres Focus

• Regres must be repository-based

• Regres must provide yellow pages for economic model

• Regres must provide “schema discovery” tools

Focus on the repository and building semantic heterogeneity support into it

Summary

• Thin local system; fat Federator

• Lots of interesting design challenges

• Focus of DBMS seminar this semester