distributed dbmspage 5. 1 © 1998 m. tamer Özsu & patrick valduriez outline introduction...

16
Distributed DBMS Page 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database Design Fragmentation Data Location Distributed Query Processing (Briefly) Distributed Transaction Management (Extensive) Building Distributed Database Systems (RAID) Mobile Database Systems Privacy, Trust, and Authentication Peer to Peer Systems

Upload: hubert-poole

Post on 14-Dec-2015

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 1© 1998 M. Tamer Özsu & Patrick Valduriez

Outline Introduction Background Distributed DBMS Architecture Distributed Database Design

Fragmentation Data Location

Distributed Query Processing (Briefly) Distributed Transaction Management

(Extensive) Building Distributed Database Systems

(RAID) Mobile Database Systems Privacy, Trust, and Authentication Peer to Peer Systems

Page 2: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 2© 1998 M. Tamer Özsu & Patrick Valduriez

Useful References W. W. Chu, Optimal File Allocation in

Multiple Computer System, IEEE Transaction on Computers, 885-889, October 1969.

Page 3: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 3© 1998 M. Tamer Özsu & Patrick Valduriez

Allocation Alternatives

Non-replicated partitioned : each fragment resides at only one

site Replicated

fully replicated : each fragment at each site partially replicated : each fragment at some of

the sites Rule of thumb:

If replication is advantageous,

otherwise replication may cause problems

read - only queriesupdate queries

1

Page 4: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 4© 1998 M. Tamer Özsu & Patrick Valduriez

Comparison of Replication Alternatives

Full-replication Partial-replication Partitioning

QUERY PROCESSING

Easy Same Difficulty

Same DifficultyDIRECTORYMANAGEMENT

Easy orNon-existant

CONCURRENCYCONTROL

EasyDifficultModerate

RELIABILITY Very high High Low

REALITYPossible

applicationRealistic

Possibleapplication

Page 5: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 5© 1998 M. Tamer Özsu & Patrick Valduriez

Four categories:

Database information

Application information

Communication network information

Computer system information

Information Requirements

Page 6: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 6© 1998 M. Tamer Özsu & Patrick Valduriez

Fragment Allocation

Problem StatementGiven

F = {F1, F2, …, Fn} fragments

S ={S1, S2, …, Sm} network sites

Q = {q1, q2,…, qq} applications

Find the "optimal" distribution of F to S. Optimality

Minimal cost Communication + storage + processing (read & update) Cost in terms of time (usually)

PerformanceResponse time and/or throughput

Constraints Per site constraints (storage & processing)

Page 7: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 7© 1998 M. Tamer Özsu & Patrick Valduriez

Information Requirements

Database information selectivity of fragments size of a fragment

Application information access types and numbers access localities

Computer system information unit cost of storing data at a site unit cost of processing at a site

Communication network information bandwidth latency communication overhead

Page 8: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 8© 1998 M. Tamer Özsu & Patrick Valduriez

File Allocation (FAP) vs Database Allocation (DAP):

Fragments are not individual files

relationships have to be maintained

Access to databases is more complicated

remote file access model not applicable

relationship between allocation and query processing

Cost of integrity enforcement should be considered

Cost of concurrency control should be considered

Allocation

Page 9: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 9© 1998 M. Tamer Özsu & Patrick Valduriez

Allocation – Information Requirements

Database Information selectivity of fragments size of a fragment

Application Information number of read accesses of a query to a fragment number of update accesses of query to a fragment A matrix indicating which queries updates which fragments A similar matrix for retrievals originating site of each query

Site Information unit cost of storing data at a site unit cost of processing at a site

Network Information communication cost/frame between two sites frame size

Page 10: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 10© 1998 M. Tamer Özsu & Patrick Valduriez

General Form

min(Total Cost)

subject to

response time constraint

storage constraint

processing constraint

Decision Variable

Allocation Model

xij 1 if fragment Fi is stored at site Sj 0 otherwise

Page 11: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 11© 1998 M. Tamer Özsu & Patrick Valduriez

Total Cost

Storage Cost (of fragment Fj at Sk)

Query Processing Cost (for one query)

processing component + transmission component

Allocation Model

(unit storage cost at Sk) (size of Fj) xjk

query processing cost all queries

cost of storing a fragment at a siteall fragmentsall sites

Page 12: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 12© 1998 M. Tamer Özsu & Patrick Valduriez

Query Processing CostProcessing component

access cost + integrity enforcement cost + concurrency control cost

Access cost

Integrity enforcement and concurrency control costs Can be similarly calculated

Allocation Model

(no. of update accesses+ no. of read accesses) all fragmentsall sites

xijlocal processing cost at a site

Page 13: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 13© 1998 M. Tamer Özsu & Patrick Valduriez

Query Processing CostTransmission component

cost of processing updates + cost of processing retrievals

Cost of updates

Retrieval Cost

Allocation Model

update message cost all fragmentsall sites

acknowledgment cost all fragmentsall sites

(minall sitesall fragments cost of retrieval command

cost of sending back the result)

Page 14: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 14© 1998 M. Tamer Özsu & Patrick Valduriez

Constraints Response Time

execution time of query ≤ max. allowable response time for that query

Storage Constraint (for a site)

Processing constraint (for a site)

Allocation Model

storage requirement of a fragment at that site all fragments

storage capacity at that site

processing load of a query at that site all queries

processing capacity of that site

Page 15: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 15© 1998 M. Tamer Özsu & Patrick Valduriez

Solution Methods

FAP is NP-complete

DAP also NP-complete

Heuristics based on

single commodity warehouse location (for FAP)

knapsack problem

branch and bound techniques

network flow

Allocation Model

Page 16: Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database

Distributed DBMS Page 5. 16© 1998 M. Tamer Özsu & Patrick Valduriez

Attempts to reduce the solution space

assume all candidate partitionings known; select the “best” partitioning

ignore replication at first

sliding window on fragments

Allocation Model