distributed database management system (ddbs)
DESCRIPTION
Distributed Database Management System (DDBS). Motivation : Data is used at multiple distributed sites (e.g. Branch offices). Communication between sites is ----- costly ----- potentially unreliable Solution : - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/1.jpg)
Distributed Database Management System (DDBS)
Motivation: Data is used at multiple distributed sites (e.g. Branch offices).
Communication between sites is ----- costly----- potentially unreliable
Solution: ----- Allow sites to store/maintain the data they use
most often/specialize in ----- Sharing with other sites/HQs if combinations of
data necessary
![Page 2: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/2.jpg)
Network Topology
A B
CF
E D
A
CFB
DE
A B
C
DE
F
A
B E
C D F
AB
E F
C
D
Fully connected networkPartially connected network
Tree structured network
Star network
Ring network
Cost/reliability #of hops
![Page 3: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/3.jpg)
Tradeoffs between
1. keeping data in centralized headquarters:• simpler maintenance• simpler consistency enforcement• possibly more efficient if many updates,aggregate
computations
2. or distributed across branch offices:• lower communication cost• reliability• parallelism can be implemented locally
![Page 4: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/4.jpg)
• Interconnectivity of pre-existing DBs• Expandability (don’t need to replace whole system
to grow)• Cost (many small engines on PC’s cheaper than
mainframes) issue:communication costs vs. hardware computation costs.
• Performance (place data near where used)• Availability and reliability
Advantages of DDBS (heterogeneous)
![Page 5: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/5.jpg)
Complicating factors
• Maintaining data consistency (in face of replication and sharing)
• Distributed directory management (who controls mapping of data to sites)
• Security
• Heterogeneous Databases
different database architectures
![Page 6: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/6.jpg)
Distributed Database Design Issues
Options for storing a relation R across multiple sites:
• Replication (maintain copies/replicas of R on multiple sites)
• Fragmentation (Relation store in fragments/ pieces on multiple sites)
• combination of both
![Page 7: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/7.jpg)
R1
R2
Copy of R1 Copy of R1
Copy of R2 Copy of R2
site1 site2 site3 site4
1/3 of R1
1/3 of R1
1/3 of R1
1
2
3 4
5
67
8
9
½OfR2
½OfR2
A/B C/D
(Horizontal)
(Vertical)
FRAGMENTATION
REPLICATION
![Page 8: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/8.jpg)
Replication
• Issues: (whole database replication v.s. no replication)
- what to replicate?(all relations or only frequently user shared data)
- where to replicate? (function of communication costs,usage needs,resources)
- which relations to replicate?
-” primary copy” of relation (simplifies consistency enforcement, but where located?)
Non redundantallocation
![Page 9: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/9.jpg)
Replication (cont)
•Advantages:
–Improved availability (multiple sources for a relation if a site is down)
–Increased parallelism (sites can process (primarily) read- only operations in parallel,minimizing data transfer)
(well suited for read-only, majority read-only data access)
![Page 10: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/10.jpg)
Replication (cont)
• Disadvantages:o problems/overhead for writes/updateso costs of consistency enforcement
- updates propagated to all sites (communication costs)
- costs of synchronization/locking for consistency enforcement on update
greater than in single source models.
Complicates concurrency and recover Replication inefficient in databases with frequent
updates
![Page 11: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/11.jpg)
FRAGMENTATION
• Vertical• Horizontal• mixel
Issues:
- completeness: Every tuple/attribute in some fragment
- reconstruction:easy way of reconstructing full relation
- transparency
![Page 12: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/12.jpg)
-Fragments contain subsets of complete tuples (all attributes at all sites)
How to reconstruct
R=Rs1 Rs2 ……. Rsn
HORIZONTAL FRAGMENTATION
Original relation
A1 A2 ………. An1
1
1
2
2
3
3
3
T1
T2
T3
.
.T60
T61
.
.
Tn
A1 A2 ………. An
A1 A2 ………. AnT1
T2
T3
.
.T60
T61
.
.
Tn
Site 1
Site 2
![Page 13: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/13.jpg)
Horizontal Fragmentation
• Example Usefulness:
- Each branch office maintains complete attribute set of its employees
(salary,benefits,address/phone,departments,projects,etc.)
- Site of Fragment easily determined by a key attribute value -e.g. Branch_office*
![Page 14: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/14.jpg)
VERTICAL FRAGMENTATION
A1 A2 A3 A4
A1 A2 A3 A4
Original Relation (R) t1
t2
tn
RS1
RS2
t1
t2
tn
t1
t2
tn
SITE1 SITE2
How to Reconstruct:
R=Rs1 Rs2 Rsn
TID –Tuple ID Hidden Attribute to
ensure account and simple join reconstruction
RS1.TID=RS2.TID
Join condition
1
2
n
1
2
n
TID TID
![Page 15: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/15.jpg)
Example usefulness:
Salary Office
Benefits Office
Directory (Name|address|phone|fax)
Dependents Management Office
each control their own appropriate
attribute for all corporate branch offices
VERTICAL –Attribute-centered management
(keep all instances of an attribute in one place)
HORIZONTAL – tuple/individual-centered management
(keep all values of a tuple in one place)
VERTICAL FRAGMENTATION
![Page 16: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/16.jpg)
MIXED FRAGMENTATION
usa
Europe
A1 A2 A3
A1 A2 A3
A4 A5
A4 A5
A1 A2 A3 A4 A5
(Salary Attributes)
(Benefit Attributes)
Rs1
Rs2
Rs3
Rs4
R
![Page 17: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/17.jpg)
Partition of Attributes/tuples need not be disjoint
REPLICATION and FRAGMENTATION
A1 A2 A3 A4 A5
A1 A2 A3 A4 A2 A3 A4 A5
Overlap
(replication of attributes)
![Page 18: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/18.jpg)
TRANSPARENCY
Fragmentation Transparency
-User doesn’t need to know mapping between relations and fragmented subrelations
Replication Transparency
-User doesn’t need to know about existence or location of other copies (treat as if single copy of DB)
Location and Naming Transparency
-Use shouldn’t need to know about location and full names of data on the server
Salary(ssn=so(Employee)) Site27, Employee. Fragment3. Replica7
Name Server Proper site, Fragment, replica for this data access
Unique name
![Page 19: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/19.jpg)
Issues1:
Parallel Processing across Fragments
LName(salary>40,000(Employee))
LName(salary>40,000(Emp1)) U LName(salary>40,000(Emp2))
QUERY PROCESSING IN DDMS
=Emp1 U Emp2
2 Fragments
Site 1 Site 2
Execution in Parallel on fragments
and union results together
Horizontal fragmentations
![Page 20: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/20.jpg)
(A B) C
A (B C)50K 0.5K
1K 3K
0.5K
0.5K
Site1 Site2 Site3
50K 1K 3KJoins- symmetric and
associative
Parallel Processing
(xx(A)) (B C)
![Page 21: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/21.jpg)
QUERY PROCESSING IN DDBS
R= Fnames, Cnames, Dnames (Employee Department)
Strategies:1)Ship both relations to the result site and join there
2)Ship employee to 2, join at 2, results to 3
3)Ship Department to 1, join at 1, results to 3
minimize total communication cost of data transfer
1,003,000 bytes transfered
1,002,000 bytes transfered
5,000 bytes transfered
Join Strategies
Site 3
100 records, 2000 bytes
Site 1
10,000 records, 1,000,000 bytes
Site 2
100 records, 3000 bytes
Mg rssn to ssn
![Page 22: Distributed Database Management System (DDBS)](https://reader033.vdocuments.us/reader033/viewer/2022061520/56814aee550346895db7fb8d/html5/thumbnails/22.jpg)
-transaction managers / coordinators
-log managers
Problems:
-failure of site
-failure of link
-loss of messages
if server is down, elect new server what about network partitioning?
RECOVERY IN DDBS
Server’s link
Newly elected Server
Difficult to know which had occurred
Original Server