my slide distributed database management systems


Upload: rushdi-shams

Post on 19-Jan-2015


Page 1: My slide  distributed database management systems

Rushdi Shams, Dept of CSE, KUET

Database Systems

Distributed Database Systems

Version 1.0

Page 2: My slide  distributed database management systems

Introduction

A distributed database system is a database system that is fragmented or replicated across several machines.

These machines are usually located at different geographical locations of an organization.

A fragmentation is made up of subsets of the original database.

Replication refers to keeping a copy of the whole database, or of part of the original database.

Page 3: My slide  distributed database management systems

Idea of Distributed Database Systems

4 sites are connected by a communication network.

Sites 1, 2 and 4 run a single database.

Site 3 has no database. It accesses the other 3 sites for data manipulation.

Page 4: My slide  distributed database management systems

Fragmentation

There are 2 basic types of fragmentation:

1. Horizontal fragmentation

2. Vertical fragmentation

Page 5: My slide  distributed database management systems

Horizontal Fragmentation

A horizontal fragment is a subset of the rows of a single table.

Say we need to manipulate a table that contains information about British people.

We have 3 sites:

The Edinburgh site will have those rows of the table that hold information about Scottish people.

The Cardiff site will have those rows of the table that hold information about Welsh people.

The London site will have those rows of the table that hold information about English people.

The 3 sites work as distributed processors. So, together they represent information about all the British people.

Page 6: My slide  distributed database management systems

Horizontal Fragmentation (continued)

Page 7: My slide  distributed database management systems

Horizontal Fragmentation (continued)

Horizontal fragmentation is done by restricting the table with a WHERE condition in a query language!

In the previous example, you can fragment the table like this:

1. WHERE LOCATION = 'EDINBURGH'

2. WHERE LOCATION = 'CARDIFF'

3. WHERE LOCATION = 'LONDON'

To get the original table back, you just union all the fragments! Easy, huh?
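The WHERE-then-union idea can be tried out in a few lines. This is only a minimal sketch using Python's built-in sqlite3 module on a single in-memory database; the table name people, its columns and the sample rows are assumptions made up for illustration, and a real distributed system would keep each fragment at its own site.

```python
import sqlite3

# Hypothetical table and rows, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Angus", "EDINBURGH"), (2, "Rhys", "CARDIFF"),
     (3, "Emma", "LONDON"), (4, "Isla", "EDINBURGH")],
)

# One horizontal fragment per site: the rows selected by a WHERE condition.
for site in ("EDINBURGH", "CARDIFF", "LONDON"):
    frag = f"people_{site.lower()}"
    conn.execute(f"CREATE TABLE {frag} (id INTEGER PRIMARY KEY, name TEXT, location TEXT)")
    conn.execute(f"INSERT INTO {frag} SELECT * FROM people WHERE location = ?", (site,))

# Reconstruction: the union of all fragments gives the original table back.
rebuilt = conn.execute(
    "SELECT * FROM people_edinburgh "
    "UNION SELECT * FROM people_cardiff "
    "UNION SELECT * FROM people_london "
    "ORDER BY id"
).fetchall()
original = conn.execute("SELECT * FROM people ORDER BY id").fetchall()
print(rebuilt == original)
```

Here rebuilt and original hold the same rows, which is the "just union them" claim in miniature.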

Page 8: My slide  distributed database management systems

Horizontal Fragmentation (continued)

Consider the horizontal fragmentation of relation Proj according to its BUDGET value.

Tuples with BUDGET > 200000 go into Proj1 and the rest go into Proj2.

Proj1 = σ_{budget > 200000}(Proj)

Proj2 = σ_{budget ≤ 200000}(Proj)

Page 9: My slide  distributed database management systems

Vertical Fragmentation

Vertical fragmentation is a method of fragmenting a table by projecting its columns, with the primary key kept in every fragment.

To get the original table back, you just need to join the newly created tables on the primary key!

Again, it's easy, huh?

Page 10: My slide  distributed database management systems

Vertical Fragmentation (continued)

The table Proj is fragmented into 2 tables, Proj1 and Proj2.

Both tables have the primary key, PNO. Keep an eye on it, fellows!

If you join them on the PNO of both tables, what do you get? Answer: the Proj table again!
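The project-then-join idea can be sketched the same way. Again this is a single in-memory sqlite3 database standing in for separate sites; only PNO and BUDGET come from the slides, while the other columns (pname, loc) and all the rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Proj table: pname, loc and every row are invented for this sketch.
conn.execute("CREATE TABLE proj (pno TEXT PRIMARY KEY, pname TEXT, budget INTEGER, loc TEXT)")
conn.executemany(
    "INSERT INTO proj VALUES (?, ?, ?, ?)",
    [("P1", "Instrumentation", 150000, "Montreal"),
     ("P2", "Database", 135000, "New York"),
     ("P3", "CAD/CAM", 250000, "New York")],
)

# Two vertical fragments: each projects some columns and repeats the key pno.
conn.execute("CREATE TABLE proj1 AS SELECT pno, budget FROM proj")
conn.execute("CREATE TABLE proj2 AS SELECT pno, pname, loc FROM proj")

# Reconstruction: join the fragments on the primary key.
rebuilt = conn.execute(
    "SELECT proj1.pno, proj2.pname, proj1.budget, proj2.loc "
    "FROM proj1 JOIN proj2 ON proj1.pno = proj2.pno "
    "ORDER BY proj1.pno"
).fetchall()
original = conn.execute("SELECT * FROM proj ORDER BY pno").fetchall()
print(rebuilt == original)
```

Without pno in both fragments there would be nothing to join on, which is why the key has to be repeated.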

Page 11: My slide  distributed database management systems

Both Fragmentations at a Glance

Page 12: My slide  distributed database management systems

Why Fragmentation

Usage:

Applications work with views rather than entire relations.

Efficiency:

Data is stored close to where it is most frequently used.

Data that is not needed by local applications is not stored.

Page 13: My slide  distributed database management systems

Why Fragmentation (continued)

Parallelism:

A transaction can be divided into several subqueries that operate on fragments.

Security:

Data that is not needed by local applications is not stored, and so is not vulnerable to unauthorized users.

Page 14: My slide  distributed database management systems

Disadvantage of Fragmentation

Performance:

If a query has to fetch data from tables that are on different sites, it requires extra processing time.

Page 15: My slide  distributed database management systems

Correctness of Fragmentation

Well, when I first heard "correctness", I was stunned! Actually it means nothing more than some properties of fragmentation.

So, don't worry about that. It is called CORRECTNESS in database jargon, so don't call it property, a'right?

Page 16: My slide  distributed database management systems

Correctness of Fragmentation (continued)

There are 3 correctness rules:

1. Completeness

2. Reconstruction

3. Disjointness

Page 17: My slide  distributed database management systems

Correctness of Fragmentation (continued)

1. Completeness:

If relation R is fragmented into fragments R1, R2, R3, … Rn, each data item that can be found in R must appear in at least one fragment.

So, why not say it this way: no data item of the original relation R goes missing!

Man, I hate theoretical definitions!

Page 18: My slide  distributed database management systems

Correctness of Fragmentation (continued)

2. Reconstruction:

There must be a relational operation by which we can reconstruct R from the fragments.

We already saw that by unioning (∪) horizontal fragments we get the original R back, and by joining vertical fragments we get R too!

Page 19: My slide  distributed database management systems

Correctness of Fragmentation (continued)

3. Disjointness:

If data item Di appears in fragment Ri, then it should not appear in any other fragment.

The exception is vertical fragmentation, where primary key attributes must be repeated to allow reconstruction.
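The three correctness rules can be checked mechanically for a horizontal fragmentation. The sketch below treats a relation as a Python set of rows and reuses the budget split of Proj from earlier; the rows and the helper names (is_complete, is_reconstructible, is_disjoint) are assumptions made up for illustration, not part of any DBMS.

```python
# Hypothetical Proj rows as (pno, budget) tuples; the values are invented.
proj = {("P1", 150000), ("P2", 135000), ("P3", 250000), ("P4", 310000)}

# Horizontal fragments defined by the budget predicate.
proj1 = {row for row in proj if row[1] > 200000}
proj2 = {row for row in proj if row[1] <= 200000}
fragments = [proj1, proj2]

def is_complete(relation, fragments):
    """Completeness: every row of the relation appears in at least one fragment."""
    return all(any(row in frag for frag in fragments) for row in relation)

def is_reconstructible(relation, fragments):
    """Reconstruction: the union of the fragments equals the original relation."""
    return set().union(*fragments) == relation

def is_disjoint(fragments):
    """Disjointness: no row appears in more than one fragment."""
    return sum(len(frag) for frag in fragments) == len(set().union(*fragments))

print(is_complete(proj, fragments),
      is_reconstructible(proj, fragments),
      is_disjoint(fragments))
```

For vertical fragments the disjointness check would have to ignore the primary key columns, since those are repeated on purpose.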

Page 20: My slide  distributed database management systems

Transparency

You have just distributed one table across 3 sites. The user, when he requires data, should not need to know this!

This hiding of the fragmentation, and of the distribution of the fragments to different sites, is called transparency.

Page 21: My slide  distributed database management systems

Types of Transparency

1. Location transparency

The user should not be aware of the location of the data. This simplifies the user interface and the user programs that are used to query the table.

2. Fragmentation transparency

The user must not need to know that the data have been fragmented, or how they have been fragmented.

3. Replication transparency

Replication is sometimes necessary as it makes processing faster. But the user should not be aware of it.

Page 22: My slide  distributed database management systems

Need of Transparency

A manager wishing to find the total number of employees at the Scottish subsidiary need not be aware that he is querying a remote database.

A manager running a query in London should not need to be aware that, to produce the aggregate salary bill for the company, all three sites (London, Cardiff and Edinburgh) need to be interrogated.

When data need to be updated periodically, the user need not know that three sites are actually being updated.
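The manager examples above can be pictured as a thin layer that knows where the fragments live so that the caller does not have to. This is only a sketch: the three sites are plain in-memory lists, and the names SITES, query_site, total_salary_bill and employee_count, as well as the salaries, are assumptions made up for illustration.

```python
# Hypothetical stand-ins for the three sites; names and salaries are invented.
SITES = {
    "London": [("Emma", 42000), ("Oliver", 39000)],
    "Cardiff": [("Rhys", 35000)],
    "Edinburgh": [("Angus", 37000), ("Isla", 41000)],
}

def query_site(site):
    """Pretend to fetch the employee fragment stored at one (possibly remote) site."""
    return SITES[site]

def total_salary_bill():
    """Company-wide salary bill; the caller never names a site."""
    return sum(salary for site in SITES for _, salary in query_site(site))

def employee_count(subsidiary):
    """Head count at one subsidiary; whether it is remote is hidden from the caller."""
    return len(query_site(subsidiary))

print(total_salary_bill())          # interrogates all three sites behind the scenes
print(employee_count("Edinburgh"))  # looks like a local query to the manager
```

The manager calls total_salary_bill() the same way whether the data sits at one site or three, which is the point of transparency.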

Page 23: My slide  distributed database management systems

Foundation Rule

The foundation rule of distributed database systems states:

"Although the database system is distributed across several sites, it must look like a centralised database system to the user."

Then how do you make this foundation rule true?

Answer: by applying the 3 types of transparency.

Page 24: My slide  distributed database management systems

Advantages of Distributed Database Systems

Reflects organizational structure: database fragments are located in the departments they relate to.

Local autonomy: a department can control the data about itself (as its people are the ones familiar with it).

Improved availability: a fault in one database system will only affect one fragment, instead of the entire database.

Page 25: My slide  distributed database management systems

Advantages of Distributed Database Systems (continued)

Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)

Economics: it costs less to create a network of smaller computers with the power of a single large computer.

Modularity: systems can be modified, added and removed from the distributed database without affecting other modules (systems).

Page 26: My slide  distributed database management systems

Disadvantages of Distributed Database Systems

Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.

Economics: increased complexity and a more extensive infrastructure mean extra labour costs.

Page 27: My slide  distributed database management systems

Disadvantages of Distributed Database Systems (continued)

Security: remote database fragments must be secured, and since they are not centralized the remote sites must be secured as well. The infrastructure must also be secured (e.g. by encrypting the network links between remote sites).

Difficult to maintain integrity: in a distributed database, enforcing integrity over a network may require too many network resources to be feasible.

Inexperience: distributed databases are difficult to work with, and as this is a young field there is not much readily available experience of proper practice.

Page 28: My slide  distributed database management systems

Types of Distributed Database Systems

1. Homogeneous Database Systems

2. Heterogeneous Database Systems

3. Federated Database Systems

Page 29: My slide  distributed database management systems

Homogeneous Distributed Database Systems

Data is distributed across 2 or more systems.

All the systems have to run the same DBMS (e.g. Oracle).

Moreover, the systems should run on the same hardware platform.

And the systems should run the same operating system.

Hmm, pretty weird??

Page 30: My slide  distributed database management systems

Homogeneous Distributed Database Systems (continued)

Page 31: My slide  distributed database management systems

Heterogeneous Distributed Database Systems

Data is distributed across 2 or more systems.

Those systems' hardware and software configurations are diverse.

One site might be running Oracle under Windows NT, another site Informix under UNIX, and yet another site Ingres under Windows NT.

Pretty cool, huh?

Page 32: My slide  distributed database management systems

Heterogeneous Distributed Database Systems (continued)

[Figure: sites running different platforms; surviving labels are UNIX, Informix and Ingres]

Page 33: My slide  distributed database management systems

Federated Distributed Database Systems

Switzerland is a country made up of several federated political units.

These units are autonomous.

National-level decisions are made by combining their individual decisions.

In the same way, a federated database system is made up of a number of relatively independent, autonomous databases.

Page 34: My slide  distributed database management systems

Federated Distributed Database Systems (continued)

Page 35: My slide  distributed database management systems

Centralized DBMS vs Distributed DBMS

The system catalogue of a distributed database has to be more complex. For instance, it has to store details about the location of fragments and replicas.

Concurrency problems are multiplied in distributed systems. The problems of propagating updates to a series of different sites are very involved.
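To make the catalogue point concrete, here is a rough sketch of the extra bookkeeping: for each fragment, which table it belongs to, which predicate defines it, and which sites hold it. The FragmentEntry class, its field names and the sample entries are all assumptions made up for illustration; real distributed DBMS catalogues are far richer.

```python
from dataclasses import dataclass, field

@dataclass
class FragmentEntry:
    """One catalogue row (hypothetical layout): where a fragment and its copies live."""
    table: str                 # the global table this fragment belongs to
    predicate: str             # the WHERE condition that defines the fragment
    home_site: str             # site holding the primary copy
    replica_sites: list = field(default_factory=list)  # sites holding extra copies

# Invented catalogue contents for the British-people example.
catalogue = {
    "people_edinburgh": FragmentEntry("people", "location = 'EDINBURGH'", "Edinburgh"),
    "people_cardiff": FragmentEntry("people", "location = 'CARDIFF'", "Cardiff", ["London"]),
    "people_london": FragmentEntry("people", "location = 'LONDON'", "London"),
}

def sites_holding(fragment):
    """All sites where a fragment can be read: home site first, then replicas."""
    entry = catalogue[fragment]
    return [entry.home_site] + entry.replica_sites

def fragments_of(table):
    """Names of all fragments that together make up a global table."""
    return sorted(name for name, entry in catalogue.items() if entry.table == table)

print(sites_holding("people_cardiff"))
print(fragments_of("people"))
```

A centralized catalogue needs none of the site columns, which is one way to see why the distributed one is more complex.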

Page 36: My slide  distributed database management systems

Centralized DBMS vs Distributed DBMS (continued)

A query optimiser in a true distributed system should be able to utilise information about the structure of the network in deciding how best to satisfy a given query.

To ensure a robust system, the distributed DBMS should not be located solely at one site. Software as well as data need to be distributed.

Page 37: My slide  distributed database management systems

Implementation Phases of Distributed DBMS

1. In the first phase we distribute queries between sites but send updates only to a single site.

2. In the second phase we not only distribute queries, we also distribute transactions between sites.

The latter scenario is clearly the more technically challenging of the two.

Most existing distributed database systems are in phase 1.

Very few organisations seem to have solved all of the problems associated with phase 2 applications.
