the evolution of distributed dbms 4social and technical changes in the 1980’s u business...

49
The Evolution of Distributed DBMS Social and Technical Changes in the 1980’s Business operations became more decentralized geographically. Competition increased at the global level. Customer demands and market needs favored a decentralized management style. Rapid technological change created low-cost microcomputers. The LANs became the basis for computerized solutions. The large number of applications based on DBMSs and the need to protect investments in centralized DBMS software made the notion of data sharing attractive.

Upload: jody-anderson

Post on 13-Jan-2016

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

The Evolution of Distributed DBMSThe Evolution of Distributed DBMS

Social and Technical Changes in the 1980’s

Business operations became more decentralized geographically.

Competition increased at the global level.

Customer demands and market needs favored a decentralized management style.

Rapid technological change created low-cost microcomputers. The LANs became the basis for computerized solutions.

The large number of applications based on DBMSs and the need to protect investments in centralized DBMS software made the notion of data sharing attractive.

Page 2: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

The Evolution of Distributed DBMSThe Evolution of Distributed DBMS

Two Database Requirements in a Dynamic Business Environment: Quick ad hoc data access became crucial in the quick-

response decision making environment. The decentralization of management structure based on

the decentralization of business units made decentralized multiple-access and multiple-location databases a necessity.

Developments in the 1990’s affecting DBMS The growing acceptance of the Internet and the World

Wide Web as the platform for data access and distribution.

The increased focus on data analysis that led to data mining and data warehousing.

Page 3: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

The Evolution of Distributed DBMSThe Evolution of Distributed DBMS

DDBMS Advantages

Data are located near the “greatest demand” site.

Faster data access

Faster data processing

Growth facilitation

Improved communications

Reduced operating costs

User-friendly interface

Less danger of a single-point failure

Processor independence

DDBMS Disadvantages

Complexity of management and control

Security

Lack of standards

Increased storage requirements

Page 4: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distributed Processingand Distributed DatabaseDistributed Processing

and Distributed Database

Distributed processing shares the database’s logical processing among two or more physically independent sites that are connected through a network. (See Figure 10.1)

Distributed database stores a logically related database over two or more physically independent sites connected via a computer network. (See Figure 10.2)

Page 5: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distributed Processing Environment

Figure 10.1

Page 6: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distributed Database Environment

Figure 10.2

Page 7: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distributed Processingand Distributed DatabaseDistributed Processing

and Distributed Database

Distributed processing does not require a distributed database, but a distributed database requires distributed processing.

Distributed processing may be based on a single database located on a single computer. In order to manage distributed data, copies or parts of the database processing functions must be distributed to all data storage sites.

Both distributed processing and distributed databases require a network to connect all components.

Page 8: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

What Is A Distributed DBMS?What Is A Distributed DBMS?

A distributed database management system (DDBMS) governs the storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites.

Page 9: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

What Is A Distributed DBMS?What Is A Distributed DBMS? Functions of a DDBMS

Application interface

Validation to analyze data requests

Transformation to determine request’s components

Query-optimization to find the best access strategy

Mapping to determine the data location

I/O interface to read or write data

Formatting to prepare the data for presentation

Security to provide data privacy

Backup and recovery

Database administration

Concurrency control

Transaction management

Page 10: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Centralized Database Management System

Figure 10.3

Page 11: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Fully Distributed Database Management System

Figure 10.4

Page 12: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

DDBMS ComponentsDDBMS Components

Computer workstations that form the network system.

Network hardware and software components that reside in each workstation.

Communications media that carry the data from one workstation to another.

Transaction processor (TP) receives and processes the application’s data requests.

Data processor (DP) stores and retrieves data located at the site. Also known as data manager (DM).

Page 13: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distributed Database System Components

Figure 10.5

Page 14: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

DDBMS ComponentsDDBMS Components

DDBMS protocol determines how the DDBMS will:

Interface with the network to transport data and commands between DPs and TPs.

Synchronize all data received from DPs (TP side) and route retrieved data to the appropriate TPs (DP side).

Ensure common database functions in a distributed system -- security, concurrency control, backup, and recovery.

Page 15: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Levels of Data & Process DistributionLevels of Data & Process Distribution

Single-Site Processing, Single-Site Data (SPSD) All processing is done on a single CPU or host

computer. All data are stored on the host computer’s local disk. The DBMS is located on the host computer. The DBMS is accessed by dumb terminals. Typical of most mainframe and minicomputer DBMSs. Typical of the 1st generation of single-user

microcomputer database.

Table 10.1

Page 16: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Nondistributed (Centralized) DBMS

Figure 10.6

Page 17: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Levels of Data & Process DistributionLevels of Data & Process Distribution

Multiple-Site Processing, Single-Site Data (MPSD) Typically, MPSD requires a network file server on which

conventional applications are accessed through a LAN.

A variation of the MPSD approach is known as a client/server architecture.

Figure 10.7

Page 18: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Levels of Data & Process DistributionLevels of Data & Process Distribution

Multiple-Site Processing, Multiple-Site Data (MPMD)

Fully distributed DBMS with support for multiple DPs and TPs at multiple sites.

Homogeneous DDMS integrate only one type of centralized DBMS over the network.

Heterogeneous DDBMS integrate different types of centralized DBMSs over a network. (See Figure 10.8)

Page 19: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Figure 10.8 Heterogeneous Distributed Database Scenario

Page 20: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distributed DB TransparencyDistributed DB Transparency

DDBMS transparency features have the common property of allowing the end users to think that he is the database’s only user.

Distribution transparency

Transaction transparency

Failure transparency

Performance transparency

Heterogeneity transparency

Page 21: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distribution TransparencyDistribution Transparency

Distribution transparency allows us to manage a physically dispersed database as though it were a centralized database.

Three Levels of Distribution Transparency Fragmentation transparency Location transparency Local mapping transparency

Table 10.2

Page 22: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distribution TransparencyDistribution Transparency

Example (Figure 10.9): Employee data (EMPLOYEE) are distributed over three locations: New York, Atlanta, and Miami.Depending on the level of distribution transparency support, three different cases of queries are possible:

Figure 10.9 Fragment Locations

Page 23: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distribution TransparencyDistribution Transparency

Case 1: DB Supports Fragmentation Transparency

SELECT *FROM EMPLOYEEWHERE EMP_DOB < ‘01-JAN-1940’;

Page 24: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distribution TransparencyDistribution Transparency

Case 2: DB Supports Location Transparency

SELECT *FROM E1WHERE EMP_DOB < ‘01-JAN-1940’;

UNION

SELECT *FROM E2WHERE EMP_DOC < ‘01-JAN-1940’;

UNION

SELECT *FROM E3WHERE EMP_DOC < ‘01-JAN-1940’;

Page 25: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distribution TransparencyDistribution Transparency

Case 3: DB Supports Local Mapping Transparency

SELECT *FROM E1 NODE NYWHERE EMP_DOB < ‘01-JAN-1940’;

UNION

SELECT *FROM E2 NODE ATLWHERE EMP_DOC < ‘01-JAN-1940’;

UNION

SELECT *FROM E3 NODE MIAWHERE EMP_DOC < ‘01-JAN-1940’;

Page 26: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distribution TransparencyDistribution Transparency

Distribution transparency is supported by distributed data dictionary (DDD) or a distributed data catalog (DDC).

The DDC contains the description of the entire database as seen by the database administrator.

The database description, known as the distributed global schema, is the common database schema used by local TPs to translate user requests into subqueries.

Page 27: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Performance Transparency andQuery Optimization

Performance Transparency andQuery Optimization

The objective of a query optimization routine is to minimize the total cost associated with the execution of a request. The costs associated with a request are a function of the:

Access time (I/O) cost involved in accessing the physical data stored on disk.

Communication cost associated with the transmission of data among nodes in distributed database systems.

CPU time cost associated with the processing overhead of managing distributed transactions.

Page 28: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Query optimization must provide distribution transparency as well as replica transparency.

Replica transparency refers to the DDBMSs ability to hide the existence of multiple copies of data from the user.

Most of the query optimization algorithms are based on two principles: Selection of the optimum execution order Selection of sites to be accessed to minimize

communication costs

Performance Transparency andQuery Optimization

Performance Transparency andQuery Optimization

Page 29: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Operation Modes of Query Optimization Automatic query optimization means that the DDBMS

finds the most cost-effective access path without user intervention.

Manual query optimization requires that the optimization be selected and scheduled by the end user or programmer.

Timing of Query Optimization Static query optimization takes place at compilation

time. Dynamic query optimization takes place at execution

time.

Performance Transparency andQuery Optimization

Performance Transparency andQuery Optimization

Page 30: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Optimization Techniques by Information Used Statistically based query optimization uses statistical

information about the database. In the dynamic statistical generation mode, the

DDBMS automatically evaluates and updates the statistics after each access.

In the manual statistical generation mode, the statistics must be updated periodically through a user-selected utility.

Rule-based query optimization algorithm is based on a set of user-defined rules to determine the best query access strategy.

Performance Transparency andQuery Optimization

Performance Transparency andQuery Optimization

Page 31: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Distributed Database DesignDistributed Database Design

The design of a distributed database introduces three new issues:

How to partition the database into fragments.

Which fragments to replicate.

Where to locate those fragments and replicas.

Page 32: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Data FragmentationData Fragmentation

Data fragmentation allows us to break a single object into two or more segments or fragments.

Each fragment can be stored at any site over a computer network.

Data fragmentation information is stored in the distributed data catalog (DDC), from which it is accessed by the transaction processor (TP) to process user requests.

Three Types of Fragmentation Strategies: Horizontal fragmentation Vertical fragmentation Mixed fragmentation

Page 33: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

A Sample CUSTOMER Table

Figure 10.16

Page 34: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Data FragmentationData Fragmentation

Horizontal FragmentationDivision of a relation into subsets (fragments) of tuples (rows). Each fragment is stored at a different node, and each fragment has unique rows. Each fragment represents the equivalent of a SELECT statement, with the WHERE clause on a single attribute.

Table 10.3 Horizontal Fragmentation of the CUSTOMER Table By State

Page 35: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Table Fragments In Three Locations

Figure 10.17

Page 36: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Data FragmentationData Fragmentation

Vertical FragmentationDivision of a relation into attribute (column) subsets. Each subset (fragment) is stored at a different node, and each fragment has unique columns -- with the exception of the key column. This is the equivalent of the PROJECT statement.

Table 10.4 Vertical Fragmentation of the CUSTOMER Table

Page 37: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Vertically Fragmented Table Contents

Figure 10.18

Page 38: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Data FragmentationData Fragmentation

Mixed FragmentationCombination of horizontal and vertical strategies. A table may be divided into several horizontal subsets (rows), each one having a subset of the attributes (columns).

Page 39: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Table 10.5 Mixed Fragmentation of the CUSTOMER Table

Page 40: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Figure 10.19

Page 41: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Data ReplicationData Replication

Data replication refers to the storage of data copies at multiple sites served by a computer network. Fragment copies can be stored at several sites to serve

specific information requirements. The existence of fragment copies can enhance data

availability and response time, reducing communication and total query costs.

Figure 10.20

Page 42: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Data ReplicationData Replication

Mutual Consistency Rule Replicated data are subject to the mutual consistency

rule, which requires that all copies of data fragments be identical.

DDBMS must ensure that a database update is performed at all sites where replicas exist.

Data replication imposes additional DDBMS processing overhead.

Page 43: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Data ReplicationData Replication

Replication Conditions A fully replicated database stores multiple copies of all

database fragments at multiple sites. A partially replicated database stores multiple copies of

some database fragments at multiple sites.

Factors for Data Replication Decision Database Size Usage Frequency

Page 44: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Data AllocationData Allocation

Data allocation describes the processing of deciding where to locate data.

Data Allocation Strategies Centralized

The entire database is stored at one site.

PartitionedThe database is divided into several disjoint parts (fragments) and stored at several sites.

ReplicatedCopies of one or more database fragments are stored at several sites.

Page 45: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Data AllocationData Allocation

Data allocation algorithms take into consideration a variety of factors: Performance and data availability goals

Size, number of rows, the number of relations that an entity maintains with other entities.

Types of transactions to be applied to the database, the attributes accessed by each of those transactions.

Page 46: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Client/Server vs. DDBMSClient/Server vs. DDBMS

Client/server architecture refers to the way in which computers interact to form a system.

It features a user of resources or a client and a provider of resources or a server.

The architecture can be used to implement a DBMS in which the client is the transaction processor (TP) and the server is the data processor (DP).

Page 47: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Client/Server ArchitectureClient/Server Architecture

Client/Server Advantages Client/server solutions tend to be less expensive.

Client/server solutions allow the end user to use the microcomputer’s graphical user interface (GUI), thereby improving functionality and simplicity.

There are more people with PC skills than with mainframe skills.

The PC is well established in the workplace.

Numerous data analysis and query tools exist to facilitate interaction with many of the DBMSs.

There are considerable cost advantages to off-loading application development from the mainframe to PCs.

Page 48: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

Client/Server ArchitectureClient/Server Architecture

Client/Server Disadvantages The client/server architecture creates a more complex

environment with different platforms.

An increase in the number of users and processing sites often paves the way for security problems.

The burden of training a wider circle of users and computer personnel increases the cost of maintaining the environment.

Page 49: The Evolution of Distributed DBMS 4Social and Technical Changes in the 1980’s u Business operations became more decentralized geographically. u Competition

C. J. Date’s 12 Commandments for Distributed Database

C. J. Date’s 12 Commandments for Distributed Database

1.Local Site Independence

2.Central Site Independence

3.Failure Independence

4.Location Transparency

5.Fragmentation Transparency

6.Replication Transparency

7.Distributed Query Processing

8.Distributed Transaction Processing

9.Hardware Independence

10.Operating System Independence

11.Network Independence

12.Database Independence