distributed database management system
Post on 01-Jan-2016
37 Views
Preview:
DESCRIPTION
TRANSCRIPT
Distributed Database Distributed Database Management System Management System Distributed Database Distributed Database Management System Management System
Lecture 8Lecture 8
BZUPAGES.COM 2
SemijoinSemijoinSemijoinSemijoin
• Subset of tuples of R that Subset of tuples of R that participate in join of R with Sparticipate in join of R with S
R S = F R AABB (S)F
BZUPAGES.COM 3
DivisionDivisionDivisionDivision
• The division of R with degree r The division of R with degree r with S with degree s is the set of with S with degree s is the set of (r-s)-tuples t such that (r-s)-tuples t such that
BZUPAGES.COM 4
for all s-tuples in , the tuple tu is in for all s-tuples in , the tuple tu is in R.R.
RS = AA(R)- (R)- AA((AA(R) xS)-R)(R) xS)-R)
for all s-tuples in , the tuple tu is in for all s-tuples in , the tuple tu is in R.R.
BZUPAGES.COM 5
eNoeNo pNopNo pNamepName budgetbudget
E1E1
E2E2E1E1
E4E4
P1P1
P1P1
P3P3
P2P2
BridgeBridge
BridgeBridge
TowerTower
MosqueMosque
11.5m11.5m
11.5m11.5m
10.2m10.2m
9.1m9.1m
pNopNo pNamepName budgetbudget
P1P1
P3P3
BridgeBridge
TowerTower
11.5m11.5m
10.2m10.2m
R
S
The employees who work in all projects more than 10M budget
BZUPAGES.COM 7
• Rather than how to obtain results, Rather than how to obtain results, we say what the result is by we say what the result is by specifying relationship between specifying relationship between datadata
BZUPAGES.COM 8
Tuple Relational CalculusTuple Relational Calculus
• Based on first-order predicate Based on first-order predicate logiclogic
• Expressed asExpressed as
{ t | F(t)} { t | F(t)}
• Where t is a tuple variable and F is Where t is a tuple variable and F is
well-formed formulawell-formed formula
BZUPAGES.COM 9
Find the set of all tuples t such that Find the set of all tuples t such that F(t) is true, where F implies the F(t) is true, where F implies the predicate condition.predicate condition.
BZUPAGES.COM 10
Atomic FormulaAtomic FormulaAtomic FormulaAtomic Formula• Tuple-variable membership expression: Tuple-variable membership expression:
specified as R(t) or R.tspecified as R(t) or R.t• ConditionsConditions
– s[A] Θ t[B]s[A] Θ t[B]– S[A] Θ cS[A] Θ c
BZUPAGES.COM 11
SQLSQLSQLSQL
• Language based on Tuple-Language based on Tuple-oriented Calculsoriented Calculs
BZUPAGES.COM 12
ExampleExampleExampleExample
• Select EMP.eName, DEP.dName Select EMP.eName, DEP.dName from EMP, DEP where EMP.dNo = from EMP, DEP where EMP.dNo = DEP.dNoDEP.dNo
BZUPAGES.COM 13
Domain Relational CalculusDomain Relational CalculusDomain Relational CalculusDomain Relational Calculus
• Domain variable ranges over Domain variable ranges over values in a domain and specifies values in a domain and specifies a tuplea tuple
BZUPAGES.COM 14
• A query in DRCA query in DRCxx11,x,x22,….x,….xnn|F(x|F(x11,….x,….xnn))where F is a wff and Xs are free variableswhere F is a wff and Xs are free variables• Implementation: QBEImplementation: QBE• Query by Example (QBE) is a method of Query by Example (QBE) is a method of
creating database queries using examples creating database queries using examples based on a text string, the name of a based on a text string, the name of a document or a list of documents. The QBE document or a list of documents. The QBE system converts the user input into a formal system converts the user input into a formal database query. This approach allows the database query. This approach allows the user to perform powerful searches without the user to perform powerful searches without the need of having to learn a more formalized need of having to learn a more formalized query mechanism such as Structured Query query mechanism such as Structured Query Language (SQL). Language (SQL).
Interface with Programming LanguagesInterface with Programming LanguagesInterface with Programming LanguagesInterface with Programming Languages
BZUPAGES.COM 16
• Tightly CoupledTightly Coupled–Programming Language and Programming Language and
database languages are mergeddatabase languages are merged
• Loosely CoupledLoosely Coupled–PL is extended with special conceptsPL is extended with special concepts
BZUPAGES.COM 18
A computer network is a system A computer network is a system for communication between two for communication between two or more computersor more computers
BZUPAGES.COM 19
• Computers areComputers are–Interconnected Interconnected –AutonomousAutonomous
• NWing involvesNWing involves–Hardware componentsHardware components–Software componentsSoftware components
BZUPAGES.COM 20
• Computers are called Nodes, sites, Computers are called Nodes, sites, hosts, in general node or host is hosts, in general node or host is meant the hardware and site means meant the hardware and site means hw+swhw+sw
• Other equipment at nodes is also Other equipment at nodes is also possible like Printers, Disks etc.possible like Printers, Disks etc.
• Equipment connected via links and Equipment connected via links and channels, link is a physical thing channels, link is a physical thing where as channel is logical onewhere as channel is logical one
BZUPAGES.COM 22
• Comm links carry data in form of Comm links carry data in form of Digital or analog signalsDigital or analog signals
• Each channel has a certain Each channel has a certain capacity, that is capability of capacity, that is capability of transmitting data over a certain transmitting data over a certain time unittime unit
• This capacity is referred as This capacity is referred as bandwidthbandwidth
BZUPAGES.COM 23
• Data transmitted on analog links Data transmitted on analog links is to be is to be Modulated Modulated which is done which is done by changing three basic by changing three basic properties carrier signalproperties carrier signal
• At the receiving end it has to be At the receiving end it has to be DemodulatedDemodulated
• Modem is the device that Modem is the device that performs this taskperforms this task
BZUPAGES.COM 24
• Multiplexing is the technique that Multiplexing is the technique that allows multiple signals to be allows multiple signals to be transmitted over the same line transmitted over the same line simultaneously simultaneously
• Two types FDM, TDMTwo types FDM, TDM
BZUPAGES.COM 25
Mode of OperationMode of OperationMode of OperationMode of Operation
• Simplex: link operates in only one Simplex: link operates in only one direction, like printersdirection, like printers
• Half Duplex: can transmit in both Half Duplex: can transmit in both directions, but not simultaneously, directions, but not simultaneously, link has to be “turned around”link has to be “turned around”
• Full Duplex: Simultaneously both Full Duplex: Simultaneously both waysways
BZUPAGES.COM 26
Performance of a Communication Performance of a Communication SystemSystem
Performance of a Communication Performance of a Communication SystemSystem
• BandwidthBandwidth
• Mode of OperationMode of Operation
• Software employedSoftware employed–Redundancies within messageRedundancies within message
–Headers and trailers with the Headers and trailers with the messagemessage
BZUPAGES.COM 27
Header Block ErrorBlock ErrorCheckCheck
TextText
•Source Address•Destination Address•Message Number•Packet Number•Acknowledgement•Control Information
BZUPAGES.COM 29
Classification CriteriaClassification CriteriaClassification CriteriaClassification Criteria
• Interconnection Structure Interconnection Structure (Topology)(Topology)
• Transmission ModeTransmission Mode
• Geo. Distribution (Scale)Geo. Distribution (Scale)
Topology based Topology based ClassificationClassification
Topology based Topology based ClassificationClassification
BZUPAGES.COM 32
StarStarStarStar
• All communication via a central All communication via a central nodenode
• Excessive load on the central Excessive load on the central nodenode
• Disadvantage is that if the central Disadvantage is that if the central node fails all network goes downnode fails all network goes down
BZUPAGES.COM 34
Ring NetworkRing NetworkRing NetworkRing Network
• Computers connected with Computers connected with transmission media in the form of transmission media in the form of looploop
• Each station also serves as a Each station also serves as a repeater, it repeats the signal that it repeater, it repeats the signal that it receivesreceives
• Control is generally managed via a Control is generally managed via a TokenToken
BZUPAGES.COM 35
• A token is circulated on the A token is circulated on the around the network, with certain around the network, with certain bit pattern to indicate the network bit pattern to indicate the network is freeis free
• Any site wanting to communicate Any site wanting to communicate grabs the token, sets it to busy grabs the token, sets it to busy and then sends the messageand then sends the message
• When communication is over, the When communication is over, the site again sets token to free BPsite again sets token to free BP
BZUPAGES.COM 36
• To improve the reliability a double To improve the reliability a double loop topology has been proposed loop topology has been proposed that lessens the chance of that lessens the chance of network failure in case of a single network failure in case of a single node breakage node breakage
BZUPAGES.COM 38
Bus TopologyBus TopologyBus TopologyBus Topology
• Common channel used to Common channel used to transmit and receive datatransmit and receive data
• Link control is performed asLink control is performed as–CSMACSMA–CSMA/CDCSMA/CD
• In addition, token can also be In addition, token can also be usedused
BZUPAGES.COM 39
CSMA/CDCSMA/CDCSMA/CDCSMA/CD
• Behave in CSMA, except that node Behave in CSMA, except that node keep listening to bus after even they keep listening to bus after even they have transmittedhave transmitted
• The purpose is to detect if some The purpose is to detect if some collision has occurredcollision has occurred
• Collision occurs when multiple sites Collision occurs when multiple sites try to transmit at the same timetry to transmit at the same time
BZUPAGES.COM 40
• When collision is detected, sites When collision is detected, sites abort the transmissions, wait for abort the transmissions, wait for an arbitrary time and re-transmit an arbitrary time and re-transmit the messagethe message
BZUPAGES.COM 42
Meshed NetworkMeshed NetworkMeshed NetworkMeshed Network
• Every computer connected with Every computer connected with every other. every other.
• Gives maximum reliability, but is Gives maximum reliability, but is not practicable even for not a very not practicable even for not a very large networklarge network
BZUPAGES.COM 44
• Point to Point (unicast)Point to Point (unicast)
• Broadcast (multi-point networks)Broadcast (multi-point networks)
BZUPAGES.COM 45
Point to PointPoint to PointPoint to PointPoint to Point
• One or more links between One or more links between sender and receiversender and receiver
• Link may be direct or via Link may be direct or via intermediate connectionsintermediate connections
• Sender and receiver’s addresses Sender and receiver’s addresses are placed in the header of the are placed in the header of the messagemessage
BZUPAGES.COM 46
• The intermediate nodes check the The intermediate nodes check the destination address in the destination address in the message header, if not for them message header, if not for them transmit to next intermediate nodetransmit to next intermediate node
• Communication medium is Communication medium is generally Coaxial, Twisted Pair or generally Coaxial, Twisted Pair or the Fibre optic cablesthe Fibre optic cables
BZUPAGES.COM 47
Broadcast NetworksBroadcast NetworksBroadcast NetworksBroadcast Networks
• Common channel utilized by all Common channel utilized by all nodesnodes
• Message received by all, Message received by all, ownership checkedownership checked
• Multicasting: message sent to a Multicasting: message sent to a certain subset of nodes in nwcertain subset of nodes in nw
• Generally Radio or Satellite basedGenerally Radio or Satellite based
BZUPAGES.COM 48
• In Satellite based, each site In Satellite based, each site beams transmission to satellitebeams transmission to satellite
• That beams it back at a different That beams it back at a different frequencyfrequency
• Broadcast Networks can also use Broadcast Networks can also use Microwave that can be over Microwave that can be over Satellite or Terrestrial Satellite or Terrestrial
BZUPAGES.COM 50
• Local Area NWLocal Area NW
• Metropolitan Area NWMetropolitan Area NW
• Wide Area NWWide Area NW
• Distinction between them is Distinction between them is blurred, still they existblurred, still they exist
• Major categorization is probably Major categorization is probably Protocols, to be discussed nextProtocols, to be discussed next
BZUPAGES.COM 51
WANsWANsWANsWANs
• Used Inter-City, country or even Used Inter-City, country or even continentalcontinental
• Gives low bandwidth, high latency Gives low bandwidth, high latency due to different switching, due to different switching, equipment and transmission equipment and transmission mediummedium
BZUPAGES.COM 52
• Can be Broadcast and Point to Can be Broadcast and Point to PointPoint
• In Point to PointIn Point to Point–Circuit Switching: generally used in Circuit Switching: generally used in
telephone connections, connection telephone connections, connection between sender and receiver is between sender and receiver is maintained till the end of maintained till the end of communicationcommunication
–Packet SwitchingPacket Switching
BZUPAGES.COM 53
• In Packet switchingIn Packet switching–Message is broken into packets, Message is broken into packets,
each packet transmitted each packet transmitted individually, and may take different individually, and may take different route but to the same destinationroute but to the same destination
–May reach out of order, destination May reach out of order, destination will have to sort them into original will have to sort them into original order order
BZUPAGES.COM 54
Advantages of Packet SwitchingAdvantages of Packet SwitchingAdvantages of Packet SwitchingAdvantages of Packet Switching
• Higher utilization of link, since it is Higher utilization of link, since it is not dedicated for a certain not dedicated for a certain communicationcommunication
• Computer communication is Computer communication is bursty in nature not continuous, bursty in nature not continuous, meanwhile others can use the linkmeanwhile others can use the link
• Message can be sent in parallelMessage can be sent in parallel
BZUPAGES.COM 55
LANsLANsLANsLANs
• Small geographical area (usu. 2 km)
• High bandwidth• Low latency• Technology
– Mainly Ethernet, now 100/1000Mbps
BZUPAGES.COM 56
MANsMANsMANsMANs
• Between LAN and WANBetween LAN and WAN
• Cover city or portionCover city or portion
• Larger LANsLarger LANs
BZUPAGES.COM 59
•Connecting computers is not enough to establish communication
•Requires software systems called protocols
•Set of rules and formats for exchanging data, arranged into layers called protocol suite/ stack.
BZUPAGES.COM 60
• WAN faces max heterogeneity, of WAN faces max heterogeneity, of varying equipment, word length, varying equipment, word length, speed, coding scheme etc.speed, coding scheme etc.
• Needs protocols than othersNeeds protocols than others• Most widely known WAN protocol Most widely known WAN protocol
is based ISO/OSI architecture is based ISO/OSI architecture (International Standards (International Standards Organization, Open Systems inter Organization, Open Systems inter connection connection
BZUPAGES.COM 61
ISO/OSI ArchitectureISO/OSI ArchitectureISO/OSI ArchitectureISO/OSI Architecture
• Network built in seven layersNetwork built in seven layers• Interfaces for passing information Interfaces for passing information
b/w layersb/w layers• Protocols between corresponding Protocols between corresponding
layers at different siteslayers at different sites• Lower three layers form Comm. Lower three layers form Comm.
Subnet, responsible for providing Subnet, responsible for providing reliable physical communication reliable physical communication
BZUPAGES.COM 63
TCP/IP ArchitectureTCP/IP ArchitectureTCP/IP ArchitectureTCP/IP Architecture
• Another popular ArchitectureAnother popular Architecture
• Five layers Five layers
• Standardization is specified by IEEE Standardization is specified by IEEE Committee 802, who has specified Committee 802, who has specified different standards for different different standards for different protocolsprotocols
BZUPAGES.COM 65
Architecture of a systems Architecture of a systems defines its structure, means, the defines its structure, means, the components of the systems, components of the systems, function performed by each function performed by each component and the relationship component and the relationship among componentsamong components
BZUPAGES.COM 66
• Three major architectures of Three major architectures of DDBMS discussedDDBMS discussed–Peer to peerPeer to peer
–Client/ServerClient/Server
–MultdatabaseMultdatabase
• These are idealized architectures, These are idealized architectures, practical installations may warypractical installations may wary
BZUPAGES.COM 67
DBMS Standardization
A conceptual framework whose purpose is to divide standardization work into manageable pieces and to show at a general level how these pieces are related to one another. Approaches
BZUPAGES.COM 68
• Component-based Components of the system are defined together with the interrelationships between components. Good for design and implementation of the system. However it is difficult to determine functionality of system by seeing its individual component
BZUPAGES.COM 69
• Function-based Classes of users are identified together with the functionality that the system will provide for each class. The objectives of the system are clearly identified. But how do you achieve these objectives?
BZUPAGES.COM 70
• Data-based Identify the different types of describing data and specify the functional units that will realize and/or use data according to these views
BZUPAGES.COM 71
Practically Practically Practically Practically
• Every aspect has to be consideredEvery aspect has to be considered• These Classification schemes are These Classification schemes are
Orthogonal Orthogonal
• A committee for the DBMS A committee for the DBMS standardization was established in standardization was established in 1972 by ANSI under SPARC 1972 by ANSI under SPARC (Standards Planning and (Standards Planning and Requirement Committee)Requirement Committee)
BZUPAGES.COM 72
• Published its initial report in 1975 Published its initial report in 1975 and then in 1977and then in 1977
• Its full name being Its full name being “ANSI/X3/SPARC DBMS “ANSI/X3/SPARC DBMS Framework”Framework”
• Mainly based on Data Mainly based on Data organizationorganization
BZUPAGES.COM 73
Reference ModelReference Model Reference ModelReference Model
Internal View
Conceptual View
External View
External View
External View
Internal Schema
Conceptual Schema
External Schema
Users
BZUPAGES.COM 74
Dimensions for DDBS ArchitectureDimensions for DDBS ArchitectureDimensions for DDBS ArchitectureDimensions for DDBS Architecture
Autonomy refers to the distribution of Autonomy refers to the distribution of control not of data. It indicates the control not of data. It indicates the degree to which individual DBMSs can degree to which individual DBMSs can operate independently. Types could operate independently. Types could be Design, Communication and be Design, Communication and Execution Autonomy. Degree of Execution Autonomy. Degree of Autonomy varies in different DDBS Autonomy varies in different DDBS architecturesarchitectures
BZUPAGES.COM 75
Distribution deals with data. Distribution deals with data. Logically, data appears to be Logically, data appears to be placed at a single place but placed at a single place but practically it may be spread at practically it may be spread at physically different locationsphysically different locations
BZUPAGES.COM 76
HeterogeneityHeterogeneity refers the refers the differences in hardware and differences in hardware and software among the individual software among the individual databases. Like different databases. Like different machines, OS, Data Models, machines, OS, Data Models, DBMSs, or query languagesDBMSs, or query languages
BZUPAGES.COM 78
Major DDBS Architectures-IMajor DDBS Architectures-IMajor DDBS Architectures-IMajor DDBS Architectures-I1.1.Client-Server ArchitectureClient-Server Architecture
– The term used in different meanings; The term used in different meanings; generally C and S refer to processes, may be generally C and S refer to processes, may be running at the same machinesrunning at the same machines
– In the context of DDBS both client and server In the context of DDBS both client and server are machines not processes (Fig 4.4) are machines not processes (Fig 4.4)
– Server performs most of the data Server performs most of the data Management;Management;
• Query Processing
• Transaction and Storage Management
– Client, mainly has Client, mainly has • application and user interface • a client module of DBMS• Data and lock management cached their sometimes
BZUPAGES.COM 79
Major DDBS Architectures-I & Major DDBS Architectures-I & IIII
Major DDBS Architectures-I & Major DDBS Architectures-I & IIII–Client passes user queries to server without Client passes user queries to server without
trying to understand or optimize them trying to understand or optimize them –One Server Multiple ClientsOne Server Multiple Clients–Multiple ServersMultiple Servers
• One Server at a time• Multiple Servers (transparently) at a time (A DDBS)
2- Peer to Peer Distributed Systems2- Peer to Peer Distributed Systems• Heterogeneous databases at each site, defining Heterogeneous databases at each site, defining
Local Internal SchemasLocal Internal Schemas• On top of that, Local Conceptual Schema, then On top of that, Local Conceptual Schema, then
the overall view is depicted by Global conceptual the overall view is depicted by Global conceptual schema that supports the External schemas (Fig schema that supports the External schemas (Fig 4.5)4.5)
BZUPAGES.COM 80
Interpreting user commands
and formats results
Checks if user query can be processed
Optimized execution strategy
Global queries to local ones
Coordinates distribution execution of user requests
BZUPAGES.COM 81
Chooses best access path to any data item
Makes sure the consistency of local data even in case of failure
Physically accesses the data as per the commands generated by query optimizer. Interacts with the OS
BZUPAGES.COM 82
A Multidatabase SystemA Multidatabase System• Provides access from multiple, Provides access from multiple,
autonomous heterogeneous, and autonomous heterogeneous, and distributed databases. Two Major distributed databases. Two Major architectures:architectures:
• Global Schema Architecture• Federated Schema Architecture
Major DDBS Architectures-IIIMajor DDBS Architectures-IIIMajor DDBS Architectures-IIIMajor DDBS Architectures-III
BZUPAGES.COM 83
Multidatabase Systems: Architectures ExternalSchema
ExternalSchema
GlobalSchema
ComponentSchema
ComponentSchema• • •
LocalSchema
LocalSchema• • •
Global Schema Architecture
Schema Translation
Schema Integration
FederatedSchema
FederatedSchema
ExportSchema
ComponentSchema
ComponentSchema• • •
LocalSchema
LocalSchema• • •
ExportSchema
ExportSchema
ExternalSchema
ExternalSchema
Federated Database Architecture
ExportSchema
BZUPAGES.COM 84
Global Directory IssuesGlobal Directory IssuesGlobal Directory IssuesGlobal Directory Issues
• A directory is a database that contains A directory is a database that contains data about data (meta-data). Called global data about data (meta-data). Called global directory in case of a DDBS. Three issues;directory in case of a DDBS. Three issues;
• A single large or local for each siteA single large or local for each site• Location; whether to keep at a single site Location; whether to keep at a single site
or distributed. or distributed. • Single copy or replicationSingle copy or replication• All three issues are orthogonal to each All three issues are orthogonal to each
otherother
• That concludes chapter 4, questions?That concludes chapter 4, questions?
BZUPAGES.COM 86
Distributed Database Options• Homogeneous - Same DBMS at each node.
– Autonomous - Independent DBMSs.
– Non-autonomous - Central , coordinating DBMS.
• Heterogeneous - Different DBMSs at different nodes.– Gateways - Simple paths are created to other databases
without the benefits of one logical database.
BZUPAGES.COM 87
Distributed Database OptionsDistributed Database OptionsDistributed Database OptionsDistributed Database Options
–Systems - Supports some or all of the Systems - Supports some or all of the functionality of one logical database.functionality of one logical database.• Full DBMS Functionality - All dist. Db functions.• Partial-Multi-database - Some dist. Db functions.
–Federated - Supports local Federated - Supports local databases for unique data requests.databases for unique data requests.
» Loose Integration - Local dbs have their own schemas.» Tight Integration - Local dbs use common schema.
–Unfederated - Requires all access to Unfederated - Requires all access to go through a central, coordinating go through a central, coordinating module.module.
BZUPAGES.COM 88
Homogeneous, Non-Homogeneous, Non-Autonomous Autonomous DatabaseDatabaseHomogeneous, Non-Homogeneous, Non-Autonomous Autonomous DatabaseDatabase
• Data is distributed across all the nodes.Data is distributed across all the nodes.• Same DBMS at each node.Same DBMS at each node.• All data is managed by the distributed All data is managed by the distributed
DBMS (no exclusively local data.)DBMS (no exclusively local data.)• All access is through one, global All access is through one, global
schema.schema.• The global schema is the union of all the The global schema is the union of all the
local schema.local schema.
BZUPAGES.COM 89
Focus on The Following Focus on The Following Heterogeneous EnvironmentHeterogeneous Environment
Focus on The Following Focus on The Following Heterogeneous EnvironmentHeterogeneous Environment
• Data distributed across all the nodes.Data distributed across all the nodes.• Different DBMSs may be used at Different DBMSs may be used at
each node.each node.• Local access is done using the local Local access is done using the local
DBMS and schema.DBMS and schema.• Remote access is done using the Remote access is done using the
global schema.global schema.
top related