Data Grid, Cloud and Vertical RDBMS
Presenter: Dipesh Gautam
2
Introduction
Why Data Grid?
High Level View
Design Considerations
Data Grid Services
Topology
Grids and Cloud
Convergence of Grid and Cloud
Vertical RDBMS
Benefits of column-oriented layout
Overview
3
Data Grid: an architecture or set of services that gives individuals or groups of users the ability to access and transact on large amounts of geographically distributed data.
The data may be replicated throughout the grid, outside the original administrative domain of the data.
The interaction between users and the data is handled and controlled by the data grid middleware.
Introduction
4
Large dataset size
Geographic distribution of users and resources
Computationally intensive analysis
No other architecture exists that allows us to apply technologies in large-scale application domains
Why Data Grid?
5
A High Level View
6
Mechanism Neutrality
◦ Designed to be as independent as possible of low-level mechanisms
◦ Defines interfaces that encapsulate the peculiarities of specific storage systems
Compatibility with Grid Infrastructure
◦ Takes advantage of fundamental Grid infrastructure
◦ Compatible with lower-level Grid mechanisms
Uniformity of Information Infrastructure
◦ The same data model and interface are used to access the grid's metadata
Design Considerations
7
Middleware provides the following services:
◦ Universal namespace
◦ Data transport service
◦ Data access service
◦ Data replication service
◦ Resource management system (RMS)
Data Grid Services
8
A number of systems and networks are connected within a grid
Separate systems within the grid use different file naming conventions
Physical file names alone do not solve the problem of locating the data
The universal namespace provides logical file names
The Storage Resource Broker provides a service to map between logical and physical file names
When a logical file name is requested, all matching physical file names are returned and the end user chooses the appropriate replica
Why Universal namespace?
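The logical-to-physical mapping described on this slide can be sketched as a small in-memory catalog. This is only an illustration under stated assumptions: the class `ReplicaCatalog`, the `lfn://` naming scheme, and the `example.org` hosts are hypothetical, and the code does not reflect the actual Storage Resource Broker interface.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog mapping logical file names to physical replica locations."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_name):
        """Record one physical copy of a logical file (duplicates are ignored)."""
        if physical_name not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_name)

    def lookup(self, logical_name):
        """Return every physical name registered for the logical name."""
        return list(self._replicas.get(logical_name, []))


# Hypothetical names: two sites store the same file under different conventions.
catalog = ReplicaCatalog()
catalog.register("lfn://climate/run42.dat", "gsiftp://site-a.example.org/data/run42.dat")
catalog.register("lfn://climate/run42.dat", "gsiftp://site-b.example.org/archive/0042.bin")

# The lookup returns all matching replicas; the end user picks one.
candidates = catalog.lookup("lfn://climate/run42.dat")
chosen = candidates[0]
```

Here the user simply takes the first candidate; a real client would more likely choose by network distance or load.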
9
Middleware service for data transfer
The atomicity of the requested data transfer ensures a fault-tolerant service
◦ Data transfer is resumed after each interruption until all requested data is received
◦ Many strategies are possible: restarting the entire transmission from the beginning, or resuming from the point of interruption. E.g., GridFTP sends data from the last acknowledged byte without restarting the entire transfer.
Provides a service for low-level access and connection between hosts for file transfer
Provides I/O functions that allow users to see remote files as if they were local to their system
Provides a high-level abstraction of data access and transfer between different systems, hiding the complexity and presenting the data to the user as a unified source
Data Transport Service
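The "resume from the point of interruption" strategy can be illustrated with a minimal sketch. It does not use GridFTP or any real network library: `FlakySource` is a hypothetical in-memory stand-in for a remote file whose connection drops once, and `transfer` is an assumed helper that retries from the last byte it has stored.

```python
class FlakySource:
    """In-memory stand-in for a remote file whose connection drops once."""

    def __init__(self, payload, fail_after):
        self.payload = payload
        self.fail_after = fail_after  # bytes served before the simulated drop
        self.served = 0
        self.failed = False

    def read(self, offset, size):
        """Return up to `size` bytes starting at `offset`."""
        if not self.failed and self.served >= self.fail_after:
            self.failed = True
            raise ConnectionError("simulated interruption")
        chunk = self.payload[offset:offset + size]
        self.served += len(chunk)
        return chunk


def transfer(source, total_size, chunk_size=4, max_retries=3):
    """Copy `total_size` bytes, resuming from the last acknowledged byte."""
    received = bytearray()
    retries = 0
    while len(received) < total_size:
        try:
            chunk = source.read(len(received), chunk_size)
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
            continue  # retry from the same offset; stored bytes are not resent
        if not chunk:
            raise EOFError("source ended before total_size bytes arrived")
        received.extend(chunk)  # the chunk counts as acknowledged once stored
    return bytes(received), retries


data = b"geographically distributed data"
copy, retries = transfer(FlakySource(data, fail_after=8), len(data))
```

The alternative strategy on the slide, restarting from the beginning, would correspond to clearing `received` in the `except` branch, at the cost of resending bytes that had already arrived.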
10
Works with the data transport service to provide security, access control, and management of data transfer within the grid
Provides a security service to authenticate users
Provides an authorization service to control access, ranging from simple file permissions to Access Control Lists (ACLs) and Role-Based Access Control (RBAC)
Provides an encryption service to protect the confidentiality of data in transit (e.g., SSL)
Data access service
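As a rough illustration of the authorization options named on this slide, the sketch below checks a per-file ACL first and then falls back to role-based permissions. All names (`ROLE_PERMISSIONS`, `is_authorized`, the users and roles) are hypothetical, and authentication and encryption are deliberately left out.

```python
# Hypothetical grid-wide roles and the actions each one permits.
ROLE_PERMISSIONS = {
    "reader": {"read"},
    "curator": {"read", "write"},
}


def is_authorized(user, action, acl, user_roles):
    """Allow the action if the file's ACL grants it, or one of the user's roles does."""
    if action in acl.get(user, set()):
        return True
    return any(
        action in ROLE_PERMISSIONS.get(role, set())
        for role in user_roles.get(user, ())
    )


# ACL attached to one file, and role assignments for the grid's users.
file_acl = {"alice": {"read", "write"}}
roles = {"bob": ["reader"]}

alice_can_write = is_authorized("alice", "write", file_acl, roles)
bob_can_read = is_authorized("bob", "read", file_acl, roles)
bob_can_write = is_authorized("bob", "write", file_acl, roles)
```

Requests from users with neither an ACL entry nor a role are denied, which keeps the default closed.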
11
Why replication?
◦ Scalability
◦ Fast access
◦ User collaboration
Replicas are often placed close to the sites where users need them
Replication is controlled by a replica management system
The replica management system determines the need for replicas based on the requests it receives
Replicas are kept up to date by propagating changes made at one node to all the other nodes in the grid
Data replication service
12
Centralized model: a single master replica updates all others
Decentralized model: all peers update each other
The topology of node placement influences the update strategy
Replica update
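The two update models can be contrasted with a short sketch. The `Replica` class and both functions are assumptions made for illustration; conflict handling, ordering, and failures are ignored.

```python
class Replica:
    """One copy of the data, with links to the peers it forwards updates to."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []

    def apply(self, key, value):
        self.data[key] = value


def update_centralized(master, others, key, value):
    """Centralized model: the master applies the write and pushes it to all others."""
    master.apply(key, value)
    for replica in others:
        replica.apply(key, value)


def update_decentralized(origin, key, value, seen=None):
    """Decentralized model: any peer accepts the write and forwards it to its peers."""
    seen = set() if seen is None else seen
    if origin.name in seen:
        return  # this peer already has the update; stop the forwarding loop
    seen.add(origin.name)
    origin.apply(key, value)
    for peer in origin.peers:
        update_decentralized(peer, key, value, seen)


a, b, c = Replica("a"), Replica("b"), Replica("c")

# Centralized: a is the master.
update_centralized(a, [b, c], "version", 1)

# Decentralized: peers form a ring a -> b -> c -> a, and the write starts at b.
a.peers, b.peers, c.peers = [b], [c], [a]
update_decentralized(b, "version", 2)
```

In the decentralized case the path an update takes depends on how the peers are linked, which is one way the topology affects the update strategy.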
13
Static replication
◦ Uses a fixed set of replica nodes, with no dynamic changes to the files being replicated
Dynamic replication
◦ Based on the popularity of data
◦ If the number of requests exceeds the replication threshold, a replica is placed on the server that directly services the client, provided that storage is available
◦ Replicas that have a null access value are deleted dynamically
Adaptive replication
◦ The dynamic threshold is computed from the request arrival rates of clients over a period of time
◦ Replicas with a lower threshold that were not created in the current replication interval can be removed
Fair-share replication
◦ Based on the access load and storage load of candidate servers
◦ The server with the lower access load is selected for replication, because replicating on a server with a higher access load degrades performance for all clients
◦ Among candidate servers with the same access load, the server with the lower storage load is selected
Many more replica placement strategies exist
Replica Placement
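Two of the placement rules on this slide are simple enough to sketch directly: the dynamic-replication trigger and the fair-share choice of server. The `Server` fields, the function names, and the sample loads are hypothetical; real systems measure load in their own units.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    access_load: int     # requests currently being served (assumed metric)
    storage_load: float  # fraction of storage in use, 0.0 to 1.0 (assumed metric)


def should_replicate(request_count, threshold, storage_available):
    """Dynamic replication: replicate once requests exceed the threshold."""
    return request_count > threshold and storage_available


def fair_share_choice(candidates):
    """Fair-share replication: lowest access load, ties broken by storage load."""
    return min(candidates, key=lambda s: (s.access_load, s.storage_load))


servers = [
    Server("s1", access_load=120, storage_load=0.30),
    Server("s2", access_load=40, storage_load=0.80),
    Server("s3", access_load=40, storage_load=0.55),
]

# s2 and s3 tie on access load, so the one with less storage in use is chosen.
target = fair_share_choice(servers)
```

Sorting on the tuple `(access_load, storage_load)` encodes the two-step rule in a single comparison.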
14
Core functionality of the data grid
Manages all actions related to storage resources
Fulfils user and application requests for data resources based on the type of request and policies
Schedules the creation of replicas
Enforces policy and security within the data grid resources, including authentication, authorization, and access
Supports systems with different administrative policies so that they can inter-operate
Enforces system fault tolerance and stability requirements
Resource management system (RMS)
15
Various topologies have been used to address the needs of the scientific community
Four major types of topologies
◦ Federation topology
◦ Monadic topology
◦ Hierarchical topology
◦ Hybrid topology
Topology
16
Allows each institution to retain control over its data
The institution that receives a request from an authorized institution determines whether to send the data to the requesting institution
The federation can be loosely or tightly integrated
Preferred by institutions that wish to share data from already existing systems
Federation Topology
17
All the collected data is fed into a central repository
Central repository responds to all queries for data
No replicas in the topology
This topology is well suited when all access to the data is local or within a single region with high-speed connectivity
Monadic topology
18
Suited for collaboration in which data from a single source is distributed to multiple locations around the world
Hierarchical Topology
19
Any combination of other topologies
Suited for researchers working on projects who want to share their results to further research by making them readily available for collaboration
Hybrid Topology
20
Grid
◦ Grid refers to distributed computing in science and engineering
◦ In grid computing, virtual organizations share computer resources over a network
◦ Scientific research, collaboration
◦ Share local resources
◦ Heterogeneous, real resources
◦ Geographically distributed, locally owned and managed
Cloud
◦ Cloud refers to a computer network in the context of network management
◦ In cloud computing, anybody can access data and compute services over the internet
◦ Web services, business apps
◦ Make huge data centers available
◦ Homogeneous virtualized resources
◦ Geographically distributed, centrally owned and managed
Grids and Clouds
21
Users should consider the interoperability standards among service providers of both grid and cloud
Interoperating clouds look like a grid
Convergence of Grid and Cloud
22
Column-Oriented DBMS
◦ Stores data column-wise instead of row-wise
EmpId  Lastname  Firstname  Salary
1      Smith     Joe        40000
2      Jones     Mary       50000
3      Johnson   Cathy      44000
◦ In a row-oriented DBMS, the values in each row are serialized and stored in memory as:
1, Smith, Joe, 40000;
2, Jones, Mary, 50000;
3, Johnson, Cathy, 44000;
◦ In a column-oriented DBMS, the columns are serialized as:
1, 2, 3;
Smith, Jones, Johnson;
Joe, Mary, Cathy;
40000, 50000, 44000;
Vertical RDBMS
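The two serializations of the employee table can be reproduced with a few lines of Python. This is only a sketch of the layouts shown on the slide: the `serialize` helper is hypothetical, and a real DBMS stores binary pages rather than delimited text.

```python
# The employee table from the slide, one tuple per row.
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def serialize(records):
    """Join each record's values with ', ' and end every record with ';'."""
    return "".join(", ".join(str(v) for v in record) + ";" for record in records)


# Row-oriented: all values of one row are stored together.
row_layout = serialize(rows)

# Column-oriented: transpose first, so all values of one column are stored together.
columns = list(zip(*rows))
column_layout = serialize(columns)
```

The only difference between the layouts is the transpose step, `zip(*rows)`, applied before serializing.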
23
Efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of columns
Efficient at writing a column when new values of that column are supplied for all rows at once
Suited to Online Analytical Processing (OLAP)-like workloads, which involve a smaller number of highly complex queries over all of the data, possibly terabytes in size
Benefits of Column-Oriented layout
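A small sketch shows why aggregates and whole-column writes favor the column layout. It reuses the employee table from the previous slide; counting "values read" is a simplified stand-in for I/O cost, and all variable names are assumptions.

```python
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]
column_names = ("EmpId", "Lastname", "Firstname", "Salary")

# Column layout: one list per column.
columns = {name: [row[i] for row in rows] for i, name in enumerate(column_names)}

# Row layout: summing salaries scans whole rows, so every stored value is read.
values_read_rows = sum(len(row) for row in rows)
total_from_rows = sum(row[3] for row in rows)

# Column layout: the same aggregate reads only the Salary column.
salary = columns["Salary"]
values_read_columns = len(salary)
total_from_columns = sum(salary)

# Writing new values for one column, for all rows at once, replaces a single list.
columns["Salary"] = [s + 1000 for s in salary]
```

With three rows and four columns the row scan reads four times as many values as the column scan, and the gap widens as the table gains columns that the query does not need.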