Data Grid, Cloud and Vertical RDBMS Presenter: Dipesh Gautam

Upload: hakan

Post on 22-Mar-2016



TRANSCRIPT

Page 1: Data Grid, Cloud and Vertical RDBMS

Data Grid, Cloud and Vertical RDBMS

Presenter: Dipesh Gautam

Page 2: Data Grid, Cloud and Vertical RDBMS

Overview

• Introduction
• Why Data Grid?
• High Level View
• Design Considerations
• Data Grid Services
• Topology
• Grids and Cloud
• Convergence of Grid and Cloud
• Vertical RDBMS
• Benefits of column-oriented layout

Page 3: Data Grid, Cloud and Vertical RDBMS

Introduction

• Data Grid: an architecture or set of services that gives individuals or groups of users the ability to access, modify, and transfer large amounts of geographically distributed data.
• The data may be replicated throughout the grid, outside its original administrative domain.
• Interaction between users and the data is handled and controlled by the data grid middleware.

Page 4: Data Grid, Cloud and Vertical RDBMS

Why Data Grid?

• Large dataset size
• Geographic distribution of users and resources
• Computationally intensive analysis
• No other existing architecture lets us apply these technologies in large-scale application domains

Page 5: Data Grid, Cloud and Vertical RDBMS

A High Level View

Page 6: Data Grid, Cloud and Vertical RDBMS

Design Considerations

• Mechanism Neutrality
  ◦ Designed to be as independent as possible of low-level mechanisms
  ◦ Defines interfaces that encapsulate the peculiarities of specific storage systems
• Compatibility with Grid Infrastructure
  ◦ Takes advantage of fundamental Grid infrastructure
  ◦ Compatible with lower-level Grid mechanisms
• Uniformity of Information Infrastructure
  ◦ The same data model and interface are used to access the grid's metadata

Page 7: Data Grid, Cloud and Vertical RDBMS

Data Grid Services

• Middleware provides the following services:
  ◦ Universal namespace
  ◦ Data transport service
  ◦ Data access service
  ◦ Data replication service
  ◦ Resource management system (RMS)

Page 8: Data Grid, Cloud and Vertical RDBMS

Why Universal namespace?

• Many systems and networks are connected within a grid
• Separate systems within the grid use different file naming conventions
• Physical file names alone do not solve the problem of locating the data
• The universal namespace provides logical file names
• The Storage Resource Broker provides a service that maps between logical and physical file names
• When a logical file name is requested, all matching physical file names are returned and the end user chooses the appropriate replica
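The logical-to-physical mapping described on this slide can be sketched as a small in-memory catalog. This is only an illustration, not the Storage Resource Broker interface: the class name `ReplicaCatalog`, its methods, and the example file names are all hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Hypothetical in-memory map from logical to physical file names."""

    def __init__(self):
        # logical file name -> list of physical file names (one per replica)
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_name):
        """Record one physical copy of a logical file, ignoring duplicates."""
        if physical_name not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_name)

    def lookup(self, logical_name):
        """Return every known physical name; the caller picks a replica."""
        return list(self._replicas.get(logical_name, []))


# Made-up names, used only to show the shape of the mapping.
catalog = ReplicaCatalog()
catalog.register("lfn:/climate/run42.nc", "site-a:/data/run42.nc")
catalog.register("lfn:/climate/run42.nc", "site-b:/archive/0042.nc")
matches = catalog.lookup("lfn:/climate/run42.nc")
```

Here `matches` holds both physical names, and choosing between them is left to the end user, as the slide describes.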

Page 9: Data Grid, Cloud and Vertical RDBMS

Data Transport Service

• Middleware service for data transfer
• Atomicity of the requested data transfer makes the service fault tolerant
  ◦ Data transfer is resumed after each interruption until all requested data is received
  ◦ Many possible strategies:
    – Restarting the entire transmission from the beginning
    – Resuming from the point of interruption, e.g. GridFTP sends data from the last acknowledged byte without restarting the entire transfer
• Provides a service for low-level access and connections between hosts for file transfer
• Provides I/O functions that let users see remote files as if they were local to their system
• Provides a high-level abstraction of data access and transfer between different systems, hiding the complexity and presenting the data to the user as a unified source
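The "resume from the last acknowledged byte" strategy can be illustrated with a toy transfer loop. This is a sketch under stated assumptions and not the GridFTP protocol: `transfer_with_resume`, `FlakyReceiver`, the chunk size, and the retry limit are all invented for the example.

```python
def transfer_with_resume(source, send_chunk, chunk_size=4, max_failures=10):
    """Send `source` in chunks, resuming after the last acknowledged byte."""
    acked = 0      # bytes the receiver has confirmed so far
    failures = 0   # interruptions seen during this transfer
    while acked < len(source):
        chunk = source[acked:acked + chunk_size]
        try:
            send_chunk(acked, chunk)
        except ConnectionError:
            failures += 1
            if failures >= max_failures:
                raise
            continue  # retry from `acked`, not from byte 0
        acked += len(chunk)  # the chunk was acknowledged
    return acked


class FlakyReceiver:
    """Hypothetical receiver that fails on selected calls."""

    def __init__(self, fail_on_calls):
        self.data = bytearray()
        self.calls = 0
        self.fail_on_calls = set(fail_on_calls)

    def write(self, offset, chunk):
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise ConnectionError("simulated interruption")
        if offset != len(self.data):
            raise ValueError("chunk does not continue the received data")
        self.data.extend(chunk)


payload = b"0123456789"
receiver = FlakyReceiver(fail_on_calls={2, 3})
sent = transfer_with_resume(payload, receiver.write)
```

The first strategy on the slide, restarting from the beginning, would correspond to resetting `acked` to zero in the `except` branch, at the cost of resending data that had already arrived.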

Page 10: Data Grid, Cloud and Vertical RDBMS

Data access service

• Works with the data transport service to provide security, access control, and management of data transfers within the grid
• Provides a security service to authenticate users
• Provides an authorization service to control access, ranging from simple file permissions to Access Control Lists (ACLs) and Role-Based Access Control (RBAC)
• Provides an encryption service to protect the confidentiality of data in transit (e.g. SSL)
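As a rough illustration of the authorization step, the check below consults an ACL first and then falls back to role-based grants. The function name, the data layout, and the sample users are assumptions made for this sketch; real grid middleware defines its own policy formats.

```python
def is_allowed(user, action, resource, acl, user_roles, role_grants):
    """Return True if an ACL entry or one of the user's roles grants the action."""
    # Access Control List: resource -> user -> set of allowed actions.
    if action in acl.get(resource, {}).get(user, set()):
        return True
    # Role-based access control: role -> set of (resource, action) grants.
    return any(
        (resource, action) in role_grants.get(role, set())
        for role in user_roles.get(user, set())
    )


# Hypothetical policy data.
acl = {"/grid/run42.nc": {"alice": {"read", "write"}}}
user_roles = {"bob": {"analyst"}}
role_grants = {"analyst": {("/grid/run42.nc", "read")}}

alice_can_write = is_allowed("alice", "write", "/grid/run42.nc", acl, user_roles, role_grants)
bob_can_write = is_allowed("bob", "write", "/grid/run42.nc", acl, user_roles, role_grants)
```

Authentication and encryption are deliberately left out; the sketch assumes `user` has already been authenticated.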

Page 11: Data Grid, Cloud and Vertical RDBMS

Data replication service

• Why replication?
  ◦ Scalability
  ◦ Fast access
  ◦ User collaboration
• Replicas are often placed close to the sites where users need them
• Replication is controlled by a replica management system
• The replica management system determines the need for replicas based on incoming requests
• Replicas are kept up to date by propagating changes made at one node to all other nodes in the grid

Page 12: Data Grid, Cloud and Vertical RDBMS

Replica update

• Centralized model: a single master replica updates all the others
• Decentralized model: all peers update each other
• The topology of node placement influences the update strategy
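The two update models differ mainly in which nodes are allowed to propagate a change. A minimal sketch, assuming a hypothetical `Replica` class and ignoring conflict resolution, ordering, and network failures:

```python
class Replica:
    """Hypothetical replica that pushes its writes to a list of peers."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []  # replicas this node propagates changes to

    def write(self, key, value):
        self.data[key] = value
        for peer in self.peers:
            peer.data[key] = value


def centralized(names):
    """Only the first node (the master) propagates updates."""
    master, *others = [Replica(name) for name in names]
    master.peers = others
    return master, others


def decentralized(names):
    """Every node propagates its updates to every other node."""
    nodes = [Replica(name) for name in names]
    for node in nodes:
        node.peers = [peer for peer in nodes if peer is not node]
    return nodes


master, followers = centralized(["m", "r1", "r2"])
master.write("x", 1)

peers = decentralized(["p1", "p2", "p3"])
peers[2].write("y", 7)
```

In the centralized sketch a write made directly on a follower stays local, which is why writes have to go through the master.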

Page 13: Data Grid, Cloud and Vertical RDBMS

Replica Placement

• Static replication
  ◦ Uses a fixed set of replica nodes, with no dynamic changes to the files being replicated
• Dynamic replication
  ◦ Based on the popularity of the data
  ◦ If the number of requests exceeds the replication threshold, a replica is placed on the server that directly serves the client, provided that storage is available
  ◦ Replicas that have a null access value are deleted dynamically
• Adaptive replication
  ◦ The dynamic threshold is computed from client request arrival rates over a period of time
  ◦ Replicas that fall below the threshold and were not created in the current replication interval can be removed
• Fair-share replication
  ◦ Based on the access load and storage load of candidate servers
  ◦ The server with the lower access load is selected for replication, because placing the replica on a server with a higher access load degrades performance for all clients
  ◦ Among candidate servers with the same access load, the one with the lower storage load is selected
• Many more replica placement strategies exist
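The threshold test of dynamic replication and the server choice of fair-share replication can be sketched in a few lines. The `Server` fields, the function names, and the sample numbers are assumptions for illustration; in particular, checking `free_space` during fair-share selection is borrowed from the "provided that storage is available" condition and is not stated for that strategy on the slide.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    access_load: int   # e.g. requests currently being served
    storage_load: int  # e.g. percent of storage in use
    free_space: int    # bytes available for new replicas


def should_replicate(request_count, threshold):
    """Dynamic replication: replicate once requests exceed the threshold."""
    return request_count > threshold


def pick_fair_share_server(candidates, file_size):
    """Pick the least access-loaded server; break ties by storage load."""
    eligible = [s for s in candidates if s.free_space >= file_size]
    if not eligible:
        return None
    return min(eligible, key=lambda s: (s.access_load, s.storage_load))


servers = [
    Server("a", access_load=5, storage_load=10, free_space=500),
    Server("b", access_load=2, storage_load=80, free_space=500),
    Server("c", access_load=2, storage_load=30, free_space=500),
]
chosen = pick_fair_share_server(servers, file_size=100)
```

Servers `b` and `c` tie on access load, so the lower storage load decides in favour of `c`.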

Page 14: Data Grid, Cloud and Vertical RDBMS

Resource management system (RMS)

• Core functionality of the data grid
• Manages all actions related to storage resources
• Fulfils user and application requests for data resources based on the type of request and policies
• Schedules the creation of replicas
• Enforces policy and security within the data grid resources, including authentication, authorization, and access, and supports systems with different administrative policies so that they can inter-operate
• Enforces system fault tolerance and stability requirements

Page 15: Data Grid, Cloud and Vertical RDBMS

Topology

• Various topologies have been used to address the needs of the scientific community
• Four major types of topologies
  ◦ Federation topology
  ◦ Monadic topology
  ◦ Hierarchical topology
  ◦ Hybrid topology

Page 16: Data Grid, Cloud and Vertical RDBMS

Federation Topology

• Allows each institution to keep control over its data
• An institution that receives a request from an authorized institution decides whether to send the data to the requester
• The federation can be loosely or tightly integrated
• Preferred by institutions that wish to share data from already existing systems

Page 17: Data Grid, Cloud and Vertical RDBMS

Monadic topology

• All collected data is fed into a central repository
• The central repository responds to all queries for data
• There are no replicas in this topology
• Well suited when all access to the data is local or within a single region with high-speed connectivity

Page 18: Data Grid, Cloud and Vertical RDBMS

Hierarchical Topology

• Suited to collaborations in which data from a single source is distributed to multiple locations around the world

Page 19: Data Grid, Cloud and Vertical RDBMS

Hybrid Topology

• Any combination of the other topologies
• Suited to researchers working on projects who want to share their results and make them readily available for collaboration, to further research

Page 20: Data Grid, Cloud and Vertical RDBMS

Grids and Clouds

• Grid
  ◦ Grid refers to distributed computing in science and engineering
  ◦ In grid computing, virtual organizations share computer resources over a network
  ◦ Scientific research, collaboration
  ◦ Shares local resources
  ◦ Heterogeneous, real resources
  ◦ Geographically distributed, locally owned and managed
• Cloud
  – Cloud refers to a computer network in the context of network management
  – In cloud computing, anybody can access data and compute services over the internet
  – Web services, business apps
  – Makes huge data centers available
  – Homogeneous, virtualized resources
  – Geographically distributed, centrally owned and managed

Page 21: Data Grid, Cloud and Vertical RDBMS

Convergence of Grid and Cloud

• Users should consider the interoperability standards among service providers of both grid and cloud
• Interoperating clouds look like a grid

Page 22: Data Grid, Cloud and Vertical RDBMS

Vertical RDBMS

• Column-Oriented DBMS
  ◦ Stores data column-wise instead of row-wise

    EmpId  Lastname  Firstname  Salary
    1      Smith     Joe        40000
    2      Jones     Mary       50000
    3      Johnson   Cathy      44000

  ◦ In a row-oriented DBMS, the values of each row are serialized and stored in memory as:
    1, Smith, Joe, 40000;
    2, Jones, Mary, 50000;
    3, Johnson, Cathy, 44000;
  ◦ In a column-oriented DBMS, the columns are serialized as:
    1, 2, 3;
    Smith, Jones, Johnson;
    Joe, Mary, Cathy;
    40000, 50000, 44000;
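The two layouts on this slide can be reproduced with a short script. It is a toy serialization of the slide's employee table, not how any particular DBMS stores data on disk; the function names are made up for the example.

```python
# The employee table from the slide, one tuple per row.
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def serialize_rows(table):
    """Row-oriented layout: all values of a row are stored together."""
    return "".join(", ".join(str(v) for v in row) + ";" for row in table)


def serialize_columns(table):
    """Column-oriented layout: all values of a column are stored together."""
    columns = zip(*table)  # transpose rows into columns
    return "".join(", ".join(str(v) for v in col) + ";" for col in columns)


row_layout = serialize_rows(rows)
column_layout = serialize_columns(rows)
```

`row_layout` and `column_layout` hold the same values in a different order, which is the whole difference between the two designs.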

Page 23: Data Grid, Cloud and Vertical RDBMS

Benefits of Column-Oriented layout

• Efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of columns
• Efficient for writing a column when new values of that column are supplied for all rows at once
• Suited to Online Analytical Processing (OLAP)-like workloads, which involve a smaller number of highly complex queries over all the data, possibly terabytes in size
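A small sketch of the first two benefits, using the employee table from the previous slide. The dictionary-of-lists layout and the function names are assumptions chosen for illustration; a real column store adds compression, indexing, and on-disk formats that are not modelled here.

```python
# Column store: each column is kept as its own contiguous list.
column_store = {
    "EmpId": [1, 2, 3],
    "Lastname": ["Smith", "Jones", "Johnson"],
    "Firstname": ["Joe", "Mary", "Cathy"],
    "Salary": [40000, 50000, 44000],
}

# Row store: the same table, one tuple per row.
row_store = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def total_salary_columnar(store):
    """Aggregate by reading a single column; other columns are never touched."""
    return sum(store["Salary"])


def total_salary_rowwise(table):
    """Aggregate by visiting every row and picking out one field."""
    return sum(row[3] for row in table)


def replace_column(store, name, new_values):
    """Write new values for one column for all rows at once."""
    if len(new_values) != len(store[name]):
        raise ValueError("new column must have one value per row")
    store[name] = list(new_values)


total_columnar = total_salary_columnar(column_store)
total_rowwise = total_salary_rowwise(row_store)
```

Both functions return the same total, but the columnar version reads three values while the row-wise version has to walk through all twelve fields of the table.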