dynamic data grid replication strategy based on internet hierarchy

18
Ajou University, South Korea GCC 2003 Presentation Dynamic Data Grid Dynamic Data Grid Replication Strategy based Replication Strategy based on Internet Hierarchy on Internet Hierarchy Sang Min Park , Jai-Hoon Kim, and Young-Bae Ko Ajou University South Korea

Upload: emil

Post on 02-Feb-2016

55 views

Category:

Documents


3 download

DESCRIPTION

Dynamic Data Grid Replication Strategy based on Internet Hierarchy. Sang Min Park , Jai-Hoon Kim, and Young-Bae Ko Ajou University South Korea. Contents. Introduction to Data Grid Optimizations in Data Grid Novel Replication Strategy based on Internet Hierarchy Simulation - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea

GCC 2003 Presentation

Dynamic Data Grid Replication Dynamic Data Grid Replication Strategy based on Internet HierarchyStrategy based on Internet Hierarchy

Sang Min Park , Jai-Hoon Kim,

and Young-Bae Ko

Ajou University

South Korea

Page 2: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea2

GCC 2003 Presentation

ContentsContents

Introduction to Data Grid

Optimizations in Data Grid

Novel Replication Strategy based on Internet Hierarchy

Simulation

Simulation Results

Conclusions

Page 3: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea3

GCC 2003 Presentation

Introduction to Data GridIntroduction to Data Grid

Data Grid Motivations

Petabyte scale data production

Distributed data storage to store parts of data

Distributed computing resources which process the data

Two Most Important Approaches for Data Grid

Secure, reliable, and efficient data transport protocol

(ex. GridFTP)

Replication (ex. Replica catalog)

Replication

Large size files are partially replicated among sites

Reduce data access time

Application Scheduling, Dynamic replication issues are emerging

Page 4: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea4

GCC 2003 Presentation

Introduction to Data GridIntroduction to Data Grid

Typical Job Execution Scenario

CE SE

Grid site

CE SE

Grid site

SE

Grid site

CE

Grid site

CE SE

Grid site

CE SE

Grid site

Internet andData Grid

CE SE

Grid site

Initial replication

Job Distribution

User

Master Site

Large-sizeStorage

Job Broker

Data Fetch andReplication

Data fromExperiments (HEP,Bioinformatics, etc)

Page 5: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea5

GCC 2003 Presentation

Optimizations in Data Optimizations in Data GridGrid

Reducing the Overall Job Execution TimeScheduling Optimization

Deciding where to allocate the jobConsidering location of replicas and computational capabilities of sites

Short-term OptimizationDeciding from where to fetch replicasConsidering available network bandwidth between sites

Long-term Optimization (Dynamic Replication Strategy)

Shortage of storage in a siteDeciding which file should be remaining as a replicaBetter to replicate popular files because of its future usage

Page 6: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea6

GCC 2003 Presentation

Existing Dynamic Replication Existing Dynamic Replication StrategiesStrategies

Replica Optimization based on Site-level Locality

Replicate the file that is predicted to be used in future from the perspective of a site

Try to reduce the number of fetch

Delete Oldest, Delete LRU Method

Economic Strategy from European Data Grid

Developing OptorSim –Data Grid Optimization Simulator

Using Auction Protocol to trigger Long-term Optimization

Site-level Locality based on File access patterns

Page 7: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea7

GCC 2003 Presentation

Existing Dynamic Replication Existing Dynamic Replication StrategiesStrategiesThe Limitations of the site-level optimization

A Site certainly have limitations of their storage size, which means that the rate of data request locality is also limitedThere should be predictable file access patterns, but we do not know if there will be.

Page 8: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea8

GCC 2003 Presentation

Replication Strategy based Replication Strategy based on Bandwidth Hierarchy on Bandwidth Hierarchy (BHR)(BHR)Network-level Locality

A site is not the only possible source of localityAnother source of locality : Network-level locality

If the replica is located in a close site, not long delay would be taken to fetch this replica

Fast Replica Fast Replica TransmissionTransmission

Slow Replica Slow Replica TransmissionTransmission

Network Region Network Region (e.g., a country)(e.g., a country)

Page 9: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea9

GCC 2003 Presentation

Replication Strategy based on Replication Strategy based on Bandwidth Hierarchy (BHR)Bandwidth Hierarchy (BHR)Bandwidth Hierarchy

Grid Site

Router

Network Region Network Region

NetworkRegion

Contendingnetwork traffic

Data moving path betweensites within region

Data moving path betweensites across regions

Narrowbandwidth

Page 10: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea10

GCC 2003 Presentation

Replication Strategy based Replication Strategy based on Bandwidth Hierarchy on Bandwidth Hierarchy (BHR)(BHR)Maximizing Network-level locality

1. Avoiding Replica Duplication in a region

2. Considering popularity of file request at the region-level

XX XX

A RegionA Region

Receiving New Receiving New ReplicaReplica

a Sitea Site a Sitea Site

No space here!No space here!

We should remove some fileWe should remove some file

Delete this one!Delete this one!

Replica X is Replica X is duplicated here!duplicated here!AA

Page 11: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea11

GCC 2003 Presentation

SimulationSimulation

OptorSimData Grid Dynamic Replication Simulation toolDeveloped as part of European Data Grid ProjectImplemented in Java

Implemented Our own Region-based Optimizer in OptorSim

Page 12: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea12

GCC 2003 Presentation

SimulationSimulationSimulation Environment

Grid Site

Router

Region A Region B

Region D

Master Site

Intra-region link

Inter-region linkLink to master

Region C

Page 13: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea13

GCC 2003 Presentation

SimulationsSimulations

Parameters Values

Number of jobs 1000

Number of job types 50

Number of file accessed per job 15

Size of single file 1 GB

Total size of files 750 GB

Parameters Values

Intra-region bandwidth 1000 Mbps

Inter-region bandwidth 1000 Mbps

Master-router bandwidth 2000 Mbps

Storage space at site 50 GB

General configuration of parametersGeneral configuration of parameters

Bandwidth and Storage SizeBandwidth and Storage Size

Page 14: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea14

GCC 2003 Presentation

Simulation ResultsSimulation Results

33174.9

47962.2 50070.9

0

10000

20000

30000

40000

50000

60000

BHR Delete LRU Delete Oldest

Dynamic Replication Strategy

To

tal

Jo

b E

xe

cu

tio

n T

ime

s (

se

c)

Total Job times of three strategiesTotal Job times of three strategies

Page 15: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea15

GCC 2003 Presentation

Simulation ResultsSimulation Results

0

10000

20000

30000

40000

50000

60000

600 800 1000 1200 1400 1600 1800 2000

Bandwidth of Inter-region link

To

tal

Job

Tim

e (

se

c)

BHRDelete LRUDelete Oldest

0

10000

20000

30000

40000

50000

60000

70000

30 50 70 90 110 130

Size of SE

To

tal

Job

Tim

e (

se

c)

BHRDelete LRUDelete Oldest

Total job time with varying bandwidth and storage size

Page 16: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea16

GCC 2003 Presentation

ConclusionsConclusions

The existing dynamic replication strategies are based only on site-level locality of file requestBHR strategy is based on the network-localityBHR shows quite good performance when hierarchy of bandwidth clearly appears, and size of storage at a site is smallWe extend current site-level replica optimization study to more scalable way

Page 17: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea17

GCC 2003 Presentation

ReferencesReferencesWilliam H. Bell, David G. Cameron, Luigi Capozza, A. Paul Millar, Kurt Stockinger, and Floriano Zini.: Simulation of Dynamic Grid Replication Strategies in OptorSim. In Proc. of the 3rd Int'l. IEEE Workshop on Grid Computing (Grid'2002), Baltimore, USA, November 2002. Springer Verlag, Lecture Notes in Computer Science.

William H. Bell, David G. Cameron, Ruben Carvajal-Schiaffino, A. Paul Millar, Kurt Stockinger, and Floriano Zini.: Evaluation of an Economy-Based File Replication Strategy for a Data Grid. In International Workshop on Agent based Cluster and Grid Computing at CCGrid 2003, Tokyo, Japan, May 2003. IEEE Computer Society Press.

Mark Carman, Floriano Zini, Luciano Serafini, and Kurt Stockinger.: Towards an Economy-Based Optimisation of File Access and Replication on a Data Grid. In International Workshop on Agent based Cluster and Grid Computing at International Symposium on Cluster Computing and the Grid (CCGrid'2002), Berlin, Germany, May 2002. IEEE Computer Society Press.

Ann Chervenak, Ian Foster, Carl Kesselman, Charles Salisbury and Steven Tuecke.: The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. Journal of Network and Computer Applications, 23:187-200, 2001.

EU Data Grid Project: http://www.eu-datagrid.org

Page 18: Dynamic Data Grid Replication Strategy based on Internet Hierarchy

Ajou University, South Korea18

GCC 2003 Presentation

ReferencesReferencesI. Foster, C. Kesselman and S. Tuecke.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, 15(3), 2001.Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger and Kurt Stockinger.: Data Management in an International Data Grid Project. 1st IEEE/ACM International Workshop on Grid Computing (Grid'2000), Bangalore, India, Dec 2000.OptorSim – A Replica Optimizer Simulation: http://edg-wp2.web.cern.ch/edg-wp2/optimization/optorsim.htmlSang-Min Park and Jai-Hoon Kim.: Chameleon: A Resource Scheduler in a Data Grid Environment. 2003 IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'2003), Tokyo, Japan, May 2003. IEEE Computer Society Press.Kavitha Ranganathan and Ian Foster.: Design and Evaluation of Dynamic Replication Strategies for a High Performance Data Grid. International Conference on Computing in High Energy and Nuclear Physics, Beijing, September 2001.Kavitha Ranganathan and Ian Foster.: Identifying Dynamic Replication Strategies for a High Performance Data Grid. International Workshop on Grid Computing, Denver, November 2001.