![Page 1: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/1.jpg)
Optimizing of data access using replication technique
Renata Słota¹, Darin Nikolow¹, Łukasz Skitał², Jacek Kitowski¹˒²
¹ Institute of Computer Science AGH-UST, Cracow
² ACC CYFRONET AGH, Cracow
![Page 2: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/2.jpg)
Agenda
• Motivation of the work: why does today's grid computing need replication?
• Replication basics
• Clusterix Data Management System
  – Architecture, optimization and replication algorithms
• Optimization example
• Replication example
• Summary, conclusions
![Page 3: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/3.jpg)
Site-level vs. Grid-level replication
• Site-level replication
  – Replicas in one site
  – Implementation examples: RAID, HSM
• Grid-level replication
  – Data management systems
  – Replicas spread over many sites
![Page 4: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/4.jpg)
Motivation of the work: why does today's grid computing need replication?
• Data protection and availability
  – Malfunction of one storage system does not affect the data itself; only performance is affected
• Performance
  – Low-level optimization and replication (RAID, HSM) are not sufficient
  – Limited network bandwidth
  – Limited storage performance
![Page 5: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/5.jpg)
Replication scenarios
• Static replication
  – Decision made by the system administrator or user
  – Limited system support: replica selection, replica coherency, replica ordering
• Dynamic replication
  – Decision made by a dedicated grid component, based on the users' current data access pattern
  – Full system support
![Page 6: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/6.jpg)
Replication consequences
• Optimal replica selection algorithm
• Replica creation and removal algorithm
• Cost of replica creation, update and storage
• Replica coherency
![Page 7: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/7.jpg)
Clusterix: National Cluster of Linux Systems
• Project aim: to develop a set of tools and procedures for building a productive Grid environment based on local PC clusters spread across independent supercomputing centers
• Network layer: Pionier (Polish optical network)
![Page 8: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/8.jpg)
Clusterix Data Management System
Architecture
![Page 9: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/9.jpg)
Optimization algorithm
• Selects the optimal storage element for:
  – data access
  – replica creation
• Takes into consideration the current state of the system
• The optimal storage element is the one with the maximal weight W(s,d):

$$W(s,d) = \min\bigl((1-\mathrm{NetLoad}(s))\cdot\mathrm{Bandwidth}(s,d),\ (1-\mathrm{Sload}(s))\cdot\mathrm{Sbandwidth}(s)\bigr)$$

s – storage element
d – destination node
NetLoad(s) – network interface load of s
Bandwidth(s,d) – available bandwidth between s and d
Sload(s) – storage system load
Sbandwidth(s) – storage system bandwidth
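The weight W(s,d) can be illustrated with a minimal Python sketch. It is not the CDMS implementation (the slides name C/C++ and gSOAP for the Optimization module); the `StorageElement` class, its field names, the units and all numbers are assumptions made for illustration, and both load values are assumed to be fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class StorageElement:
    """Hypothetical snapshot of the monitoring data for one storage element."""
    name: str
    net_load: float        # NetLoad(s), assumed to be a fraction in [0, 1]
    bandwidth_to_d: float  # Bandwidth(s, d) in MB/s for a fixed destination d
    s_load: float          # Sload(s), assumed to be a fraction in [0, 1]
    s_bandwidth: float     # Sbandwidth(s) in MB/s


def weight(se: StorageElement) -> float:
    """W(s, d): the smaller of the free network capacity and the free storage capacity."""
    network = (1.0 - se.net_load) * se.bandwidth_to_d
    storage = (1.0 - se.s_load) * se.s_bandwidth
    return min(network, storage)


def select_storage_element(candidates):
    """Return the candidate with the maximal weight W(s, d)."""
    return max(candidates, key=weight)


# Made-up measurements for three storage elements holding the same file.
candidates = [
    StorageElement("SE1", net_load=0.50, bandwidth_to_d=100.0, s_load=0.20, s_bandwidth=80.0),
    StorageElement("SE2", net_load=0.10, bandwidth_to_d=100.0, s_load=0.75, s_bandwidth=80.0),
    StorageElement("SE3", net_load=0.25, bandwidth_to_d=100.0, s_load=0.25, s_bandwidth=80.0),
]
best = select_storage_element(candidates)
print(best.name, weight(best))
```

With these made-up numbers SE2 has the least loaded network interface but a busy storage system, so the minimum in W(s,d) penalizes it and SE3 is selected.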
![Page 10: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/10.jpg)
Automatic replication algorithm
• Takes into consideration the gain from replication G(), the cost of replica creation C(), the cost of replica updates U() and the administrative factor A()
• Replication profit:

$$P(d,R,S,f) = G(d,R,S,f) + C(d,R,f) + U(d,R,S,f) + A(d,f)$$

d – storage element for which the profit is computed
R – set of storage elements containing replicas of f
S – statistical data (history of file usage)
f – considered file
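The profit formula can be sketched in Python as follows. The slides do not define how G, C, U and A are computed, so the sketch takes them as ready-made numbers. Two things are assumptions and not part of the slides: the cost terms are passed as non-positive values (the formula adds them), and a replica is created only when the best profit exceeds a hypothetical `threshold`.

```python
def replication_profit(gain, creation_cost, update_cost, admin_factor):
    """P = G + C + U + A, summed exactly as on the slide.

    Assumption: the cost terms C and U are passed as non-positive numbers,
    so that an expensive replica lowers the profit.
    """
    return gain + creation_cost + update_cost + admin_factor


def choose_replica_target(terms_by_se, threshold=0.0):
    """Return the storage element with the highest profit, or None.

    terms_by_se maps a storage element name d to its (G, C, U, A) tuple.
    The threshold is a hypothetical cut-off, not taken from the slides.
    """
    if not terms_by_se:
        return None
    profits = {d: replication_profit(*terms) for d, terms in terms_by_se.items()}
    best = max(profits, key=profits.get)
    return best if profits[best] > threshold else None


# Made-up values for two storage elements that do not yet hold file f.
terms = {
    "SE1": (4.0, -3.0, -2.0, 0.0),
    "SE4": (8.0, -3.0, -1.0, 0.5),
}
print(choose_replica_target(terms))
```

Here SE1 yields a profit of -1.0 and SE4 a profit of 4.5, so SE4 would be chosen as the location of the new replica.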
![Page 11: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/11.jpg)
Storage-oriented problems
Data-intensive applications for Clusterix:
• Simulation of transonic flow past wing tips
• Visualization of complex multidimensional structures
• Ecosystem modeling and simulation
![Page 12: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/12.jpg)
Optimization example: Node A needs file F stored on SE1, SE2 and SE3
[Diagram: Node A; storage elements SE1, SE2 and SE3 with copies of F; CDMS with Optimizer; JIMS and NMS monitoring components]
![Page 13: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/13.jpg)
Optimization example: Node A sends a request to CDMS
[Diagram: Node A; storage elements SE1, SE2 and SE3 with copies of F; CDMS with Optimizer; JIMS and NMS monitoring components]
![Page 14: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/14.jpg)
Optimization example: CDMS uses the Optimizer to choose the optimal SE
[Diagram: Node A; storage elements SE1, SE2 and SE3 with copies of F; CDMS with Optimizer; JIMS and NMS monitoring components]
![Page 15: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/15.jpg)
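Optimization example: Optimizer is working…
[Diagram: Node A; storage elements SE1, SE2 and SE3 with copies of F; CDMS with Optimizer; JIMS and NMS monitoring components]
The Optimizer computes one weight per storage element:

$$W(s_1,d) = \min\bigl((1-\mathrm{NetLoad}(s_1))\cdot\mathrm{Bandwidth}(s_1,d),\ (1-\mathrm{Sload}(s_1))\cdot\mathrm{Sbandwidth}(s_1)\bigr)$$

$$W(s_2,d) = \min\bigl((1-\mathrm{NetLoad}(s_2))\cdot\mathrm{Bandwidth}(s_2,d),\ (1-\mathrm{Sload}(s_2))\cdot\mathrm{Sbandwidth}(s_2)\bigr)$$

$$W(s_3,d) = \min\bigl((1-\mathrm{NetLoad}(s_3))\cdot\mathrm{Bandwidth}(s_3,d),\ (1-\mathrm{Sload}(s_3))\cdot\mathrm{Sbandwidth}(s_3)\bigr)$$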
![Page 16: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/16.jpg)
Automatic replication example: situation
• 3 clusters
• 4 storage elements, 2 of which contain a replica of file F
• Set of applications running on these clusters and accessing file F
[Diagram: storage elements SE1, SE2, SE3 and SE4, with copies of F]
![Page 17: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/17.jpg)
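Automatic replication example
[Diagram: CDMS with Optimizer, Replication Module and Statistic Module; storage elements SE1, SE2, SE3 and SE4. Status labels: Sleeping…, Working…. Factors evaluated: gain, cost of replication, cost of update, administrative factor]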
![Page 18: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/18.jpg)
Automatic replication example
[Diagram: CDMS with Optimizer, Replication Module and Statistic Module; storage elements SE1, SE2, SE3 and SE4. Status labels: Working…, Sleeping…. Decision: SE2 → SE4 (file F)]
![Page 19: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/19.jpg)
Automatic replication example
[Diagram: CDMS with Optimizer, Replication Module and Statistic Module; storage elements SE1, SE2, SE3 and SE4. Status label: Sleeping…. Three copies of F are shown]
![Page 20: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/20.jpg)
Summary
• Architecture of CDMS with Optimization and Replication modules has been designed
• Replication and optimization algorithms have been specified
• Module interfaces have been specified
Future work
• Integration and tests
![Page 21: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/21.jpg)
Conclusions
• Simulation of replication vs. real system implementation
• Replication should be designed to meet the specific profile of Clusterix applications
• Data availability
• Replication drawbacks
![Page 22: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/22.jpg)
Publications
• Renata Słota, Darin Nikolow, Łukasz Skitał, Jacek Kitowski, "Extended functionality of Virtual Storage System for grid", Cracow Grid Workshop 2004, poster no. 13
• Renata Słota, Darin Nikolow, Łukasz Skitał, Jacek Kitowski, "Application of data replication methods in Clusterix project" (in Polish), Pionier 2004, 19-20 May, Poznań, electronic publication
• Renata Słota, Darin Nikolow, Łukasz Skitał, Jacek Kitowski, "Implementation of replication methods in the Grid Environment", submitted to European Grid Conference
![Page 23: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/23.jpg)
Thank You!
![Page 24: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/24.jpg)
Clusterix Data Management System
Architecture: Replication module
• Responsible for:
  – Automatic replica creation/removal
• Implementation:
  – Java
  – Apache SOAP
• Cooperates with:
  – Optimization module
  – Statistic module
![Page 25: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/25.jpg)
Clusterix Data Management System
Architecture: Optimization module
• Responsible for:
  – storage element selection for a newly created replica
  – optimal replica selection
• Implementation:
  – C/C++
  – gSOAP
• Cooperates with:
  – Network Monitoring System (NMS)
  – Information System: JMX-based Infrastructure Monitoring System (JIMS)
![Page 26: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/26.jpg)
Clusterix Data Management System
Architecture: Information System (JIMS)
Department of Computer Science, AGH University of Science & Technology
Provides the following information for a selected node:
• Available storage capacity
• Total storage capacity
• Network interface load
• Network interface bandwidth
• Storage system load
• Average storage system load
• Maximal measured storage bandwidth
![Page 27: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/27.jpg)
Clusterix Data Management System
Architecture: Network Monitoring System
Poznan Supercomputing and Networking Center
Provides the following information:
• Maximum bandwidth between two network nodes
• Current load between two network nodes
• Node availability
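The slides do not say how the three NMS values are combined, so the following Python sketch shows only one plausible reading: the bandwidth still available between two nodes is estimated as the maximum bandwidth scaled by the unused share of the link, and unavailable nodes contribute nothing. The `LinkReport` class, its field names, the units and the fractional interpretation of the load are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class LinkReport:
    """Hypothetical record of what NMS reports for a pair of nodes."""
    max_bandwidth: float  # maximum bandwidth between the two nodes, MB/s
    current_load: float   # current load, assumed to be a fraction in [0, 1]
    available: bool       # whether the remote node is reachable


def available_bandwidth(report: LinkReport) -> float:
    """Estimate the bandwidth left on a link; 0.0 for unavailable nodes."""
    if not report.available:
        return 0.0
    return report.max_bandwidth * (1.0 - report.current_load)


# Made-up reports for two storage elements as seen from Node A.
reports = {
    ("SE1", "NodeA"): LinkReport(100.0, 0.25, True),
    ("SE2", "NodeA"): LinkReport(1000.0, 0.5, False),
}
for pair, report in reports.items():
    print(pair, available_bandwidth(report))
```

Treating an unavailable node as having zero bandwidth keeps it from ever being ranked above a reachable storage element, whatever its nominal link capacity.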
![Page 28: Optimizing of data access using replication technique](https://reader036.vdocuments.us/reader036/viewer/2022062500/568151be550346895dbff104/html5/thumbnails/28.jpg)
Clusterix Data Management System
Architecture: Statistic Module
Białystok Technical University
Responsible for gathering information about past data usage