
Shaun de Witt, STFC; Maciej Brzeźniak, PSNC; Martin Hellmich, CERN

Federating Grid and Cloud Storage in EUDAT

International Symposium on Grids and Clouds 2014, 23-28 March 2014

Agenda
• Introduction
• …
• …
• …
• Test results
• Future work


Introduction
• We present and analyze the results of integrating Grid and Cloud storage
• In EUDAT we used:
– iRODS as the Grid storage federation mechanism
– OpenStack Swift as a scalable object storage solution
• Scope:
– Proof of concept
– Pilot OpenStack Swift installation at PSNC
– Production iRODS servers at PSNC (Poznan) and EPCC (Edinburgh)


EUDAT project introduction
• Pan-European data storage & management infrastructure
• Long-term data preservation:
– Storage safety and availability: replication, integrity control
– Data accessibility: visibility, the ability to reference data over the years


• Partners: data centres & communities

EUDAT challenges:


• Federate heterogeneous data management systems:
– dCache, AFS, DMF, GPFS, SAM-FS
– File systems, HSMs, file servers
– Object storage systems (!)
while ensuring:
– Performance, scalability
– Data safety, durability, HA, fail-over
– Unique access, federation transparency
– Flexibility (rule engine)
• Implement the core services:
– Safe and long-term storage: B2SAFE
– Efficient analysis: B2STAGE
– Easy deposit & sharing: B2SHARE
– Data & metadata exploration: B2FIND

[Figure: various storage systems federated under iRODS, forming the EUDAT CDI domain of registered data]

Grid – Cloud storage integration
• Need to integrate Grids with Cloud/Object storage:
– Grids gain another cost-effective, scalable backend
– Many institutions and initiatives are testing object storage or already using it in production
– Most cloud storage services build on the object storage concept
– Object storage solutions have limited support for federation, which is well addressed in Grids
• In EUDAT we integrated:
– An object storage system: OpenStack Swift
– iRODS servers and federations


Context: the Object Storage concept
• The concept enables building low-cost, scalable, efficient storage:
– Within a data centre
– In DR / distributed configurations
• Reliability through redundancy of components:
– Many cost-efficient storage servers with disk drives (12-60 HDDs/SSDs each)
– Typical (cheap) networking: 1/10 Gbit Ethernet
• Limitations of traditional approaches:
– High investment and maintenance costs
– Vendor lock-in, closed architectures, limited scalability
– Slower adoption of new technologies than in the commodity market

Context: Object Storage importance
• Many institutions and initiatives (data centres, NRENs, companies, R&D projects) are testing object storage or already using it in production, including:
• Open source / private cloud:
– OpenStack Swift
– Ceph / RADOS
– Sheepdog, Scality…
• Commercial:
– Amazon S3, RackSpace Cloud Files…
– MS Azure Object Storage…
• Most promising open-source options: OpenStack Swift & Ceph

Object Storage: Architectures

[Diagram: OpenStack Swift: user apps send uploads and downloads through a load balancer to several proxy nodes, which route requests to a pool of storage nodes.]

[Diagram: Ceph: apps, hosts/VMs, and clients reach RADOS via LibRados, RadosGW, RBD, or CephFS; the cluster consists of metadata servers (MDS.1…MDS.n), monitors (MON.1…MON.n), and object storage daemons (OSD.1…OSD.n).]

Object Storage concepts: no DB lookups!

[Figures: the OpenStack Swift ring (source: The Riak Project) and Ceph’s CRUSH map (source: http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/)]

• No meta-data lookups, no meta-data DB: data placement/location is computed!
• Swift ring: represents the space of all possible computed hash values, divided into equal parts (partitions); partitions are spread across storage nodes (see the sketch below)
• Ceph CRUSH map: a list of storage devices, a failure-domain hierarchy (e.g., device, host, rack, row, room), and rules for traversing that hierarchy when storing data
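To make the computed-placement idea concrete, below is a minimal Python sketch of Swift-style ring placement, written for this text: the object path is hashed, the top bits of the hash select a partition, and a precomputed ring maps each partition to a replica set. The node names, partition count, and round-robin assignment are illustrative assumptions; the real ring also balances by device weight and failure domain.

```python
import hashlib

PART_POWER = 8      # 2**8 = 256 partitions (tiny, for illustration)
NODES = ["storage-1", "storage-2", "storage-3", "storage-4", "storage-5"]
REPLICAS = 3

# Precompute the ring: map every partition to REPLICAS distinct nodes.
# (Round-robin here; a real ring balances by device weight and zone.)
ring = {p: [NODES[(p + r) % len(NODES)] for r in range(REPLICAS)]
        for p in range(2 ** PART_POWER)}

def nodes_for(account, container, obj):
    """Placement is computed from a hash of the path: no metadata DB lookup."""
    path = "/{}/{}/{}".format(account, container, obj).encode()
    digest = hashlib.md5(path).digest()          # 128-bit digest
    partition = int.from_bytes(digest, "big") >> (128 - PART_POWER)
    return ring[partition]

print(nodes_for("AUTH_demo", "photos", "cat.jpg"))
```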


Grid – Cloud storage integration
• Most cloud/object storage solutions expose:
– An S3 interface
– Other native interfaces: Swift for OSS, RADOS for Ceph
• S3 (by Amazon) is the de facto standard in cloud storage (see the client sketch after this list):
– Many petabytes, global systems
– Vendors use it (e.g. Dropbox) or provide it
– Large take-up
• Similar concepts:
– CDMI: Cloud Data Management Interface, a SNIA standard with few implementations: http://www.snia.org/cdmi
– Nimbus.IO: https://nimbus.io
– MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
– RackSpace Cloud Files: www.rackspace.com/cloud/files/
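One practical consequence of S3 being a de facto standard: the same client code can drive a public cloud or a private, S3-enabled Swift installation. A minimal sketch using boto3 follows; the endpoint URL, credentials, and bucket name are made-up assumptions.

```python
import boto3

# Hypothetical S3-compatible endpoint (e.g. a private Swift proxy with
# the S3 middleware enabled) and made-up credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="https://swift-proxy.example.org:8080",
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
)

s3.create_bucket(Bucket="eudat-test")                 # container/bucket
s3.put_object(Bucket="eudat-test", Key="sample.dat",  # upload an object
              Body=b"hello object storage")
body = s3.get_object(Bucket="eudat-test", Key="sample.dat")["Body"]
print(body.read())                                    # download it back
```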


S3 and S3-like interfaces in commercial systems:
• S3 resellers:
– Lots of services
– Including Dropbox
• Services similar in concept to S3:
– Nimbus.IO: https://nimbus.io
– MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
– RackSpace Cloud Files: www.rackspace.com/cloud/files/
• S3 implementations "in the hardware":
– Xyratex
– Amplidata


Why build PRIVATE S3-like storage?
• Features / benefits:
– Reliable storage on top of commodity hardware
– The user stays in control of the data
– Easy scalability: the system can grow by adding resources and redistributing data in a non-disruptive way
• Open-source software solutions and standards are available:
– e.g. OpenStack Swift: native OpenStack API and S3 API (see the sketch after this list)
– Other S3-enabled storage: e.g. RADOS
– CDMI: Cloud Data Management Interface
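For contrast with the S3 sketch above, here is a minimal sketch of Swift's native OpenStack API using plain HTTP, assuming TempAuth-style v1.0 authentication; the endpoint, account, and credentials are invented, and real deployments often authenticate via Keystone instead.

```python
import requests

# Hypothetical Swift proxy with TempAuth-style v1.0 authentication.
AUTH_URL = "https://swift-proxy.example.org:8080/auth/v1.0"

resp = requests.get(AUTH_URL, headers={
    "X-Auth-User": "test:tester",   # assumed account:user
    "X-Auth-Key": "testing",        # assumed key
})
token = resp.headers["X-Auth-Token"]
storage_url = resp.headers["X-Storage-Url"]

# Native object API: PUT creates the container, then the object.
requests.put(storage_url + "/demo",
             headers={"X-Auth-Token": token})
requests.put(storage_url + "/demo/hello.txt",
             headers={"X-Auth-Token": token},
             data=b"hello swift")
```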


Why federate iRODS with S3/OpenStack?
• Some communities already have data stored in OpenStack:
– VPH is building a reliable storage cloud on top of OpenStack Swift within the p-medicine project (together with PSNC)
– These data should be available to EUDAT
• Data staging: Cloud -> EUDAT -> PRACE HPC and back
• Data replication: Cloud -> EUDAT -> other back-end storage
• We could apply the rule engine to data in the cloud and assign PIDs


• We were asked to consider cloud storage (from the EUDAT 1st-year review report)

EUDAT’s iRODS federation: VPH case analysis

[Diagram: an S3/OSS client ingests and accesses data on a storage system through its S3/OSS API; an iRODS server with an S3 driver registers those data into the EUDAT iRODS federation, where PIDs are assigned by EUDAT’s PID service; replication copies the data to other iRODS servers with other storage drivers, and data staging moves them to an HPC system accessed through an iRODS client.]

Our 7.2 project
• Purpose:
– To examine the existing iRODS-S3 driver
– (Possibly) to improve it or provide another one
• Steps / status:
– 1st stage:
• Play with what is there – done for OpenStack/S3 + iRODS
• Examine functionality
• Evaluate scalability – found some issues already
– Follow-up:
• Try to improve the existing S3 driver (functionality, performance)
• Implement a native OpenStack driver?
• Get in touch with the iRODS developers


iRODS-OpenStack tests

TEST SETUP:
• iRODS server:
– Cloud attached as a compound resource
– Disk cache in front of it
• OpenStack Swift:
– 3 proxies, 1 with S3
– 5 storage nodes
• Extensive functionality and performance tests
• Amazon S3:
– Only limited functionality tests
A sketch of driving such a setup from an iRODS client follows this list.
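As a flavour of how such a setup is exercised, here is a hypothetical Python sketch using the python-irodsclient library (an assumption: the original tests predate it and used standard iRODS tooling); host, zone, credentials, and paths are invented.

```python
from irods.session import iRODSSession

# Hypothetical connection details for the test zone.
with iRODSSession(host="irods.example.org", port=1247, user="tester",
                  password="secret", zone="eudatZone") as session:
    # Upload: the object lands on the disk cache resource first; iRODS
    # then synchronises it to the compound (S3/Swift) archive resource.
    session.data_objects.put("sample.dat",
                             "/eudatZone/home/tester/sample.dat")

    # Download: served from the cache when present (hence the speed-up
    # reported below), otherwise staged back from the cloud.
    session.data_objects.get("/eudatZone/home/tester/sample.dat",
                             "sample_copy.dat")
```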


[Diagram: iRODS server(s) talking to OpenStack Swift via the S3/OpenStack APIs and to Amazon S3 via the S3 API.]

iRODS-OpenStack test

TEST RESULTS:
• S3 API vs native OSS API overhead:
– Upload: ~0%
– Download: ~8%
• iRODS overhead:
– Upload: ~19%
– Download from the compound S3 resource: ~0%
– Download when cached: a 230% SPEED-UP (the cache resources are faster than S3)

iRODS-OpenStack test

Conclusions and future plans:
• Conclusions:
– Performance-wise, iRODS does not add much overhead for files <2 GB
– Problems arise for files >2 GB: the iRODS-S3 driver has no support for multipart upload, which prevents iRODS from storing files >2 GB in clouds (see the multipart sketch after this list)
– Some functional limits (e.g. the imv problem)
– Using iRODS to federate S3 clouds at large scale would require improving the existing driver or developing a new one
• Future plans:
– Test the integration with VPH’s cloud using the existing driver
– Ask SAF to support the driver development
– Get in touch with the iRODS developers to assure the sustainability of our work
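To show what the missing multipart support amounts to, below is a hedged sketch of an S3 multipart upload with boto3: the object is sent in parts and assembled server-side, which is how any driver must handle objects above the single-PUT limit. Endpoint, credentials, bucket, and part size are invented for illustration.

```python
import io
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://swift-proxy.example.org:8080",  # hypothetical
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
)

PART_SIZE = 64 * 1024 * 1024                 # 64 MiB parts (S3 minimum: 5 MiB)
source = io.BytesIO(b"x" * (3 * PART_SIZE))  # stand-in for a large file

mpu = s3.create_multipart_upload(Bucket="eudat-test", Key="big.dat")
parts, part_no = [], 1
while True:
    chunk = source.read(PART_SIZE)
    if not chunk:
        break
    resp = s3.upload_part(Bucket="eudat-test", Key="big.dat",
                          PartNumber=part_no, UploadId=mpu["UploadId"],
                          Body=chunk)
    parts.append({"PartNumber": part_no, "ETag": resp["ETag"]})
    part_no += 1

# Assemble the parts into a single object server-side.
s3.complete_multipart_upload(Bucket="eudat-test", Key="big.dat",
                             UploadId=mpu["UploadId"],
                             MultipartUpload={"Parts": parts})
```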

EUDAT’s iRODS federation

Object storage on top of iRODS?

[Diagram: an S3 driver exposes an S3 API to S3/OSS clients, while iRODS clients use the iRODS API; behind both, iRODS servers with different storage drivers handle data ingestion and access across several storage systems.]

Problems:
• Data organisation mapping: file systems vs objects; big files vs fragments
• Identity mapping: S3 keys/accounts vs X.509? (a toy illustration follows this list)
• Out of scope for EUDAT? A lot of work would be needed
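The identity-mapping question can be made concrete with a toy sketch: an S3 front-end to iRODS would need a table translating S3 access keys into Grid identities (e.g. X.509 subject DNs) before authorising requests. Everything below is hypothetical; no such component exists in the setup described here.

```python
# Toy illustration of the S3-to-Grid identity mapping problem: each S3
# access key must resolve to a Grid identity before an S3-fronted iRODS
# gateway could authorise the request.
KEY_TO_DN = {
    "EXAMPLE_KEY_ALICE": "/DC=org/DC=example/CN=Alice",
    "EXAMPLE_KEY_BOB":   "/DC=org/DC=example/CN=Bob",
}

def grid_identity(s3_access_key):
    """Map a (made-up) S3 access key to an X.509 subject DN."""
    try:
        return KEY_TO_DN[s3_access_key]
    except KeyError:
        raise PermissionError("no Grid identity mapped to this S3 key")

print(grid_identity("EXAMPLE_KEY_ALICE"))
```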
