Federating Grid and Cloud Storage in EUDAT

Shaun de Witt, STFC; Maciej Brzeźniak, PSNC; Martin Hellmich, CERN
International Symposium on Grids and Clouds 2014, 23-28 March 2014


TRANSCRIPT

Page 1: Federating Grid and Cloud Storage in EUDAT

Shaun de Witt, STFC; Maciej Brzeźniak, PSNC; Martin Hellmich, CERN

Federating Grid and Cloud Storage in EUDAT

International Symposium on Grids and Clouds 2014,

23-28 March 2014

Page 2: Federating Grid and Cloud Storage in EUDAT

Agenda
• Introduction
• …
• …
• …
• Test results
• Future work

3rd EUDAT Technical meeting in Bologna 7th February 2013

Page 3: Federating Grid and Cloud Storage in EUDAT

Introduction
• We present and analyze the results of Grid and Cloud Storage integration
• In EUDAT we used:
  – iRODS as the Grid Storage federation mechanism
  – OpenStack Swift as a scalable object storage solution
• Scope:
  – Proof of concept
  – Pilot OpenStack Swift installation at PSNC
  – Production iRODS servers at PSNC (Poznan) and EPCC (Edinburgh)


Page 4: Federating Grid and Cloud Storage in EUDAT

EUDAT project introduction
• Pan-European data storage & management infrastructure
• Long-term data preservation:
  • Storage safety, availability: replication, integrity control
  • Data accessibility: visibility, the ability to reference data over the years
• Partners: data centres & communities

Page 5: Federating Grid and Cloud Storage in EUDAT

EUDAT challenges:
• Federate heterogeneous data management systems:
  • dCache, AFS, DMF, GPFS, SAM-FS
  • File systems, HSMs, file servers
  • Object Storage systems (!)
  while ensuring:
  • Performance, scalability
  • Data safety, durability, HA, fail-over
  • Uniform access, federation transparency
  • Flexibility (rule engine)
• Implement the core services:
  • Safe and long-term storage: B2SAFE
  • Efficient analysis: B2STAGE
  • Easy deposit & sharing: B2SHARE
  • Data & metadata exploration: B2FIND

(Picture showing various storage systems federated under iRODS)

Page 6: Federating Grid and Cloud Storage in EUDAT

EUDAT CDI domain of registered data:

Page 7: Federating Grid and Cloud Storage in EUDAT

Grid – Cloud storage integration
• Need to integrate Grids and Cloud/Object storage
  • Grids gain another cost-effective, scalable backend
  • Many institutions and initiatives are testing object storage, or already using it in production
• Most cloud storage uses the object storage concept
• Object storage solutions have limited support for federation, which is well addressed in Grids
• In EUDAT we integrated:
  • an object storage system: OpenStack Swift
  • iRODS servers and federations

Page 8: Federating Grid and Cloud Storage in EUDAT

Context: the Object Storage concept
• The concept enables building low-cost, scalable, efficient storage:
  • Within a data centre
  • In DR / distributed configurations
• Reliability thanks to redundancy of components:
  • Many cost-efficient storage servers with disk drives (12-60 HDDs/SSDs)
  • Typical (cheap) networking: 1/10 Gbit Ethernet
• Limitations of traditional approaches:
  • High investment and maintenance cost
  • Vendor lock-in, closed architecture, limited scalability
  • Slower adoption of new technologies than in the commodity market
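The redundancy bullet is the heart of the object-storage cost model: cheap servers become reliable by spreading copies across failure domains, so losing one zone loses at most one copy of an object. A toy sketch of zone-aware replica placement (illustrative only; not the actual Swift or Ceph placement code, and all server names are made up):

```python
# Toy zone-aware replica placement: each replica of an object goes to a
# different failure zone, so one failed zone costs at most one copy.
# Illustrative sketch only; not the real Swift/Ceph algorithm.
import hashlib

def place_replicas(obj_name, zones, n_replicas=3):
    """zones: dict mapping zone name -> list of storage servers."""
    zone_names = sorted(zones)
    # Deterministic start position computed from the object name alone.
    start = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    picked = []
    for i in range(min(n_replicas, len(zone_names))):
        zone = zone_names[(start + i) % len(zone_names)]
        servers = zones[zone]
        picked.append((zone, servers[start % len(servers)]))
    return picked

zones = {
    "zone1": ["srv1a", "srv1b"],
    "zone2": ["srv2a", "srv2b"],
    "zone3": ["srv3a"],
}
replicas = place_replicas("photos/cat.jpg", zones)
# All replicas land in distinct zones:
assert len({z for z, _ in replicas}) == len(replicas)
```

Real systems add weighting, rebalancing and device health to this picture, but the core idea is the same: placement is computed, and the failure-domain hierarchy constrains where copies may land.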

Page 9: Federating Grid and Cloud Storage in EUDAT

Context: Object Storage importance
• Many institutions and initiatives (DCs, NRENs, companies, R&D projects) are testing object storage, or already using it in production, including:
• Open source / private cloud:
  • OpenStack Swift
  • Ceph / RADOS
  • Sheepdog, Scality…
• Commercial:
  • Amazon S3, RackSpace Cloud Files…
  • MS Azure Object Storage…
• Most promising open source: OpenStack Swift & Ceph

Page 10: Federating Grid and Cloud Storage in EUDAT

Object Storage: Architectures

(Diagram, OpenStack Swift: user apps reach a load balancer in front of several proxy nodes; uploads and downloads flow from the proxy nodes to a pool of storage nodes.)

(Diagram, Ceph: apps, hosts/VMs and clients access RADOS via LibRados, RadosGW, RBD or CephFS; RADOS itself consists of MDS, MON and OSD daemons: MDS.1…MDS.n, MON.1…MON.n, OSD.1…OSD.n.)

Page 11: Federating Grid and Cloud Storage in EUDAT

Object Storage: concepts

OpenStack Swift Ring (source: The Riak Project)
Ceph's CRUSH map (source: http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/)

• No metadata lookups, no metadata DB: data placement/location is computed!
• Swift Ring: represents the space of all possible computed hash values, divided into equivalent parts (partitions); partitions are spread across storage nodes
• Ceph CRUSH map: a list of storage devices, a failure-domain hierarchy (e.g. device, host, rack, row, room) and rules for traversing the hierarchy when storing data
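The "no metadata DB" point is worth a sketch: in a hash ring, the node holding an object is a pure function of the object's name, so no lookup table is consulted on reads or writes. A minimal illustration in Python (a toy model of the idea, not OpenStack Swift's real ring implementation):

```python
# Minimal hash ring: the node holding an object is computed from the
# object's name alone; no lookup in any metadata database.
# Toy model of the concept, not OpenStack Swift's real ring code.
import hashlib
from bisect import bisect

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each physical node gets `vnodes` points on the ring to smooth
        # out the distribution (Swift's partitions play a similar role).
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, obj_name):
        # First ring point clockwise from the object's hash.
        idx = bisect(self.keys, self._hash(obj_name)) % len(self.keys)
        return self.ring[idx][1]

ring = HashRing(["storage1", "storage2", "storage3"])
node = ring.node_for("container/object.dat")   # computed, no DB lookup
assert ring.node_for("container/object.dat") == node  # deterministic
```

Adding or removing a node only remaps the partitions adjacent to its ring points, which is what makes non-disruptive growth possible.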

Page 12: Federating Grid and Cloud Storage in EUDAT

Object Storage concepts: no DB lookups!

(Same diagram sources and bullets as Page 11.)

Page 13: Federating Grid and Cloud Storage in EUDAT

Grid – Cloud storage integration

• Most cloud/object storage solutions expose:
  • An S3 interface
  • Other native interfaces: Swift's native OSS API; Ceph: RADOS
• S3 (by Amazon) is the de facto standard in cloud storage:
  • Many petabytes, global systems
  • Vendors use it (e.g. Dropbox) or provide it
  • Large take-up
• Similar concepts:
  • CDMI: Cloud Data Management Interface, a SNIA standard with few implementations: http://www.snia.org/cdmi
  • Nimbus.IO: https://nimbus.io
  • MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
  • RackSpace Cloud Files: www.rackspace.com/cloud/files/

Page 14: Federating Grid and Cloud Storage in EUDAT

S3 and S3-like in commercial systems:
• S3 re-sellers:
  • Lots of services
  • Including Dropbox
• Services similar to the S3 concept:
  • Nimbus.IO: https://nimbus.io
  • MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
  • RackSpace Cloud Files: www.rackspace.com/cloud/files/
• S3 implementations 'in the hardware':
  • Xyratex
  • Amplidata

Page 15: Federating Grid and Cloud Storage in EUDAT

Why build PRIVATE S3-like storage?
• Features / benefits:
  • Reliable storage on top of commodity hardware
  • The user stays in control of the data
  • Easy scalability; the system can grow
    • Adding resources and redistributing data is possible in a non-disruptive way
• Open-source software solutions and standards are available:
  • e.g. OpenStack Swift: OpenStack native API and S3 API
  • Other S3-enabled storage: e.g. RADOS
  • CDMI: Cloud Data Management Interface

Page 16: Federating Grid and Cloud Storage in EUDAT

Why federate iRODS with S3/OpenStack?
• Some communities have data stored in OpenStack
  • VPH is building a reliable storage cloud on top of OpenStack Swift within the pMedicine project (together with PSNC)
• These data should be available to EUDAT
  • Data staging: Cloud -> EUDAT -> PRACE HPC and back
  • Data replication: Cloud -> EUDAT -> other back-end storage
• We could apply the rule engine to data in the cloud and assign PIDs
• We were asked to consider cloud storage:
  • From the EUDAT 1st year review report:

Page 17: Federating Grid and Cloud Storage in EUDAT

EUDAT's iRODS federation: VPH case analysis

(Diagram: an S3/OSS client ingests data into cloud storage; an iRODS server with an S3 driver reaches it over the S3/OSS API and registers the data, with a PID assigned via EUDAT's PID Service; the data is replicated to another iRODS server with a different storage driver backed by its own storage system, and staged to an HPC system; iRODS clients provide data access throughout.)

Page 18: Federating Grid and Cloud Storage in EUDAT

Our 7.2 project
• Purpose:
  • To examine the existing iRODS-S3 driver
  • (possibly) to improve it / provide another one
• Steps / status:
  • 1st stage:
    • Play with what is there: done for OpenStack/S3 + iRODS
    • Examine functionality
    • Evaluate scalability: found some issues already
  • Follow-up:
    • Try to improve the existing S3 driver
      • Functionality
      • Performance
    • Implement a native OpenStack driver?
    • Get in touch with iRODS developers

Page 19: Federating Grid and Cloud Storage in EUDAT

iRODS-OpenStack tests
TEST SETUP:
• iRODS server(s):
  • Cloud configured as a compound resource, accessed over the S3/OpenStack API
  • Disk cache in front of it
• OpenStack Swift:
  • 3 proxies, 1 with the S3 API
  • 5 storage nodes
  • Extensive functionality and performance tests
• Amazon S3:
  • Only limited functionality tests

Page 20: Federating Grid and Cloud Storage in EUDAT

iRODS-OpenStack tests
TEST RESULTS:
• S3 vs native OSS API overhead:
  • Upload: ~0%
  • Download: ~8%
• iRODS overhead:
  • Upload: ~19%
  • Download:
    • From the compound S3 resource: ~0%
    • Cached: 230% SPEEDUP (cache resources are faster than S3)
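The percentages above are ratios of transfer times for the same file over different paths. A small sketch of the arithmetic (the timings below are made-up illustrative values, not the measured ones):

```python
# How overhead percentages are computed: the extra transfer time
# relative to the baseline path, as a percentage of the baseline.
# Timings below are made-up illustrative numbers, not measurements.
def overhead_pct(baseline_s, measured_s):
    return 100.0 * (measured_s - baseline_s) / baseline_s

# Hypothetical download timings for the same file:
native_swift = 10.0      # via the native OSS API (baseline)
via_s3 = 10.8            # via the S3 gateway
via_irods_cache = 4.3    # served from the iRODS disk cache

print(f"S3 overhead:      {overhead_pct(native_swift, via_s3):+.0f}%")
print(f"cached 'overhead': {overhead_pct(native_swift, via_irods_cache):+.0f}%")
# A negative overhead is a speedup: the local cache beats the S3 path.
```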

Page 21: Federating Grid and Cloud Storage in EUDAT

iRODS-OpenStack test

Page 22: Federating Grid and Cloud Storage in EUDAT

Conclusions and future plans:
• Conclusions:
  • Performance-wise, iRODS does not add much overhead for files <2 GB
  • Problems arise for files >2 GB: the iRODS-S3 driver has no support for multipart upload, which prevents iRODS from storing files >2 GB in clouds
  • Some functional limits (e.g. the imv problem)
  • Using iRODS to federate S3 clouds at large scale would require improving the existing driver or developing a new one
• Future plans:
  • Test the integration with VPH's cloud using the existing driver
  • Ask SAF to support the driver development
  • Get in touch with iRODS developers to ensure the sustainability of our work
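The >2 GB limitation is exactly what S3 multipart upload addresses: a large file is split into parts that are uploaded (and retried) independently, so no single PUT has to carry the whole file. A sketch of the part-splitting arithmetic only (the upload calls themselves are omitted, and the 100 MiB part size is an assumed typical value, not one from the talk):

```python
# Multipart upload splits a large file into independently uploaded
# parts, sidestepping any single-PUT size ceiling. This sketches the
# splitting arithmetic only; 100 MiB is an assumed part size.
PART_SIZE = 100 * 1024 * 1024  # 100 MiB, assumed

def part_ranges(file_size, part_size=PART_SIZE):
    """Yield (part_number, start_offset, length) for each part."""
    n_parts = (file_size + part_size - 1) // part_size  # ceiling division
    for i in range(n_parts):
        start = i * part_size
        yield i + 1, start, min(part_size, file_size - start)

five_gib = 5 * 1024**3
parts = list(part_ranges(five_gib))
assert len(parts) == 52                      # 51 full parts + 1 remainder
assert sum(length for _, _, length in parts) == five_gib
```

A driver built on this would PUT each range as a numbered part and then issue a completion request listing all parts; that per-part structure is what the iRODS-S3 driver of the time lacked.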

Page 23: Federating Grid and Cloud Storage in EUDAT

EUDAT's iRODS federation: object storage on top of iRODS?

(Diagram: an S3/OSS client uses an S3 API exposed by an S3 driver on top of an iRODS server; an iRODS client accesses the same server directly for data ingestion and access; that server federates via the iRODS API with another iRODS server whose storage driver fronts further storage systems.)

Problems:
• Data organisation mapping: filesystem vs objects; big files vs fragments
• Identity mapping? S3 keys/accounts vs X.509?
• Out of scope of EUDAT? A lot of work needed