TRANSCRIPT
[Slide 1]
Shaun de Witt, STFC
Maciej Brzeźniak, PSNC
Martin Hellmich, CERN
Federating Grid and Cloud Storage in EUDAT
International Symposium on Grids and Clouds 2014,
23-28 March 2014
[Slide 2]
Agenda
• Introduction
• …
• …
• …
• Test results
• Future work
3rd EUDAT Technical meeting in Bologna 7th February 2013
[Slide 3]
Introduction
• We present and analyze the results of Grid and Cloud Storage integration
• In EUDAT we used:
– iRODS as Grid Storage federation mechanism
– OpenStack Swift as scalable object storage solution
• Scope:
– Proof of concept
– Pilot OpenStack Swift installation in PSNC
– Production iRODS servers in PSNC (Poznan) and EPCC (Edinburgh)
[Slide 4]
EUDAT project introduction
• pan-European data storage & management infrastructure
• Long term data preservation:
• Storage safety, availability – replication, integrity control
• Data accessibility – visibility, ability to reference data over the years
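The "integrity control" point above can be illustrated with a checksum comparison across replicas. A minimal sketch (the function names and the majority-vote policy are illustrative, not EUDAT's actual mechanism) that flags a replica whose content drifted from the others:

```python
import hashlib

def checksum(data: bytes) -> str:
    """MD5 digest of an object's content (the checksum type that S3 ETags
    and iRODS commonly use for plain, non-multipart objects)."""
    return hashlib.md5(data).hexdigest()

def bad_replicas(replicas: dict[str, bytes]) -> list[str]:
    """Names of replicas whose checksum disagrees with the majority."""
    sums = {name: checksum(data) for name, data in replicas.items()}
    digests = list(sums.values())
    good = max(set(digests), key=digests.count)  # majority vote
    return [name for name, s in sums.items() if s != good]

replicas = {
    "site-a": b"climate-record-0001",
    "site-b": b"climate-record-0001",
    "site-c": b"climate-record-0001" + b"\x00",  # truncation / bit rot
}
print(bad_replicas(replicas))  # -> ['site-c']
```

In a real federation the digests would be fetched from each site's catalogue rather than recomputed from the full content.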
• Partners: data centres & communities
[Slide 5]
EUDAT challenges:
• Federate heterogeneous data management systems:
• dCache, AFS, DMF, GPFS, SAM-FS
• File systems, HSMs, file servers
• Object Storage systems (!)
while ensuring:
• Performance, scalability,
• Data safety, durability, HA, fail-over
• Unique access, Federation transparency,
• Flexibility (rule engine)
• Implement the core services:
• safe and long-term storage: B2SAFE,
• efficient analysis: B2STAGE,
• easy deposit & sharing: B2SHARE,
• Data & meta-data exploration: B2FIND.
[Picture: various storage systems federated under iRODS]
[Slide 6]
EUDAT CDI domain of registered data:
[Slide 7]
Grid – Cloud storage integration
• Need to integrate Grids and Cloud/Object storage
• Grids get another cost-effective, scalable backend
• Many institutions and initiatives are testing object storage and using it in production
• Most cloud storage uses the object storage concept
• Object Storage solutions have limited support for federation, which is well addressed in Grids
• In EUDAT we integrated:
• object storage system – OpenStack Swift
• iRODS servers and federations
[Slide 8]
Context: Object Storage Concept
• The concept enables building low-cost, scalable, efficient storage:
• Within the data centre
• DR / distributed configurations
• Reliability thanks to redundancy of components:
• Many cost-efficient storage servers w/ disk drives (12-60 HDD/SSD)
• Typical (cheap) network: 1/10 Gbit Ethernet
• Limitations of traditional approaches:
• High investment cost and maintenance
• Vendor lock-in, closed architecture, limited scalability
• Slower adoption of new technologies than in the commodity market
[Slide 9]
Context: Object Storage importance
• Many institutions and initiatives (DCs, NRENs, companies, R&D projects) are testing object storage and using it in production, including:
• Open source / private cloud:
• OpenStack Swift
• Ceph / RADOS
• Sheepdog, Scality…
• Commercial:
• Amazon S3, RackSpace Cloud Files…
• MS Azure Object Storage…
• Most promising open source: OpenStack Swift & Ceph
[Slide 10]
Object Storage: Architectures
OpenStack Swift

[Diagram: user applications upload/download through a load balancer to several proxy nodes, which store and retrieve objects on the storage nodes.]

Ceph

[Diagram: applications, hosts/VMs and clients reach RADOS via librados (RadosGW, RBD, CephFS); the RADOS cluster consists of monitors (MON.1…MON.n), metadata servers (MDS.1…MDS.n) and object storage daemons (OSD.1…OSD.n).]
[Slide 11]
Object Storage: concepts
OpenStack Swift Ring
Source: The Riak Project
Source: http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/
Ceph’s map
• No meta-data lookups, no meta-data DB! Data placement/location is computed!
• Swift: the Ring represents the space of all possible computed hash values divided into equal parts (partitions); partitions are spread across storage nodes
• Ceph: the CRUSH map lists storage devices, a failure-domain hierarchy (e.g., device, host, rack, row, room) and rules for traversing the hierarchy when storing data.
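The hash-based placement can be sketched in a few lines. This is a simplified illustration of the Swift ring idea (the partition count, node list, and replica-free assignment are toy choices, not a real ring):

```python
import hashlib

PART_POWER = 8                       # 2**8 = 256 partitions (real rings use far more)
NODES = [f"storage-node-{i}" for i in range(5)]

def partition(account: str, container: str, obj: str) -> int:
    """Swift-style lookup: hash the object path and keep the top
    PART_POWER bits of the digest as the partition number -- no
    metadata database is consulted."""
    path = f"/{account}/{container}/{obj}".encode()
    top32 = int.from_bytes(hashlib.md5(path).digest()[:4], "big")
    return top32 >> (32 - PART_POWER)

def node_for(part: int) -> str:
    """Toy partition-to-device mapping; a real ring assigns each
    partition to several devices, weighted and spread across zones."""
    return NODES[part % len(NODES)]

p = partition("AUTH_demo", "photos", "cat.jpg")
print(p, node_for(p))  # deterministic: the same path always maps to the same node
```

Because placement is pure computation, any proxy can locate any object without a shared database, which is what makes these systems scale horizontally.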
[Slide 12]

Object Storage concepts: no DB lookups!
[Slide 13]
Grid – Cloud storage integration
• Most cloud/object storage solutions expose:
• S3 interface
• Other native interfaces: OSS: Swift; Ceph: RADOS
• S3 (by Amazon) is the de facto standard in cloud storage:
• Many petabytes, global systems
• Vendors use it (e.g. Dropbox) or provide it
• Large take-up
• Similar concepts:
• CDMI: Cloud Data Management Interface – SNIA standard, not many implementations: http://www.snia.org/cdmi
• Nimbus.IO: https://nimbus.io
• MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
• RackSpace Cloud Files: www.rackspace.com/cloud/files/
[Slide 14]
S3 and S3-like in commercial systems:
• S3 re-sellers:
• Lots of services
• Including Dropbox
• Services similar to the S3 concept:
• Nimbus.IO: https://nimbus.io
• MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
• RackSpace Cloud Files: www.rackspace.com/cloud/files/
• S3 implementations 'in the hardware':
• Xyratex
• Amplidata
[Slide 15]
Why build PRIVATE S3-like storage?
• Features / benefits:
• Reliable storage on top of commodity hardware
• User still able to control the data
• Easy scalability, possible to grow the system
• Adding resources and redistributing data possible in a non-disruptive way
• Open source software solutions and standards available:
• e.g. OpenStack Swift: OpenStack native API and S3 API
• Other S3-enabled storage: e.g. RADOS
• CDMI: Cloud Data Management Interface
[Slide 16]
Why federate iRODS with S3/OpenStack?
• Some communities have data stored in OpenStack
• VPH is building a reliable storage cloud on top of OpenStack Swift within the pMedicine project (together with PSNC)
• These data should be available to EUDAT
• Data staging: Cloud -> EUDAT -> PRACE HPC and back
• Data replication: Cloud -> EUDAT -> other back-end storage
• We could apply the rule engine to data in the cloud, assign PIDs
• We were asked to consider cloud storage:
• From EUDAT 1st year review report:
[Slide 17]
EUDAT’s iRODS federation
VPH case analysis:
[Diagram: an S3/OSS client ingests data through the S3 API / OSS API into an iRODS server equipped with the S3 driver; the data is registered and a PID is assigned via EUDAT's PID Service; replication propagates it across the federation to iRODS servers with other storage drivers and their storage systems; data staging moves it to an HPC system, where it is accessed through an iRODS client.]
[Slide 18]
Our 7.2 project
• Purpose:
• To examine existing iRODS-S3 driver
• (possibly) to improve it / provide another one
• Steps/status:
• 1st stage:
• Play with what is there – done for OpenStack/S3 + iRODS
• Examine functionality
• Evaluate scalability – found some issues already
• Follow-up:
• Try to improve the existing S3 driver
• Functionality
• Performance
• Implement a native OpenStack driver?
• Get in touch with iRODS developers
[Slide 19]
iRODS-OpenStack tests
TEST SETUP:
• iRODS server:
• Cloud as compound resource
• Disk cache in front of it
• OpenStack Swift:
• 3 proxies, 1 with S3
• 5 storage nodes
• Extensive functionality and performance tests
• Amazon S3:
• Only limited functionality tests
[Diagram: iRODS server(s) reaching OpenStack Swift through the S3/OpenStack API and Amazon S3 through the S3 API.]
[Slide 20]
iRODS-OpenStack test
TEST RESULTS:
• S3 vs native OSS overhead:
• Upload: ~0%
• Download: ~8%
• iRODS overhead:
• Upload: ~19%
• Download:
• From compound S3: ~0%
• Cached: 230% speedup (cache resources are faster than S3)
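The percentages above are relative timing ratios; a tiny helper makes the arithmetic explicit (the timings below are hypothetical, picked only to reproduce the reported figures):

```python
def overhead_pct(t_baseline: float, t_measured: float) -> float:
    """Relative overhead of the measured path vs. the baseline, in percent."""
    return 100.0 * (t_measured - t_baseline) / t_baseline

# Hypothetical timings (seconds), chosen to match the reported ratios:
print(overhead_pct(100.0, 119.0))  # iRODS upload vs. native: 19.0 (%)
print(overhead_pct(100.0, 108.0))  # S3 download vs. native Swift: 8.0 (%)
# A cached download ~2.3x faster than S3 shows up as negative overhead:
print(round(overhead_pct(100.0, 100.0 / 2.3), 1))  # about -56.5 (%)
```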
[Slide 21]
iRODS-OpenStack test
[Slide 22]
Conclusions and future plans:
• Conclusions
• Performance-wise, iRODS does not add much overhead for files <2 GB
• Problems arise for files >2 GB – no support for multipart upload in the iRODS-S3 driver, which prevents iRODS from storing files >2 GB in clouds
• Some functional limits (e.g. the imv problem)
• Using iRODS to federate S3 clouds at large scale would require improving the existing driver or developing a new one
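The multipart-upload gap above boils down to bookkeeping the driver never did: splitting a large object into parts and uploading them separately. A minimal sketch of the part arithmetic (the part size and function name are illustrative; the real S3 multipart API additionally tracks upload IDs and part ETags):

```python
def split_into_parts(size: int, part_size: int = 1 << 30) -> list[tuple[int, int]]:
    """(offset, length) pairs covering `size` bytes; every part except
    possibly the last is exactly `part_size` bytes (1 GiB here)."""
    return [(off, min(part_size, size - off)) for off in range(0, size, part_size)]

five_gib = 5 * (1 << 30)
parts = split_into_parts(five_gib)
print(len(parts))  # -> 5
# Without parts the driver issues one monolithic PUT, so 32-bit offset
# handling and single-request size limits bite for large files.
```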
• Future plans:
• Test the integration with VPH’s cloud using existing driver
• Ask SAF to support the driver development
• Get in touch with iRODS developers to ensure the sustainability of our work
[Slide 23]
EUDAT’s iRODS federation
Object storage on top of iRODS?
[Diagram: S3/OSS clients and iRODS clients ingest and access data through an S3 driver exposing an S3 API on top of the iRODS federation; iRODS servers with other storage drivers front the underlying storage systems via the iRODS API and S3 API.]
Problems:
• Data organisation mapping:
– filesystem vs objects
– big files vs fragments
• Identity mapping?
– S3 keys/accounts vs X.509?
• Out of scope of EUDAT?
– a lot of work needed
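The data-organisation mapping problem can be made concrete with one naive scheme (zone name, function names, and the bucket-per-collection rule are all hypothetical): map an iRODS logical path to an S3 (bucket, key) pair by treating the first collection as the bucket and the rest of the path as the key.

```python
ZONE = "tempZone"  # hypothetical iRODS zone name

def to_s3(logical_path: str) -> tuple[str, str]:
    """Naive mapping: the first collection under the zone becomes the
    bucket, the remaining path becomes the object key."""
    _, zone, bucket, *rest = logical_path.split("/")
    assert zone == ZONE, "path outside the zone"
    return bucket, "/".join(rest)

def from_s3(bucket: str, key: str) -> str:
    """Inverse mapping back to an iRODS logical path."""
    return f"/{ZONE}/{bucket}/{key}"

b, k = to_s3("/tempZone/home/alice/run1/data.nc")
print(b, k)  # -> home alice/run1/data.nc
# Renaming a collection is cheap in a filesystem, but with keys like
# these it becomes a copy+delete of every contained object in S3 --
# one reason the mapping is harder than it first looks.
```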