Ceph Storage in OpenStack, Part 2
openstack-ch, 6.12.2013
Jens-Christian Fischer, jens-christian.fischer@switch.ch, @jcfischer, @switchpeta
© 2013 SWITCH
Distributed, redundant storage
Open Source Software, commercial support
http://inktank.com
Ceph
https://secure.flickr.com/photos/38118051@N04/3853449562
• Every component must scale
• There can be no single point of failure
• Software based, not an appliance
• Open Source
• Run on commodity hardware
• Everything must self-manage wherever possible
http://www.inktank.com/resource/end-of-raid-as-we-know-it-with-ceph-replication/
Ceph Design Goals
• Object
– Archival and backup storage
– Primary data storage
– S3-like storage
– Web services and platforms
– Application development
• Block
– SAN replacement
– Virtual block devices, VM images
• File
– HPC
– POSIX-compliant shared file system
Different Storage Needs
Ceph
Ceph Object Storage
Ceph Gateway
Objects
Ceph Block Device
Virtual Disks
Ceph File System
Files & Directories
• Glance
– Image and volume snapshot storage (metadata; uses available storage for the actual files)
• Cinder
– Block storage exposed as volumes to the virtual machines
• Swift
– Object storage (think S3)
Storage in OpenStack
Ceph as Storage in OpenStack
http://www.inktank.com/resource/complete-openstack-storage/
• Volumes for virtual machines
• Backed by RBD
• Persistent (unlike VMs)

Block devices
Ceph @ SWITCH
https://secure.flickr.com/photos/83275239@N00/7122323447
Serving the Swiss university and research community
4 major product releases planned:
• “Academic Dropbox”, early Q2 2014
• “Academic IaaS”, mid 2014
• “Academic Storage as a Service” and
• “Academic Software Store”, later in 2014
Built with OpenStack / Ceph
Cloud @ SWITCH
• 1 controller node
• 5 compute nodes (8 expected)
• 120 cores total
• 640 GB RAM
• 24 × 3 TB SATA disks: ~72 TB raw storage

• OpenStack Havana release
• Ceph Dumpling
“Current” Preproduction Cluster
• Havana/Icehouse release
• Ceph Emperor/Firefly

• 2 separate data centers
• Ceph cluster distributed as well (we’ll see how that goes)

• Around 50 hypervisors with 1000 cores
• Around 2 PB of raw storage
First Planned Production Cluster
• Dropbox: 50 – 250’000 users => 50 TB – 2.5 PB
• IaaS: 500 – 1000 VMs => 5 TB – 50 TB
• Storage as a Service: 100 TB – ?? PB
There is a definite need for scalable storage
Storage numbers
• Glance images in Ceph
• Cinder volumes in Ceph
• Ephemeral disks in Ceph

Thanks to the power of copy-on-write:
• “Instant VM creation”
• “Instant volume creation”
• “Instant snapshots”
OpenStack & Ceph
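The copy-on-write behaviour behind these “instant” operations can be seen directly with the `rbd` CLI. A minimal sketch; the pool and image names are illustrative, not from our cluster:

```shell
# Snapshot a Glance image and protect the snapshot so it can be cloned
rbd snap create images/ubuntu-12.04@base
rbd snap protect images/ubuntu-12.04@base

# Clone it into the volumes pool: a copy-on-write child,
# created in seconds regardless of the parent image's size
rbd clone images/ubuntu-12.04@base volumes/vm-disk-1

# The clone only stores blocks that diverge from the parent
rbd info volumes/vm-disk-1
```

This is exactly the mechanism Cinder and Glance use when both are backed by the same Ceph cluster.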
• Almost there: basic support for Glance, Cinder, Nova
– Edit the config file, create a pool, and things work (unless you use CentOS)
• Not optimized: “instant copy” is really
– download from Glance (Ceph) to disk
– upload from disk to Cinder (Ceph)
• Patches available, active development; should be integrated in Icehouse
Ceph Support in Havana
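The “edit config file, create pool” setup looks roughly like this; pool names, placement-group counts, and the cephx user follow the upstream rbd-openstack guide and are examples, not a tested recipe:

```shell
# Create dedicated pools for Glance images and Cinder volumes
ceph osd pool create images 128
ceph osd pool create volumes 128

# A cephx user that the Cinder service authenticates with
ceph auth get-or-create client.cinder \
    mon 'allow r' \
    osd 'allow rwx pool=volumes, allow rx pool=images'

# Then point Cinder at the pool (cinder.conf):
#   volume_driver = cinder.volume.drivers.rbd.RBDDriver
#   rbd_pool = volumes
#   rbd_user = cinder
```

The real capability strings are slightly longer (see the rbd-openstack documentation linked at the end); this is the shape of the configuration.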
Use RadosGW (S3 compatible)
Current use cases:
• A4Mesh: storage of hydrological scientific data
• SWITCH: storage and streaming of video data
Some weird problems with interruption of large streaming downloads
Object Storage
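Because RadosGW speaks the S3 protocol, standard S3 tooling works against it. A sketch; the hostname, user ID, and bucket name are placeholders:

```shell
# Create a RadosGW user; the command prints an S3 access/secret key pair
radosgw-admin user create --uid=videodemo --display-name="Video demo"

# Point any S3 client at the gateway, e.g. s3cmd
s3cmd --host=rgw.example.org --host-bucket="%(bucket)s.rgw.example.org" \
      mb s3://videos
s3cmd --host=rgw.example.org put lecture.mp4 s3://videos/
```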
Investigating NFS servers, backed either by RBD (Rados Block Device) or by Cinder Volumes
Not our favorite, but currently a viable option.
Questions about scalability and performance
Shared Storage for VMs
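The NFS-on-RBD variant under investigation would look roughly like this on the NFS server. Device names, sizes, and the export network are illustrative, and this is a sketch of the approach, not a tested recipe:

```shell
# Create a block device in Ceph and map it into the server's kernel
rbd create volumes/nfs-share --size 102400
rbd map volumes/nfs-share          # appears as e.g. /dev/rbd0

# Put an ordinary filesystem on it and export it over NFS
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /export/share
echo "/export/share 10.0.0.0/24(rw,no_root_squash)" >> /etc/exports
exportfs -ra
```

The open question from the slide remains: the NFS server itself is a bottleneck and a single point of failure, which Ceph otherwise avoids.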
Cinder volumes
Boot from Volume
Works nicely for live migration
Very fast to spawn new volumes from snapshots
Block Devices
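From the OpenStack side, the fast snapshot-to-volume path uses the ordinary Havana-era CLI. A sketch; IDs and names are placeholders:

```shell
# Snapshot an existing Cinder volume (a CoW snapshot in Ceph)
cinder snapshot-create --display-name base-snap <volume-id>

# Spawn a new 10 GB volume from it: near-instant, thanks to RBD cloning
cinder create --snapshot-id <snapshot-id> --display-name vm-root 10

# Boot a VM directly from that volume
nova boot --flavor m1.small \
    --block-device-mapping vda=<volume-id>:::0 my-vm
```

Because the volume lives in Ceph rather than on the hypervisor, live migration only has to move the VM’s memory, not its disk.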
CephFS as shared instance storage
This page intentionally left blank
Don’t
Be careful about Linux kernel versions (3.12 is about right)
Works under light load
Be prepared for surprises
Or wait for another 9 months (according to word from Inktank)
CephFS for shared file storage
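For completeness, mounting CephFS looks like this; the monitor address and key are placeholders, and given the caveats above it should be treated as experimental:

```shell
# Kernel client: needs a recent kernel (3.12 or so, as noted above)
mount -t ceph mon1.example.org:6789:/ /mnt/cephfs \
      -o name=admin,secret=<cephx-key>

# Or the FUSE client, which tracks the Ceph release instead of the kernel
ceph-fuse -m mon1.example.org:6789 /mnt/cephfs
```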
• Ceph is extremely stable and has been very good to us, except for CephFS (which for the time being is being de-emphasized by Inktank)
• Software in rapid development; some functionality “in flux”, difficult to keep up with. However: we have gone through 2 major Ceph upgrades without downtime
• The Open{Source|Stack} problem: documentation and experience reports strewn all over the Interwebs (in varying states of being wrong)
Experience
• Yes!
• Ceph is incredibly stable, unless you do stupid things to it or use it in ways the developers tell you not to
• Responsive developers, fast turnaround on features
Would we do it again?
• http://ceph.com/docs/master/rbd/rbd-openstack/
• https://github.com/jdurgin/nova/tree/havana-ephemeral-rbd
• https://review.openstack.org/#/c/56527/
• http://techs.enovance.com/6424/back-from-the-summit-cephopenstack-integration
Nitty gritty