vBACD July 2012 - Scaling Storage with Ceph
![Page 1: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/1.jpg)
SCALING STORAGE WITH CEPH
Ross Turk, Inktank
![Page 2: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/2.jpg)
![Page 3: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/3.jpg)
![Page 4: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/4.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
![Page 5: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/5.jpg)
IN THE BEGINNING Magic Madzik, Flickr / CC BY 2.0
![Page 6: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/6.jpg)
EARLY INFORMATION STORAGE Chico.Ferreira, Flickr / CC BY 2.0
![Page 7: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/7.jpg)
WRITING > CAVE PAINTINGS kevingessner, Flickr / CC BY-SA 2.0
![Page 8: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/8.jpg)
x1000
== x1
![Page 9: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/9.jpg)
PEOPLE BEGIN WRITING A LOT Moyan_Brenn, Flickr / CC BY-ND 2.0
![Page 10: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/10.jpg)
WRITING IS TIME-CONSUMING trekkyandy, Flickr / CC BY 2.0
![Page 11: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/11.jpg)
THE INDUSTRIALIZATION OF WRITING FateDenied, Flickr / CC BY 2.0
![Page 12: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/12.jpg)
x1000
== x1
tape + magnet = magnetic tape
![Page 13: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/13.jpg)
STORAGE BECOMES MECHANICAL Erik Pitti, Wikipedia / CC BY-ND 2.0
![Page 14: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/14.jpg)
HUMAN COMPUTER TAPE
HUMAN ROCK
HUMAN
INK
PAPER
![Page 15: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/15.jpg)
COMPUTERS NEED PEOPLE TO WORK USDAgov, Flickr / CC BY 2.0
![Page 16: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/16.jpg)
HUMAN COMPUTER TAPE
![Page 17: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/17.jpg)
11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010 01010110 01010011
==
![Page 18: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/18.jpg)
THROUGHPUT BECOMES IMPORTANT Zane Luke, Flickr / CC BY-ND 2.0
![Page 19: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/19.jpg)
LAZ0R B3AMS CHANGE EVERYTHING!! Jeff Kubina, Flickr / CC-BY-SA 2.0
![Page 20: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/20.jpg)
HARD DRIVES ARE TOTALLY BETTER
amazing spinny hard drives sucky stupid tape slow
![Page 21: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/21.jpg)
EVERYTHING GETS MESSY Rob!, Flickr / CC BY 2.0
![Page 22: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/22.jpg)
[Diagram: tree of nodes aa, ab, ac, ba, bb, bc, da, db, dc, with binary data at the leaves]
![Page 23: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/23.jpg)
file
owner: rturk
created: aug12
last viewed: aug17
size: 42025
perms: 644
11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010
![Page 24: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/24.jpg)
[Diagram: tree of nodes aa, ab, ac, ba, bb, bc, da, db, dc, with binary data at the leaves and a file (01 10)]
![Page 25: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/25.jpg)
WE OUTGROW THE HARD DRIVE Mr. T in DC, Flickr / CC BY 2.0
![Page 26: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/26.jpg)
HUMAN COMPUTER DISK
DISK
DISK
DISK
DISK
DISK
DISK
![Page 27: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/27.jpg)
PEOPLE NEED SIMULTANEOUS ACCESS wFourier, Flickr / CC BY 2.0
![Page 28: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/28.jpg)
HUMAN COMPUTER DISK
DISK
DISK
DISK
DISK
DISK
DISK
HUMAN
HUMAN
![Page 29: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/29.jpg)
[Diagram: (COMPUTER) with 12 x DISK, surrounded by a crowd of HUMANs (actually more like this…)]
![Page 30: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/30.jpg)
[Diagram: 3 x HUMAN, 12 x DISK COMPUTER]
![Page 31: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/31.jpg)
[Diagram: tree of nodes aa, ab, ac, ba, bb, bc, da, db, dc, with binary data at the leaves, marked X]
![Page 32: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/32.jpg)
object
pace: quick
driver: frog
license: expired
expression: agog
11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010
![Page 33: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/33.jpg)
[Diagram: APP, 12 x DISK COMPUTER]
![Page 34: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/34.jpg)
[Diagram: 11 x DISK COMPUTER, plus 2 x DISK and 2 x COMPUTER drawn separately]
![Page 35: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/35.jpg)
[Diagram: 3 x VM, COMPUTER, DISK, 11 x DISK COMPUTER]
![Page 36: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/36.jpg)
STORAGE THROUGHOUT HISTORY Time-scale: Roughly logarithmic. Content: Whatever the opposite of “scientific” is.
Writing
Computers
Shared storage
Distributed storage
Cloud computing
Ceph
Painting
![Page 37: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/37.jpg)
[Diagram: 3 x HUMAN, 12 x DISK COMPUTER]
![Page 38: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/38.jpg)
[Diagram: 12 x DISK COMPUTER]
![Page 39: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/39.jpg)
[Diagram: 12 x D C]
![Page 40: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/40.jpg)
[Diagram: 3 x HUMAN, 12 x D C]
![Page 41: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/41.jpg)
STORAGE APPLIANCES Michael Moll, Wikipedia / CC BY-SA 2.0
![Page 42: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/42.jpg)
6.4 MILLION SQFT OF FACTORIES Dude94111, Flickr / CC BY 2.0
![Page 43: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/43.jpg)
STORAGE VENDORS HAVE BIG BILLS CarbonNYC, Flickr / CC BY 2.0
![Page 44: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/44.jpg)
STORAGE APPLIANCES ARE EXPENSIVE 401K 2012, Flickr / CC BY-SA 2.0
![Page 45: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/45.jpg)
TECHNOLOGY IS A COMMODITY RaeAllen, Flickr / CC-BY 2.0
![Page 46: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/46.jpg)
COMMODITY PRICES FLUCTUATE
[Chart: x-axis May-07 to May-12]
![Page 47: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/47.jpg)
GROWING WITH HARDWARE APPLIANCES
§ First PB
  § Proprietary storage hardware
  § Well-known storage vendor
  § $14 b’zillion
§ Second PB
  § Proprietary storage hardware
  § Same storage vendor
  § Another $14 b’zillion
[Diagram: 24 x DC]
![Page 48: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/48.jpg)
APPLIANCES ARE OLD TECHNOLOGY Paul Keller, Flickr / CC BY 2.0
![Page 49: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/49.jpg)
Source: http://www.cpubenchmark.net/high_end_cpus.html
![Page 50: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/50.jpg)
FLAGSHIP HARDWARE APPLIANCE
![Page 51: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/51.jpg)
Hardware Appliances are Mysterious Black Boxes Abode of Chaos, Flickr / CC BY 2.0
![Page 52: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/52.jpg)
[Diagram: 11 x DC, one node split into D and C, C++]
![Page 53: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/53.jpg)
[Diagram: 11 x DC, one node split into D and C, C++ X]
![Page 54: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/54.jpg)
[Diagram: 12 x DC and HUMAN [DEVELOPER] !!]
![Page 55: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/55.jpg)
THE WORLD NEEDS
A STORAGE TECHNOLOGY THAT
SCALES INFINITELY
![Page 56: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/56.jpg)
THE WORLD NEEDS
A STORAGE TECHNOLOGY THAT DOESN’T REQUIRE
AN INDUSTRIAL
MANUFACTURING PROCESS
![Page 57: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/57.jpg)
SAGE WEIL
§ Co-founder of DreamHost
§ Inventor of Ceph
§ CEO of Inktank
![Page 58: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/58.jpg)
OPEN SOURCE
philosophy design
![Page 59: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/59.jpg)
OPEN SOURCE SPREADS IDEAS orchidgalore, Flickr / CC BY 2.0
![Page 60: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/60.jpg)
OPEN SOURCE
COMMUNITY-FOCUSED
philosophy design
![Page 61: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/61.jpg)
WE ARE SMARTER TOGETHER rturk, Linkedin Inmap
![Page 62: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/62.jpg)
CEPH BELONGS TO ALL OF US wackybadger, Flickr / CC BY 2.0
![Page 63: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/63.jpg)
OPEN SOURCE
COMMUNITY-FOCUSED
SCALABLE
philosophy design
![Page 64: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/64.jpg)
CEPH IS BUILT TO SCALE
Too much for a book
Too much for a drive
Too much for a computer
Too much for a room
Ceph
Too much for a cave
![Page 65: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/65.jpg)
OPEN SOURCE
COMMUNITY-FOCUSED
SCALABLE
NO SINGLE POINT OF FAILURE
philosophy design
![Page 66: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/66.jpg)
ARIOLIMAX CALIFORNICUS aroid, Flickr / CC BY 2.0
![Page 67: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/67.jpg)
THE OCTOPUS (A METAPHOR) I love speaking in metaphors.
single point of failure
highly-available replicated
![Page 68: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/68.jpg)
THE BEEHIVE (ANOTHER METAPHOR) blumenbiene, Flickr / CC BY 2.0
![Page 69: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/69.jpg)
OPEN SOURCE
COMMUNITY-FOCUSED
SCALABLE
NO SINGLE POINT OF FAILURE
SOFTWARE BASED
philosophy design
![Page 70: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/70.jpg)
[Diagram: 11 x DC, one node split into D and C, C++]
![Page 71: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/71.jpg)
[Diagram: 11 x DC, one node split into D and C, C++ ✔]
![Page 72: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/72.jpg)
OPEN SOURCE
COMMUNITY-FOCUSED
SCALABLE
NO SINGLE POINT OF FAILURE
SOFTWARE BASED
SELF-MANAGING
philosophy design
![Page 73: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/73.jpg)
DISKS = JUST TINY RECORD PLAYERS jon_a_ross, Flickr / CC BY 2.0
![Page 74: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/74.jpg)
D
55 times / day
= D
D D
x 1 MILLION
D D
D D
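The slide appears to say that a fleet of one million disks sees a disk failure about 55 times per day. It gives only the result, so the sketch below shows one way such a figure could arise. The 2% annual failure rate is an assumption chosen for illustration; the slide does not state a rate.

```python
# Back-of-the-envelope estimate of daily disk failures in a large fleet.
# ASSUMPTION: the 2% annual failure rate is illustrative, not from the slides.
DISKS = 1_000_000
ANNUAL_FAILURE_RATE = 0.02
DAYS_PER_YEAR = 365


def expected_failures_per_day(disks: int, afr: float) -> float:
    """Expected failures per day if failures are spread evenly over the year."""
    return disks * afr / DAYS_PER_YEAR


failures = expected_failures_per_day(DISKS, ANNUAL_FAILURE_RATE)
print(f"{failures:.1f} expected failures per day")
```

Under that assumption the estimate rounds to 55 per day, which is why storage at this scale has to treat failure as routine rather than exceptional.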
![Page 75: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/75.jpg)
![Page 76: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/76.jpg)
IT ALL STARTED WITH A DREAM
![Page 77: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/77.jpg)
+
![Page 78: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/78.jpg)
NEW MONTHLY CODE COMMITS
[Chart: y-axis 0 to 700 commits per month; x-axis 2004-06 to 2011-07]
![Page 79: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/79.jpg)
CEPH STARTS POPPING UP!
(sorry about all the logo tampering)
![Page 80: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/80.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
![Page 81: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/81.jpg)
![Page 82: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/82.jpg)
[Diagram: five OSDs, each on its own FS (btrfs, xfs, ext4) and DISK, alongside three monitors (M)]
![Page 83: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/83.jpg)
M
M
M
HUMAN
![Page 84: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/84.jpg)
Monitors:
§ Maintain cluster map
§ Provide consensus for distributed decision-making
§ Must have an odd number
§ These do not serve stored objects to clients
M
OSDs:
§ One per disk (recommended)
§ At least three in a cluster
§ Serve stored objects to clients
§ Intelligently peer to perform replication tasks
§ Supports object classes
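The "odd number" guideline for monitors is easier to see with a little arithmetic. The sketch below assumes a plain majority-quorum rule (more than half of the monitors must agree); the function names are hypothetical and this is not Ceph code.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


# Compare cluster sizes side by side.
for n in range(1, 8):
    print(f"monitors={n} quorum={quorum_size(n)} can_lose={tolerated_failures(n)}")
```

Going from three monitors to four raises the quorum size without raising the number of failures the cluster can survive, so under this rule an even count adds cost but no resilience.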
![Page 85: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/85.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
![Page 86: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/86.jpg)
LIBRADOS
M
M
M
APP
native
![Page 87: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/87.jpg)
L
LIBRADOS
§ Provides direct access to RADOS for applications
§ C, C++, Python, PHP, Java
§ No HTTP overhead
![Page 88: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/88.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
![Page 89: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/89.jpg)
M
M
M
native
REST
APP
LIBRADOS RADOSGW
LIBRADOS RADOSGW
APP
![Page 90: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/90.jpg)
RADOS Gateway:
§ REST-based interface to RADOS
§ Supports buckets, accounting
§ Compatible with S3 and Swift applications
![Page 91: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/91.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
![Page 92: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/92.jpg)
M
M
M
VM
LIBRADOS LIBRBD
VIRTUALIZATION CONTAINER
![Page 93: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/93.jpg)
LIBRADOS
M
M
M
LIBRBD CONTAINER
LIBRADOS LIBRBD
CONTAINER VM
![Page 94: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/94.jpg)
LIBRADOS
M
M
M
KRBD (KERNEL MODULE) HOST
![Page 95: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/95.jpg)
RADOS Block Device:
§ Storage of virtual disks in RADOS
§ Allows decoupling of VMs and containers
§ Live migration!
§ Images are striped across the cluster
§ Boot support in QEMU, KVM, and OpenStack Nova
§ Mount support in the Linux kernel
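Striping an image across the cluster can be pictured as cutting the virtual disk into fixed-size objects, each of which can live on a different storage node. This is a minimal sketch under two assumptions the slide does not state: a 4 MiB object size and simple sequential striping. The function names are hypothetical.

```python
from typing import List, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # ASSUMPTION: 4 MiB objects, chosen for illustration


def locate(offset: int, object_size: int = OBJECT_SIZE) -> Tuple[int, int]:
    """Map a byte offset in the image to (object index, offset inside that object)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def objects_touched(offset: int, length: int, object_size: int = OBJECT_SIZE) -> List[int]:
    """Indices of every object that an I/O of `length` bytes at `offset` touches."""
    if length <= 0:
        return []
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return list(range(first, last + 1))


# A 2-byte write that straddles the first object boundary touches two objects.
print(objects_touched(OBJECT_SIZE - 1, 2))
```

Because consecutive objects can land on different nodes, a large sequential read or write is spread over many disks instead of queuing on one.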
![Page 96: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/96.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
![Page 97: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/97.jpg)
M
M
M
CLIENT
01 10
data metadata
![Page 98: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/98.jpg)
Metadata Server
§ Manages metadata for a POSIX-compliant shared filesystem
  § Directory hierarchy
  § File metadata (owner, timestamps, mode, etc.)
§ Stores metadata in RADOS
§ Does not serve file data to clients
§ Only required for shared filesystem
![Page 99: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/99.jpg)
WHAT MAKES CEPH UNIQUE?
![Page 100: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/100.jpg)
HOW DO YOU FIND YOUR KEYS? azmeen, Flickr / CC BY 2.0
![Page 101: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/101.jpg)
[Diagram: APP ??, 12 x D C]
![Page 102: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/102.jpg)
[Diagram: APP looking up F*, 12 x D C grouped into ranges A-G, H-N, O-T, U-Z]
![Page 103: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/103.jpg)
I ALWAYS PUT MY KEYS ON THE HOOK vitamindave, Flickr / CC BY 2.0
![Page 104: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/104.jpg)
[Diagram: APP, 12 x D C]
![Page 105: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/105.jpg)
DEAR DIARY: KEYS = IN THE KITCHEN Barnaby, Flickr / CC BY 2.0
![Page 106: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/106.jpg)
HOW DO YOU FIND YOUR KEYS
WHEN YOUR HOUSE IS
INFINITELY BIG AND
ALWAYS CHANGING?
![Page 107: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/107.jpg)
THE ANSWER: CRUSH!! pasukaru76, Flickr / CC SA 2.0
![Page 108: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/108.jpg)
10 10 01 01 10 10 01 11 01 10
10 10 01 01 10 10 01 11 01 10
hash(object name) % num pg
CRUSH(pg, cluster state, rule set)
![Page 109: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/109.jpg)
10 10 01 01 10 10 01 11 01 10
10 10 01 01 10 10 01 11 01 10
![Page 110: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/110.jpg)
CRUSH
§ Pseudo-random placement algorithm
§ Ensures even distribution
§ Repeatable, deterministic
§ Rule-based configuration
  § Replica count
  § Infrastructure topology
  § Weighting
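The two-step lookup from the earlier slide, hash(object name) % num pg followed by CRUSH(pg, cluster state, rule set), can be sketched in a few lines. The second step here is weighted rendezvous hashing, a stand-in chosen because it is also deterministic, pseudo-random, and weight-aware. It is not the real CRUSH algorithm and it ignores infrastructure topology. All function names and OSD names are hypothetical.

```python
import hashlib
import math
from typing import Dict, List


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() varies between runs)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def placement_group(object_name: str, num_pgs: int) -> int:
    """Step 1 from the slide: hash(object name) % num pg."""
    return stable_hash(object_name) % num_pgs


def place(pg: int, osd_weights: Dict[str, float], replicas: int) -> List[str]:
    """Step 2 stand-in: choose `replicas` distinct OSDs by weighted rendezvous hashing."""

    def score(osd: str) -> float:
        # Map the hash into the open interval (0, 1), then bias by weight
        # so that heavier OSDs win proportionally more placement groups.
        u = ((stable_hash(f"{pg}:{osd}") >> 12) + 0.5) / 2**52
        return -osd_weights[osd] / math.log(u)

    ranked = sorted(osd_weights, key=score, reverse=True)
    return ranked[:replicas]


# Hypothetical cluster: osd.3 has twice the weight of the others.
osds = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 1.0, "osd.3": 2.0}
pg = placement_group("my-object", num_pgs=128)
print(pg, place(pg, osds, replicas=3))
```

Any client holding the same cluster description computes the same answer, so no central lookup table is needed. That is the property the "keys on the hook" slides are driving at.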
![Page 111: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/111.jpg)
CLIENT
??
![Page 112: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/112.jpg)
![Page 113: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/113.jpg)
![Page 114: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/114.jpg)
CLIENT
??
![Page 115: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/115.jpg)
LIBRADOS
M
M
M
VM
LIBRBD VIRTUALIZATION CONTAINER
![Page 116: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/116.jpg)
HOW DO YOU SPIN UP
THOUSANDS OF VMs INSTANTLY
AND EFFICIENTLY?
![Page 117: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/117.jpg)
144 0 0 0 0
instant copy
= 144
![Page 118: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/118.jpg)
4 144
CLIENT
write
write
write
= 148
write
![Page 119: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/119.jpg)
CLIENT read (×3): 144 + 4 = 148
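The arithmetic on the instant-copy slides (144 stays 144 after copying, then becomes 148 after four writes) is what a copy-on-write clone produces. Below is a toy model of that behavior, not librbd: the `Image` class and its methods are hypothetical, the numbers are read as object counts (an assumption), and the parent is assumed to stay read-only after cloning.

```python
class Image:
    """A block image stored as a sparse map of object index -> data."""

    def __init__(self, parent=None):
        self.parent = parent  # image this one was cloned from, if any
        self.objects = {}     # only the objects this image owns itself

    def clone(self):
        """Instant copy: the child starts with zero objects of its own."""
        return Image(parent=self)

    def write(self, index, data):
        """Writes always land in this image, never in the parent."""
        self.objects[index] = data

    def read(self, index):
        """Reads use the local object if present, else fall through to the parent."""
        if index in self.objects:
            return self.objects[index]
        if self.parent is not None:
            return self.parent.read(index)
        return None

    def own_object_count(self):
        return len(self.objects)


def total_objects(images):
    """Number of objects actually stored across a set of images."""
    return sum(image.own_object_count() for image in images)


if __name__ == "__main__":
    base = Image()
    for i in range(144):
        base.write(i, b"base")
    clones = [base.clone() for _ in range(4)]
    print(total_objects([base] + clones))  # 144: the copies cost nothing
    for i in range(4):
        clones[0].write(i, b"changed")
    print(total_objects([base] + clones))  # 148: only the new writes add storage
```

Reads of untouched regions are served from the parent, which is why the read slide still shows 148.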
![Page 120: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/120.jpg)
HOW DO YOU MANAGE
DIRECTORY HIERARCHY WITHOUT
A SINGLE POINT OF FAILURE?
![Page 121: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/121.jpg)
FILESYSTEMS REQUIRE METADATA Barnaby, Flickr / CC BY 2.0
![Page 122: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/122.jpg)
Diagram: CLIENT exchanging data (01 10) with three nodes labeled M
![Page 123: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/123.jpg)
Diagram: three nodes labeled M
![Page 124: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/124.jpg)
one tree
three metadata servers
??
![Page 125: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/125.jpg)
![Page 126: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/126.jpg)
![Page 127: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/127.jpg)
![Page 128: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/128.jpg)
![Page 129: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/129.jpg)
DYNAMIC SUBTREE PARTITIONING
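The idea of splitting one tree across three metadata servers by load can be sketched with a toy model. This is not Ceph's metadata balancer: the `MetadataCluster` class, its methods, and the greedy move rule are all assumptions chosen to keep the example short.

```python
class MetadataCluster:
    """Toy model: each directory subtree is owned by one metadata server."""

    def __init__(self, mds_count):
        self.mds_count = mds_count
        self.authority = {"/": 0}  # subtree root -> owning server rank
        self.load = {"/": 0}       # subtree root -> requests seen

    def delegate(self, root, rank):
        """Hand the subtree rooted at `root` to server `rank`."""
        self.authority[root] = rank
        self.load.setdefault(root, 0)

    def _subtree_for(self, path):
        """Longest delegated subtree root that contains `path`."""
        best = "/"
        for root in self.authority:
            inside = path == root or path.startswith(root + "/")
            if root != "/" and inside and len(root) > len(best):
                best = root
        return best

    def lookup(self, path):
        """Return the server responsible for `path` and record the request."""
        root = self._subtree_for(path)
        self.load[root] += 1
        return self.authority[root]

    def mds_load(self):
        totals = [0] * self.mds_count
        for root, rank in self.authority.items():
            totals[rank] += self.load[root]
        return totals

    def rebalance(self):
        """Move the hottest subtree that fits onto the least-loaded server.

        A subtree is moved only if its load is smaller than the gap between
        its current server and the idlest one, so every move reduces imbalance.
        Returns (root, new_rank), or None when no move would help.
        """
        totals = self.mds_load()
        idlest = min(range(self.mds_count), key=totals.__getitem__)
        candidates = [
            root for root, rank in self.authority.items()
            if rank != idlest and 0 < self.load[root] < totals[rank] - totals[idlest]
        ]
        if not candidates:
            return None
        root = max(candidates, key=lambda r: self.load[r])
        self.authority[root] = idlest
        return root, idlest
```

In this sketch, a cluster that starts with every subtree on one server and sees uneven traffic will spread the busy subtrees across the other servers after a few `rebalance()` calls, while clients keep resolving paths through `lookup()`.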
![Page 130: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/130.jpg)
AND NOW BACKPEDALING
![Page 131: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/131.jpg)
ALMOST EVERYTHING
WORKS
![Page 132: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/132.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
CEPH FS: NEARLY AWESOME
RADOS, LIBRADOS, RBD, RADOSGW: AWESOME
![Page 133: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/133.jpg)
LAN SCALE!! *
* OR REALLY REALLY SCARY FAST WAN
![Page 134: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/134.jpg)
CEPH AND CLOUDSTACK tableatny, Flickr / CC BY 2.0
![Page 135: vBACD July 2012 - Scaling Storage with Ceph](https://reader033.vdocuments.us/reader033/viewer/2022051513/54567f22af795950098b4b55/html5/thumbnails/135.jpg)
RBD SUPPORT IN CLOUDSTACK
§ Just announced two weeks ago!
§ Allows storage of virtual disks inside RADOS
§ Works with KVM only right now
§ No volume snapshots yet
§ Requires the latest version of, um, everything
§ More information can be found on the mailing list:
  § ceph-devel / incubator-cloudstack-dev: http://article.gmane.org/gmane.comp.file-systems.ceph.devel/7505