ceph intro & architectural overview - red hat


Ceph Intro & Architectural Overview

Federico Lucifredi

Product Management Director, Ceph Storage

Vancouver & Guadalajara, May 18th, 2015

2

CLOUD SERVICES

COMPUTE NETWORK STORAGE

the future of storage™

3

[Diagram labels: HUMAN, COMPUTER, TAPE; HUMAN, ROCK; HUMAN, INK, PAPER]

4

[Diagram labels: HUMAN, COMPUTER, TAPE]

5

[Diagram labels: YOU, TECHNOLOGY, YOUR DATA]

6

How Much Store Things All Human History?!

[Chart labels: carving, writing, paper, computers, distributed storage, cloud computing, gaaaaaaaaahhhh!!!!!!]

7

[Diagram labels: three HUMANs, one COMPUTER, seven DISKs]

8

[Diagram labels: twelve DISKs, many HUMANs, one COMPUTER]

9

[Diagram labels: twelve DISKs, many HUMANs, one GIANT SPENDY COMPUTER]

10

[Diagram labels: three HUMANs, twelve COMPUTER + DISK nodes]

11

[Diagram labels: three HUMANs, twelve COMPUTER + DISK nodes]

12

[Diagram labels: eight COMPUTER + DISK nodes, “STORAGE APPLIANCE”]

13

[Photo: Storage Appliance. Michael Moll, Wikipedia / CC BY-SA 2.0]

14

SUPPORT AND MAINTENANCE

PROPRIETARY SOFTWARE

PROPRIETARY HARDWARE

[Diagram labels: four COMPUTER + DISK nodes]

34% of revenue (5.7 billion dollars)

1.3 billion in R&D spent in a year

1.6+ million square feet of manufacturing space

Source: NYSE:EMC, FY2014 10-K

15

[Graphic: rows of binary digits]

THE CLOUD

16

SUPPORT AND MAINTENANCE

PROPRIETARY SOFTWARE

PROPRIETARY HARDWARE

[Diagram labels: four COMPUTER + DISK nodes]

ENTERPRISE SUBSCRIPTION (optional)

OPEN SOURCE SOFTWARE

STANDARD HARDWARE

[Diagram labels: four COMPUTER + DISK nodes]

17

18

OPEN SOURCE

COMMUNITY-FOCUSED

SCALABLE

NO SINGLE POINT OF FAILURE

SOFTWARE BASED

SELF-MANAGING

philosophy, design

19

8 years & 20,000 commits later…

20

21

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS

A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD

A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS

A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW

A bucket-based REST gateway, compatible with S3 and Swift

[Diagram labels: APP, APP, HOST/VM, CLIENT]

22

[Architecture overview slide repeated]

23

[Diagram labels: five DISKs, five FS (btrfs, xfs, ext4), five OSDs, three nodes labeled M]

24

[Diagram labels: three nodes labeled M, HUMAN]

25

Monitors:
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number
• These do not serve stored objects to clients

OSDs:
• 10s to 10000s in a cluster
• One per disk (or one per SSD, RAID group…)
• Serve stored objects to clients
• Intelligently peer to perform replication and recovery tasks
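The "small, odd number" guidance for monitors follows from majority voting: a quorum needs more than half of the monitors, so an even-numbered extra member raises the quorum size without tolerating any additional failure. A minimal sketch of that arithmetic; the function names are illustrative and not part of Ceph:

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of a monitor set."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """Monitors that can be lost while a majority can still form."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    # Print the trade-off for small clusters of monitors.
    for n in range(1, 8):
        print(f"{n} monitors: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 monitors leaves the tolerated failures at 1, while 5 monitors raise it to 2.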

26

[Architecture overview slide repeated]

[Diagram labels: LIBRADOS, three nodes labeled M]

27

[Diagram labels: APP, socket]

LIBRADOS
• Provides direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead
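As a rough illustration of direct access through LIBRADOS, the sketch below uses the `rados` Python binding that ships with Ceph to write one object and read it back. The function name, pool name, object key, and configuration path are placeholders chosen for this example, and the import is deferred so the snippet loads on machines without the binding installed.

```python
def store_and_fetch(pool, key, payload, conffile="/etc/ceph/ceph.conf"):
    """Write one object into a pool through librados and read it back.

    Requires a reachable cluster and client credentials referenced by
    `conffile`; all argument values are supplied by the caller.
    """
    import rados  # Ceph's Python binding for librados (deferred import)

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)  # I/O context bound to one pool
        try:
            ioctx.write_full(key, payload)  # replace the whole object
            # read() returns at most `length` bytes, so ask for the full size
            return ioctx.read(key, length=len(payload))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

A call might look like `store_and_fetch("mypool", "greeting", b"hello")`, where `mypool` is a hypothetical pool that already exists.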

29

[Architecture overview slide repeated]

30

[Diagram labels: three nodes labeled M, LIBRADOS, RADOSGW, APP, socket, REST]

31

RADOS Gateway:
• REST-based object storage proxy
• Uses RADOS to store objects
• API supports buckets, accounts
• Usage accounting for billing
• Compatible with S3 and Swift applications
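Because the gateway is compatible with S3 applications, a stock S3 client can be pointed at it. A hedged sketch using boto3; the function name is illustrative, and the endpoint URL, credentials, bucket, and key are hypothetical values supplied by the caller. The import is deferred so the snippet loads without boto3 installed.

```python
def rgw_round_trip(endpoint_url, access_key, secret_key, bucket, key, body):
    """Store one object through an S3-compatible gateway and read it back."""
    import boto3  # third-party AWS SDK for Python (deferred import)

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,  # the gateway address, not AWS
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    s3.create_bucket(Bucket=bucket)
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()
```

The application code is the same as it would be against any S3 endpoint; only `endpoint_url` and the credentials change.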

32

[Architecture overview slide repeated]

33

[Diagram labels: three nodes labeled M, VM, VIRTUALIZATION CONTAINER, LIBRBD, LIBRADOS ×2]

34

[Diagram labels: three nodes labeled M, CONTAINER ×2, LIBRBD ×2, LIBRADOS ×2, VM]

35

[Diagram labels: three nodes labeled M, KRBD (KERNEL MODULE), HOST]

36

RADOS Block Device:
• Storage of disk images in RADOS
• Decouples VMs from host
• Images are striped across the cluster (pool)
• Snapshots
• Copy-on-write clones
• Support in:
  • Mainline Linux Kernel (2.6.39+)
  • QEMU/KVM
  • OpenStack, CloudStack
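To make "striped across the cluster" concrete: an image is cut into fixed-size objects, so a byte offset in the image maps to an object index plus an offset inside that object. The sketch below assumes a 4 MiB object size and no additional striping parameters (an assumption; the slides give no size), and the helper names are illustrative:

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size: 4 MiB


def locate(offset, object_size=OBJECT_SIZE):
    """Map an image byte offset to (object index, offset within object)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def objects_for_range(offset, length, object_size=OBJECT_SIZE):
    """List the object indexes touched by a read or write of `length` bytes."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if length <= 0:
        return []
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return list(range(first, last + 1))


if __name__ == "__main__":
    mib = 1024 * 1024
    print(locate(10 * mib))                   # object 2, 2 MiB into it
    print(objects_for_range(3 * mib, 2 * mib))  # spans objects 0 and 1
```

Since each object can be placed on different OSDs, a large sequential access spreads across many disks rather than landing on one.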

37

[Architecture overview slide repeated]

38

[Diagram labels: three nodes labeled M, CLIENT, 01100110, data, metadata]

39

Metadata Server
• Manages metadata for a POSIX-compliant shared filesystem
  • Directory hierarchy
  • File metadata (owner, timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to clients
• Only required for shared filesystem

Questions?

40

Federico Lucifredi

PM Director, Ceph

federico@redhat.com

@0xF2

redhat.com | ceph.com
