-
Resilient and Fast Persistent Container Storage
Leveraging Linux’s Storage Functionalities
Philipp Reisner, CEO LINBIT
-
LINBIT - the company behind it
• Developer of DRBD
• 100% founder owned
• Offices in Europe and the US
• Team of 30 highly experienced Linux experts
• Partner in Japan
-
Linux Storage Gems
LVM, RAID, SSD cache tiers, deduplication, targets & initiators
-
Linux's LVM
[Diagram: a Volume Group aggregating several physical volumes; logical volumes and a snapshot are allocated from it]
-
Linux's LVM
• based on device mapper
• original objects
  • PVs, VGs, LVs, snapshots
  • LVs can scatter over PVs in multiple segments
• thin LVs
  • thin pools are LVs
  • thin LVs live in thin pools
  • multiple snapshots became efficient!
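The classic and thin-provisioned objects above can be sketched with the standard LVM tools. Device names and the VG name (`/dev/sdb`, `/dev/sdc`, `vg0`) are illustrative; adjust them to your system:

```shell
# Build a VG from two physical volumes (assumed devices):
pvcreate /dev/sdb /dev/sdc
vgcreate vg0 /dev/sdb /dev/sdc

# Classic ("thick") LV plus a classic snapshot:
lvcreate -n data -L 100G vg0
lvcreate -s -n data_snap -L 10G vg0/data

# Thin pool (itself an LV) with a thin LV inside it:
lvcreate --type thin-pool -n tpool -L 200G vg0
lvcreate -n thin1 -V 500G --thinpool vg0/tpool

# Thin snapshot: no size argument needed, stays cheap even in numbers:
lvcreate -s -n thin1_snap vg0/thin1
```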
-
Linux's LVM
[Diagram: a VG built from several PVs; a thin pool LV hosts thin LVs and a thin snapshot LV]
-
Linux's RAID
• original MD code
  • mdadm command
• RAID levels: 0, 1, 4, 5, 6, 10
• now available in LVM as well
  • device mapper interface for the MD code
  • do not call it ‘dmraid’; that is software for hardware fake-RAID
  • lvcreate --type raid6 --size 100G VG_name
[Diagram: RAID1 mirroring blocks A1–A4 onto two disks]
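Both ways of driving the MD code can be sketched as follows; device and VG names are assumptions:

```shell
# Classic MD RAID1 via mdadm (assumed devices /dev/sdb, /dev/sdc):
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
cat /proc/mdstat                    # watch the initial sync

# The same MD code driven through LVM's device mapper interface:
lvcreate --type raid1 -m 1 -L 100G -n mirror vg0
lvcreate --type raid6 --size 100G -n r6 vg0
```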
-
SSD cache for HDD
• dm-cache
  • device mapper module
  • accessible via LVM tools
• bcache
  • generic Linux block device
  • slightly ahead in the performance game
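A dm-cache setup through the LVM tools might look like this; the SSD device and LV names are illustrative:

```shell
# Assumed: slow HDD PVs already in vg0, a fast SSD at /dev/nvme0n1.
vgextend vg0 /dev/nvme0n1

# Create a cache pool on the SSD and attach it to the slow LV:
lvcreate --type cache-pool -n cpool -L 100G vg0 /dev/nvme0n1
lvconvert --type cache --cachepool vg0/cpool vg0/data

# 'data' now reports segment type 'cache':
lvs -a -o name,segtype,devices vg0
```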
-
Linux’s DeDupe
• Virtual Data Optimizer (VDO), since RHEL 7.5
• Red Hat acquired Permabit and is GPLing VDO
• Linux upstreaming is in preparation
• in-line data deduplication
• kernel part is a device mapper module
• indexing service runs in user space
• async or synchronous writeback
• recommended to be used below LVM
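A minimal VDO setup, assuming the RHEL 7.5 `vdo` package and an illustrative backing device `/dev/sdb`:

```shell
# Create a VDO volume; the logical size may exceed the physical size,
# since dedupe and compression are expected to make up the difference:
vdo create --name=vdo0 --device=/dev/sdb --vdoLogicalSize=10T

# As recommended, layer LVM on top of the VDO device:
pvcreate /dev/mapper/vdo0
vgcreate vg_vdo /dev/mapper/vdo0

# Inspect space savings:
vdostats --human-readable
```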
-
Linux’s targets & initiators
• Open-iSCSI initiator
• IET, STGT, SCST
  • mostly historical
• LIO
  • iSCSI, iSER, SRP, FC, FCoE
  • SCSI pass-through, block IO, file IO, user-specific IO
• NVMe-oF
  • target & initiator
[Diagram: initiator sends IO requests to target; target returns data/completions]
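Exporting a block device via LIO and attaching to it with Open-iSCSI can be sketched like this; the IQN, IP address and backing LV are illustrative:

```shell
# Target side (LIO via targetcli):
targetcli /backstores/block create name=disk0 dev=/dev/vg0/data
targetcli /iscsi create iqn.2018-06.com.example:disk0
targetcli /iscsi/iqn.2018-06.com.example:disk0/tpg1/luns create /backstores/block/disk0

# Initiator side (Open-iSCSI):
iscsiadm -m discovery -t sendtargets -p 192.168.0.10
iscsiadm -m node -T iqn.2018-06.com.example:disk0 -p 192.168.0.10 --login
```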
-
ZFS on Linux
• Ubuntu eco-system only
• has its own
  • logical volume manager (zvols)
  • thin provisioning
  • RAID (RAID-Z)
  • caching for SSDs (ZIL, SLOG)
  • and a file system!
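The features above map onto a few `zpool`/`zfs` commands; disk names are assumptions:

```shell
# RAID-Z pool from three assumed disks:
zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd
zfs set compression=lz4 tank

# A zvol -- ZFS's block-device counterpart to an LV;
# -s makes it thin-provisioned (sparse):
zfs create -s -V 100G tank/zvol1    # appears as /dev/zvol/tank/zvol1

# Dedicated SSD as separate log device (SLOG) for the ZIL:
zpool add tank log /dev/nvme0n1
```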
-
Put in simplest form
-
DRBD – think of it as ...
[Diagram: DRBD as RAID1 over a network: IO requests and data/completions travel an initiator/target path, mirroring blocks A1–A4 onto a local and a remote disk]
-
DRBD Roles: Primary & Secondary
[Diagram: writes replicate from the Primary to the Secondary]
-
DRBD – multiple Volumes
• consistency group
[Diagram: a Primary replicating several volumes to a Secondary as one consistency group]
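A multi-volume resource (one consistency group) can be sketched as a DRBD 9-style configuration file; the hostnames, IP addresses and backing LVs below are illustrative:

```
resource r0 {
    volume 0 {
        device    /dev/drbd0;
        disk      /dev/vg0/web_data;
        meta-disk internal;
    }
    volume 1 {
        device    /dev/drbd1;
        disk      /dev/vg0/web_logs;
        meta-disk internal;
    }
    on alpha {
        address 10.0.0.1:7789;
        node-id 0;
    }
    on bravo {
        address 10.0.0.2:7789;
        node-id 1;
    }
}
```

Both volumes replicate over the same connection, so their write order is preserved across the pair as a group.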
-
DRBD – up to 32 replicas
• each may be synchronous or async
[Diagram: one Primary replicating to two Secondaries]
-
DRBD – Diskless nodes
• intentional diskless (no change-tracking bitmap)
• disks can fail
[Diagram: one Primary and two Secondaries; a node can participate without local storage]
-
DRBD - more about
• a node knows the version of the data it exposes
• automatic partial resync after connection outage
• checksum-based verify & resync
• split-brain detection & resolution policies
• fencing
• quorum
• multiple resources per node possible (1000s)
• dual Primary for live migration of VMs only!
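Day-to-day operation of such a resource revolves around a handful of `drbdadm` commands; the resource name `r0` is illustrative:

```shell
drbdadm up r0           # attach the backing disk and connect to peers
drbdadm status r0       # roles, disk states, replication state
drbdadm primary r0      # promote this node; it now exposes the data
drbdadm verify r0       # checksum-based online verify against a peer
drbdadm secondary r0    # demote again
```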
-
DRBD Roadmap
• performance optimizations (2018)
  • meta-data on PMEM/NVDIMMs
  • zero-copy receive on diskless (RDMA transport)
  • no context-switch send (RDMA & TCP transport)
• Eurostars grant: DRBD4Cloud
  • erasure coding (2019)
-
The combination is more than the sum of its parts
-
LINSTOR - goals
• storage built from generic (x86) nodes
• for SDS consumers (K8s, OpenStack, OpenNebula)
• building on existing Linux storage components
  • LVM, thin LVM or ZFS for volume management (Stratis later)
• multiple tenants possible
• deployment architectures
  • distinct storage nodes
  • hyperconverged with hypervisors / container hosts
• Open Source, GPL
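Provisioning a replicated volume through LINSTOR might look like the following sketch; node names, IPs, pool names and the exact CLI verbs are assumptions based on the 2018-era `linstor` client:

```shell
# Register two nodes with the controller (illustrative names/IPs):
linstor node create alpha 10.0.0.1
linstor node create bravo 10.0.0.2

# Back each node with a thin LVM pool (assumed vg0/tpool):
linstor storage-pool create lvmthin alpha pool1 vg0/tpool
linstor storage-pool create lvmthin bravo pool1 vg0/tpool

# Define a resource and let LINSTOR place two DRBD replicas:
linstor resource-definition create web_data
linstor volume-definition create web_data 100G
linstor resource create web_data --auto-place 2 --storage-pool pool1
```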
-
LINSTOR
[Diagram: VMs on hypervisor nodes consume volumes replicated by DRBD across dedicated storage nodes]
-
LINSTOR w. failed Hypervisor
[Diagram: one hypervisor has failed; its VMs can be restarted on another hypervisor, since the storage nodes still hold the data]
-
LINSTOR w. failed storage node
[Diagram: one storage node has failed; VMs continue running from the DRBD replicas on the remaining storage nodes]
-
LINSTOR - Hyperconverged
[Diagram: six hyperconverged nodes, each combining the hypervisor and storage roles]
-
LINSTOR - VM migrated
[Diagram: a VM has been migrated to another hyperconverged node]
-
LINSTOR - add local storage
[Diagram: a local replica is added on the node now running the VM]
-
LINSTOR - remove 3rd copy
[Diagram: the now-redundant third copy is removed]
-
LINSTOR Architecture
-
LINSTOR Roadmap
• Swordfish API
  • volume management
  • access via NVMe-oF
  • inventory sync from Redfish/Swordfish
• support for multiple sites & DRBD Proxy (Dec 2018)
• north-bound drivers
  • Kubernetes, OpenStack, OpenNebula, Proxmox, XenServer
-
Case study - Intel
LINBIT working together with Intel
LINSTOR is a storage orchestration technology that brings storage from generic Linux servers and SNIA Swordfish enabled targets to containerized workloads as persistent storage. LINBIT is working with Intel to develop a Data Management Platform that includes a storage backend based on LINBIT’s software. LINBIT adds support for the SNIA Swordfish API and NVMe-oF to LINSTOR.
Intel® Rack Scale Design (Intel® RSD) is an industry-wide architecture for disaggregated, composable infrastructure that fundamentally changes the way a data center is built, managed, and expanded over time.
-
Thank you
https://www.linbit.com