brief introduction of drbd in sle12sp2

Introduction of DRBD Nick Wang HA Team [email protected]

Upload: nick-wang

Post on 22-Jan-2018


Page 1: brief introduction of drbd in SLE12SP2

Introduction of DRBD

Nick Wang, HA Team

[email protected]

Page 2: brief introduction of drbd in SLE12SP2

Overview

• What is DRBD

• Development status

• How to use DRBD

• Key features of DRBD

• Packages & Environment

• State of DRBD

• Basic structure

• Metadata (MD)

• What happens when a resource starts

Page 3: brief introduction of drbd in SLE12SP2

What is DRBD?

Page 4: brief introduction of drbd in SLE12SP2

Distributed Replicated Block Device

Page 7: brief introduction of drbd in SLE12SP2

Dual primary (needs shared file system support: OCFS2/GFS)

Page 8: brief introduction of drbd in SLE12SP2

Development status

Page 9: brief introduction of drbd in SLE12SP2

DRBD & Kernel

• drbd.ko – already included in the kernel, but it lags behind the version shipped in our distribution
  Kernel 2.6.33 → 8.3.7
  Kernel 3.12 → 8.4.6 (SLE12 SP1 as KMP)
  Kernel 4.2 → 8.4.X
  Kernel 4.4 → 9.0.1 (SLE12 SP2 as KMP)

• DRBD – developed and maintained by LINBIT
  Versions: 8.0–8.3.x, 8.4.x, 9.0.x
  Other tools: drbd-utils, drbd-doc, drbd-test, drbdmanage

Page 10: brief introduction of drbd in SLE12SP2

How to use DRBD

Page 11: brief introduction of drbd in SLE12SP2

Demo time!!
- DRBD8 (147.2.207.59/154)
- DRBD9 (147.2.212.220/144/107)
- DRBD with HA cluster

Page 12: brief introduction of drbd in SLE12SP2

Preparation

• 1) Create or provide a block device for DRBD.

2) Distribute the DRBD configuration files to all nodes.

3) Open the ports that DRBD needs.

4) Create the metadata.

5) Trigger the initial synchronization.
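
As a rough illustration of steps 4 and 5, the sketch below only builds the drbdadm command lines and prints them; it does not run anything. The helper name `initial_setup_commands` is hypothetical, and the resource name "test" is taken from the configuration slides. It assumes `drbdadm primary --force` is used on exactly one node to start the initial synchronization.

```python
import shlex


def initial_setup_commands(resource: str, dry_run: bool = True) -> list[list[str]]:
    """Build the drbdadm calls for steps 4 and 5 (hypothetical helper)."""
    # With --dry-run, drbdadm only prints what it would do.
    base = ["drbdadm"] + (["--dry-run"] if dry_run else [])
    return [
        base + ["create-md", resource],           # step 4: run on every node
        base + ["up", resource],                  # bring the resource up, every node
        base + ["primary", "--force", resource],  # step 5: run on ONE node only
    ]


if __name__ == "__main__":
    for argv in initial_setup_commands("test"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before running them by hand on the right nodes.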

Page 13: brief introduction of drbd in SLE12SP2

Configuration in DRBD8

• “test.res” in /etc/drbd.d/

resource test {
    protocol C;
    disk {
        on-io-error pass_on;
    }
    on node-1 {
        address   147.2.207.187:7792;
        device    /dev/drbd0;
        disk      /dev/vdb;
        meta-disk internal;
    }
    on node-2 {
        address   147.2.207.199:7792;
        device    /dev/drbd0;
        disk      /dev/vdb;
        meta-disk internal;
    }
}

Page 14: brief introduction of drbd in SLE12SP2

Configuration in DRBD9

• “test.res” in /etc/drbd.d/

resource test {
    net {
        protocol C;
    }
    connection-mesh {
        hosts node-1 node-2 node-3;
    }
    on node-1 {
        address   10.161.155.151:7788;
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        node-id   0;
    }
    on node-2 {
        address   10.161.155.158:7788;
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        node-id   1;
    }
    on node-3 {
        address   10.161.155.159:7788;
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk /dev/sdc1;
        node-id   2;
    }
}

Page 15: brief introduction of drbd in SLE12SP2

Crm configuration

• crm configure

crm(live)configure# primitive drbd_test ocf:linbit:drbd \
    params drbd_resource="test" \
    op monitor interval="29s" role="Master" \
    op monitor interval="31s" role="Slave"
crm(live)configure# ms ms_drbd_test drbd_test \
    meta master-max="1" master-node-max="1" \
    clone-max="2" clone-node-max="1" \
    notify="true"
crm(live)configure# commit
crm(live)configure# exit

Page 16: brief introduction of drbd in SLE12SP2

Key features of DRBD

Page 17: brief introduction of drbd in SLE12SP2

Replication modes

• ...
net {
    protocol C;
}
…

Fully synchronous mode (LAN): Protocol C
Asynchronous mode (WAN): Protocol A (normally used in Geo scenarios)
Semi-synchronous (memory synchronous) mode: Protocol B
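
The difference between the protocols is the point at which a write on the primary counts as completed. The sketch below models that as a simplified reading of the three modes, not as DRBD code; `WriteProgress` and `write_completed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class WriteProgress:
    local_disk_done: bool     # write finished on the local backing device
    in_tcp_send_buffer: bool  # replication packet handed to the local TCP stack
    peer_received: bool       # packet has reached the peer node
    peer_disk_done: bool      # peer has written the block to its own disk


def write_completed(protocol: str, p: WriteProgress) -> bool:
    """Return True once the write may be reported as complete to the application."""
    if protocol == "A":   # asynchronous
        return p.local_disk_done and p.in_tcp_send_buffer
    if protocol == "B":   # semi-synchronous (memory synchronous)
        return p.local_disk_done and p.peer_received
    if protocol == "C":   # fully synchronous
        return p.local_disk_done and p.peer_disk_done
    raise ValueError(f"unknown protocol: {protocol}")
```

With protocol A, a write can complete while the data is still only in the local send buffer, which is why it tolerates WAN latency but can lose the most recent writes on fail-over.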

Page 18: brief introduction of drbd in SLE12SP2

Online device verification

• DRBD permits the verification of local and peer devices in an online fashion.

DRBD doesn't move data between nodes to validate but instead moves cryptographic digests of the data (hash). In this way, a node computes a hash of a block; transfers the much smaller signature to the peer node, which also calculates the hash; and then compares them. If the hashes are the same, the blocks are properly replicated. But if the hashes differ, the out-of-date block is marked as out of sync, and subsequent synchronization ensures that the block is properly synchronized.
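
The digest comparison described above can be sketched in a few lines. This is only an illustration: it uses SHA-1 from Python's hashlib and a 4 KiB block size as assumptions, and the function names are hypothetical. DRBD itself does this in the kernel with the configured verify algorithm.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def block_digests(data: bytes) -> list[bytes]:
    """Compute one digest per block; only these travel over the network."""
    return [
        hashlib.sha1(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def out_of_sync_blocks(local: bytes, peer_digests: list[bytes]) -> list[int]:
    """Compare local digests with the peer's and return mismatching block numbers."""
    local_digests = block_digests(local)
    return [
        i for i, (a, b) in enumerate(zip(local_digests, peer_digests)) if a != b
    ]


if __name__ == "__main__":
    local = b"a" * (3 * BLOCK_SIZE)
    peer = bytearray(local)
    peer[5000] = ord("b")  # corrupt one byte inside block 1
    print(out_of_sync_blocks(local, block_digests(bytes(peer))))  # prints [1]
```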

Page 19: brief introduction of drbd in SLE12SP2

Automatic recovery

• After a node or connectivity failure, DRBD resynchronizes automatically and determines both the direction and the amount of data to transfer. DRBD can also recover from a wide variety of errors, but one of the most insidious is the so-called "split brain" situation.

1) Discarding modifications made on the younger primary.
2) Discarding modifications made on the older primary.
3) Discarding modifications on the primary with fewer changes.
4) Graceful recovery from split brain if one host has had no intermediate changes. (Recommended)

...
handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    ...
}
net {
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    ...
}
...
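
The four strategies listed above map to the after-sb-0pri policy names. The sketch below shows how each policy picks the node whose changes are discarded. It is a simplified model, not DRBD's implementation; the `Node` fields and the `victim` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    became_primary_at: float  # hypothetical timestamp of the last promotion
    changed_blocks: int       # blocks modified since the nodes lost contact


def victim(policy: str, a: Node, b: Node) -> Optional[Node]:
    """Return the node whose changes are discarded, or None to stay disconnected."""
    if policy == "discard-younger-primary":
        return max(a, b, key=lambda n: n.became_primary_at)
    if policy == "discard-older-primary":
        return min(a, b, key=lambda n: n.became_primary_at)
    if policy == "discard-least-changes":
        return min(a, b, key=lambda n: n.changed_blocks)
    if policy == "discard-zero-changes":
        for n in (a, b):
            if n.changed_blocks == 0:
                return n      # nothing is lost by resyncing this node
        return None           # both sides changed: no automatic recovery
    raise ValueError(f"unknown policy: {policy}")
```

Only discard-zero-changes never throws away data, which is why it is the recommended choice.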

Page 20: brief introduction of drbd in SLE12SP2

Optimizing synchronization

• Two of the schemes that DRBD uses are activity logs and the quick-sync bitmap.

The activity log records the blocks that were recently written to and defines which blocks need to be synchronized after a failure is resolved. The quick-sync bitmap defines the blocks that are in sync (or out of sync) during a time of disconnection. When the nodes are reconnected, synchronization can use this bitmap to quickly synchronize the nodes to be exact replicas of one another.

Page 21: brief introduction of drbd in SLE12SP2

New features of DRBD9

• 1) Multi-Node replication.

2) Up to 31 connections per resource, which means clusters of up to 32 nodes are supported.

3) Auto promote.

4) Transport abstraction layer, e.g. drbd_transport_tcp.ko. It allows for RDMA over Ethernet/InfiniBand.

5) New management tool: drbdmanage

Page 22: brief introduction of drbd in SLE12SP2

Packages & Environment

Page 23: brief introduction of drbd in SLE12SP2

DRBD Packages in SLE12SP2

• Project drbd:
  drbd (COPYING, ChangeLog)
  drbd-kmp-default (drbd.ko, drbd_transport_tcp.ko)

Project drbd-utils:
  drbd-utils (drbdadm, drbdmeta, drbdsetup, etc.)

Project yast2-drbd:
  yast2-drbd

Page 24: brief introduction of drbd in SLE12SP2

Threads of DRBD

• After the kernel module is loaded:
  kthread drbd_reissue        PR: 0

Per resource, once it is started and connected:
  drbd<minor>_submit          PR: 0
  drbd_w(orker)_<res>         PR: 20
  drbd_r(eceiver)_<res>       PR: 20
  drbd_a(ck_receiver)_<res>   PR: -3
  drbd_s(ender)_<res>         PR: 20

Page 25: brief introduction of drbd in SLE12SP2

State

Page 26: brief introduction of drbd in SLE12SP2

Resource roles

• Primary: may be read from and written to

Secondary: normally receives updates from its peer, but may neither be read from nor written to

Unknown: It is only displayed for the peer’s resource role, and only in disconnected mode

Page 27: brief introduction of drbd in SLE12SP2

Disk states

• Diskless: No local block device has been assigned to the DRBD driver

Attaching: Reading meta data. Next → Consistent/Inconsistent/…

Failed: I/O failure reported by local block device. Next → Diskless

Consistent/Inconsistent: The node's data is consistent / the data needs a sync.

UpToDate/Outdated: Decided when the connection is established.

DUnknown: Used for the peer disk if there is no network connection.

Page 28: brief introduction of drbd in SLE12SP2

Connection states

• StandAlone: The resource has not yet been connected.

Disconnecting: Temporary state, Next → StandAlone.

Unconnected: Temporary state, Next → WFConnection.

Timeout/NetworkFailure/ProtocolError: Connection Errors.

Teardown: Temporary state, Next → Unconnected.

WFConnection: waiting until the peer node becomes visible.

Connected: connection has been established.

Others: StartingSyncS/StartingSyncT, WFBitMapS/WFBitMapT, SyncSource/SyncTarget, PausedSyncS/PausedSyncT, VerifyS/VerifyT
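
The "Next →" notes for the temporary states above can be read as a small transition table. The sketch below follows those arrows until a non-temporary state is reached. It covers only the transitions named on this slide; `settle` is a hypothetical helper.

```python
# Temporary connection states and the state each one leads to (from the slide).
NEXT_STATE = {
    "Disconnecting": "StandAlone",
    "Unconnected": "WFConnection",
    "Teardown": "Unconnected",
}


def settle(state: str) -> str:
    """Follow the 'Next' arrows until the state is no longer temporary."""
    while state in NEXT_STATE:
        state = NEXT_STATE[state]
    return state


if __name__ == "__main__":
    print(settle("Teardown"))       # Teardown -> Unconnected -> WFConnection
    print(settle("Disconnecting"))  # Disconnecting -> StandAlone
```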

Page 29: brief introduction of drbd in SLE12SP2

Basic data structure

Page 30: brief introduction of drbd in SLE12SP2

DRBD resources

• A node has a number of DRBD resources. Each such resource has a number of devices (volumes) and connections to other nodes. Each device has a unique minor device number.

This relationship is represented by the global variable drbd_resources, the drbd_resource, drbd_connection, drbd_device, and drbd_peer_device objects, and their interconnections.

| resource   | device      | … | device      |
| connection | peer_device | … | peer_device |
| …          | …           | … | …           |
| connection | peer_device | … | peer_device |

All of these are updated in an RCU-safe way, protected by the resource->conf_update mutex.
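
The matrix above can be mirrored with a few plain classes: one row per connection, one column per device, and a peer_device at each crossing. This is only a sketch of the relationships; the field and method names are hypothetical and do not match the kernel structs.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    volume: int  # volume number within the resource
    minor: int   # unique minor device number, e.g. 0 for /dev/drbd0


@dataclass
class PeerDevice:
    volume: int     # which local device this entry belongs to
    peer_node: str  # which connection this entry belongs to


@dataclass
class Connection:
    peer_node: str
    peer_devices: list[PeerDevice] = field(default_factory=list)


@dataclass
class Resource:
    name: str
    devices: list[Device] = field(default_factory=list)
    connections: list[Connection] = field(default_factory=list)

    def add_connection(self, peer_node: str) -> Connection:
        """Add one row to the matrix: a peer_device for every local device."""
        conn = Connection(
            peer_node,
            [PeerDevice(d.volume, peer_node) for d in self.devices],
        )
        self.connections.append(conn)
        return conn
```

A resource with two volumes and two peers therefore ends up with four peer_device entries.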

Page 31: brief introduction of drbd in SLE12SP2

Metadata

Page 32: brief introduction of drbd in SLE12SP2

Metadata includes:

• Information like size of the DRBD device

Generation Identifier

Activity Log

Quick-sync bitmap

Page 33: brief introduction of drbd in SLE12SP2

Activity log

• Consider a write that goes to the local backing device and is sent over the network at the same time. If the primary node fails at that moment and a fail-over is initiated, this data block is out of sync between the nodes.

The activity log keeps track of those blocks that have "recently" been written to.

So only the blocks covered by the activity log need to be synchronized after the connection resumes.
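
One way to picture the activity log is as a bounded set of "hot" extents with least-recently-used eviction. The sketch below assumes 4 MiB extents and ignores details such as in-flight I/O and the on-disk transaction format; the class and method names are hypothetical.

```python
from collections import OrderedDict

AL_EXTENT_SIZE = 4 * 1024 * 1024  # assumed size of one activity-log extent


class ActivityLog:
    def __init__(self, max_extents: int):
        self.max_extents = max_extents
        self._hot: "OrderedDict[int, bool]" = OrderedDict()

    def record_write(self, offset: int) -> None:
        """Mark the extent containing this byte offset as recently written."""
        extent = offset // AL_EXTENT_SIZE
        self._hot[extent] = True
        self._hot.move_to_end(extent)
        if len(self._hot) > self.max_extents:
            self._hot.popitem(last=False)  # evict the least recently used extent

    def extents_to_resync(self) -> list[int]:
        """After a primary crash, only these extents need to be resynchronized."""
        return sorted(self._hot)


if __name__ == "__main__":
    al = ActivityLog(max_extents=2)
    for offset in (0, 5 * 1024 * 1024, 9 * 1024 * 1024):
        al.record_write(offset)
    print(al.extents_to_resync())  # prints [1, 2]
```

A larger log means fewer metadata updates during normal operation but more data to resync after a crash.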

Page 34: brief introduction of drbd in SLE12SP2

Quick sync bitmap (per node)

• Kept on a per-resource, per-peer basis to track the blocks that are out of sync.

One bit represents a 4-KiB chunk of on-disk data

The bitmap is normally changed only in memory. It is written to disk when changes drop out of the activity log or when the resource is about to be taken down.
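
Since one bit covers 4 KiB, the bitmap size per peer follows directly from the device size. The small calculation below shows the arithmetic; it ignores any alignment or padding that the on-disk metadata layout may add, and `bitmap_bytes` is a hypothetical helper.

```python
BLOCK = 4096  # bytes of on-disk data tracked by one bitmap bit


def bitmap_bytes(device_bytes: int) -> int:
    """Bytes needed for one quick-sync bitmap (one peer) of the given device size."""
    bits = -(-device_bytes // BLOCK)  # ceiling division: one bit per 4 KiB
    return -(-bits // 8)              # ceiling division: eight bits per byte


if __name__ == "__main__":
    one_tib = 1 << 40
    print(bitmap_bytes(one_tib) // (1 << 20), "MiB")  # prints 32 MiB
```

With several peers, as in DRBD9, this amount is needed once per peer.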

Page 35: brief introduction of drbd in SLE12SP2

Generation Identifier

• Determining whether the two nodes are in the same cluster

Determining whether a sync is needed, and in which direction

Identifying split brain

A GI is a list consisting of:
  Current UUID
  Bitmap UUIDs
  Historical UUIDs * 2

Three main ways a new GI is generated:
1) An initial sync happens; both sides use the GI of the SyncSource.
2) A Secondary is promoted to Primary while the connection state is disconnected.
3) The original Primary generates a new GI when disconnecting; the Secondary stays unchanged.

Other cases exist, such as disconnecting during a state change...
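
To show how these UUIDs can answer the three questions above, here is a heavily simplified comparison. It assumes a single bitmap UUID per node (DRBD8 style), uses 0 for "empty", and leaves out many cases the real code handles. The `GI` class, the `compare` function, and the result strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GI:
    current: int
    bitmap: int = 0            # 0 means "empty" in this sketch
    history: tuple = ()        # older current UUIDs


def compare(local: GI, peer: GI) -> str:
    """Decide, from the local node's point of view, what to do on connect."""
    if local.current == peer.current:
        return "no-sync"             # same data generation
    if local.bitmap and local.bitmap == peer.current:
        return "sync-source"         # local wrote while the peer was away
    if peer.bitmap and peer.bitmap == local.current:
        return "sync-target"         # peer wrote while local was away
    if local.current in peer.history:
        return "full-sync-target"    # related, but the peer's bitmap is unusable
    if peer.current in local.history:
        return "full-sync-source"
    if local.bitmap and local.bitmap == peer.bitmap:
        return "split-brain"         # both diverged from a common generation
    return "unrelated"               # not the same cluster: disconnect
```

For example, if the Primary keeps running while the Secondary is down, the Primary's bitmap UUID equals the Secondary's current UUID, so the Primary becomes the sync source.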

Page 36: brief introduction of drbd in SLE12SP2

$ drbdadm up <res>: what happens?

Page 37: brief introduction of drbd in SLE12SP2

Stages:

• CFG_PREREQ
  CFG_RESOURCE
  CFG_DISK_PREP_DOWN/CFG_DISK_PREP_UP
  CFG_NET_DISCONNECT/CFG_NET_CONNECT
  CFG_NET_PREP_DOWN/CFG_NET_PREP_UP
  CFG_NET_PATH
  CFG_NET
  …

For drbdadm up <res>, the scheduled stages are:
  CFG_NET_PREP_UP
  CFG_NET_PATH
  CFG_NET_CONNECT
  CFG_PEER_DEVICE
  CFG_DISK_PREP_UP
  CFG_DISK

Page 38: brief introduction of drbd in SLE12SP2

Appendices

Page 39: brief introduction of drbd in SLE12SP2

Links

• Linbit homepage: http://www.drbd.org/en/

Source code in tarball: http://www.drbd.org/en/community/download

Git repos: http://git.linbit.com/
