Introduction to DRBD
DESCRIPTION
Talk about DRBD at Sudoers Barcelona's October meeting.
TRANSCRIPT
Sudoers Barcelona, October 2013
Alba Ferrer
What is it?
Distributed Replicated Block Device
Software-based, shared-nothing replicated storage solution mirroring the contents of block devices
• In real time
• Transparently
• Synchronously/asynchronously
Kernel module
User space admin tools
• drbdsetup
  • Used to configure the kernel module
  • All parameters on the command line
• drbdmeta
  • Create/dump/restore/modify DRBD metadata
• drbdadm
  • High-level frontend for drbdsetup/drbdmeta
  • Reads from /etc/drbd.conf
  • Has a dry-run option (-d)
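The dry-run option makes the layering visible: drbdadm prints the low-level drbdsetup/drbdmeta calls it would issue instead of running them. A minimal sketch, assuming the mysql resource from the configuration example and a machine with DRBD installed (these commands need root and the kernel module, so they are shown for illustration only):

```
# Show which drbdsetup/drbdmeta commands drbdadm would run, without running them
drbdadm -d adjust mysql

# Typical first-time bring-up of a resource (on both nodes)
drbdadm create-md mysql   # initialize the metadata area (drbdmeta under the hood)
drbdadm up mysql          # attach the disk and connect to the peer
```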
Resources
• A particular replicated storage device
• Resource name
• DRBD device: virtual block device (major number 147).
  The associated block device is always /dev/drbdm (m = minor number)
• Disk configuration: local copy of the data
• Network configuration: communication with the peer
Configuration
Per resource (/etc/drbd.d/mysql.res):

resource mysql {
  device minor 0; # /dev/drbd0
  disk /dev/sdb;
  meta-disk internal;
  on alice {
    address 192.168.133.111:7000;
  }
  on bob {
    address 192.168.133.112:7000;
  }
  syncer {
    rate 10M; # static resync rate of 10 MByte/s
  }
}
Configuration
Global (/etc/drbd.d/global_common.conf):

global {
  usage-count yes;
}
common {
  protocol C;
  disk {
    on-io-error detach;
  }
  syncer {
    al-extents 3833;
  }
}
Resource roles
• Primary: read and write operations
• Secondary: receives updates from the primary, disallows any other access
• Promotion (secondary to primary): drbdadm primary all
• Demotion (primary to secondary): drbdadm secondary all
Modes
• Single-primary
• Dual-primary (>= 8.0)
• Replication modes:
  • Protocol A: asynchronous
  • Protocol B: memory synchronous
  • Protocol C: synchronous
Features: efficient synchronization
• Synchronization != replication
• Inconsistent remote dataset during sync
  • Useless
• Service on the active node unaffected
• Synchronization and replication happen at the same time
Features: efficient synchronization
• Only one write operation per block, no matter how many successive writes the active node made to it
• Linear access to blocks
• Configurable resync rate
• Checksum-based synchronization
Features: data verification
• On-line device verification
  • Block-by-block data integrity check between nodes
• Replication traffic integrity checking
  • End-to-end message integrity checking using cryptographic message digest algorithms
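Both checks are switched on in the net section of the resource (or common) configuration; a sketch, where sha1 is just one example digest (any algorithm the kernel crypto API provides will do):

```
net {
  verify-alg sha1;          # enables on-line verification: drbdadm verify <resource>
  data-integrity-alg sha1;  # checksums every replicated block on the wire
}
```

Verification is then started with drbdadm verify mysql; out-of-sync blocks are reported in the kernel log and get resynchronized on the next disconnect/reconnect of the resource.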
Features: disk
• Support for disk flushes
• Disk error handling strategies
  • Passing
  • Masking
  • DIY
• Dealing with outdated data
  • DRBD won't promote an outdated resource -> fencing
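Fencing is configured per resource; a sketch using the Pacemaker helper scripts shipped with DRBD (the script paths are the usual defaults and may vary by distribution):

```
disk {
  fencing resource-only;  # fence the peer before promoting possibly stale data
}
handlers {
  fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
  after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
```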
Features: replication
• Three-way replication
• Long-distance replication with DRBD Proxy
  • Not free
• Truck-based replication
Split-brain
Split brain is a situation where, due to a temporary failure of all network links between the cluster nodes, and possibly due to intervention by cluster management software or human error, both nodes switched to the primary role while disconnected.
Split-brain
• Configurable notifications
• Automatic recovery methods
  • Discard modifications on the 'younger' primary
  • Discard modifications on the 'older' primary
  • Discard modifications on the primary with fewer changes
  • Graceful recovery if one primary had no changes
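The notification handler and the recovery policies map directly onto configuration options; a sketch assuming the notification script shipped with DRBD, where the after-sb-* values chosen are just examples of the policies listed above:

```
handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";  # mail root on detection
}
net {
  # What to do when split brain is detected, depending on how many
  # nodes are in the primary role at that moment:
  after-sb-0pri discard-zero-changes;  # graceful: drop the side with no changes
  after-sb-1pri discard-secondary;     # keep the current primary's data
  after-sb-2pri disconnect;            # two primaries: give up, recover manually
}
```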
Metadata
• Various pieces of information about the data, which DRBD keeps in a dedicated area:
  • The size of the DRBD device
  • The generation identifier
  • The activity log
  • The quick-sync bitmap
Metadata
• Can be stored internally or externally
• Size:

root@bob:~ # blockdev --getsz /dev/drbd0
8388280

ceil(8388280 / 2^18) * 8 + 72 = 328 sectors
328 sectors = 0.16 MB
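The slide's arithmetic can be checked directly in the shell; a small sketch where integer division with a 262143 offset stands in for the ceiling in the formula:

```shell
SECTORS=8388280   # sector count reported by blockdev --getsz above

# ceil(SECTORS / 2^18) * 8 + 72, in integer arithmetic (2^18 = 262144)
MD_SECTORS=$(( (SECTORS + 262143) / 262144 * 8 + 72 ))

echo "$MD_SECTORS sectors"            # 328 sectors
echo "$(( MD_SECTORS * 512 )) bytes"  # 167936 bytes, i.e. ~0.16 MB
```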
What it’s not / What it can’t do
• It’s not a backup system
• It can’t add features to upper layers
  • DRBD cannot auto-detect file system corruption
  • DRBD cannot add active-active clustering capability to file systems like ext3 or XFS
Limitations
• Only two nodes
  • Stacked resources
  • Version 9
• There is no automatic failover
  • Promotion/demotion is manual
  • Needs a CRM to be useful
PACEMAKER FTW
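Under Pacemaker, a DRBD resource becomes a master/slave clone and promotion/demotion is automated; a sketch in crm shell syntax, assuming the mysql resource from earlier (the names p_drbd_mysql and ms_drbd_mysql are just illustrative):

```
primitive p_drbd_mysql ocf:linbit:drbd \
    params drbd_resource="mysql" \
    op monitor interval="15s" role="Master" \
    op monitor interval="30s" role="Slave"
ms ms_drbd_mysql p_drbd_mysql \
    meta master-max="1" master-node-max="1" \
         clone-max="2" clone-node-max="1" notify="true"
```

Pacemaker then decides which node holds the primary (Master) role and issues the drbdadm primary/secondary calls itself.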
How it works
root@alice:/etc/drbd.d # cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 234a142f7cf5bb21ffa1e95afa4f31608089c8b8 build by buildsystem@linbit, 2012-09-12 14:27:28
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:152 nr:4 dw:156 dr:4017 al:5 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
More info
• drbd.org
• www.drbd.org/home/mailinglists
• www.linbit.com