EVA Architecture Introduction (Transcript)

© 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
EVA Architecture Overview
Rodger Daniels, EVA Replication Architect
Apr 8, 2023 2
Disclosure information
The following information is “HP Confidential” and is intended only for a limited audience within HP who fulfill a “need to know” requirement. The information contained herein is to be handled in accordance with HP’s policy for this classification of information.
http://legalweb.corp.hp.com/legal/files/labels.asp
This information may NOT be shared outside HP.
Online – HP Confidential – NSS
EVA Features
• EVA
– Excellent random R/W performance
– Excellent cache read-hit numbers
– Fault-tolerant, scalable virtualization mapping scheme (garbage-collection free)
– Mirrored write cache
– Volatile read cache
– Metadata in volatile memory (Policy Memory)
• Backend disks provide a non-volatile metadata store
– Replication features
• Snapclones, Snapshots (fully allocated, space efficient)
• CA disaster-tolerant remote replication
– RAID0, RAID1, RAID5
– Active mirroring between controllers through FC Mirror Port(s)
– GL – on-the-fly XOR
– XL – inline parity calculation
Architectural Discussion Objects
• NSC (Network Storage Controller)
– Refers to a controller from a VCS perspective
• VCS
– Virtual Controller Software (firmware)
– Becomes XCS for the XL family
• Physical Store
– An unused drive (in the process of becoming a usable part of the system, but still needing to be incorporated into an RSS)
• Volume
– A used disk drive; can accept customer data at this point
• Storage Cell
– EVA controllers, shelves and disks that have been initialized by the firmware. Can be logically constructed into Disk Groups (LDADs), Logical Disks and Virtual Disks, and then used for customer data
• Disk Group (LDAD – Logical Disk Address Domain)
– A group of disks that function as a separate storage pool. A virtual disk is contained within a single disk group and cannot span disk groups. A disk group is made up of one or more redundant store sets. User data for a virtual disk is striped across the entire disk group.
Architectural Discussion Objects
• Quorum
– A set of disks that contains copies of the SCS database
• Logical Disk
– Logical representation of a virtual disk; at the CS component level, the representation of a virtual disk
• Virtual Disk
– A virtual representation of a logical disk, for external use by a host
• Presented Unit
– The presentation of a virtual disk, i.e., it is mounted and usable by a host
• RSS (Redundant Store Set)
– A subset of disks within a disk group that represents a smaller fault domain than the disk group.
RSS Example
2C12D: one disk group containing all 64 drives, divided into eight RSSs (RSS 1 through RSS 8).
[Architecture block diagram: Host Port, SCMI Services, Cache Manager, RAID Services, FC Services, DRM (Core, Log, FC, Copy), Container Services (CS), Storage Cell State (SCS), Fault Manager and EXEC/RTOS, plus the EMU (Environmental Monitor Unit) and OCP (Operational Control Panel), all tied together by the Code Highway (CNODE). Front-end HP Tachyons, back-end device Tachyons and the mirror Tachyon carry FC traffic. Inter-component descriptors shown include HTB, EETB, XD, SEST/ERQ/IMQ, FED, MFCD, TDCB, DTD, SCSCB, EIRP/TEIRP, EIP, and the CSIO/CSLDREADY and ALLOC/DEALLOC interfaces.]
Architecture Component Overview
• Host Port
– Front-end FC services; decodes and sequences instructions; controller responses to host; assigns work to the Code Highway; passes commands to SCMI; supports the SCSI interface; handles AAA logic (V4)
• SCMI (Storage Cell Management Interface)
– Architected interface that allows external management agents (Command View/Bridge) to manage the EVA
• SCS (Storage Cell State)
– Inoperative/operative unit handling; SCMI requests to add/remove objects from the system; returning info about objects; unit presentation; pullover; failover; meltdowns; meltdown recovery; ILF disk management; system database; RSS management; add/remove devices; cell mastership; error reporting
• Cache Manager
– Read/write cache management; full-stripe writes; assigns work to RAID Services; RAID5 write recovery/parity recovery
• DRM (Data Replication Manager) – Continuous Access
– Remote disaster-tolerant replication
• Container Services
– Virtualization (map management); local replication (snapclone/snapshot); sparing; leveling
• RAID Services
– Services supporting RAID0, RAID1 and RAID5
• FCS
– Backend FC services; mirroring and DRM FCS support; disk drive handling
• FM (Fault Manager)
– Manages event logs, termination codes, etc.
Architecture Component Overview
• Host Port
• (SCMI) Storage Cell Management Interface
• (SCS) Storage Cell State
• Cache Manager
• Container Services
• Data Replication Manager (DRM) – Continuous Access
• FM – Fault Manager
Host Port
Host Port and EVA GL Operation
• EVA is made up of a controller pair
– 2 host ports per controller module
• One controller is the master and the other the slave
– Actions affecting storage cell structures and the database are restricted to the master controller
– An example is VDisk (LUN) creation
• EVA GL (VCS 3.XXX and earlier) is an asymmetrical virtual RAID controller
– Asymmetrical LUN access
• A unit is ready for read/write on one controller while it is not ready on the other controller
• Simultaneous access to a LUN is only supported via ports on the same controller
– One queue per LUN; ordering based on command arrival
• Host ports only support Fabric connection
– 1Gb and 2Gb switches supported
– Highest available link speed is auto-negotiated
EVA GL and Host Port IDs
• EVA Controller Pair
– Defined as a single node
• Assigned a SCSI-3 WWID
– Two control units, each containing two host ports
• Each host port is defined by a unique port WWID
– Node and port identifiers are 64-bit IEEE registered numbers, with a portion assigned by a company ID and the rest by an HP-specific method to ensure uniqueness of the identifiers.
Host Port Module
• Handles front-end FC services
• Decodes and sequences instructions
• Controller responses to host
• Assigns work to Code Highway
• Passes SCMI commands along to the SCMI module
• Supports SCSI interface
SCMI
SCMI (Storage Cell Management Interface)
• Architected interface used by external management agents (Command View/Bridge) to communicate with the EVA
• Communication via SCSI Send/Receive Diagnostics
– All SCMI commands are made through LUN0
– Commands come in via an SCMI command packet
– Responses go out via an SCMI response packet
– The original design limited a response to a single attribute
• To reduce message traffic, super SCMI commands were developed which return a lot of information via a single response
SCMI (Storage Cell Management Interface)
• The external management agent uses SCMIApi or RealSCMI to communicate with the EVA
• The SCMI server processes the command inside of VCS
• SEND DIAGNOSTIC command – uses page code 90 (vendor specific). Contains the SCMI command packet and command buffers (2). 64KB max buffer size.
• RECEIVE DIAGNOSTIC command – returns the result in the SCMI response packet and response buffers (2).
• The Host Port layer handles matching of the send/receive pair and rejecting illegal combinations.
• Built-in security mechanism based on an established password (the password is transmitted encrypted).
• The agent (client) must “log in” using the correct password to be able to send SCMI commands for execution
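The send/receive framing described above can be sketched roughly as follows. This is a toy illustration only: the page code (0x90) and the 64KB buffer limit come from the slide, but the field layout, the opcode value and the `build_scmi_send` helper are invented for illustration and are not the real SCMI wire format.

```python
import struct

SCMI_PAGE_CODE = 0x90        # vendor-specific diagnostic page (per the slide)
MAX_BUFFER = 64 * 1024       # 64KB max buffer size (per the slide)

def build_scmi_send(opcode: int, payload: bytes) -> bytes:
    """Pack a hypothetical SCMI command packet plus one command buffer."""
    if len(payload) > MAX_BUFFER:
        raise ValueError("command buffer exceeds 64KB limit")
    # page code, reserved byte, page length; then a toy command header
    header = struct.pack(">BBH", SCMI_PAGE_CODE, 0, 4 + len(payload))
    command = struct.pack(">HH", opcode, len(payload))
    return header + command + payload

pkt = build_scmi_send(0x0001, b"lookup-vdisk")
assert pkt[0] == SCMI_PAGE_CODE
```

The matching RECEIVE DIAGNOSTIC would unpack an SCMI response packet in the same fashion; the Host Port layer pairs the two.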
SCMI (Storage Cell Management Interface)
• Limitations
– The system processes one send/receive diagnostic at a time
– This means that when the system is synchronously executing a command via send/receive diagnostic, the next management command is held up until that command completes
– When a management command is held up, the management agent loses manageability of the array for that time
– Example: asynchronous background delete
• A consideration when designing commands that take a long time to execute
• See SCMI Spec sections 6.7, 5.2.5, 4.57.1
Storage Cell State (SCS)
SCS Functionality
• Storage Cell State (SCS – State)
– Inoperative/operative unit handling
– SCMI requests to add/remove objects from the system
– Return info about objects
– Unit presentation
– Pullover
– Failover
– Meltdowns
– Meltdown recovery
– ILF disk management
– State database (Object Store Management)
– RSS management
– Add/remove devices
– Cell mastership
– Error reporting
SCS Components
• Cell State Manager (CSM)
– Makes all state decisions; controls the state of the EVA
– Active only on the master controller
– Manages quorum disks
– Owns the SCS database
– SCMI command processing
– Cell realization
– Unit failover
• Cell Volume Manager (CVM)
– Volume transitions
– RSS membership
– Meltdown level
• Cell State Agent (CSA)
– Manipulates volatile data structures on behalf of the CSM
• Device Discovery
SCS Components
• Quorum Disks
– RSS0 is a special RSS that tracks the quorum disks
• It is the only RSS that has disks from multiple disk groups
• It is the only RSS whose disks are all members of other RSSs
– At least 5 disks mirrored, max 16; 1 per disk group, 1 per shelf
• The master owns the quorum drives; the slave cannot access them
• Read one, write all – n-way write
• The user is notified when all quorum disks are lost
• Special quorum disks called golden quorum are used in a single-controller configuration
• Kept in sync using an incarnation number
– In the event of a crash, check all incarnation numbers
– The SCS database resides on the quorum disks
– The SCS database keeps information about the current storage cell configuration
• Storage Cell, Disk Groups, VDisks, DR Groups
– Journals for metadata updates (can be a performance issue)
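The incarnation-number recovery rule above (after a crash, check all incarnation numbers) can be sketched as follows. The data shape and the `newest_quorum_copy` helper are illustrative assumptions, not the actual SCS database layout.

```python
def newest_quorum_copy(copies):
    """Pick the quorum database copy with the highest incarnation number.

    copies: list of (incarnation_number, database_blob) tuples, one per
    readable quorum disk.
    """
    if not copies:
        # the slide notes the user is notified when all quorum disks are lost
        raise RuntimeError("all quorum disks lost")
    return max(copies, key=lambda c: c[0])

copies = [(41, b"old"), (42, b"new"), (42, b"new")]
assert newest_quorum_copy(copies) == (42, b"new")
```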
RSS Management
• RSS Membership
– A disk is not available for storage if it is not a member of an RSS
– When new drives are added to the system they must be added to existing RSSs, or new RSSs must be created
– When drives are removed from the system, RSSs may need to be merged
• RSS Size
– RSSs are 6 to 12 drives
– When an RSS drops below 6 drives it will merge with another RSS to create a larger RSS
– When an RSS grows beyond 11 drives it will be split to create 2 RSSs
– A merge can force a split
– The optimal size targeted by the system is 8 drives
RSS Management
• RSS Goals
– Size is important
• Optimal size targeted by the system is 8
• Must be greater than 5 and less than 12
• When an RSS goes to 5 or less it is merged with another RSS, if another RSS is available
• When an RSS grows to 12 or greater it is split into two smaller RSSs of size 6 or greater
– Every member has a mirror partner
• Talk about VA R1 geometry vs EVA R1 geometry
– Mirror partners should be on different shelves
– RSS members should be on different shelves
– Mirror partners should be the same size
– RSS members should be the same size
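The merge/split thresholds above reduce to a simple rule, sketched here. The function name and return values are illustrative, not firmware interfaces:

```python
OPTIMAL_RSS_SIZE = 8   # target size stated on the slide

def rss_action(size: int, other_rss_available: bool) -> str:
    """Decide whether an RSS of the given size should merge or split."""
    if size <= 5:
        # too small: merge, but only if another RSS exists to merge with
        return "merge" if other_rss_available else "none"
    if size >= 12:
        # too large: split into two RSSs, each of size >= 6
        return "split"
    return "none"  # 6..11 members is a legal size

assert rss_action(5, True) == "merge"
assert rss_action(12, True) == "split"
assert rss_action(8, True) == "none"
```

Note that a merge producing an RSS of 12 or more would immediately trigger a split, which is how "a merge can force a split" (previous slide) arises.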
RSS Management
• Adding a Single Drive to an LDAD
– Add the disk to the RSS with the smallest odd membership
• If there is more than one to choose from, select based on shelf numbers and disk sizes
• Adding Multiple Drives to an LDAD
– Try to mate all unpaired disks
– Try to arrange it so every disk has a partner on a different shelf
– If there are more than 5 disks, try to create as many new RSSs of size 8 as possible, plus a new smaller RSS with what’s left
• Things Not Guaranteed
– Mirror partners will be on different shelves
– All RSS members will be on different shelves
– Good RSSs are not torn apart to make RSSs with drives on different shelves
– Four 6-member RSSs are not reorganized into three 8-member RSSs
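The multi-drive sizing heuristic above can be sketched as follows. This ignores shelf placement and disk mating, which the real firmware also weighs; `plan_new_rsss` is an invented name, and the borrow step for small remainders is an assumption consistent with the 6..11 size rule, not a documented algorithm.

```python
def plan_new_rsss(n_disks: int):
    """Propose sizes for new RSSs formed from n_disks added drives.

    Prefers RSSs of 8 (the optimal size) and keeps every RSS in the
    legal 6..11 range; splits anything of 12 or more.
    """
    if n_disks <= 5:
        return []  # too few to form a new RSS on their own
    sizes = []
    while n_disks >= 14:           # can take a full 8 and still leave >= 6
        sizes.append(8)
        n_disks -= 8
    if n_disks >= 12:              # 12..13 must split into two of >= 6
        sizes.append(n_disks - 6)
        sizes.append(6)
    else:                          # 6..11 is a legal single RSS
        sizes.append(n_disks)
    return sizes

assert plan_new_rsss(16) == [8, 8]
assert plan_new_rsss(13) == [7, 6]
assert plan_new_rsss(5) == []
```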
Cache and Battery
Cache and Battery State
• Cache Policy:
– The battery capacity (i.e., write cache holdup time) is a major input for determining what is called the Cache Policy
– The Cache Policy determines whether or not a unit is presented to hosts, which controller it is presented through, and whether it operates in write-back or write-through mode
Battery Holdup and Cache Policy
The Storage Cell and Cache Policy

Master battery system Bad:
– Slave battery Bad: no unit presentation except SACD
– Slave battery Low: all units write-through on the Storagecell Slave
– Slave battery Good: all units write-back on the Storagecell Slave
Master battery system Low:
– Slave battery Bad: all units write-through on the Storagecell Master
– Slave battery Low: all units write-through on both Storagecell Master and Slave
– Slave battery Good: all units write-back on the Storagecell Slave
Master battery system Good:
– Slave battery Bad: all units write-back on the Storagecell Master
– Slave battery Low: all units write-back on the Storagecell Master
– Slave battery Good: all units write-back on both Storagecell Master and Slave

Adapted from “VCS Battery Manager Overview” by Bryan Walder (Aug 29, 02).
• When one controller’s battery system is no longer good, units move to the other controller if its battery state is better
Battery Holdup Times
• GL
– Two batteries
– Low holdup time: 96 hours
• XL Lite (4000, 6000)
– One battery
– Low holdup time in write-through is about 96 hours
– Normal holdup time in write-back mode is up to 242 hours
• XL (8000)
– Two batteries
– Low holdup time in write-through is about 96 hours
– Normal holdup time in write-back mode is up to 244 hours
Cache Management for Dummies
• Terminology:
– Dirty Data
• Write cache data that has not been flushed to disk
– Write-back caching
• Committing data when it reaches write cache and is mirrored on the other controller, to reduce write latencies
– Write-through caching
• Disabling write cache and forcing a write to complete successfully to disk before returning successful status
– Atomic Write
• Guarantee that for any write up to 128K that does not cross a 128K boundary, a read of the data will return either all old data or all new data
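The atomic-write condition above is easy to state as code: the write must be at most 128K, and its first and last byte must fall in the same 128K-aligned window. A minimal sketch (the function name is invented for illustration):

```python
K128 = 128 * 1024

def is_atomic(offset: int, length: int) -> bool:
    """True if a write of `length` bytes at `offset` meets the guarantee."""
    if length > K128:
        return False
    # boundary-crossing test: first and last byte in the same 128K window
    return offset // K128 == (offset + length - 1) // K128

assert is_atomic(0, K128)               # exactly one aligned 128K window
assert not is_atomic(K128 - 512, 1024)  # straddles a 128K boundary
```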
Cache Management for Dummies
• Terminology:
– Fail-over
• Process of failing over a controller’s write cache to the other controller
– Crash-over
• The process of reconstructing local cache data structures following a controller power cycle
– Volatile Memory
• Non-battery-backed memory – assumed not to survive a power cycle
– Non-volatile Memory
• Battery-backed memory – assumed to survive a power cycle
– SACD (Storage Array Control Device)
Cache Benefits
• Benefits of Caching:
– The cache acts as a holding point between front-end and back-end operations for a given piece of data
– Reduced host port command latency (disk vs. electronic speed):
• Read hits to already-cached data
• Write-back for absorbing bursty write data at electronic speed: the cache can absorb new host writes at electronic speed as long as it doesn’t fill up and, over time, the average host write data rate is less than the rate at which the media can absorb the data.
Cache Buffers
• Cache Buffers:
– Block = 512 bytes
– GL Buffer = 2048 bytes (populated with 1 to 4 blocks of user data)
– XL Buffer = 8192 bytes (populated with 1 to 16 blocks of user data)
– Cache Page = 128 kilobytes
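A quick arithmetic check of the figures above; the derived buffers-per-page counts are implied by the sizes, not stated on the slide:

```python
BLOCK = 512                 # bytes per block
PAGE = 128 * 1024           # bytes per cache page
GL_BUFFER = 2048            # holds 1 to 4 blocks of user data
XL_BUFFER = 8192            # holds 1 to 16 blocks of user data

assert GL_BUFFER // BLOCK == 4      # max blocks per GL buffer
assert XL_BUFFER // BLOCK == 16     # max blocks per XL buffer
assert PAGE // GL_BUFFER == 64      # GL buffers per cache page
assert PAGE // XL_BUFFER == 16      # XL buffers per cache page
```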
Cache Layout GL and XL (4000, 6000)
Cache-A:
– A Write Primary, 256MB non-volatile
– B Write Mirror, 256MB non-volatile
– A Read, 512MB volatile
Cache-B:
– B Write Primary, 256MB non-volatile
– A Write Mirror, 256MB non-volatile
– B Read, 512MB volatile
XL (8000)
Cache-A:
– A Write Primary, 512MB non-volatile
– B Write Mirror, 512MB non-volatile
– A Read, 1024MB volatile
Cache-B:
– B Write Primary, 512MB non-volatile
– A Write Mirror, 512MB non-volatile
– B Read, 1024MB volatile
Cache Manager Operations
• Host Port Reads/Writes (HP Interface)
• Mirroring write data to other controller
• Cooperation with DRM for order preservation
• Full stripe write aggregation for RAID5 to avoid RMW penalty
• R5 parity recovery
• World Peace
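Why full-stripe write aggregation avoids the read-modify-write penalty: once the whole stripe's data is in cache, parity is a pure XOR over the new data and needs no disk reads. A minimal sketch over byte strings (the helper name is invented for illustration):

```python
from functools import reduce

def full_stripe_parity(data_chunks):
    """XOR all data chunks of a full RAID5 stripe into the parity chunk.

    With a full stripe in cache this is computable without reading the
    old data or old parity from disk, unlike a partial-stripe RMW.
    """
    return bytes(
        reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), data_chunks)
    )

stripe = [b"\x0f" * 4, b"\xf0" * 4, b"\xff" * 4]
parity = full_stripe_parity(stripe)
assert parity == b"\x00" * 4
```

A partial-stripe update would instead need the old data and old parity read back (new_parity = old_parity XOR old_data XOR new_data), which is the RMW penalty the aggregation avoids.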
Active-Active Controller Support on EVA
EVA 3000, 5000 (VCS 4.XXX); EVA 4000, 6000, 8000

Active-Active Controller Support
− Active-active multi-pathing
− Vdisk failover
− Controller failover
Active-Active Multi-Pathing
What is active-active multi-pathing?
− On the EVA 3000/5000 a Vdisk is preferred to a controller and can only be accessed by that preferred controller
• To read or write the vdisk from the other controller, it must be moved to that other controller
Active-Active Multi-Pathing
What is active-active multi-pathing?
− On the EVA 4000/6000/8000 a Vdisk is “mastered” by one controller in the controller pair but it can be read from and written to via the “slave” controller in the controller pair
− This ability to access the Vdisk through either controller allows for active-active load balancing, path failover, and the support of native failover software on the servers
Active-Active Multi-Pathing
Vdisk access via the master controller
− All read and write requests are sent to the master controller
− The only data that moves across the mirror port between controllers is write data being mirrored to the slave controller’s mirror write cache
Active-Active Multi-Pathing

[Diagram: Vdisk access via the master controller. The server’s host reads and writes go to the master controller’s read cache and primary write cache; the only transfer across the mirror ports is write data being mirrored into the slave controller’s mirror write cache.]
Active-Active Multi-Pathing
Vdisk access via the slave controller
− All read and write requests are sent to the master controller via the controller mirror ports
− Both read data and write data are moved between the controllers via the controller mirror ports
Active-Active Multi-Pathing
Vdisk reads via the slave controller
− Read and write requests are received by the slave controller
− All requests are sent to the master controller
− Reads are fulfilled from the read cache on the master controller via the mirror port between the controllers
− A performance penalty is paid for read requests on the slave controller
Active-Active Multi-Pathing
Vdisk writes via the slave controller
− Writes are fulfilled by first putting the data in the mirror half of the write cache on the slave controller and then sending the data to the master controller via the mirror port where it goes into the primary write cache
− Vdisks being replicated by Continuous Access can be written via the slave controller
− Minimal performance penalty for write requests to the slave controller
Active-Active Multi-Pathing

[Diagram: Vdisk read via the non-mastering (slave) controller. The host read request arrives at the slave controller and is forwarded to the master controller; read data returns to the server across the mirror ports.]
Active-Active Multi-Pathing

[Diagram: Vdisk write via the non-mastering (slave) controller. The host write lands in the slave controller’s mirror write cache and is transferred across the mirror ports into the master controller’s primary write cache.]
Vdisk Failover
Vdisk failover on XL
− Vdisk failover results in the slave controller becoming the master controller for one or more vdisks
− Can occur in one of two ways
• Implicit failover – the EVA decides to change the Vdisk master
• Explicit failover – an administrator or host-based software decides to change the Vdisk master
− Causes
• HBA failure
• SAN failure
• Controller failure
• Administrative decision
Vdisk Failover
Implicit Vdisk failover
− Implicit transition of a vdisk between controllers is initiated by the EVA, based on which controller the majority of read I/O requests are received on
− Measurements are taken on an hourly basis
− Implicit failover occurs if >= 2/3 of the reads occur on the slave controller
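The implicit-failover rule above is a simple threshold over the hourly read counts. A sketch, using integer arithmetic to avoid floating-point edge cases; the function name and the behavior for a zero-read hour are assumptions:

```python
def should_failover(reads_on_master: int, reads_on_slave: int) -> bool:
    """True when >= 2/3 of this hour's reads arrived via the slave.

    Writes are deliberately ignored: per the next slide, writing through
    the slave carries almost no penalty, while reads do.
    """
    total = reads_on_master + reads_on_slave
    if total == 0:
        return False  # no read traffic, nothing to rebalance (assumption)
    return reads_on_slave * 3 >= total * 2  # slave share >= 2/3

assert should_failover(100, 200)        # exactly 2/3 on the slave
assert not should_failover(200, 100)
```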
Vdisk Failover
Implicit Vdisk failover
− Based on reads because reading through the slave controller incurs a fairly large performance penalty
− Almost no performance penalty when writing through slave controller so writes are ignored
− Considered giving the administrator control of the measurement window but decided in the end not to provide this access
Vdisk Failover
Explicit Vdisk failover
− Explicit transition of a vdisk between controllers is performed either by the storage administrator or by host path failover software
• Tru64
• OpenVMS
− Not allowed if the controller is in write-through mode during a fully allocated snapshot or snapclone creation
Vdisk Failover
Vdisk failover
− Can fail over about 1TB per second, regardless of the number of Vdisks being failed over
− Failover is done by group (a DR Group for CA, or a Vdisk and its snaps otherwise), one vdisk at a time
Vdisk Failover
Vdisk failover
− When a Vdisk failover occurs, the Vdisk is first put into write-through mode and dirty cache entries for the Vdisk are flushed
− Metadata from the mastering controller’s policy memory is then written to the disk group reserved metadata area (a hidden Vraid 1 disk owned by the controllers)
• Metadata changes are also written to journals that reside on the quorum disk, but it is faster to use the disk group metadata area than the journal
− The metadata is then read from the reserved metadata area by the new mastering controller
− The new controller takes over the Vdisk
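The four steps above can be sketched as a sequence. Every class and method name here is an illustrative stand-in, not a VCS/XCS interface:

```python
class ReservedMetadataArea:
    """Stands in for the hidden Vraid 1 metadata area in the disk group."""
    def __init__(self):
        self._store = {}
    def write(self, vdisk, meta):
        self._store[vdisk] = meta
    def read(self, vdisk):
        return self._store[vdisk]

class Controller:
    def __init__(self):
        self.policy_memory = {}
        self.write_through = set()
        self.owned = set()
    def set_write_through(self, vdisk):
        self.write_through.add(vdisk)
    def flush_dirty_cache(self, vdisk):
        pass  # dirty cache entries for the vdisk would be flushed here
    def take_ownership(self, vdisk):
        self.owned.add(vdisk)

def failover_vdisk(vdisk, old_master, new_master, area):
    old_master.set_write_through(vdisk)                  # 1. write-through
    old_master.flush_dirty_cache(vdisk)                  #    and flush
    area.write(vdisk, old_master.policy_memory[vdisk])   # 2. to hidden Vraid 1
    new_master.policy_memory[vdisk] = area.read(vdisk)   # 3. new master reads
    new_master.take_ownership(vdisk)                     # 4. takes over
```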
Vdisk Failover

[Diagram: Vdisk failover. The cache enters write-through mode and dirty cache entries for the Vdisk are flushed; metadata moves from the old master controller’s policy memory to the hidden Vraid 1 metadata area, from which the new master controller reads it.]
Controller Failure
Controller failure
− When a controller failure occurs all Vdisks mastered on the controller are failed over
− Can failover about 1TB per second regardless of the number of Vdisks being failed over
− Failover is done by group (DR Group for CA or Vdisk and snaps for other Vdisk) one vdisk at a time
Controller Failure
Controller failure
− Metadata changes in the failed controller’s policy memory do not get written to the hidden metadata area or the quorum drive, because the controller has failed
− The new master controller reads the metadata from the hidden metadata area and reads the metadata journal entries from the quorum drive
− The metadata journal entries from the quorum drive are applied to the metadata from the hidden metadata area to recover any metadata changes that were in process at the time of the controller failure
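The recovery described above is a classic journal replay: start from the (possibly stale) copy in the hidden metadata area, then apply the journal entries from the quorum drive in order. A sketch with invented data shapes:

```python
def recover_metadata(hidden_area_copy: dict, journal_entries):
    """Rebuild metadata after a controller failure.

    hidden_area_copy: last metadata written to the hidden Vraid 1 area.
    journal_entries: ordered (key, value) changes from the quorum drive
    that may postdate the hidden-area copy.
    """
    meta = dict(hidden_area_copy)
    for key, value in journal_entries:  # replay in order; later wins
        meta[key] = value
    return meta

base = {"vdisk1": "v5", "vdisk2": "v3"}     # hidden area (stale for vdisk2)
journal = [("vdisk2", "v4")]                # in-flight change at failure time
assert recover_metadata(base, journal) == {"vdisk1": "v5", "vdisk2": "v4"}
```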
Controller Failover

[Diagram: controller failover after a controller failure. The surviving controller becomes master for the failed controller’s Vdisks; it reads metadata from the hidden VRaid 1 metadata area and applies the metadata journal entries stored on the quorum drive.]