
  • 7/25/2019 TR4146AggrRelocateOverview_V1.2

    1/15

    Technical Report

TR-4146: Aggregate Relocate (ARL) Overview and Best Practices for Clustered Data ONTAP

    Controller Hardware Upgrades

    Charlotte Brooks, NetApp
    December 2013 | TR-4146


    TABLE OF CONTENTS

    1 Introduction ........................................................................................................................................... 3

    1.1 Scope .............................................................................................................................................................. 3

    2 ARL Foundation for the Innovation of Business Processes ............................................................ 3

    2.1 Options for Cluster Technology Refresh ......................................................................................................... 3

    2.2 Cost Comparison of Vol Move versus ARL for Tech Refresh ......................................................................... 5

    3 Aggregate Relocate Overview ............................................................................................................. 6

    4 Controller Head Upgrades ................................................................................................................... 7

    4.1 Double-Hop Upgrades .................................................................................................................................... 7

    4.2 Supported Upgrades ....................................................................................................................................... 9

    4.3 Command Line Interface ............................................................................................................................... 10

    4.4 Graphical User Interface ............................................................................................................................... 12

    4.5 Best Practices ............................................................................................................................................... 12

    4.6 Considerations for Failure Scenarios ............................................................................................................ 14

    References ................................................................................................................................................ 15

    Version History ......................................................................................................................................... 15

    LIST OF TABLES

    Table 1) Cost analysis breakdown.................................................................................................................................. 5

    Table 2) Cost differential between data copy and ARL solutions for controller hardware upgrades. ............................. 5

Table 3) Supported nondisruptive head upgrades using ARL. ....................................................................... 9

    Table 4) Supported platforms with ARL. ......................................................................................................... 9

    Table 5) Command line interface options for ARL. ....................................................................................... 10

    Table 6) Compatibility of ARL between Data ONTAP releases. ................................................................... 11

    Table 7) Recommended values for cifs-ndo-duration with small block size ................................................. 12

    LIST OF FIGURES

    Figure 1) Controller HW lifecycle chart when using physical data migration versus ARL solutions. .............................. 6

    Figure 2) Double-hop upgrade transition steps. ............................................................................................................. 8


    1 Introduction

Today's business environments require 24/7 data availability. The storage industry delivers the base building block for IT infrastructures, providing data storage for all business objectives. Therefore, constant data availability begins with architecting storage systems that facilitate nondisruptive operations (NDOs). Nondisruptive operations have three main uses: hardware resiliency, hardware and software lifecycle operations, and hardware and software maintenance operations. For the purposes of this paper, we focus on aggregate relocate (ARL) as a solution for lifecycle operations, specifically for refreshing a storage controller.

NetApp storage controllers are the physical component of the logical node entity that services the client

    and host requests from the upper layers of the IT infrastructure. Regardless of the storage vendor, maintaining continuity of service while transitioning from storage controllers that are approaching an end-of-life state to shipping versions of the replacement controller has been a cumbersome task. The aggregate relocate feature eliminates the need to physically move any data. Aggregate relocate reassigns disks to the partner node, making the partner node the owner of the aggregates. The relocation of the aggregates makes the partner node the point of entry for all requests from the client or host applications, without client disruption.

1.1 Scope

    The focus of this paper is the aggregate relocate solution for the purpose of upgrading storage controllers. The initial version of this paper does not go into detail regarding other use cases for ARL. In the clustered Data ONTAP 8.2 architecture, ARL is qualified for the purpose of controller hardware upgrades and maintenance operations. The primary points of discussion in this document include:

    • Comparison of the ARL and vol move methods for cluster technology refresh

    • An overview of the ARL feature and its end-to-end process

    • Use of the ARL command

    • Continuity of data availability throughout the phases of ARL

    • Best practices and considerations when planning to use ARL

    2 ARL Foundation for the Innovation of Business Processes

The simplicity and minimal overhead of the aggregate relocate feature allow the lifecycle of a storage controller to be extended several months beyond a typical controller lifecycle. Aggregate relocate, being a no-data-copy feature, completes in a matter of seconds, whereas a data copy solution can take weeks or months. This reduces the overall transition time from an old controller to a new controller and lengthens the production time of each controller. The cost of ownership per controller as a function of time is therefore reduced. The total cost of ownership is reduced by:

    • Yielding a longer production period for the hardware

    • Reducing the number of people required to transfer data from the old controller to the new controller

    • Reducing the time for data migration (no physical data copy with ARL)

    2.1 Options for Cluster Technology Refresh

There are essentially two primary methods for technology refresh in clustered Data ONTAP. The first is via DataMotion for Volumes (vol move) and involves moving all the data to existing or newly added nodes in the cluster to evacuate the old nodes, then removing the nodes from the cluster. This process has been supported in all versions of clustered Data ONTAP. The second method uses ARL and is new in clustered Data ONTAP 8.2.

    The following considerations apply to using the vol move method for tech refresh of the cluster.

• If the node root or data aggregates are using internal drives, the vol move method is required.

    • If the customer has purchased new storage shelves in addition to new controllers, then vol move is the best solution. In the ARL method, the data remains on the same shelves.

    • If the customer cannot tolerate disabling HA for the duration of ARL, vol move is preferred. When vol move is used, all controllers remain up and serving data throughout, as opposed to the ARL method, in which each controller in the HA pair is shut down for the duration of the controller upgrade.

    • The vol move method takes substantially longer than the ARL method, since a full data copy is required. The amount of time may be on the order of days or even weeks, as it is directly dependent on the amount of data to be evacuated. The vol moves can, however, be staggered over time as required. The ARL method, by contrast, is on the order of several hours for each controller pair, since no data copy is required, and it must be completed as a single maintenance event.

    • If the controllers being refreshed are not on Data ONTAP 8.2, the ARL method cannot be used.

    • If new nodes are being added to the cluster, the cluster size and controller mix must remain compliant with the Cluster Platform Mixing rules.

The following considerations apply to using the ARL in-place upgrade for tech refresh of the cluster.

    • If neither the node root nor data aggregates are using internal drives, then ARL can be used.

    • If the customer is only upgrading the storage controllers, then ARL can be used, since the disk storage remains the same.

    • The new controllers must support the existing shelf technology, since the controllers will be attached to the old storage.

    • The original nodes must already be running Data ONTAP 8.2 or higher. If the controllers cannot be upgraded to Data ONTAP 8.2, then the vol move method is the only option.

    • If the cluster is already at the maximum supported size, or the new controllers are not supported in the same cluster as the old controllers (for example, 6200 and 2200 nodes cannot exist in the same cluster), then either ARL in-place controller upgrade or vol move to existing nodes in the cluster must be used. If a cluster is already at its maximum size and ARL is not possible for any reason, the summary process to refresh the controllers is:

1. Evacuate data from the nodes being replaced to other cluster nodes using vol move. If there is insufficient free capacity for the data elsewhere in the cluster, additional storage will need to be added to the existing nodes.

    2. Remove LIFs from the old nodes.

    3. Unjoin the evacuated nodes, thereby reducing the cluster size.

    4. In the new cluster configuration it may now be possible to perform an ARL in-place controller upgrade. If this is not possible, the new controllers can be joined to the cluster as additional nodes.

A final consideration is process complexity. The vol move method is conceptually simpler, since it uses only standard clustered ONTAP commands and does not require advanced administration skills. It is also available as a WFA workflow for the case in which a new HA pair of controllers and storage will be added to the cluster and the old controllers and storage are evacuated. In comparison, the ARL method requires a number of different clustered ONTAP commands, including commands run in maintenance mode, and is almost exclusively CLI driven. This may be a consideration in the choice of which method to use. Nevertheless, it is expected that, going forward, ARL will become the preferred method for the majority of technology refresh scenarios.


    2.2 Cost Comparison of Vol Move versus ARL for Tech Refresh

In addition to the considerations given in the previous section, the impact of system cost should be taken into account.

    Table 1 and Table 2 show various values associated with a typical controller hardware upgrade. Two sets of values are given in the tables: values associated with the data copy (vol move) method and values associated with the ARL method. The values are theoretical; exact values would depend on the impact to the business. For example, the amount of data being migrated in a physical data migration solution affects the duration of the transition time.

    The objective of the tables is to demonstrate that aggregate relocate decreases the cost associated with each controller.

Table 1) Cost analysis breakdown.

    Cost Variable     Cost Variable Description                                          Theoretical Value
    Controller_cost   Controller price paid                                              $50,000.00
    Production_time   Number of months the controller is in production                   42 months (48 months - 6 months) with Data Copy Move;
                      (life span of controller less transition time)                     47.75 months (48 months - 1/4 month) with ARL
    Transition_time   Number of months data is being copied to upgrade the controller    6 months (Data Copy Move); 1 week (ARL)
    Transition_cost   Cost per month for additional hardware, personnel to do the        $1,000.00
                      data move, and overtime hours

Table 2) Cost differential between data copy and ARL solutions for controller hardware upgrades.

                                 General Formula                        Using Data Copy                         Using ARL
    Cost of Controller Monthly   (Controller_cost)/(Production_time)    $50,000.00 / 42 months = $1,190.48      $50,000.00 / 47.75 months = $1,047.12
    Total Transition Costs       (Transition_time)*(Transition_cost)    6 months * $1,000.00 = $6,000.00        1/4 month * $1,000.00 = $250.00
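The arithmetic behind the tables above can be reproduced with a short calculation. The following is an illustrative sketch, not part of the original report; the function names are invented for this example, and the inputs are the theoretical values from Table 1.

```python
def monthly_controller_cost(controller_cost, production_months):
    """Cost of ownership per controller per month of production time."""
    return controller_cost / production_months

def total_transition_cost(transition_months, cost_per_month):
    """Total cost of the transition period (hardware, personnel, overtime)."""
    return transition_months * cost_per_month

# Data copy (vol move): a 6-month transition leaves 42 months of production.
data_copy_monthly = monthly_controller_cost(50_000, 48 - 6)
data_copy_transition = total_transition_cost(6, 1_000)

# ARL: roughly a 1-week (1/4 month) transition leaves 47.75 months of production.
arl_monthly = monthly_controller_cost(50_000, 48 - 0.25)
arl_transition = total_transition_cost(0.25, 1_000)
```

These values reproduce Table 2: about $1,190.48 versus $1,047.12 per month of ownership, and $6,000.00 versus $250.00 in transition costs.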

The following figure represents the lifetime of a controller using a physical data copy solution versus an aggregate relocate solution. The physical data copy solution requires more time to complete, so the new controllers must be purchased and put into production earlier than if aggregate relocate were being used. Aggregate relocate allows the new controllers to be purchased and put in place later in the cycle, extending the lifetime of each controller.

    Figure 1) Controller HW lifecycle chart when using physical data migration versus ARL solutions.

    3 Aggregate Relocate Overview

Aggregate relocate (ARL) is a nondisruptive process that moves ownership of aggregates between nodes that share storage (HA pair controllers). This data migration does not require any data to be physically copied. Rather, aggregate ownership is reassigned with no dependency on the HA interconnect.

ARL is a new feature in clustered Data ONTAP 8.2 and relies on nodes with direct access to the storage housing the identified aggregates. For the purposes of ARL, the node within an HA pair configuration taking over ownership of the aggregates is referred to as the destination node, and the node originally owning the aggregates is referred to as the source node. Both the source and destination nodes must be running clustered Data ONTAP 8.2 or later to take advantage of this feature. In addition, both the source and destination nodes need to be:

    • Fully booted

    • Joined to the same cluster and within the same HA pair

    • Directly cable connected to the storage shelves (HA pair configurations)

    • Running the same release of Data ONTAP 8.2.x

The nondisruptive nature of ARL is facilitated by the clustered Data ONTAP architecture, which provides virtualization of the networking interface from the storage resources. This virtualized infrastructure allows a client or host request to be serviced through any node port within the cluster, regardless of where the storage resource that contains the data actually resides. During ARL, there is no change in the availability of the aggregate to any incoming host or application requests. During the reassignment of aggregates from the source node to the destination node, the aggregates are offlined and then returned to the online state once the storage resource is reassigned to the destination node. This period when the aggregate is offline and then brought back online constitutes a small window of time in which I/O is retried until the aggregate is brought back online. This period of time is similar to the time taken for storage failover to complete.

Before initiating an aggregate relocate, several conditions must be true for the source and destination nodes as well as for the aggregates identified for the relocation:

    • The aggregate(s) have the SFO policy (rather than the CFO policy, which is traditionally assigned to root aggregates and 7-Mode aggregates). Refer to TR-3450 for SFO and CFO policy information.

    • The aggregate is in the online state. An aggregate in an offline or degraded state is not eligible for ARL.

    General Phases of ARL

    The ARL process consists of these phases.


1. Validation Phase: Checks the conditions of the source and destination nodes as well as the aggregates to be relocated.

    2. Precommit Phase: The period for execution of any prerequisite processing required before the relocate is executed. This can include preparing the aggregate(s) to be relocated, setting flags, and transferring certain noncritical subsystem data. Any processing performed at this stage can be simply reverted or cleaned up.

    3. Commit Phase: This is when the actual processing associated with relocating the aggregate to the destination node is done. Once the commit phase is entered, the ARL cannot be aborted. The commit stage is time bound within a period acceptable to the client or host application, meaning that the time between the aggregate going offline on the source node and the aggregate coming online on the destination node does not exceed 60 seconds.

    4. Abort Phase (Optional): An abort is performed only if the validation phase or precommit phase is abandoned because conditional checks are not met. A series of cleanup processes reverts any processing activity that happened during the validation or precommit phases.
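The key property of this sequencing is that the abort path exists only before commit. The following minimal sketch (not NetApp code; the function and parameter names are invented for illustration) captures that control flow: validation and precommit failures trigger cleanup, while the commit phase is a point of no return.

```python
def relocate_aggregate(aggr, validate, precommit, commit, cleanup):
    """Sketch of the ARL phase ordering: abort with cleanup is possible
    only during the validation and precommit phases, never after commit."""
    if not validate(aggr):        # Validation phase: conditional checks
        cleanup(aggr)             # Abort phase: revert, nothing committed yet
        return "aborted: validation failed"
    if not precommit(aggr):       # Precommit phase: revertible preparation
        cleanup(aggr)             # Abort phase: clean up precommit work
        return "aborted: precommit failed"
    commit(aggr)                  # Commit phase: cannot be aborted
    return "relocated"
```

For example, passing a `validate` callback that returns False yields an aborted relocation with cleanup, while passing all-success callbacks returns "relocated".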

    4 Controller Head Upgrades

The longevity of a controller is between three and five years. As a controller nears end of life, a process is initiated to upgrade the hardware. Traditionally, for lack of a better solution, the process requires an extensive period of time to migrate the data on the controller's storage to other hardware. Aggregate relocate eliminates the need to perform a physical data migration in order to upgrade the controller hardware. Instead, the aggregates are simply logically relocated to an alternative controller for the duration of the upgrade. The data stays intact on its original storage; client and host I/O requests are serviced by the alternative controller for the duration of the ARL.

    4.1 Double-Hop Upgrades

There are several ways ARL can be used to upgrade storage controllers. NetApp recommends the double-hop upgrade, for its simplicity, as the preferred method for controller hardware upgrades.

    A double-hop upgrade uses ARL to relocate aggregates between the heads of the HA pair. The high-level procedure for the double-hop upgrade is as follows. Note that this is a summary of the process; for detailed execution steps, refer to the product documentation, currently available at Using ARL to upgrade controller hardware on a pair of nodes running clustered Data ONTAP 8.2.

1. If the replacement nodes are running a later release of Data ONTAP (for example, 8.2.1), upgrade the source nodes to that release first. A head upgrade using aggregate relocate cannot be combined with a Data ONTAP version upgrade.

    2. Use ARL to relocate aggregates from node A to node B.

    3. Migrate data LIFs from node A to node B (or to other nodes within the cluster).

    4. Disable SFO.

    5. Replace node A with node C (execute all setup, disk reassign, and licensing administration for node C).

    6. Migrate data LIFs from node B to node C.

    7. Use ARL to relocate aggregates from node B to node C.

    8. Migrate data LIFs from node B to node C (or from other nodes in the cluster).

    9. Replace node B with node D (execute all setup, disk reassign of the root aggregate and spare drives, and licensing administration for node D).

    10. Enable SFO.

    11. Migrate selected aggregates and LIFs to node D.
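As an illustration, steps 2 through 4 of the procedure map to clustershell commands along the following lines. This is a sketch, not the full procedure: the cluster, node, vserver, and LIF names are placeholders, and the exact steps (including maintenance mode work) are in the product documentation referenced above.

```
cluster1::> storage aggregate relocation start -node nodeA -destination nodeB -aggregate-list *
cluster1::> storage aggregate relocation show
cluster1::> network interface migrate -vserver vs1 -lif lif1 -destination-node nodeB
cluster1::> storage failover modify -node nodeA -enabled false
```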


    Figure 2) Double-hop upgrade transition steps.

    Upgrade for Controllers with Internal Disks

Controllers with internal disks contained within the controller chassis require the use of volume move to physically relocate the data on the internal drives to storage connected to the new heads. ARL is not supported for aggregates on internal disks. For example, for a FAS2200 storage controller HA pair that has three volumes on the internal drives, volume moves would be required to move the volumes to disks that are attached to the new controllers.

    To perform a controller upgrade on this cluster, join the new controllers to the cluster, provided the cluster platform mixing rules are complied with. As per the clustered Data ONTAP Platform Mixing Rules, clusters containing FAS22x0 controllers are limited to a maximum of 4 nodes. Therefore, if internal drives will be used in the FAS22x0s, the cluster should be limited to two nodes so that two new nodes can be added to perform a technology refresh. The new controllers require additional storage sufficient to contain the existing data volumes on the FAS22x0 controllers. Move all the data volumes from the internal drives (as well as any volumes on external shelves) in the FAS22x0 controllers to the new heads using volume move. Migrate LIFs and adjust failover groups, delete the aggregates, then unjoin the FAS22x0 nodes from the cluster.


To summarize, for controllers with internal drives and externally attached shelves, volume move (DataMotion for Volumes) should be used for controller hardware upgrades. This process is documented in Upgrading controller hardware on a pair of nodes by moving volumes. OnCommand Workflow Automation 2.2 includes a workflow for technology refresh of an HA pair controller and attached storage.

    4.2 Supported Upgrades

Aggregate relocate is a feature introduced in clustered Data ONTAP 8.2; therefore, all controllers participating in the hardware controller upgrade must be running Data ONTAP 8.2 or later, and the original and replacement controllers must be running the same 8.2.x release. Only platforms that are supported in Data ONTAP 8.2 (or later) will be qualified for nondisruptive hardware controller upgrades with ARL. Upgrades from platforms that are not supported on Data ONTAP 8.2 will not support nondisruptive head upgrades with ARL. ARL is supported on V-Series with the same guidelines as FAS controllers and can be used to upgrade a V-Series controller with another V-Series, or to a FAS controller. Note that a V-Series controller can be upgraded to a FAS controller provided that only NetApp shelves are configured on the V-Series. In all cases, the upgraded controllers in the HA pair must match exactly (FAS with FAS, V-Series with V-Series).

    The supported and qualified controller upgrades are listed in the documentation available at Using ARL to upgrade controller hardware on a pair of nodes running clustered Data ONTAP 8.2. Only controller upgrades are supported; controller downgrades (for example, from a 62x0 to a 32x0 platform) are not supported.

    A dual-controller enclosure refers to a single-chassis enclosure containing two controller heads. A single-controller enclosure refers to a single-chassis enclosure with a single controller head.

Table 3) Supported nondisruptive head upgrades using ARL.

    Source \ Destination                             Controller Not Supported    Single-Controller    Dual-Controller
                                                     in Data ONTAP 8.2.x         Enclosure HA         Enclosure HA
    Controller Not Supported in Data ONTAP 8.2.x     No                          No                   No
    Single-Controller Enclosure HA                   No                          Yes                  Yes
    Dual-Controller Enclosure HA                     No                          Yes                  Yes

Table 4) Supported platforms with ARL.

    Platform                                       Supports ARL
    FAS2020/FAS2040/FAS2050                        No (not supported in clustered Data ONTAP 8.2)
    FAS2220/FAS2240                                Yes*
    FAS3020/FAS3040/FAS3050/FAS3070                No (not supported in clustered Data ONTAP 8.2)
    FAS3140/FAS3160/FAS3170                        Yes
    FAS3210/FAS3220/FAS3240/FAS3250/FAS3270        Yes
    FAS6030/FAS6070                                No (not supported in clustered Data ONTAP 8.2)
    FAS6040/FAS6080                                Yes
    FAS6210/FAS6240/FAS6280/FAS6290                Yes


*Note: ARL is not supported for internal drives. Volume move needs to be used to physically relocate any data on internal drives to storage attached to the nodes that will replace the original controllers. ARL does not relocate the root aggregate and cannot be used to perform an in-place controller upgrade if the root aggregate is hosted on internal drives.

    Single Node Cluster

A cluster consisting of a single node does not support ARL. ARL is qualified for support on HA pair configurations for the purpose of an upgrade or maintenance of the controller(s).

    4.3 Command Line Interface

ARL is implemented as the storage aggregate relocation CLI command.

Table 5) Command line interface options for ARL.

    storage aggregate relocation start

    Command Option                  Description

    -aggregate-list                 List of aggregates to relocate to the HA partner node. For all aggregates, use the asterisk (*) symbol.

    -node                           The name of the source node for the aggregates being relocated.

    -destination                    The name of the destination node that the aggregates will be relocated to.

    -override-vetoes                When set to true, vetoes some checks on the source that would prevent the relocation attempt. Refer to the ARL upgrade documentation on the NetApp Support site for conditions that can be vetoed.

    -relocate-to-higher-version     Allows aggregates to be relocated to a node that is running a higher version of Data ONTAP. Currently this flag has not been qualified as part of an in-place controller upgrade process; hence the original and replacement nodes must be running the same release of Data ONTAP 8.2.x before starting the process.

    -override-destination-checks    When set to true, overrides certain checks on the destination. Refer to the ARL upgrade documentation on the NetApp Support site for checks that can be vetoed.



storage aggregate relocation start (continued)

    -ndo-controller-upgrade         Specifies whether the relocation operation is being done as part of a nondisruptive controller upgrade process. Aggregate relocation will not change the home ownership of the aggregates while relocating as part of a controller upgrade. The default value is false. Requires advanced privilege.

    Temporary Reassign Parameter

When doing a controller upgrade via ARL, the -ndo-controller-upgrade advanced privilege option is required so that ownership of the aggregates is not reassigned to the partner node. The relocation to the partner node is temporary during a controller hardware upgrade, and the aggregates are returned to their home node at the end of the process.
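Putting the pieces together, an invocation for a controller upgrade might look like the following sketch. The cluster and node names are placeholders; the option names are those listed in Table 5, and advanced privilege is required for -ndo-controller-upgrade.

```
cluster1::> set -privilege advanced
cluster1::*> storage aggregate relocation start -node nodeA -destination nodeB -aggregate-list * -ndo-controller-upgrade true
```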

    Relocating to a Higher Version of Data ONTAP

Upgrading to a new controller type may introduce a new version of Data ONTAP into the cluster. For example, suppose a FAS3200 series HA pair running Data ONTAP 8.2 is to be upgraded with controllers formatted with Data ONTAP 8.2.1, using the aggregate relocate nondisruptive process. It is not supported to relocate aggregates from a Data ONTAP 8.2 node to a Data ONTAP 8.2.1 node; it is required to upgrade the existing FAS3200 HA pair to Data ONTAP 8.2.1 before starting the process.

A source node formatted for a later version of Data ONTAP cannot relocate aggregates to a destination node running an older version of Data ONTAP. Generally, this will not be an issue, since ARL is supported only for the purposes of a controller upgrade.

Table 6 shows the supported combinations of clustered Data ONTAP versions with ARL. In summary, in all cases the source and replacement nodes must be running the same major/minor version of Data ONTAP.

Table 6) Compatibility of ARL between Data ONTAP releases.

    Source \ Destination    Data ONTAP 8.1    Data ONTAP 8.2    Data ONTAP 8.2.1
    Data ONTAP 8.1          N/A               No                No
    Data ONTAP 8.2          No                Yes               No
    Data ONTAP 8.2.1        No                No                Yes
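The compatibility rule in Table 6 reduces to two conditions: both nodes must be at 8.2 or later, and both must run exactly the same release. A small illustrative check (invented for this example, not a NetApp API) makes the rule explicit:

```python
def arl_version_compatible(source, destination):
    """Return True if ARL is supported between these Data ONTAP versions:
    both must be 8.2 or later, and both must be the same release."""
    minimum = (8, 2)

    def parse(version):
        # "8.2.1" -> (8, 2, 1); tuple comparison handles 8.2 vs 8.2.1
        return tuple(int(part) for part in version.split("."))

    s, d = parse(source), parse(destination)
    return s >= minimum and d >= minimum and s == d

arl_version_compatible("8.2", "8.2")      # supported
arl_version_compatible("8.2", "8.2.1")    # not supported: releases differ
arl_version_compatible("8.1", "8.1")      # not supported: ARL requires 8.2+
```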

    Forcing Aggregate Relocate

A set of checks is carried out on both the source and destination nodes participating in the aggregate relocate. An option is provided to bypass the conditional checks done on the destination node, thereby forcing the aggregate relocate. Using the -override-destination-checks parameter may result in a prolonged time to complete the ARL due to nonoptimized conditions on the destination node. The prolonged completion time may exceed the allowable window of time for an I/O to be successful and cause a disruption to the client.


Two sets of checks are executed on the aggregates: common checks and aggregate granular checks. The common checks have the same result for all aggregates, while the aggregate granular checks are specific to each aggregate. If any of these conditional checks fails, then the ARL may fail completely or for a specific aggregate.

    If a common check fails, the entire ARL job fails. For example, some of the common checks are:

    • Are the source and destination nodes for the ARL job in the same cluster?

    • Is the destination node in quorum?

    • Are compatible versions of clustered Data ONTAP (8.2 or higher) running on the source and destination nodes?

    If the checks done granularly at the aggregate level fail, then relocation of one or more aggregates may fail. For example, some of the more common aggregate checks are:

    • Will the relocation of the aggregate cause any hard limits (such as the FlexVol limit) to be exceeded?

    • Does the destination node see all the disks assigned to the aggregate(s) being relocated?

4.4 Graphical User Interface

    Aggregate relocate is currently available only from the CLI.

    4.5 Best Practices

    Initiating Aggregate Relocate in a Degraded Cluster

Do not use ARL in parallel with storage failover or disaster recovery events. During a storage failover or disaster recovery event, the system is already at an increased risk of failure due to the vulnerability of the infrastructure in a degraded state. NetApp does not recommend introducing additional processing, such as an ARL job, unless absolutely necessary. In addition, NetApp does not recommend making any changes to the aggregate or the contained volumes during an aggregate relocate. It is essential to complete the ARL in a defined period of time to prevent disruption to the client or host application. Therefore, limiting additional processing on the resources being relocated makes the ARL completion more deterministic.

    SMB File Shares and Small Random Workloads

NDO, including ARL, is supported for customers with Hyper-V with Continuously Available shares and SMB 3.0 in Data ONTAP 8.2. For workloads dominated by small (4KB or smaller) IOs, adjust the cifs-ndo-duration option according to Table 7.


Workload percentage with 4KB (or smaller) I/O size    Setting for cifs-ndo-duration
50-75%                                                medium
75-100%                                               low

For example, in a mixed workload of 5% 16KB I/O, 50% 8KB I/O, and 45% 4KB I/O, no change would need to be made. For a mixed workload of 55% 4KB I/O and 45% 8KB I/O, the option would be set to medium. The option can be set by using the storage failover modify command at advanced privilege. The CIFS license must be enabled.
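A sketch of how this might be applied (the parameter name -cifs-ndo-duration is an assumption inferred from the option name above; verify it against the storage failover modify man page at advanced privilege):

```
cluster1::> set -privilege advanced

cluster1::*> storage failover modify -node node1 -cifs-ndo-duration medium
```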

    FlexVol Limits and Concurrent Aggregate Relocates

An administrator can issue multiple aggregate relocate commands in parallel in a cluster. When concurrent aggregate relocate jobs are initiated, it is necessary to consider limits on the destination node. The aggregate relocate validation phase checks that there is adequate space to accommodate the increase in the number of FlexVol volumes after the relocation. However, if several relocates happen in parallel, the combined effect of relocating several aggregates to the same destination is not accounted for. For example, if a node has 450 volumes, the limit is 500 volumes, and aggregates A and B being moved each contain 35 volumes, the FlexVol limit would not be flagged during the validation phase. However, once both aggregates are relocated to the destination node, the FlexVol limit would be exceeded and volumes would be taken offline. As a best practice, move aggregates in sequence when limits are at risk of being exceeded.
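The arithmetic from the example above can be sketched as a quick preflight check (the numbers are illustrative; actual limits vary by platform and should be taken from Hardware Universe):

```shell
# Illustrative preflight check before relocating two aggregates in parallel.
dest_vols=450     # FlexVol volumes already hosted on the destination node
dest_limit=500    # destination node's FlexVol limit (verify on Hardware Universe)
aggr_a_vols=35    # volumes contained in aggregate A
aggr_b_vols=35    # volumes contained in aggregate B

# Validation checks each relocate alone; the combined total is what matters.
total=$((dest_vols + aggr_a_vols + aggr_b_vols))

if [ "$total" -gt "$dest_limit" ]; then
  echo "over limit: $total > $dest_limit -- relocate the aggregates sequentially"
else
  echo "within limit: $total <= $dest_limit"
fi
```

With the example numbers, the combined total is 520, which exceeds the 500-volume limit even though each relocate passes validation on its own.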

    Aggregate State

Aggregate relocate will only succeed for aggregates that are in an online state. Prior to initiating a controller hardware upgrade, verify that all aggregates are online and in a healthy state. Any aggregate in a degraded or offline state will not be relocated.
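For example, aggregate state can be verified from the clustershell before the upgrade (a sketch; output is omitted here):

```
cluster1::> storage aggregate show -fields aggregate,state
```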

    Volume Move and Aggregate Relocate

A volume move job can be issued concurrently with an ARL. For example, a customer may issue a volume move on a set of internal drives while at the same time relocating an external aggregate to the partner node in preparation for a controller hardware upgrade of a low-end platform. The expected behavior of a volume move done in parallel with an ARL depends on the phase of the volume move. During the iterative phase, when data is being physically migrated to the destination aggregate, ARL can proceed as necessary. However, during the cutover phase, only limited processing by other jobs can occur, to allow the volume move to complete in the defined cutover period. NetApp recommends that you execute ARL jobs separately from a volume move job when all resources are on the same controllers. For more information on DataMotion for Volumes, refer to TR-4075 (see details in the References section of this document).

    Maximum Number of Aggregates and Volumes

When relocating aggregates between HA partners, consider the variance in aggregate size limits when moving between dissimilar controller types. In general, customers move to a controller with equal or higher limits for aggregate size. The supported controller upgrades are listed in the documentation, and these combinations should take the supported aggregate size and number of volumes into account. As a best practice, verify the limits of the original and replacement controllers on Hardware Universe as part of the upgrade planning process to ensure that the replacement controllers have equal or higher limits than the original controllers.


    Configuring Timeouts

There is a short period during the relocation of an aggregate when I/O requests are retried while the aggregate is brought online on the partner node. The client I/O retry behavior is based on the retry method implemented on the client. Configure client or host retry windows to exceed the amount of time the relocation may take. NetApp recommends that these retry windows be set to 60 seconds at a minimum. NetApp also recommends increasing the retry window to 120 seconds for protocols that support it.
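As one client-side illustration (assuming a Linux NFS client; the mount source and target are placeholders), a hard mount with timeo=600 (the value is in tenths of a second, so this corresponds to 60 seconds) keeps I/O retrying through the relocation window:

```
# Linux NFS client: a hard mount retries indefinitely; timeo is in deciseconds
mount -o hard,timeo=600,retrans=2 cluster1-lif:/vol1 /mnt/vol1
```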

    Mutually Exclusive Operations

There are several activities that will prevent an ARL from proceeding. If any of the following activities are in progress when an ARL is initiated, the ARL will not start.

• The source node is either executing a takeover of a partner's disks or being taken over.

• The source node is executing a giveback of a partner's disks.

• The source node is in the process of shutting down.

• The source node is out of quorum in the cluster.

• The source node is in the process of reverting the clustered Data ONTAP version.

When any of the above conditions are true, the ARL job must be initiated after those activities have completed. For the destination node, the following operation is handled differently:

• While a destination node is executing a giveback, an ARL will proceed; however, NetApp does not recommend this as a best practice.
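The blocking conditions above can be checked from the clustershell before starting the job; cluster show reports node health and eligibility, and storage failover show reports takeover and giveback state (commands shown as a sketch):

```
cluster1::> cluster show

cluster1::> storage failover show
```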

    4.6 Considerations for Failure Scenarios

    Storage Failover During Aggregate Relocate

Storage failover and ARL have dependencies that impact the overall system if both events are triggered concurrently. In general, if an SFO event occurs while SFO is enabled, failover occurs as expected. However, if SFO is disabled, as is required during the ARL process for a head upgrade (described in section 4.1), failover will not occur and some level of system resiliency and availability is impacted.

    Aggregate Relocate Failure of an Aggregate

The ARL process relocates each individual aggregate serially. Use the storage aggregate relocation show command to display successful aggregate relocates as well as any that incurred errors. Any aggregate that incurred an error will have an associated cause for the failure. When applicable, a message indicates a course of action to correct the error; for example, if an aggregate was not relocated due to an aggregate-level check, you may use the -override-vetoes option to bypass that check.
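If a veto is understood and judged safe to bypass, the relocation can be reissued with the override (a sketch; names are placeholders, and vetoes exist to protect data availability, so use this option with care):

```
cluster1::> storage aggregate relocation start -node node1 -destination node2 -aggregate-list aggr1 -override-vetoes true
```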


    References

    The following references were used in this technical report.

TR-3450: High-Availability Controller Configuration Guide and Best Practices
http://media.netapp.com/documents/tr-3450.pdf

TR-4075: DataMotion for Volumes Overview, Best Practices and Optimization
https://fieldportal.netapp.com/Core/DownloadDoc.aspx?documentID=72418&contentID=75392

    Using ARL to upgrade controller hardware on a pair of nodes running clustered Data ONTAP 8.2

    https://fieldportal.netapp.com/Core/DownloadDoc.aspx?documentID=103946&contentID=158025

    Version History

    Version Date Document Version History

    Version 1.0 May 2013 Initial release

    Version 1.1 June 2013 Minor updates

    Version 1.2 December 2013 Minor updates

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© 2014 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, DataMotion, Data ONTAP, and FlexVol are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. TR-XXX-MMYR

Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact product and feature versions described in this document are supported for your specific environment. The NetApp IMT defines the product components and versions that can be used to construct configurations that are supported by NetApp. Specific results depend on each customer's installation in accordance with published specifications.