uptime bulletin q1 · 2020-03-19 · uptime bulletin a newsletter from mres, a division of dell emc...




Search the VNX Series page for Uptime Bulletin here: https://support.emc.com/products/12781

WE’D LIKE YOUR FEEDBACK ABOUT THE UPTIME BULLETIN. SEND US YOUR IDEAS FOR FUTURE TOPICS AT: [email protected]

Customer Documentation: https://mydocuments.emc.com/VNX

http://emc.com/vnxesupport

Unity EFDs may overheat on array power down

ILC with replication bug is fixed in Unity 4.3.0.1522077968

Unity release 4.3.0.1522077968

Backing up D@RE encryption keys

Bare minimum recommended code levels for VNXe

Did you know?

Latest code releases and targets

CloudIQ update

UPTIME BULLETIN A Newsletter from MRES, a Division of Dell EMC For Unity/VNX/VNXe Q1

VOLUME 24 March, 2018

Certain EFDs may overheat causing DL when some Unity storage systems are shut down

Unity X50 all-flash storage systems recently began shipping with a newly supported power supply, model number 071-000-712-01 (Gen2 CFF 1100W AC DC PSU). Dell EMC is researching a potential deficiency in this power supply’s firmware which may, under some conditions, cause the power supply to keep supplying power to the DPE after it has been told to shut down. As a result, when a user powers down the storage system through Unisphere, most of the system, including the fans, may shut down while the disks in the DPE continue to draw power. The disks then heat up, and one type of EFD in particular may get hot enough to fault permanently from an overheat condition. A potentially significant number of these EFDs may be faulted when the storage system is powered back up, and if multiple faulted drives are in the vault, the storage processors may remain in service mode. Drives faulted in this way cannot be recovered, and in some cases the storage system may have to be re-imaged with new replacement drives, resulting in data loss. Systems are not at risk while in normal operation.

EFDs have been found to be most vulnerable to overheating from this condition. If you have a Unity storage system with one or more power supplies of model 071-000-712-01 and any EFDs in your DPE, you may be vulnerable to this condition.

If you have this power supply model and EFDs in the DPE of your system, please consult knowledge base solution KB/ETA 518863 for the latest information about upcoming remedial solutions. Until this issue is resolved, avoid powering down the storage system. If the storage system must be powered down, follow the existing power-down procedures and then also remove the power cables from the power supplies.

Dell EMC has released Unity OE version 4.2.3.9670635, which is functionally the same as OE 4.2.2.9632250 with the addition of the preventative fix for this issue. That release went GA on March 15th and adds only this one fix. Dell EMC has already included this fix in all new Unity shipments. The upcoming Unity release, version 4.3.x, also fully fixes this issue.
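The exposure condition described above can be expressed as a simple check. The sketch below is illustrative only: the `UnitySystem` record and its field names are hypothetical (they are not a Dell EMC API), and it assumes that any OE version numerically at or above 4.2.3.9670635 carries the fix. That assumption matches the releases named in this bulletin but may not hold for special hotfix builds, so KB/ETA 518863 remains the authoritative source.

```python
from __future__ import annotations

from dataclasses import dataclass

AFFECTED_PSU_MODEL = "071-000-712-01"  # PSU model named in this bulletin
FIRST_FIXED_OE = "4.2.3.9670635"       # first OE release with the preventative fix


def parse_oe(version: str) -> tuple[int, ...]:
    """Turn '4.2.3.9670635' into (4, 2, 3, 9670635) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class UnitySystem:
    """Hypothetical inventory record; the field names are assumptions."""
    name: str
    oe_version: str
    psu_models: list[str]
    efds_in_dpe: int


def overheat_exposed(system: UnitySystem) -> bool:
    """True if the system matches all three conditions described in the article."""
    has_affected_psu = AFFECTED_PSU_MODEL in system.psu_models
    has_efds_in_dpe = system.efds_in_dpe > 0
    # Assumption: a numerically higher OE version always includes the fix.
    lacks_fix = parse_oe(system.oe_version) < parse_oe(FIRST_FIXED_OE)
    return has_affected_psu and has_efds_in_dpe and lacks_fix
```

A system flagged by this check should not be powered down without also removing the power cables, as described above.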

Unity customers using in-line compression (ILC) and replication should prioritize upgrading to OE 4.3.0.1522077968

If you use in-line compression (ILC), replicate your file systems, and run a Unity OE version prior to 4.3.0.1522077968, you are potentially exposed to an issue that may cause latent corruption within a file system if your replicated ILC file system has ever experienced a replication failover. Latent file system corruption may result in the file system going offline and requiring a recovery at some point in the future. Unity customers in this configuration should consider prioritizing an upgrade to 4.3.0.1522077968 when it becomes GA. New file systems created after upgrading to 4.3.0.1522077968 are not exposed to this issue. For file systems in this configuration that were created before the upgrade to 4.3.0.1522077968, contact technical support for recommended procedures to detect and, if necessary, recover from the corruption. Customers may reference KB article 519323 when opening a case about this issue.
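As a rough illustration of the guidance above, the following sketch sorts a file system into one of three outcomes. The `FileSystem` record, its fields, and the `Action` names are hypothetical and exist only for this example. The bulletin ties the corruption to a past replication failover; this sketch takes the more cautious reading and flags every file system that uses ILC and is replicated.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

FIXED_OE = (4, 3, 0, 1522077968)  # OE release that fixes the ILC issue


class Action(Enum):
    NOT_AFFECTED = "not affected"
    UPGRADE = "prioritize upgrade to 4.3.0.1522077968"
    CONTACT_SUPPORT = "contact technical support (KB 519323)"


def parse_oe(version: str) -> tuple[int, ...]:
    """Convert a dotted OE version string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class FileSystem:
    """Hypothetical record; the field names are assumptions for this sketch."""
    uses_ilc: bool
    replicated: bool
    created_on_oe: str  # OE version the array ran when the file system was created


def recommended_action(fs: FileSystem, current_oe: str) -> Action:
    if not (fs.uses_ilc and fs.replicated):
        return Action.NOT_AFFECTED
    if parse_oe(current_oe) < FIXED_OE:
        # Array still runs code without the fix.
        return Action.UPGRADE
    if parse_oe(fs.created_on_oe) < FIXED_OE:
        # Created before the upgrade: existing corruption may need detection.
        return Action.CONTACT_SUPPORT
    return Action.NOT_AFFECTED
```

The order of the checks mirrors the article: upgrade first, then have support examine file systems that predate the upgrade.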


Did you know?

Dell EMC offers ESRS remote connectivity. There are many benefits to having your system remotely connected. Even so, not all of our install base takes advantage of these numerous benefits.

ESRS offers users secure, real-time proactive wellness monitoring, remote problem resolution, and data reporting. Some of the many benefits of ESRS include:

73% faster problem resolution (based on cases reported and closed in 2016)

Automated issue detection and prevention, notification, and case creation

Advanced intelligence and personalized insights through MyService360

Remote troubleshooting and Release Certification Matrix analysis on (hyper)-converged systems

Proactive and predictive health analysis and recommendations through Unity CloudIQ


Back up your encryption keys frequently if you are using Data at Rest Encryption (D@RE)

If you are using Data at Rest Encryption (D@RE), it is vitally important that you back up the encryption keys, even though the storage system maintains redundant copies of them, in case something happens to all copies on the storage system. Instructions on how to back up the encryption keys are provided in the following D@RE whitepaper: https://support.emc.com/docu54820_VNX2:-Data-at-Rest-Encryption.pdf

For Unity, see KB 497410, Unity: DARE keystore backup best practices (User Correctable): https://support.emc.com/kb/497410

Dell EMC recommends keeping a copy of the current key stored off-array as well. Your encryption keys need to be saved every time the configuration is changed on the storage system.

Unity releases OE version 4.3.0.1522077968

This is a major release with many new features, enhancements, and fixes, including:

Data reduction enhancements: Enhanced data reduction to include savings from deduplication and compression.

Data protection and mobility enhancements: Code page translation of NFSv3 and FTP clients.

Enhanced ESRS configuration options (see release notes).

Automatically generated Unix UIDs for a multiprotocol configuration.

Dynamic LDAP.

Security enhancements.

Support for TLS 1.2 protocol.

Many bug fixes including but not limited to:

Changed behavior to keep certain EFDs offline if they self-reset and lose drive write cache (DL).

Fixes an issue with new power supplies that could cause EFDs to overheat when array is powered off.

Fixes an inode exhaustion issue that could impact file systems and cause data to become unavailable.

Fixes an issue that could cause pool expansions to fail.

Fixes an ILC issue that could bring file systems offline due to metadata corruption.

Unity customers with flash drives (EFDs) may wish to prioritize systems with large numbers of EFDs for upgrades to this code level. Doing so takes advantage of the fix noted above, which can prevent DL in the very unlikely event that an EFD takes a fatal error and initiates a self-reset that loses the drive write cache.

VNXe1600 and VNXe3200 product lines should be running at target code levels to avoid DU/DL

If you have VNXe1600 or VNXe3200 storage systems deployed in your environment, the most impactful known issue on both platforms is one in which SP cache can be lost during certain repeated power-down scenarios, such as repeated brownout conditions in close succession. Dell EMC has had software fixes available for this condition for some time, but it remains the most common outage scenario on these platforms because of the number of customers still running old code.

The code levels that contain the fix for this DU/DL issue on VNXe1600 and VNXe3200 are shown below:

VNXe1600: 3.1.9.9570228

VNXe3200: 3.1.8.9340299


Dell EMC Unity/VNX/VNXe target versions

Dell EMC has established target revisions for each product to ensure stable and reliable environments. As a best practice, Dell EMC recommends that you operate at target code levels or above to benefit from the latest enhancements and fixes available. Search using the term “adoption rates” in http://support.emc.com for current Dell EMC Unity/VNX/VNXe target code adoption rates.

VNXe2 OE VERSION                          RELEASE DATE    STATUS
3.1.8.9340299                             06/06/17        Target
3.1.8.9340299                             06/06/17        Latest Release

VNXe1600 OE VERSION                       RELEASE DATE    STATUS
3.1.9.9570228                             11/30/17        Target
3.1.9.9570228                             11/30/17        Latest Release

Dell EMC UNITY OE VERSION                 RELEASE DATE    STATUS
4.2.3.9670635                             03/15/18        Target
4.3.0.1522077968                          03/30/18        Latest Release

UNIFIED VNX2 OE VERSIONS (8.1 & R33)      RELEASE DATE    STATUS
8.1.9.217 (VNX for File)                  12/06/17        Target
8.1.9.217 (VNX for File)                  12/06/17        Latest Release
05.33.009.5.218 (VNX for Block)           01/31/18        Target
05.33.009.5.218 (VNX for Block)           01/31/18        Latest Release
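One way to apply the "target code levels or above" recommendation across an inventory is a small lookup keyed by product. The sketch below is a minimal illustration: the version strings come from the tables above, but the product keys and function names are made up for this example, and it assumes that dotted versions can be compared numerically field by field.

```python
from __future__ import annotations

# Target versions as listed in this bulletin (March 2018).
# The dictionary keys are illustrative labels, not official product identifiers.
TARGET_OE = {
    "VNXe2": "3.1.8.9340299",
    "VNXe1600": "3.1.9.9570228",
    "Unity": "4.2.3.9670635",
    "VNX2 File": "8.1.9.217",
    "VNX2 Block": "05.33.009.5.218",
}


def parse_oe(version: str) -> tuple[int, ...]:
    """Split on dots and compare as integers; leading zeros ('05', '009') are ignored."""
    return tuple(int(part) for part in version.split("."))


def at_or_above_target(product: str, running_version: str) -> bool:
    """True if the running version is numerically >= the bulletin's target."""
    return parse_oe(running_version) >= parse_oe(TARGET_OE[product])


def below_target(inventory: dict[str, tuple[str, str]]) -> list[str]:
    """Return names of systems below target; inventory maps name -> (product, version)."""
    return [
        name
        for name, (product, version) in inventory.items()
        if not at_or_above_target(product, version)
    ]
```

Target versions change with each bulletin, so the table in such a script would need to be refreshed from the current release information rather than treated as fixed.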

See Product Release Notes for a full list of enhancements per new code release.

VNXe2 OE code enhancements in release 3.1.8.9340299

48 Bug and 33 Security Fixes.

Fixes an issue where MCC cache could get lost which would cause a DU/DL situation after some power loss situations.

Fixed an issue with large snap counts that could cause a buffer overflow.

MS SMB2/3 Support for secure communication.

SAS firmware change to address ECC/Parity errors.

mSATA enhancements to support new attributes.

VNX2 OE and File OE enhancements in releases 05.33.009.5.217 and 8.1.9.217

This release was pulled due to a potentially serious bug impacting D@RE. See KB 515666 for more information about the issue and for instructions on what to do if you are running this release and have D@RE enabled. If you installed the release and are not running D@RE, you are not affected. For a list of fixes in this release, see the fixes listed below for 05.33.009.5.218.

VNX2 OE and File OE enhancements in releases 05.33.009.5.218 and 8.1.9.217

Contains enhancements to the audit log.

Contains enhancements to the audit tool.

Offers a “request inodes alert” which can be based on subdirectories under a directory.

Provides a session management timeout.

Offers over 100 bug and security fixes.

Fixes an MCR bug that only affects systems running D@RE on 05.33.009.5.217 that could result in DU/DL.

Dell EMC Unity OE enhancements in release 4.2.3.9670635

Functionally the same as the 4.2.2.x release with the addition of one critical fix to prevent disks from overheating due to a power supply flaw in certain power-down scenarios. See ETA 518863 for further details.

Dell EMC Unity OE enhancements in release 4.3.0.1522077968

Data reduction enhancements: Enhanced data reduction to include savings from deduplication and compression.

Data protection and mobility enhancements: Code page translation of NFSv3 and FTP clients.

Enhanced ESRS configuration options (see release notes).

Automatically generated Unix UIDs for a multiprotocol configuration.

Dynamic LDAP.

Security enhancements.

Support for TLS 1.2 protocol.

Many bug fixes including but not limited to:

Changed behavior to keep certain EFDs offline if they self-reset and lose drive write cache (DL).

Fixes an issue with new power supplies that could cause EFDs to overheat when array is powered off.

Fixes an inode exhaustion issue that could impact file systems and cause data to become unavailable.

Fixes an issue that could cause pool expansions to fail.

Fixes an ILC 4K flush issue that could bring file systems offline due to metadata corruption.

Contains important security fixes.

Fixes an issue where MCC cache could get lost which would cause a DU/DL situation after some power loss situations.


Dell Inc. believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL INC. MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any Dell EMC software described in this publication requires an applicable software license. Dell EMC and other trademarks are trademarks of Dell, Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners. Copyright © 2018 Dell EMC Corporation. All rights reserved. Published in the USA, March, 2018.

Recent enhancements in CloudIQ


Allow Customer Support to View Your CloudIQ

To better address Service Requests, Customer Support personnel can view the system in CloudIQ for the SR they have been assigned. This is enabled by default and can be managed at a per-system or per-site level, or turned off completely.

Metrics Anomalies

We’ve added support for anomaly detection in the metrics navigator. For each active object and breakdown in the graph, a corresponding anomaly graph can be viewed for the most recent 24 hours. This feature can be accessed from the drop-down menu in the upper right-hand corner of any metrics graph, where you will also find controls to select objects and breakdowns and to remove the graph from the page.

Host Details

You can now view details for a specific Host, including configuration information, associated health issues, current and historical capacity, and storage objects and initiators. Note that Host Details provides data for a specific Host attached to a single Storage System and is not a consolidated view across Storage Systems.

Health Change Email Notification

We have added the ability to send email notifications for each system when its health issues change. You can configure the subscription to this health notification email on your Settings page.

Predicted date to full for Pools

The Pool details page now shows a predicted date-and-time-to-full range for more precise capacity planning and risk avoidance.

System Hotfixes

We've added a "Hotfixes" field to the Configuration tab of the System details page. This field lists all hotfixes that have been applied to the current OE version.