The Data Center Chaos Remediation Tip Sheet

Upload: kingfin-enterprises-limited

Post on 19-Jan-2015

Page 1: The Data Center Chaos Remediation Tip Sheet

Data Centers

Business-wise, Future-driven™

The Brands You Trust.

The Data Center Chaos Remediation Tip Sheet

There are six main causes of chaos and instability in the data center, and the good news is that they can be remediated to improve data center efficiency and reduce downtime. This tip sheet will help you understand the problems, the solutions and how you can make the changes that will have real benefit.

Understanding the Cause Helps Drive the Cure

The root cause of chaotic infrastructure in the data center is too many years of unplanned, short-term change. When individual problems are solved without regard to a holistic plan, the resulting decisions and actions drive chaos.

What specifically does chaos look like and what causes it?

1. Inconsistent cabling with no color standards and random routing – this makes it very difficult to diagnose problems and may eventually lead to downtime when the wrong cable is unplugged.

2. Lack of a documented change management process – while change management is common practice in IT, its scope rarely includes changes to the physical infrastructure where the impact of downtime can be more widespread than, say, one application.

3. Unnecessary and overbuilt UPSs – for fear that some equipment may be unprotected, UPSs are added without a clear plan or consistent hardware coverage.

4. Uncalculated reactions to hot spots – when hot spots are found, some operations managers react by reducing the temperature set points on cooling units, which often creates more problems than it fixes. Lowering set points on impulse pulls moisture out of the air, reduces the cooling unit’s capacity, and increases the energy consumed to re-humidify the air.

5. Airflow issues and inconsistent use of blanking panels – airflow problems compound as the data center grows, leading to a higher power bill and higher risk of downtime.

6. Inconsistent or missing documentation for data center infrastructure – when technicians perform maintenance, inadequate documentation can lead to mistakes, downtime and other costs.

There are, of course, a number of other attributes that can describe some level of chaos. In fact, there have been contests that showcase some of the more extreme examples of this phenomenon. However, the goal isn’t to find ways to describe the problem; it’s to create more order!

Page 2: The Data Center Chaos Remediation Tip Sheet

SUMMARY

Chaos in the data center does not just make your day-to-day life more frustrating; it will cause downtime and lead to far higher operating costs. While building a new data center is one route to eliminating chaos, it’s not a fiscal reality in many cases. You can, however, use these five tips to start reducing chaos and improve your operating reality today. Steady improvement will yield compelling results.

When the wiring in your server room or data center looks like a modern hair style gone bad, you not only have chaos, but you also likely have imminent downtime. Start by adopting a documented and consistent color scheme where there are no exceptions and adherence is mandatory. Next, develop a standard cable routing approach where all cabling runs down one side of a rack, and ties are used to keep cable bundles neat and out of the way of airflow. Many data center managers are moving to overhead cabling and power distribution for improved cable management and to maximize airflow under the raised floor.
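A documented color scheme is easier to enforce when it can be checked against an inventory. The following is a minimal sketch in Python, not a reference implementation. It assumes a hypothetical color standard (`COLOR_STANDARD`) and a hypothetical cable inventory; the purposes, colors, labels and field names are all invented for illustration and do not come from any real standard or tool.

```python
from dataclasses import dataclass

# Hypothetical color standard: cable purpose -> required jacket color.
# These assignments are illustrative assumptions, not an industry standard.
COLOR_STANDARD = {
    "production": "blue",
    "management": "yellow",
    "storage": "orange",
    "uplink": "green",
}


@dataclass(frozen=True)
class Cable:
    label: str    # identifier printed on the cable tag (hypothetical format)
    purpose: str  # what the cable carries, e.g. "production"
    color: str    # jacket color observed in the rack


def find_violations(cables):
    """Return (label, reason) pairs for cables that break the color standard."""
    violations = []
    for cable in cables:
        expected = COLOR_STANDARD.get(cable.purpose)
        if expected is None:
            # "No exceptions": a purpose missing from the standard is a violation.
            violations.append((cable.label, f"unknown purpose '{cable.purpose}'"))
        elif cable.color != expected:
            violations.append(
                (cable.label, f"expected {expected}, found {cable.color}")
            )
    return violations


inventory = [
    Cable("R01-U10-P1", "production", "blue"),
    Cable("R01-U10-P2", "management", "blue"),
    Cable("R01-U12-P1", "backup", "gray"),
]

for label, reason in find_violations(inventory):
    print(f"{label}: {reason}")
```

Treating an undocumented purpose as a violation, rather than silently passing it, reflects the "no exceptions" rule: anything the standard does not cover gets flagged until the standard is updated.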

There is no quicker way to chaos than making undocumented and untracked changes to data center or server room infrastructure. Tracking changes within teams is important, but don’t forget to get out of department “silos” and coordinate between IT and facilities (and third-party vendors!). Use automation tools and document management systems to enforce change control and ensure all documentation is accounted for and is easily retrievable.
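As a rough illustration of what enforced change control can look like, the sketch below models a change request that cannot proceed until both IT and facilities have signed off. It is a simplified assumption of how such a gate might work: the `ChangeRequest` class, the approver group names and the asset names are hypothetical and do not describe any particular change management product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical rule: a physical-infrastructure change needs sign-off from
# each of these groups before work starts. The group names are assumptions.
REQUIRED_APPROVERS = {"it", "facilities"}


@dataclass
class ChangeRequest:
    summary: str
    affected_assets: list
    approvals: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def approve(self, group):
        """Record an approval and keep a timestamped audit entry."""
        self.approvals.add(group)
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append(f"{stamp} approved by {group}")

    def missing_approvals(self):
        """Return the groups that still have to sign off, sorted by name."""
        return sorted(REQUIRED_APPROVERS - self.approvals)

    def ready(self):
        """A change may proceed only when every required group has approved."""
        return not self.missing_approvals()


change = ChangeRequest("Move PDU feed for rack R07", ["R07", "PDU-3B"])
change.approve("it")
print(change.ready(), change.missing_approvals())
change.approve("facilities")
print(change.ready(), change.missing_approvals())
```

The audit log is the part that matters most for retrievability: every approval leaves a timestamped entry, so the record of who agreed to what survives after the work is done.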

Fight chaos by moving from a view of only a single rack, server or device to a holistic view of how the overall data center is impacted by changes. This includes airflow, UPS usage, power distribution, network cabling, and rack/area density. Another important aspect of this comprehensive view is to treat servers, storage, and networking as a combined entity rather than piecemeal. DCIM (data center infrastructure management) tools can help you plan, operate and analyze your data center holistically.
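To make the idea of a combined view concrete, the sketch below rolls servers, storage and networking up into one per-rack total for space and power, then flags racks that exceed a limit. It is only a toy model under stated assumptions: the inventory rows, the 5000 W power budget and the 42U rack height are hypothetical values chosen for the example, and a real DCIM tool tracks far more than this.

```python
from collections import defaultdict

# Hypothetical per-rack limits; real values come from your power and
# cooling design, not from this sketch.
RACK_POWER_BUDGET_W = 5000
RACK_UNITS = 42

# Hypothetical inventory rows: (rack, device, kind, rack_units, power_watts)
devices = [
    ("R01", "web-01", "server", 2, 450),
    ("R01", "san-01", "storage", 4, 1200),
    ("R01", "sw-01", "network", 1, 150),
    ("R02", "db-01", "server", 4, 2200),
    ("R02", "db-02", "server", 4, 2200),
    ("R02", "san-02", "storage", 4, 1200),
]


def summarize(rows):
    """Aggregate space and power per rack across all device kinds."""
    totals = defaultdict(lambda: {"units": 0, "watts": 0})
    for rack, _device, _kind, units, watts in rows:
        totals[rack]["units"] += units
        totals[rack]["watts"] += watts
    return dict(totals)


def over_budget(totals):
    """List racks whose combined load exceeds the power or space limit."""
    return sorted(
        rack
        for rack, t in totals.items()
        if t["watts"] > RACK_POWER_BUDGET_W or t["units"] > RACK_UNITS
    )


totals = summarize(devices)
for rack in sorted(totals):
    t = totals[rack]
    print(f"{rack}: {t['units']}U used, {t['watts']} W")
print("Over budget:", over_budget(totals))
```

No single device in the example looks like a problem on its own; the overload in the second rack only shows up once servers and storage are summed together, which is the point of viewing the room as a whole.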

There is a cogent argument that the single largest cause of chaos in a data center or server room is the use of unique, customized, and varied physical infrastructure. Because different styles of racks have inconsistent cable routing, power locations and capabilities, IT has to build what amounts to a number of “mini” data centers that all behave differently. It’s easy to see how this adds complexity to IT operations, but it also causes other problems, like “knowledge walk,” when highly knowledgeable staff leave the company or change roles. Using a standardized infrastructure provides consistency that not only reduces the time needed to deal with repairs or issues, but also reduces chaos substantially.

Taking Control – Tips to Resolve Chaos

Tip #1: Organize the Wiring Plant
Tip #2: Be Serious About Change Management
Tip #3: Create a Holistic View of the Data Center or Server Room
Tip #4: Use Standardized Infrastructure
Tip #5: Clean Sheet Contrast

One of the most useful analytic tools is to contrast a “Clean Sheet of Paper” view of what your data center or server room might look like with what it is today. Admittedly it’s an unfair comparison, but the idea is not to look at “what might have been”; it’s to use the contrast to identify the areas that are most likely creating chaos or other problems. This approach gives a very different and useful perspective that is hard to have when you’re focused on individual, tactical problems.