Page 1: Essential elements of data center facility operations

Essential Elements of Data Center Facility Operations

Schneider Electric Data Center Science Center

Schneider Electric – Data Center Science Center WP 196 Presentation – February 2014

Data Center Science Center White Paper 196

Page 2: Essential elements of data center facility operations

70% of data center outages are directly attributable to human error, according to the Uptime Institute’s analysis of its “abnormal incident” reporting (AIR) database. This figure highlights the critical importance of having an effective operations and maintenance (O&M) program. This presentation describes unique management principles and provides a comprehensive, high-level overview of the necessary program elements for operating a mission critical facility efficiently and reliably throughout its life cycle. Practical management tips and advice are also given.

Page 3: Essential elements of data center facility operations

Introduction

Importance of operations and maintenance (O&M) program

• Most facility outages are attributable to human (operator) error
• The majority of data center facility TCO is in OPEX, not CAPEX, and OPEX is where the greatest potential cost savings reside
• The largest portion of OPEX is energy costs, which are rising
• The drive for energy efficiency is reducing capacity safety margins and system redundancy, increasing the importance of proactive maintenance and data center infrastructure management (DCIM)
• High levels of facility automation and equipment performance data have created new opportunities for enhancing reliability while reducing costs, when properly managed

Page 4: Essential elements of data center facility operations

Mission Critical Mentality

Failure is not an option

● Focuses on risk mitigation
● Grasps interconnectedness of facility and IT systems
● Data center availability is paramount
● Highly complex, fast-paced changes in mission critical facility
  ● Challenging to manage
● Unique outside pressures
  ● Government regulations
  ● Customer audits

NOTE: In this paper, only system planning is covered. System planning refers to the power, cooling, racks, and other support infrastructure systems. Planning related to the IT equipment is not discussed here.

Page 5: Essential elements of data center facility operations

Mission Critical Mentality

Code of Conduct

“Mission Critical Mindset” principles | Impact
Focused on risk mitigation in all operational and maintenance activities, work processes, and procedures | Proactively deals with all potential threats to system availability and worker/occupant safety
Acting with confidence and patience that is an outgrowth of careful planning and preparation | Prevents risks from becoming problems; enables faster response times and fewer errors if problems do arise
Analytical, process-driven approach to risk avoidance and problem solving | Helps identify and mitigate risk in complex environments; ensures predictable and safe operation
Comprehensive understanding of the function and interconnectedness of facility systems and components | Quickly identify and resolve potential threats or actual problems; avoid or reduce system downtime
Commitment to continuous learning and process improvement | Increases skills and operational efficiency to maintain an edge in a constantly changing environment

Page 6: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Environmental Health and Safety

● Key components include
  ● Injury, illness prevention
  ● Electrical safety
  ● Hazard analysis
  ● Hazard communication

Page 7: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Environmental Health and Safety

Key Program Attributes | Description
Safety plans and training | Written safety plans must be established that describe the safe work practices and procedures to be observed by all workers. Regular training on the program elements must also be conducted.
Hazard analysis | All operational procedures shall start with an analysis of the possible hazards involved. Risks must be identified and safety measures assigned.
Lockout/tagout procedures | Proper procedures to prevent the unexpected energizing or startup of machines or equipment, or the release of stored energy, shall be used when servicing or maintaining equipment.
Personal protective equipment (PPE) | Appropriate protective equipment should be provided, properly sized, stored, maintained, and utilized as required to mitigate identified safety hazards.
Hazardous material handling | Hazardous materials must be properly identified, labeled, stored, maintained, and used in conformance with manufacturer’s requirements, local laws, and ordinances.
Hazard communications program | Includes a list of hazardous chemicals, use of material safety data sheets (MSDS), proper labeling of all hazardous materials containers, and employee training on use of and protection from hazardous materials.
Compliance with all applicable health and safety laws and regulations | Requirements will likely vary by region and by level of government (e.g., local, state, federal).

Page 8: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Personnel Management

● Hiring and training
  ● Competent, team-oriented people with mission critical mentality
  ● Well-rounded team
● Develop staffing model
  ● Clearly defined roles and responsibilities

Page 9: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Emergency Preparedness and Response

● Develop emergency operating procedures (EOPs) for all high-risk failure scenarios
● Develop, rehearse escalation procedures
● Conduct regular scenario drills
● Formal failure analysis for significant facility events

See White Paper 199, “Data Center Emergency Preparedness and Response”, for more information.

Page 10: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Maintenance Management

● Key tasks
  ● Asset management
  ● Work order management
  ● Spare parts management
● Ensure continual power and cooling performance
● Improved reliability with
  ● Good asset intelligence
  ● Proactive preventative and predictive maintenance plan
● Results in
  ● More accurate maintenance budget forecasts
  ● Minimized TCO and downtime

Page 11: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Maintenance Management > Asset Management

● Accurate, consistent tracking of critical facility assets
● Computerized maintenance management system (CMMS)
  ● Record, track, and manage asset data and maintenance history
● Scope of service (SOS)
  ● Defines maintenance frequency, specific activities, # of man hours
  ● Establishes standard for procurement of
    ● Service agreements
    ● Maintenance scheduling
    ● Procedure development
    ● Continuous program improvement

Page 12: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Maintenance Management > Asset Management

● Recommended asset management information
  ● Type: top-level classification (e.g., electrical, mechanical, fire system)
  ● Sub-type (e.g., PDU, UPS, CRAH)
  ● Text description of asset
  ● Make: asset manufacturer name
  ● Model: manufacturer model #
  ● Size or rating
  ● Location ID (room/area)
  ● Trade responsible for maintenance
  ● Manufacturer serial #
  ● Install date
  ● Warranty expiration date
  ● Date asset to be replaced
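
The recommended asset fields can be pictured as a single record per asset. The following is a minimal sketch only, assuming Python; the class name, field names, helper methods, and the sample UPS values are all hypothetical and do not come from any particular CMMS product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AssetRecord:
    """One tracked facility asset; the fields mirror the recommended list."""
    asset_type: str            # top-level classification, e.g. "electrical"
    sub_type: str              # e.g. "PDU", "UPS", "CRAH"
    description: str           # free-text description of the asset
    make: str                  # manufacturer name
    model: str                 # manufacturer model number
    rating: str                # size or rating
    location_id: str           # room/area identifier
    trade: str                 # trade responsible for maintenance
    serial_number: str         # manufacturer serial number
    install_date: date
    warranty_expiration: date
    replacement_date: date     # date the asset is to be replaced

    def under_warranty(self, today: date) -> bool:
        """True while the warranty expiration date has not passed."""
        return today <= self.warranty_expiration

    def due_for_replacement(self, today: date) -> bool:
        """True once the planned replacement date has been reached."""
        return today >= self.replacement_date


# Hypothetical example values, for illustration only.
ups = AssetRecord(
    asset_type="electrical",
    sub_type="UPS",
    description="UPS feeding critical load, A side",
    make="ExampleCo",
    model="EX-500",
    rating="500 kVA",
    location_id="ELEC-RM-1",
    trade="Electrical",
    serial_number="SN-0001",
    install_date=date(2014, 2, 1),
    warranty_expiration=date(2017, 2, 1),
    replacement_date=date(2029, 2, 1),
)
```

Keeping the dates as real date values rather than free text is what makes simple checks such as warranty status or replacement planning possible.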

Page 13: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Maintenance Management > Work Order Management

● Tool for service process management
● Allows work to be
  ● Correctly prioritized
  ● Assigned the right resources
  ● Completed on schedule
● Standalone ticketing system OR
● Integrated work order module in a CMMS or DCIM system
● Provides valuable information to facility personnel
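
As a rough illustration of what a work order tool tracks, the sketch below orders open work by priority and due date and flags work that is unassigned or overdue. It is an assumption-laden example in Python: the `WorkOrder` fields, the 1-is-most-urgent priority scale, and the sample orders are hypothetical and are not taken from any ticketing, CMMS, or DCIM product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class WorkOrder:
    order_id: str
    task: str
    priority: int                        # hypothetical scale: 1 = most urgent
    due: date
    assignee: Optional[str] = None       # None means no resource assigned yet
    completed_on: Optional[date] = None  # None means still open


def open_orders_by_priority(orders: List[WorkOrder]) -> List[WorkOrder]:
    """Open orders, most urgent first; ties are broken by earliest due date."""
    open_orders = [o for o in orders if o.completed_on is None]
    return sorted(open_orders, key=lambda o: (o.priority, o.due))


def unassigned(orders: List[WorkOrder]) -> List[WorkOrder]:
    """Open orders that have not been given a resource."""
    return [o for o in orders if o.completed_on is None and o.assignee is None]


def overdue(orders: List[WorkOrder], today: date) -> List[WorkOrder]:
    """Open orders whose due date has already passed."""
    return [o for o in orders if o.completed_on is None and o.due < today]


# Hypothetical sample data, for illustration only.
orders = [
    WorkOrder("WO-3", "Replace CRAH filter", 3, date(2014, 3, 10), "tech-a"),
    WorkOrder("WO-1", "Inspect UPS battery string", 1, date(2014, 3, 5)),
    WorkOrder("WO-2", "Test PDU breaker", 1, date(2014, 3, 1), "tech-b"),
    WorkOrder("WO-4", "Update room labels", 4, date(2014, 2, 20),
              "tech-a", date(2014, 2, 19)),
]
```

The three helper functions correspond to the three aims on the slide: correct prioritization, assignment of resources, and completion on schedule.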

Page 14: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Maintenance Management > Spare Parts Management

● Shortens mean time to recovery (MTTR)
● Inventory should include parts with lead times longer than acceptable downtime
● Maintain spare parts list
● Stock frequently used items
● Re-evaluate annually
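
The stocking guidance can be expressed as a simple rule: keep a part on site if its procurement lead time exceeds the downtime the site can accept, or if it is used frequently. The sketch below is one possible reading, assuming Python; the `Part` fields, the `frequent_use_threshold` default of 4 uses per year, and the sample parts are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    lead_time_hours: float   # time to obtain the part from the supplier
    uses_per_year: int       # how often the part has been consumed


def should_stock(part: Part,
                 acceptable_downtime_hours: float,
                 frequent_use_threshold: int = 4) -> bool:
    """Stock the part if waiting for delivery would exceed the acceptable
    downtime, or if the part is consumed often enough to count as
    frequently used (the threshold is an assumed, site-specific value)."""
    long_lead = part.lead_time_hours > acceptable_downtime_hours
    frequent = part.uses_per_year >= frequent_use_threshold
    return long_lead or frequent


# Hypothetical parts, for illustration only.
fan_module = Part("CRAH fan module", lead_time_hours=72, uses_per_year=1)
air_filter = Part("Air filter", lead_time_hours=2, uses_per_year=12)
door_label = Part("Rack door label", lead_time_hours=2, uses_per_year=0)
```

Because lead times, usage, and acceptable downtime all change over time, rerunning such a check is one way to carry out the annual re-evaluation.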

Page 15: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Change Management

● Method of Procedure (MOP) process
  ● Detailed checklist of specified tasks
● MOP helps control work activity along with
  ● Operational procedure development and review
  ● Risk analysis and communication
  ● Structured work practices
  ● Vendor/contractor supervision

Page 16: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Documentation Management

● Facilitates development of
  ● Accurate procedures
  ● Proper training
  ● Workplace safety
  ● Process improvement
● Document management software application
  ● System to keep critical infrastructure records organized, up-to-date
  ● Detailed checklist of specified tasks
● Manual process can also work

Page 17: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Training

● Establish training program that organizes operational and maintenance tasks into categories
  ● Mapped to capability levels: basic, intermediate, advanced
● Train and evaluate personnel to certify them
● Require annual recertification exams
● Ongoing education keeps personnel current
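
One way to link certification level to tasking is to check, before work is assigned, that a technician's certification is both current and at least as high as the level the task requires. The following is a minimal sketch under stated assumptions, in Python: the `Level` enum, the `may_perform` function, and the 365-day validity window (standing in for annual recertification) are hypothetical names and values.

```python
from datetime import date, timedelta
from enum import IntEnum


class Level(IntEnum):
    """Capability levels; higher values cover the lower levels."""
    BASIC = 1
    INTERMEDIATE = 2
    ADVANCED = 3


def may_perform(tech_level: Level,
                task_level: Level,
                certified_on: date,
                today: date,
                validity_days: int = 365) -> bool:
    """True if the certification is still within its validity window
    (assumed here to be one year) and meets the task's required level."""
    is_current = (today - certified_on) <= timedelta(days=validity_days)
    return is_current and tech_level >= task_level
```

For example, `may_perform(Level.BASIC, Level.ADVANCED, date(2014, 1, 15), date(2014, 6, 1))` evaluates to `False`, because a basic certification does not cover an advanced task.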

Page 18: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Infrastructure Management

● System to match facility resources with changing IT requirements
  ● Prevent downtime
  ● Improve resiliency and response
  ● Reduce operating expenses
  ● Provide a sound basis for capacity planning decisions
● Three key tasks
  ● Facility monitoring
  ● Capacity management
  ● IT/Facilities integration

Page 19: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Quality Management

● Key components
  ● Quality Assurance (QA): Typified by process and procedure standardization
  ● Quality Control (QC): Quality checks, inspections, and audits
  ● Continuous Quality Improvement

Page 20: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Energy Management

● Energy is typically the single largest data center expense
● 3 core tasks of an effective energy management program
  ● Performance benchmarking
  ● Efficiency analysis
  ● Strategic energy sourcing
● Optimized energy sourcing
  ● Reduce exposure to price volatility
  ● Secure pricing that fits budget and business objectives

Page 21: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Financial Management

● Financial-related issues can impact facility’s day-to-day availability and resiliency
● Processes should focus on
  ● Purchasing
  ● Invoice matching
  ● Financial reporting/analysis
● Facility managers and purchasing department should maintain close relationship

Page 22: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Performance Monitoring and Review

● Regularly monitor and review facility performance
  ● Determines health and effectiveness of O&M program
  ● Shows where it is trending
● Quality process should incorporate facility KPIs
● Benefits
  ● Aligns operational activities with business goals
  ● Positive reinforcement for innovation and process improvement

Page 23: Essential elements of data center facility operations

Common Mistakes

Common Mistakes | Description
Maintenance program is not driven by metrics | Often the result of poor asset management; no linkage made between break/fix maintenance activities and preventative maintenance
Poor training | Training is not formalized and/or is not taken seriously; over-reliance on technician “shadowing”; no linkage between certification level and tasking
Ineffective change management | Inadequate risk analysis; poor or non-existent procedures; no defined process for performing critical work tasks
Failure to consistently test & evaluate skills | Existing skills/training level not formally evaluated; scenario drills are not employed; incident and drill results are not evaluated
Poor documentation | No coherent sequence of operations; drawings and schedules are outdated; lack of revision control and/or lack of digitization
Failure to develop and implement a quality control system | Lack of governance or resources to measure, monitor, and review performance
Stuck in manual mode | Failure to implement CMMS, EDMS, DCIM, etc.
Overconfidence | Assumption that future performance can be predicted by past experience

Page 24: Essential elements of data center facility operations

Facility Operations Services

Using Outside Vendors for O&M Programs

● Offer services for both existing and new data centers
  ● Advise on
  ● Develop
  ● Implement
  ● Operate

See White Paper 198, “How to Write an Effective RFP for Data Center Facility Operations Services”, for more information.

Page 25: Essential elements of data center facility operations

12 Essential Elements of an O&M Program

Performance Monitoring and Review > Recommended Facility KPIs

● Critical load uptime
● Load redundancy maintained
● Support system uptime
● Maintenance completion
● Staffing coverage
● Security policy conformance
● Emergency preparedness drills
● Emergency response procedure adherence
● Safety policy and procedure adherence
● Procedure development, management and use
● Quality control/improvement
● Training compliance
● Process improvement
● Operational reporting
● Proper event notification and escalation
● Timely and accurate cost reporting
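
Two of these KPIs reduce to simple ratios. The sketch below, in Python, shows one common way to compute them; the presentation does not define formulas, so the function names and the definitions (uptime as the share of a reporting period without downtime, maintenance completion as completed over scheduled tasks) are assumptions for illustration.

```python
def uptime_percent(period_hours: float, downtime_hours: float) -> float:
    """Share of the reporting period during which the load was up."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must lie within the period")
    return 100.0 * (period_hours - downtime_hours) / period_hours


def completion_percent(completed: int, scheduled: int) -> float:
    """Share of scheduled maintenance tasks that were completed."""
    if scheduled <= 0:
        raise ValueError("scheduled must be positive")
    return 100.0 * completed / scheduled


# Hypothetical figures: one year (8,760 hours) with 8.76 hours of downtime,
# and 45 of 50 scheduled maintenance tasks completed.
annual_uptime = uptime_percent(8760, 8.76)
maintenance_done = completion_percent(45, 50)
```

Tracking such figures per reporting period, rather than as one-off values, is what allows the review process to show where performance is trending.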

Page 26: Essential elements of data center facility operations

Conclusion

● Efficient Operations & Maintenance program
  ● Mitigates threats, effects of human error
● Focus on 12 essential elements of O&M program
● Must have facilities operation team with “mission critical” mindset
● Operational philosophy focuses on
  ● Risk mitigation
  ● Preparedness
  ● Standardized processes
  ● Continuous improvement

Page 27: Essential elements of data center facility operations

Resources

Facility Operations Maturity Model for Data Centers, White Paper 197
How to Write an Effective RFP for Data Center Facility Operations Services, White Paper 198
Data Center Emergency Preparedness and Response, White Paper 199
Classification of Data Center Infrastructure Management (DCIM) Tools, White Paper 104
How Data Center Infrastructure Management (DCIM) Software Improves Planning and Cuts Operational Costs, White Paper 107
Avoiding Common Pitfalls of Evaluating and Implementing DCIM Software, White Paper 170

Browse all APC white papers: whitepapers.apc.com
Browse all APC TradeOff Tools™: tools.apc.com