Managing the infrastructure of data centers · 2009-06-22
Managing The Infrastructure Of Data Centers
David Cuthbertson
Square Mile Systems Ltd
Square Mile Background
• Develop toolsets, training and techniques for operational management of complex IT infrastructure
• Focus areas
– Data center management
– Connectivity management
– System change impact analysis
– Documentation techniques
– Infrastructure visualisation
• All technologies!
Business Processes: Departmental, Company
Services: End user, infrastructure, supplier
Applications: PC, server, mainframe, SOA
Fixed Infrastructure: Cabling, Power, Cabinets, Buildings
Hardware Infrastructure: Network, Servers, UPS, Storage, Other
Virtual Infrastructure: Network, Servers, Storage, DBMS
Data Center Infrastructure
Common Management Issues
• Data Centre is often not visible, nor are the staff
• “If it’s not broke” attitude isn’t good, infrastructure risks need to be managed
• IT groups are often task and project orientated, less focus on operational issues
• Getting funds allocated for improving management techniques is difficult
• Skills sets need to evolve with technology and organisation control requirements
Defining “Management”
• Planning what needs to be done to achieve a particular result
• Organising and directing resources
• Controlling and making adjustments as needed
• Motivating all those involved
Management Maturity
1 Reactive: Individual approach
2 Repeatable: Some process, often informal
3 Defined: Process documented and explained
4 Managed: Process checked and reviewed for gaps
5 Optimised: Process open to external review and updated regularly
Where might you be if you
a) Didn’t label patch cables?
b) Labelled patch cables consistently?
c) Audited records against patching documentation?
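Option (c), auditing records against patching documentation, amounts to comparing two sets of connections. A minimal sketch in Python, assuming each patch is recorded as a (panel port, device port) pair; the function name and the record format are illustrative assumptions, not something from the talk:

```python
def audit_patching(documented, observed):
    """Compare documented patch records with what a physical audit found."""
    documented, observed = set(documented), set(observed)
    return {
        # Cables found in the rack that nobody wrote down
        "undocumented": sorted(observed - documented),
        # Records that no longer match a real cable
        "missing": sorted(documented - observed),
        # Number of records confirmed by the audit
        "matched": len(documented & observed),
    }


# Hypothetical sample data for illustration only
documented = [("PP01-07-01:A001", "UK_BIRM_UX05"), ("PP01-07-01:A002", "UK_BIRM_UX06")]
observed = [("PP01-07-01:A001", "UK_BIRM_UX05"), ("PP01-07-01:A003", "UK_BIRM_UX07")]
result = audit_patching(documented, observed)
```

The size of the "undocumented" and "missing" lists over time is one rough indicator of where a site sits on the maturity scale.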
Why Improve DC Management?
1. New technology demands
– Cooling, power, cabling, weight
2. Save on capital and operational costs
– Optimise existing facilities
– Reduce power and other costs
3. Less tolerance of outages and disruption
4. Speed of change
5. External need for evidence of control
Changing Requirements
                               BEFORE         AFTER
No. of servers per cabinet     3-6            30-40
Power dissipated per cabinet   300-2000W      3kW - 25kW
Current service to cabinet     16A            2x32A or 3 phase
Types of equipment             Servers        Blade Servers
                               Monitor        Power Distribution Units
                               KVMs           MidSpan Boxes
                               Power Strips   Disk Arrays (Storage)
                               UPS            Smart Power Strips
                                              Regular Power Strips
Network types                  100M           1G, 10G, SAN
No. of cables
  Power (per server)           1 or 2         2 to 6
  Network (per server)         1 or 2         5 to 10
  Cabinet total                20-30          300 - 400
New Technology Challenges
Sun Blade 8000 Blade Chassis
– 4 Power supplies (N+1) 9kW
– 3 chassis per rack – 27kW?
HP C7000 Blade Chassis
– Up to 6 Power Supplies 13kW
– 4 chassis per rack
Cisco Nexus 7000 Data Center Switch
– 3 Power Supplies 12kW
– Up to 384 ports
And in the next few weeks?
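The rack-level figures follow from simple multiplication, which is worth repeating whenever a new chassis is proposed. A minimal sketch in Python using the per-chassis figures quoted on the slide; the function names and the 230V single-phase supply voltage are assumptions made for illustration:

```python
def rack_power_kw(chassis_kw: float, chassis_per_rack: int) -> float:
    """Worst-case rack load if every chassis draws its quoted power."""
    return chassis_kw * chassis_per_rack


def feed_capacity_kw(amps: float, feeds: int = 1, volts: float = 230.0) -> float:
    """Nominal single-phase capacity of the cabinet feeds (volts is an assumption)."""
    return amps * volts * feeds / 1000.0


sun_rack = rack_power_kw(9, 3)              # Sun Blade 8000: 3 chassis at 9kW
hp_rack = rack_power_kw(13, 4)              # HP C7000: 4 chassis at 13kW
old_feed = feed_capacity_kw(16)             # single 16A feed
new_feed = feed_capacity_kw(32, feeds=2)    # 2x32A feeds

print(f"Sun rack {sun_rack}kW, HP rack {hp_rack}kW")
print(f"16A feed {old_feed:.2f}kW, 2x32A feeds {new_feed:.2f}kW")
```

Under these assumptions a fully populated blade rack exceeds even the newer 2x32A service, which is the point of the question mark against 27kW.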
Starting Well
1. Specify and build the infrastructure using a standards-based approach
– TIA942 data centre design
– Other standards TIA, EN, etc.
2. Test installation for conformance to requirements
3. Handover of documentation, skills transfer and operational procedures to customer
So How Did This Happen?
Different Working Practices
So…
You may understand, but you can’t assume others do
Professionally designed infrastructure will be compromised without professional management practices!
Defining Best Practices
• You could define your own best practice
– Authority
– Experience
– Technically qualified
– Best communicator
– Management information
• Or you could adopt a framework
– Quicker path to end result with fewer opinions
Management Frameworks
ITIL / ISO20000: Service Management
BS25999: Business Continuity
ISO27001: Information Security
CoBit: IT Governance
Plan, Do, Check, Act
All have a continuous process
But no equivalent for data center management!
ISO20000/ITIL V2
Service Delivery Processes
– Capacity Management
– Service Continuity & Availability Management
– Service Level Management
– Service Reporting
– Security Management
– Financial Management
Control Processes
– Configuration Management
– Change Management
Release Processes
– Release Management
Resolution Processes
– Incident Management
– Problem Management
Relationship Processes
– Business Relationship Management
– Supplier Management
Why have a framework?
• Common understanding of complex issues
– Terms, Processes, Roles
– Measurement, identification of gaps
– Communication
– Training for individuals and teams
• Focus provided
– Easier adoption of industry techniques
– Overcomes internal reluctance to change
Example of Best Practice
Procuring a new server
– Policies – sign off, payment
– Ordering process – life cycle
– Purchase orders – common reference
– Roles and responsibilities – specify, order and approve
Best Practice in Data Centers
• Design
– TIA942 standard, Uptime Institute, manufacturer guidelines
• Build and Install
– Standards and regulations TIA, etc.
• Operate
– ???
– EU Code of Conduct for Data Centers – New!
Different Power Views
[Figure: managed power strip with 16 switched outlets, LAN and serial ports and a current display, supplying a KVM and servers; a device nameplate reads 100-240V ~, 50-60Hz, 1.2A]
What should the working limit be for the power strip?
[Figure: two managed power strips, each on a 16A feed]
So…
• Monitoring tools are useful, but they only tell you what they see
• For managing power infrastructure we may need multiple values
– Manufacturer power rating
– Derated power – often 60% of manufacturer
– Design power
– Actual power
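The four values can be held side by side for each strip, so that a monitored reading is compared against a working limit and not just the nameplate. A minimal sketch; the class, the field names and the sample readings are hypothetical, and the 60% derating is the rule of thumb quoted on the slide, not a universal figure:

```python
from dataclasses import dataclass


@dataclass
class PowerStrip:
    name: str
    rated_amps: float            # manufacturer power rating
    design_amps: float           # load planned for at design time
    actual_amps: float           # latest measured load
    derate_factor: float = 0.6   # "often 60% of manufacturer"

    @property
    def derated_amps(self) -> float:
        """Working limit after derating the manufacturer rating."""
        return self.rated_amps * self.derate_factor

    def headroom_amps(self) -> float:
        """Remaining capacity against the working limit (negative if over)."""
        return self.derated_amps - self.actual_amps

    def over_limit(self) -> bool:
        return self.actual_amps > self.derated_amps


# Hypothetical readings for a strip on a 16A feed
strip = PowerStrip("PWR01-07-A", rated_amps=16, design_amps=8, actual_amps=10.5)
print(strip.name, round(strip.derated_amps, 1), strip.over_limit())
```

In this example the strip is well inside its 16A nameplate but already over a 60% working limit, which a monitoring tool reporting only the actual current would not flag.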
Managing Existing Data Centers
• Environment limits
• Information sets – formal and informal
• Working practices – formal and informal
• Roles / responsibilities
• Current issues
• Establish priorities
Establish Design Limits
• Room
• Architectural and Structural – Weight
• Mechanical – Cooling, fire detection / suppression
• Electrical – Power
• Cabling standards and limitations
Is this Rack Full?
[Figure: front elevation of rack 01-07 showing patch panel PP01-07-01 (ports A001 to A024), power strips PWR01-07-A and PWR01-07-B, server SVR-BHAM-010701, servers UK_BIRM_UX05 to UK_BIRM_UX08 and cable management 01-07-04]
It depends on
• Space
• Weight
• Power
• Cooling
• Connectivity
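A rack is "full" as soon as any one of these five limits is reached, even if the others have room. A minimal sketch of that check in Python (3.9 or later); the class, the field names, the units and all sample numbers are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class RackLimits:
    space_u: int          # rack units
    weight_kg: float
    power_kw: float
    cooling_kw: float
    network_ports: int    # connectivity available at the rack


def exhausted_limits(limits: RackLimits, used: RackLimits) -> list[str]:
    """Return the name of every limit that is already reached or exceeded."""
    return [
        field for field in vars(limits)
        if getattr(used, field) >= getattr(limits, field)
    ]


# Hypothetical design limits and current usage for one rack
limits = RackLimits(space_u=42, weight_kg=900, power_kw=4.0, cooling_kw=4.0, network_ports=48)
used = RackLimits(space_u=20, weight_kg=450, power_kw=4.2, cooling_kw=3.1, network_ports=30)
print(exhausted_limits(limits, used))
```

Here the rack is less than half occupied by space but is already full on power, so the answer to the slide's question would be yes.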
Controlling the Environment
• Known design limits
• A baseline of the current estate
• Change approval process
• Forward planning for capacity
• Regular reviews against limits
• Maintenance practices
– Routine
– Verification on process adoption
Data Center Documentation
• Commissioning documentation
– Project plans and designs
– Testing results
– Initial systems provision – BMS
• Operational documentation
– Various sets for ongoing management
Data Center
A to Z
Different Teams, Different Focus
Fixed Infrastructure: Cabling, Power, Racks, Rooms, Buildings
Hardware Infrastructure: PCs, Network, Servers, UPS, Storage, Other
Virtual Infrastructure: PCs, Network, Servers, Storage, DBMS
Applications: PC, server, mainframe, SOA
Services: End user, infrastructure, supplier
Business Processes: Departmental, Company
Service Management
Data Centre
Networks LAN/SAN
Applications
Mid-range Servers
Systems
Desktops IMAC
Different views of a server
Floor Plan
Rack Position
Service impact
Power Supply
Network Connections
H/W Build
[Figure: blade chassis BLADE_BIRM01 with blades UK_BIRM01_BLADE-01 to -05, -09, -10 and -12, and blade switches BLADE-BIRM01.BLADE-SW1 and BLADE-BIRM01.BLADE-SW2]
Recommended Information Sets
• Space
• Environment (power, cooling)
• Connectivity (power, networks)
• Asset and Inventory controls
• Device management
• Service management
Where to Start?
• Structured cabling only
• KVM Architecture
• LAN diagrams
• Storage diagrams
• Patching spreadsheets
• Inventory list
• KVM
• WAN diagrams
• Point to Point Cabling
• Building wiring diagrams
• Asset list
• Legacy systems
• Backbone switches
• IIS Architecture
• Power distribution
• Edge switches
• Power architecture
• Blade switches
• Computer room layout
• PDUs
• Circuit breakers
• Labelling standards
• SAN Architecture
• PABX port mapping
• Power strip connections
• LAN Architecture
LAN Connectivity Example
Identifying Focus for LAN Baseline Project
[Figure: quadrant chart with quadrants numbered 1 to 4. Horizontal axis: Amount of Connections (Low to High). Vertical axis: User Impact of Disconnect (Low to High).]
1. Backbone cabling
2. Cabinet/Zone cabling
3. Floor boxes
4. Servers
5. Core Switches
6. Edge Switches
7. Wireless Access Points
8. Routers
9. Firewalls
10. SANs
11. Power strips
12. KVMs
13. IP phones
14. Desktops
To Manage Connectivity
1. Document the fixed infrastructure first
– Backbone, power, vertical
2. The active components
– Switches, servers, SAN etc.
3. Finally the connectivity
– Local, path and endpoints
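Defining the Level of Detail
1. Local patch?
2. End to End path?
3. All devices connected to the switch?
[Figure: each level drawn with patch panels: one patch panel for the local patch, two patch panels for the end to end path and two for the switch view]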
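The three levels of detail differ only in how far the recorded links are followed. A minimal sketch in Python, assuming every cable or patch lead is stored as a pair of port names and that a port has at most one link on each side; the switch name SW-CORE-01, the panel PP01-02-01 and the port labels are hypothetical:

```python
from collections import defaultdict

# Each link is one cable or patch lead between two named ports
LINKS = [
    ("SVR-BHAM-010701:eth0", "PP01-07-01:A001"),   # local patch lead
    ("PP01-07-01:A001", "PP01-02-01:B001"),        # fixed cabling between panels
    ("PP01-02-01:B001", "SW-CORE-01:Gi1/1"),       # patch lead at the switch end
]


def build_graph(links):
    """Index the links so each port knows what it is connected to."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def local_patch(graph, port):
    """Level 1: what is directly plugged into this port."""
    return sorted(graph[port])


def end_to_end(graph, start):
    """Level 2: follow the links from start until the far endpoint."""
    path, previous, current = [start], None, start
    while True:
        onward = [p for p in graph[current] if p != previous]
        if len(onward) != 1:          # dead end or a branch: stop here
            return path
        previous, current = current, onward[0]
        if current in path:           # guard against cabling loops
            return path
        path.append(current)


def devices_on_switch(graph, switch):
    """Level 3: far-end endpoint of every path that starts on the switch."""
    ports = [p for p in list(graph) if p.startswith(switch + ":")]
    return sorted(end_to_end(graph, p)[-1] for p in ports)


graph = build_graph(LINKS)
print(local_patch(graph, "SVR-BHAM-010701:eth0"))
print(end_to_end(graph, "SVR-BHAM-010701:eth0"))
print(devices_on_switch(graph, "SW-CORE-01"))
```

Level 1 needs only the patch leads to be recorded, while levels 2 and 3 depend on the fixed cabling being documented first, which matches the order suggested for managing connectivity.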
Asset Controls
• Lists of all devices and assets
• Their current status and location
• Previous history and audit trail
• Often combined with maintenance and procurement data
• Auto-discovery can help, but is often of limited value in data centers
Device or Element Management
• Network, server, storage monitoring
• Configuration systems
• Automated deployment / provisioning
• Network and other architecture diagrams
• Automated discovery and scanning
• Backup and failover
Service & Risk Management
• Help or service desk system
• Project control or workflow system
• Services maps
– Devices mapped to critical services
• Service monitoring tools
• Billing and charging
• Recovery planning and testing
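A service map can be as simple as a lookup from each service to the devices it depends on, which then answers "what is affected if this device goes down?". A minimal sketch; the service names and the assignment of devices to services are hypothetical:

```python
# Hypothetical service map: service name -> devices it depends on
SERVICE_MAP = {
    "Payroll": {"UK_BIRM_UX05", "UK_BIRM_UX06"},
    "Email": {"UK_BIRM_UX06", "UK_BIRM_UX07"},
    "Intranet": {"UK_BIRM_UX08"},
}


def services_affected(devices, service_map=SERVICE_MAP):
    """Return the services that depend on any of the given devices."""
    devices = set(devices)
    return sorted(name for name, deps in service_map.items() if deps & devices)


print(services_affected(["UK_BIRM_UX06"]))
```

The same lookup can be run before a planned change, listing the services whose owners would need to approve it.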
DC Capacity Management
• Demand management to capture requests
• Existing + allocated demand recorded
• Capacity Plan and “database”
• Reporting and trending on
– Space
– Power
– Cooling
– Network, SAN Port availability
– Resource (staff)
• “Green” reporting
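Recording existing and allocated demand separately makes it possible to report what is really left once approved requests are counted. A minimal sketch of such a report; the function name, the resource names and all figures are assumptions for illustration:

```python
def capacity_report(capacity, existing, allocated):
    """For each resource, return used, allocated and remaining amounts."""
    report = {}
    for resource, limit in capacity.items():
        used = existing.get(resource, 0)
        reserved = allocated.get(resource, 0)   # approved but not yet installed
        report[resource] = {
            "used": used,
            "allocated": reserved,
            "remaining": limit - used - reserved,
            "committed_pct": round(100 * (used + reserved) / limit, 1),
        }
    return report


# Hypothetical room-level figures
capacity = {"space_u": 420, "power_kw": 80, "san_ports": 96}
existing = {"space_u": 300, "power_kw": 62, "san_ports": 70}
allocated = {"space_u": 40, "power_kw": 14, "san_ports": 10}

for resource, row in capacity_report(capacity, existing, allocated).items():
    print(resource, row)
```

In this example power is 95% committed while space is about 81% committed, so power would be the first limit to plan around. Storing one such report per review period would give the trending data the slide asks for.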
Charging and Funding
Different perspectives
• Space
• Power
• Cooling
• Network Ports used
• Shared Infrastructure Costs and Support
• Hardware Maintenance Costs and Support
• Operations Costs and Support
Meeting the Needs of 3rd Parties
• SOX, PCI, FSA, auditors, etc.
• Building & planning requirements
• Employment and buildings legislation
– Disability
– Health and safety
– Electricity at work
– Carbon tax
– And others
• Insurance
What would be sufficient evidence to satisfy them of your controls in most cases?
Energy Issues
• EU Regulations already in place
– Energy performance of buildings
– Energy-using Products directive (colour codes on white goods)
– WEEE and RoHS directives
• US Green Building Council LEED Program
– Leadership in Energy and Environmental Design
• The Green Grid programme
• EU Code of Conduct for Data Centres
– Completed 1Q2009
– Covers all data center, server and equipment rooms
• UK draft climate change bill
• Carbon trading
EU Code of Conduct
• Aim is to inform and stimulate data center owners to reduce energy consumption
– Understand energy usage
– Raise awareness
– Communicate practices which will reduce energy consumption
• Voluntary
• Available at http://dcsg.bcs.org
EU Code of Conduct
• Measurements against best practices for
– Cooling
– Power equipment
– Other data center equipment
– Data center utilisation, management & planning
– IT equipment and services
– Energy monitoring
• Temperature and humidity requirements for equipment
– Suggested limits are 5ºC to 40ºC
EU Code of Conduct
• Additional best practices document is useful for all as it covers
– Design
– Operate
– New equipment and retrofit issues
– IT equipment selection
– Power, cooling, storage
– Monitoring and reporting
Managing Risk
What presents the greatest risk?
Evidence of Conformance
• Policies covering control, security etc.
• Evidence of processes that support the policies
– Change records
– Build and test records
– Written material or email trails
– Communications
– Incident reviews
– Access lists
Current Issues
Security of data on individuals’ financial and personal lives is becoming high profile. The data in the data center is valuable!
www.idtheftcenter.org
ITRC20090304-01
NYPD Pension Fund 3/4/2009
A civilian official of the NYPD’s pension fund has been charged with stealing the identities of 80,000 current and retired cops, sources said. He allegedly got into a secret backup-data warehouse on Staten Island last month and walked out with eight tapes packed with Social Security numbers, direct-deposit information for bank accounts, and other sensitive material.
Example
Management Maturity
1 Reactive: Individual approach
2 Repeatable: Some process, often informal
3 Defined: Process documented and explained
4 Managed: Process checked and reviewed for gaps
5 Optimised: Process open to external review and updated regularly
What will be different next year?
Managing the Infrastructure
• Planning what needs to be done to achieve a particular result
• Organising and directing resources
• Controlling and making adjustments as needed
• Motivating all those involved
Thank you for your attention
Questions or feedback?
David Cuthbertson
Square Mile Systems Ltd
www.squaremilesystems.com
www.assetgen.com