SmartCloud Monitoring and Capacity Planning

© 2013 IBM Corporation SmartCloud Monitoring Simon Coote 29 May 2013, Copenhagen


Page 1: SmartCloud Monitoring and Capacity Planning

© 2013 IBM Corporation

SmartCloud Monitoring

Simon Coote

29 May 2013, Copenhagen

Page 2: SmartCloud Monitoring and Capacity Planning

Agenda

Introduction

What is SmartCloud Monitoring?

Demonstration
– Health Dashboards
– Predictive Analytics
– Capacity Planning
– Reporting

Summary

Page 3: SmartCloud Monitoring and Capacity Planning

Monitoring – Holistic view of performance and availability

Operating Systems

Applications

Hypervisors

Hardware

Transactions

SmartCloud Monitoring

Page 4: SmartCloud Monitoring and Capacity Planning

What is SmartCloud Monitoring?

Health dashboards to provide an instant, consolidated glimpse into cloud health

Topology views of the key interrelated components of the cloud

Reports on the health trends of cloud components and workloads, powered by Cognos

What-If capacity planning scenarios

Policy-Based optimization to put workloads where they’ll perform best, not just where they’ll fit

Performance Analytics for right-sizing of virtual machines

Integration with industry-leading IBM service management portfolio

Page 5: SmartCloud Monitoring and Capacity Planning

SmartCloud Monitoring Broad Hypervisor & Platform Support

• x86 Hypervisors
– VMware
– KVM (IBM, Red Hat)
– Citrix XenServer
– Citrix XenApp, Citrix XenDesktop
– Hyper-V

• IBM z/VM
– Monitoring of z/VM and Linux on System z

• IBM Power
– Monitoring of IBM PowerVM (CEC, VIOS, LPARs (AIX, Linux), HMC)

• Other Hypervisors
– Solaris/Zones

• Transactions
– Transaction tracking
– Response time
– Robotic transactions

• Middleware
– WebSphere
– MQ
– WebLogic
– etc.

• Applications
– SAP
– Exchange
– Lotus Notes
– PeopleSoft
– etc.

• Databases
– DB2
– Oracle
– SQL
– etc.

• OS
– Windows
– Linux
– AIX, Linux on p, i5
– z/OS, z/VM
– Solaris
– HP-UX

• Storage/Network
– TPC: IBM Storage, EMC, Hitachi, NetApp
– DFM: NetApp
– Ethernet Switches
– etc.

• Server Platforms/OS
– x86 (IBM, non-IBM)
– IBM Power 5, 6, 7
– IBM System z, zEnterprise
– Sun (SPARC)
– HP
– Cisco UCS
– etc.

• Integrate with ITM/ITCAM and other tools to extend scope beyond hypervisors

Page 6: SmartCloud Monitoring and Capacity Planning

Monitor the Virtualization/Cloud Ecosystem

[Diagram labels: ITM, ESX/ESXi hosts, vCenter Server]

Holistic Approach to Monitoring including storage, networking, hypervisor, etc.

NetApp Storage Agent:

– Provides Monitoring data in ITM

– Integrates into Health Dashboard

TADDM Integration

– TADDM DLA discovers the vCenter environment/topology

– TADDM provides change data to VMware Health Dashboard

IBM Director Integration

– ITM Agent provides integration with the Director Server

– Allows for Management of VMware resources

– Historical Collection of HW data

Tivoli Storage Productivity Center (TPC):

– Agent provides storage metrics in TEP

– Integrates into Health Dashboard

– Warehouse storage metrics for reporting and analysis

Network Monitoring Agent

– Monitor switches used by VMware

– Integrate Network Events into Dashboard

Consider adding application monitoring to the ecosystem

[Diagram labels: NetApp, NetApp Agent, DFM, TADDM, Health Dashboard, TPC, IBM Storage, Hitachi, EMC, Network Switches, VI Agent, Apps / Middleware]

Page 7: SmartCloud Monitoring and Capacity Planning

SmartCloud Monitoring 7.2 – What's new?

Additional VMware Metrics:
− Orphaned VMDK files

Completely rewritten VMware Health Dashboard:
− Lighter weight/Faster Response Time
− More intuitive and easy to navigate
− Fewer clicks to drill down to root cause

New DASH user interface:
− Single User Interface where multiple Tivoli products are integrated
− Includes TCR 3.1 which includes Active Reporting
− Sample Active Report attached
− Self-Service dashboarding capabilities. Build dashboards using any ITM data and data using Tivoli Directory Integrator
− Support for tablet devices

Improved VMware Capacity Planning:
− Can save existing customization
− Can do partial loads of the VMware environment
− New VMware Expense Reduction Report and other reports
− Improved benchmark matching
− Evaluates CPU, Mem, Network I/O, Storage, and Storage Topology

Page 8: SmartCloud Monitoring and Capacity Planning

SmartCloud Monitoring 7.2 – What's new?

Power Systems:
− Capacity Planning: What-if scenarios, server sizing, etc. for Power Systems
− Enhanced Power Systems Agents including consolidation of UNIX OS Agent and Premium AIX Agent

Enhancements to other hypervisors:
− Other hypervisors such as Citrix and Cisco UCS have been enhanced

ITM 6.3 Enhancements:
− OS Dashboard in DASH
− OSLC Linked Data for integration with TBSM and other products
− 64-bit TEPS that doubles the number of concurrent users and improves scale
− Warehouse Range Partitioning
  • This can dramatically improve performance by eliminating the need for pruning of the historical data
− Authorization Policy Server
  • To restrict the access for dashboard users to Managed System Groups and to individual agent managed systems
  • The ability to grant role-based access control in addition to Access Control Lists, making access control easier and safer
  • Role inheritance for scalable management

Page 9: SmartCloud Monitoring and Capacity Planning

Health Dashboards

Page 10: SmartCloud Monitoring and Capacity Planning

Today’s Agenda

High-level VMware dashboard showing all clusters, events, and key KPIs

Click to drill down

Page 11: SmartCloud Monitoring and Capacity Planning

Single Cluster view showing events and KPIs

Can be real-time or historical

Select any of the links below to go to Servers, VMs, or Datastores

Page 12: SmartCloud Monitoring and Capacity Planning

Single Cluster view showing vSphere Servers

Click on a link to drill down to a single server

Page 13: SmartCloud Monitoring and Capacity Planning

Single Server showing historical data

Actions allow you to launch in context to TEP or TCR

Select any of the links below to go to Cluster, Servers, VMs, or Datastores

Page 14: SmartCloud Monitoring and Capacity Planning

Configuration tab shows config data from TADDM

Breadcrumbs allow for easy navigation

Page 15: SmartCloud Monitoring and Capacity Planning

Change History data from TADDM

Page 16: SmartCloud Monitoring and Capacity Planning

Networking KPIs for the vSphere Server

Select any of the links below to go to Cluster, VMs, or Datastores

Page 17: SmartCloud Monitoring and Capacity Planning

Virtual Machine page showing real-time or historical KPIs and Events

Link to OS Dashboard for the VM

Page 18: SmartCloud Monitoring and Capacity Planning

OS Dashboard with metrics and Events from the OS Agent

Page 19: SmartCloud Monitoring and Capacity Planning

Detailed OS Dashboard Page

Page 20: SmartCloud Monitoring and Capacity Planning

VMware Datastores

Select to drill down

Page 21: SmartCloud Monitoring and Capacity Planning

Single Datastore shows real-time or historical data and Events

Select any of the links below to go to connected Clusters, Servers, or VMs

Page 22: SmartCloud Monitoring and Capacity Planning

Predictive Analytics

Page 23: SmartCloud Monitoring and Capacity Planning

Dynamic Thresholding & Adaptive Monitoring

• View real-time and time-aligned historical data
• Analyze the trends and see trends vs. anomalies
• Use Avg, Max, Min, Percentile, Mode
• Monitors can be defined for shifts or can be adjusted for seasonal differences

Agents provide static monitoring thresholds

IBM Tivoli Monitoring provides static thresholds and Dynamic Thresholds/Adaptive Monitoring

The system learns the “normal” behaviour for a resource and sets the threshold based on historical data
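
As a rough illustration of learning a "normal" level from history, the sketch below builds a per-hour baseline and flags values that sit well above it. It is not how IBM Tivoli Monitoring computes its thresholds: the mean-plus-k-standard-deviations rule, the hour-of-day grouping, and the names `learn_baseline`, `dynamic_threshold`, and `is_anomaly` are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_baseline(samples):
    """Group historical (hour_of_day, value) samples and summarise each hour.

    Returns {hour: (mean, population_stdev)}. Grouping by hour is an
    assumption standing in for "shifts" or seasonal periods.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def dynamic_threshold(baseline, hour, k=3.0):
    """Threshold for one hour: mean + k * stdev (hypothetical rule)."""
    mu, sigma = baseline[hour]
    return mu + k * sigma


def is_anomaly(baseline, hour, value, k=3.0):
    """True when a value exceeds the learned threshold for that hour."""
    return value > dynamic_threshold(baseline, hour, k)


# Invented history: CPU % is normally ~10 at 02:00 and ~60 at 09:00.
history = [(2, 10), (2, 12), (2, 8), (2, 10),
           (9, 60), (9, 62), (9, 58), (9, 60)]
baseline = learn_baseline(history)

print(is_anomaly(baseline, 2, 40.0))  # unusual for 02:00
print(is_anomaly(baseline, 9, 40.0))  # unremarkable for 09:00
```

A single static threshold of, say, 80% would treat both readings the same way; the learned baseline separates them because 40% is far outside the usual range at 02:00 but not at 09:00.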

Page 24: SmartCloud Monitoring and Capacity Planning

Performance Analyzer: Predictive Trending

• Hands off capacity monitoring
• Automates performance analysis and reporting
• Prediction of application bottlenecks
• Creation of alerts for potential service threats
• “What will my resources look like tomorrow, next week, next month or next year?”
• “What IT Resources should I worry about next?”
• “Will I have enough capacity to get me through Monday?”

Leverage collected data to spot trends and highlight emerging concerns

[Chart: Metric plotted against Time, showing Actual Monitor Data, Predicted trend, Threshold, and Predicted Metric Violation]

Page 25: SmartCloud Monitoring and Capacity Planning

Key Capabilities of Performance Analyzer

Out of the box analytics tasks for:

– OS Agents (CPU, Memory, Disk, Network)

– DB2 (Pool Read/Write, Sort Time, Memory, Tablespaces, etc.)

– Oracle (Cache Hits, Archive Space, Tablespaces, Transactions, etc.)

– VMware (Physical Server, VMs, Memory, CPU, Datastores, etc.)

– Power Systems (SEA, Network, CPU, Memory, I/O, Entitlement, etc.)

– Response Time (Web, Robotic, and Client Response Time)

Easily customized to analyze any numeric data

– Arithmetic Modules for calculating data and normalizing data

– Analytic Modules for performing predictive analytics

– Predict the following:

• How many days until I reach a Warning or Critical Threshold

• Predict the value 7, 30, 90 days into the future (customizable)

When defining Situations in Performance Analyzer, define the number of days' notice you need

– Take into account the statistical values such as the strength, number of data points, etc.
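
The two predictions listed above (days until a threshold is reached, and the value N days ahead) can be sketched with an ordinary least-squares line. This is only an illustration under the assumption of a linear trend over daily samples; the function names are hypothetical and nothing here reflects Performance Analyzer's internal implementation.

```python
def fit_line(points):
    """Least-squares fit of value = slope * day + intercept.

    points: list of (day_index, value) pairs with at least two distinct days.
    """
    n = len(points)
    sx = sum(d for d, _ in points)
    sy = sum(v for _, v in points)
    sxx = sum(d * d for d, _ in points)
    sxy = sum(d * v for d, v in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need samples from at least two distinct days")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


def predict(points, days_ahead):
    """Trend value days_ahead days after the most recent sample."""
    slope, intercept = fit_line(points)
    last_day = max(d for d, _ in points)
    return slope * (last_day + days_ahead) + intercept


def days_until(points, threshold):
    """Days from the latest sample until the trend reaches threshold.

    Returns 0.0 if the trend is already at or above it, and None if the
    trend is flat or falling (it never gets there).
    """
    slope, intercept = fit_line(points)
    last_day = max(d for d, _ in points)
    if slope * last_day + intercept >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - intercept) / slope - last_day


# Invented daily disk-utilization samples (%), growing 2 points per day.
samples = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0), (4, 58.0)]
print(days_until(samples, 80.0))   # days until a Warning level of 80%
print(days_until(samples, 90.0))   # days until a Critical level of 90%
print(predict(samples, 7))         # trend value one week out
```

Comparing `days_until(...)` with the number of days' notice an operator needs gives a simple alert condition. A fuller version would also check fit strength and sample count before trusting the answer, as the slide suggests.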

Page 26: SmartCloud Monitoring and Capacity Planning

Key Capabilities of Performance Analyzer

Select Agent Type

Select Attribute Group

Select Hourly, Daily, Weekly, etc.

Analyze all or a subset of your agents

Page 27: SmartCloud Monitoring and Capacity Planning

Key Capabilities of Performance Analyzer

Define Warning and Critical Thresholds

Page 28: SmartCloud Monitoring and Capacity Planning

Key Capabilities of Performance Analyzer

Define prediction durations

Page 29: SmartCloud Monitoring and Capacity Planning

Performance Analyzer for: VMware and Power Systems (CPU Trends, Disk Utilization, Memory, Network) out of the box

For VMware, recommend setting up Analytic Tasks for: Cluster Percent Effective CPU utilization, Cluster Percent Effective Memory utilization, CPU Percent Ready

Set up Analytic Tasks for other hypervisors (Hyper-V, KVM, Solaris Zones)

Linear and non-linear trending…the tool picks the model that best fits the data

Model Chosen

Time to Critical and Warning

Bar Chart represents the historical data and the line graph represents the statistical trend line
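
To make "picks the model that best fits the data" concrete, here is a small sketch that fits a linear and an exponential trend to the same history and keeps whichever has the lower sum of squared errors. The choice of candidate models, the error measure, and the function names are assumptions for illustration; the slides do not say which models or criteria the tool actually uses.

```python
import math


def _least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_linear(xs, ys):
    a, b = _least_squares(xs, ys)
    return lambda x: a * x + b


def fit_exponential(xs, ys):
    """Fit y = exp(a*x + b) by a linear fit on log(y); needs positive ys."""
    a, b = _least_squares(xs, [math.log(y) for y in ys])
    return lambda x: math.exp(a * x + b)


def sse(model, xs, ys):
    """Sum of squared errors of a model on the original scale."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))


def pick_model(xs, ys):
    """Return (name, model) for the candidate with the lowest SSE."""
    candidates = {"linear": fit_linear(xs, ys)}
    if all(y > 0 for y in ys):
        candidates["exponential"] = fit_exponential(xs, ys)
    best = min(candidates, key=lambda name: sse(candidates[name], xs, ys))
    return best, candidates[best]


days = [0, 1, 2, 3, 4]
name, model = pick_model(days, [10, 20, 40, 80, 160])  # doubling each day
print(name, round(model(5), 1))
```

Both candidates here have two parameters, so comparing in-sample error is a fair first cut; with more flexible candidates a held-out window or a penalised criterion would be a safer basis for the choice.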

Page 30: SmartCloud Monitoring and Capacity Planning

Capacity Planning

Page 31: SmartCloud Monitoring and Capacity Planning

Why is Capacity Management Critical to Cloud Management?

Helps consolidate and reduce IT costs
– Reduces HW and labor costs
– Fewer physical servers needed
– Reduce hypervisor license costs
– Increase VM density to drive Cloud ROI
– Predict how many more customers / VMs can be serviced

Helps ensure application availability and reduce risk
– Are any resources overloaded? When will physical resources reach their limits?
– Have there been any significant changes in my environment recently?
– Identify trends to predict bottlenecks, or free up space and balance workloads
– Ensure supply can meet demand
– Ensure technical and business policies are met to reduce risk

Helps optimize resource utilization
– Right size virtual machines and allocate based on usage, over-commit within known risk limits
– Pack VMs on the infrastructure to optimize resources

Page 32: SmartCloud Monitoring and Capacity Planning

Capacity Planning & Analytics - Architecture

[Architecture diagram]

• Servers, Hypervisors, Storage, Network, … Platforms
• Copy, Federate
• Capacity Planning Database
• Workload Characterization
  – Establish patterns using historical data
  – Capture workload attributes to enable optimization policies
• Custom Tags enhance Config Profiles and workload relationships
• Benchmarking data
• Usage profiles, workload relationships
• Business and Technical policies
• Optimization Engine to size and place VMs
• Plan Recommendation (minimize systems, license, balance)

Page 33: SmartCloud Monitoring and Capacity Planning

The Case for Holistic Capacity Analytics

Unbalanced

Can’t move workload to this cluster because it’s almost out of datastore space

On one screen, I can check all of the key resources to see if my workload is balanced

Page 34: SmartCloud Monitoring and Capacity Planning

Unbalanced across Clusters and within a cluster

Need to look at all key attributes to look for bottlenecks and imbalance

Disk I/O Metrics also available

The Case for Holistic Capacity Analytics

Page 35: SmartCloud Monitoring and Capacity Planning

SmartCloud Monitoring Capacity Planning Center

Page 36: SmartCloud Monitoring and Capacity Planning

Planning Center – applying parameters

Page 37: SmartCloud Monitoring and Capacity Planning

Policy Driven Capacity Planning

Opens a new tab with pre-built rules or a rule editor for customer rules

Choose rules for “what-if” scenario

Out of the box rules

• Colocation/Anti-colocation
  – Place Win2003 32-bit on the same ESX server

• Boundary Rules
  – Place Win2003 32-bit on the same ESX server

• Utilization Rules
  – Provide 20% growth for key business application

Create custom rules

SPECint data used for analysis
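
The rule types on this slide can be pictured as small checks applied to a proposed placement. The sketch below is a hypothetical model, not the product's rule editor or rule syntax: the `VM` record, the placement dictionary, and the three helper functions are invented here to show what a colocation rule, an anti-colocation rule, and a 20% growth rule might test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    os: str
    cpu: float            # demand in benchmark-normalised units (hypothetical)
    key_app: bool = False  # marks a key business application


def colocated(placement, vms, os_name):
    """Colocation rule: every VM running os_name sits on the same host."""
    hosts = {placement[vm.name] for vm in vms if vm.os == os_name}
    return len(hosts) <= 1


def anti_colocated(placement, names):
    """Anti-colocation rule: the named VMs are all on different hosts."""
    hosts = [placement[name] for name in names]
    return len(hosts) == len(set(hosts))


def sized_demand(vm, growth=0.20):
    """Utilization rule: plan for extra growth on key business applications."""
    return vm.cpu * (1 + growth) if vm.key_app else vm.cpu


vms = [
    VM("web1", "Win2003 32-bit", 10.0),
    VM("web2", "Win2003 32-bit", 5.0),
    VM("erp", "Linux", 8.0, key_app=True),
]
placement = {"web1": "esx1", "web2": "esx1", "erp": "esx2"}

print(colocated(placement, vms, "Win2003 32-bit"))
print(anti_colocated(placement, ["web1", "erp"]))
print(sized_demand(vms[2]))
```

Keeping each rule as a plain predicate makes it easy to run a "what-if" placement through whichever subset of rules has been selected and to report exactly which rule a candidate plan violates.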

Page 38: SmartCloud Monitoring and Capacity Planning

Spec data provides granular capacity planning

Page 39: SmartCloud Monitoring and Capacity Planning

Reduce from 9 ESX servers down to 4 while lowering memory and CPU utilization

Utilization projections accommodated growth

Started with 9 ESX servers and 24 VMs

The tool always includes headroom to ensure you don’t run out of capacity

Capacity Planning Results

Recommendations to optimize workloads; reduce risk while eliminating or reallocating servers
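
A consolidation result like this one comes from a packing calculation that reserves headroom on every host. The sketch below uses a simple first-fit-decreasing heuristic on CPU and memory; it is a stand-in chosen for illustration, not the product's optimization engine, and the demand figures, host size, and 20% headroom are invented rather than taken from the slide.

```python
def pack_vms(vm_demands, host_capacity, headroom=0.20):
    """Place VMs on as few identical hosts as a first-fit-decreasing pass finds.

    vm_demands:    {vm_name: (cpu, mem)} in the same units as host_capacity
    host_capacity: (cpu, mem) of one host
    headroom:      fraction of each host deliberately left unused
    Returns a list of hosts, each a list of VM names.
    """
    cpu_limit = host_capacity[0] * (1 - headroom)
    mem_limit = host_capacity[1] * (1 - headroom)

    def size(name):
        cpu, mem = vm_demands[name]
        return max(cpu / cpu_limit, mem / mem_limit)

    hosts = []  # each entry tracks used cpu, used mem, and placed VM names
    for name in sorted(vm_demands, key=size, reverse=True):
        cpu, mem = vm_demands[name]
        if cpu > cpu_limit or mem > mem_limit:
            raise ValueError(f"{name} does not fit on an empty host")
        for host in hosts:
            if host["cpu"] + cpu <= cpu_limit and host["mem"] + mem <= mem_limit:
                break
        else:
            host = {"cpu": 0.0, "mem": 0.0, "vms": []}
            hosts.append(host)
        host["cpu"] += cpu
        host["mem"] += mem
        host["vms"].append(name)
    return [host["vms"] for host in hosts]


# Invented example: 24 identical VMs, hosts with 100 CPU units and 64 GB.
demands = {f"vm{i:02d}": (10, 8) for i in range(24)}
plan = pack_vms(demands, host_capacity=(100, 64), headroom=0.20)
print(len(plan), [len(h) for h in plan])
```

With these made-up numbers memory is the binding resource: the 20% headroom leaves 51.2 GB usable per host, so six 8 GB VMs fit and the 24 VMs land on four hosts. Checking every resource at once is the point of the "holistic" view on the earlier slides, since packing on CPU alone would overcommit memory.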

Page 40: SmartCloud Monitoring and Capacity Planning

Utilization after optimization

Headroom

Reduce both CPU and Memory Utilization reservations to free up resources
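
One common way to decide how far a reservation can be reduced is to size it from observed usage rather than from the original request. The sketch below recommends a reservation at a high percentile of usage plus a safety margin and reports what could be reclaimed. The 95th percentile, the 10% margin, and the function names are assumptions for illustration; the slides do not describe the product's right-sizing formula.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_reservation(usage_samples, pct=95, margin=0.10):
    """Suggested reservation: the pct-th percentile of usage plus a margin."""
    return percentile(usage_samples, pct) * (1 + margin)


def reclaimable(current_reservation, usage_samples, pct=95, margin=0.10):
    """Amount that could be freed by moving to the suggested reservation."""
    suggested = recommend_reservation(usage_samples, pct, margin)
    return max(0.0, current_reservation - suggested)


# Invented memory usage (GB) for one VM: steady at 20 with one spike to 90.
usage = [20] * 19 + [90]
print(recommend_reservation(usage))
print(reclaimable(64, usage))
```

Using a percentile instead of the maximum keeps a single spike from dictating the reservation. Whether that trade-off is acceptable depends on the workload, which is why the slides pair right-sizing with "known risk limits".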

Page 41: SmartCloud Monitoring and Capacity Planning

ROI Case Study – IBM Test & Development Cloud

Cluster consisting of 18 Servers and 1802 Virtual machines

Goal: Analyze an existing, production virtual environment in search of further optimization, and show ROI using management and capacity planning

Solution: Used IBM SmartCloud Monitoring to analyze the current environment and perform “what-if” analysis

Results: More optimized environment uses fewer physical servers, which results in savings in hardware, administrator/support, energy, data center floor space and license costs, resulting in an additional ROI of 14.4% over a year, and the ability to accommodate an additional 113 virtual machines.

Optimization of an IBM Internal development and test cloud using IBM SmartCloud monitoring results in an additional ROI of 14.2%

“In order to realize true cost savings from a virtualization or cloud investment, customers need to be able to run virtual machines densely enough to maximize consolidation, yet be assured that their workloads are still running as well as they were before being virtualized, with room for expansion.”

14.4 % Annual ROI

Page 42: SmartCloud Monitoring and Capacity Planning

VMware Expense Reduction Report - Example

Page 43: SmartCloud Monitoring and Capacity Planning

VMware Expense Reduction Report - Example

Page 44: SmartCloud Monitoring and Capacity Planning

VMware Expense Reduction Report - Example

Page 45: SmartCloud Monitoring and Capacity Planning

SmartCloud Monitoring

Virtual Machines | Storage | Networks

Provides greater visibility to cloud health
• Track cloud service levels & performance, and predict cloud problems before clients are impacted
• Understand performance and capacity today, and know what it will look like months from now

Lowers total cost of operations
• Optimize workload placement to wring maximum capacity and performance out of your cloud investment
• Freedom from expensive hypervisor or OS lock-in with a heterogeneous cloud infrastructure monitoring solution

Optimizes cloud performance
• Built-in performance analytics for right-sizing of virtual machines and resource optimization in the cloud
• Real-time proactive & predictive alerts help identify and fix problems quickly

Health Dashboards

Capacity Analytics

Performance Optimization

Increased Density

Reduced Risk

Minimized Outages

Optimized Workload Placement

Improved Service Levels

Years of experience managing mission critical workloads in the World’s largest enterprises yield peerless best practices advice

IBM SmartCloud Monitoring: Optimize your cloud performance and maximize ROI

Page 46: SmartCloud Monitoring and Capacity Planning

Kiitos!

Tack!

Thank you!

Takk!

Tak!