Implementing a Model for Service Level Management:
A Practical Approach to Integrating Performance Tools

Steve Lewis, J.D. Edwards & Company

Topics:

1. Why manage/monitor your infrastructure?
2. What tools must be in place?
3. Managing diverse systems, networks, and applications.
4. Key design decisions.
5. Implementation experiences, examples, and lessons.

Why do we need tools?

Every IT organization wants to be known for its proactive monitoring and automated Service Level Management.

What is the Cost to Manage?

1) Hardware, Software, & Maintenance fees.
2) Facilities – building, cooling, electricity, access control, disaster recovery sites.
3) People – design, operations, support.

But what about . . .
• Cost avoidance – no addition to bottom line.
• Do these costs offset the cost of not managing? (Under- or over-utilization, lost productivity, “waste”)

What can we gain?

1) If you know what resources you have used in the past, you can better plan for the future.
2) Reactive mode vs. proactive mode: operating from a pager vs. identifying potential problems before they happen.
3) Quick notification gives a head start to the technical team that repairs the service.
4) Knowledge base = better history on failures; a training tool for new team members.

How to Move in the Right Direction

Break down the task into sequential steps.

Build Service Level Management step-by-step from the bottom up.

The Layers of Service Level Mgmt

Automated functionality built in layers according to their dependencies.

#1 – Technical Infrastructure

In order for a specific service to be available, all of the technical components must exist:
• Network Devices & Communication Links
• Server Hardware & Operating Systems
• Application Software & Processes

Each device must gather statistics on itself (using SNMP, WMI, syslog, flat files, etc.)

This is where most $$$ and people are allocated!

Layer stack (bottom to top): Network, System, and Application Infrastructure

#2 – Fault Management Tools

A defined SERVICE may not be available if a network, system, or application component experiences a failure or poor performance.

“Root Cause Correlation” identifies the exact point of failure in the event chain.

Layer stack (bottom to top): Network, System, and Application Infrastructure; Fault Management Tools
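
One way to picture root cause correlation is as a walk over a dependency map: of all the components reporting a fault, the likely root cause is the one whose own dependencies are still healthy. The sketch below is only an illustration of that idea, not how any particular product does it; the `topology` map and component names are hypothetical.

```python
def root_causes(depends_on, failed):
    """Return the failed components none of whose direct dependencies failed.

    depends_on maps each component to the components it relies on.
    Simplification: only direct dependencies are inspected.
    """
    failed = set(failed)
    return sorted(
        component
        for component in failed
        if not any(dep in failed for dep in depends_on.get(component, ()))
    )


# Hypothetical service chain: app -> server -> switch -> router
topology = {
    "app": ["server"],
    "server": ["switch"],
    "switch": ["router"],
    "router": [],
}

# Three components report faults, but only the switch has healthy dependencies.
print(root_causes(topology, {"app", "server", "switch"}))  # ['switch']
```

The downstream alarms (server, app) can then be suppressed or attached to the single ticket opened for the switch.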

#3 – Information Management Tools

This should include tightly integrated tools:
• Problem Management
• Change Management
• Asset Management

Layer stack (bottom to top): Network, System, and Application Infrastructure; Fault Management Tools; Information Management Tools

Problem Management Tools

If an infrastructure event is detected by the Fault Management tools, it should be reported to the Problem Management System, which handles:
• Documenting (trouble ticket & knowledge base)
• Tracking (status update & workflow)
• Escalating (service response)
• Notifying (pager, email, phone, PA system)
• Generating reports (mean time between failures)

Information Management tools: Problem Mgmt, Change Mgmt, Asset Mgmt
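
As an illustration of the reporting item, mean time between failures can be derived from the outage records that trouble tickets already capture: operating (up) time divided by the number of failures. This is a minimal sketch; the function name, the record layout, and the dates are assumptions made for the example.

```python
from datetime import datetime, timedelta


def mtbf(period_start, period_end, outages):
    """Mean time between failures = operating (up) time / number of failures.

    outages is a list of (start, end) datetimes inside the reporting period.
    Returns None when no failures occurred in the period.
    """
    if not outages:
        return None
    downtime = sum((end - start for start, end in outages), timedelta())
    return (period_end - period_start - downtime) / len(outages)


# Hypothetical 30-day reporting period with two outages (2 h and 4 h).
outages = [
    (datetime(2003, 1, 5, 10, 0), datetime(2003, 1, 5, 12, 0)),
    (datetime(2003, 1, 20, 1, 0), datetime(2003, 1, 20, 5, 0)),
]
result = mtbf(datetime(2003, 1, 1), datetime(2003, 1, 31), outages)
print(result)  # (720 h - 6 h) / 2 = 357 h
```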

Change Management Tools

Change Management System:
• Schedule & approve changes to the infrastructure.
• Track routine maintenance tasks.
• The Problem Management tool can check with the Change Management tool to distinguish between “Planned Outages” & unexpected faults.
• Notification & reporting are handled differently for planned outages.

Information Management tools: Problem Mgmt, Change Mgmt, Asset Mgmt
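
The check between the Problem Management and Change Management tools could look roughly like the following: an event counts as a planned outage only if it falls inside an approved change window for the same component. The record fields (`component`, `approved`, `start`, `end`) and the sample data are hypothetical, not taken from any specific product.

```python
from datetime import datetime


def classify_event(event_time, component, change_windows):
    """Return "planned outage" if the event falls inside an approved
    change window for the component, otherwise "unexpected fault"."""
    for window in change_windows:
        if (
            window["component"] == component
            and window["approved"]
            and window["start"] <= event_time <= window["end"]
        ):
            return "planned outage"
    return "unexpected fault"


# Hypothetical approved maintenance window for one server.
windows = [
    {
        "component": "erp-db-01",
        "approved": True,
        "start": datetime(2003, 3, 1, 22, 0),
        "end": datetime(2003, 3, 2, 2, 0),
    }
]

print(classify_event(datetime(2003, 3, 1, 23, 30), "erp-db-01", windows))
print(classify_event(datetime(2003, 3, 2, 9, 0), "erp-db-01", windows))
```

The result can then drive the different notification and reporting paths, for example by skipping the pager for planned outages.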

Asset Management Tools

The Asset Management System holds vital information on each technical component:
• Vendor & maintenance plan
• Serial number & location
• Lease expiration & asset owner
• Responsible support team by shift, so the appropriate group is notified of an event.

Information Management tools: Problem Mgmt, Change Mgmt, Asset Mgmt
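
To make the last item concrete, an asset record could carry a shift table that the notification step consults. The sketch below assumes a simple layout (hour-based shifts, one team per shift); all field names and values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Asset:
    name: str
    vendor: str
    maintenance_plan: str
    serial_number: str
    location: str
    lease_expiration: date
    owner: str
    # (start_hour, end_hour, team); end is exclusive, a shift may wrap midnight
    support_by_shift: list = field(default_factory=list)


def team_on_duty(asset, hour):
    """Return the support team responsible for the asset at the given hour (0-23)."""
    for start, end, team in asset.support_by_shift:
        if start < end:
            in_shift = start <= hour < end
        else:  # shift wraps past midnight, e.g. 22:00 to 06:00
            in_shift = hour >= start or hour < end
        if in_shift:
            return team
    return None


# Hypothetical asset record.
server = Asset(
    name="erp-db-01",
    vendor="ExampleVendor",
    maintenance_plan="24x7, 4-hour response",
    serial_number="SN-0000",
    location="Data center A, rack 12",
    lease_expiration=date(2005, 6, 30),
    owner="Finance IT",
    support_by_shift=[(6, 14, "day-ops"), (14, 22, "evening-ops"), (22, 6, "night-ops")],
)

print(team_on_duty(server, 3))  # night-ops
```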

#4 – Performance Management Tools

• Performance/Capacity Planning statistics.
• Resource utilization thresholds for proactive notification when thresholds are exceeded.

Layer stack (bottom to top): Network, System, and Application Infrastructure; Fault Management Tools; Information Management Tools; Performance Management Tools
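
A threshold check of this kind reduces to comparing each collected metric against its configured limit and emitting an exception event when the limit is exceeded. The following is a minimal sketch; the metric names and limits are made up for the example.

```python
def check_thresholds(metrics, thresholds):
    """Return one exception event per metric whose value exceeds its threshold.

    Metrics without a configured threshold are ignored.
    """
    events = []
    for name, value in metrics.items():
        limit = thresholds.get(name)
        if limit is not None and value > limit:
            events.append({"metric": name, "value": value, "threshold": limit})
    return events


# Hypothetical utilization sample (percent) and limits.
sample = {"cpu_pct": 93.0, "disk_pct": 71.5, "mem_pct": 88.0}
limits = {"cpu_pct": 90.0, "disk_pct": 85.0}

for event in check_thresholds(sample, limits):
    print(event)
```

Setting the limits below the point of failure is what makes the notification proactive: the event fires while there is still headroom to act.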

#5 – Service Level Policies

• Technical components grouped into services.
• “Customer view” transaction monitoring.

Layer stack (bottom to top): Network, System, and Application Infrastructure; Fault Management Tools; Information Management Tools; Performance Management Tools; Service Level Policies

#5 – Service Level Policies (continued)

Two ways to measure a service:

• Monitor each component in the “service chain” – BUT how do you synchronize the data from different monitoring tools?
• Generate synthetic transactions from an “end user” viewpoint – BUT how do you isolate troublesome components?
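
One possible answer to the synchronization question in the first approach is to align every tool's samples to common time windows and call the service "up" in a window only if every component in the chain was up. The sketch below assumes each tool can export (epoch seconds, up/down) samples; the 5-minute window and the sample data are arbitrary choices for illustration.

```python
from collections import defaultdict


def bucket(samples, width_s=300):
    """Map (epoch_seconds, is_up) samples to {window_start: up_for_whole_window}."""
    windows = defaultdict(lambda: True)
    for ts, is_up in samples:
        start = ts - (ts % width_s)
        windows[start] = windows[start] and is_up
    return dict(windows)


def service_status(component_samples, width_s=300):
    """Per window: False if any component was down, None if data is missing
    for some component (unknown), True if every component reported up."""
    per_component = [bucket(s, width_s) for s in component_samples.values()]
    all_windows = sorted({w for b in per_component for w in b})
    status = {}
    for w in all_windows:
        values = [b.get(w) for b in per_component]
        if False in values:
            status[w] = False
        elif None in values:
            status[w] = None
        else:
            status[w] = True
    return status


# Hypothetical samples from three tools polling at different times.
samples = {
    "router": [(0, True), (60, True), (300, True), (600, True)],
    "server": [(10, True), (310, False), (610, True)],
    "app": [(20, True), (320, True)],
}
print(service_status(samples))  # {0: True, 300: False, 600: None}
```

Reporting missing data as "unknown" rather than "up" keeps gaps in one tool's collection from silently inflating the service's availability.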

#6 – Service Level Management

Automated reporting of SLA compliance.

Layer stack (bottom to top): Network, System, and Application Infrastructure; Fault Management Tools; Information Management Tools; Performance Management Tools; Service Level Policies; Service Level Management
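
At its simplest, an automated compliance report compares measured availability against the agreed target. The sketch below assumes that planned outages are excluded from the scheduled service time, which is a common convention but an assumption here; the figures are invented for the example.

```python
def sla_report(period_minutes, unplanned_outage_minutes, planned_outage_minutes, target_pct):
    """Availability over scheduled service time, with planned outages excluded."""
    scheduled = period_minutes - planned_outage_minutes
    if scheduled <= 0:
        raise ValueError("no scheduled service time in the period")
    availability = 100.0 * (scheduled - unplanned_outage_minutes) / scheduled
    return {
        "availability_pct": round(availability, 3),
        "target_pct": target_pct,
        "compliant": availability >= target_pct,
    }


# Hypothetical 30-day month (43,200 minutes) with 4 hours of planned maintenance.
print(sla_report(43200, unplanned_outage_minutes=60, planned_outage_minutes=240, target_pct=99.9))
print(sla_report(43200, unplanned_outage_minutes=30, planned_outage_minutes=240, target_pct=99.9))
```

With 60 unplanned minutes the service lands near 99.86% and misses a 99.9% target; with 30 minutes it lands near 99.93% and meets it.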

#6 – Service Level Management (continued)

“Service Level Management is not a unique, isolated function. It is the culmination of ALL the functions involved in providing the service.”

Rick Sturm

Difficulty of Service Level Management

• Collecting the appropriate metrics.
• Automating the correlation of those metrics.

Diagram labels: Technology View, Customer View

Design Decision #1

Reality: The technical infrastructure is relatively dynamic, constantly changing, with little centralized control.

Decision: Choose “Self-Configuring” tools that detect and adjust to change automatically.

Design Decision #2

Reality: Cannot afford the intensive administrative overhead required to maintain most tools.

Decision: Choose “Zero-Admin” tools that automate or minimize administrative tasks.

Design Decision #3

Reality: Extensive software distribution, version control, and cost issues with agent-based tools.

Decision: Choose “Agent-Less” tools for common metrics (collect with SNMP, WMI, syslog).

Design Decision #4

Reality: Need a consolidated “single-pane-of-glass” view of performance and service level statistics.

Decision: Choose “Web-Based” tools that offer security & customization per user.

Design Decision #5

Decision: Centralize to provide a single control point for security, event monitoring, administration, and report generation.

Constructing The System (part 1)

Fault Management Layer:

HP OpenView NNM
• Adjusts to network configuration changes.
• Provides up/down status on connected devices.
• Does “root cause” correlation for events.
• Ability to define metrics for SNMP collection and database storage.
• Serves as SNMP trap destination for processing application-level events.

Constructing The System (part 2)

Fault Management Layer:

Magnum Technologies: COORDINATOR
• Provides “root cause” correlation for events.
• Updates its correlation engine when the OpenView topology changes.
• Contains an External Command Processor for parsing event messages, automatically opening trouble tickets, and sending notifications.

Constructing The System (part 3)

Performance Management Layer:

Magnum Technologies: CAPTREND
• Contains internal SNMP & WMI polling engines to collect basic performance metrics.
• Stores data for ad hoc reporting; generates several canned graphical reports.
• Ability to create performance thresholds that generate exception events for notification.

Constructing The System (part 4)

Performance Management Layer:

BMC Software: Patrol
• Monitors application metrics at a detailed level.
• Ability to generate SNMP traps for application events, which are sent to OpenView and COORDINATOR for processing.

Constructing The System (part 5)

Performance Management Layer:

Empirix: eMonitor & OneSight
• Generates web-based customer-oriented transactions (including HTTPS authentication).
• Ability to generate SNMP traps for response time threshold violations, which are sent to OpenView and COORDINATOR for processing.
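
The general idea behind a synthetic transaction monitor can be sketched in a few lines: run a scripted "customer" step, time it, and flag a violation when it fails or exceeds the response time threshold. This is a generic illustration, not the Empirix implementation; the transaction is passed in as a plain callable, and forwarding the result as an SNMP trap is left out.

```python
import time


def run_synthetic_transaction(name, transaction, threshold_s):
    """Time one scripted transaction and report whether it violated the threshold.

    transaction is any zero-argument callable standing in for the scripted
    end-user steps (login, query, and so on).
    """
    start = time.perf_counter()
    succeeded = True
    try:
        transaction()
    except Exception:
        succeeded = False
    elapsed = time.perf_counter() - start
    return {
        "transaction": name,
        "succeeded": succeeded,
        "elapsed_s": elapsed,
        "violation": (not succeeded) or elapsed > threshold_s,
    }


# Hypothetical stand-in for a slow web login.
def slow_login():
    time.sleep(0.05)


print(run_synthetic_transaction("login", slow_login, threshold_s=0.01)["violation"])
```

A violation result would then be handed to the fault management layer in the same way as any other event.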

Still-to-be-Accomplished

• Integration of tools at the Information Management layer.
• Automated reporting from existing agent-based tools at the Performance Management layer.
• Tools to correlate technology components and define policies at the Service Level Policy layer.

Lessons Learned

It always costs more MONEY and takes more TIME than expected.

It is always more difficult than expected to INTEGRATE diverse tools.

Key Success Factors:
• Management Commitment
• Business Process Improvement
• Customer Care Strategy
• Organizational Flexibility