it service monitoring · 2019-12-03 · machine learning-powered, analytics-driven it operations....

IT Service Monitoring

Go Jackets!

“Just because we built it, doesn’t mean they will come!”

Yes, but…

“Just because we built it, doesn’t mean they will come!”

How do you know?

How do you measure?

So, why are/aren’t they coming?

OK, what can you do?

Escalating IT Complexity…

[Diagram: an IT environment spanning infrastructure and applications. Labels: storage, networking, virtualization, servers, packaged applications, custom applications, web server, app server, email, databases, identity, IP phone, SaaS/PaaS, analytics, mission.]

… Plaguing IT Operations

[Diagram: the IT environment, spanning infrastructure and applications. Labels: storage, networking, virtualization, servers, packaged applications, custom applications, finance, app server, web server, identity, IP phone, SaaS/PaaS.]

Complex, silo-based technologies

Disconnected and outdated point solutions

Reactive brute-force problem resolution

Over 80% of time on maintaining, not innovating

IT Stack POV

• This is the way many in ‘IT’ think of their ‘world’

• Each layer is a ‘silo’

• A dedicated team of experts (with their domain tools) focus just on the health of that particular layer

• Their view of the ‘health’ of that layer is based on the aggregated ‘health’ of each component in the layer

• If 2 out of 100 DBs are struggling, you are still having a good day

Physical Server

Guest OS (Windows/Linux/*Nix)

Database

Hypervisor

Web Server

App Server

Applications, business/mission services

SAN/NAS Storage

Network

• The aggregated health of the layer is irrelevant.

• Dependencies now matter

• The ‘health’ of the app depends greatly on the health of each component of each layer that that app depends upon.

• If your app depends on one or more of those two (2) ‘struggling’ DB servers, you are about to have a ‘bad’ day!

• What about those VM’s that are ‘yellow’?

Physical Server (1,2,3,4,5,6,7,8,9,10…N)

Guest OS (1,2,3,4,5,6,7,8,9,10…N)

Database (1,2,3,4,5,6,7,8,9,10…100)

VM/Hypervisor (1,2,3,4,5,6,7,8,9,10…N)

Web Server (1,2,3,4,5,6,7,8,9,10…N)

App Server (1,2,3,4,5,6,7,8,9,10…N)

Service/App Claims

SAN/NAS Storage (1,2,3,4,5,6,7,8,9,10…N)

Network

Status

Service/App POV Outage!
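
The contrast between the stack view and the service view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Component` type, the `layer_health` and `service_health` helpers, and the `db-N` names are hypothetical and do not come from any monitoring product. It shows a database layer that looks 98% healthy while a service depending on one struggling database is still down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    layer: str      # e.g. "Database", "Web Server"
    name: str       # hypothetical identifier such as "db-42"
    healthy: bool


def layer_health(components, layer):
    """Stack POV: fraction of healthy components within one layer."""
    members = [c for c in components if c.layer == layer]
    return sum(c.healthy for c in members) / len(members)


def service_health(components, depends_on):
    """Service POV: healthy only if every dependency is healthy."""
    by_name = {c.name: c for c in components}
    return all(by_name[name].healthy for name in depends_on)


# 100 databases, two of them struggling (db-17 and db-42 are made-up names).
databases = [
    Component("Database", f"db-{i}", healthy=i not in (17, 42))
    for i in range(1, 101)
]

stack_view = layer_health(databases, "Database")
unlucky_service = service_health(databases, ["db-3", "db-42"])
lucky_service = service_health(databases, ["db-3", "db-5"])

print(f"Database layer health: {stack_view:.0%}")
print(f"Service on db-3 and db-42 healthy? {unlucky_service}")
print(f"Service on db-3 and db-5 healthy?  {lucky_service}")
```

The all-or-nothing rule in `service_health` is the simplest possible choice. A real service model would likely distinguish degraded ("yellow") components from failed ones.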

Current State of IT Troubleshooting

CHALLENGES

Sprawl of multiple monitoring point solutions

No proactive indication of root cause

Repeated escalations and War Rooms

Rapid pace of change

Limited visibility

PAIN POINTS

CONCERNED: Complexity, teams operating in silos, massive infrastructure

AGGRAVATED: Long resolution times, unhappy users

STRESSED: Resource drain and missed deadlines

ANXIOUS: Misconfigured tools, gaps in coverage

UNEASY: Blindsided by issues

[Diagram: data sources (servers, networks, GPS location, packaged applications, custom applications, desktops, storage, databases, web services, online services, security, transactions), repeated around each OPERATIONAL VISIBILITY TOOL and again around the WAR ROOM.]

THIS IS NOT A LEAN APPROACH!

What Industry Wants To Do About It

Efficient use of people resources - lean

Reduce tool complexity and costs

Become more proactive

Reduce negative organizational impact

One platform and fewer tool administrators

Required Capabilities

Instantly analyze and correlate raw data, machine learning

Accurate indication of root causes to reduce escalations and eliminate War Rooms

Visibility across all functional areas shared by everyone

Rethinking and Improving How IT Operates

Traditional IT

• Structured data

• Brittle tools and integrations

• Obsession with “faults” and “traps”

• Focus on component parts

• Search oriented

Data-Driven IT

• Structured and unstructured data

• Robust data integrations

• Real-time insights from big data

• Focus on the whole service

• Machine learning-driven analytics

Machine learning-powered analytics for real-time service insights, simplified operations and root-cause isolation

IT Service Monitoring

What Is Service Monitoring?

Enabling an organizationally-aware IT: Measuring and reporting on indicators that matter

Unlocking operational efficiencies: Collaborating across silos to improve service operations

Data-based decision making: Solving problems and anticipating pitfalls with sophisticated analytics and powerful insights

IT Service Monitoring

Machine Learning-Powered, Analytics-Driven IT Operations

Unify siloed monitoring: Combine events & metrics across silos with ease, flexibility & scale in days

Simplify service operations: Leverage machine learning to detect anomalies & highlight events that matter

Prioritize incidents with context: Deliver business & service context to prioritize incident investigation & action

Redefine the role of IT: Support decisions & communicate results with powerful service-level insights
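
To make "detect anomalies" concrete, here is a minimal sketch. It swaps in a plain trailing-window z-score for the machine learning the slide refers to, so it should be read as a stand-in and not as how any particular product works. The function name `find_anomalies`, the window size, the threshold, and the latency values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the
    preceding `window` points by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat histories, where a z-score is undefined.
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response times in milliseconds, with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 250, 101, 99]

anomalous_indices = find_anomalies(latency_ms)
print(anomalous_indices)
```

Because the spike itself enters the trailing window, the points right after it are judged against an inflated standard deviation. That is one reason production systems tend to use more robust baselines.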

A Different Approach

[Diagram: servers, networks, GPS location, custom applications, desktops, storage, databases, web services, online services, and security, all leading to ROOT CAUSES.]

Personalized Visualizations of Your Services

• Visualize personalized inter-relationships across service delivery components

• Illustrate business and service activity using indicators aligned with strategic goals

• Drive decisions by monitoring service health against performance indicators

• Create sophisticated dashboards in minutes

Organized View of Performance Indicators

• Organize and correlate KPIs to speed up investigations and diagnosis

• Compare performance over time and in real time to understand trends and identify systemic issues

• Enable broad and deep investigation with contextual drill-downs

Real-Time View of Service and KPI Health Scores

• Get early warning of emerging incidents with a heat map of service health and KPI scores, metrics, sparklines and alerts

• Drill down into service and entity details for in-depth triage
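
One way a service health score could be derived from KPIs is sketched below. The slides do not give a formula, so everything here is an assumption: the three severity levels, the points assigned to each, the thresholds, the weights, and the names `Kpi` and `health_score`. The sketch maps each KPI value to a severity and then takes a weighted average on a 0 to 100 scale.

```python
from dataclasses import dataclass

# Assumed points per severity level (not taken from the slides).
SEVERITY_SCORE = {"normal": 100, "warning": 50, "critical": 0}


@dataclass
class Kpi:
    name: str
    value: float
    warning_at: float    # value at or above which the KPI is "warning"
    critical_at: float   # value at or above which the KPI is "critical"
    weight: int = 1      # relative importance of this KPI to the service

    def severity(self) -> str:
        if self.value >= self.critical_at:
            return "critical"
        if self.value >= self.warning_at:
            return "warning"
        return "normal"


def health_score(kpis):
    """Weighted average of per-KPI severity scores, from 0 (bad) to 100 (good)."""
    total_weight = sum(k.weight for k in kpis)
    return sum(SEVERITY_SCORE[k.severity()] * k.weight for k in kpis) / total_weight


# Hypothetical KPIs for one service.
kpis = [
    Kpi("response_time_ms", 420, warning_at=300, critical_at=800, weight=3),
    Kpi("error_rate_pct", 0.2, warning_at=1, critical_at=5, weight=2),
    Kpi("cpu_pct", 97, warning_at=80, critical_at=95, weight=1),
]

for k in kpis:
    print(f"{k.name}: {k.severity()}")
score = health_score(kpis)
print(f"Service health score: {score:.1f}")
```

Weighting lets a KPI that matters to users, such as response time, pull the score down more than a purely internal metric does.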

Insights Into the Origin of Service Disruptions

Profile an entity to troubleshoot outages and service degradations

Identify contributing services and entities of the worst performing KPIs
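
Identifying the contributors to a poorly performing KPI can be as simple as ranking entities by their share of the total. The sketch below assumes a KPI where higher is worse (error counts) and uses made-up host names. The helper `top_contributors` is hypothetical, not an API of any tool.

```python
def top_contributors(entity_values, n=3):
    """Rank entities by their share of a 'higher is worse' KPI total.

    entity_values maps an entity name to its KPI value.
    Returns up to n (name, share) pairs, worst first.
    """
    total = sum(entity_values.values())
    if total == 0:
        return []  # nothing is contributing to the KPI
    ranked = sorted(entity_values.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, value / total) for name, value in ranked[:n]]


# Hypothetical error counts per entity over the last hour.
errors_by_entity = {"web-01": 5, "web-02": 60, "db-03": 30, "app-04": 5}

worst = top_contributors(errors_by_entity, n=2)
for name, share in worst:
    print(f"{name}: {share:.0%} of errors")
```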

Integrate With Existing Incident Workflows

Automatically initiate defined incident and remediation responses

Integrate with industry-leading ticketing systems to accelerate triage

I think. I know.

Thank You!
