it service monitoring · 2019-12-03 · machine learning-powered, analytics-driven it operations....
Post on 21-May-2020
Copyright © 2017 Splunk Inc.
IT Service Monitoring
Go Jackets!
“Just because we built it, doesn’t mean they will come!”
Yes, but…
“Just because we built it, doesn’t mean they will come!”
How do you know?
How do you measure?
So, why are/aren’t they coming?
OK, what can you do?
Escalating IT Complexity…
[Diagram: INFRASTRUCTURE (Storage, Networking, Virtualization, Servers) and APPLICATIONS (Packaged Applications, Custom Applications), with Web Server, App Server, Email, Databases, Mission, Identity, VPN, IP Phone, Analytics, SaaS/PaaS, IaaS]
… Plaguing IT Operations
[Diagram: INFRASTRUCTURE (Storage, Networking, Virtualization, Servers) and APPLICATIONS (Packaged Applications, Custom Applications), with HR, Finance, App Svr, DB, Web Svr, Identity, VPN, IP Phone, SaaS/PaaS, IaaS]
Complex, silo-based technologies
Disconnected and outdated point solutions
Reactive brute-force problem resolution
Over 80% of time on maintaining, not innovating
IT Stack POV
• This is the way many in ‘IT’ think of their ‘world’
• Each layer is a ‘silo’
• A dedicated team of experts (with their domain tools) focuses just on the health of that particular layer
• Their view of the ‘health’ of that layer is based on the aggregated ‘health’ of each component in the layer
• If 2 out of 100 DBs are struggling, you are still having a good day
Physical Server
Guest OS (Windows/Linux/*Nix)
Database
Hypervisor
Web Server
App Server
Applications, business/mission services
SAN/NAS Storage
Network
• The aggregated health of the layer is irrelevant.
• Dependencies now matter
• The ‘health’ of the app depends greatly on the health of each component of each layer that the app depends upon.
• If your app depends on one or more of those two (2) ‘struggling’ DB servers, you are about to have a ‘bad’ day!
• What about those VMs that are ‘yellow’?
Physical Server (1,2,3,4,5,6,7,8,9,10…N)
Guest OS (1,2,3,4,5,6,7,8,9,10…N)
Database (1,2,3,4,5,6,7,8,9,10…100)
VM/Hypervisor (1,2,3,4,5,6,7,8,9,10…N)
Web Server (1,2,3,4,5,6,7,8,9,10…N)
App Server (1,2,3,4,5,6,7,8,9,10…N)
Service/App Claims
SAN/NAS Storage (1,2,3,4,5,6,7,8,9,10…N)
Network
Status: 100%, 100%, 98%, 100%, 95%, 100%, 100%, 100%
Service/App POV Outage!
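The contrast between the two points of view can be sketched in a few lines of Python. This is an illustration only, not something from the deck: the `Component` type, the helper functions, the component names (`db-7`, `db-42`, `web-1`) and the `payroll` service are all hypothetical. Only the "2 out of 100 DBs" figure and the Claims service name come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    layer: str
    name: str
    healthy: bool


def layer_health(components, layer):
    """Stack POV: fraction of healthy components in one layer."""
    members = [c for c in components if c.layer == layer]
    return sum(c.healthy for c in members) / len(members)


def service_is_healthy(components, depends_on):
    """Service POV: healthy only if every named dependency is healthy."""
    by_name = {c.name: c for c in components}
    return all(by_name[name].healthy for name in depends_on)


# 100 databases, two of them struggling (hypothetical names db-7 and db-42).
databases = [
    Component("database", f"db-{i}", healthy=i not in (7, 42))
    for i in range(1, 101)
]
web = [Component("web", "web-1", healthy=True)]
inventory = databases + web

db_layer = layer_health(inventory, "database")
claims_ok = service_is_healthy(inventory, ["web-1", "db-42"])   # uses a struggling DB
payroll_ok = service_is_healthy(inventory, ["web-1", "db-3"])   # uses only healthy parts

print(f"database layer: {db_layer:.0%}")
print(f"claims healthy: {claims_ok}, payroll healthy: {payroll_ok}")
```

The database layer still reports 98%, yet the service that happens to depend on one of the two struggling databases is down, which is the point the slide makes.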
Current State of IT Troubleshooting
CHALLENGES
Sprawl of multiple monitoring point solutions
No proactive indication of root cause
Repeated escalations and War Rooms
Rapid pace of change
Limited visibility
PAIN POINTS
CONCERNED: Complexity, teams operating in silos, massive infrastructure
AGGRAVATED: Long resolution times, unhappy users
STRESSED: Resource drain and missed deadlines
ANXIOUS: Misconfigured tools, gaps in coverage
UNEASY: Blindsided by issues
[Diagram: Servers, Networks, GPS Location, Packaged Applications, Custom Applications, Desktops, Storage, Databases, Web Services, Online Services, Security, Transactions (the Databases, Networks and Transactions labels repeat); OPERATIONAL VISIBILITY TOOL]
WAR ROOM
THIS IS NOT A LEAN APPROACH!
WAR ROOM
What Industry Wants To Do About It
Efficient use of people resources - lean
Reduce tool complexity and costs
Become more proactive
Reduce negative organizational impact
One platform and fewer tool administrators
Required Capabilities
• Instantly analyze and correlate raw data using machine learning
• Accurate indication of root causes to reduce escalations and eliminate War Rooms
• Visibility across all functional areas, shared by everyone
Rethinking and Improving How IT Operates
Traditional IT
• Structured data
• Brittle tools and integrations
• Obsession with “faults” and “traps”
• Focus on component parts
• Search oriented
Data-Driven IT
• Structured and unstructured data
• Robust data integrations
• Real-time insights from big data
• Focus on the whole service
• Machine learning-driven analytics
Machine learning-powered analytics for real-time service insights, simplified operations and root-cause isolation
IT Service Monitoring
What Is Service Monitoring?
Enabling an organizationally aware IT: measuring and reporting on indicators that matter
Unlocking operational efficiencies: collaborating across silos to improve service operations
Data-based decision making: solving problems and anticipating pitfalls with sophisticated analytics and powerful insights
IT Service Monitoring: Machine Learning-Powered, Analytics-Driven IT Operations
Unify siloed monitoring: Combine events & metrics across silos with ease, flexibility & scale in days
Simplify service operations: Leverage machine learning to detect anomalies & highlight events that matter
Prioritize incidents with context: Deliver business & service context to prioritize incident investigation & action
Redefine the role of IT: Support decisions & communicate results with powerful service-level insights
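To make "detect anomalies & highlight events that matter" concrete, here is a minimal sketch that flags a metric value when it deviates strongly from its recent history. It uses a plain trailing-window z-score as a stand-in; the deck does not say which technique the product uses, and the function name, window size, threshold and sample response times below are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return the indexes whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # A perfectly flat window gives no scale to compare against; skip it.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response times in milliseconds with one obvious spike.
response_ms = [120, 118, 122, 121, 119, 120, 123, 480, 121, 120]
spikes = find_anomalies(response_ms)
print(spikes)
```

Only the spike is reported; the ordinary jitter around 120 ms stays below the threshold, so an operator sees one event instead of ten.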
A Different Approach
[Diagram: Servers, Networks, GPS Location, Packaged Applications, Custom Applications, Desktops, Storage, Databases, Web Services, Online Services, Security; ROOT CAUSES]
Personalized Visualizations of Your Services
• Visualize personalized inter-relationships across service delivery components
• Illustrate business and service activity using indicators aligned with strategic goals
• Drive decisions by monitoring service health against performance indicators
• Create sophisticated dashboards in minutes
Organized View of Performance Indicators
• Organize and correlate KPIs to speed up investigations and diagnosis
• Compare performance over time and in real time to understand trends and identify systemic issues
• Enable broad and deep investigation with contextual drill-downs
Real-Time View of Service and KPI Health Scores
• Get early warning of emerging incidents with a heat map of service health and KPI scores, metrics, sparklines and alerts
• Drill down into service and entity details for in-depth triage
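One way to picture a service health score built from KPI scores is a weighted average of per-KPI severities. The sketch below is an assumption made for illustration, not the product's actual formula: the severity names, their numeric scores, the importance weights and the KPI names are all hypothetical.

```python
# Hypothetical mapping from KPI severity to a 0-100 score.
SEVERITY_SCORE = {
    "normal": 100,
    "low": 70,
    "medium": 50,
    "high": 30,
    "critical": 0,
}


def service_health(kpis):
    """Weighted average of KPI severity scores.

    `kpis` is a list of (name, severity, importance) tuples, where
    importance is a positive weight chosen by the service owner.
    """
    total_weight = sum(weight for _, _, weight in kpis)
    weighted = sum(SEVERITY_SCORE[sev] * weight for _, sev, weight in kpis)
    return weighted / total_weight


# Made-up KPIs for a single service.
kpis = [
    ("cpu_utilization", "normal", 5),
    ("error_rate", "high", 8),
    ("response_time", "medium", 7),
]
score = service_health(kpis)
print(f"service health score: {score}")
```

Because the error-rate KPI carries the largest weight, its "high" severity pulls the overall score down further than the healthy CPU KPI can pull it back up.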
Insights Into the Origin of Service Disruptions
Profile an entity to troubleshoot outages and service degradations
Identify contributing services and entities of the worst performing KPIs
Integrate With Existing Incident Workflows
Automatically initiate defined incident and remediation responses
Integrate with industry-leading ticketing systems to accelerate triage
I think. I know.
Thank You!