
Page 1: How to Move From Monitoring to Observability

© 2018 SPLUNK INC.

Observability: the disingenuous rebranding of monitoring?

Dr. Siyka Andreeva | IT Operations Analytics Specialist

April 2019

Page 2: Forward Looking Statements

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC.

The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.

Page 3: Agenda

What is observability, and how does it differ from monitoring?

Why is observability an even bigger challenge in a multi-cloud and containerized world?

How can Splunk help?

Page 4: What is Observability?

The disingenuous rebranding of monitoring?

Monitoring on steroids?

DevOpsifying monitoring?

Page 5: Observability… the word is spreading

Because failure is shifting to application code and to in-production system behavior

Page 6: Why is the word spreading?

IT Operations monitoring challenges are getting worse in a distributed world:

• IT teams know that something is not working, but not exactly why it’s not working

• Repetitive, manual processes for reactive troubleshooting

• Inability to get to root cause quickly

• Siloed analysis of logs, traces, and metrics

Management expectations:

• Avoid financial impact through fewer system outages

• Accelerate investigation of application performance and system incidents with real-time log and metric analysis

• Consolidate operational tools and/or external services into one observability tool

• Improve collaboration across teams with targeted alerting and tailored visualization

Same for Dev teams:

• Gap between perception and reality

• Dev teams spend too much time observing the dev and pre-prod environments instead of prod

Page 7: Why observability (in IT)?

“Survivorship bias or survival bias is the logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not, typically because of their lack of visibility. This can lead to false conclusions in several different ways.” (Source: Wikipedia)

Shot-down aircraft don’t externalize their state

Page 8: What is observability?

Ask 5 IT experts and you might get 9 different answers

Observability background

“In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. The observability and controllability of a system are mathematical duals.” (Source: Wikipedia)
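For a linear time-invariant system, the control-theory definition quoted here has a standard concrete form, the Kalman rank condition. The symbols below (A, B, C, x, u, y, n) are the usual textbook notation and are not taken from the slides:

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t), \qquad x(t) \in \mathbb{R}^{n}

\mathcal{O} =
\begin{bmatrix}
C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1}
\end{bmatrix},
\qquad
(A, C) \text{ is observable} \iff \operatorname{rank}(\mathcal{O}) = n

% Duality: (A, C) is observable exactly when (A^{\top}, C^{\top}) is controllable.
```

Read loosely for IT systems: the outputs y (logs, metrics, traces) have to carry enough information to reconstruct the internal state x; if they do not, no amount of watching the outputs will reveal it.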

Page 9: What is observability? (in IT, not only for serverless or DevOps)

“Focus on what you can’t see, the unknowns. If the root cause of a failure stays invisible (the bullet holes) your IT-plane will be shot down again”

Page 10: From monitoring to observability: what’s the difference?

MONITORING

• Mostly from: Ops

• It is: a verb, something you do to determine the state of an application, a system, a service… Detect problems and anomalies, find the root cause of problems and gain insights into performance

• Tells you: IF the system works

• Use it when: you are looking for the overall health of systems, when the systems can externalize their state

OBSERVABILITY

• Mostly from: DevOps/SRE

• It is: a noun, a thing you have, a property of a system

• Tells you: WHY the system is not working as expected

• Use it when: monitoring falls short because a system or an app doesn’t adequately externalize its state (granular insights into the behaviour of systems along with rich context)

Page 11: From monitoring to the three (only three?) pillars of Observability

[Figure, inspired by @copyconstruct:

• Monitoring: symptoms (what’s broken?). Alerting, service health overview, investigation. All the time, passive, Ops.

• Observability: causes (why?). Debugging, profiling (system behavior), dependency analysis (distributed systems tracing infrastructure). On the fly, reactive, Dev.

• Pillars: LOGS, METRICS, TRACES. Further labels shown: Events, Profiles, Pillar A, Pillar B, Pillar C, Pillar D.]

Page 12: Why is that important in a multi-cloud environment? 2019 trends

[Figure: three panels.

• Monolithic architecture (one business-logic block with billing, driver mgmt, user mgmt, payment, notification and trip mgmt behind an API for users and drivers) versus microservices architecture (an API gateway in front of separate containers for user mgmt, billing, notification, payment, driver mgmt and trip mgmt).

• Multi-cloud: microservices, business intelligence, legacy systems, frontend, storage, compute, security.

• Containers / Kubernetes / Serverless: the stack on bare metal (hardware, OS, libraries, app), virtual machines (hardware, hypervisor, then OS, lib and app per VM), containers (hardware, OS, container manager, then lib and app per container) and serverless functions (hardware, OS, libraries, app manager, then many apps).

Captions: Distributed location / responsibilities. Distributed systems/code.]

Observability in the distributed (and ephemeral) systems/cloud space is non-negotiable

Page 13: Complexity Is Everywhere

[Figure: many environments, each producing its own data.

• ON PREMISES: legacy systems (mainframe…), facilities, Dev/PreProd, storage, backup, archive, DR, security, VMs, containers, microservices

• AWS (Application 1): access / security, database, storage, dev, compute, containers

• GCP (Big Data project 1): Dataflow, App Engine

• AWS (Archive)

• Azure (Application 1): VMs, database, VM sets, Traffic Manager

• SAAS

• EVENTS, LOGS & REPORTS: Elastic Load Balancing access logs, Amazon CloudFront access logs, Amazon CloudTrail logs, billing reports, application logs, application S3 access logs, other service logs, AWS Config snapshots and history files

• METRICS: EMR cluster, Auto Scaling

• EVENTS, LOGS, RULES/EVENTS

• Push path (via Splunk HEC)

• Your IT team]
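The "push path (via Splunk HEC)" label refers to Splunk's HTTP Event Collector, which accepts JSON events over HTTP with a token in the `Authorization` header. A minimal sketch of what a pushed event could look like follows; the host name, token and event fields are made up for illustration, and the request is only built, not sent:

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype="_json"):
    """Build (but do not send) an HTTP Event Collector request."""
    payload = {"event": event, "sourcetype": sourcetype}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": "Splunk " + token},
        method="POST",
    )


# Hypothetical host, token and event, for illustration only.
request = build_hec_request(
    "https://splunk.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    {"service": "billing", "status": "error", "latency_ms": 950},
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sketch safe to run without a Splunk instance; in a real deployment the token would come from configuration rather than source code.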

Page 14: What happens when we stack them? How does this apply to you and your Ops teams?

[Figure: the environments stacked together, labelled “Customer experience???”:

• SAAS

• ON PREMISES: legacy systems (mainframe…), facilities, Dev/PreProd, storage, backup, archive, DR, security, VMs, containers, microservices

• AWS (Application 1): access / security, database, storage, dev, compute, containers

• GCP (Big Data project 1): App Engine, Dataflow

• AWS (Archive)

• Azure (Application 1): VMs, database, VM sets, Traffic Manager]

Page 15: The consequence: only green lights in the war room

[Figure: the same stacked environments, labelled “Customer experience???”:

• SAAS

• ON PREMISES: legacy systems (mainframe…), facilities, Dev/PreProd, storage, backup, archive, DR, security, VMs, containers, microservices

• AWS (Application 1): access / security, database, storage, dev, compute, containers

• GCP (Big Data project 1): App Engine, Dataflow

• AWS (Archive)

• Azure (Application 1): VMs, database, VM sets, Traffic Manager

Around them, stakeholders shown with question marks: CxO, BLO, SAAS, CISO, Dev, SysAdmin, MKT.]

Page 16: Splunk for IT Operations

How do we help with Observability everywhere?

Page 17: A market leader

IT OPERATIONS

• OBSERVE: ITOM (IT Operations Management). Tools to manage provisioning, capacity, performance and availability of IT

• DECIDE: ITOA (IT Operations Analytics). Practice of monitoring systems, and gathering, processing, analyzing and interpreting data from ITOps sources to guide decisions and predict issues

• ACCELERATE: AIOps. AIOps platforms enhance IT operations through greater insights by combining big data, machine learning and visualization

SECURITY

• PROTECT: SIEM (security information and event management)

[Figure: #1 and #2 market rankings shown against these categories. Sources: IDC and/or Gartner]

Page 18: We reached the limits of the traditional approach

Traditional data types:

• Not future proof

• Complex

• Never change!

Untapped IT-generated machine data (logs, metrics, wire data…), labelled 80% on the slide:

• Machine data is messy and unpredictable

• Requires massive scale

• You don’t always know which questions to ask

Page 19: Industry Leading Platform For Machine Data

[Figure: the Splunk platform sits between data that is not consumable by humans and results that are consumable by humans.]

DATA (any amount, any location, any source): online services, networks, security, call detail records, web services, telecoms, web clickstreams, tracing, online shopping cart, smartphones and devices, custom applications, energy meters, storage, public cloud, private cloud, containers, on-premises servers, GPS location, RFID, packaged applications, databases, messaging, firewall

Data types: logs, wire data, DB, mobile, IoT, API, metrics

• No need to “adapt or structure” the data

• No database

• No need to filter data

SPLUNKBASE: 1600+ free apps/add-ons

SPLUNK PLATFORM: custom dashboards, report & analyze, monitor and alert, developer platform, ad hoc search, on-prem or cloud

PREMIUM APPS (“data scientist in a box”): IT Ops, DevOps, Security, Business Analytics, IoT

Integrations shown: 3rd party, Phantom (orchestration), VictorOps (collaboration), CMDB, SNOW…, data lake, APM, traces/tracing

Different people asking different questions on the same data, in real time

Page 20: Structuring machine data = fighting a losing battle


Page 22: How to find a needle in multiple haystacks? (choose your tool)

Network? Database? Middleware? Hardware? Wrong command? Connection? Apache? VM? Mainframe? Load balancer? Wrong code released?

Collect ALL data

• Collect from all silos

• Data in original raw format

• Add open-source apps to ingest data on the fly

• Schema on the fly

• Dynamic thresholding

• Real-time correlation
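To make "dynamic thresholding" concrete: instead of a fixed limit, the alert threshold is recomputed from recent history. The sketch below uses a trailing mean plus k standard deviations, which is one common choice and not necessarily what Splunk implements; the sample values, `window` and `k` are made up:

```python
from statistics import mean, pstdev


def dynamic_threshold_alerts(values, window=5, k=3.0):
    """Return indexes whose value exceeds mean + k * stdev of the previous `window` points."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        if values[i] > threshold:
            alerts.append(i)
    return alerts


# Made-up latency samples (ms): the level drifts upward slowly, then spikes once.
latencies = [100, 102, 98, 101, 99, 103, 105, 104, 180, 106]
spike_indexes = dynamic_threshold_alerts(latencies)  # [8]
```

A static limit tuned to the first few samples would also have fired on the slow drift; the moving threshold follows the drift and flags only the spike.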

Clustering & aggregation

• Real-time event clustering/correlation

• Reduce alert noise

• Behavioural analytics

• Deduplication
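A minimal sketch of how deduplication reduces alert noise: repeats of the same event inside a time window collapse into one entry with a counter. The grouping key (host, message), the 300-second window and the sample events are assumptions made for illustration:

```python
def deduplicate(events, window_seconds=300):
    """Collapse repeats of the same (host, message) seen within `window_seconds`.

    `events` is a list of (timestamp, host, message) tuples sorted by timestamp.
    Returns a list of [first_timestamp, host, message, count] entries.
    """
    open_groups = {}  # (host, message) -> index of its current entry in `result`
    result = []
    for ts, host, message in events:
        key = (host, message)
        idx = open_groups.get(key)
        if idx is not None and ts - result[idx][0] <= window_seconds:
            result[idx][3] += 1  # repeat inside the window: bump the counter only
        else:
            open_groups[key] = len(result)
            result.append([ts, host, message, 1])
    return result


# Made-up events: (epoch seconds, host, message)
raw_events = [
    (0, "web-1", "disk full"),
    (30, "web-1", "disk full"),
    (45, "db-1", "connection refused"),
    (60, "web-1", "disk full"),
    (900, "web-1", "disk full"),
]
deduplicated = deduplicate(raw_events)
```

Here five raw events become three entries: the first three "disk full" events from web-1 collapse into one entry with a count of 3, while the one at 900 seconds falls outside the window and starts a new entry.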

Add context

• Measure / report on indicators that matter

• Add service / business context

• Add actionable information to detection

[Figure labels: Sales, sso, Claims]

Anomaly detection

• Catch issues that thresholds cannot

• Reduce event clutter

• Deviation from past behaviour

• Deviation from peers

• Unusual change in features
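"Deviation from peers" can be sketched as comparing each host with the median of its group. The example below uses the modified z-score (median and median absolute deviation, with the conventional 0.6745 constant and 3.5 cutoff); this is one standard technique chosen for illustration, and the host names and CPU values are made up:

```python
from statistics import median


def peer_outliers(metric_by_host, cutoff=3.5):
    """Flag hosts whose metric is far from the peer median (modified z-score)."""
    values = list(metric_by_host.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half of the peers sit exactly on the median; the score is undefined.
        return []
    return sorted(
        host
        for host, v in metric_by_host.items()
        if 0.6745 * abs(v - med) / mad > cutoff
    )


# Made-up CPU utilisation (%) for five hosts that should behave alike.
cpu_pct = {"web-1": 41, "web-2": 39, "web-3": 43, "web-4": 40, "web-5": 92}
outliers = peer_outliers(cpu_pct)  # ["web-5"]
```

No fixed threshold is involved: web-5 is flagged because it behaves unlike its peers, which is the kind of issue a static limit can miss.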

Assisted deep-dive investigation

• Root cause analysis

• Powerful, easy-to-use search and investigation language

Predictive analytics

• Predict service health

• Predict events

• Trend forecasting

• Detect influencing entities

• Early warning of failure
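As a small illustration of trend forecasting and early warning, the sketch below fits a least-squares line to evenly spaced samples and estimates how many steps remain until a limit is reached. A straight-line fit is the simplest possible model and is an assumption here, as are the disk-usage numbers:

```python
def linear_trend(ys):
    """Least-squares slope and intercept for points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def steps_until(ys, limit):
    """Steps after the last sample until the fitted line reaches `limit` (None if never)."""
    slope, intercept = linear_trend(ys)
    if slope <= 0:
        return None
    return (limit - intercept) / slope - (len(ys) - 1)


# Made-up daily disk usage (%), growing by two points per day.
disk_used_pct = [60, 62, 64, 66, 68]
days_left = steps_until(disk_used_pct, 100)  # 16.0
```

With these samples the fitted line reaches 100% sixteen days after the last measurement, which is the early warning; `linear_trend` needs at least two samples.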

• 70% to 90% reduction in investigation time

• 15% to 45% reduction in high priority incidents

• 67% to 82% reduction in business impact

Page 23: Observability with Splunk

[Figure: a 2×2 matrix. One axis: awareness / data available (known, unknown). Other axis: understanding (knowns, unknowns).]

• Known Knowns (known problem & solution). Improve the Known-Knowns: dynamic thresholding, automation, schema on the fly, real-time dashboards…

• Known Unknowns (we see the problem, not the solution). See the Known-Unknowns: provide auto correlations, real-time searches, analytics, business process mining…

• Unknown Knowns (didn’t realize but clear solution). Discover the Unknown-Knowns: anomaly detection, predictive IT…

• Unknown Unknowns (no idea it’ll happen). Explore the Unknown-Unknowns: ingest any data, ask any question, get answers in real time…

Page 24: Splunk Dashboards: Complexity made simple

• Easy to read

• Interactive, any question answered

• Real-time information

• Easily correlated with relational and reference data

Different people looking at different dashboards on the same data, in real time

Page 25: It’s a journey

• Search & Monitor: (any) data collection, real-time monitoring/observability, centralized machine data search

• Business Insights: business KPIs, insights to drive experience

• Operational visibility: service-oriented view, root cause analysis, stabilize IT

• Predict & Improve: predict issues, recommend actions based on prior behaviors, increase MTBF

Page 26: Answer new questions, find new unknowns

Observe | Monitor | Analyze | Act

Page 27: What to Do Today!

1. Download Splunk on Splunk.com or try it free for 15 days on AWS Marketplace!

2. Download our Beginner’s Guide to Observability on Splunk.com

Good sources to watch

• Philipp Krenn’s presentation at DevOps Barcelona 2018

• A great post by Cindy Sridharan on Monitoring & Observability

• How to build observability into Serverless (O’Reilly Velocity 2018)

• Observability for the real world (by Andi Mann / datamation.com)

• The present and future of Serverless Observability (SlideShare, Serverless Computing London, Yan Cui)

• Monitoring unknown unknowns by Guy Fighel

Page 28: Thank you