elk for sysadmins

ELK Not just for InfoSec any more

Upload: tanner-lund

Post on 10-Aug-2015

TRANSCRIPT

Page 1: Elk for Sysadmins

ELK: Not just for InfoSec any more

Page 2: Elk for Sysadmins

Who are we?

Russel Havens
● Monitoring Architect for Adobe Digital Marketing Business Unit
● Adjunct Professor, BYU
● 25 years in IT Operations, 2 years consulting, 5 years in software development

Hayden Panike
● Former log analytics undergraduate researcher
● Storage Engineer

Tanner Lund
● Former log analytics undergraduate researcher
● Service Engineer

Page 3: Elk for Sysadmins

ELK

Page 4: Elk for Sysadmins

ELK background and approach

● As a developer, I write logs
● As an operations engineer, I live in the log files while troubleshooting
● However: in most organizations where I’ve worked, formal log aggregation and analysis is owned by InfoSec, and access to it was limited (Splunk or Syslog)

Page 5: Elk for Sysadmins

ELK approach

● Monitoring is a broad area
  ○ Up/Down
  ○ Historical trending
  ○ Log Analysis
● Using ELK for managing logs of 60+ Nagios servers, web servers, etc.
● Worked with a BYU Capstone team 2013-2014

Page 6: Elk for Sysadmins

Our ELK History

Hayden & Tanner
- Took over log management project from previous team
- Expanded partnership with BYU OIT to gather logs from servers, SANs, and network equipment (including Wifi hotspots)
- Expanded cluster size massively

Page 7: Elk for Sysadmins

We do what we must...

Page 8: Elk for Sysadmins

Partnership with BYU OIT
Production ELK Deployment

Every 60 seconds at BYU we ingest event logs:*
- 2,431 network
- 430,000 IDS
- 59,000 wireless

62,300,000 total a day*

* This represents peak numbers.
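To put the ingest figures above on a common scale, here is a minimal arithmetic sketch. It assumes the per-minute counts and the daily total can be compared directly, which the slide does not state, so treat the ratio as a rough indication only. The variable names are illustrative.

```python
# Peak per-minute event counts as quoted on the slide.
peak_per_minute = {"network": 2_431, "ids": 430_000, "wireless": 59_000}

# Total events per day as quoted on the slide.
daily_total = 62_300_000

MINUTES_PER_DAY = 24 * 60

# Combined peak rate across all three sources.
peak_total_per_minute = sum(peak_per_minute.values())
peak_per_second = peak_total_per_minute / 60

# Average rate implied by the daily total (assumption: same event population).
average_per_minute = daily_total / MINUTES_PER_DAY
peak_to_average = peak_total_per_minute / average_per_minute

print(f"peak: {peak_total_per_minute:,} events/min (~{peak_per_second:,.0f}/s)")
print(f"average: {average_per_minute:,.0f} events/min")
print(f"peak is ~{peak_to_average:.1f}x the daily average")
```

Sizing a cluster for the peak rate rather than the daily average is the usual reason to look at both numbers.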

Page 9: Elk for Sysadmins

Simplest Architecture

Page 10: Elk for Sysadmins

Common Architecture

Page 11: Elk for Sysadmins

Highly Available Enterprise Architecture

Page 12: Elk for Sysadmins

Logs and Enterprise Monitoring

This process ought to be formalized
Proactive monitoring

Page 13: Elk for Sysadmins

Monitoring as a Discipline

SOURCES OF DATA
- SNMP
- stdout
- /proc *stat
- Logs

SYSTEM PIECES
- Collector agents
- Aggregator nodes
- Analysis platform

PRINCIPLES
- Aggregation
- Cause Analysis (reactive)
- Behavior Analysis (proactive)

Page 14: Elk for Sysadmins

Queries: Simple Yet Elegant

● As Operations Engineers, we already know lots of keywords. These can be easily leveraged for simple queries.

● error
● failure
● root
● port flapping
● memory
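As a rough illustration of how the keywords above could become a query, the sketch below builds an Elasticsearch search body using the `query_string` query. The helper name `build_keyword_query` and the `size` default are assumptions for illustration, not something from the talk, and the body is only printed, not sent to a cluster.

```python
import json

# Keywords taken from the slide; multi-word terms are quoted as phrases.
KEYWORDS = ["error", "failure", "root", "port flapping", "memory"]


def build_keyword_query(keywords, size=50):
    """Build a search body that matches any of the given keywords.

    Hypothetical helper: it only assembles the JSON body.
    """
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    return {
        "size": size,
        "query": {"query_string": {"query": " OR ".join(terms)}},
    }


body = build_keyword_query(KEYWORDS)
print(json.dumps(body, indent=2))
```

The same string (`error OR failure OR ...`) can also be typed directly into the Kibana search bar, which is likely closer to how the slides were produced.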

Page 15: Elk for Sysadmins

Simple Yet Elegant

● A simple search for the word “memory” over a 30-day period.
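A search like this one can be sketched as a keyword match combined with a time-range filter. The example assumes the Logstash default `@timestamp` field and the `bool` query form with a `filter` clause found in more recent Elasticsearch versions (older releases used a different wrapper). `build_time_bounded_search` is a hypothetical helper name.

```python
import json


def build_time_bounded_search(term, days=30, timestamp_field="@timestamp"):
    """Build a search body for `term` restricted to the last `days` days.

    Hypothetical helper; the field name is an assumption (Logstash default).
    """
    return {
        "query": {
            "bool": {
                # Free-text match on the keyword.
                "must": {"query_string": {"query": term}},
                # Date-math range: from `days` days ago until now.
                "filter": {
                    "range": {
                        timestamp_field: {"gte": f"now-{days}d", "lte": "now"}
                    }
                },
            }
        }
    }


search_body = build_time_bounded_search("memory", days=30)
print(json.dumps(search_body, indent=2))
```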

Page 16: Elk for Sysadmins

Simple search

● Nagios 4 can be set for a maximum number of concurrent processes. Here we see 2 overloaded monitoring servers.

Page 17: Elk for Sysadmins

Simple Yet Elegant

Page 18: Elk for Sysadmins

Simple Yet Elegant

Page 19: Elk for Sysadmins

Simple Bin and Count

Page 20: Elk for Sysadmins

Simple Bin and Count

Page 21: Elk for Sysadmins

Simple Bin and Count

● Campus wide phone OS stats
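"Bin and count" means grouping events into buckets (by a field value, by time, or both) and counting each bucket. The sketch below does this in plain Python over a handful of made-up records. The field names, timestamps, and OS values are hypothetical sample data, not figures from the BYU deployment.

```python
from collections import Counter
from datetime import datetime

# Hypothetical parsed log events (illustrative values only).
events = [
    {"timestamp": "2015-01-01T09:05:00", "os": "iOS"},
    {"timestamp": "2015-01-01T09:40:00", "os": "Android"},
    {"timestamp": "2015-01-01T09:55:00", "os": "iOS"},
    {"timestamp": "2015-01-01T10:10:00", "os": "iOS"},
    {"timestamp": "2015-01-01T10:30:00", "os": "Windows Phone"},
]


def count_by_field(records, field):
    """Bin records by the value of one field and count each bin."""
    return Counter(r[field] for r in records)


def count_by_hour(records, field):
    """Bin records by (hour, field value) and count each bin."""
    counts = Counter()
    for r in records:
        hour = datetime.fromisoformat(r["timestamp"]).strftime("%Y-%m-%d %H:00")
        counts[(hour, r[field])] += 1
    return counts


os_counts = count_by_field(events, "os")
hourly_counts = count_by_hour(events, "os")

for os_name, n in os_counts.most_common():
    print(f"{os_name}: {n}")
```

In an ELK setup the same grouping would normally be done by the search backend rather than in client code; the Python version is only meant to make the idea concrete.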

Page 22: Elk for Sysadmins

Simple Bin and Count

Business School

Administrative Building

Page 23: Elk for Sysadmins

Simple Bin and Count

Page 24: Elk for Sysadmins

Simple Bin and Count

Page 25: Elk for Sysadmins

Simple Bin and Count

Page 26: Elk for Sysadmins

Simple Bin and Count

Page 27: Elk for Sysadmins

A Call to Arms!

- We are at the tip of a transformative iceberg
- Machine Learning and Statisticians needed