ELK Wrestling (Leeds DevOps)

{ title: ‘ELK Wrestling’, author: ‘Steve Elliott’, company: ‘LateRooms.com’, type: ‘DevOpsLeeds’, @timestamp: ‘2014-10-13T18:30Z’ }

Upload: steve-elliott

Post on 26-May-2015


DESCRIPTION

Talk I did on log aggregation with the ELK stack at Leeds DevOps. Covers how we process over 800,000 logs per hour at LateRooms, and the cultural changes this has helped drive.

TRANSCRIPT

Page 1: ELK Wrestling (Leeds DevOps)

{
title: ‘ELK Wrestling’,
author: ‘Steve Elliott’,
company: ‘LateRooms.com’,
type: ‘DevOpsLeeds’,
@timestamp: ‘2014-10-13T18:30Z’
}

Page 2: ELK Wrestling (Leeds DevOps)

Featuring Live Demo!

Please tweet! Include: “leedsdevops”

Page 3: ELK Wrestling (Leeds DevOps)

Home growing a metrics culture
Needed visibility of live issues
Had trialled off the shelf before (Splunk)
Hadn’t gained traction
Wanted the data still

Page 4: ELK Wrestling (Leeds DevOps)

Options...

Tried Splunk

...Bit pricey, pay for HW and volume of data indexed

Looked at cloud based options, were also expensive

Page 5: ELK Wrestling (Leeds DevOps)

It started with Badger...

Page 6: ELK Wrestling (Leeds DevOps)

Logging and Monitoring Project

Locate and implement the tools we needed
Started with Cube for metrics (wouldn’t recommend)
Moved onto Logging

Page 7: ELK Wrestling (Leeds DevOps)

Current tooling...

...Lacking

Page 8: ELK Wrestling (Leeds DevOps)

“But it works”

Page 9: ELK Wrestling (Leeds DevOps)

What can we log?
Pretty much anything with a timestamp

Error log
Web logs
Proxy logs
Releases?
Tweets?
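
To illustrate "anything with a timestamp", here is a minimal Python sketch that wraps an arbitrary happening (a release, in this case) as a JSON event. Only the `@timestamp` key comes from the title slide; the function name `make_event` and the `type`, `message` and `version` fields are assumptions made up for this example, not the schema used in the talk.

```python
import json
from datetime import datetime, timezone


def make_event(event_type, message, when=None, **fields):
    """Wrap anything that has a timestamp as a JSON-ready log event.

    Hypothetical helper: field names other than "@timestamp" are
    illustrative assumptions.
    """
    when = when or datetime.now(timezone.utc)
    # Normalise to UTC so the trailing "Z" is accurate. A naive datetime
    # is interpreted as local time by astimezone().
    when = when.astimezone(timezone.utc)
    event = {
        "@timestamp": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "type": event_type,
        "message": message,
    }
    event.update(fields)  # any extra searchable fields
    return event


release = make_event(
    "release",
    "Deployed web site",
    when=datetime(2014, 10, 13, 18, 30, tzinfo=timezone.utc),
    version="1.2.3",
)
print(json.dumps(release, sort_keys=True))
```

Once releases and tweets share a time axis with error logs, they can be overlaid on the same dashboard.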

Page 10: ELK Wrestling (Leeds DevOps)

Logstash

ELK

Page 11: ELK Wrestling (Leeds DevOps)

High level architectural design

Web servers

Queue

Elasticsearch

Dashboards

Rest of Badger

Page 12: ELK Wrestling (Leeds DevOps)

Real time search and analytics database

Page 13: ELK Wrestling (Leeds DevOps)

Who’s using it?

...Clever people

Certain other hotel website...

Page 14: ELK Wrestling (Leeds DevOps)

Working with Elasticsearch

● RESTful API
● JSON
● Many libraries to deal with it (new on ElasticLinq for C#)
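
As a rough illustration of "RESTful API" plus "JSON", the Python sketch below builds (but does not send) a search request using only the standard library. The host `http://localhost:9200`, the index name `logstash-2014.10.13`, the query text and the helper name `build_search_request` are all assumptions for the example; the `_search` endpoint and the `query_string` query are standard Elasticsearch features.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, query_text, size=10):
    """Build an HTTP request for the Elasticsearch _search endpoint.

    Hypothetical helper: the URL, index and query passed in below are
    example values, not taken from the talk.
    """
    body = {
        "query": {"query_string": {"query": query_text}},
        "size": size,  # maximum number of hits to return
    }
    return Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://localhost:9200", "logstash-2014.10.13", "type:error"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would send it to a running node; the response body is JSON as well.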

Page 15: ELK Wrestling (Leeds DevOps)

Sense Chrome Extension

Page 16: ELK Wrestling (Leeds DevOps)

Clustering

Excellent distributed features
Easy to use
Node self-discovery
Different node types (Data, Master, Search, Client)

“Live”: SSD

“Archive”: HDD

Page 17: ELK Wrestling (Leeds DevOps)

More in depth architecture

IISLogs

Errors

WMI

Collector (e.g. Live Server)

Queue Forwarder

Cube (/TSDB)

Search Analytics

RabbitMQ

Filter & Forward

Page 18: ELK Wrestling (Leeds DevOps)

Logstash

Inputs
(e.g. HTTP logs, UDP, error logs, tweets)

Filters
(e.g. filter, grok, lookup IP, magic…)

Outputs
(e.g. UDP, elasticsearch, graphite, IRC)
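
The inputs, filters, outputs flow can be sketched in a few lines of Python. This is not Logstash itself and not its configuration syntax; it is a toy model of the same idea. The log line format, the regular expression (a much simplified stand-in for a grok pattern) and all function names are assumptions invented for the example.

```python
import re

# Hypothetical, simplified access-log pattern; real grok patterns are richer.
ACCESS_LINE = re.compile(
    r'(?P<client_ip>\S+) "(?P<method>[A-Z]+) (?P<path>\S+)" (?P<status>\d{3})'
)


def parse_filter(event):
    """Grok-like step: turn the raw message into named fields."""
    match = ACCESS_LINE.match(event["message"])
    if match:
        event.update(match.groupdict())
        event["status"] = int(event["status"])
    else:
        # Keep unparseable lines, but tag them so they can be found later.
        event.setdefault("tags", []).append("parse_failure")
    return event


def drop_health_checks(event):
    """Filter step: returning None drops the event."""
    return None if event.get("path") == "/health" else event


def run_pipeline(lines, filters, output):
    """Input -> filters -> output, one event at a time."""
    for line in lines:
        event = {"message": line}
        for apply_filter in filters:
            event = apply_filter(event)
            if event is None:
                break  # dropped by a filter, skip the output
        else:
            output(event)


collected = []
run_pipeline(
    ['10.0.0.1 "GET /hotels" 200', '10.0.0.2 "GET /health" 200', "garbage"],
    [parse_filter, drop_health_checks],
    collected.append,
)
for event in collected:
    print(event)
```

Swapping `collected.append` for a function that writes to a queue or a search index changes the output without touching the filters, which is the main attraction of the three-stage shape.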

Page 19: ELK Wrestling (Leeds DevOps)

Why the Queue?

● Resiliency
● Single source of data for everyone
● Logstash used to recommend RabbitMQ, now they recommend Redis
● We still use RabbitMQ, works for us
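
The resiliency point can be shown with a small Python sketch: the collector only ever writes to a buffer, so it keeps working while the indexing side is offline, and the backlog is drained once the indexer returns. The standard-library `queue.Queue` here is an in-process stand-in for a real broker such as RabbitMQ; the function names and the buffer size are assumptions for the example.

```python
import queue

# In-process stand-in for a message broker; the size limit is arbitrary.
buffer = queue.Queue(maxsize=1000)


def collect(lines):
    """Collector side: only needs the queue, not the indexer."""
    for line in lines:
        buffer.put_nowait(line)  # raises queue.Full if the buffer is full


def drain(index):
    """Indexer side: consume whatever has built up, return how many."""
    count = 0
    while True:
        try:
            line = buffer.get_nowait()
        except queue.Empty:
            return count
        index(line)
        count += 1


# The indexer is "down" while these arrive; nothing is lost.
collect(["log line 1", "log line 2", "log line 3"])
waiting = buffer.qsize()

# The indexer comes back and catches up.
indexed = []
drained = drain(indexed.append)
print(waiting, drained, indexed)
```

A real broker adds persistence and lets several consumers read the same stream, which is where "single source of data for everyone" comes from.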

Page 20: ELK Wrestling (Leeds DevOps)

Kibana

● Easy to build dashboards
● Gateway drug to Elasticsearch queries
● Examples!

Page 21: ELK Wrestling (Leeds DevOps)
Page 22: ELK Wrestling (Leeds DevOps)
Page 23: ELK Wrestling (Leeds DevOps)
Page 24: ELK Wrestling (Leeds DevOps)

But...

Page 25: ELK Wrestling (Leeds DevOps)

Demo

Page 26: ELK Wrestling (Leeds DevOps)

Mistake: Dashboard Fatigue

Too many dashboards to watch!
Need to do more on alerting
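
One way to move from watching dashboards to alerting is a threshold rule evaluated over a recent window of events. The Python sketch below is only an illustration of that idea; the function name, the 5% threshold and the choice of HTTP 5xx as "failed" are assumptions, not something described in the talk.

```python
def error_rate_alert(status_codes, threshold=0.05):
    """Return an alert message if too many requests in the window failed.

    Hypothetical rule: a status code of 500 or above counts as a failure.
    Returns None when there is nothing to report.
    """
    if not status_codes:
        return None
    errors = sum(1 for code in status_codes if code >= 500)
    rate = errors / len(status_codes)
    if rate > threshold:
        return f"ALERT: {rate:.0%} of {len(status_codes)} requests failed"
    return None


print(error_rate_alert([200] * 9 + [500]))
print(error_rate_alert([200] * 100))
```

The window of status codes would come from a query against the log store, run on a schedule, so nobody has to be looking at a graph when the rate climbs.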

Page 27: ELK Wrestling (Leeds DevOps)

Mistake: Using elasticsearch as a TSDB

Lots of graphs only cared about top-level values; should use a TSDB (such as Graphite) instead

Elasticsearch is the better fit for more in-depth data analysis
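
To make the distinction concrete, the Python sketch below reduces individual log events to per-minute counts and formats them as Graphite plaintext-protocol lines (`path value timestamp`). The event fields, the `web.status` metric prefix and the function name are assumptions for the example; only the top-level counts survive, which is all such graphs need.

```python
from collections import Counter
from datetime import datetime, timezone


def to_graphite_lines(events, prefix="web.status"):
    """Roll events up into per-minute counts per status code.

    Hypothetical helper: expects "@timestamp" (UTC, second precision)
    and "status" on every event.
    """
    counts = Counter()
    for event in events:
        stamp = datetime.strptime(event["@timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        # Truncate to the start of the minute and treat the value as UTC.
        minute = stamp.replace(second=0, tzinfo=timezone.utc)
        counts[(event["status"], int(minute.timestamp()))] += 1
    return [
        f"{prefix}.{status} {count} {epoch}"
        for (status, epoch), count in sorted(counts.items())
    ]


events = [
    {"@timestamp": "2014-10-13T18:30:05Z", "status": 200},
    {"@timestamp": "2014-10-13T18:30:40Z", "status": 200},
    {"@timestamp": "2014-10-13T18:30:59Z", "status": 500},
]
lines = to_graphite_lines(events)
for line in lines:
    print(line)
```

Three documents become two small data points; the full events are only worth keeping in Elasticsearch when someone needs to drill into them.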

Page 28: ELK Wrestling (Leeds DevOps)

Mistake: Trying to keep too much data

● Nodes going out of memory or disk space is bad
● Long GC can cause nodes to drop
● Can lead to split brain
● More shards = more memory usage, watch your scaling
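
Keeping less data usually means a retention policy over time-based indices. The Python sketch below only works out which indices are past their retention window; it does not delete anything. It assumes daily indices named in the `logstash-YYYY.MM.DD` style that Logstash uses by default; the 7-day window and the function name are made up for the example.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days, prefix="logstash-"):
    """Return the daily indices older than the retention window.

    Hypothetical helper: names that do not match prefix + YYYY.MM.DD
    are ignored rather than treated as expired.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index, leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = [
    "logstash-2014.10.01",
    "logstash-2014.10.12",
    "logstash-2014.10.13",
    "other-index",
]
to_delete = expired_indices(names, today=date(2014, 10, 13), keep_days=7)
print(to_delete)
```

Because each day is its own index, dropping old data is a cheap whole-index delete, and fewer open indices means fewer shards held in memory.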

Page 29: ELK Wrestling (Leeds DevOps)

Scaling

Hit two bottlenecks
- Ingestion (solved with SSDs)
- Search (solved by scaling horizontally)

Elasticsearch 1.4.0 brings stability improvements, should handle OOM better

Page 30: ELK Wrestling (Leeds DevOps)

Other Mistakes

Should have automated sooner
(Good Chef/Puppet support)

Should have used “normal” logstash more

More node

More awesome??

Page 31: ELK Wrestling (Leeds DevOps)

What went right?

● Free and easy access to data
● Doesn’t need to be on Elasticsearch, but the tooling makes it easy
● Give people access and they’ll seek out the data to drive decisions - start the feedback loop
● Dev/Test instance

Page 32: ELK Wrestling (Leeds DevOps)

ELK in the wild

Data Driven QA

Data Driven...Managering

Page 33: ELK Wrestling (Leeds DevOps)

But wait, there’s more!

Curator, Kibana 4 (Woo - aggregations), alerting, linking logs together…

Too much to cover here!

Page 34: ELK Wrestling (Leeds DevOps)

Thanks for Listening!

More: elasticsearch.org, logstash.net
Blog: www.tegud.net
Twitter: @tegud
Github: www.github.com/tegud

Come say hi!