Why @Loggly Loves Apache Kafka, and How We Use Its Unbreakable Messaging for Better Log Management


Uploaded by loggly, posted on 22-Apr-2015



TRANSCRIPT

Page 1: Why @Loggly Loves Apache Kafka, and How We Use Its Unbreakable Messaging for Better Log Management

| Log management as a service Simplify Log Management

Apache Storm

Why Loggly Loves Apache Kafka, and How We Use Its Unbreakable Messaging for Better Log Management

Infrastructure Engineering Team, June 2014

Page 2

§  World’s most popular cloud-based log management service
§  More than 5,000 customers
§  Near real-time indexing of events
§  Distributed architecture, built on AWS
§  Initial production services in 2011
§  Loggly Generation 2 released in Sept 2013

What Loggly Does

Page 3

§  Centralized logging and archival

§  Real-time processing, analysis and visualization

§  Monitoring, alerting and troubleshooting

Loggly: Addressing the first big data problem every company faces

Page 4

§  The challenges of Log Management at scale

§  Overview of Loggly’s processing pipeline

§  Alternative technologies considered

§  Why we love Apache Kafka

§  How Kafka has added flexibility to our pipeline

Agenda for this Presentation

Page 5

§  Big data
–  >750 billion events logged to date
–  Sustained bursts of 100,000+ events per second
–  Data space measured in petabytes
§  Need for high fault tolerance
§  Near real-time indexing requirements
§  Time-series index management

The Challenges of Log Management at Scale

Page 6

(Pipeline diagram: Load Balancing, Kafka Stage, Loggly Custom Module)

Log Management Processing Pipeline: Overview

Page 7

(Pipeline diagram: Load Balancing, Kafka Stage, Loggly Custom Module)

Collectors Can Easily Outpace Downstream Processes

§  Written in C++
§  Designed to ingest massive data volumes
§  Need to collect regardless of what’s happening downstream

Page 8

(Pipeline diagram: Load Balancing, Kafka Stage, Loggly Custom Module)

Solution: Queue That’s External to Collector

§  Based on Apache Kafka

§  Highly performant and reliable
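To illustrate why a queue outside the collector helps, here is a minimal Python simulation. It is a sketch under stated assumptions, not Loggly's collector (which is C++): a single Kafka partition is modeled as an in-memory append-only list, and `PartitionLog` and `simulate` are hypothetical names. The collector appends every event it receives without waiting on anyone, while the downstream consumer pulls batches from its own offset at a fixed rate.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Hypothetical in-memory stand-in for one Kafka partition: an
    append-only list where a message's offset is its index."""
    messages: list = field(default_factory=list)

    def append(self, msg: str) -> int:
        # Appending never waits on consumers; it just returns the new offset.
        self.messages.append(msg)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int) -> list:
        # Consumers pull a batch starting at whatever offset they have reached.
        return self.messages[offset:offset + max_messages]


def simulate(burst: list, drain_per_tick: int, ticks: int) -> list:
    """The collector writes burst[t] events at tick t; the consumer drains
    at most drain_per_tick events per tick. Returns consumer lag per tick."""
    log = PartitionLog()
    consumer_offset = 0
    lag_history = []
    for t in range(ticks):
        # Collector side: ingest whatever arrived this tick, unconditionally.
        arrivals = burst[t] if t < len(burst) else 0
        for i in range(arrivals):
            log.append(f"event-{t}-{i}")
        # Consumer side: process at its own, fixed rate.
        batch = log.read(consumer_offset, drain_per_tick)
        consumer_offset += len(batch)
        lag_history.append(len(log.messages) - consumer_offset)
    return lag_history


print(simulate([5, 5, 0, 0, 0], drain_per_tick=3, ticks=5))  # → [2, 4, 1, 0, 0]
```

During the burst the lag climbs to 4 and then drains back to 0: the backlog lives in the log rather than in the collector's memory, so the collector never has to slow down or drop events because downstream is busy.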

Page 9

§  Internal buffering in collectors
–  Added complexity
§  Cassandra
–  Not as good a queue as Kafka
§  Apache Storm
–  In initial Gen2 architecture, removed after launch

Alternate/Supplementary Approaches Considered

Page 10

Results:
§  Can process sustained rates of 100,000+ events per second per cluster
§  Average message 300 bytes

The Secret to Log Management at Scale: Keep It Simple, Stupid
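The two figures on this slide translate into a byte rate with simple arithmetic. A back-of-the-envelope sketch, assuming the 300-byte average holds at a sustained 100,000 events per second and using decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes):

```python
def throughput(events_per_sec: int, avg_bytes: int) -> tuple:
    """Return (bytes per second, bytes per day) for a sustained event rate."""
    bytes_per_sec = events_per_sec * avg_bytes
    seconds_per_day = 24 * 60 * 60  # 86,400
    return bytes_per_sec, bytes_per_sec * seconds_per_day


bps, bpd = throughput(100_000, 300)
print(f"{bps / 1e6:.0f} MB/s, {bpd / 1e12:.2f} TB/day")  # → 30 MB/s, 2.59 TB/day
```

If that rate were sustained around the clock, one cluster would move roughly 2.6 TB per day, the same order of magnitude as the "terabytes of data ... every day" cited later in this deck.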

Page 11

Why We Love Kafka

Page 12

What Attracted Us in the First Place

No single point of failure
•  Terabytes of data move through our Kafka cluster every day without losing a single event
•  We use age-based retention to purge old data on disks

Low latency
•  99.99999% of the time our data is coming from disk cache and RAM; only very rarely do we hit disk

Performance
•  Crazy good!
•  We currently have a bunch of Kafka brokers running on m2.xlarge instances backed by provisioned IOPS.
•  One consumer group (eight threads), which maps a log to a customer, can process about 200,000 events per second draining from 192 partitions spread across three brokers

Scalability
•  Ability to increase partition count per topic and downstream consumer threads provides flexibility to increase throughput when desired
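The scalability point comes down to partition arithmetic. Within a consumer group, Kafka gives each partition to exactly one consumer, so 192 partitions shared by eight threads works out to 24 partitions per thread. The sketch below mimics the contiguous-range strategy of Kafka's range assignor for a single topic; it is a simplified model written for illustration, not the client library's code, and the thread names are hypothetical.

```python
def range_assign(num_partitions: int, consumers: list) -> dict:
    """Split partitions 0..num_partitions-1 into contiguous ranges, one per
    consumer, in the style of Kafka's range assignor for a single topic."""
    members = sorted(consumers)
    per_member, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members each take one additional partition.
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


threads = [f"thread-{i}" for i in range(8)]
plan = range_assign(192, threads)
print({t: len(p) for t, p in plan.items()})  # 24 partitions for each thread
```

Adding partitions or consumer threads (up to one thread per partition) only changes the inputs to this split, which is why both knobs raise the throughput ceiling. If the 200,000 events per second were spread evenly, each thread would handle about 25,000 events per second; that even spread is an assumption, since real partitions rarely carry identical load.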

Page 13

How Our Kafka Crush Has Deepened

Distributed log collection
•  Local pods and collectors spread all over the Internet, with local Kafka deployments to collect data from customers located all over the world
•  Can collect logs even when we lose connectivity
•  When the network comes back, Kafka sends the logs downstream to the rest of the pipeline

More efficient, effective DevOps
•  Deploying Kafka throughout the pipeline makes it easy to disable certain parts of the system (for troubleshooting or upgrades)
•  No worrying that we will lose customer data
•  Example: Add support for a new log type to our automatic parsing capabilities by turning off the existing parser, deploying a new one, and processing the logs that Kafka has queued up

Controlling resource utilization

•  Keep collectors as simple as possible for resilience and reliability reasons

•  Add intelligence into our pipelines using Kafka
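The "turn off the old parser, deploy the new one" workflow works because a Kafka consumer group keeps a committed offset, so a replacement consumer in the same group resumes where its predecessor stopped. A minimal in-memory sketch, assuming a single partition modeled as a Python list; `CommittedOffsets`, `drain`, both parser functions and the pipe-delimited "new log type" are hypothetical and only illustrate the mechanism.

```python
class CommittedOffsets:
    """Hypothetical stand-in for the offsets a Kafka consumer group commits:
    one 'next offset to read' per (group, partition)."""

    def __init__(self) -> None:
        self._committed = {}

    def commit(self, group: str, partition: int, offset: int) -> None:
        self._committed[(group, partition)] = offset

    def position(self, group: str, partition: int) -> int:
        # In this sketch a group with nothing committed starts at offset 0.
        return self._committed.get((group, partition), 0)


def drain(parse, log: list, offsets: CommittedOffsets, group: str,
          partition: int = 0) -> list:
    """Parse every message from the group's committed position to the end
    of the log, then commit the new position."""
    start = offsets.position(group, partition)
    parsed = [parse(line) for line in log[start:]]
    offsets.commit(group, partition, len(log))
    return parsed


def parse_v1(line: str) -> dict:
    key, value = line.split("=", 1)
    return {key: value}


def parse_v2(line: str) -> dict:
    # Hypothetical new log type: "<tag>|key=value".
    if "|" in line:
        tag, rest = line.split("|", 1)
        return {"tag": tag, **parse_v1(rest)}
    return parse_v1(line)


partition0 = ["user=alice", "status=200"]
offsets = CommittedOffsets()
first = drain(parse_v1, partition0, offsets, group="parsers")

# Parser v1 is switched off; new events keep queuing in the partition.
partition0 += ["user=bob", "nginx|status=404"]

# Parser v2 joins the same group and resumes at the committed offset (2).
second = drain(parse_v2, partition0, offsets, group="parsers")
```

Nothing is lost or reprocessed during the swap: the events that arrived while no parser was running simply wait in the partition until the new parser picks them up from the committed position.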

Page 14

Resource Utilization Example: “Noisy Neighbors”

Page 15

§  Sending many times their “normal” level of logging volume, inadvertently or because their application is in big trouble

§  Routing their logs to a separate queue minimizes the impact on other customers

“Noisy Neighbors” are Inherent to SaaS
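One way to implement the routing described on this slide is to compare each customer's volume in a time window against that customer's usual level and send outliers to a separate topic. The sketch below is an assumption-laden illustration, not Loggly's actual rule: the per-batch window, the 5x threshold, the baseline of 1 event for unknown customers, and the topic names `logs` and `logs-noisy` are all hypothetical. In a real pipeline the chosen topic name would be handed to a Kafka producer.

```python
from collections import Counter


def route_batch(events: list, baseline: dict, factor: float = 5.0,
                default_topic: str = "logs",
                noisy_topic: str = "logs-noisy") -> dict:
    """Assign each (customer_id, payload) event to a topic. Customers whose
    volume in this batch exceeds `factor` times their baseline go to a
    separate topic so they cannot crowd out everyone else."""
    counts = Counter(customer for customer, _ in events)
    # Unknown customers get a hypothetical baseline of 1 event per batch.
    noisy = {c for c, n in counts.items() if n > factor * baseline.get(c, 1)}
    routed = {default_topic: [], noisy_topic: []}
    for customer, payload in events:
        topic = noisy_topic if customer in noisy else default_topic
        routed[topic].append((customer, payload))
    return routed


batch = [("acme", "GET /")] * 2 + [("bigco", "ERROR retrying")] * 12
routed = route_batch(batch, baseline={"acme": 2, "bigco": 2})
print({topic: len(evts) for topic, evts in routed.items()})
```

Only the noisy customer's events move, so consumers of the regular topic keep draining at their normal pace while the separate queue can be processed, scaled or throttled on its own.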

Page 16

§  Because Kafka topics are very cheap from a performance and overhead standpoint, we can create as many queues as we want
–  Scaled to the performance we want
–  Optimizing resource utilization across the system

§  Because they can be created dynamically, we can make business rules very flexible

§  Makes us confident that the pipeline will scale as customer data volumes do

Kafka Queues Add Flexibility to Loggly Pipeline
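Because topics can be created on the fly, a routing rule can point at a topic that does not exist yet. A sketch of a first-match rule table under stated assumptions: the rule format, the `TopicRouter` class and the `create_topic` callback are hypothetical. In practice the callback might wrap an administrative topic-creation call, or the broker could create topics itself if its `auto.create.topics.enable` setting is on.

```python
from typing import Callable, List, Set, Tuple

# A rule pairs a predicate on an event with the topic its matches go to.
Rule = Tuple[Callable[[dict], bool], str]


class TopicRouter:
    """Hypothetical rule-based router: the first matching rule wins, and a
    topic is created the first time any event is routed to it."""

    def __init__(self, rules: List[Rule], fallback: str,
                 create_topic: Callable[[str], None]) -> None:
        self.rules = rules
        self.fallback = fallback
        self.create_topic = create_topic  # e.g. a wrapper around an admin call
        self.known: Set[str] = set()

    def route(self, event: dict) -> str:
        topic = next((t for pred, t in self.rules if pred(event)), self.fallback)
        if topic not in self.known:
            # Lazily create topics only when a rule actually needs them.
            self.create_topic(topic)
            self.known.add(topic)
        return topic


created = []
router = TopicRouter(
    rules=[
        (lambda e: e.get("plan") == "enterprise", "logs-enterprise"),
        (lambda e: e.get("type") == "apache", "logs-apache"),
    ],
    fallback="logs-default",
    create_topic=created.append,
)
```

Changing the business rules then means editing the rule list; no fixed set of queues has to be provisioned in advance.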

Page 17

§  Kafka deployment working without us thinking about it

§  Plenty of other things to do to keep our position as the world’s most popular cloud-based log management service!

Conclusion: Kafka Frees Our Development Team to Build Differentiating Features

Page 18

Does Log Management Sound Hard? It Should!

About Us: Loggly is the world’s most popular cloud-based log management solution, used by more than 5,000 happy customers to effortlessly spot problems in real time, easily pinpoint root causes and resolve issues faster to ensure application success.

Let us do the heavy lifting for you!

Visit us at loggly.com or follow @loggly on Twitter.

Try Loggly FREE for 30 days

Page 19

Did you like this presentation?

Head over to our blog for more great content!

Take me to the Loggly Blog